
Debugging

Operator Logs

kubectl logs -n drop-system deploy/drop-controller-manager -f

The operator writes structured JSON logs. The "controller" field shows which reconciler emitted a line, and the "reconcileID" field lets you follow one specific reconciliation from start to finish.
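If grep is too coarse, a small filter can keep only the lines of one reconciliation. The program below is a minimal, hypothetical helper that is not shipped with the operator. It assumes one JSON object per log line with the "controller" and "reconcileID" keys described above. It also assumes the message text sits under a "msg" key, which is the zap production default but is an assumption here.

```go
// logfilter: print only the log lines that belong to one reconciliation.
// Hypothetical helper; usage:
//   kubectl logs -n drop-system deploy/drop-controller-manager | go run logfilter.go <reconcileID>
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// matchReconcile decodes one JSON log line and reports whether its
// "reconcileID" field equals id. Lines that are not valid JSON never match.
func matchReconcile(line []byte, id string) (map[string]any, bool) {
	var entry map[string]any
	if err := json.Unmarshal(line, &entry); err != nil {
		return nil, false
	}
	rid, ok := entry["reconcileID"].(string)
	return entry, ok && rid == id
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: logfilter <reconcileID>")
		os.Exit(2)
	}
	id := os.Args[1]
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long log lines
	for sc.Scan() {
		if entry, ok := matchReconcile(sc.Bytes(), id); ok {
			// "controller" names the reconciler; "msg" is an assumed message key.
			fmt.Printf("[%v] %v\n", entry["controller"], entry["msg"])
		}
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Copy a reconcileID from any interesting line first, then rerun the pipeline with that ID to see the whole reconciliation in order.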

Inspect a CachedImage

kubectl get cachedimage <name> -o yaml

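Key status fields:

  • phase: Pending → Pulling → Ready (or Degraded)
  • conditions[type=Ready]: The definitive health signal
  • cachedNodes: Which nodes already have the image
  • nodesTargeted / nodesReady: Progress tracking (nodes the image should reach vs. nodes where it is ready)
  • consecutiveFailures: Failed pulls in a row; triggers backoff and, once the limit is exceeded, the Degraded phase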
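For scripted checks, you can decode the status from `kubectl get cachedimage <name> -o json` and read health from the Ready condition rather than from phase. The sketch below is hypothetical. Its JSON keys mirror the fields listed above, but their Go types, and the standard Kubernetes condition shape (type/status/reason), are assumptions. It leaves out cachedNodes because that field's element type is not documented here.

```go
// Hypothetical status summarizer:
//   kubectl get cachedimage <name> -o json | go run summary.go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// condition assumes the usual Kubernetes condition layout.
type condition struct {
	Type   string `json:"type"`
	Status string `json:"status"`
	Reason string `json:"reason"`
}

// cachedImageStatus mirrors the documented status fields (types assumed).
// cachedNodes is omitted; unknown JSON keys are ignored by encoding/json.
type cachedImageStatus struct {
	Phase               string      `json:"phase"`
	Conditions          []condition `json:"conditions"`
	NodesTargeted       int         `json:"nodesTargeted"`
	NodesReady          int         `json:"nodesReady"`
	ConsecutiveFailures int         `json:"consecutiveFailures"`
}

// summarize reports the Ready condition (the definitive signal) next to
// phase, progress, and the failure streak.
func summarize(st cachedImageStatus) string {
	ready, reason := "Unknown", ""
	for _, c := range st.Conditions {
		if c.Type == "Ready" {
			ready, reason = c.Status, c.Reason
			break
		}
	}
	return fmt.Sprintf("phase=%s ready=%s(%s) nodes=%d/%d failures=%d",
		st.Phase, ready, reason, st.NodesReady, st.NodesTargeted, st.ConsecutiveFailures)
}

func main() {
	var obj struct {
		Status cachedImageStatus `json:"status"`
	}
	if err := json.NewDecoder(os.Stdin).Decode(&obj); err != nil {
		fmt.Fprintln(os.Stderr, "decode:", err)
		os.Exit(1)
	}
	fmt.Println(summarize(obj.Status))
}
```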

Inspect Drop Pods

kubectl get pods -l app.kubernetes.io/managed-by=drop -o wide

A drop Pod finishes as Succeeded (image pulled) or Failed (pull error); Pending or Running Pods are still in flight. Check the Pod's events for details:

kubectl describe pod <drop-pod-name>

Common Issues

Symptom | Cause | Fix
--- | --- | ---
Pod stuck Pending | Node selector doesn’t match any node | Check nodeSelector on the CachedImage
Pod ErrImagePull | Wrong image name or missing auth | Check imagePullSecrets; verify the image ref exists
CachedImage stays Pulling | Pacing engine throttling | Check PullPolicy maxConcurrentNodes / minDelayBetweenPulls
CachedImage Degraded | Consecutive failures exceeded | Check Pod events; increase backoff in PullPolicy
DiscoveryPolicy finds no images | Prometheus query returns empty | Run the query manually in the Prometheus UI and check for the image label
DiscoveryPolicy DNSError | Source endpoint unreachable | Check network policies, DNS, and the service name
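For the "no images" row, you can also run the DiscoveryPolicy's PromQL query against the Prometheus HTTP API (`/api/v1/query`) and check whether each returned series carries an `image` label. The program below is a hypothetical helper, not part of the operator. It assumes the query returns an instant vector and that the label is literally named `image`, as the table suggests.

```go
// promcheck: hypothetical helper that runs a PromQL query and lists the
// "image" label of each returned series.
//   go run promcheck.go http://prometheus:9090 '<promql from the DiscoveryPolicy>'
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

// queryResponse models only the parts of a /api/v1/query response used here.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
		} `json:"result"`
	} `json:"data"`
}

// imageLabels returns the "image" label values and counts series lacking it.
func imageLabels(body []byte) ([]string, int, error) {
	var qr queryResponse
	if err := json.Unmarshal(body, &qr); err != nil {
		return nil, 0, err
	}
	if qr.Status != "success" {
		return nil, 0, fmt.Errorf("query status %q", qr.Status)
	}
	var images []string
	missing := 0
	for _, r := range qr.Data.Result {
		if img, ok := r.Metric["image"]; ok && img != "" {
			images = append(images, img)
		} else {
			missing++
		}
	}
	return images, missing, nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: promcheck <prometheus-base-url> <promql>")
		os.Exit(2)
	}
	endpoint := os.Args[1] + "/api/v1/query?query=" + url.QueryEscape(os.Args[2])
	resp, err := http.Get(endpoint)
	if err != nil {
		fmt.Fprintln(os.Stderr, "request:", err)
		os.Exit(1)
	}
	body, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		fmt.Fprintln(os.Stderr, "read:", err)
		os.Exit(1)
	}
	images, missing, err := imageLabels(body)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d series with an image label, %d without\n", len(images), missing)
	for _, img := range images {
		fmt.Println(img)
	}
}
```

Zero series in total means the query itself matches nothing. Series that appear but are counted as "without" point to a missing or differently named label.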

Pacing Engine Diagnostics

The pacing engine (in internal/pacing/) blocks a new pull when either of the following is true:

  1. Active (Pending/Running) Pods ≥ maxConcurrentNodes
  2. Time since last Pod creation < minDelayBetweenPulls

Pods stuck in ErrImagePull/ImagePullBackOff are excluded from the active count (so they don’t block other pulls).

To check pacing state:

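# Count Pending/Running drop pods. Pods stuck in ErrImagePull/ImagePullBackOff
# are also Pending, so this can be higher than the pacing engine's active count.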
kubectl get pods -l app.kubernetes.io/managed-by=drop --field-selector=status.phase!=Succeeded,status.phase!=Failed

# Check the metric
curl -s localhost:8443/metrics | grep drop_active_pulls

Delve Debugging

# Run the operator locally with delve:
dlv debug ./cmd/ -- --metrics-bind-address=:8443

# Or attach to a running process:
dlv attach <pid>

When running locally, the operator connects to the cluster selected by your current kubeconfig context (~/.kube/config, or the file named by $KUBECONFIG).

Useful breakpoints

Location | Why
--- | ---
cachedimage_controller.go:Reconcile | Entry point for the core loop
pacing.go:CanStartPull | Pacing decision point
builder.go:BuildDropPod | Pod spec construction
discoverypolicy_controller.go:buildSource | Source creation

Metrics for Debugging

curl -s localhost:8443/metrics | grep drop_

Metric | What it tells you
--- | ---
drop_active_pulls | How many Pods are in flight right now
drop_pull_errors_total | Which images/nodes are failing
drop_pull_duration_seconds | How long pulls take
drop_reconcile_total{result="error"} | Controller errors
drop_discovery_source_health | Whether sources are reachable