Debugging
Operator Logs
kubectl logs -n drop-system deploy/drop-controller-manager -f
The operator logs structured JSON. Look for the "controller" and "reconcileID" fields to trace a specific reconciliation.
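To follow a single reconciliation, the log stream can be filtered on its reconcileID. The helper below is a minimal sketch: the name trace_reconcile is hypothetical, and it assumes every log entry is one JSON object per line with reconcileID stored as a string.

```shell
# Hypothetical helper, not part of the operator.
# Reads JSON log lines on stdin and keeps only those whose "reconcileID"
# field equals the value passed as $1.
trace_reconcile() {
  # Allow optional whitespace around the colon in case the encoder adds it.
  grep -E "\"reconcileID\"[[:space:]]*:[[:space:]]*\"$1\""
}
```

Typical use would be to pipe the log command into it: kubectl logs -n drop-system deploy/drop-controller-manager | trace_reconcile <reconcileID>. If jq is available, selecting on the parsed field is more robust than matching text.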
Inspect a CachedImage
kubectl get cachedimage <name> -o yaml
Key status fields:
- phase: Pending → Pulling → Ready (or Degraded)
- conditions[type=Ready]: The definitive health signal
- cachedNodes: Which nodes have the image
- nodesTargeted/nodesReady: Progress tracking
- consecutiveFailures: Backoff trigger
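When the full YAML is more than needed, the same fields can be pulled out in one line with a kubectl JSONPath template. This is a sketch only: the function name cachedimage_summary is hypothetical, and it assumes the fields listed above live directly under .status. Add -n <namespace> if CachedImage is namespaced in your install.

```shell
# Hypothetical helper, not part of the operator.
# Prints phase, Ready condition status, node progress and failure count
# for the CachedImage named in $1.
cachedimage_summary() {
  kubectl get cachedimage "$1" -o jsonpath='{.status.phase} ready={.status.conditions[?(@.type=="Ready")].status} nodes={.status.nodesReady}/{.status.nodesTargeted} failures={.status.consecutiveFailures}{"\n"}'
}
```

Calling cachedimage_summary <name> prints a single line in the form "<phase> ready=<status> nodes=<ready>/<targeted> failures=<count>". Fields that are unset in the status come out empty.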
Inspect Drop Pods
kubectl get pods -l app.kubernetes.io/managed-by=drop -o wide
Pods should be Succeeded (image pulled) or Failed (pull error). Check events for details:
kubectl describe pod <drop-pod-name>
Common Issues
| Symptom | Cause | Fix |
|---|---|---|
| Pod stuck Pending | Node selector doesn’t match any node | Check nodeSelector on CachedImage |
| Pod ErrImagePull | Wrong image name or missing auth | Check imagePullSecrets, verify image ref exists |
| CachedImage stays Pulling | Pacing engine throttling | Check PullPolicy maxConcurrentNodes / minDelayBetweenPulls |
| CachedImage Degraded | Consecutive failures exceeded | Check Pod events, increase backoff in PullPolicy |
| DiscoveryPolicy no images | Prometheus query returns empty | Run query manually in Prometheus UI, check for image label |
| DiscoveryPolicy DNSError | Source endpoint unreachable | Check network policies, DNS, service name |
Pacing Engine Diagnostics
The pacing engine (in internal/pacing/) blocks new pulls when:
- Active (Pending/Running) Pods ≥ maxConcurrentNodes
- Time since last Pod creation < minDelayBetweenPulls
Pods stuck in ErrImagePull/ImagePullBackOff are excluded from the active count (so they don’t block other pulls).
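The two blocking rules and the exclusion above can be approximated from the command line. The helpers below are a sketch with hypothetical names. They mirror the rules as described here and are not the operator's own CanStartPull logic. Note that a Pod waiting in ErrImagePull or ImagePullBackOff is still in the Pending phase, so a selector on status.phase alone will count it; filtering on the STATUS column avoids that.

```shell
# Hypothetical helpers, not part of the operator.

# Reads `kubectl get pods --no-headers` output on stdin (STATUS is column 3)
# and counts Pods that are not stuck in ErrImagePull/ImagePullBackOff.
count_active_pulls() {
  awk '$3 != "ErrImagePull" && $3 != "ImagePullBackOff" { n++ } END { print n + 0 }'
}

# Usage: pacing_blocked ACTIVE MAX_CONCURRENT SECONDS_SINCE_LAST MIN_DELAY_SECONDS
# Prints the reason and returns 0 when either rule would block a new pull;
# prints "not blocked" and returns 1 otherwise.
pacing_blocked() {
  if [ "$1" -ge "$2" ]; then
    echo "blocked: $1 active >= maxConcurrentNodes $2"
  elif [ "$3" -lt "$4" ]; then
    echo "blocked: ${3}s since last Pod < minDelayBetweenPulls ${4}s"
  else
    echo "not blocked"
    return 1
  fi
}
```

For example, feed count_active_pulls with kubectl get pods -l app.kubernetes.io/managed-by=drop --field-selector=status.phase!=Succeeded,status.phase!=Failed --no-headers, then pass the result to pacing_blocked together with the values from your PullPolicy.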
To check pacing state:
# Count active drop pods
kubectl get pods -l app.kubernetes.io/managed-by=drop --field-selector=status.phase!=Succeeded,status.phase!=Failed
# Check the metric
curl -s localhost:8443/metrics | grep drop_active_pulls
Delve Debugging
# Run the operator locally with delve:
dlv debug ./cmd/ -- --metrics-bind-address=:8443
# Or attach to a running process:
dlv attach <pid>
When running locally, the operator uses the current context from your ~/.kube/config.
Useful breakpoints
| Location | Why |
|---|---|
| cachedimage_controller.go:Reconcile | Entry point for the core loop |
| pacing.go:CanStartPull | Pacing decision point |
| builder.go:BuildDropPod | Pod spec construction |
| discoverypolicy_controller.go:buildSource | Source creation |
Metrics for Debugging
curl -s localhost:8443/metrics | grep drop_

| Metric | What it tells you |
|---|---|
| drop_active_pulls | How many Pods are in-flight right now |
| drop_pull_errors_total | Which images/nodes are failing |
| drop_pull_duration_seconds | How long pulls take |
| drop_reconcile_total{result="error"} | Controller errors |
| drop_discovery_source_health | Whether sources are reachable |
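To narrow the metrics dump to pulls that are actually failing, the error counter can be filtered for non-zero series. The helper name failing_pulls is hypothetical, and the sketch assumes the standard Prometheus text format with the sample value as the last field on each line (no trailing timestamps).

```shell
# Hypothetical helper, not part of the operator.
# Reads Prometheus text-format metrics on stdin and prints only the
# drop_pull_errors_total series whose value is greater than zero.
failing_pulls() {
  awk '/^drop_pull_errors_total/ && $NF > 0 { print }'
}
```

Use it as curl -s localhost:8443/metrics | failing_pulls. The labels on each printed series then show which image or node the errors belong to.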