Monitoring

Prometheus Metrics

Metric	Type	Labels	Description
`drop_images_cached_total`	Counter	`image`, `node`	Total images successfully cached
`drop_pull_duration_seconds`	Histogram	`image`	Duration of pull operations
`drop_pull_errors_total`	Counter	`image`, `node`	Total failed pull attempts
`drop_discovery_images_found`	Gauge	`policy`, `source_type`	Images found per discovery source
`drop_active_pulls`	Gauge	—	Currently active pull Pods
`drop_reconcile_total`	Counter	`controller`, `result`	Reconciliation attempts

Enable ServiceMonitor

helm install drop oci://ghcr.io/breee/charts/drop \
  --set serviceMonitor.enabled=true

Example Queries

# Pull success rate
rate(drop_images_cached_total[1h])

# p95 pull duration
histogram_quantile(0.95, rate(drop_pull_duration_seconds_bucket[1h]))

# Error rate by image
rate(drop_pull_errors_total[1h])

# Active pulls right now
drop_active_pulls

Kubernetes Events

Reason	Type	Description
`PullStarted`	Normal	Image pull Pod created on a node
`PullSucceeded`	Normal	Image successfully cached on a node
`PullFailed`	Warning	Image pull failed on a node

kubectl get events --field-selector involvedObject.kind=CachedImage

Status Conditions

All resources use metav1.Condition with type Ready:

status:
  conditions:
    - type: Ready
      status: "True"
      reason: Cached
      message: "Image cached on all 5 target nodes"

Health Endpoints

Endpoint	Port	Description
`/healthz`	8081	Liveness probe
`/readyz`	8081	Readiness probe

Last updated on May 26, 2026

Discovery For AI Agents