Monitoring
Prometheus Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| drop_images_cached_total | Counter | image, node | Total images successfully cached |
| drop_pull_duration_seconds | Histogram | image | Duration of pull operations |
| drop_pull_errors_total | Counter | image, node | Total failed pull attempts |
| drop_discovery_images_found | Gauge | policy, source_type | Images found per discovery source |
| drop_active_pulls | Gauge | none | Currently active pull Pods |
| drop_reconcile_total | Counter | controller, result | Reconciliation attempts |
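
As a rough illustration of how the two pull counters relate, the sketch below parses a snippet of Prometheus text exposition output and derives a pull success ratio as cached / (cached + errors). The sample values are made up, the helper names (`sum_metric`, `pull_success_ratio`) are hypothetical, and treating every pull attempt as ending in exactly one of the two counters is an assumption, not documented behavior.

```python
import re
from typing import Optional

# Matches sample lines such as:
#   drop_pull_errors_total{image="redis:7",node="node-b"} 5
# Simplification: label values containing "}" are not handled.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)


def sum_metric(exposition_text: str, metric_name: str) -> float:
    """Sum every sample of metric_name across all label sets."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match and match.group("name") == metric_name:
            total += float(match.group("value"))
    return total


def pull_success_ratio(exposition_text: str) -> Optional[float]:
    """Return cached / (cached + errors), or None when there were no attempts.

    Assumption: each attempt increments exactly one of the two counters.
    """
    cached = sum_metric(exposition_text, "drop_images_cached_total")
    errors = sum_metric(exposition_text, "drop_pull_errors_total")
    attempts = cached + errors
    return cached / attempts if attempts else None


# Illustrative data only; real label values will differ.
SAMPLE = """\
# TYPE drop_images_cached_total counter
drop_images_cached_total{image="nginx:1.27",node="node-a"} 9
drop_images_cached_total{image="redis:7",node="node-b"} 6
# TYPE drop_pull_errors_total counter
drop_pull_errors_total{image="redis:7",node="node-b"} 5
drop_active_pulls 2
"""

print(pull_success_ratio(SAMPLE))  # 0.75
```

Because both metrics are counters, a ratio over raw totals covers the whole process lifetime. For a windowed view, the same arithmetic would normally be applied to `rate()` results in PromQL.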

Enable ServiceMonitor

    helm install drop oci://ghcr.io/breee/charts/drop \
      --set serviceMonitor.enabled=true

Example Queries

    # Rate of successfully cached images (per second, averaged over 1h)
    rate(drop_images_cached_total[1h])

    # p95 pull duration
    histogram_quantile(0.95, rate(drop_pull_duration_seconds_bucket[1h]))

    # Error rate by image
    sum by (image) (rate(drop_pull_errors_total[1h]))

    # Active pulls right now
    drop_active_pulls

Kubernetes Events

| Reason | Type | Description |
|---|---|---|
| PullStarted | Normal | Image pull Pod created on a node |
| PullSucceeded | Normal | Image successfully cached on a node |
| PullFailed | Warning | Image pull failed on a node |
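
The reasons in the table above can also be tallied programmatically. The following is a minimal sketch that counts events by reason from an event list in the JSON shape returned by `kubectl get events -o json`. The function name `count_events_by_reason`, the sample events, and the `CachedImage` kind used as the default filter are assumptions for illustration.

```python
import json
from collections import Counter


def count_events_by_reason(event_list_json: str, kind: str = "CachedImage") -> Counter:
    """Count events per reason for objects of the given kind.

    Expects an EventList JSON document with an "items" array, where each item
    carries "reason" and "involvedObject.kind" (core/v1 Event fields).
    """
    events = json.loads(event_list_json).get("items", [])
    return Counter(
        event.get("reason", "")
        for event in events
        if event.get("involvedObject", {}).get("kind") == kind
    )


# Illustrative data only; object names are hypothetical.
SAMPLE_EVENTS = json.dumps({
    "items": [
        {"reason": "PullStarted", "type": "Normal",
         "involvedObject": {"kind": "CachedImage", "name": "nginx"}},
        {"reason": "PullFailed", "type": "Warning",
         "involvedObject": {"kind": "CachedImage", "name": "nginx"}},
        {"reason": "PullFailed", "type": "Warning",
         "involvedObject": {"kind": "CachedImage", "name": "redis"}},
        {"reason": "Scheduled", "type": "Normal",
         "involvedObject": {"kind": "Pod", "name": "unrelated"}},
    ]
})

counts = count_events_by_reason(SAMPLE_EVENTS)
print(counts["PullFailed"])  # 2
```

A non-zero `PullFailed` count is a reasonable trigger for inspecting the individual event messages.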

    kubectl get events --field-selector involvedObject.kind=CachedImage

Status Conditions

All resources use metav1.Condition with type Ready:

    status:
      conditions:
        - type: Ready
          status: "True"
          reason: Cached
          message: "Image cached on all 5 target nodes"

Health Endpoints

| Endpoint | Port | Description |
|---|---|---|
| /healthz | 8081 | Liveness probe |
| /readyz | 8081 | Readiness probe |
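
For a quick manual check of both endpoints, something like the sketch below could be used. It assumes port 8081 of the controller is reachable at `http://localhost:8081` (for example through a port-forward) and that a healthy endpoint answers with HTTP 200. The function names `probe` and `check_health` are hypothetical.

```python
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout: float = 2.0) -> bool:
    """Return True when GET base_url + path answers with HTTP 200."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers non-2xx responses (HTTPError), refused connections, timeouts.
        return False


def check_health(base_url: str = "http://localhost:8081") -> dict:
    """Probe the liveness and readiness endpoints listed in the table above."""
    return {path: probe(base_url, path) for path in ("/healthz", "/readyz")}


if __name__ == "__main__":
    for path, ok in check_health().items():
        print(f"{path}: {'ok' if ok else 'failing'}")
```

A failing `/readyz` alongside a passing `/healthz` usually means the process is alive but not yet ready to serve.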