Introducing Playbooks: Self-Healing Queues That Fix Themselves
Queue lag spikes at 3am shouldn't page your on-call. OwlMQ Playbooks let you define automated remediation steps that trigger when conditions are met.
Arjun Shah
Product Lead
December 28, 2024
The worst part of being on-call isn't the 3am page. It's spending 45 minutes doing the exact same thing you did last time: check lag, scale consumers, reroute traffic, resolve the alert. Toil that could be automated.
What are Playbooks?
Playbooks are declarative YAML files that define automated responses to queue anomalies. When a trigger condition is met — lag exceeds a threshold, error rate spikes, consumer heartbeat times out — OwlMQ executes the playbook automatically.
They're not scripts. They're not Kubernetes operators. They're a purpose-built DSL for queue remediation that understands OwlMQ's primitives natively.
A real example
Here's the playbook we run in production for our payment queue:
name: payment-queue-healing
triggers:
  - type: lag_exceeded
    threshold: 10000
    window: 60s
actions:
  - step: scale_consumers
    scale_by: 3x
  - step: alert
    channels: [slack, pagerduty]
    severity: high
When lag exceeds 10,000 messages in a 60-second window, OwlMQ automatically triples the consumer count and fires an alert. No human needed. The alert tells us what happened and what was done about it, instead of paging us to figure it out.
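To make the trigger semantics concrete, here is a rough Python sketch of how a lag_exceeded check and a 3x scale-up could behave. It is not OwlMQ's implementation: the LagTrigger class and the scale_consumers function are hypothetical names, and the sketch assumes that "exceeds the threshold in a 60-second window" means every sample across a full window is above the threshold.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LagTrigger:
    """Hypothetical stand-in for a lag_exceeded trigger (not OwlMQ code)."""
    threshold: int = 10_000   # messages
    window_s: float = 60.0    # seconds
    _samples: deque = field(default_factory=deque)  # (timestamp, lag) pairs

    def observe(self, now: float, lag: int) -> bool:
        """Record a lag sample and report whether the trigger would fire.

        Assumption: the trigger fires only when the retained samples span
        the full window and every one of them is above the threshold.
        """
        self._samples.append((now, lag))
        # Drop old samples, but keep one at or before the window start so we
        # can tell that the whole window is covered.
        while len(self._samples) >= 2 and self._samples[1][0] <= now - self.window_s:
            self._samples.popleft()
        covers_window = self._samples[0][0] <= now - self.window_s
        return covers_window and all(
            sample_lag > self.threshold for _, sample_lag in self._samples
        )


def scale_consumers(current: int, scale_by: str) -> int:
    """Apply a multiplier such as "3x" to the current consumer count."""
    factor = int(scale_by.rstrip("x"))
    return current * factor


trigger = LagTrigger()
for timestamp, lag in [(0, 12_000), (30, 15_000), (60, 11_000)]:
    if trigger.observe(timestamp, lag):
        print("would scale to", scale_consumers(4, "3x"), "consumers")
```

Requiring the whole window to be above the threshold keeps a single noisy sample from tripling your consumers. If your trigger should fire on any one sample instead, the check reduces to comparing the latest lag against the threshold.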
Dry-run mode
Every playbook can be tested with --dry-run before deploying. OwlMQ simulates the trigger conditions and shows exactly which actions would execute, in what order, with what parameters. We've seen teams run dry-run in CI and catch configuration mistakes before they matter.
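If you want a feel for what a dry run reports, the idea can be sketched in a few lines of Python. This is an illustration only: the dry_run function and the PLAYBOOK dict are hypothetical, the dict simply mirrors the payment-queue example, and the real --dry-run output format may look different.

```python
# The payment-queue playbook from above, written as a plain dict so the
# sketch has no YAML dependency.
PLAYBOOK = {
    "name": "payment-queue-healing",
    "triggers": [{"type": "lag_exceeded", "threshold": 10000, "window": "60s"}],
    "actions": [
        {"step": "scale_consumers", "scale_by": "3x"},
        {"step": "alert", "channels": ["slack", "pagerduty"], "severity": "high"},
    ],
}


def dry_run(playbook: dict, simulated_lag: int) -> list[str]:
    """Return the ordered actions that would run, without executing any."""
    fired = any(
        t["type"] == "lag_exceeded" and simulated_lag > t["threshold"]
        for t in playbook["triggers"]
    )
    if not fired:
        return []
    plan = []
    for position, action in enumerate(playbook["actions"], start=1):
        # Everything except the step name is treated as a parameter.
        params = {k: v for k, v in action.items() if k != "step"}
        plan.append(f"{position}. {action['step']} {params}")
    return plan


for line in dry_run(PLAYBOOK, simulated_lag=25_000):
    print(line)
```

A CI job could assert on a plan like this, for example failing the build if a playbook that is supposed to alert produces a plan with no alert step.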
Rollback and safety
Playbooks include a rollback block that executes if any step fails. OwlMQ takes a snapshot of queue state before execution and can restore it automatically. This makes playbooks safe to run aggressively — if something goes wrong, the system restores itself.
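The snapshot-then-restore pattern described here can be modeled in a small Python sketch. It assumes queue state fits in a plain dict and that a failed step raises an exception; run_with_rollback, triple_consumers, and failing_alert are hypothetical names for illustration, not part of OwlMQ.

```python
import copy


def run_with_rollback(state: dict, steps: list) -> tuple[dict, bool]:
    """Run steps in order on a copy of state; restore the snapshot on failure.

    Returns (resulting_state, succeeded).
    """
    snapshot = copy.deepcopy(state)  # taken before any step runs
    working = copy.deepcopy(state)
    try:
        for step in steps:
            step(working)
    except Exception:
        return snapshot, False  # rollback: discard partial changes
    return working, True


def triple_consumers(state: dict) -> None:
    state["consumers"] *= 3


def failing_alert(state: dict) -> None:
    raise RuntimeError("alert channel unreachable")


state = {"consumers": 4}
print(run_with_rollback(state, [triple_consumers]))
print(run_with_rollback(state, [triple_consumers, failing_alert]))
```

In the second call the scale-up succeeds but the alert step fails, so the caller gets back the original state with four consumers rather than a half-applied change.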
Playbooks are available on Pro and Enterprise plans. We're rolling them out in beta now — if you want early access, reach out in Discord.