Docs center

Self-hosted docs for Guardrails for AI.

Technical guidance for observability, guardrails, permissioning, and automation in one first-party documentation surface.

Jobs and Queues

Reliable job processing and queue management for critical workflows.

docs.cyiro.com (production-ready guidance)

Deterministic retry policies

Jobs retry with exponential backoff and jitter, which absorbs transient failures while avoiding thundering-herd retries. Each job type has a fixed attempt limit and delay schedule:

  • Alert dispatch: 3 attempts with 5s, 15s, 45s delays
  • Digest build: 2 attempts with 10s, 30s delays
  • Watcher fetch: 4 attempts with 3s, 10s, 30s, 60s delays
  • All retries respect queue latency SLOs and circuit breaker thresholds
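
The schedules above can be expressed as a small lookup table. The following is a minimal sketch, not the service's actual implementation. It assumes each listed delay precedes one retry, that jitter is uniform within a fraction of the base delay, and that job types are keyed by hyphenated names. The `watcher-fetch` key, `RETRY_DELAYS`, `retry_delay`, and `jitter_ratio` are hypothetical names chosen for illustration.

```python
import random

# Base delays in seconds, copied from the list above.
# The key "watcher-fetch" is an assumed spelling of "Watcher fetch".
RETRY_DELAYS = {
    "alert-dispatch": [5, 15, 45],
    "digest-build": [10, 30],
    "watcher-fetch": [3, 10, 30, 60],
}


def retry_delay(job_type, attempt, jitter_ratio=0.2, rng=random):
    """Return the delay in seconds before retry number `attempt` (1-based).

    Returns None once the schedule is exhausted, which is the point at
    which a job would move to its dead-letter queue.
    """
    delays = RETRY_DELAYS[job_type]
    if attempt < 1 or attempt > len(delays):
        return None
    base = delays[attempt - 1]
    # Assumed jitter model: uniform within +/- jitter_ratio of the base delay.
    spread = base * jitter_ratio
    return base + rng.uniform(-spread, spread)
```

With the default `jitter_ratio` of 0.2, the second alert-dispatch retry waits between 12 and 18 seconds, so workers that failed at the same moment do not all retry at once.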

Dead-letter handling

Failed jobs move to dead-letter queues with full context for manual review and reprocessing.

  • Dead-letter queues use workspace-scoped naming: dlq-{workspace}-{queue}
  • Messages include original payload, attempt count, and final error
  • Manual reprocessing available via API and dashboard
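
As an illustration of the points above, a dead-letter message could be assembled as follows. This is a sketch under stated assumptions: the field names `original_payload`, `attempt_count`, and `final_error`, the JSON encoding, and the helper names are hypothetical and only mirror the three items listed.

```python
import json
from dataclasses import dataclass, asdict


def dlq_name(workspace, queue):
    """Build a workspace-scoped dead-letter queue name: dlq-{workspace}-{queue}."""
    return f"dlq-{workspace}-{queue}"


@dataclass
class DeadLetterMessage:
    # Field names are assumptions; the source only lists what is included.
    original_payload: dict
    attempt_count: int
    final_error: str


def to_dead_letter(payload, attempt_count, error):
    """Serialize a failed job with the context needed for manual review."""
    message = DeadLetterMessage(payload, attempt_count, str(error))
    return json.dumps(asdict(message))
```

Keeping the original payload unmodified in the message is what makes later reprocessing possible without reconstructing the job from logs.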

Queue naming conventions

Queue names follow the same patterns in staging and production.

  • Staging: {queue}-staging
  • Production: {queue}-prod
  • Dead-letter: dlq-{queue}-{env}
  • Priority: {queue}-{env}-priority for high-priority workflows
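
The patterns above are simple enough to centralize in one helper so that names are never assembled by hand. A minimal sketch follows; the function names and the `ENV_SUFFIX` mapping are hypothetical, and only the resulting string formats come from the list.

```python
# Maps an environment to the suffix used in queue names.
ENV_SUFFIX = {"staging": "staging", "production": "prod"}


def queue_name(queue, env, priority=False):
    """Return {queue}-{env}, or {queue}-{env}-priority for high-priority work."""
    name = f"{queue}-{ENV_SUFFIX[env]}"
    return f"{name}-priority" if priority else name


def dead_letter_name(queue, env):
    """Return the environment-scoped dead-letter name: dlq-{queue}-{env}."""
    return f"dlq-{queue}-{ENV_SUFFIX[env]}"
```

For example, `queue_name("alert-dispatch", "production", priority=True)` yields `alert-dispatch-prod-priority`. An unknown environment raises `KeyError`, which surfaces typos early.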

Idempotency key strategy

Jobs use idempotency keys to prevent duplicate processing of the same logical operation.

  • Format: {workspace}-{job_type}-{entity_id}-{timestamp}
  • Example: prod-chat-alert-dispatch-inc-123-202403151430
  • TTL: 24 hours for most jobs, 7 days for critical operations
  • Storage: Redis with workspace-scoped keys
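
The key format and TTL rules above can be sketched as follows. Several details here are assumptions: the timestamp is rendered at minute resolution (inferred from the example key), and `InMemoryKeyStore` is a hypothetical stand-in for Redis so the example runs without a server. With the redis-py client, the equivalent claim would be a single `client.set(key, "1", nx=True, ex=ttl_seconds)` call.

```python
from datetime import datetime

DEFAULT_TTL_SECONDS = 24 * 60 * 60        # 24 hours for most jobs
CRITICAL_TTL_SECONDS = 7 * 24 * 60 * 60   # 7 days for critical operations


def idempotency_key(workspace, job_type, entity_id, when: datetime):
    """Build {workspace}-{job_type}-{entity_id}-{timestamp}."""
    # Minute resolution (YYYYMMDDHHMM) is inferred from the example key.
    stamp = when.strftime("%Y%m%d%H%M")
    return f"{workspace}-{job_type}-{entity_id}-{stamp}"


class InMemoryKeyStore:
    """Stand-in for Redis: claims a key only if it is absent or expired."""

    def __init__(self):
        self._expiry = {}

    def claim(self, key, ttl_seconds, now):
        """Return True if this caller may process the job, False if it is a duplicate."""
        expires = self._expiry.get(key)
        if expires is not None and expires > now:
            return False
        self._expiry[key] = now + ttl_seconds
        return True
```

A worker would call `claim` before doing any work and skip the job when it returns False. Doing the check and the write in one atomic step (as Redis `SET` with `NX` does) matters, because a separate read followed by a write leaves a window in which two workers can both proceed.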

Job observability fields

All job logs and metrics include the following standard fields.

  • job_id: Unique identifier for the job execution
  • job_type: Type of job (alert-dispatch, digest-build, etc.)
  • workspace_id: Workspace context
  • attempt: Current attempt number
  • status: current | retry | success | failed | dead-letter
  • duration_ms: Execution duration
  • timestamp: Start time of execution
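
One way to keep these fields consistent across workers is to emit them from a single record type. The sketch below is illustrative only: the class name, the Python types, the ISO 8601 timestamp format, and the JSON output are assumptions. The field names and status values are taken from the list above.

```python
import json
from dataclasses import dataclass, asdict

# Status values as listed above.
STATUSES = {"current", "retry", "success", "failed", "dead-letter"}


@dataclass
class JobLogRecord:
    job_id: str
    job_type: str
    workspace_id: str
    attempt: int
    status: str
    duration_ms: int
    timestamp: str  # start time of execution; ISO 8601 is an assumed format

    def __post_init__(self):
        # Reject unknown statuses so dashboards never see unexpected values.
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")

    def to_json(self):
        """Serialize the record as one JSON log line with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)
```

Validating `status` at construction time keeps free-form values out of metrics labels, where they would otherwise create unbounded cardinality.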

Queue latency SLO

Target latency for job queue processing to ensure timely execution.

  • Target: 95% of jobs start within 10 seconds of entering the queue
  • Measurement: Time from queue entry to job start
  • Exclusions: Throttled queues, paused workflows
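
The target above reduces to a simple ratio over observed wait times. A minimal sketch, assuming the caller has already dropped samples from throttled queues and paused workflows and that waits are measured in seconds from queue entry to job start; the function name and parameters are hypothetical.

```python
def latency_slo_met(wait_seconds, threshold=10.0, target=0.95):
    """Return True if at least `target` of jobs started within `threshold` seconds.

    `wait_seconds` holds one value per job: time from queue entry to job start.
    An empty window is treated as compliant because there is nothing to violate.
    """
    if not wait_seconds:
        return True
    within = sum(1 for wait in wait_seconds if wait <= threshold)
    return within / len(wait_seconds) >= target
```

For instance, a window in which 18 of 20 jobs started within 10 seconds is 90% compliant and misses the target.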

Queue throughput SLO

Target throughput for job queue processing to handle workload spikes.

  • Target: Sustain 100 jobs/minute per queue during normal operations
  • Burst: Handle 500 jobs/minute for 5-minute bursts
  • Measurement: Successful job completions per minute
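
The measurement above can be derived from completion timestamps. The following sketch assumes timestamps are epoch seconds for successful completions only; both function names are hypothetical. It counts completions per minute and checks whether a rate was held for a number of consecutive minutes, which covers both the sustained and the burst target.

```python
from collections import Counter


def completions_per_minute(completion_times):
    """Count successful completions per whole minute (times in epoch seconds)."""
    return Counter(int(t // 60) for t in completion_times)


def sustains(counts, rate, minutes):
    """Return True if `rate` jobs/minute was met for `minutes` consecutive minutes."""
    run = 0
    last = None
    for minute in sorted(counts):
        consecutive = last is not None and minute == last + 1
        if counts[minute] >= rate:
            # Extend the current run only if the previous minute also qualified.
            run = run + 1 if consecutive and run > 0 else 1
            if run >= minutes:
                return True
        else:
            run = 0
        last = minute
    return False
```

A burst check would then be `sustains(counts, 500, 5)`. Note that a low count only indicates a shortfall while the queue has backlog; an idle queue completes few jobs without violating the target.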