Circuit Breaker System

Flow ID: SY-26 | Module(s): CircuitBreaker, Cache, DI Container | Complexity: Medium

Business Overview

The circuit breaker system protects Ecommercen from cascading failures when external services become unresponsive. Instead of repeatedly timing out against a down service -- burning deferred-task budget and flooding logs -- the breaker trips open and fails fast. While the service is down, a single controlled probe request tests it after each cooldown, and the cooldown grows with exponential backoff after every failed probe to avoid hammering a recovering upstream. The first successful probe closes the circuit and restores full traffic.

Five external integrations are protected: Meta Conversions API (Facebook CAPI), Matomo Analytics, Manago CRM, ProjectAgora ad serving, and Advisable AI recommendations. Each has an independent circuit breaker instance with individually tunable parameters via environment variables. All 25 env vars are optional -- sensible defaults apply out of the box.

Architecture

                          External Service
                               ^
                               |
                     try { HTTP call }
                       catch -> recordFailure()
                       ok    -> recordSuccess()
                               ^
                               |
                          isOpen() guard
                               ^
                               |
              +----------------+----------------+
              |                |                |
     FacebookConversion   MatomoTracking     Manago
     Service              (flush)            (doPostRequest)
              |                |                |
              +-----+  +------+  +-------------+
                    |  |         |
              ProjectAgoraHttp  AdvAdvisableAI
              Trait (sync+async) (lazy DI resolve)
                    |            |
                    +-----+------+
                          |
                   DeferredTaskRunner
                   (post-response execution)
                          |
                          v
                    Response sent to user
                    (zero TTFB impact)

State Machine

[Closed] --- (failures >= threshold) ---> [Open]
   ^                                         |
   |                                    (cooldown elapsed)
   |                                         v
   |  (probe succeeds)                  [Half-Open]
   +<----------------------------------------|
          recordSuccess()             (only 1 probe allowed)
                                             |
                                        (probe fails)
                                             |
                                    (reopen with N x cooldown)
                                         -> [Open]

| State | Behavior |
| --- | --- |
| Closed | Normal operation. Every failure increments the counter. When failures reach the threshold, the circuit opens. A success at any point clears all state. |
| Open | All calls short-circuit immediately (isOpen() returns true) for the duration of the current cooldown. No network requests are made. |
| Half-Open | Cooldown has elapsed. Exactly one probe request is allowed through via an atomic storeIfAbsent() lock. All other concurrent callers are short-circuited (isOpen() returns true for them) until the probe resolves. On success the circuit closes; on failure it reopens with an exponentially increased cooldown. |

Key Components

| Component | Path | Role |
| --- | --- | --- |
| CircuitBreaker | src/CircuitBreaker/CircuitBreaker.php | Core state machine (196 LOC). Manages closed/open/half-open transitions with exponential backoff. |
| Config file | application/config/circuit_breakers.php | Per-service thresholds and cooldowns, loaded from env vars with defaults |
| DI container | application/config/container/circuit_breakers.php | Registers 5 named DI services backed by L2 cache |
| Module registration | application/config/container/modules.php | Loads circuit_breakers.php into the Symfony DI build |
| CacheAdapterInterface | src/Cache/Adapter/CacheAdapterInterface.php | Storage contract. Provides store(), fetch(), storeIfAbsent(), exists(), delete(). |
| Unit tests | tests/Unit/CircuitBreaker/CircuitBreakerTest.php | 14 test cases covering all state transitions and edge cases |

Protected Services

| Service | DI Service ID | Consumer Class | Guarded Method | What It Protects |
| --- | --- | --- | --- | --- |
| Meta Conversions API | circuit_breaker.meta | FacebookConversionService | dispatchEvent() | Facebook Pixel server-side events (8 event types) |
| Matomo Analytics | circuit_breaker.matomo | MatomoTracking | flush() | Bulk analytics tracking requests |
| Manago CRM | circuit_breaker.manago | Manago | doPostRequest() | Contact upsert, event tracking, product recommendations |
| ProjectAgora | circuit_breaker.project_agora | ProjectAgoraHttpTrait | doPostRequest(), doPostRequestAsync() | Ad serving requests (sync and async via Guzzle promises) |
| Advisable AI | circuit_breaker.advisable_ai | AdvAdvisableAI | All 14+ public methods | User setup, recommendations, cart/bookmark/rating/purchase/view events |

Code Flow

Initialization (DI Container Build)

  1. Module registration: application/config/container/modules.php includes circuit_breakers.php.
  2. Config load: The container config loads application/config/circuit_breakers.php, which reads 25 env vars with defaults.
  3. Service creation: A loop registers 5 CircuitBreaker instances as named public services:
    circuit_breaker.meta
    circuit_breaker.matomo
    circuit_breaker.manago
    circuit_breaker.project_agora
    circuit_breaker.advisable_ai
  4. Cache injection: All breakers receive service('cache.l2')->nullOnInvalid() -- if the cache.l2 service is not available to the container (for example, because L2 cache is misconfigured), $cache is null and the breaker degrades to a no-op (all calls pass through).
  5. Consumer wiring: Each protected service receives its breaker:
    • Meta: Adv_front_controller resolves di()->get('circuit_breaker.meta') and passes it to the service constructor
    • Matomo: Adv_front_controller resolves di()->get('circuit_breaker.matomo') and passes it to the service constructor
    • Manago: Adv_front_controller resolves di()->get('circuit_breaker.manago') and passes it to the service constructor
    • ProjectAgora: DI container wiring in application/config/container/project_agora.php via service('circuit_breaker.project_agora'), injected into ProjectAgoraFactory
    • Advisable AI: Lazy resolution -- AdvAdvisableAI::circuitBreaker() calls di()->get('circuit_breaker.advisable_ai') on first use (legacy CI library, not DI-managed)

Guard Phase (isOpen())

  1. Null check: If $cache is null, return false (no-op -- call proceeds).
  2. State fetch: Load cb_{name} from cache. If missing or no openUntil key, circuit is closed -- return false.
  3. Open check: If openUntil > microtime(true), the circuit is still in cooldown. Log debug message and return true (skip call).
  4. Half-open probe: Cooldown has expired. Attempt to atomically claim the probe slot via storeIfAbsent('cb_{name}_probe', true, $currentCooldown).
    • Won the race: Log info "half-open, allowing probe". Return false (call proceeds as the probe).
    • Lost the race: Another request already claimed the probe. Return true (block this call).
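
The guard steps above can be sketched as a pure function. This is a minimal illustration, not the code in CircuitBreaker.php: guardIsOpen() and the $claimProbe callable are hypothetical names, the callable stands in for storeIfAbsent('cb_{name}_probe', true, $currentCooldown), and the null-cache check and logging are omitted.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the guard decision.
 *
 * @param array|null $state      Decoded cb_{name} entry, or null if the key is missing.
 * @param float      $now        Current time, e.g. microtime(true).
 * @param callable   $claimProbe Stand-in for the atomic storeIfAbsent() call;
 *                               receives the TTL and returns true if the lock was won.
 */
function guardIsOpen(?array $state, float $now, callable $claimProbe): bool
{
    // No state, or no openUntil key: the circuit is closed.
    if ($state === null || !isset($state['openUntil'])) {
        return false;
    }

    // Still inside the cooldown window: short-circuit the call.
    if ($state['openUntil'] > $now) {
        return true;
    }

    // Cooldown elapsed (half-open): only the caller that wins the lock may probe.
    $wonRace = $claimProbe((int) $state['cooldown']);

    return !$wonRace;
}

// Usage: a single-process lock simulating two callers racing for the probe slot.
$held = false;
$claim = function (int $ttl) use (&$held): bool {
    if ($held) {
        return false;
    }
    $held = true;
    return true;
};

$state = ['failures' => 3, 'openUntil' => 130.0, 'cooldown' => 30];
var_dump(guardIsOpen($state, 100.0, $claim)); // still cooling down
var_dump(guardIsOpen($state, 131.0, $claim)); // first caller after cooldown
var_dump(guardIsOpen($state, 131.0, $claim)); // second caller after cooldown
```

The first caller after the cooldown gets false (it becomes the probe); every later caller gets true until the lock is released or expires.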

Failure Recording (recordFailure())

  1. Null check: If $cache is null, return (no-op).
  2. State load: Fetch current state from cb_{name}. Increment failure count.
  3. Probe check: Check if cb_{name}_probe exists (was this a probe request?).
  4. Branch:
    • Probe failure: Delete probe key. Calculate new cooldown = min(currentCooldown * multiplier, maxCooldownSeconds). Set openUntil = now + newCooldown. Log warning with failure count and new cooldown.
    • Threshold reached (not probing, failures >= threshold): Set openUntil = now + cooldownSeconds. Set cooldown = cooldownSeconds. Log warning with failure count.
    • Below threshold: Store updated failure count only (circuit remains closed).
  5. Persist: Store state in cache with TTL = maxCooldownSeconds + stateTtlBuffer.
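
The branching in step 4 can be illustrated with a small state-transition function. This is a sketch under assumptions: applyFailure() is a hypothetical name, the state array mirrors the cb_{name} fields listed in this document, the cooldown is truncated to whole seconds, and cache access, logging, and probe-key deletion are left out.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: returns the new breaker state after one failure.
 *
 * @param array|null $state    Current cb_{name} contents, or null if absent.
 * @param bool       $wasProbe True if the cb_{name}_probe key existed.
 * @param float      $now      Current time, e.g. microtime(true).
 * @param array      $cfg      threshold, cooldownSeconds, maxCooldownSeconds, cooldownMultiplier.
 */
function applyFailure(?array $state, bool $wasProbe, float $now, array $cfg): array
{
    $state = $state ?? ['failures' => 0];
    $state['failures']++;
    $state['lastFailure'] = $now;

    if ($wasProbe) {
        // Probe failure: grow the cooldown, capped at the configured maximum.
        $current = $state['cooldown'] ?? $cfg['cooldownSeconds'];
        $next = (int) min($current * $cfg['cooldownMultiplier'], $cfg['maxCooldownSeconds']);
        $state['cooldown'] = $next;
        $state['openUntil'] = $now + $next;
    } elseif ($state['failures'] >= $cfg['threshold']) {
        // Threshold reached: open with the initial cooldown.
        $state['cooldown'] = $cfg['cooldownSeconds'];
        $state['openUntil'] = $now + $cfg['cooldownSeconds'];
    }
    // Below threshold: only the counters change; the circuit stays closed.

    return $state;
}

// Usage with the documented defaults.
$cfg = [
    'threshold' => 3,
    'cooldownSeconds' => 30,
    'maxCooldownSeconds' => 300,
    'cooldownMultiplier' => 2.0,
];

$state = null;
for ($i = 0; $i < 3; $i++) {
    $state = applyFailure($state, false, 1000.0, $cfg);
}
// Third failure opened the circuit for 30 seconds.
$state = applyFailure($state, true, 1031.0, $cfg);
// The failed probe reopened it with a 60-second cooldown.
```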

Success Recording (recordSuccess())

  1. Null check: If $cache is null, return (no-op).
  2. Probe cleanup: If cb_{name}_probe exists (this was a probe), log info "closed (probe succeeded)" and delete the probe key.
  3. Full reset: Delete cb_{name} -- clears all failure counts, cooldown state, and openUntil. Circuit is fully closed.

Diagnostics (getStatus())

Returns an array with:

  • state: 'closed', 'open', or 'half-open'
  • failures: Current failure count
  • lastFailure: Formatted datetime of last failure (or null)
  • opensFor: Seconds remaining in open state (or 0)
  • cooldown: Current cooldown duration in seconds
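
As a rough illustration of how these fields could be derived from the stored state, the sketch below computes them from a cb_{name} array. describeStatus() is a hypothetical name; the rule used for 'half-open' (openUntil has passed but the state still exists) and the datetime format are assumptions, not confirmed details of getStatus().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: derive a status array from the stored breaker state.
 *
 * @param array|null $state           Decoded cb_{name} entry, or null if absent.
 * @param float      $now             Current time, e.g. microtime(true).
 * @param int        $defaultCooldown Configured cooldownSeconds, used when no state exists.
 */
function describeStatus(?array $state, float $now, int $defaultCooldown): array
{
    $openUntil = $state['openUntil'] ?? null;

    if ($openUntil === null) {
        $label = 'closed';
    } elseif ($openUntil > $now) {
        $label = 'open';
    } else {
        $label = 'half-open'; // assumption: cooldown elapsed, no success recorded yet
    }

    return [
        'state'       => $label,
        'failures'    => $state['failures'] ?? 0,
        'lastFailure' => isset($state['lastFailure'])
            ? date('Y-m-d H:i:s', (int) $state['lastFailure'])
            : null,
        'opensFor'    => $label === 'open' ? (int) ceil($openUntil - $now) : 0,
        'cooldown'    => $state['cooldown'] ?? $defaultCooldown,
    ];
}

// Usage: an open breaker with 20 seconds of cooldown remaining.
$status = describeStatus(
    ['failures' => 3, 'lastFailure' => 100.0, 'openUntil' => 130.0, 'cooldown' => 30],
    110.0,
    30
);
print_r($status);
```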

Integration Pattern

All five consumers follow the same guard-try-record pattern:

```php
if ($this->circuitBreaker?->isOpen()) {
    return;  // fail fast
}

try {
    $result = $httpClient->post($url, $payload);
    $this->circuitBreaker?->recordSuccess();
} catch (\Throwable $e) {
    $this->circuitBreaker?->recordFailure();
    $this->logger?->error('Call failed: ' . $e->getMessage());
}
```

The nullable type (?CircuitBreaker) and null-safe operator (?->) ensure the pattern works identically whether a breaker is injected or not.

ProjectAgora additionally supports an async variant where recordSuccess() and recordFailure() are called inside Guzzle promise then/otherwise callbacks.

Cache Keys and Storage

State is stored directly via CacheAdapterInterface (the L2 adapter), not through the PSR-6 CachePool. The breaker requires adapter-level primitives that are not available through the pool interface:

  • store() with native TTL control
  • storeIfAbsent() for atomic probe locking (maps to Redis SET NX EX)
  • exists() for lightweight probe detection

| Key | Content | TTL |
| --- | --- | --- |
| cb_{name} | {failures, lastFailure, openUntil, cooldown} | maxCooldownSeconds + stateTtlBuffer |
| cb_{name}_probe | true (lock token) | Current cooldown duration |

When L2 is backed by Redis, breaker state is shared across all PHP-FPM workers on all servers -- one worker tripping the breaker protects all others immediately. When backed by file cache (or APCu/SHM), state is per-server only.

The probe key auto-expires as a safety net: if the probe worker crashes before calling recordSuccess() or recordFailure(), the lock releases after the current cooldown duration and a new probe is allowed.
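
The lock-with-expiry behavior described above can be simulated in memory. The class below is a hypothetical stand-in, not the project's CacheAdapterInterface: its method signature is an assumption, and it takes the current time as an explicit argument so that expiry can be demonstrated without sleeping.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory model of an atomic "store if absent" with TTL,
 * comparable in spirit to Redis SET with the NX and EX options.
 */
final class InMemoryProbeLock
{
    /** @var array<string, float> key => expiry timestamp */
    private array $expiry = [];

    /** Returns true if the caller acquired the key, false if it is still held. */
    public function storeIfAbsent(string $key, int $ttlSeconds, float $now): bool
    {
        if (isset($this->expiry[$key]) && $this->expiry[$key] > $now) {
            return false; // another caller holds an unexpired lock
        }
        $this->expiry[$key] = $now + $ttlSeconds;
        return true;
    }

    /** Explicit release, as done when the probe reports success or failure. */
    public function delete(string $key): void
    {
        unset($this->expiry[$key]);
    }
}

// Usage: the probe worker "crashes" and never releases the lock.
$lock = new InMemoryProbeLock();
var_dump($lock->storeIfAbsent('cb_meta_probe', 30, 0.0));  // bool(true): probe claimed
var_dump($lock->storeIfAbsent('cb_meta_probe', 30, 10.0)); // bool(false): still held
var_dump($lock->storeIfAbsent('cb_meta_probe', 30, 31.0)); // bool(true): TTL expired, new probe allowed
```

In a single PHP process this is trivially atomic; with a shared backend, the atomicity has to come from the store itself, which is why the breaker needs the adapter-level primitive.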

Exponential Backoff

Each time a probe fails in half-open state, the cooldown is multiplied by cooldownMultiplier, capped at maxCooldownSeconds:

Attempt 1: 30s   (initial cooldownSeconds)
Attempt 2: 60s   (30 x 2.0)
Attempt 3: 120s  (60 x 2.0)
Attempt 4: 240s  (120 x 2.0)
Attempt 5: 300s  (capped at maxCooldownSeconds)
Attempt 6: 300s  (stays at cap)

A single successful probe at any point resets the circuit to closed with all state cleared -- there is no gradual ramp-up.
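
The schedule above follows from repeatedly applying min(cooldown x multiplier, max). A minimal sketch that reproduces it, assuming cooldowns are kept as whole seconds (backoffSchedule() is a hypothetical helper, not part of the breaker's API):

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: list the cooldown used for each successive open period.
 *
 * @return int[] Cooldown in seconds for attempts 1..$attempts.
 */
function backoffSchedule(int $initial, float $multiplier, int $max, int $attempts): array
{
    $schedule = [];
    $cooldown = $initial;

    for ($i = 0; $i < $attempts; $i++) {
        $schedule[] = $cooldown;
        // Each failed probe multiplies the cooldown, capped at the maximum.
        $cooldown = (int) min($cooldown * $multiplier, $max);
    }

    return $schedule;
}

// With the documented defaults:
print_r(backoffSchedule(30, 2.0, 300, 6)); // [30, 60, 120, 240, 300, 300]
```

With a multiplier of 1.0 the same function returns a constant sequence, which matches the "disable backoff" case.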

Graceful Degradation

The system degrades gracefully at every level:

| Condition | Behavior |
| --- | --- |
| L2 cache unavailable | $cache is null (via nullOnInvalid()). All breaker methods are no-ops. Every call passes through. |
| L2 cache misconfigured | Same as above -- nullOnInvalid() catches the error at container build time. |
| No env vars set | Defaults apply: threshold=3, cooldown=30s, max=300s, multiplier=2.0, buffer=300s. |
| DI service missing | Consumer uses ?CircuitBreaker type hint or di()->has() check. Falls back to no-op. |
| Probe worker crashes | Probe key TTL expires, unlocking the probe slot for the next request. |

Configuration

Environment Variables (25 total)

Each protected service has 5 tuning parameters. All are optional -- defaults are shown.

Meta Conversions API (Facebook)

| Variable | Default | Description |
| --- | --- | --- |
| APP_CB_META_THRESHOLD | 3 | Consecutive failures before circuit opens |
| APP_CB_META_COOLDOWN_SECONDS | 30 | Initial cooldown duration (seconds) |
| APP_CB_META_MAX_COOLDOWN_SECONDS | 300 | Maximum cooldown after exponential backoff (seconds) |
| APP_CB_META_COOLDOWN_MULTIPLIER | 2.0 | Backoff multiplier applied on each probe failure |
| APP_CB_META_STATE_TTL_BUFFER | 300 | Extra seconds added to max cooldown for cache TTL safety margin |

Matomo Analytics

| Variable | Default | Description |
| --- | --- | --- |
| APP_CB_MATOMO_THRESHOLD | 3 | Consecutive failures before circuit opens |
| APP_CB_MATOMO_COOLDOWN_SECONDS | 30 | Initial cooldown duration (seconds) |
| APP_CB_MATOMO_MAX_COOLDOWN_SECONDS | 300 | Maximum cooldown after exponential backoff (seconds) |
| APP_CB_MATOMO_COOLDOWN_MULTIPLIER | 2.0 | Backoff multiplier applied on each probe failure |
| APP_CB_MATOMO_STATE_TTL_BUFFER | 300 | Extra seconds added to max cooldown for cache TTL safety margin |

Manago CRM

| Variable | Default | Description |
| --- | --- | --- |
| APP_CB_MANAGO_THRESHOLD | 3 | Consecutive failures before circuit opens |
| APP_CB_MANAGO_COOLDOWN_SECONDS | 30 | Initial cooldown duration (seconds) |
| APP_CB_MANAGO_MAX_COOLDOWN_SECONDS | 300 | Maximum cooldown after exponential backoff (seconds) |
| APP_CB_MANAGO_COOLDOWN_MULTIPLIER | 2.0 | Backoff multiplier applied on each probe failure |
| APP_CB_MANAGO_STATE_TTL_BUFFER | 300 | Extra seconds added to max cooldown for cache TTL safety margin |

ProjectAgora Ad Server

| Variable | Default | Description |
| --- | --- | --- |
| APP_CB_PROJECT_AGORA_THRESHOLD | 3 | Consecutive failures before circuit opens |
| APP_CB_PROJECT_AGORA_COOLDOWN_SECONDS | 30 | Initial cooldown duration (seconds) |
| APP_CB_PROJECT_AGORA_MAX_COOLDOWN_SECONDS | 300 | Maximum cooldown after exponential backoff (seconds) |
| APP_CB_PROJECT_AGORA_COOLDOWN_MULTIPLIER | 2.0 | Backoff multiplier applied on each probe failure |
| APP_CB_PROJECT_AGORA_STATE_TTL_BUFFER | 300 | Extra seconds added to max cooldown for cache TTL safety margin |

Advisable AI Recommendations

| Variable | Default | Description |
| --- | --- | --- |
| APP_CB_ADVISABLE_AI_THRESHOLD | 3 | Consecutive failures before circuit opens |
| APP_CB_ADVISABLE_AI_COOLDOWN_SECONDS | 30 | Initial cooldown duration (seconds) |
| APP_CB_ADVISABLE_AI_MAX_COOLDOWN_SECONDS | 300 | Maximum cooldown after exponential backoff (seconds) |
| APP_CB_ADVISABLE_AI_COOLDOWN_MULTIPLIER | 2.0 | Backoff multiplier applied on each probe failure |
| APP_CB_ADVISABLE_AI_STATE_TTL_BUFFER | 300 | Extra seconds added to max cooldown for cache TTL safety margin |

Parameter Reference

| Parameter | Type | Purpose | Tuning Guidance |
| --- | --- | --- | --- |
| threshold | int | Number of failures before the circuit opens | Lower for critical-path services; higher for noisy/flaky APIs |
| cooldownSeconds | int | Initial open-state duration | Match to expected recovery time. 30s is conservative for most APIs. |
| maxCooldownSeconds | int | Ceiling for exponential backoff | 300s (5 min) is the default. Increase for services with known long outages. |
| cooldownMultiplier | float | Factor applied to cooldown on each probe failure | 2.0 doubles each time. Use 1.0 to disable backoff (constant retry interval). |
| stateTtlBuffer | int | Extra TTL added to cache entries beyond maxCooldownSeconds | Prevents premature eviction. Should be >= maxCooldownSeconds. |
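
To make the naming scheme and defaults concrete, here is a sketch of how the five parameters for each service could be read from the environment. It is an illustration only: circuitBreakerConfig() is a hypothetical helper, and the use of getenv() is an assumption; the actual application/config/circuit_breakers.php may read env vars through a different mechanism.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build the tuning parameters for one service from
 * APP_CB_{PREFIX}_* env vars, falling back to the documented defaults.
 */
function circuitBreakerConfig(string $prefix): array
{
    // Treat unset and empty variables the same way: use the default.
    $env = static function (string $key, string $default): string {
        $value = getenv($key);
        return ($value === false || $value === '') ? $default : $value;
    };

    return [
        'threshold'          => (int) $env("APP_CB_{$prefix}_THRESHOLD", '3'),
        'cooldownSeconds'    => (int) $env("APP_CB_{$prefix}_COOLDOWN_SECONDS", '30'),
        'maxCooldownSeconds' => (int) $env("APP_CB_{$prefix}_MAX_COOLDOWN_SECONDS", '300'),
        'cooldownMultiplier' => (float) $env("APP_CB_{$prefix}_COOLDOWN_MULTIPLIER", '2.0'),
        'stateTtlBuffer'     => (int) $env("APP_CB_{$prefix}_STATE_TTL_BUFFER", '300'),
    ];
}

// One config entry per protected service, keyed by the DI service suffix.
$config = [];
foreach (['META', 'MATOMO', 'MANAGO', 'PROJECT_AGORA', 'ADVISABLE_AI'] as $prefix) {
    $config[strtolower($prefix)] = circuitBreakerConfig($prefix);
}

print_r($config['project_agora']);
```

Five services with five parameters each account for the 25 variables; overriding a single variable leaves the other four at their defaults.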

State TTL Computation

Cache entries for breaker state use a fixed TTL of maxCooldownSeconds + stateTtlBuffer, regardless of the current cooldown value. With defaults this is 300 + 300 = 600 seconds (10 minutes). Because the TTL always exceeds the maximum possible cooldown, the state key cannot expire mid-cooldown, which would otherwise make the circuit appear falsely closed.

Container Registration

```php
di()->get('circuit_breaker.meta');
di()->get('circuit_breaker.matomo');
di()->get('circuit_breaker.manago');
di()->get('circuit_breaker.project_agora');
di()->get('circuit_breaker.advisable_ai');
```

All services are public and shared (singleton per request). They are backed by service('cache.l2')->nullOnInvalid().

Interaction with DeferredTaskRunner

All five protected services execute their external calls inside DeferredTaskRunner, which runs after fastcgi_finish_request() flushes the HTTP response to the client. This means:

  1. TTFB is never affected -- even without the circuit breaker, external call timeouts do not delay the user's page load.
  2. Budget conservation -- the DeferredTaskRunner has a finite time budget (default 10s via APP_DEFERRED_TASK_BUDGET_SECONDS). When a service is down, its calls would consume this budget with timeouts. The circuit breaker prevents this waste, freeing budget for other deferred tasks (e.g., another service that is healthy).
  3. Log noise reduction -- without the breaker, a 5-minute outage of a service called on every page view would generate thousands of identical error log entries. The breaker reduces this to a handful of warnings at state transitions.

Testing

Unit tests: tests/Unit/CircuitBreaker/CircuitBreakerTest.php

```bash
vendor/bin/phpunit tests/Unit/CircuitBreaker
```

The test suite uses an in-memory mock of CacheAdapterInterface that simulates store, fetch, delete, storeIfAbsent, and exists. Test coverage includes:

| Test | What It Verifies |
| --- | --- |
| testClosedByDefault | New breaker starts in closed state |
| testStaysClosedBelowThreshold | Failures below threshold do not open circuit |
| testSuccessResetsFailureCount | recordSuccess() clears all accumulated failures |
| testOpensAtThreshold | Circuit opens when failure count hits threshold |
| testStaysOpenDuringCooldown | Repeated isOpen() calls during cooldown all return true |
| testTransitionsToHalfOpenAfterCooldown | After cooldown expires, first caller is allowed through |
| testHalfOpenBlocksOtherRequests | During probe, other callers are blocked |
| testHalfOpenProbeSuccessCloses | Successful probe fully resets the circuit |
| testHalfOpenProbeFailureReopensWithBackoff | Failed probe doubles the cooldown |
| testExponentialBackoffCapsAtMax | Cooldown cannot exceed maxCooldownSeconds |
| testCustomThresholdAndCooldown | Custom constructor parameters are respected |
| testKeyIncludesName | Cache keys are prefixed with cb_ and the breaker name |
| testNullCacheIsNoOp | Null cache makes all methods no-ops |
| testStateTtlUsesMaxCooldownPlusBuffer | Cache TTL = maxCooldownSeconds + stateTtlBuffer |

Client Extension Points

  • Override thresholds per client: Set APP_CB_* env vars in the client's .env file to tune breaker behavior per deployment.
  • Disable a breaker: Set threshold to an extremely high value (e.g., APP_CB_META_THRESHOLD=999999) to effectively disable tripping.
  • Add a new protected service: Register a new CircuitBreaker instance in the DI container following the existing loop pattern in application/config/container/circuit_breakers.php, add corresponding config entries in application/config/circuit_breakers.php, and inject it into the target service.
  • Monitor breaker state: Call $cb->getStatus() from a health-check endpoint or admin panel to inspect current state, failure count, and remaining open time.

Business Rules

  1. Independence: Each service has its own breaker instance. Meta going down does not affect Matomo, Manago, ProjectAgora, or Advisable AI.
  2. No delivery guarantee: When the breaker is open, tracking events (analytics, CAPI, AI events) are silently dropped. There is no retry queue. This is acceptable because these are fire-and-forget tracking calls, not transactional operations.
  3. Atomic probe: Only one request wins the probe race (via storeIfAbsent). This prevents thundering herd on a recovering service.
  4. Full reset on success: A single successful call clears all failure history. There is no partial recovery or gradual ramp-up.
  5. Null-safe: All consumers use nullable types and null-safe operators. If the breaker is not injected (e.g., missing DI config), the call proceeds without protection.
  6. Redis-shared state: When L2 uses Redis, breaker state is shared across all PHP-FPM workers on all servers behind a load balancer. One server detecting a failure counts toward the threshold for all servers.
  7. File cache isolation: When L2 uses FileAdapter, breaker state is per-server only. Each server has its own independent breaker state.
  8. Default uniformity: All five services share identical defaults (threshold=3, cooldown=30s, max=300s, multiplier=2.0, buffer=300s). This simplifies reasoning but can be tuned per service.

File Locations

| File | Purpose |
| --- | --- |
| src/CircuitBreaker/CircuitBreaker.php | Core implementation (196 LOC) |
| application/config/circuit_breakers.php | Per-service thresholds and cooldowns from env vars |
| application/config/container/circuit_breakers.php | DI service registration (5 named services) |
| application/config/container/modules.php | Module list that includes circuit_breakers.php |
| src/Cache/Adapter/CacheAdapterInterface.php | Storage contract used by the breaker |
| src/MetaConversionsApi/Service/FacebookConversionService.php | Meta CAPI consumer |
| src/Analytics/MatomoTracking.php | Matomo consumer |
| src/Manago/Manago.php | Manago CRM consumer |
| src/ProjectAgora/ProjectAgoraHttpTrait.php | ProjectAgora consumer (sync + async) |
| application/libraries/ProjectAgoraFactory.php | Factory that injects CB into ProjectAgora instances |
| application/config/container/project_agora.php | DI config wiring CB into ProjectAgoraFactory |
| ecommercen/ai/libraries/AdvAdvisableAI.php | Advisable AI consumer (lazy DI resolution) |
| ecommercen/core/Adv_front_controller.php | Wires Meta, Matomo, and Manago breakers to their services |
| src/DeferredTask/DeferredTaskRunner.php | Post-response execution framework |
| tests/Unit/CircuitBreaker/CircuitBreakerTest.php | Unit test suite (14 tests) |
| docs/guides/CircuitBreaker.md | Developer reference guide |

Wiki Guide: Circuit Breaker Guide -- developer reference with API examples and backoff calculations

Wiki Guide: Deferred Task Guide -- post-response execution framework that hosts all circuit-breaker-protected calls