Circuit Breaker System
Flow ID: SY-26 | Module(s): CircuitBreaker, Cache, DI Container | Complexity: Medium
Business Overview
The circuit breaker system protects Ecommercen from cascading failures when external services become unresponsive. Instead of repeatedly timing out against a down service -- burning deferred-task budget and flooding logs -- the breaker trips open and fails fast. After a cooldown, a single probe request tests whether the service has recovered; each failed probe lengthens the next cooldown exponentially, so a recovering upstream is not hammered.
Five external integrations are protected: Meta Conversions API (Facebook CAPI), Matomo Analytics, Manago CRM, ProjectAgora ad serving, and Advisable AI recommendations. Each has an independent circuit breaker instance with individually tunable parameters via environment variables. All 25 env vars are optional -- sensible defaults apply out of the box.
Architecture
```
                 External Service
                        ^
                        |
                try { HTTP call }
                catch -> recordFailure()
                ok    -> recordSuccess()
                        ^
                        |
                  isOpen() guard
                        ^
                        |
       +----------------+----------------+
       |                |                |
FacebookConversion  MatomoTracking     Manago
Service             (flush)            (doPostRequest)
       |                |                |
       +-----+   +------+   +------------+
             |   |          |
   ProjectAgoraHttp     AdvAdvisableAI
   Trait (sync+async)   (lazy DI resolve)
             |                |
             +-------+--------+
                     |
            DeferredTaskRunner
            (post-response execution)
                     |
                     v
            Response sent to user
            (zero TTFB impact)
```
State Machine
```
[Closed] --- (failures >= threshold) ---> [Open]
   ^                                         |
   |                                 (cooldown elapsed)
   |                                         v
   |        (probe succeeds)            [Half-Open]
   +<----------------------------------------|
        recordSuccess()              (only 1 probe allowed)
                                             |
                                       (probe fails)
                                             |
                                 (reopen with N x cooldown)
                                         -> [Open]
```
| State | Behavior |
|---|---|
| Closed | Normal operation. Every failure increments the counter. When failures reach the threshold, the circuit opens. A success at any point clears all state. |
| Open | All calls short-circuit immediately (isOpen() returns true) for the duration of the current cooldown. No network requests are made. |
| Half-Open | Cooldown has elapsed. Exactly one probe request is allowed through via an atomic storeIfAbsent() lock. All other concurrent callers are short-circuited (isOpen() returns true) until the probe resolves. On success the circuit closes; on failure it reopens with an exponentially increased cooldown. |
Key Components
| Component | Path | Role |
|---|---|---|
| `CircuitBreaker` | src/CircuitBreaker/CircuitBreaker.php | Core state machine (196 LOC). Manages closed/open/half-open transitions with exponential backoff. |
| Config file | application/config/circuit_breakers.php | Per-service thresholds and cooldowns, loaded from env vars with defaults |
| DI container | application/config/container/circuit_breakers.php | Registers 5 named DI services backed by L2 cache |
| Module registration | application/config/container/modules.php | Loads circuit_breakers.php into the Symfony DI build |
| `CacheAdapterInterface` | src/Cache/Adapter/CacheAdapterInterface.php | Storage contract. Provides store(), fetch(), storeIfAbsent(), exists(), delete(). |
| Unit tests | tests/Unit/CircuitBreaker/CircuitBreakerTest.php | 14 test cases covering all state transitions and edge cases |
Protected Services
| Service | DI Service ID | Consumer Class | Guarded Method | What It Protects |
|---|---|---|---|---|
| Meta Conversions API | circuit_breaker.meta | FacebookConversionService | dispatchEvent() | Facebook Pixel server-side events (8 event types) |
| Matomo Analytics | circuit_breaker.matomo | MatomoTracking | flush() | Bulk analytics tracking requests |
| Manago CRM | circuit_breaker.manago | Manago | doPostRequest() | Contact upsert, event tracking, product recommendations |
| ProjectAgora | circuit_breaker.project_agora | ProjectAgoraHttpTrait | doPostRequest(), doPostRequestAsync() | Ad serving requests (sync and async via Guzzle promises) |
| Advisable AI | circuit_breaker.advisable_ai | AdvAdvisableAI | All 14+ public methods | User setup, recommendations, cart/bookmark/rating/purchase/view events |
Code Flow
Initialization (DI Container Build)
- Module registration: `application/config/container/modules.php` includes `circuit_breakers.php`.
- Config load: The container config loads `application/config/circuit_breakers.php`, which reads 25 env vars with defaults.
- Service creation: A loop registers 5 `CircuitBreaker` instances as named public services: `circuit_breaker.meta`, `circuit_breaker.matomo`, `circuit_breaker.manago`, `circuit_breaker.project_agora`, `circuit_breaker.advisable_ai`.
- Cache injection: All breakers receive `service('cache.l2')->nullOnInvalid()` -- if L2 cache is misconfigured, `$cache` is `null` and the breaker degrades to a no-op (all calls pass through).
- Consumer wiring: Each protected service receives its breaker:
  - Meta: Constructor injection in `Adv_front_controller` via `di()->get('circuit_breaker.meta')`
  - Matomo: Constructor injection in `Adv_front_controller` via `di()->get('circuit_breaker.matomo')`
  - Manago: Constructor injection in `Adv_front_controller` via `di()->get('circuit_breaker.manago')`
  - ProjectAgora: DI container wiring in `application/config/container/project_agora.php` via `service('circuit_breaker.project_agora')`, injected into `ProjectAgoraFactory`
  - Advisable AI: Lazy resolution -- `AdvAdvisableAI::circuitBreaker()` calls `di()->get('circuit_breaker.advisable_ai')` on first use (legacy CI library, not DI-managed)
Guard Phase (isOpen())
- Null check: If `$cache` is `null`, return `false` (no-op -- call proceeds).
- State fetch: Load `cb_{name}` from cache. If missing or no `openUntil` key, circuit is closed -- return `false`.
- Open check: If `openUntil > microtime(true)`, the circuit is still in cooldown. Log debug message and return `true` (skip call).
- Half-open probe: Cooldown has expired. Attempt to atomically claim the probe slot via `storeIfAbsent('cb_{name}_probe', true, $currentCooldown)`.
  - Won the race: Log info "half-open, allowing probe". Return `false` (call proceeds as the probe).
  - Lost the race: Another request already claimed the probe. Return `true` (block this call).
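The guard steps above can be sketched as a single decision function. This is a simplified illustration, not the actual `isOpen()` implementation: the function name `shouldSkipCall`, the `$claimProbe` callable (standing in for the atomic `storeIfAbsent()` call), and the sample timestamps are all assumptions, and the null-cache branch and logging are left out.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, simplified guard decision. $state is the array stored under
 * cb_{name} (or null when missing); $claimProbe stands in for the atomic
 * storeIfAbsent() call and returns true only for the caller that wins.
 * Returns true when the call should be skipped.
 */
function shouldSkipCall(?array $state, float $now, callable $claimProbe): bool
{
    if ($state === null || !isset($state['openUntil'])) {
        return false; // closed: no state, or the circuit never opened
    }
    if ($state['openUntil'] > $now) {
        return true; // open: still inside the cooldown window
    }
    // Half-open: only the caller that claims the probe slot proceeds.
    return !$claimProbe();
}

$state = ['failures' => 3, 'openUntil' => 130.0, 'cooldown' => 30];
$claimed = false;
$claimProbe = function () use (&$claimed): bool {
    if ($claimed) {
        return false;
    }
    $claimed = true;
    return true;
};

var_dump(shouldSkipCall(null, 100.0, $claimProbe));   // closed
var_dump(shouldSkipCall($state, 120.0, $claimProbe)); // open
var_dump(shouldSkipCall($state, 131.0, $claimProbe)); // half-open, probe winner
var_dump(shouldSkipCall($state, 131.0, $claimProbe)); // half-open, probe already taken
```

Passing the probe claim in as a callable keeps the decision logic testable without a cache; in the real class that step is the cache adapter call described above.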
Failure Recording (recordFailure())
- Null check: If `$cache` is `null`, return (no-op).
- State load: Fetch current state from `cb_{name}`. Increment failure count.
- Probe check: Check if `cb_{name}_probe` exists (was this a probe request?).
- Branch:
  - Probe failure: Delete probe key. Calculate new cooldown = `min(currentCooldown * multiplier, maxCooldownSeconds)`. Set `openUntil = now + newCooldown`. Log warning with failure count and new cooldown.
  - Threshold reached (not probing, `failures >= threshold`): Set `openUntil = now + cooldownSeconds`. Set `cooldown = cooldownSeconds`. Log warning with failure count.
  - Below threshold: Store updated failure count only (circuit remains closed).
- Persist: Store state in cache with TTL = `maxCooldownSeconds + stateTtlBuffer`.
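As a rough illustration of the three branches, the state update can be written as a pure function. The name `nextStateAfterFailure`, the `$config` array shape, and storing `lastFailure` as a raw timestamp are assumptions made for this sketch; cache reads, the probe-key deletion, logging, and the TTL on persist are deliberately omitted.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pure-function version of the failure branches. $config uses
 * the parameter names from this document; the state array shape mirrors the
 * cb_{name} entry. Cache reads and writes are left out on purpose.
 */
function nextStateAfterFailure(array $state, bool $wasProbe, array $config, float $now): array
{
    $state['failures'] = ($state['failures'] ?? 0) + 1;
    $state['lastFailure'] = $now;

    if ($wasProbe) {
        // Probe failed: back off, capped at the configured maximum.
        $current = $state['cooldown'] ?? $config['cooldownSeconds'];
        $state['cooldown'] = (int) min($current * $config['cooldownMultiplier'], $config['maxCooldownSeconds']);
        $state['openUntil'] = $now + $state['cooldown'];
    } elseif ($state['failures'] >= $config['threshold']) {
        // Threshold reached: open with the initial cooldown.
        $state['cooldown'] = $config['cooldownSeconds'];
        $state['openUntil'] = $now + $state['cooldown'];
    }
    // Below threshold: only the failure count changes.
    return $state;
}

$config = ['threshold' => 3, 'cooldownSeconds' => 30, 'maxCooldownSeconds' => 300, 'cooldownMultiplier' => 2.0];
$state = [];
foreach ([1.0, 2.0, 3.0] as $t) {
    $state = nextStateAfterFailure($state, false, $config, $t);
}
echo $state['cooldown'], ' ', $state['openUntil'], PHP_EOL;   // opened on the third failure
$state = nextStateAfterFailure($state, true, $config, 40.0);  // probe fails after the cooldown
echo $state['cooldown'], ' ', $state['openUntil'], PHP_EOL;   // cooldown doubled
```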
Success Recording (recordSuccess())
- Null check: If `$cache` is `null`, return (no-op).
- Probe cleanup: If `cb_{name}_probe` exists (this was a probe), log info "closed (probe succeeded)" and delete the probe key.
- Full reset: Delete `cb_{name}` -- clears all failure counts, cooldown state, and `openUntil`. Circuit is fully closed.
Diagnostics (getStatus())
Returns an array with:
- `state`: `'closed'`, `'open'`, or `'half-open'`
- `failures`: Current failure count
- `lastFailure`: Formatted datetime of last failure (or `null`)
- `opensFor`: Seconds remaining in open state (or `0`)
- `cooldown`: Current cooldown duration in seconds
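To make the shape of that array concrete, here is one way the fields could be derived from a stored state entry. The function `describeStatus`, the date format, and the fallback cooldown argument are assumptions for illustration; the real `getStatus()` may compute these differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical reconstruction of the status array from a stored state entry.
 * Keys match the list above; the date format and default cooldown argument
 * are assumptions.
 */
function describeStatus(?array $state, float $now, int $defaultCooldown = 30): array
{
    $openUntil = $state['openUntil'] ?? null;
    if ($openUntil === null) {
        $label = 'closed';
    } elseif ($openUntil > $now) {
        $label = 'open';
    } else {
        $label = 'half-open'; // cooldown elapsed, waiting for a probe result
    }

    return [
        'state'       => $label,
        'failures'    => $state['failures'] ?? 0,
        'lastFailure' => isset($state['lastFailure']) ? gmdate('Y-m-d H:i:s', (int) $state['lastFailure']) : null,
        'opensFor'    => $label === 'open' ? (int) ceil($openUntil - $now) : 0,
        'cooldown'    => $state['cooldown'] ?? $defaultCooldown,
    ];
}

print_r(describeStatus(['failures' => 3, 'lastFailure' => 100.0, 'openUntil' => 130.0, 'cooldown' => 30], 110.0));
```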
Integration Pattern
All five consumers follow the same guard-try-record pattern:
```php
// Guard: skip the call entirely while the circuit is open
if ($this->circuitBreaker?->isOpen()) {
    return; // fail fast
}
try {
    $result = $httpClient->post($url, $payload);
    // A completed call closes the circuit and clears failure state
    $this->circuitBreaker?->recordSuccess();
} catch (\Throwable $e) {
    // Any exception counts toward the threshold
    $this->circuitBreaker?->recordFailure();
    $this->logger?->error('Call failed: ' . $e->getMessage());
}
```

The nullable type (`?CircuitBreaker`) and null-safe operator (`?->`) ensure the pattern works identically whether a breaker is injected or not.
ProjectAgora additionally supports an async variant where recordSuccess() and recordFailure() are called inside Guzzle promise then/otherwise callbacks.
Cache Keys and Storage
State is stored directly via CacheAdapterInterface (the L2 adapter), not through the PSR-6 CachePool. The breaker requires adapter-level primitives that are not available through the pool interface:
- `store()` with native TTL control
- `storeIfAbsent()` for atomic probe locking (maps to Redis `SET NX EX`)
- `exists()` for lightweight probe detection
| Key | Content | TTL |
|---|---|---|
| `cb_{name}` | {failures, lastFailure, openUntil, cooldown} | maxCooldownSeconds + stateTtlBuffer |
| `cb_{name}_probe` | true (lock token) | Current cooldown duration |
When L2 is backed by Redis, breaker state is shared across all PHP-FPM workers on all servers -- one worker tripping the breaker protects all others immediately. When backed by file cache (or APCu/SHM), state is per-server only.
The probe key auto-expires as a safety net: if the probe worker crashes before calling recordSuccess() or recordFailure(), the lock releases after the current cooldown duration and a new probe is allowed.
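The probe lock and its TTL safety net can be illustrated with a small in-memory stand-in. The class `InMemoryStore` is hypothetical, its method signatures are assumptions modeled on the method names in this document, and the clock is a public property so expiry can be shown without sleeping. A single PHP process is trivially atomic; the real guarantee across workers comes from the backing store.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory stand-in for the adapter primitives the breaker
 * relies on. Method names follow the document; signatures are assumptions.
 * The clock is injected so expiry can be shown without sleeping.
 */
final class InMemoryStore
{
    /** @var array<string, array{value: mixed, expiresAt: float}> */
    private array $items = [];

    public function __construct(public float $now = 0.0) {}

    public function storeIfAbsent(string $key, mixed $value, int $ttl): bool
    {
        if ($this->exists($key)) {
            return false; // someone else already holds the key
        }
        $this->items[$key] = ['value' => $value, 'expiresAt' => $this->now + $ttl];
        return true;
    }

    public function exists(string $key): bool
    {
        return isset($this->items[$key]) && $this->items[$key]['expiresAt'] > $this->now;
    }

    public function delete(string $key): void
    {
        unset($this->items[$key]);
    }
}

$store = new InMemoryStore();
var_dump($store->storeIfAbsent('cb_meta_probe', true, 30)); // first caller claims the probe
var_dump($store->storeIfAbsent('cb_meta_probe', true, 30)); // second caller is refused
$store->now = 31.0;                                          // probe worker crashed; TTL elapsed
var_dump($store->storeIfAbsent('cb_meta_probe', true, 30)); // slot is free again
```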
Exponential Backoff
Each time a probe fails in half-open state, the cooldown is multiplied by cooldownMultiplier, capped at maxCooldownSeconds:
```
Attempt 1:  30s  (initial cooldownSeconds)
Attempt 2:  60s  (30 x 2.0)
Attempt 3: 120s  (60 x 2.0)
Attempt 4: 240s  (120 x 2.0)
Attempt 5: 300s  (capped at maxCooldownSeconds)
Attempt 6: 300s  (stays at cap)
```

A single successful probe at any point resets the circuit to closed with all state cleared -- there is no gradual ramp-up.
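The schedule follows directly from the formula `min(currentCooldown * multiplier, maxCooldownSeconds)`. A minimal sketch that reproduces it for arbitrary parameters is shown here; `cooldownSchedule` is a hypothetical helper written for this page, not a method of `CircuitBreaker`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the codebase): lists the cooldown used
 * for each consecutive open period, given the documented parameters.
 *
 * @return list<int> cooldown in seconds per attempt
 */
function cooldownSchedule(int $cooldownSeconds, int $maxCooldownSeconds, float $multiplier, int $attempts): array
{
    $schedule = [];
    $current = $cooldownSeconds;
    for ($i = 0; $i < $attempts; $i++) {
        $schedule[] = $current;
        // Each failed probe multiplies the cooldown, capped at the maximum.
        $current = (int) min($current * $multiplier, $maxCooldownSeconds);
    }
    return $schedule;
}

echo implode(', ', cooldownSchedule(30, 300, 2.0, 6)), PHP_EOL; // 30, 60, 120, 240, 300, 300
```

With a multiplier of 1.0 the same function returns a constant interval, which matches the "disable backoff" tuning option.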
Graceful Degradation
The system degrades gracefully at every level:
| Condition | Behavior |
|---|---|
| L2 cache unavailable | $cache is null (via nullOnInvalid()). All breaker methods are no-ops. Every call passes through. |
| L2 cache misconfigured | Same as above -- nullOnInvalid() catches the error at container build time. |
| No env vars set | Defaults apply: threshold=3, cooldown=30s, max=300s, multiplier=2.0, buffer=300s. |
| DI service missing | Consumer uses ?CircuitBreaker type hint or di()->has() check. Falls back to no-op. |
| Probe worker crashes | Probe key TTL expires, unlocking the probe slot for the next request. |
Configuration
Environment Variables (25 total)
Each protected service has 5 tuning parameters. All are optional -- defaults are shown.
Meta Conversions API (Facebook)
| Variable | Default | Description |
|---|---|---|
| `APP_CB_META_THRESHOLD` | 3 | Consecutive failures before circuit opens |
| `APP_CB_META_COOLDOWN_SECONDS` | 30 | Initial cooldown duration (seconds) |
| `APP_CB_META_MAX_COOLDOWN_SECONDS` | 300 | Maximum cooldown after exponential backoff (seconds) |
| `APP_CB_META_COOLDOWN_MULTIPLIER` | 2.0 | Backoff multiplier applied on each probe failure |
| `APP_CB_META_STATE_TTL_BUFFER` | 300 | Extra seconds added to max cooldown for cache TTL safety margin |
Matomo Analytics
| Variable | Default | Description |
|---|---|---|
| `APP_CB_MATOMO_THRESHOLD` | 3 | Consecutive failures before circuit opens |
| `APP_CB_MATOMO_COOLDOWN_SECONDS` | 30 | Initial cooldown duration (seconds) |
| `APP_CB_MATOMO_MAX_COOLDOWN_SECONDS` | 300 | Maximum cooldown after exponential backoff (seconds) |
| `APP_CB_MATOMO_COOLDOWN_MULTIPLIER` | 2.0 | Backoff multiplier applied on each probe failure |
| `APP_CB_MATOMO_STATE_TTL_BUFFER` | 300 | Extra seconds added to max cooldown for cache TTL safety margin |
Manago CRM
| Variable | Default | Description |
|---|---|---|
| `APP_CB_MANAGO_THRESHOLD` | 3 | Consecutive failures before circuit opens |
| `APP_CB_MANAGO_COOLDOWN_SECONDS` | 30 | Initial cooldown duration (seconds) |
| `APP_CB_MANAGO_MAX_COOLDOWN_SECONDS` | 300 | Maximum cooldown after exponential backoff (seconds) |
| `APP_CB_MANAGO_COOLDOWN_MULTIPLIER` | 2.0 | Backoff multiplier applied on each probe failure |
| `APP_CB_MANAGO_STATE_TTL_BUFFER` | 300 | Extra seconds added to max cooldown for cache TTL safety margin |
ProjectAgora Ad Server
| Variable | Default | Description |
|---|---|---|
| `APP_CB_PROJECT_AGORA_THRESHOLD` | 3 | Consecutive failures before circuit opens |
| `APP_CB_PROJECT_AGORA_COOLDOWN_SECONDS` | 30 | Initial cooldown duration (seconds) |
| `APP_CB_PROJECT_AGORA_MAX_COOLDOWN_SECONDS` | 300 | Maximum cooldown after exponential backoff (seconds) |
| `APP_CB_PROJECT_AGORA_COOLDOWN_MULTIPLIER` | 2.0 | Backoff multiplier applied on each probe failure |
| `APP_CB_PROJECT_AGORA_STATE_TTL_BUFFER` | 300 | Extra seconds added to max cooldown for cache TTL safety margin |
Advisable AI Recommendations
| Variable | Default | Description |
|---|---|---|
| `APP_CB_ADVISABLE_AI_THRESHOLD` | 3 | Consecutive failures before circuit opens |
| `APP_CB_ADVISABLE_AI_COOLDOWN_SECONDS` | 30 | Initial cooldown duration (seconds) |
| `APP_CB_ADVISABLE_AI_MAX_COOLDOWN_SECONDS` | 300 | Maximum cooldown after exponential backoff (seconds) |
| `APP_CB_ADVISABLE_AI_COOLDOWN_MULTIPLIER` | 2.0 | Backoff multiplier applied on each probe failure |
| `APP_CB_ADVISABLE_AI_STATE_TTL_BUFFER` | 300 | Extra seconds added to max cooldown for cache TTL safety margin |
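Because all five tables share one naming scheme, the per-service lookup can be expressed as a single function. The following is a sketch under assumptions: the env var names and defaults come from the tables above, but `loadBreakerConfig`, the returned array keys, and the use of `getenv()` are illustrative and may not match the contents of `application/config/circuit_breakers.php`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of per-service config loading. The env var names and
 * defaults come from the tables above; the function name, array keys and
 * use of getenv() are assumptions, not the contents of circuit_breakers.php.
 */
function loadBreakerConfig(string $service): array
{
    $prefix = 'APP_CB_' . strtoupper($service) . '_';
    $env = static function (string $name, string $default) use ($prefix): string {
        $value = getenv($prefix . $name);
        // Unset or empty variables fall back to the documented default.
        return ($value === false || $value === '') ? $default : $value;
    };

    return [
        'threshold'          => (int) $env('THRESHOLD', '3'),
        'cooldownSeconds'    => (int) $env('COOLDOWN_SECONDS', '30'),
        'maxCooldownSeconds' => (int) $env('MAX_COOLDOWN_SECONDS', '300'),
        'cooldownMultiplier' => (float) $env('COOLDOWN_MULTIPLIER', '2.0'),
        'stateTtlBuffer'     => (int) $env('STATE_TTL_BUFFER', '300'),
    ];
}

putenv('APP_CB_MATOMO_THRESHOLD=5'); // override a single parameter
$config = loadBreakerConfig('matomo');
echo $config['threshold'], ' ', $config['cooldownSeconds'], PHP_EOL; // overridden threshold, default cooldown
```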
Parameter Reference
| Parameter | Type | Purpose | Tuning Guidance |
|---|---|---|---|
| `threshold` | int | Number of failures before the circuit opens | Lower for critical-path services; higher for noisy/flaky APIs |
| `cooldownSeconds` | int | Initial open-state duration | Match to expected recovery time. 30s is conservative for most APIs. |
| `maxCooldownSeconds` | int | Ceiling for exponential backoff | 300s (5 min) is the default. Increase for services with known long outages. |
| `cooldownMultiplier` | float | Factor applied to cooldown on each probe failure | 2.0 doubles each time. Use 1.0 to disable backoff (constant retry interval). |
| `stateTtlBuffer` | int | Extra TTL added to cache entries beyond maxCooldownSeconds | Prevents premature eviction. Should be >= maxCooldownSeconds. |
State TTL Computation
Cache entries for breaker state use a fixed TTL of maxCooldownSeconds + stateTtlBuffer, regardless of the current cooldown value. With defaults this is 600 seconds (10 minutes). This ensures the state key outlives even the maximum possible cooldown, preventing stale-state issues where a key expires mid-cooldown and the circuit appears falsely closed.
Container Registration
```php
di()->get('circuit_breaker.meta');
di()->get('circuit_breaker.matomo');
di()->get('circuit_breaker.manago');
di()->get('circuit_breaker.project_agora');
di()->get('circuit_breaker.advisable_ai');
```

All services are public and shared (singleton per request). They are backed by `service('cache.l2')->nullOnInvalid()`.
Interaction with DeferredTaskRunner
All five protected services execute their external calls inside DeferredTaskRunner, which runs after fastcgi_finish_request() flushes the HTTP response to the client. This means:
- TTFB is never affected -- even without the circuit breaker, external call timeouts do not delay the user's page load.
- Budget conservation -- the `DeferredTaskRunner` has a finite time budget (default 10s via `APP_DEFERRED_TASK_BUDGET_SECONDS`). When a service is down, its calls would consume this budget with timeouts. The circuit breaker prevents this waste, freeing budget for other deferred tasks (e.g., another service that is healthy).
- Log noise reduction -- without the breaker, a 5-minute outage of a service called on every page view would generate thousands of identical error log entries. The breaker reduces this to a handful of warnings at state transitions.
Testing
Unit tests: tests/Unit/CircuitBreaker/CircuitBreakerTest.php
```bash
vendor/bin/phpunit tests/Unit/CircuitBreaker
```

The test suite uses an in-memory mock of `CacheAdapterInterface` that simulates store, fetch, delete, storeIfAbsent, and exists. Test coverage includes:
| Test | What It Verifies |
|---|---|
| `testClosedByDefault` | New breaker starts in closed state |
| `testStaysClosedBelowThreshold` | Failures below threshold do not open circuit |
| `testSuccessResetsFailureCount` | recordSuccess() clears all accumulated failures |
| `testOpensAtThreshold` | Circuit opens when failure count hits threshold |
| `testStaysOpenDuringCooldown` | Repeated isOpen() calls during cooldown all return true |
| `testTransitionsToHalfOpenAfterCooldown` | After cooldown expires, first caller is allowed through |
| `testHalfOpenBlocksOtherRequests` | During probe, other callers are blocked |
| `testHalfOpenProbeSuccessCloses` | Successful probe fully resets the circuit |
| `testHalfOpenProbeFailureReopensWithBackoff` | Failed probe doubles the cooldown |
| `testExponentialBackoffCapsAtMax` | Cooldown cannot exceed maxCooldownSeconds |
| `testCustomThresholdAndCooldown` | Custom constructor parameters are respected |
| `testKeyIncludesName` | Cache keys are prefixed with cb_ and the breaker name |
| `testNullCacheIsNoOp` | Null cache makes all methods no-ops |
| `testStateTtlUsesMaxCooldownPlusBuffer` | Cache TTL = maxCooldownSeconds + stateTtlBuffer |
Client Extension Points
- Override thresholds per client: Set `APP_CB_*` env vars in the client's `.env` file to tune breaker behavior per deployment.
- Disable a breaker: Set `threshold` to an extremely high value (e.g., `APP_CB_META_THRESHOLD=999999`) to effectively disable tripping.
- Add a new protected service: Register a new `CircuitBreaker` instance in the DI container following the existing loop pattern in `application/config/container/circuit_breakers.php`, add corresponding config entries in `application/config/circuit_breakers.php`, and inject it into the target service.
- Monitor breaker state: Call `$cb->getStatus()` from a health-check endpoint or admin panel to inspect current state, failure count, and remaining open time.
Business Rules
- Independence: Each service has its own breaker instance. Meta going down does not affect Matomo, Manago, ProjectAgora, or Advisable AI.
- No delivery guarantee: When the breaker is open, tracking events (analytics, CAPI, AI events) are silently dropped. There is no retry queue. This is acceptable because these are fire-and-forget tracking calls, not transactional operations.
- Atomic probe: Only one request wins the probe race (via `storeIfAbsent`). This prevents thundering herd on a recovering service.
- Full reset on success: A single successful call clears all failure history. There is no partial recovery or gradual ramp-up.
- Null-safe: All consumers use nullable types and null-safe operators. If the breaker is not injected (e.g., missing DI config), the call proceeds without protection.
- Redis-shared state: When L2 uses Redis, breaker state is shared across all PHP-FPM workers on all servers behind a load balancer. One server detecting a failure counts toward the threshold for all servers.
- File cache isolation: When L2 uses FileAdapter, breaker state is per-server only. Each server has its own independent breaker state.
- Default uniformity: All five services share identical defaults (threshold=3, cooldown=30s, max=300s, multiplier=2.0, buffer=300s). This simplifies reasoning but can be tuned per service.
File Locations
| File | Purpose |
|---|---|
| `src/CircuitBreaker/CircuitBreaker.php` | Core implementation (196 LOC) |
| `application/config/circuit_breakers.php` | Per-service thresholds and cooldowns from env vars |
| `application/config/container/circuit_breakers.php` | DI service registration (5 named services) |
| `application/config/container/modules.php` | Module list that includes circuit_breakers.php |
| `src/Cache/Adapter/CacheAdapterInterface.php` | Storage contract used by the breaker |
| `src/MetaConversionsApi/Service/FacebookConversionService.php` | Meta CAPI consumer |
| `src/Analytics/MatomoTracking.php` | Matomo consumer |
| `src/Manago/Manago.php` | Manago CRM consumer |
| `src/ProjectAgora/ProjectAgoraHttpTrait.php` | ProjectAgora consumer (sync + async) |
| `application/libraries/ProjectAgoraFactory.php` | Factory that injects CB into ProjectAgora instances |
| `application/config/container/project_agora.php` | DI config wiring CB into ProjectAgoraFactory |
| `ecommercen/ai/libraries/AdvAdvisableAI.php` | Advisable AI consumer (lazy DI resolution) |
| `ecommercen/core/Adv_front_controller.php` | Wires Meta, Matomo, and Manago breakers to their services |
| `src/DeferredTask/DeferredTaskRunner.php` | Post-response execution framework |
| `tests/Unit/CircuitBreaker/CircuitBreakerTest.php` | Unit test suite (14 tests) |
| `docs/guides/CircuitBreaker.md` | Developer reference guide |
Related Flows
- SY-07 Cache Management -- L2 cache adapter that backs breaker state storage
- IN-07 Facebook Catalog and CAPI -- Meta Conversions API integration with circuit breaker
- IN-12 Analytics -- Matomo server-side tracking with circuit breaker
- IN-13 Newsletter -- Manago CRM integration with circuit breaker
- CF-34 AI Recommendations -- Advisable AI with circuit breaker and lazy DI resolution
- SY-29 Deployment Architecture -- cache-backed breaker state requires Redis on K8s
Wiki Guide: Circuit Breaker Guide -- developer reference with API examples and backoff calculations

Wiki Guide: Deferred Task Guide -- post-response execution framework that hosts all circuit-breaker-protected calls