A circuit breaker is a protective layer between your service and an unreliable dependency, designed to fail fast and prevent cascading failures in distributed systems.

Why Circuit Breakers Exist

In a microservice architecture, one slow or failing dependency can quickly exhaust its callers’ threads, connection pools, and CPU, leading to a chain reaction of outages. The circuit breaker pattern monitors calls to such a dependency and, when the failure rate or latency crosses a threshold, temporarily blocks further calls to give the system time to recover.

The rationale is simple: it is better to return a fast, controlled error or degraded response than to hang on timeouts and drag the entire system down.

Core States and Behaviour

Most implementations define three key states.

  • Closed
    • All calls pass through to the downstream service.
    • The breaker tracks metrics such as error rate, timeouts, and latency over a sliding window.
    • When failures or slow calls exceed configured thresholds, the breaker trips to Open.
  • Open
    • Calls are rejected immediately or routed to a fallback without touching the downstream.
    • This protects the unhealthy service and the caller’s resources from overload.
    • The breaker stays open for a configured cool‑down period.
  • Half‑open
    • After the cool‑down, a limited number of trial calls are allowed through.
    • If trial calls succeed, the breaker returns to Closed; if they fail, it flips back to Open and waits again.

The design rationale is to adapt dynamically: be optimistic while things are healthy, aggressively protect resources when they are not, and probe carefully for recovery.
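
To make the three states concrete, here is a minimal hand-rolled sketch in plain Java. It is an illustration only, not how any particular library implements the pattern: the class name SimpleBreaker is hypothetical, it trips on consecutive failures instead of a sliding-window rate, it allows a single trial call in Half-open, and it is not thread-safe. The clock is injected so the cool-down can be exercised without sleeping.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Toy circuit breaker (hypothetical, simplified, not thread-safe). */
final class SimpleBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;  // consecutive failures before tripping
    private final long coolDownNanos;    // how long to stay OPEN
    private final LongSupplier clock;    // nanosecond clock, injectable for tests
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleBreaker(int failureThreshold, Duration coolDown, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.coolDownNanos = coolDown.toNanos();
        this.clock = clock;
    }

    State state() {
        return state;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < coolDownNanos) {
                return fallback.get();   // fail fast: the downstream is not touched
            }
            state = State.HALF_OPEN;     // cool-down over: let one trial call through
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            state = State.CLOSED;        // a success (normal or trial) closes the breaker
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;      // trip, or re-open after a failed trial
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        long[] now = {0L};               // fake clock value in nanoseconds
        SimpleBreaker breaker = new SimpleBreaker(2, Duration.ofSeconds(3), () -> now[0]);
        Supplier<String> failing = () -> { throw new IllegalStateException("down"); };

        breaker.call(failing, () -> "fallback");
        breaker.call(failing, () -> "fallback");
        System.out.println(breaker.state());      // OPEN after two consecutive failures

        now[0] = Duration.ofSeconds(4).toNanos(); // pretend the cool-down has elapsed
        String result = breaker.call(() -> "ok", () -> "fallback");
        System.out.println(result);               // ok (the trial call succeeded)
        System.out.println(breaker.state());      // CLOSED
    }
}
```

Production breakers typically replace the consecutive-failure counter with a rate over a sliding window and admit several trial calls, as the Resilience4j configuration later in this article does.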

When You Should Use a Circuit Breaker

Circuit breakers are most valuable when remote failures are frequent, long‑lasting, or expensive.

  • Protection and stability
    • Prevents retry storms and timeouts from overwhelming a struggling dependency.
    • Limits the blast radius of a failing service so other services remain responsive.
  • Better user experience
    • Fails fast with clear errors or fallbacks instead of long hangs.
    • Enables graceful degradation such as cached reads, default values, or “read‑only” modes.
  • High‑availability systems
    • Essential where you must keep the system partially available even when individual services are down.

You usually combine a circuit breaker with timeouts, retries (with backoff and jitter), and bulkheads for a robust resilience layer.
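
As a rough illustration of the "retries with backoff and jitter" part, the sketch below retries a call with exponential backoff and full jitter using only the JDK. The class and method names (RetryWithJitter, callWithRetry, backoffCapMillis) and the parameter values are assumptions made for this example; a resilience library would normally supply this for you.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper: retry with exponential backoff and full jitter. */
final class RetryWithJitter {

    /** Upper bound of the delay after failed attempt number `attempt` (1-based). */
    static long backoffCapMillis(int attempt, long baseMillis, long maxMillis) {
        // base * 2^(attempt - 1); the shift is clamped to avoid overflow
        long exponential = baseMillis << Math.min(attempt - 1, 20);
        return Math.min(exponential, maxMillis);
    }

    static <T> T callWithRetry(Callable<T> action, int maxAttempts,
                               long baseMillis, long maxMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // do not retry on interruption
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break;                          // out of attempts
                }
                long cap = backoffCapMillis(attempt, baseMillis, maxMillis);
                // Full jitter: sleep a uniformly random time in [0, cap]
                Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1));
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice, then succeeds on the third attempt
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("flaky");
            }
            return "ok";
        }, 5, 50, 400);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The random delay spreads retries from many callers over time, which is what keeps a recovering dependency from being hit by synchronized retry waves.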

Java Example With Resilience4j

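Below is a complete Java example using Resilience4j’s circuit breaker in a simple main program. It is written as a compact source file (top-level methods and an instance main method), which runs as-is on Java 25 or later, or on Java 21 to 24 with preview features enabled; on older versions, wrap the methods in a class and use public static void main(String[] args). The resilience4j-circuitbreaker library must be on the classpath.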

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;

import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

static String callRemoteService() throws Exception {
    double p = Math.random();
    if (p < 0.4) {
        // Simulate a timeout-style failure
        throw new TimeoutException("Remote service timed out");
    } else if (p < 0.7) {
        // Simulate normal, fast success
        return "FAST OK";
    } else {
        // Simulate slow success
        Thread.sleep(1500);
        return "SLOW OK";
    }
}

void main() {
    var config = CircuitBreakerConfig.custom()
            .failureRateThreshold(50.0f)                      // trip if >= 50% failures
            .slowCallRateThreshold(50.0f)                     // trip if >= 50% slow calls
            .slowCallDurationThreshold(Duration.ofSeconds(1)) // >1s is “slow”
            .waitDurationInOpenState(Duration.ofSeconds(3))   // open for 3s
            .permittedNumberOfCallsInHalfOpenState(3)         // 3 trial calls
            .minimumNumberOfCalls(5)                          // need data first
            .slidingWindowSize(10)                            // last 10 calls
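            // Count timeouts as failures, including those wrapped in the
            // RuntimeException thrown by the supplier below; matching only
            // TimeoutException.class would miss the wrapper and record a success.
            .recordException(e -> e instanceof TimeoutException
                    || e.getCause() instanceof TimeoutException)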
            .build();

    CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(config);
    CircuitBreaker breaker = registry.circuitBreaker("remoteService");

    Supplier<String> guardedCall = CircuitBreaker.decorateSupplier(
            breaker,
            () -> {
                try {
                    System.out.println("  executing remote call...");
                    return callRemoteService();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
    );

    for (int i = 1; i <= 25; i++) {
        var state = breaker.getState();
        System.out.println("Attempt " + i + " | state=" + state);

        try {
            String result = guardedCall.get();
            System.out.println("  -> SUCCESS: " + result);
        } catch (Exception e) {
            System.out.println("  -> FAILURE: " + e.getClass().getSimpleName()
                    + " | " + e.getMessage());
        }

        try {
            Thread.sleep(500);
        } catch (InterruptedException ignored) {
            Thread.currentThread().interrupt();
        }
    }
}

How to Validate This Example

  • Observe Closed → Open → Half‑open transitions
    • Run the program; you should see some attempts in CLOSED with mixed successes and failures.
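    • Once the failure rate or the slow‑call rate reaches its threshold (after at least 5 recorded calls), the state switches to OPEN and subsequent attempts fail fast with a CallNotPermittedException, without printing “executing remote call…”.
    • After roughly 3 seconds the breaker admits trial calls again. By default Resilience4j moves from OPEN to HALF_OPEN lazily, on the first call attempted after the wait, so that attempt may still print state=OPEN before it executes; the following attempts show HALF_OPEN. Once the 3 trial calls complete, the breaker returns to CLOSED or goes back to OPEN depending on their outcomes.
  • Confirm protection behavior
    • While the breaker is rejecting calls, no “executing remote call…” line appears, which shows that it is blocking calls and thus protecting both caller and callee.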

The rationale for this configuration is to keep the example small yet realistic: using a sliding window and explicit thresholds makes the breaker’s decisions explainable in production terms.
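
To show why a sliding window plus explicit thresholds makes trip decisions easy to explain, here is a simplified count-based window in plain Java. It is a sketch under stated assumptions, not Resilience4j's actual implementation: the class SlidingWindowRate and its methods are hypothetical, and it tracks only failures (not slow calls).

```java
/** Hypothetical count-based sliding window over the last N call outcomes. */
final class SlidingWindowRate {
    private final boolean[] failed;   // ring buffer: true = that call failed
    private final int minimumCalls;   // no rate is reported before this many calls
    private int next = 0;             // slot that the next outcome overwrites
    private int recorded = 0;         // number of outcomes currently in the window
    private int failures = 0;         // number of failed outcomes in the window

    SlidingWindowRate(int windowSize, int minimumCalls) {
        this.failed = new boolean[windowSize];
        this.minimumCalls = Math.min(minimumCalls, windowSize);
    }

    void record(boolean callFailed) {
        if (recorded == failed.length) {
            if (failed[next]) {
                failures--;           // the oldest outcome falls out of the window
            }
        } else {
            recorded++;
        }
        failed[next] = callFailed;
        if (callFailed) {
            failures++;
        }
        next = (next + 1) % failed.length;
    }

    /** Failure rate in percent, or -1 while there is not enough data. */
    float failureRatePercent() {
        if (recorded < minimumCalls) {
            return -1f;
        }
        return 100f * failures / recorded;
    }

    boolean shouldTrip(float thresholdPercent) {
        float rate = failureRatePercent();
        return rate >= 0 && rate >= thresholdPercent;
    }

    public static void main(String[] args) {
        SlidingWindowRate window = new SlidingWindowRate(10, 5);
        for (int i = 0; i < 4; i++) {
            window.record(true);                        // four failures
        }
        System.out.println(window.shouldTrip(50f));     // false: only 4 of the 5 required calls
        window.record(false);                           // one success
        System.out.println(window.failureRatePercent()); // 80.0 (4 failures out of 5 calls)
        System.out.println(window.shouldTrip(50f));     // true
    }
}
```

With a window like this, every trip can be stated as "at least X% of the last N calls failed", which is the kind of sentence an on-call engineer can verify against logs.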

Circuit Breaker vs Retry vs Bulkhead

These patterns solve related but distinct concerns and are often composed together.

Pattern         | Concern addressed                          | Typical placement
----------------+--------------------------------------------+------------------------------------------
Circuit breaker | Persistent failures, high error/slow rate. | Around remote calls, per dependency.
Retry           | Transient, short‑lived faults.             | Inside Closed breaker, with backoff.
Bulkhead        | Isolation of resource usage across calls.  | At thread‑pool or connection‑pool level.

The key design idea is: bulkhead limits blast radius, circuit breaker limits how long you keep talking to something broken, and retry gives a flaky but recoverable dependency a second chance.
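
For comparison with the breaker code, a bulkhead can be sketched with a plain java.util.concurrent.Semaphore that caps concurrent calls to one dependency. The class name SemaphoreBulkhead and its API are assumptions for illustration; real deployments often rely on a library bulkhead or a dedicated thread pool instead.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical bulkhead: at most maxConcurrentCalls run at the same time. */
final class SemaphoreBulkhead {
    private final Semaphore permits;

    SemaphoreBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();    // compartment full: reject instead of queueing
        }
        try {
            return action.get();
        } finally {
            permits.release();        // always hand the permit back
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        SemaphoreBulkhead bulkhead = new SemaphoreBulkhead(1);
        // The outer call holds the only permit, so the nested call is rejected.
        String result = bulkhead.call(
                () -> bulkhead.call(() -> "inner ran", () -> "inner rejected"),
                () -> "outer rejected");
        System.out.println(result);                      // inner rejected
        System.out.println(bulkhead.availablePermits()); // 1
    }
}
```

Using the non-blocking tryAcquire means excess callers are turned away immediately, so a slow dependency can tie up at most the configured number of threads.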