Load Balancing & High Availability
Introduction
High availability ensures your application remains accessible even during failures. This guide covers load balancing algorithms, failover strategies, health checks, redundancy patterns, and best practices for building resilient distributed systems.
1. Load Balancing Fundamentals
Load Balancing Types:
1. Layer 4 (Transport Layer)
- Routes based on IP/Port
- Fast, low overhead
- No content inspection
- Examples: AWS NLB, HAProxy (TCP mode)
2. Layer 7 (Application Layer)
- Routes based on HTTP headers, URL, cookies
- Content-aware routing
- SSL termination
- Examples: AWS ALB, Nginx, Traefik (a toy L4-vs-L7 sketch follows this list)
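To make the distinction concrete, here is a toy TypeScript (Node) sketch of each layer. The backend hosts and the path-based rule are illustrative assumptions, and a real balancer adds pooling, health checks, error handling, and TLS:
import net from 'node:net';
import http from 'node:http';

// Layer 4: forward raw TCP bytes; the proxy never parses the protocol
net.createServer(client => {
  const upstream = net.connect(3000, 'backend1.example.com');
  client.pipe(upstream);
  upstream.pipe(client);
}).listen(80);

// Layer 7: parse the HTTP request first, then route on its content
http.createServer((req, res) => {
  // Content-aware rule: API traffic to one pool, everything else to another
  const host = req.url?.startsWith('/api')
    ? 'backend1.example.com'
    : 'backend2.example.com';
  const proxied = http.request(
    { host, port: 3000, path: req.url, method: req.method, headers: req.headers },
    upstreamRes => {
      res.writeHead(upstreamRes.statusCode ?? 502, upstreamRes.headers);
      upstreamRes.pipe(res);
    }
  );
  req.pipe(proxied);
}).listen(8080);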
Load Balancing Algorithms (the first two are sketched in code after this list):
1. Round Robin
- Distributes requests evenly
- Simple, no server awareness
- Can overload slow servers
2. Least Connections
- Routes to server with fewest active connections
- Better for variable request times
3. Least Response Time
- Routes to fastest responding server
- Typically the lowest latency, at the cost of tracking response times per server
4. IP Hash
- Routes based on client IP
- Ensures the same client always reaches the same server
- Good for session affinity
5. Weighted Round Robin
- Servers get traffic based on weight
- Useful for heterogeneous servers
6. Random
- Randomly selects server
- Simple, works well with many servers
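A minimal TypeScript sketch of the first two algorithms; the Server shape and connection counter are illustrative, not any real balancer's API:
interface Server {
  host: string;
  activeConnections: number;
}

// Round Robin: cycle through servers regardless of their current load
class RoundRobinBalancer {
  private index = 0;
  constructor(private servers: Server[]) {}
  next(): Server {
    const server = this.servers[this.index];
    this.index = (this.index + 1) % this.servers.length;
    return server;
  }
}

// Least Connections: pick the server handling the fewest active requests
class LeastConnectionsBalancer {
  constructor(private servers: Server[]) {}
  next(): Server {
    return this.servers.reduce((least, s) =>
      s.activeConnections < least.activeConnections ? s : least
    );
  }
}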
2. Nginx Load Balancer Configuration
# Basic round-robin load balancing
upstream backend {
server backend1.example.com:3000;
server backend2.example.com:3000;
server backend3.example.com:3000;
}
server {
listen 80;
server_name api.example.com;
location / {
proxy_pass http://backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}
# Weighted load balancing
upstream backend {
server backend1.example.com:3000 weight=3; # Receives 3 of every 6 requests
server backend2.example.com:3000 weight=2;
server backend3.example.com:3000 weight=1;
}
# Least connections
upstream backend {
least_conn;
server backend1.example.com:3000;
server backend2.example.com:3000;
}
# IP hash (session affinity)
upstream backend {
ip_hash;
server backend1.example.com:3000;
server backend2.example.com:3000;
}
# Health checks with max failures
upstream backend {
server backend1.example.com:3000 max_fails=3 fail_timeout=30s;
server backend2.example.com:3000 max_fails=3 fail_timeout=30s;
server backend3.example.com:3000 backup; # Only used if others fail
}
# Active health checks (Nginx Plus)
upstream backend {
zone backend 64k;
server backend1.example.com:3000;
server backend2.example.com:3000;
}
server {
location / {
proxy_pass http://backend;
health_check interval=10s fails=3 passes=2 uri=/health;
}
}
# SSL/TLS termination
server {
listen 443 ssl http2;
server_name api.example.com;
ssl_certificate /path/to/cert.pem;
ssl_certificate_key /path/to/key.pem;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;
location / {
proxy_pass http://backend;
}
}
3. Health Checks
// Express health check endpoint
import express from 'express';
import mongoose from 'mongoose';
import Redis from 'ioredis';
const app = express();
const redis = new Redis();
interface HealthStatus {
status: 'healthy' | 'degraded' | 'unhealthy';
timestamp: string;
uptime: number;
checks: {
database: boolean;
redis: boolean;
memory: boolean;
disk: boolean;
};
}
app.get('/health', async (req, res) => {
const checks = {
database: false,
redis: false,
memory: true,
disk: true // disk check omitted in this example; assumed healthy
};
// Database check
try {
await mongoose.connection.db.admin().ping();
checks.database = true;
} catch (error) {
console.error('Database health check failed:', error);
}
// Redis check
try {
await redis.ping();
checks.redis = true;
} catch (error) {
console.error('Redis health check failed:', error);
}
// Memory check
const memUsage = process.memoryUsage();
const memThreshold = 0.9; // 90%
checks.memory = (memUsage.heapUsed / memUsage.heapTotal) < memThreshold;
// Determine overall status
const allHealthy = Object.values(checks).every(check => check);
const someHealthy = Object.values(checks).some(check => check);
const status: HealthStatus = {
status: allHealthy ? 'healthy' : someHealthy ? 'degraded' : 'unhealthy',
timestamp: new Date().toISOString(),
uptime: process.uptime(),
checks
};
// 'degraded' still returns 200 so the load balancer keeps the instance in rotation
const statusCode = status.status === 'healthy' ? 200 :
status.status === 'degraded' ? 200 : 503;
res.status(statusCode).json(status);
});
// Readiness check (for Kubernetes)
app.get('/ready', async (req, res) => {
try {
// Check if app can handle requests
await mongoose.connection.db.admin().ping();
res.status(200).json({ ready: true });
} catch (error) {
res.status(503).json({ ready: false });
}
});
// Liveness check (for Kubernetes)
app.get('/live', (req, res) => {
// Simple check that process is running
res.status(200).json({ alive: true });
});
// Detailed health check
app.get('/health/detailed', async (req, res) => {
const details = {
status: 'healthy',
version: process.env.APP_VERSION,
environment: process.env.NODE_ENV,
uptime: process.uptime(),
timestamp: new Date().toISOString(),
memory: {
heapUsed: Math.round(process.memoryUsage().heapUsed / 1024 / 1024),
heapTotal: Math.round(process.memoryUsage().heapTotal / 1024 / 1024),
rss: Math.round(process.memoryUsage().rss / 1024 / 1024)
},
dependencies: {
database: await checkDatabase(),
redis: await checkRedis(),
externalApi: await checkExternalApi()
}
};
res.json(details);
});
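The detailed endpoint above calls checkDatabase, checkRedis, and checkExternalApi without defining them. Here is a minimal sketch of what they might look like, assuming Node 18+ for global fetch; the status URL and 2-second timeout are placeholder choices:
// Hypothetical dependency checkers for the detailed health endpoint.
// Each returns a boolean instead of throwing so the endpoint can report
// partial failures.
async function checkDatabase(): Promise<boolean> {
  try {
    await mongoose.connection.db.admin().ping();
    return true;
  } catch {
    return false;
  }
}

async function checkRedis(): Promise<boolean> {
  try {
    await redis.ping();
    return true;
  } catch {
    return false;
  }
}

async function checkExternalApi(): Promise<boolean> {
  try {
    // Placeholder URL; substitute the dependency you actually call.
    // The timeout keeps a slow dependency from stalling /health/detailed.
    const res = await fetch('https://api.example.com/status', {
      signal: AbortSignal.timeout(2000)
    });
    return res.ok;
  } catch {
    return false;
  }
}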
4. Session Persistence (Sticky Sessions)
# Sticky sessions with Nginx
upstream backend {
ip_hash; # Method 1: IP-based
server backend1.example.com:3000;
server backend2.example.com:3000;
}
# Method 2: Cookie-based (Nginx Plus)
upstream backend {
server backend1.example.com:3000;
server backend2.example.com:3000;
sticky cookie srv_id expires=1h domain=.example.com path=/;
}
// Redis-based session sharing (no sticky sessions needed)
import session from 'express-session';
import RedisStore from 'connect-redis';
import Redis from 'ioredis';
const redis = new Redis({
host: process.env.REDIS_HOST,
port: 6379
});
app.use(session({
store: new RedisStore({ client: redis }),
secret: process.env.SESSION_SECRET!,
resave: false,
saveUninitialized: false,
cookie: {
secure: true,
maxAge: 24 * 60 * 60 * 1000
}
}));
// Now all servers can access same session data
app.get('/api/profile', (req, res) => {
// Session available regardless of which server handles request
const userId = req.session.userId;
// ...
});
5. Failover Strategies
// Active-Passive Failover (Master-Standby)
/*
┌──────────────┐
│    Master    │  ← Active, handles all traffic
│   (Active)   │
└──────────────┘
       │ Heartbeat
┌──────────────┐
│   Standby    │  ← Passive, takes over on failure
│  (Passive)   │
└──────────────┘
Pros: Simple, no split-brain
Cons: Wasted resources, slower failover
*/
// Active-Active Failover (Multi-Master)
/*
┌──────────────┐      ┌──────────────┐
│   Server 1   │ ←──→ │   Server 2   │
│   (Active)   │      │   (Active)   │
└──────────────┘      └──────────────┘
        ↑                     ↑
    Traffic split between both
Pros: Better resource usage, faster failover
Cons: Complex, potential data conflicts
*/
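To make active-passive concrete, here is a minimal sketch of the heartbeat loop a standby might run against the active node; the liveness URL, promote() hook, and thresholds are hypothetical:
// Hypothetical standby-side heartbeat: poll the active node's liveness
// endpoint and promote this node after several consecutive missed beats.
function monitorActive(
  liveUrl: string,               // e.g. the active node's /live endpoint
  promote: () => Promise<void>,  // e.g. claim a virtual IP, update DNS
  maxMisses = 3,
  intervalMs = 5000
) {
  let misses = 0;
  setInterval(async () => {
    try {
      const res = await fetch(liveUrl, { signal: AbortSignal.timeout(2000) });
      misses = res.ok ? 0 : misses + 1;
    } catch {
      misses++;
    }
    if (misses >= maxMisses) {
      console.log('Active node unreachable, promoting standby');
      await promote();
      misses = 0; // avoid promoting repeatedly
    }
  }, intervalMs);
}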
// Database failover with Prisma
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient({
datasources: {
db: {
url: process.env.DATABASE_URL // Primary
}
}
});
// Implement retry logic with fallback
async function queryWithFailover<T>(
operation: () => Promise<T>,
retries = 3
): Promise<T> {
for (let i = 0; i < retries; i++) {
try {
return await operation();
} catch (error) {
if (i === retries - 1) throw error;
console.log(`Attempt ${i + 1} failed, retrying...`);
// Linear backoff: wait 1s, then 2s, then 3s between attempts
await new Promise(resolve => setTimeout(resolve, 1000 * (i + 1)));
}
}
throw new Error('All attempts failed');
}
// Usage
const users = await queryWithFailover(() =>
prisma.user.findMany()
);
// Redis Sentinel for automatic failover
import Redis from 'ioredis';
const redis = new Redis({
sentinels: [
{ host: 'sentinel1', port: 26379 },
{ host: 'sentinel2', port: 26379 },
{ host: 'sentinel3', port: 26379 }
],
name: 'mymaster', // Master name
password: process.env.REDIS_PASSWORD
});
// Sentinel automatically promotes replica on master failure
6. Circuit Breaker Pattern
// Prevent cascading failures
class CircuitBreaker {
private state: 'CLOSED' | 'OPEN' | 'HALF_OPEN' = 'CLOSED';
private failureCount = 0;
private successCount = 0;
private nextAttempt = Date.now();
constructor(
private failureThreshold = 5,
private successThreshold = 2,
private timeout = 60000
) {}
async execute<T>(operation: () => Promise<T>): Promise<T> {
if (this.state === 'OPEN') {
if (Date.now() < this.nextAttempt) {
throw new Error('Circuit breaker is OPEN');
}
this.state = 'HALF_OPEN';
}
try {
const result = await operation();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
}
}
private onSuccess() {
this.failureCount = 0;
if (this.state === 'HALF_OPEN') {
this.successCount++;
if (this.successCount >= this.successThreshold) {
this.state = 'CLOSED';
this.successCount = 0;
}
}
}
private onFailure() {
this.failureCount++;
this.successCount = 0;
// A failure while HALF_OPEN, or too many while CLOSED, (re)opens the breaker
if (this.state === 'HALF_OPEN' || this.failureCount >= this.failureThreshold) {
this.state = 'OPEN';
this.nextAttempt = Date.now() + this.timeout;
}
}
getState() {
return {
state: this.state,
failureCount: this.failureCount,
nextAttempt: new Date(this.nextAttempt)
};
}
}
// Usage
const paymentServiceBreaker = new CircuitBreaker(5, 2, 60000);
app.post('/api/payments', async (req, res) => {
try {
const result = await paymentServiceBreaker.execute(async () => {
return await callPaymentService(req.body);
});
res.json(result);
} catch (error: any) {
if (error.message === 'Circuit breaker is OPEN') {
// Use fallback or queue for later
await queuePayment(req.body);
res.status(503).json({
error: 'Payment service unavailable, payment queued'
});
} else {
throw error;
}
}
});
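The route above assumes queuePayment exists as a fallback. One possible sketch backed by a Redis list, reusing the redis client and breaker from earlier; the queue key and drain strategy are arbitrary choices:
// Hypothetical fallback queue: store the payment payload in a Redis list
// so a background worker can retry after the payment service recovers.
async function queuePayment(payment: unknown): Promise<void> {
  await redis.lpush('payments:pending', JSON.stringify(payment));
}

// Drain the queue once the breaker has closed again
async function drainQueuedPayments(): Promise<void> {
  while (paymentServiceBreaker.getState().state === 'CLOSED') {
    const raw = await redis.rpop('payments:pending');
    if (!raw) break; // queue is empty
    await callPaymentService(JSON.parse(raw));
  }
}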
7. Database Replication
// PostgreSQL Primary-Replica setup
/*
        ┌─────────────┐
        │   Primary   │  ← Write operations
        │  (Master)   │
        └─────────────┘
               │ Replication
         ┌─────┴─────┐
         │           │
   ┌──────────┐ ┌──────────┐
   │ Replica1 │ │ Replica2 │  ← Read operations
   └──────────┘ └──────────┘
*/
// Read-Write splitting with Prisma
import { PrismaClient } from '@prisma/client';
const prismaWrite = new PrismaClient({
datasources: {
db: { url: process.env.DATABASE_PRIMARY_URL }
}
});
const prismaRead = new PrismaClient({
datasources: {
db: { url: process.env.DATABASE_REPLICA_URL }
}
});
// Write to primary
async function createUser(data: any) {
return await prismaWrite.user.create({ data });
}
// Read from replica (replicas can lag the primary; route read-after-write flows to the primary)
async function getUsers() {
return await prismaRead.user.findMany();
}
// MongoDB replica set
/*
mongodb://node1:27017,node2:27017,node3:27017/
mydb?replicaSet=rs0
*/
import mongoose from 'mongoose';
mongoose.connect(
'mongodb://node1:27017,node2:27017,node3:27017/mydb?replicaSet=rs0',
{
readPreference: 'secondaryPreferred' // Read from secondary when available
}
);
// Force write acknowledgment from a majority of the replica set
await User.create([data], { writeConcern: { w: 'majority' } });
// Force read from primary
await User.findOne({ id }).read('primary');
8. Graceful Shutdown
// Proper shutdown handling
class GracefulShutdown {
isShuttingDown = false; // public so the middleware and readiness probe below can check it
private connections = new Set();
constructor(
private server: any,
private timeout = 30000
) {
this.setupSignalHandlers();
}
private setupSignalHandlers() {
process.on('SIGTERM', () => this.shutdown('SIGTERM'));
process.on('SIGINT', () => this.shutdown('SIGINT'));
}
private async shutdown(signal: string) {
if (this.isShuttingDown) return;
console.log(`${signal} received, starting graceful shutdown...`);
this.isShuttingDown = true;
// Stop accepting new connections
this.server.close(() => {
console.log('Server stopped accepting new connections');
});
// Wait for existing connections to finish
const shutdownTimeout = setTimeout(() => {
console.log('Forcing shutdown after timeout');
this.forceShutdown();
}, this.timeout);
try {
// Close database connections
await mongoose.connection.close();
console.log('Database connections closed');
// Close Redis connections
await redis.quit();
console.log('Redis connections closed');
// Wait for all requests to complete
await this.waitForConnections();
clearTimeout(shutdownTimeout);
console.log('Graceful shutdown complete');
process.exit(0);
} catch (error) {
console.error('Error during shutdown:', error);
this.forceShutdown();
}
}
private async waitForConnections() {
while (this.connections.size > 0) {
await new Promise(resolve => setTimeout(resolve, 100));
}
}
private forceShutdown() {
console.log('Forcing immediate shutdown');
process.exit(1);
}
trackConnection(connection: any) {
this.connections.add(connection);
connection.on('close', () => {
this.connections.delete(connection);
});
}
}
// Usage
const app = express();
const server = app.listen(3000);
const graceful = new GracefulShutdown(server);
// Wire up connection tracking so waitForConnections() sees live sockets
server.on('connection', conn => graceful.trackConnection(conn));
// Middleware to prevent new requests during shutdown
app.use((req, res, next) => {
if (graceful.isShuttingDown) {
res.set('Connection', 'close');
return res.status(503).json({
error: 'Server is shutting down'
});
}
next();
});
// Kubernetes readiness probe returns false during shutdown
app.get('/ready', (req, res) => {
if (graceful.isShuttingDown) {
return res.status(503).json({ ready: false });
}
res.json({ ready: true });
});
9. Monitoring & Alerting
// Prometheus metrics
import promClient from 'prom-client';
const register = new promClient.Registry();
// Default metrics (CPU, memory, etc.)
promClient.collectDefaultMetrics({ register });
// Custom metrics
const httpRequestDuration = new promClient.Histogram({
name: 'http_request_duration_seconds',
help: 'HTTP request duration in seconds',
labelNames: ['method', 'route', 'status'],
buckets: [0.1, 0.5, 1, 2, 5]
});
const httpRequestTotal = new promClient.Counter({
name: 'http_requests_total',
help: 'Total HTTP requests',
labelNames: ['method', 'route', 'status']
});
const activeConnections = new promClient.Gauge({
name: 'active_connections',
help: 'Number of active connections'
});
register.registerMetric(httpRequestDuration);
register.registerMetric(httpRequestTotal);
register.registerMetric(activeConnections);
// Middleware to track metrics
app.use((req, res, next) => {
const start = Date.now();
res.on('finish', () => {
const duration = (Date.now() - start) / 1000;
httpRequestDuration.observe(
{ method: req.method, route: req.route?.path || req.path, status: res.statusCode },
duration
);
httpRequestTotal.inc({
method: req.method,
route: req.route?.path || req.path,
status: res.statusCode
});
});
next();
});
// Metrics endpoint
app.get('/metrics', async (req, res) => {
res.set('Content-Type', register.contentType);
res.send(await register.metrics());
});
// CloudWatch metrics (AWS)
import { CloudWatchClient, PutMetricDataCommand } from '@aws-sdk/client-cloudwatch';
async function sendMetricToCloudWatch(name: string, value: number) {
const client = new CloudWatchClient({});
const command = new PutMetricDataCommand({
Namespace: 'MyApp',
MetricData: [
{
MetricName: name,
Value: value,
Unit: 'Count',
Timestamp: new Date()
}
]
});
await client.send(command);
}
10. High Availability Checklist
✅ High Availability Best Practices:
- ✅ Use a load balancer with health checks
- ✅ Run multiple instances (minimum 3 for quorum)
- ✅ Deploy across multiple availability zones
- ✅ Implement circuit breakers for external dependencies
- ✅ Use database replication (primary + replicas)
- ✅ Implement graceful shutdown
- ✅ Set up automatic failover
- ✅ Use Redis Sentinel or Cluster for HA
- ✅ Implement retry logic with exponential backoff (sketched after this list)
- ✅ Monitor metrics and set up alerts
- ✅ Test backups and disaster recovery regularly
- ✅ Use a CDN for static assets
- ✅ Implement rate limiting and DDoS protection
- ✅ Set up proper logging and distributed tracing
- ✅ Run regular chaos engineering tests
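For the retry item above, here is a minimal exponential-backoff-with-jitter sketch; the base delay, cap, and retry count are arbitrary defaults:
// Hypothetical retry helper: exponential backoff with full jitter.
// Delays grow 100ms, 200ms, 400ms, ... capped at 5s, each scaled by a
// random factor so many clients don't retry in lockstep.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 100,
  maxDelayMs = 5000
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= maxRetries - 1) throw error;
      const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      const delay = Math.random() * cap; // full jitter
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}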
Availability Targets:
- 99.9% uptime = 8.76 hours downtime per year
- 99.95% uptime = 4.38 hours downtime per year
- 99.99% uptime = 52.56 minutes downtime per year
- 99.999% uptime = 5.26 minutes downtime per year
Conclusion
High availability requires redundancy, monitoring, and proper failover mechanisms. Use load balancers, database replication, health checks, and graceful shutdown to build resilient systems. Test failure scenarios regularly and always have a disaster recovery plan.
💡 Pro Tip: Practice chaos engineering by intentionally injecting failures in production (with safeguards) to test your system's resilience. Netflix's Chaos Monkey randomly terminates instances to ensure systems can handle failures. Start small, monitor closely, and gradually increase complexity.