
Overview

Performance optimization is about making systems faster and more efficient. The key is measuring first, then optimizing the right things.

Performance Metrics

Latency

  • Time to complete one request
  • Measure: p50, p95, p99 (see the sketch below)
  • Target: < 200ms for web APIs
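
These percentiles come straight from sorting the raw timings; a minimal nearest-rank sketch using only the standard library:

# Nearest-rank percentile over a list of request latencies (ms)
def percentile(samples, pct):
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[index]

latencies_ms = [12, 48, 50, 51, 95, 120, 180, 220, 900, 1400]
for pct in (50, 95, 99):
    print(f"p{pct} = {percentile(latencies_ms, pct)}ms")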

Throughput

  • Requests per second (RPS)
  • Transactions per second (TPS)
  • Target: Depends on scale

Availability

  • Uptime percentage
  • 99.9% = 8.76 hours downtime/year
  • 99.99% = 52 minutes/year

Resource Usage

  • CPU, Memory, Disk, Network
  • Cost efficiency
  • Bottleneck identification

Caching Strategies

Cache Levels

┌─────────────────────────────────────────────────────────┐
│                    Client                               │
│  ┌─────────────────────────────────────────────────┐   │
│  │              Browser Cache                      │   │
│  │         (Static assets, API responses)          │   │
│  └─────────────────────────────────────────────────┘   │
└────────────────────────┬────────────────────────────────┘

┌────────────────────────┼────────────────────────────────┐
│                        ▼                                │
│  ┌─────────────────────────────────────────────────┐   │
│  │                   CDN                           │   │
│  │      (Edge caching, global distribution)        │   │
│  └─────────────────────────────────────────────────┘   │
│                        │                                │
│                        ▼                                │
│  ┌─────────────────────────────────────────────────┐   │
│  │            Application Cache                    │   │
│  │         (Redis, Memcached - in-memory)          │   │
│  └─────────────────────────────────────────────────┘   │
│                        │                                │
│                        ▼                                │
│  ┌─────────────────────────────────────────────────┐   │
│  │               Database                          │   │
│  │      (Query cache, buffer pool)                 │   │
│  └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

Caching Patterns

import json

# redis is a configured redis-py client and db a thin query helper;
# both are assumed to be set up elsewhere.

# Cache-Aside (Lazy Loading)
def get_user(user_id):
    # 1. Check cache first
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    # 2. Cache miss - fetch from DB
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    # 3. Store in cache for next time (1-hour TTL)
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))

    return user

# Write-Through
def update_user(user_id, data):
    # 1. Update database
    db.update("UPDATE users SET ... WHERE id = ?", data, user_id)

    # 2. Update cache immediately so readers never see stale data
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))

# Cache Invalidation
def delete_user(user_id):
    db.delete("DELETE FROM users WHERE id = ?", user_id)
    redis.delete(f"user:{user_id}")

Cache Invalidation Strategies

Strategy             Description              Use When
TTL (Time-to-Live)   Auto-expire after time   Acceptable staleness
Event-based          Invalidate on update     Real-time consistency
Version tags         Change key on update     Immutable objects
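
For the version-tag strategy, a minimal sketch (reusing the pre-configured redis client assumed above): instead of deleting cached values, bump a per-entity version so reads build a new key and stale entries simply age out via TTL.

# Version-tag invalidation sketch
def versioned_user_key(user_id):
    version = int(redis.get(f"user:{user_id}:version") or 0)
    return f"user:{user_id}:v{version}"

def invalidate_user_cache(user_id):
    # Old versioned keys are never read again and expire via TTL
    redis.incr(f"user:{user_id}:version")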

Database Optimization

Query Optimization

-- ❌ Slow: Full table scan
SELECT * FROM orders WHERE YEAR(created_at) = 2024;

-- ✅ Fast: Use index-friendly query
SELECT * FROM orders 
WHERE created_at >= '2024-01-01' 
  AND created_at < '2025-01-01';

-- ❌ Slow: SELECT *
SELECT * FROM users WHERE id = 1;

-- ✅ Fast: Select only needed columns
SELECT id, name, email FROM users WHERE id = 1;

-- Explain query plan
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;

Indexing Strategy

-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (order matters!)
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- This index helps: WHERE user_id = 1 AND status = 'pending'
-- Also helps: WHERE user_id = 1
-- Does NOT help: WHERE status = 'pending'

-- Covering index (includes all needed columns;
-- INCLUDE syntax is PostgreSQL 11+ / SQL Server)
CREATE INDEX idx_users_covering ON users(email) INCLUDE (name, created_at);

Connection Pooling

# ❌ Bad: New connection per request
import psycopg2

def get_user(user_id):
    conn = psycopg2.connect(...)  # Expensive: new TCP + auth handshake every call
    cur = conn.cursor()
    cur.execute(query)            # psycopg2 runs queries through a cursor
    result = cur.fetchall()
    conn.close()
    return result

# ✅ Good: Connection pool
from sqlalchemy import create_engine

engine = create_engine(
    DATABASE_URL,
    pool_size=20,
    max_overflow=10,
    pool_timeout=30
)
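
A minimal usage sketch (SQLAlchemy 1.4+ style): connections are checked out of the pool and returned automatically when the block exits.

from sqlalchemy import text

def get_user(user_id):
    with engine.connect() as conn:  # borrows a pooled connection
        row = conn.execute(
            text("SELECT id, name, email FROM users WHERE id = :id"),
            {"id": user_id},
        ).first()
    return row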

Scaling Strategies

Vertical vs Horizontal Scaling

Vertical Scaling (Scale Up)     Horizontal Scaling (Scale Out)
┌─────────────────────┐         ┌──────┐ ┌──────┐ ┌──────┐
│                     │         │Server│ │Server│ │Server│
│    Bigger Server    │         │  1   │ │  2   │ │  3   │
│                     │         └──────┘ └──────┘ └──────┘
│  More CPU, RAM      │              │       │       │
│                     │              └───────┼───────┘
└─────────────────────┘                      │
                                    ┌────────┴────────┐
                                    │  Load Balancer  │
                                    └─────────────────┘

Load Balancing Algorithms

Algorithm           Description                              Best For
Round Robin         Rotate through servers                   Equal server capacity
Least Connections   Send to server with fewest connections   Variable request duration
IP Hash             Same client → same server                Session affinity
Weighted            Distribute based on capacity             Mixed server sizes
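
As an illustration of the simplest algorithm, a round-robin rotation sketch (real load balancers such as nginx or HAProxy add health checks, weights, and connection tracking on top of this):

import itertools

# Static upstream list; cycle() rotates through it forever
servers = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
rotation = itertools.cycle(servers)

def next_server():
    return next(rotation)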

Database Scaling

Read Replicas

                    ┌─────────┐
     All writes ──► │ Primary │
                    │   DB    │
                    └────┬────┘
                         │  Replication
    ┌────────────────────┼────────────────────┐
    │                    │                    │
    ▼                    ▼                    ▼
┌────────┐          ┌────────┐           ┌────────┐
│Replica │          │Replica │           │Replica │
│   1    │          │   2    │           │   3    │
└────────┘          └────────┘           └────────┘
    ▲                    ▲                     ▲
    │                    │                     │
    └────────────────────┴─────────────────────┘
                  application reads
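
At the application layer, read/write splitting can be as simple as routing by statement type; a sketch assuming hypothetical primary_engine and replica_engines objects created elsewhere:

import random

def pick_engine(for_write: bool):
    if for_write:
        return primary_engine                # all writes hit the primary
    return random.choice(replica_engines)    # reads spread across replicas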

Profiling & Benchmarking

Application Profiling

# Python profiling
import cProfile
import pstats

def profile_function(func):
    profiler = cProfile.Profile()
    profiler.enable()
    
    result = func()
    
    profiler.disable()
    stats = pstats.Stats(profiler)
    stats.sort_stats('cumulative')
    stats.print_stats(10)  # Top 10 slowest
    
    return result

# Memory profiling
from memory_profiler import profile

@profile
def memory_intensive_function():
    large_list = [i for i in range(1000000)]
    return sum(large_list)
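
For one-off timings without a full profile, time.perf_counter is often enough; a minimal sketch wrapping a hypothetical slow_path function:

import time

start = time.perf_counter()
result = profile_function(slow_path)  # prints the 10 slowest cumulative entries
print(f"total wall time: {time.perf_counter() - start:.3f}s")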

Load Testing

# Using locust for load testing
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 5)
    
    @task(3)
    def view_homepage(self):
        self.client.get("/")
    
    @task(1)
    def view_product(self):
        self.client.get("/products/1")
    
    @task(1)
    def create_order(self):
        self.client.post("/orders", json={
            "product_id": 1,
            "quantity": 2
        })
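
A typical headless run of the scenario above (assuming it is saved as locustfile.py and the target runs locally):

locust -f locustfile.py --headless -u 100 -r 10 --host http://localhost:8000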

N+1 Query Problem

The N+1 problem is one of the most common performance issues in ORM-backed applications: one query fetches a list, then one additional query runs per row.

# ❌ N+1 Problem: 1 query for orders + N queries for users
orders = Order.objects.all()  # 1 query
for order in orders:
    print(order.user.name)    # N queries (one per order)

# ✅ Eager Loading: 2 queries total
orders = Order.objects.select_related('user').all()
for order in orders:
    print(order.user.name)    # No additional queries

# For many-to-many relationships
orders = Order.objects.prefetch_related('items').all()

Async Processing

Background Jobs with Celery

from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379')

@app.task
def send_email(user_id, template):
    user = get_user(user_id)
    email_service.send(user.email, template)

@app.task
def generate_report(report_id):
    data = fetch_large_dataset()
    report = process_data(data)
    save_report(report_id, report)

# Usage - non-blocking
@api.post("/orders")
def create_order(order_data):
    order = save_order(order_data)
    
    # Queue async tasks instead of blocking
    send_email.delay(order.user_id, "order_confirmation")
    generate_report.delay(order.id)
    
    return {"order_id": order.id}  # Return immediately

Event-Driven with Message Queues

import asyncio
import json

import aio_pika

async def publish_event(event_type: str, data: dict):
    connection = await aio_pika.connect_robust("amqp://localhost/")
    async with connection:
        channel = await connection.channel()
        # The default exchange routes by queue name, so the routing key
        # must match the consumer's queue name
        await channel.default_exchange.publish(
            aio_pika.Message(body=json.dumps(data).encode()),
            routing_key=event_type,
        )

async def consume_events(queue_name: str, handler):
    connection = await aio_pika.connect_robust("amqp://localhost/")
    async with connection:
        channel = await connection.channel()
        queue = await channel.declare_queue(queue_name)
        async with queue.iterator() as messages:
            async for message in messages:
                async with message.process():
                    await handler(json.loads(message.body))

Frontend Performance

Critical Rendering Path

HTML ──► DOM ───┐
                ├──► Render Tree ──► Layout ──► Paint
CSS ──► CSSOM ──┘

Optimization Techniques

<!-- Defer non-critical JavaScript -->
<script src="analytics.js" defer></script>

<!-- Preload critical resources -->
<link rel="preload" href="critical.css" as="style">
<link rel="preload" href="hero-image.webp" as="image">

<!-- Lazy load offscreen images (native browser support) -->
<img src="actual-image.jpg" loading="lazy" alt="Description">

<!-- Use modern image formats -->
<picture>
  <source srcset="image.avif" type="image/avif">
  <source srcset="image.webp" type="image/webp">
  <img src="image.jpg" alt="Description">
</picture>

Bundle Optimization

// Code splitting with dynamic imports (React)
import { lazy } from 'react';

const HeavyComponent = lazy(() => import('./HeavyComponent'));

// Tree shaking - import only what you need
import { debounce } from 'lodash-es';  // ✅ Tree-shakeable
import _ from 'lodash';                // ❌ Imports everything

Database Performance

Query Optimization Checklist

Issue               Symptom                          Solution
Missing index       Slow queries, full table scans   Add appropriate index
Too many indexes    Slow writes                      Remove unused indexes
N+1 queries         Many similar queries             Use eager loading/JOINs
SELECT *            Fetching unused data             Select only needed columns
Large result sets   High memory usage                Pagination, streaming
Lock contention     Timeouts, deadlocks              Reduce transaction scope
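
For the pagination row above, keyset (seek) pagination avoids the growing cost of large OFFSETs by filtering on the last seen key; a sketch assuming a DB-API style conn:

def next_page(conn, last_seen_id, page_size=50):
    # WHERE id > ? walks the primary-key index; OFFSET must scan and discard
    cur = conn.cursor()
    cur.execute(
        "SELECT id, user_id, created_at FROM orders "
        "WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    )
    return cur.fetchall()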

Slow Query Analysis

-- PostgreSQL: Enable slow query log (queries slower than 1s)
ALTER SYSTEM SET log_min_duration_statement = 1000;
SELECT pg_reload_conf();  -- apply without a restart

-- Analyze query execution plan
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders 
WHERE user_id = 123 
ORDER BY created_at DESC 
LIMIT 10;

-- Look for:
-- - Seq Scan (table scan, consider index)
-- - High actual time
-- - Large rows removed by filter
-- - Sort operations on large datasets

Connection Pool Sizing

# Rule of thumb: connections = (core_count * 2) + effective_spindle_count
# For SSD: connections ≈ (cores * 2) + 1

from sqlalchemy import create_engine

engine = create_engine(
    DATABASE_URL,
    pool_size=20,           # Base pool size
    max_overflow=10,        # Extra connections allowed
    pool_timeout=30,        # Wait time for connection
    pool_recycle=1800,      # Recycle connections after 30 min
    pool_pre_ping=True,     # Test connection before use
)

Application-Level Optimization

Async I/O

import asyncio
import aiohttp

# ❌ Sequential - slow
def fetch_all_sequential(urls):
    results = []
    for url in urls:
        response = requests.get(url)
        results.append(response.json())
    return results  # Takes N * avg_response_time

# ✅ Concurrent - fast
async def fetch_all_concurrent(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        return await asyncio.gather(*tasks)  # Takes max(response_times)

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.json()
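
Driving the concurrent version from synchronous code (hypothetical URL list):

urls = [f"https://api.example.com/items/{i}" for i in range(10)]
results = asyncio.run(fetch_all_concurrent(urls))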

Efficient Data Structures

# Use appropriate data structures
from collections import deque, defaultdict
from functools import lru_cache

# O(1) append/pop from both ends
queue = deque(maxlen=1000)  # Bounded queue

# Avoid repeated dictionary key checks
counts = defaultdict(int)
for item in items:
    counts[item] += 1  # No KeyError

# Memoization for expensive computations
@lru_cache(maxsize=1000)
def expensive_calculation(n):
    return fibonacci(n)

Performance Testing Strategy

Types of Tests

Test Type            Purpose                          Tools
Load Testing         Normal expected load             Locust, k6, JMeter
Stress Testing       Beyond normal capacity           Same tools, higher load
Spike Testing        Sudden traffic spikes            Simulate flash sales
Soak Testing         Extended period (memory leaks)   Run for hours/days
Breakpoint Testing   Find system limits               Gradually increase load

Key Metrics to Track

# Response time percentiles
p50 = 100ms   # Median - most users
p95 = 500ms   # 95% of requests faster than this
p99 = 1000ms  # Tail latency - worst 1%

# Throughput
rps = 10000   # Requests per second

# Error rate
error_rate = errors / total_requests  # Should be < 1%

# Saturation
cpu_usage = 70%      # Alert at 80%
memory_usage = 60%   # Alert at 75%

Optimization Checklist

1. Measure First: Never optimize without data. Profile your application and identify bottlenecks using APM tools (New Relic, Datadog, Jaeger).

2. Cache Aggressively: Add caching at every level (browser, CDN, application, database). Use the cache-aside pattern with appropriate TTLs.

3. Optimize Queries: Use EXPLAIN ANALYZE, add proper indexes, avoid N+1 queries, use connection pooling.

4. Use Async: Don't block on I/O. Use async/await, message queues, and background jobs for long-running tasks.

5. Optimize Frontend: Minimize bundle size, lazy load, use a CDN, optimize images, implement proper caching headers.

6. Scale Appropriately: Start with vertical scaling (simpler), move to horizontal when needed. Use auto-scaling.

Quick Reference

Latency Targets

Operation        Good      Acceptable   Poor
Page load        < 1s      < 3s         > 5s
API response     < 100ms   < 500ms      > 1s
Database query   < 10ms    < 100ms      > 500ms
Cache lookup     < 1ms     < 10ms       > 50ms

Capacity Planning Formula

Required Capacity = (Peak RPS × Avg Response Time) / Utilization Target

Example:
- Peak: 10,000 RPS
- Avg response: 100ms
- Target utilization: 70%

Capacity = (10000 × 0.1) / 0.7 = 1,429 concurrent connections needed
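
This is Little's law (concurrency = arrival rate × time in system) with headroom for the utilization target; a quick check of the arithmetic:

peak_rps = 10_000
avg_response_s = 0.1
utilization_target = 0.7

required = peak_rps * avg_response_s / utilization_target
print(round(required))  # 1429
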
Remember: “Premature optimization is the root of all evil” - Donald Knuth. Focus on clean code first, then optimize the actual bottlenecks. Always measure before and after optimization.

Common Mistake: Optimizing based on assumptions. Always profile first, optimize the actual hot paths, and verify improvements with benchmarks.