1. Latency Numbers Every Engineer Should Know

Memorize the order of magnitude — interviewers care that you know SSD is ~1000x slower than RAM, not the exact nanoseconds.

| Operation | Latency | Notes |
| --- | --- | --- |
| L1 cache reference | 0.5 ns | Fastest memory access available |
| Branch mispredict | 5 ns | CPU pipeline flush penalty |
| L2 cache reference | 7 ns | ~14x slower than L1 |
| Mutex lock/unlock | 25 ns | Contention makes this much worse |
| Main memory (RAM) reference | 100 ns | ~200x slower than L1 |
| Compress 1 KB with Snappy | 3 μs | Fast compression for real-time use |
| Read 1 MB sequentially from RAM | 3 μs | RAM is fast for sequential access |
| SSD random read | 150 μs | ~1,500x slower than RAM |
| Read 1 MB sequentially from SSD | 1 ms | SSDs excel at sequential reads |
| Network round trip (same datacenter) | 500 μs | Assumes modern datacenter networking |
| HDD disk seek | 10 ms | Mechanical latency; avoid random reads |
| Read 1 MB sequentially from HDD | 20 ms | HDDs still viable for bulk sequential I/O |
| Network round trip (cross-continent) | 150 ms | Speed of light is the bottleneck |
| TLS handshake | 250 ms | 1–2 round trips depending on version |
| DNS lookup (uncached) | ~50 ms | Varies widely; caching helps enormously |
| TCP connection setup (3-way handshake) | ~1.5x RTT | One and a half round trips |

Key ratios to remember: RAM is ~1,000x faster than SSD. SSD is ~100x faster than HDD. Network within a datacenter is ~300x faster than cross-continent.
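
The key ratios can be re-derived from the table rather than memorized. A minimal Python sketch, using only the figures listed above (the dictionary keys and the `ratio` helper are illustrative names, not from any library):

```python
# Latencies taken from the table above, converted to nanoseconds.
LATENCY_NS = {
    "ram_reference": 100,
    "ssd_random_read": 150_000,
    "hdd_seek": 10_000_000,
    "rtt_same_datacenter": 500_000,
    "rtt_cross_continent": 150_000_000,
}


def ratio(slower: str, faster: str) -> float:
    """Return how many times slower one operation is than another."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


if __name__ == "__main__":
    print(f"SSD random read vs RAM reference: {ratio('ssd_random_read', 'ram_reference'):,.0f}x")
    print(f"HDD seek vs SSD random read: {ratio('hdd_seek', 'ssd_random_read'):,.0f}x")
    print(f"Cross-continent vs same-datacenter RTT: {ratio('rtt_cross_continent', 'rtt_same_datacenter'):,.0f}x")
```

The exact ratios come out at 1,500x, about 67x, and 300x, so the "~1,000x" and "~100x" figures are order-of-magnitude roundings.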

2. Database Selection Matrix

| Use Case | Recommended DB | Reasoning |
| --- | --- | --- |
| Transactions, complex joins | PostgreSQL / MySQL | ACID guarantees, mature tooling, SQL standard |
| Flexible schema, rapid dev | MongoDB / DynamoDB | Document model maps to application objects, schema-on-read |
| Session store, caching, leaderboards | Redis / Memcached | Sub-ms latency, in-memory, simple key-value operations |
| Social networks, recommendations | Neo4j / Amazon Neptune | Native graph traversal, relationship-first data model |
| Metrics, IoT, monitoring | TimescaleDB / InfluxDB | Optimized for time-ordered writes and range queries |
| Full-text search, log analytics | Elasticsearch / OpenSearch | Inverted index, fuzzy matching, aggregation pipelines |
| Wide-column, massive scale | Cassandra / ScyllaDB | Linear horizontal scaling, tunable consistency |
| Embedded / edge devices | SQLite | Zero-config, single-file, surprisingly powerful |
| Multi-model (graph + doc + KV) | ArangoDB / SurrealDB | One engine for multiple access patterns |

No database is “best.” The right choice depends on your access patterns, consistency requirements, team expertise, and operational budget. Picking a DB because it is trendy is a career-limiting move.

3. Caching Strategy Decision Tree

| Pattern | How It Works | When to Use | Trade-off |
| --- | --- | --- | --- |
| Cache-Aside | App checks cache first; on miss, reads DB, then populates cache | General-purpose, read-heavy workloads | Possible stale data; app must manage cache logic |
| Read-Through | Cache itself fetches from DB on miss | When you want transparent caching | Cache library must support DB integration |
| Write-Through | Write to cache and DB synchronously | When you cannot tolerate stale reads | Higher write latency (two writes per operation) |
| Write-Behind | Write to cache immediately; async flush to DB | Write-heavy workloads needing low latency | Risk of data loss if cache node crashes before flush |
| Refresh-Ahead | Proactively refresh entries before TTL expires | Predictable access patterns with low-latency needs | Wasted resources if prediction is wrong |

As Phil Karlton said: “There are only two hard things in Computer Science: cache invalidation and naming things.”

Strategies for invalidation:
  • TTL (Time-To-Live): Simple, but stale data during the window.
  • Event-driven invalidation: Publish a cache-bust event on write. Accurate but adds coupling.
  • Version keys: Append a version number to cache keys; bump version on write.
  • Lease-based: Cache entry holds a lease; writer must acquire lease before updating.
Rule of thumb: If your data changes less than once per minute, TTL is usually fine. If it changes per-second, use event-driven invalidation.
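
Cache-aside combined with TTL and event-driven invalidation can be sketched as follows. This is a minimal illustration, not a production cache: a plain dict stands in for Redis or Memcached, and `CacheAside`, `load_from_db`, and the injectable `clock` are hypothetical names chosen for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Cache-aside with TTL; a dict stands in for a real cache server."""

    def __init__(
        self,
        load_from_db: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # cache hit, entry still within its TTL
        value = self._load(key)  # miss or expired: read the source of truth
        self._store[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Event-driven invalidation: call this whenever the key is written."""
        self._store.pop(key, None)
```

Readers may see stale data for up to `ttl_seconds` unless the write path also calls `invalidate`, which is the trade-off the rule of thumb describes.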

4. API Style Comparison

| Dimension | REST | gRPC | GraphQL | WebSocket |
| --- | --- | --- | --- | --- |
| Protocol | HTTP/1.1 or HTTP/2 | HTTP/2 (always) | HTTP/1.1 or HTTP/2 | TCP (upgraded from HTTP) |
| Payload format | JSON (typically) | Protocol Buffers (binary) | JSON | Any (text or binary frames) |
| Best for | Public APIs, CRUD | Internal microservices, low-latency | Mobile/frontend with varied data needs | Real-time bidirectional communication |
| Streaming | Not native (SSE possible) | Bidirectional streaming built-in | Subscriptions via WebSocket | Full-duplex by design |
| Tooling | Excellent (Postman, curl) | Growing (grpcurl, BloomRPC) | Good (GraphiQL, Apollo) | Moderate (wscat) |
| Schema/Contract | OpenAPI / Swagger | .proto files (strict) | SDL (strongly typed) | No built-in contract |
| Overhead | Moderate (text-based) | Low (binary, multiplexed) | Moderate (single endpoint) | Low after handshake |
| Cacheability | Excellent (HTTP caching) | Hard (binary, no native HTTP cache) | Hard (POST requests) | Not applicable |
| Browser support | Native | Requires grpc-web proxy | Native | Native |

Default to REST for public APIs. Use gRPC for internal service-to-service communication where latency matters. Use GraphQL when clients have highly variable data needs. Use WebSockets only when you truly need server-push or bidirectional streaming.

5. Deployment Strategy Matrix

| Strategy | Risk Level | Downtime | Infra Cost | Complexity | Rollback Speed | Best For |
| --- | --- | --- | --- | --- | --- | --- |
| Rolling | Medium | Zero | Low | Low | Slow | Stateless services, general use |
| Blue-Green | Low | Zero | High (2x) | Medium | Instant | Critical services needing instant rollback |
| Canary | Low | Zero | Medium | High | Fast | High-traffic services, gradual validation |
| Shadow | Very Low | Zero | High | Very High | N/A (no live traffic affected) | Testing new versions with real traffic patterns |
| Recreate | High | Yes | Low | Low | Slow | Dev/staging, or when in-place upgrade is required |
| A/B Testing | Low | Zero | Medium | High | Fast | Feature experiments, UX testing |

Canary + feature flags is the gold standard for production deployments at scale. Roll out to 1% of traffic, monitor error rates and latency, then gradually increase.
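
One common way to send a fixed share of traffic to a canary is to hash a stable identifier into buckets. The sketch below assumes routing is decided per user ID; `in_canary` and its parameters are hypothetical names, and real load balancers or feature-flag services implement this differently.

```python
import hashlib


def in_canary(user_id: str, rollout_percent: float) -> bool:
    """Deterministically assign a user to the canary cohort.

    Hashing keeps a user in the same cohort across requests, and raising
    rollout_percent only ever adds users to the canary.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return bucket < rollout_percent * 100


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    for percent in (1, 10, 50):
        share = sum(in_canary(u, percent) for u in users) / len(users)
        print(f"{percent}% rollout -> {share:.1%} of sample users on canary")
```

Because the bucket depends only on the user ID, a user who hit the canary at 1% stays on it at 10%, which keeps their experience consistent while the rollout widens.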

6. Authentication Method Decision Matrix

| Method | Use Case | Stateful? | Revocation | Complexity | Scalability |
| --- | --- | --- | --- | --- | --- |
| Session | Traditional web apps | Yes | Easy (delete from store) | Low | Requires shared store (Redis) |
| JWT | Stateless APIs, microservices | No | Hard (must wait for expiry or use blocklist) | Medium | Excellent (no central store) |
| OAuth 2.0 | Third-party access, SSO | Depends | Moderate (token revocation endpoint) | High | Good |
| API Key | Server-to-server, developer APIs | Yes | Easy (delete key) | Low | Good |
| mTLS | Zero-trust service mesh, internal | No | Hard (CRL/OCSP) | Very High | Excellent |
| SAML | Enterprise SSO | Yes | Moderate | High | Good |
| Passkeys/WebAuthn | Passwordless consumer auth | No | Easy (remove credential) | Medium | Excellent |

Never roll your own auth for production systems. Use battle-tested libraries and standards. The most common security breaches come from custom authentication implementations.

7. Message Queue Comparison

| Dimension | Kafka | RabbitMQ | SQS | Redis Streams |
| --- | --- | --- | --- | --- |
| Throughput | Millions/sec | Tens of thousands/sec | Nearly unlimited (managed) | Hundreds of thousands/sec |
| Ordering | Per-partition | Per-queue (with caveats) | Best-effort (FIFO available) | Per-stream |
| Persistence | Disk (configurable retention) | Optional (disk or memory) | Managed (AWS handles it) | AOF / RDB snapshots |
| Delivery | At-least-once / exactly-once | At-least-once / at-most-once | At-least-once / exactly-once (FIFO) | At-least-once |
| Consumer model | Pull-based consumer groups | Push-based (with prefetch) | Pull-based polling | Consumer groups (pull) |
| Best for | Event streaming, log aggregation, high-throughput pipelines | Task queues, RPC, complex routing | Serverless, AWS-native decoupling | Lightweight streaming, when you already have Redis |
| Operational cost | High (ZooKeeper/KRaft, brokers) | Medium (Erlang runtime) | Zero (fully managed) | Low (add-on to existing Redis) |

Use a message queue when:
  • The downstream service can be temporarily unavailable
  • You need to decouple producers from consumers
  • Work can be processed asynchronously
  • You need to buffer traffic spikes
  • Multiple consumers need the same event
Use a direct API call when:
  • You need a synchronous response
  • The operation must complete before proceeding
  • Latency is critical (queues add latency)
  • The system is simple enough that a queue adds unjustified complexity

8. Container Orchestration Quick Reference

Core Kubernetes Objects

| Object | What It Does |
| --- | --- |
| Pod | Smallest deployable unit; one or more containers sharing network/storage |
| Deployment | Manages ReplicaSets; handles rolling updates and rollbacks |
| ReplicaSet | Ensures a specified number of pod replicas are running at all times |
| Service | Stable network endpoint that routes traffic to a set of pods |
| Ingress | HTTP/HTTPS routing rules from external traffic to internal services |
| ConfigMap | Injects non-sensitive configuration data into pods as env vars or files |
| Secret | Stores sensitive data (tokens, passwords); values are base64-encoded, which is not encryption |
| StatefulSet | Like Deployment but with stable pod identity and persistent storage |
| DaemonSet | Runs one pod on every eligible node (logging agents, monitoring) |
| Job / CronJob | Runs a task to completion once (Job) or on a schedule (CronJob) |
| Namespace | Virtual cluster for isolating resources within the same physical cluster |
| PersistentVolume (PV) | A piece of storage provisioned in the cluster |
| PersistentVolumeClaim (PVC) | A request for storage by a pod |
| HorizontalPodAutoscaler | Scales pod count based on CPU, memory, or custom metrics |
| NetworkPolicy | Firewall rules controlling pod-to-pod and external traffic |

Mental model: Deployments manage ReplicaSets, which manage Pods. Services give Pods a stable DNS name. Ingress gives Services an external URL. Everything else is configuration, storage, or scheduling.

9. Common HTTP Status Codes for Engineers

Success (2xx)

| Code | Name | When to Use |
| --- | --- | --- |
| 200 | OK | Standard success for GET, PUT, PATCH |
| 201 | Created | Resource successfully created (POST) |
| 202 | Accepted | Request accepted for async processing (not yet completed) |
| 204 | No Content | Success with no response body (DELETE, PUT with no return) |

Redirection (3xx)

| Code | Name | When to Use |
| --- | --- | --- |
| 301 | Moved Permanently | Resource URL has permanently changed (SEO-safe redirect) |
| 302 | Found | Temporary redirect (use 307 for strict method preservation) |
| 304 | Not Modified | Client cache is still valid (conditional GET) |

Client Error (4xx)

| Code | Name | When to Use |
| --- | --- | --- |
| 400 | Bad Request | Malformed syntax, invalid parameters, validation failure |
| 401 | Unauthorized | Missing or invalid authentication credentials |
| 403 | Forbidden | Authenticated but not authorized for this resource |
| 404 | Not Found | Resource does not exist at this URI |
| 405 | Method Not Allowed | HTTP method not supported on this endpoint |
| 409 | Conflict | State conflict (duplicate resource, concurrent edit) |
| 422 | Unprocessable Entity | Syntactically valid but semantically incorrect |
| 429 | Too Many Requests | Rate limit exceeded; include a Retry-After header |

Server Error (5xx)

| Code | Name | When to Use |
| --- | --- | --- |
| 500 | Internal Server Error | Unhandled exception; generic server failure |
| 502 | Bad Gateway | Upstream service returned an invalid response |
| 503 | Service Unavailable | Server is overloaded or in maintenance; temporary |
| 504 | Gateway Timeout | Upstream service did not respond in time |

401 vs 403: 401 means “I don’t know who you are” (authentication). 403 means “I know who you are, but you can’t do this” (authorization). Getting this wrong confuses every frontend developer on the team.
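
The 401/403 distinction comes down to the order of two checks. A minimal sketch, assuming a hypothetical `access_status` helper where `user` is `None` for unauthenticated requests and permissions are plain strings:

```python
from typing import Optional, Set


def access_status(user: Optional[str], permissions: Set[str], required: str) -> int:
    """Pick the HTTP status for an access check."""
    if user is None:
        return 401  # authentication failed: we do not know who is calling
    if required not in permissions:
        return 403  # authenticated, but not allowed to perform this action
    return 200


if __name__ == "__main__":
    print(access_status(None, set(), "orders:read"))               # no credentials
    print(access_status("alice", {"orders:read"}, "orders:write")) # lacks permission
    print(access_status("alice", {"orders:read"}, "orders:read"))  # allowed
```

Checking authentication first matters: returning 403 to an anonymous caller hides the fact that logging in might fix the problem.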

10. The “Nines” Table — Availability Reference

| Availability | Common Name | Downtime / Year | Downtime / Month | Downtime / Week |
| --- | --- | --- | --- | --- |
| 99% | Two nines | 3.65 days | 7.31 hours | 1.68 hours |
| 99.9% | Three nines | 8.77 hours | 43.83 minutes | 10.08 minutes |
| 99.95% | Three and a half | 4.38 hours | 21.92 minutes | 5.04 minutes |
| 99.99% | Four nines | 52.60 minutes | 4.38 minutes | 1.01 minutes |
| 99.999% | Five nines | 5.26 minutes | 26.30 seconds | 6.05 seconds |
| 99.9999% | Six nines | 31.56 seconds | 2.63 seconds | 0.60 seconds |

Combining availability: If Service A (99.9%) depends on Service B (99.9%), the combined availability is at best 99.9% x 99.9% ≈ 99.8%. Availabilities multiply along the critical path, so each added dependency increases total downtime.

Improving availability:
  • Redundancy: Run multiple replicas across availability zones.
  • Eliminate single points of failure: Every component in the critical path needs failover.
  • Graceful degradation: Serve cached/stale data instead of failing entirely.
  • Health checks + auto-restart: Detect and recover from failures automatically.
Rule of thumb: Most production web apps target three nines (99.9%). Banks and telecom target four to five nines. Achieving five nines requires automated everything — humans are too slow.
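
The availability arithmetic can be checked with a few lines of Python. This sketch assumes failures are independent, which real systems only approximate, and uses a 365.25-day year to match the table; the function names are illustrative.

```python
import math

HOURS_PER_YEAR = 365.25 * 24


def serial_availability(*availabilities: float) -> float:
    """Services in a dependency chain: all of them must be up."""
    return math.prod(availabilities)


def parallel_availability(*availabilities: float) -> float:
    """Redundant replicas: the system is down only if every replica is down."""
    return 1 - math.prod(1 - a for a in availabilities)


def downtime_hours_per_year(availability: float) -> float:
    """Expected downtime per year for a given availability fraction."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    print(f"A -> B chain: {serial_availability(0.999, 0.999):.4%}")
    print(f"Two 99% replicas: {parallel_availability(0.99, 0.99):.4%}")
    print(f"Three nines: {downtime_hours_per_year(0.999):.2f} hours/year")
```

Under the independence assumption, two 99% replicas behind a failover reach about 99.99%, which is why redundancy is the first lever on the list.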

11. Back-of-Envelope Estimation Cheat Sheet

Powers of 2 — Capacity Reference

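| Power | Exact Value | Approximate Size |
| --- | --- | --- |
| 2^10 | 1,024 | ~1 Thousand (1 KB) |
| 2^20 | 1,048,576 | ~1 Million (1 MB) |
| 2^30 | 1,073,741,824 | ~1 Billion (1 GB) |
| 2^40 | 1,099,511,627,776 | ~1 Trillion (1 TB) |
| 2^50 | 1,125,899,906,842,624 | ~1 Quadrillion (1 PB) |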

Common Estimation Building Blocks

| Metric | Value |
| --- | --- |
| Seconds in a day | ~86,400 (~10^5) |
| Seconds in a month | ~2.6 million (~2.5 x 10^6) |
| Seconds in a year | ~31.5 million (~3 x 10^7) |
| Average size of a tweet / text post | ~0.5 KB |
| Average size of a photo (compressed) | ~200 KB – 2 MB |
| Average size of a short video (1 min) | ~10 MB |
| Average HTTP request/response | ~1–10 KB |
| Characters in a URL | ~100 bytes |

QPS Quick Math

| Daily Active Users | Actions/User/Day | QPS (avg) | QPS (peak, ~3x avg) |
| --- | --- | --- | --- |
| 1 million | 10 | ~115 | ~350 |
| 10 million | 10 | ~1,150 | ~3,500 |
| 100 million | 10 | ~11,500 | ~35,000 |
| 1 billion | 10 | ~115,000 | ~350,000 |

The formula: QPS = (DAU x actions per user) / 86,400. Peak QPS is typically 2x–5x the average. Always calculate peak, not just average — systems must handle bursts.
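
The QPS formula translates directly into code. A small sketch; the function names and the default `peak_factor` of 3 (taken from the table's "~3x avg" column) are choices made for this example.

```python
SECONDS_PER_DAY = 86_400


def average_qps(dau: int, actions_per_user: float) -> float:
    """Average queries per second: (DAU x actions per user) / 86,400."""
    return dau * actions_per_user / SECONDS_PER_DAY


def peak_qps(dau: int, actions_per_user: float, peak_factor: float = 3.0) -> float:
    """Peak QPS as a multiple of the average (typically 2x to 5x)."""
    return average_qps(dau, actions_per_user) * peak_factor


if __name__ == "__main__":
    for dau in (1_000_000, 10_000_000, 100_000_000, 1_000_000_000):
        print(f"{dau:>13,} DAU: avg {average_qps(dau, 10):,.0f}, peak {peak_qps(dau, 10):,.0f}")
```

For 1 million DAU at 10 actions each, this gives an average just under 116 QPS and a peak of about 347 QPS, which the table rounds to ~115 and ~350.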

Storage Estimation Formula

Daily storage = DAU x actions/user x size per action
Monthly storage = Daily x 30
Yearly storage = Daily x 365
Plan for 3–5 years of growth + replication factor (usually 3x)
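
The storage formulas above can be wrapped in one helper. This is a sketch under simplifying assumptions: usage stays flat over the planning horizon (no DAU growth), and the parameter names and defaults are illustrative.

```python
from typing import Dict


def storage_estimate_bytes(
    dau: int,
    actions_per_user: int,
    bytes_per_action: int,
    years: int = 3,
    replication_factor: int = 3,
) -> Dict[str, int]:
    """Apply the daily / monthly / yearly formulas, then add retention and replication."""
    daily = dau * actions_per_user * bytes_per_action
    yearly = daily * 365
    return {
        "daily": daily,
        "monthly": daily * 30,
        "yearly": yearly,
        "provisioned": yearly * years * replication_factor,
    }


if __name__ == "__main__":
    # Hypothetical inputs: 10 million DAU, 2 text posts per day, ~500 bytes per post.
    estimate = storage_estimate_bytes(10_000_000, 2, 500)
    for label, size in estimate.items():
        print(f"{label:>12}: {size / 1e12:.2f} TB")
```

With those hypothetical inputs the result is about 10 GB per day and roughly 33 TB provisioned over three years at 3x replication. Multiply by an expected growth rate if usage is not flat.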

12. Design Pattern Quick Reference

| Pattern | Problem It Solves | When NOT to Use |
| --- | --- | --- |
| Singleton | Ensures one instance globally (config, connection pool) | When it hides dependencies or makes testing difficult |
| Factory Method | Decouples object creation from usage | When there is only one concrete type and it will not change |
| Observer | One-to-many notifications on state change | When the order of notification matters or chains get deep |
| Strategy | Swap algorithms at runtime without changing client code | When there is only one algorithm and no foreseeable variation |
| Decorator | Adds behavior to objects dynamically without subclassing | When the combination explosion of wrappers becomes unreadable |
| Adapter | Makes incompatible interfaces work together | When you can modify the original interface instead |
| Builder | Constructs complex objects step-by-step | For simple objects where a constructor with parameters suffices |
| Proxy | Controls access to an object (lazy load, access control, caching) | When the indirection adds latency with no real benefit |
| Circuit Breaker | Prevents cascading failures by stopping calls to failing services | When failures are transient and retries are cheap |
| CQRS | Separates read and write models for scalability | For simple CRUD apps where read/write patterns are identical |

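
To make the Circuit Breaker entry concrete, here is a deliberately small sketch. It is one possible shape, not a reference implementation: the class names, the consecutive-failure threshold, and the single-trial half-open behavior are assumptions made for illustration, and the code is not thread-safe.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open after a failed trial call.
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0       # any success closes the circuit
        self._opened_at = None
        return result
```

While the circuit is open, callers fail fast with `CircuitOpenError` instead of waiting on a service that is already struggling, which is what stops the failure from cascading.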
Beyond OOP design patterns, these distributed system patterns come up frequently:
| Pattern | Purpose |
| --- | --- |
| Saga | Manage distributed transactions across microservices |
| Event Sourcing | Store state changes as an immutable sequence of events |
| Sidecar | Attach utility processes alongside your main container |
| Bulkhead | Isolate failures to prevent one component from sinking all |
| Strangler Fig | Incrementally migrate from legacy to new system |
| Leader Election | Coordinate a single active node among replicas |
| Consistent Hashing | Distribute load evenly with minimal remapping on scaling |
| Outbox Pattern | Reliably publish events alongside database transactions |

13. SOLID Principles — One-Liner

| Principle | One-Liner | Code Smell It Prevents |
| --- | --- | --- |
| S - Single Responsibility | A class should have only one reason to change. | God classes that touch everything |
| O - Open/Closed | Open for extension, closed for modification. | Modifying existing code every time a new type appears |
| L - Liskov Substitution | Subtypes must be usable wherever their parent type is expected. | Subclasses that break parent behavior or throw unexpected errors |
| I - Interface Segregation | No client should be forced to depend on methods it does not use. | Fat interfaces where implementors stub out half the methods |
| D - Dependency Inversion | Depend on abstractions, not concretions. | Tightly coupled modules that cannot be tested or swapped |

  • S - Single Responsibility: Bad: A User class that handles authentication, database access, and email sending. Good: Separate UserAuth, UserRepository, and EmailService classes.
  • O - Open/Closed: Bad: A giant if/else chain that grows every time you add a payment method. Good: A PaymentProcessor interface with StripeProcessor, PayPalProcessor implementations.
  • L - Liskov Substitution: Bad: A Square that extends Rectangle but breaks when setWidth is called independently. Good: Use a common Shape interface instead of inheritance.
  • I - Interface Segregation: Bad: A Worker interface with work(), eat(), sleep(); robots do not eat. Good: Split into Workable, Eatable, Sleepable interfaces.
  • D - Dependency Inversion: Bad: OrderService creates new MySQLDatabase() directly. Good: OrderService accepts a Database interface via constructor injection.
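
The Dependency Inversion example can be sketched in Python using a structural `Protocol` as the abstraction. `OrderService` and `Database` come from the example above; `InMemoryDatabase`, `save`, and `place_order` are hypothetical names added for illustration.

```python
from typing import Dict, Protocol


class Database(Protocol):
    """The abstraction OrderService depends on."""

    def save(self, key: str, value: str) -> None: ...


class InMemoryDatabase:
    """A stand-in implementation, handy for tests; a MySQL-backed class would expose the same method."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self.rows[key] = value


class OrderService:
    def __init__(self, database: Database) -> None:
        # Constructor injection: the caller decides which Database to use.
        self._database = database

    def place_order(self, order_id: str, item: str) -> None:
        self._database.save(order_id, item)


if __name__ == "__main__":
    db = InMemoryDatabase()
    OrderService(db).place_order("order-1", "keyboard")
    print(db.rows)
```

Because `OrderService` never names a concrete database class, swapping storage engines or injecting a fake in tests requires no change to the service itself.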

14. Git Commands Engineers Actually Use

Beyond the Basics

| Command | What It Does |
| --- | --- |
| git log --oneline --graph --all | Visualize the entire branch topology in your terminal |
| git diff --staged | See exactly what will be committed (staged changes only) |
| git stash -u | Stash all changes including untracked files |
| git stash pop | Re-apply the most recent stash and remove it from the stash list |
| git cherry-pick <commit> | Apply a single commit from another branch onto current branch |
| git rebase -i HEAD~N | Interactively squash, reorder, or edit the last N commits |
| git bisect start / good / bad | Binary search through commits to find the one that introduced a bug |
| git reflog | View the full history of HEAD; your safety net for “I lost my work” |
| git reset --soft HEAD~1 | Undo last commit but keep changes staged |
| git blame -L 10,20 file.py | See who last modified lines 10–20 (great for understanding context) |
| git log -S "functionName" | Search commit history for when a string was added or removed |
| git shortlog -sn --no-merges | Leaderboard of contributors by commit count |
| git clean -fd | Remove all untracked files and directories (destructive) |
| git worktree add ../feature-branch feature | Check out a branch in a separate directory without switching |
| git commit --fixup <commit> | Mark a commit as a fixup for a previous commit (use with autosquash) |

Aliases Worth Setting Up

git config --global alias.co checkout
git config --global alias.br branch
git config --global alias.st status
git config --global alias.lg "log --oneline --graph --all --decorate"
git config --global alias.unstage "reset HEAD --"
git config --global alias.last "log -1 HEAD --stat"
git config --global alias.amend "commit --amend --no-edit"
Dangerous commands to use with caution: git reset --hard, git push --force, and git clean -fd are destructive and cannot be undone easily. Always prefer --force-with-lease over --force when pushing, as it prevents overwriting teammates’ work.

Quick-Find Index

| Topic | Section |
| --- | --- |
| API styles (REST, gRPC, etc.) | 4 |
| Authentication methods | 6 |
| Availability (“nines” table) | 10 |
| Back-of-envelope estimation | 11 |
| Caching strategies | 3 |
| Container orchestration (K8s) | 8 |
| Database selection | 2 |
| Deployment strategies | 5 |
| Design patterns | 12 |
| Git commands | 14 |
| HTTP status codes | 9 |
| Latency numbers | 1 |
| Message queues | 7 |
| SOLID principles | 13 |