Skip to main content

Database Interview Questions (50+ Detailed Q&A)

1. Fundamentals & SQL

Answer:
  • Atomicity: All or Nothing. (Transaction rolls back on error).
  • Consistency: DB remains in valid state (Constraints satisfied).
  • Isolation: Transactions don’t see each other’s partial writes.
  • Durability: Committed data is saved to disk (WAL) and survives crash.
Answer:
  • Read Uncommitted: Dirty Read possible.
  • Read Committed: No Dirty Read. Phantom/Non-repeatable read possible. (Default Postgres).
  • Repeatable Read: No Non-repeatable read. Phantom possible. (Default MySQL).
  • Serializable: Strict serial execution. Slowest.
Answer:
  • Clustered (Primary Key): Data rows stored inside the index leaf nodes. Sorts data on disk. Only 1 per table.
  • Non-Clustered: Leaf nodes contain a Pointer to the data row (Heap). Multiple allowed.
Answer:
  • B-Tree: Data stored in internal nodes and leaf nodes.
  • B+Tree: Data stored ONLY in leaf nodes. Internal nodes are keys. Leaf nodes linked (Linked List) -> Fast range scan. Used by MySQL/Postgres.
Answer:
  • 1NF: Atomic values (No lists).
  • 2NF: 1NF + No Partial dependency (Composite key).
  • 3NF: 2NF + No Transitive dependency (City depends on Zip, not UserID).
  • Denormalization: Intentionally duplicating data for read performance (Star Schema).
Answer:
  • Inner: Match in both.
  • Left: All from Left + Match Right (Null if missing).
  • Right: All from Right.
  • Full Outer: All from both.
  • Cross: Cartesian product (NxM).
Answer:
  • View: Saved Query. Runs every time you select from it. Virtual.
  • Materialized View: Snapshot on disk. Fast read. Stale data. Needs refresh (REFRESH MATERIALIZED VIEW).
Answer:
  • Function: Returns value. Can use in SELECT. No transaction control usually.
  • Proc: Returns void/cursor. Can manage transactions (COMMIT/ROLLBACK).
Answer: Primary Key, Foreign Key, Unique, Not Null, Check (age > 0).
Answer: Pointer to a result set row. Allows iterating row-by-row. Expensive (Network roundtrips). Avoid if set-based operation possible.

2. Internals & Optimization

Answer: Changes are written to Append-Only Log before Apply to Data File. Why? Sequential write is fast. Ensures Durability on crash (Replay log).
Answer: Readers don’t block Writers. Writers don’t block Readers. Each transaction sees a “Snapshot”. Implemented via Tuple Versioning (xmin, xmax in Postgres). Old versions cleaned by VACUUM.
Answer: Parser -> Optimizer -> Executor. EXPLAIN ANALYZE select *... Look for: Seq Scan (Bad on large table), Index Scan (Good), Hash Join.
Answer:
  • Seek: Jump to location B-Tree. O(log N).
  • Scan: Read all leaf nodes. O(N).
Answer: Index contains ALL columns required by query. SELECT name FROM users WHERE age = 10. Index on (age, name). No Heap lookup required. Super fast.
Answer: O(1) lookup. Equality only (=). No Range queries (>). Postgres supports it but rarely used over B-Tree.
Answer: Distributing rows across servers. Criteria: Range ID, Modulo Hash, Geo. Challenge: Cross-shard Join, Rebalancing.
Answer: Splitting huge table into smaller physical tables (Partitions) on SAME server. Ex: Logs_2023_01, Logs_2023_02. Optimizer skips partitions (WHERE date = ...).
Answer: Reclaiming space from Dead Tuples (Updated/Deleted rows). Prevents Transaction ID Wraparound. Auto-vacuum daemon.
Answer: Opening connection is expensive (Thread per connection model). Pool (PgBouncer) keeps connections open.

3. NoSQL (Mongo/Cassandra/Redis)

Answer:
  • Doc (Mongo): JSON. Flexible schema.
  • Key-Value (Redis): Cache. Simple.
  • Column (Cassandra): Wide column. High write throughput.
  • Graph (Neo4j): Relations.
Answer: Primary Node (RW) + Secondaries (RO). Async replication (Oplog). Auto-failover.
Answer: Masterless (Ring). Peer-to-peer. Tunable Consistency. Tokens/VNodes. Hinted Handoff (If node down, neighbor holds write).
Answer:
  • RDB: Snapshot every X minutes. Fast restart. Last X mins data loss.
  • AOF: Log every write. Slower restart. No data loss (fsync every sec).
Answer: Used in Cassandra/HBase to quickly check if a Key exists in an SSTable on disk before reading.
Answer:
  • Mongo: CP (Consistency). Break partition -> Unavailability during election.
  • Cassandra: AP (Availability). Network split -> Nodes accept writes -> Eventual consistency.
Answer: QuadTree / Geohash. Mongo $near operator. Store points in 2d plane.
Answer: w=1: Ack by Primary. w=majority: Ack by >50%. Safe. j=true: Written to Journal (Disk).
Answer: Yes, for command execution. Avoids context switch / locks. CPU is rarely bottleneck (Memory/Net is). Since v6, I/O threading available.
Answer: Fraud detection (Ring of accounts). Social Network (Friends of Friends). Recommendation Engine. Traversal is O(1) per hop. SQL Joins are O(N).

4. Advanced SQL Scenarios

Answer: Perform calcs across a set of rows related to current row. RANK() OVER (PARTITION BY dept ORDER BY salary DESC). Running Total, Moving Average.
Answer: WITH cte AS (...) SELECT .... Readable reusable subquery. Recursive CTE: Used for Hierarchical data (Org Chart).
Answer: Joining table to itself. Ex: Employee table has ManagerID. Find Manager name for Emp.
Answer: Loop executing 1 query per object. Fix: IN (...), JOIN, or Batching.
Answer:
  • Pessimistic: SELECT ... FOR UPDATE. Locks row. Usage: High conflict.
  • Optimistic: Version column. Check ver=1 on update. If row changed (ver=2), retry. Usage: Low conflict (Web).
Answer: Postgres GIN Index. Inverted Index for JSONB keys.
Answer: Add Column (Fast in PG 11+). Remove Column (Break code first). Rename (Downtime usually). Tool: Liquibase, Flyway.
Answer: ' OR '1'='1. Fix: Prepared Statements (Parametrized Queries).
Answer: deleted_at timestamp column. Pros: Recovery. Audit. Cons: Queries needing WHERE deleted_at IS NULL, Index bloat.

5. Operations & Scaling

Answer:
  • Full: Whole DB.
  • Differential: Changes since last Full.
  • Transaction Log: Stream of ops (Point in Time Recovery).
Answer: Capture writes from Transaction Log (Debezium reading Postgres WAL) -> Stream to Kafka -> DW. Zero impact on Query performance.
Answer: Offload Read traffic. Async replication. Reporting/Analytics queries.
Answer: Cycle of locks. DB detects and kills one transaction. App must Retry. Prevention: Acquire locks in same order.
Answer: Moving BLOB columns (Image, Text) to separate table. Keeps main table rows small (more fit in RAM page).
Answer: Treating multiple DBs as one. Postgres Foreign Data Wrapper (FDW). Query Mongo from SQL.
Answer: Optimized for append-only timestamped data. Compression (Delta-Delta), Downsampling (Rollups), Retention policies.
Answer: Immutable, cryptographically verifiable log changes. Financial/Supply Chain audit.
Answer: DB enforces “User can only see their own rows”. Policy defined in SQL.
Answer: Distributed Transaction.
  1. Prepare: All nodes vote “Yes/No” (Lock resources).
  2. Commit: If all Yes, persist. Slow. Blocked if Coordinator dies.