Concurrency
Concurrency allows your program to do multiple things simultaneously. It’s essential for modern applications to utilize multi-core processors and handle multiple users.
1. Threads vs. Processes
- Process: An executing program (e.g., the JVM itself). It has its own isolated memory space.
- Thread: A lightweight unit of execution within a process. Threads share the same memory space.
Creating a Thread
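As a minimal sketch of what creating a thread looks like (the class name `ThreadBasicsDemo` and the thread name `worker-1` are made up for illustration), the example below starts one thread from a lambda and waits for it with `join()`:

```java
class ThreadBasicsDemo {
    // Runs a small task on a new thread and waits for it to finish.
    static String runOnNewThread() throws InterruptedException {
        StringBuilder log = new StringBuilder();
        Thread worker = new Thread(
                () -> log.append("ran on ").append(Thread.currentThread().getName()),
                "worker-1");
        worker.start(); // start() runs the task on the new thread; run() would execute it on the caller
        worker.join();  // wait for completion; join() also makes the worker's writes visible here
        return log.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOnNewThread());
    }
}
```

Calling `start()` rather than `run()` is the important detail: `run()` would simply execute the lambda on the calling thread.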
2. Executor Framework
Manually creating threads (new Thread()) is expensive and error-prone. If you create 10,000 platform threads, you can exhaust memory or hit the OS thread limit.
Executors manage a pool of threads for you. You just submit tasks, and the pool handles the execution.
Types of Pools
- newFixedThreadPool(n): Fixed number of threads. Good for predictable loads.
- newCachedThreadPool(): Creates threads as needed, reuses idle ones. Good for many short-lived tasks.
- newSingleThreadExecutor(): One thread. Ensures tasks run sequentially.
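A short sketch of submitting work to a fixed pool follows. The pool size of 4 and the class name `FixedPoolDemo` are arbitrary choices for the example, not recommendations:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FixedPoolDemo {
    // Submits taskCount small tasks to a 4-thread pool and returns how many completed.
    static int runTasks(int taskCount) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.submit(() -> { completed.incrementAndGet(); });
        }
        pool.shutdown(); // stop accepting new tasks; already queued tasks still run
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            pool.shutdownNow(); // give up on stragglers after the timeout
        }
        return completed.get();
    }
}
```

Always shut the pool down when you are done with it; otherwise its non-daemon worker threads can keep the JVM alive.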
3. Callable & Future
Runnable returns void. What if you want a result? Use Callable.
A Future represents the result of an asynchronous computation. It’s a placeholder for a value that will arrive later.
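To make this concrete, here is a small sketch (the `CallableFutureDemo` class and `sumTo` method are illustrative names) that submits a `Callable` and reads its result through the returned `Future`:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class CallableFutureDemo {
    // Computes 1 + 2 + ... + n on a background thread and returns the result.
    static int sumTo(int n) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> task = () -> {
                int sum = 0;
                for (int i = 1; i <= n; i++) {
                    sum += i;
                }
                return sum;
            };
            Future<Integer> future = pool.submit(task);
            // get() blocks until the value is ready; the timeout bounds the wait
            return future.get(5, TimeUnit.SECONDS);
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, `CallableFutureDemo.sumTo(100)` returns 5050.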
4. Synchronization
When multiple threads access shared data (like a counter), race conditions occur. Two threads might read “5”, increment it, and both write “6”, losing one increment.
synchronized Keyword
Ensures only one thread can execute a block at a time. It’s like a lock on a door.
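The counter scenario above can be sketched as follows. The class and method names are hypothetical; the point is that both the write and the read go through `synchronized` methods on the same object:

```java
class SynchronizedCounterDemo {
    private int count = 0;

    // Only one thread at a time can hold this object's monitor, so no increment is lost.
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    // Starts several threads that each increment the shared counter, then returns the total.
    static int countWithThreads(int threads, int incrementsPerThread) throws InterruptedException {
        SynchronizedCounterDemo counter = new SynchronizedCounterDemo();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return counter.get();
    }
}
```

With 4 threads and 10,000 increments each, the result is always 40,000. Removing `synchronized` from `increment()` would typically produce a smaller, varying number.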
Atomic Classes
For simple variables, AtomicInteger is lock-free and usually faster than taking a lock.
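A comparable sketch using `AtomicInteger` instead of a lock (again with illustrative names) looks like this:

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounterDemo {
    // Several threads increment one AtomicInteger without any explicit locking.
    static int countWithThreads(int threads, int incrementsPerThread) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return counter.get();
    }
}
```

This works well for a single variable. When several fields must change together, a lock is still the simpler tool.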
5. CompletableFuture (Java 8+)
Future.get() blocks the thread. CompletableFuture allows you to build non-blocking, reactive pipelines (similar to Promises in JavaScript).
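The sketch below chains three stages without blocking between them. The user-name and score values are placeholders standing in for real asynchronous calls:

```java
import java.util.concurrent.CompletableFuture;

class CompletableFutureDemo {
    // Builds a small pipeline: load a name, transform it, and combine it with a second async value.
    static CompletableFuture<String> greetingFor(int userId) {
        CompletableFuture<Integer> score = CompletableFuture.supplyAsync(() -> 42); // placeholder lookup
        return CompletableFuture
                .supplyAsync(() -> "user-" + userId)  // runs on the common ForkJoinPool by default
                .thenApply(String::toUpperCase)       // runs once the previous stage completes
                .thenCombine(score, (name, s) -> name + ":" + s);
    }
}
```

The caller decides when, or whether, to block: `CompletableFutureDemo.greetingFor(7).join()` returns "USER-7:42", while `thenAccept(...)` would consume the result without blocking at all.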
6. Virtual Threads (Java 21)
Game Changer: Traditional (platform) threads are mapped 1:1 to OS threads. They are heavy: each one reserves a stack of roughly 1 MB by default on most 64-bit JVMs, so in practice you can only run a few thousand. Virtual threads are lightweight threads managed by the JVM. You can create millions of them.
Summary
- Threads: Basic unit of concurrency.
- Executors: Manage thread pools.
- Synchronization: Protect shared state from race conditions.
- CompletableFuture: Compose async tasks.
- Virtual Threads: High-throughput concurrency for Java 21+.
Interview Deep-Dive
What is the Java Memory Model, and why does it matter for concurrent programming? Explain happens-before.
Strong Answer:
- The Java Memory Model (JMM, defined in JSR-133 and the JLS Chapter 17) specifies how threads interact through memory. The core problem it solves: modern CPUs have per-core caches, store buffers, and instruction reordering optimizations. Without a memory model, a write by thread A might not be visible to thread B — ever — because B reads from its CPU cache, not from main memory. The JMM defines the rules for when writes by one thread are guaranteed to be visible to reads by another thread.
- The central concept is “happens-before.” If action A happens-before action B, then A’s effects are guaranteed to be visible to B. Key happens-before relationships: unlocking a monitor happens-before subsequent locking of the same monitor (synchronized); writing a volatile variable happens-before subsequent reads of the same variable; Thread.start() happens-before any action in the started thread; all actions in a thread happen-before any other thread successfully returns from Thread.join() on that thread.
- Why this matters practically: without volatile or synchronized, the JVM and CPU are free to reorder instructions and cache values. A classic bug: boolean running = true; in a shared field, one thread sets running = false, another thread loops while (running). Without volatile, the JIT compiler may hoist the read of running out of the loop (since it is not modified within the loop from the compiler’s perspective), creating an infinite loop. I have seen this exact bug in production: a shutdown hook set a flag, but the worker thread never saw the update because the JIT optimized away the repeated read. Adding volatile fixed it.
- A senior-level nuance: volatile guarantees visibility and prevents reordering around the volatile access, but it does not provide atomicity for compound operations. volatile int count; count++ is not thread-safe because the increment is a read-modify-write sequence that can be interleaved. For compound atomicity, you need AtomicInteger, synchronized, or a Lock.
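The stop-flag scenario described above can be sketched like this. The class is hypothetical; the only essential part is the `volatile` modifier on `running`:

```java
import java.util.concurrent.TimeUnit;

class VolatileFlagDemo {
    // volatile: the write in stop() happens-before every later read in the worker loop
    private volatile boolean running = true;
    private long iterations = 0; // touched only by the worker thread

    void stop() {
        running = false;
    }

    // Returns true if the worker observed the flag change and exited.
    static boolean workerStopsAfterFlagCleared() throws InterruptedException {
        VolatileFlagDemo demo = new VolatileFlagDemo();
        Thread worker = new Thread(() -> {
            while (demo.running) { // re-read on every iteration because the field is volatile
                demo.iterations++;
            }
        });
        worker.start();
        TimeUnit.MILLISECONDS.sleep(50);
        demo.stop();
        worker.join(5_000); // bounded wait so a visibility bug cannot hang the caller
        return !worker.isAlive();
    }
}
```

Without `volatile`, the loop is allowed to spin forever once the JIT has compiled it, although whether it actually does depends on the JVM and the hardware.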
Follow-up: false sharing
- False sharing occurs when two threads modify independent variables that happen to reside on the same CPU cache line (typically 64 bytes on x86). Even though the variables are logically independent, the CPU’s cache coherency protocol (MESI or similar) forces the entire cache line to be invalidated and reloaded every time either variable is modified. Both threads repeatedly invalidate each other’s cached copy, causing continuous cache-line bouncing between cores.
- In Java, this commonly happens with adjacent fields in an object or adjacent elements in an array that are written by different threads. For example, take two AtomicLong counters in the same object: if they end up on the same cache line, incrementing one counter from thread A invalidates the cache line for thread B reading the other counter. In heavily contended code this can cost an order of magnitude or more in throughput.
- The fix is cache line padding: ensure the hot variables are separated by at least 64 bytes. Java 8 introduced @Contended (internal annotation sun.misc.Contended, requires -XX:-RestrictContended to use outside the JDK). This annotation tells the JVM to pad the field to its own cache line. The JDK itself uses @Contended on Thread.threadLocalRandomSeed and on ForkJoinPool worker queues. In Java 9+ modules, it is jdk.internal.vm.annotation.Contended.
- The Disruptor library (LMAX) is the canonical example of false sharing awareness in Java. Their ring buffer is designed to avoid false sharing between the producer’s write position and the consumer’s read position, which is a key reason it achieves millions of operations per second with sub-microsecond latency.
Compare synchronized, ReentrantLock, and StampedLock. When would you use each?
Strong Answer:
- synchronized is the built-in monitor lock. It is simple, scoped (automatically releases on block exit, even with exceptions), and the JVM applies biased locking (Java 15 deprecated biased locking, removed in later versions), thin locking, and lock inflation optimizations. Use it for simple mutual exclusion where you do not need advanced features. It has been aggressively optimized over 25 years and is the right default choice for most cases.
- ReentrantLock (from java.util.concurrent.locks) provides everything synchronized does plus: tryLock() (non-blocking attempt to acquire), tryLock(timeout) (timed attempt), lockInterruptibly() (can be interrupted while waiting), fairness policy (optional FIFO ordering of waiting threads), and the ability to bind multiple Condition objects (equivalent to multiple wait-sets per lock). Use it when you need any of these features. The most common reason is tryLock for avoiding deadlocks in lock-ordering protocols.
- StampedLock (Java 8) is a more advanced lock optimized for read-heavy workloads. It supports three modes: write lock (exclusive), read lock (shared, like ReadWriteLock), and optimistic read (lock-free). The optimistic read returns a stamp, you read your data, then validate the stamp. If validation succeeds, no locking occurred at all. If it fails (a writer intervened), you fall back to a pessimistic read lock. This is transformative for data structures with 95%+ read access patterns.
- The decision matrix: 90% of the time, use synchronized, because it is simpler and the JVM optimizes it heavily. Use ReentrantLock when you need tryLock, timed lock, interruptible lock, or multiple conditions. Use StampedLock for read-dominated data structures where you have benchmarked that the optimistic read path provides measurable improvement. Never use StampedLock as a general-purpose lock: it is not reentrant, which means calling a method that acquires the same lock will deadlock.
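The optimistic-read pattern is easier to follow in code. This sketch is modelled on the usual point example for `StampedLock`; the `StampedPoint` class itself is illustrative:

```java
import java.util.concurrent.locks.StampedLock;

class StampedPoint {
    private final StampedLock lock = new StampedLock();
    private double x;
    private double y;

    void move(double dx, double dy) {
        long stamp = lock.writeLock(); // exclusive mode
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead(); // no lock is taken; just a version stamp
        double curX = x;
        double curY = y;
        if (!lock.validate(stamp)) {           // a writer intervened, so fall back to a real read lock
            stamp = lock.readLock();
            try {
                curX = x;
                curY = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.hypot(curX, curY);
    }
}
```

Note that the fields are copied into locals before `validate` is called. Values read optimistically must not be used until validation succeeds.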
Follow-up: detecting and preventing deadlocks
- Detection: jstack <pid> dumps all thread stacks and automatically detects deadlocks (it prints “Found one Java-level deadlock” with the threads and locks involved). In production, JMX exposes ThreadMXBean.findDeadlockedThreads() which can be called programmatically or via monitoring tools. Java Flight Recorder (JFR) captures lock contention events that help identify near-deadlock conditions.
- Prevention follows a simple discipline: always acquire locks in a consistent global order. If thread A needs locks L1 and L2, and thread B also needs L1 and L2, both must acquire L1 first, then L2. Deadlock occurs when A holds L1 and waits for L2 while B holds L2 and waits for L1. Enforcing a total ordering eliminates this. In practice, the ordering is often by object hash or a sequence number assigned at creation time.
- tryLock with a timeout is the pragmatic defense. Instead of blocking forever, the thread tries to acquire the lock for a bounded time and backs off on failure. This does not prevent deadlocks but converts them from hangs (infinite wait) to retriable failures (tryLock returns false and the caller can release what it holds and retry). This is the pattern used in database systems (lock timeout) and is appropriate when strict lock ordering is impractical.
- In my experience, the most common production deadlock in Java is not between explicit locks but between synchronized methods in two classes that call each other. Class A’s synchronized method M1 calls class B’s synchronized method M2, and vice versa. Code review rarely catches this because the cycle spans multiple files. The fix is reducing the scope of synchronization (lock only the critical section, not the entire method) or switching to lock-free data structures.
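The lock-ordering discipline can be sketched with the classic transfer example. The `Account` type and its `id` field are hypothetical; the `id` plays the role of the sequence number assigned at creation time:

```java
import java.util.concurrent.locks.ReentrantLock;

class OrderedLockTransferDemo {
    static final class Account {
        final long id; // hypothetical unique sequence number that defines the global lock order
        final ReentrantLock lock = new ReentrantLock();
        long balance;

        Account(long id, long balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    // Always locks the account with the smaller id first, whichever direction the money moves.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = first == from ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Two threads transfer in opposite directions; returns the combined balance afterwards.
    static long runOpposingTransfers() throws InterruptedException {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) transfer(b, a, 1); });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return a.balance + b.balance;
    }
}
```

If `transfer` locked `from` and then `to` directly, the two threads in `runOpposingTransfers` could each hold one lock and wait forever for the other.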
Explain virtual threads in Java 21. How do they differ from platform threads, and what workloads benefit most?
Strong Answer:
- Platform threads (traditional Java threads) are thin wrappers around OS threads. Each platform thread has a 1:1 mapping to a kernel thread, consumes ~1MB of stack memory, and context-switching between them involves a kernel-mode transition. You can create a few thousand before the OS starts rejecting new threads or the system becomes unstable from context-switch overhead.
- Virtual threads are user-mode threads managed entirely by the JVM. They are scheduled onto a small pool of carrier threads (platform threads managed by a ForkJoinPool). When a virtual thread blocks on a supported operation (socket read, Thread.sleep, Lock.lock), the JVM unmounts it from the carrier thread and mounts a different virtual thread, so the carrier stays free to do other work. File I/O is a notable exception: it generally still blocks the carrier, and the JDK may compensate by temporarily adding carrier threads. This is conceptually similar to goroutines in Go or fibers in other runtimes.
- Virtual threads have very small stacks (they start at a few hundred bytes and grow dynamically, stored on the heap, not in native memory). You can create millions of them. The startup cost is negligible: creating a virtual thread is roughly the cost of creating a small object, compared to the microseconds required to create a platform thread (which involves a kernel syscall).
- The workloads that benefit most are I/O-bound applications with high concurrency: web servers handling thousands of simultaneous requests that each make database calls, HTTP calls to downstream services, or file reads. Before virtual threads, these applications needed reactive frameworks (WebFlux, Vert.x) with callback-based or Mono/Flux-style APIs to avoid blocking threads. Virtual threads let you write simple, sequential, blocking code (resultSet.next(), httpClient.send()) that scales like reactive code because the JVM handles the thread-switching transparently.
- The workloads that do NOT benefit: CPU-bound computation (virtual threads still run on the same number of CPU cores, so they add concurrency, not parallelism), applications that already use async I/O efficiently, and code that holds native locks or calls JNI methods while blocking (these “pin” the virtual thread to its carrier, negating the benefit).
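As a sketch of the one-virtual-thread-per-task style described above (this requires Java 21; the 100 ms sleep is a stand-in for a blocking database or HTTP call):

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class VirtualThreadDemo {
    // Runs taskCount blocking tasks, each on its own virtual thread, and returns how many finished.
    static int runBlockingTasks(int taskCount) throws Exception {
        List<Future<Integer>> results = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                int id = i;
                results.add(executor.submit(() -> {
                    Thread.sleep(Duration.ofMillis(100)); // blocking call; the virtual thread is unmounted meanwhile
                    return id;
                }));
            }
        } // close() waits for all submitted tasks to finish
        int completed = 0;
        for (Future<Integer> result : results) {
            result.get();
            completed++;
        }
        return completed;
    }
}
```

Running a thousand such tasks takes roughly as long as one sleep rather than a thousand, because the sleeping virtual threads do not occupy carrier threads.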
Follow-up: pinning
- Pinning occurs when a virtual thread cannot be unmounted from its carrier thread during a blocking operation. When this happens, the carrier thread is blocked along with the virtual thread, which defeats the entire purpose of virtual threads (the carrier pool has fewer available threads, reducing throughput).
- The primary cause of pinning in Java 21 is synchronized blocks and methods. When a virtual thread enters a synchronized block and then blocks (e.g., calls Thread.sleep or performs I/O inside the synchronized block), the JVM cannot unmount it because the monitor lock is associated with the specific OS thread. Unmounting would mean the lock is “held” by a thread that is no longer running, which breaks the monitor semantics. (JDK 24 lifts this limitation for synchronized via JEP 491.)
- The fix is straightforward: replace synchronized with ReentrantLock in code that runs on virtual threads. ReentrantLock is implemented in Java (not via native monitor instructions), so the JVM can unmount a virtual thread that blocks while holding a ReentrantLock. This is the single most important migration step when adopting virtual threads in an existing Java 21 codebase.
- The JVM provides a diagnostic flag -Djdk.tracePinnedThreads=full (or short) that prints a stack trace whenever a virtual thread is pinned. In a migration, you run the application with this flag under load and fix the pinning hotspots. In practice, most pinning comes from third-party libraries (JDBC drivers, connection pools, logging frameworks) that use synchronized internally. Library authors are actively migrating to ReentrantLock. For example, HikariCP (the most popular connection pool) has been updated for virtual thread compatibility.
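The migration step above amounts to a mechanical rewrite of each critical section. In this sketch the `BlockingResource` class is hypothetical and the short sleep stands in for blocking I/O performed while the lock is held:

```java
import java.util.concurrent.locks.ReentrantLock;

class BlockingResource {
    private final ReentrantLock lock = new ReentrantLock();
    private int calls = 0;

    // Before the migration this was `synchronized int call()`, which pins a
    // virtual thread to its carrier on Java 21 for the duration of the sleep.
    int call() throws InterruptedException {
        lock.lock();
        try {
            Thread.sleep(1); // stand-in for blocking I/O inside the critical section
            return ++calls;
        } finally {
            lock.unlock(); // always release in finally; there is no automatic scoping as with synchronized
        }
    }
}
```

The code itself does not depend on virtual threads, so the rewrite can be made and tested on platform threads before switching executors.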
A production service is experiencing intermittent thread starvation under load. How do you diagnose and fix it?
Strong Answer:
- Thread starvation means tasks are queued but not making progress because all threads in the pool are occupied. The symptoms are: increasing response latency, growing task queue depth, eventually timeouts and rejections. The first diagnostic step is capturing a thread dump (jstack <pid>, kill -3 <pid> on Linux, or JFR/JMX). I look at what the threads are doing: if all threads in the pool are in BLOCKED or WAITING state, something is holding them up.
- The most common cause I have seen: a thread pool executing tasks that themselves make blocking calls to a slow downstream service. If you have a pool of 10 threads and each task makes an HTTP call to a service that occasionally takes 30 seconds to respond (due to GC pauses, network issues, or the downstream service being overloaded), it takes only 10 slow requests to exhaust the entire pool. All subsequent tasks queue up behind the slow ones. The fix is: (1) add timeouts to all downstream calls (connect timeout + read timeout), (2) use a circuit breaker (Resilience4j, Hystrix) to fail fast when the downstream is unhealthy, and (3) consider separate thread pools for different downstream services (bulkhead pattern) so one slow service cannot starve the others.
- Another common cause: lock contention. If every task acquires a shared lock and one task holds it for a long time (e.g., a database transaction under load), all other threads block waiting for the lock. The thread dump will show many threads in BLOCKED state on the same monitor. The fix depends on the situation: reduce the scope of the lock, switch to a read-write lock if most operations are reads, use lock-free data structures, or remove the shared state entirely.
- A less obvious cause: the common ForkJoinPool being saturated by parallel streams or CompletableFuture tasks. By default, CompletableFuture.supplyAsync() runs on the common ForkJoinPool, which has Runtime.getRuntime().availableProcessors() - 1 threads. If many tasks use this pool and some block, the entire application’s parallel processing stalls. The fix is providing a dedicated Executor to supplyAsync(task, myExecutor) and never relying on the common pool for blocking work.
- My diagnostic sequence: (1) thread dump to identify what threads are doing, (2) check pool metrics (queue depth, active count, task rejection count), (3) correlate with downstream latency metrics, (4) check for lock contention in the thread dump, (5) verify timeouts are configured on all I/O operations.
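A minimal sketch of the dedicated-executor fix follows. The pool size of 8, the thread name `io-pool-worker`, and the 2-second timeout are illustrative assumptions, and the task simply reports which thread ran it:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class DedicatedExecutorDemo {
    // Runs a (simulated) blocking call on its own pool instead of the common ForkJoinPool.
    static String fetchOnDedicatedPool() throws Exception {
        ExecutorService ioPool = Executors.newFixedThreadPool(8, runnable -> {
            Thread thread = new Thread(runnable, "io-pool-worker");
            thread.setDaemon(true);
            return thread;
        });
        try {
            return CompletableFuture
                    .supplyAsync(() -> Thread.currentThread().getName(), ioPool) // second argument selects the pool
                    .orTimeout(2, TimeUnit.SECONDS) // fail fast rather than wait indefinitely (Java 9+)
                    .get();
        } finally {
            ioPool.shutdown();
        }
    }
}
```

In a real service the pool would be created once and shared, one per downstream dependency if you follow the bulkhead pattern, rather than created per call as in this sketch.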
Follow-up: sizing a thread pool
- The classic formula from Brian Goetz’s “Java Concurrency in Practice” is: threads = N_cpu * U_cpu * (1 + W/C), where N_cpu is the number of available processors, U_cpu is the target CPU utilization (between 0 and 1, usually 0.5 to 1.0), W is the average time a task spends waiting (I/O, locks), and C is the average time a task spends computing. For CPU-bound tasks (W/C near 0), you want roughly N_cpu threads. For I/O-bound tasks (W/C is large), you want many more threads.
- In practice, I do not trust the formula blindly. I start with the formula as a baseline, then load test the application with a realistic traffic pattern and observe: CPU utilization (should be high but not saturated; 70-80% is healthy), response latency percentiles (p50, p95, p99), thread pool queue depth (should be near zero in steady state), and GC pause behavior (too many threads create too many objects, increasing GC pressure). I adjust the pool size iteratively based on these observations.
- A common mistake: setting the pool size based on peak load. If you configure 200 threads for a traffic spike that happens once a day, you waste memory and OS resources for the other 23 hours. Use a dynamic pool (ThreadPoolExecutor with core and max pool sizes) that scales up under load and shrinks during quiet periods. Set the core size for average load and the max size for peak load, with a reasonable keep-alive time (60 seconds is what Executors.newCachedThreadPool() uses). Note that ThreadPoolExecutor only adds threads beyond the core size once its work queue is full, so this setup needs a bounded queue.
- With virtual threads (Java 21), the pool sizing question largely disappears for I/O-bound work. You create one virtual thread per task and let the JVM manage carrier thread scheduling. The JVM’s ForkJoinPool for carriers is sized to the number of processors, which is optimal for CPU utilization. This is one of the most compelling reasons to migrate to virtual threads.
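The formula and the core/max configuration can be sketched together. The queue capacity of 100 and the CallerRunsPolicy rejection handler are assumptions chosen for illustration, not values from the text:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolSizingDemo {
    // threads = N_cpu * U_cpu * (1 + W/C), rounded up
    static int poolSize(int cpus, double targetUtilization, double waitMillis, double computeMillis) {
        return (int) Math.ceil(cpus * targetUtilization * (1 + waitMillis / computeMillis));
    }

    // A pool that grows from coreSize to maxSize once the bounded queue fills up,
    // and retires idle extra threads after 60 seconds.
    static ThreadPoolExecutor newElasticPool(int coreSize, int maxSize) {
        return new ThreadPoolExecutor(
                coreSize, maxSize,
                60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(100),               // bounded queue; capacity is an assumption
                new ThreadPoolExecutor.CallerRunsPolicy());  // apply back-pressure instead of dropping tasks
    }
}
```

For example, with 8 cores, a target utilization of 0.5, and tasks that wait 90 ms for every 10 ms of computation, `poolSize(8, 0.5, 90, 10)` gives 8 * 0.5 * (1 + 9) = 40 threads, which is a starting point for load testing rather than a final answer.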