C++ Crash Course
C++ is the language of performance. From operating systems and game engines to high-frequency trading and AI infrastructure, C++ powers the world’s most critical systems.
This crash course is designed to take you from the basics to advanced modern C++ (C++17/20) concepts quickly but deeply. We focus on modern best practices, memory safety, and performance.
Why C++?
Despite being over 40 years old, C++ remains dominant. Why?
- Unmatched Performance
- Control
- Portability
- Modern Features
Course Roadmap
This course is structured to build your mental model of how C++ works under the hood.
- Fundamentals
- Memory Mastery
- Object-Oriented Design
- The STL
- Modern C++
Prerequisites
- Basic understanding of programming concepts (variables, loops, functions).
- A C++ compiler (GCC, Clang, or MSVC).
- A code editor (VS Code, CLion, or Visual Studio).
Who Uses C++ in Production?
C++ is not an academic curiosity; it powers the systems you rely on every day:
- Game Engines: Unreal Engine, Unity’s native layer, and nearly every AAA game studio writes performance-critical code in C++.
- Browsers: Chrome (Blink engine), Firefox (parts of Gecko), and Safari (WebKit) are all C++.
- Databases: MySQL, PostgreSQL, MongoDB, and Redis have C or C++ at their core.
- Operating Systems: Windows, macOS (including its IOKit driver framework), and Android’s native layer all contain substantial C++. The Linux kernel itself is written in C.
- Finance: High-frequency trading systems where microseconds translate to millions of dollars.
- AI/ML Infrastructure: TensorFlow, PyTorch, and ONNX Runtime use C++ under the hood for GPU kernel execution and memory management. Python is the interface; C++ is the engine.
When to Choose C++ — A Decision Framework
Not every project needs C++. Use this framework to decide:

| Question | If Yes | If No |
|---|---|---|
| Do you need microsecond-level latency? | C++ is the right tool. | Consider Go, Rust, or Java. |
| Do you need direct hardware/OS access? | C++ or C. | Higher-level languages are fine. |
| Is the project a library consumed by multiple languages? | C++ with C bindings is the standard approach. | Use the target language natively. |
| Is memory footprint critical (embedded, mobile)? | C++ or C give you precise control. | Managed languages may be acceptable. |
| Is developer velocity the top priority over raw performance? | C++ has slower iteration cycles. Consider Python, Go, or TypeScript. | C++ is viable. |
| Does the project need fearless concurrency guarantees? | Consider Rust. C++ gives you the tools but not the safety net. | C++ is fine. |
The “Zero-Overhead” Principle
One of the guiding principles of C++ is the Zero-Overhead Principle, articulated by Bjarne Stroustrup (the creator of C++):
- What you don’t use, you don’t pay for. (No hidden costs). If you never use exceptions, they add virtually no runtime overhead to your code. If you never use virtual functions, there is no vtable lookup cost.
- What you do use, you couldn’t hand-code any better. (Abstractions are efficient). A `std::vector` is as fast as a manually managed dynamic array. A `std::sort` is as fast as a hand-written sort and often faster than C’s `qsort`, because the compiler can inline the comparator instead of calling it through a function pointer.
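The comparator point can be made concrete with a minimal sketch. The function names `compare_ints`, `sort_with_qsort`, and `sort_with_std_sort` are hypothetical and chosen only for illustration. Both functions produce the same result; the difference is that `std::qsort` must call its comparator through a function pointer, while the lambda passed to `std::sort` has a concrete type the compiler can inline.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// C-style comparator: std::qsort invokes this through a function pointer,
// so the compiler generally cannot inline it into the sorting loop.
int compare_ints(const void* a, const void* b) {
    const int lhs = *static_cast<const int*>(a);
    const int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);
}

// Sorts a copy of the input using the C library routine.
std::vector<int> sort_with_qsort(std::vector<int> values) {
    if (!values.empty()) {
        std::qsort(values.data(), values.size(), sizeof(int), compare_ints);
    }
    return values;
}

// Sorts a copy of the input using std::sort. The lambda's type is part of
// the template instantiation, so the comparison can be inlined.
std::vector<int> sort_with_std_sort(std::vector<int> values) {
    std::sort(values.begin(), values.end(),
              [](int a, int b) { return a < b; });
    return values;
}
```

To see the difference rather than take it on faith, compile both with optimizations enabled and compare the generated assembly or time them on a large input; the exact gap depends on the compiler and the data.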
Thought Exercises
Before you dive into the chapters, consider these questions. You do not need to answer them now; revisit them after completing the course to measure your growth.
- The trade-off question: A startup asks you to build a real-time video processing pipeline. They want it done in 3 months. Their team knows Python well but has no C++ experience. Would you recommend C++? What factors would change your answer?
- The migration question: You maintain a Python microservice that processes 100 requests/second. Product says you need to handle 50,000 requests/second within 6 months. Under what conditions would rewriting in C++ make sense vs. scaling horizontally vs. optimizing the Python code?
- The zero-overhead test: Pick any C++ abstraction you have heard of (e.g., `std::vector`, virtual functions, smart pointers). Can you articulate what the “zero-overhead” cost is, meaning what you would pay if you hand-coded the equivalent in C?
Interview Deep-Dive
Explain the zero-overhead principle in C++. How does it hold up in practice, and where does it break down?
- The zero-overhead principle has two parts, both articulated by Stroustrup: (1) you do not pay for features you do not use, and (2) features you do use compile down to code that is as efficient as what you could write by hand. For example, a `std::vector` compiles to the same pointer-plus-size-plus-capacity structure you would manage manually in C, and `std::sort` with an inlined comparator often beats C’s `qsort`, which has to call its comparator through a function pointer.
- In practice, this holds remarkably well for features like templates, RAII wrappers, and move semantics. A `std::unique_ptr` with a default deleter is the same size as a raw pointer and, in optimized builds, typically generates the same machine code as a raw pointer with manual `delete`. You can verify this by examining the assembly output with `-O2`.
- Where it gets nuanced: exceptions have near-zero cost on the “happy path” (no exception thrown) under the table-based model used by GCC and Clang on x86-64, but they inflate binary size with unwind tables, and the “sad path” (exception actually thrown) is extremely expensive: a throw plus stack unwinding costs orders of magnitude more than a normal return. RTTI (`dynamic_cast`, `typeid`) emits type metadata for every polymorphic class, which you pay for in binary size even if you never use `dynamic_cast` on that specific class. Virtual functions add a per-object vtable pointer and an indirect call that usually prevents inlining; in hot loops processing millions of elements, this indirection can cause measurable cache misses.
- The real-world judgment call: at a trading firm processing millions of ticks per second, teams often disable exceptions and RTTI entirely (`-fno-exceptions -fno-rtti`) and avoid virtual dispatch in hot paths, precisely because these features strain the zero-overhead principle in latency-critical contexts.
- To avoid virtual dispatch in a hot path, the classic technique is CRTP (Curiously Recurring Template Pattern), which achieves static polymorphism. The base class is templated on the derived class: `template<typename Derived> class Base`. Method calls resolve at compile time via `static_cast<Derived*>(this)->method()`, so the compiler can inline everything: no indirection and no vtable-related cache miss.
- Another approach is `std::variant` with `std::visit`. If you have a closed set of types (you know all possible types at compile time), a variant-based dispatch can be faster than virtual because the compiler can generate a jump table or even inline all cases. Speedups of 20-40% over virtual dispatch in tight loops have been reported because it avoids the pointer chase through the vtable, though results vary by compiler and workload.
- A third option is function pointers stored directly in the object, avoiding the vtable indirection layer. This is common in C-style polymorphism (Linux kernel style) and in game engines where ECS (Entity Component System) architectures replace inheritance hierarchies entirely.
- The trade-off is flexibility: virtual dispatch supports open extension (anyone can add a derived class without modifying existing code), while CRTP and variant require knowing all types at compile time. In systems where the type set is fixed and performance is paramount, static polymorphism wins. In plugin architectures or frameworks, virtual dispatch is the pragmatic choice.
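The two static alternatives described above, CRTP and `std::variant` with `std::visit`, can be sketched as follows. This is a minimal illustration that assumes C++17; the names `Shape`, `Square`, `Rect`, `AnyShape`, `doubled_area`, and `total_area` are hypothetical and not taken from any library.

```cpp
#include <variant>
#include <vector>

// CRTP base: the derived type is a template parameter, so area() resolves
// to Derived::area_impl() at compile time. No vtable is involved.
template <typename Derived>
class Shape {
public:
    double area() const {
        return static_cast<const Derived*>(this)->area_impl();
    }
};

class Square : public Shape<Square> {
public:
    explicit Square(double side) : side_(side) {}
    double area_impl() const { return side_ * side_; }
private:
    double side_;
};

class Rect : public Shape<Rect> {
public:
    Rect(double width, double height) : width_(width), height_(height) {}
    double area_impl() const { return width_ * height_; }
private:
    double width_;
    double height_;
};

// Generic code over any CRTP shape; instantiated once per concrete type.
template <typename Derived>
double doubled_area(const Shape<Derived>& shape) {
    return 2.0 * shape.area();
}

// Closed set of types: the variant stores the object inline and
// std::visit selects the matching overload of the generic lambda.
using AnyShape = std::variant<Square, Rect>;

double total_area(const std::vector<AnyShape>& shapes) {
    double total = 0.0;
    for (const AnyShape& s : shapes) {
        total += std::visit([](const auto& shape) { return shape.area(); }, s);
    }
    return total;
}
```

Note that adding a third shape means editing the `AnyShape` alias and recompiling everything that uses it, which is the flexibility cost the last bullet describes.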
When would you choose C++ over Rust for a new project, and when would you choose Rust over C++?
- I would choose C++ when the project needs to integrate deeply with an existing C/C++ ecosystem. For example, if you are writing a plugin for Unreal Engine, a module for a database like PostgreSQL, or extending a trading system already built in C++, the interop cost of using Rust (FFI boundaries, data layout mismatches, build system integration) outweighs the safety benefits. The tooling ecosystem also matters: C++ has mature profilers (VTune, perf), sanitizers (ASan, TSan, MSan), and IDE support that Rust is still catching up on for certain domains.
- I would choose Rust for greenfield systems where memory safety is a hard requirement, particularly anything network-facing or security-sensitive. Safe Rust rules out entire classes of CVEs: use-after-free and data races are rejected by the borrow checker at compile time, and out-of-bounds accesses are caught by bounds checks instead of corrupting memory. Microsoft has reported that roughly 70% of the security vulnerabilities it fixes are memory safety issues, which is exactly the category safe Rust is designed to remove.
- The hiring angle is real and often underappreciated. The pool of experienced C++ developers is far larger than the pool of experienced Rust developers. For a startup that needs to staff a team of 10 in 3 months, C++ can be the pragmatic choice even if Rust is technically superior for the problem.
- A nuanced point most candidates miss: Rust’s safety guarantees come with a learning curve cost that is front-loaded. Teams report 3-6 months of reduced productivity as developers learn to “think in ownership.” For projects with aggressive timelines, this matters.
- On interop: C and C++ share the same ABI on most platforms when using `extern "C"`, so calling C from C++ (and vice versa) is essentially zero-cost. Rust’s FFI with C is also clean via `extern "C"` blocks, but C++ interop is fundamentally harder because C++ has no stable ABI. Features like name mangling, templates, exceptions, virtual dispatch, and STL types (`std::string`, `std::vector`) have no standardized binary layout. You cannot safely pass a `std::vector` across an FFI boundary.
- Tools like `cxx` (the Rust crate) provide a bridge, but they impose constraints: you must declare the shared interface explicitly, and only a subset of C++ types is supported. Complex C++ idioms like template-heavy APIs, RAII types with custom deleters, or overloaded operators require manual wrapper code.
- In practice, teams that need deep Rust-C++ interop often create a “C shim layer”, a set of C-compatible functions that wrap the C++ API, and call those from Rust. This adds maintenance burden and gives up some of Rust’s safety guarantees at the boundary.
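A C shim layer of the kind described above might look like the following sketch. The `PriceBuffer` class and the `price_buffer_*` functions are hypothetical, invented only for this example. The pattern is an opaque handle plus plain functions with C linkage, with every exception caught before it can cross the boundary.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical C++ class we want to expose to non-C++ callers.
class PriceBuffer {
public:
    void push(double price) { prices_.push_back(price); }
    std::size_t size() const { return prices_.size(); }
    double sum() const {
        double total = 0.0;
        for (double p : prices_) total += p;
        return total;
    }
private:
    std::vector<double> prices_;
};

// Callers only ever see a pointer to this struct, never its contents.
struct price_buffer {
    PriceBuffer impl;
};

extern "C" {

// Returns nullptr on allocation failure instead of throwing.
price_buffer* price_buffer_create() noexcept {
    return new (std::nothrow) price_buffer{};
}

// Returns 0 on success, -1 on failure. Exceptions must not escape
// a function with C linkage, so they are translated to a status code.
int price_buffer_push(price_buffer* handle, double price) noexcept {
    if (handle == nullptr) return -1;
    try {
        handle->impl.push(price);
        return 0;
    } catch (...) {
        return -1;
    }
}

std::size_t price_buffer_size(const price_buffer* handle) noexcept {
    return handle ? handle->impl.size() : 0;
}

double price_buffer_sum(const price_buffer* handle) noexcept {
    return handle ? handle->impl.sum() : 0.0;
}

// Safe to call with nullptr, like free().
void price_buffer_destroy(price_buffer* handle) noexcept {
    delete handle;
}

}  // extern "C"
```

The caller on the other side of the boundary owns the handle and must call `price_buffer_destroy` exactly once. That obligation is not checked by any compiler, which is the loss of safety at the boundary that the text mentions.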
A startup asks you to rewrite their Python microservice in C++ because it cannot handle the load. How do you evaluate this decision?
- My first question is: have you profiled the Python service to identify the actual bottleneck? In my experience, most Python performance problems are not “Python is slow” but rather “we are doing N+1 database queries” or “we are serializing 50MB JSON payloads synchronously.” Fixing algorithmic issues in Python is 10x cheaper than a rewrite.
- Second, I would quantify the gap. If the service handles 100 req/s and needs 500 req/s, horizontal scaling (more instances behind a load balancer) or moving hot paths to a C extension (via pybind11 or Cython) is almost certainly the right call. If the service handles 100 req/s and needs 50,000 req/s with sub-millisecond p99 latency, and the bottleneck is CPU-bound computation (not I/O), then a rewrite in a systems language is justified.
- Third, I would evaluate the team. A rewrite in C++ by a team that does not have C++ expertise will produce a codebase with memory leaks, undefined behavior, and security vulnerabilities that are worse than the original performance problem. If the team knows Go or Java but not C++, those languages can often hit the performance target with much less risk.
- The hidden cost most people miss: a rewrite means you stop shipping features for months. Your competitors do not stop. The business cost of a rewrite is not just engineering time — it is opportunity cost. Joel Spolsky’s “Things You Should Never Do” essay is still relevant here.
- If the decision is to proceed: rewrite the hot path only, not the entire service. Keep Python for the request handling, routing, and business logic. Extract the CPU-intensive computation into a C++ shared library called via FFI, or into a separate microservice with a gRPC interface.
- For CPU profiling: `py-spy` is the go-to tool because it samples the Python process from the outside, without code changes and with very low overhead. It can produce a flame graph showing exactly which functions consume wall-clock time. For more granular analysis, `cProfile` (built-in) gives function-level call counts and cumulative time, though it adds noticeable overhead (roughly 30% or more, depending on how call-heavy the code is).
- For I/O and latency: instrument the service with distributed tracing (Jaeger, Datadog APM). This reveals whether the bottleneck is the Python code itself or downstream calls (database, external APIs, network). I have seen cases where 95% of request latency was a slow database query, and the “Python is slow” narrative was completely wrong.
- For memory: `tracemalloc` (built-in since Python 3.4) shows memory allocation by source line. `objgraph` helps track down the references, including cycles, that keep objects alive.
- For async I/O bottlenecks: if the service uses asyncio, `aiomonitor` lets you inspect running tasks, and asyncio’s debug mode logs any callback that runs longer than `loop.slow_callback_duration`, which reveals coroutines that block the event loop.
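If profiling does point at a CPU-bound hot path, the "rewrite the hot path only" option from the answer above can be as small as one exported function. The sketch below uses a hypothetical kernel named `sum_of_squares`. It takes only a pointer and a length, so a foreign-function interface such as Python's `ctypes` can call it once the file is built as a shared library.

```cpp
#include <cstddef>

// Hypothetical hot-path kernel exported with C linkage so that it has an
// unmangled symbol name and a plain C calling convention.
// Only plain data crosses the boundary: a pointer and an element count.
extern "C" double sum_of_squares(const double* values,
                                 std::size_t count) noexcept {
    if (values == nullptr) return 0.0;
    double total = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i] * values[i];
    }
    return total;
}
```

Keeping the interface this narrow means the Python side keeps request handling and business logic, and the C++ side can be tested and benchmarked in isolation.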
What does 'zero-cost abstraction' mean for templates specifically? Walk me through how a template function compiles differently from a virtual function call.
- When you write a template function like `template<typename T> T max(T a, T b)`, the compiler does not generate a single generic function. It generates a separate, fully specialized version for every type you actually use it with. If you call `max(3, 5)` and `max(3.14, 2.71)`, the compiler emits two functions: one operating on `int` registers and one on `double` registers. Each is optimized as aggressively as if you had written it by hand for that specific type. The comparisons are inlined, the types are known, and the optimizer has full visibility. This is “zero cost” because the abstraction (generic code) compiles away entirely.
- A virtual function call, by contrast, goes through an indirection. The object has a hidden vptr (virtual table pointer) that points to a vtable, an array of function pointers with one entry per virtual method. Calling `obj->doSomething()` at runtime means: load the vptr from the object, index into the vtable, load the function pointer, then call through it. This is 2-3 pointer dereferences, and critically, the compiler usually cannot inline the function because it does not know at compile time which concrete function will be called.
- The performance difference is measurable in hot paths. In a tight loop calling a templated comparator, the comparator is inlined into the loop body, so the function call overhead is literally zero. In the same loop using a virtual comparator, each iteration pays for the vtable lookup, and the inability to inline prevents further optimizations like vectorization and loop unrolling.
- The trade-off: templates cause “code bloat” because each instantiation is a separate copy of the function in the binary. If you instantiate `std::sort` for 50 different comparator types, you can get 50 copies of the sort algorithm in your binary. Virtual dispatch has one copy of the function but pays the indirection cost at runtime. For most applications the binary size increase is negligible, but in embedded systems with tight flash constraints, it matters.
- Templates require all types to be known at compile time. If you are building a plugin system where third-party code can register new types at runtime (think: a game engine where modders add new entity types via DLLs), templates cannot help you because the types do not exist when the host application is compiled. Virtual dispatch is the mechanism for open-ended runtime polymorphism.
- Templates also increase compile times significantly because the compiler must re-instantiate and re-optimize template code for every unique type combination. Large template-heavy codebases (like those using Boost or Eigen heavily) can have compile times measured in tens of minutes. Virtual dispatch compiles once.
- The practical engineering answer: use templates for performance-critical, closed-type-set code paths (algorithms, containers, math libraries). Use virtual dispatch for extensible interfaces, plugin architectures, and code where the set of types changes over time. Most real systems use both.
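The contrast between template instantiation and virtual dispatch can be shown side by side in a minimal sketch. All names here (`max_of`, `Comparator`, `Ascending`, `largest_static`, `largest_virtual`) are hypothetical. Both `largest_*` functions compute the same result and assume a non-empty input; what differs is how the comparison call is resolved.

```cpp
#include <vector>

// Template: the compiler emits one specialized copy per type used,
// e.g. max_of<int> and max_of<double>.
template <typename T>
T max_of(T a, T b) {
    return (a < b) ? b : a;
}

// Runtime-polymorphic interface: calls go through the vtable.
struct Comparator {
    virtual ~Comparator() = default;
    virtual bool less(int a, int b) const = 0;
};

struct Ascending final : Comparator {
    bool less(int a, int b) const override { return a < b; }
};

// Static dispatch: Less is a concrete type known at compile time,
// so the call to less(best, v) can be inlined into the loop.
// Precondition: values is not empty.
template <typename Less>
int largest_static(const std::vector<int>& values, Less less) {
    int best = values.front();
    for (int v : values) {
        if (less(best, v)) best = v;
    }
    return best;
}

// Dynamic dispatch: one compiled copy, but each cmp.less() call is an
// indirect call unless the optimizer can prove the concrete type.
// Precondition: values is not empty.
int largest_virtual(const std::vector<int>& values, const Comparator& cmp) {
    int best = values.front();
    for (int v : values) {
        if (cmp.less(best, v)) best = v;
    }
    return best;
}
```

A new comparator type works with `largest_virtual` without recompiling it, whereas `largest_static` is instantiated again for every distinct `Less` type, which is the binary-size and compile-time cost discussed above.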