Frontend Engineering — Architecture, Performance, and Production Reality
The frontend is not the “easy” side of engineering. It is the side where every millisecond of latency is felt by a human being, where a single layout shift can cost millions in lost conversions, and where you ship code that runs on hardware you don’t control: thousands of device models, dozens of browser versions, network conditions ranging from fiber to 2G. Frontend engineering at a senior level is about understanding the browser as a runtime environment with the same rigor that backend engineers understand Linux. This chapter covers everything a senior frontend engineer should be able to think about, talk about, and reason through in an interview.
Real-World Stories: Why Frontend Engineering Matters
How Shopify's Storefront Renderer Rebuild Changed Their Business
Airbnb's Design System -- From Chaos to Consistency at Scale
Button component works correctly in a modal, in a form, in a sticky header, and in a responsive grid, all with proper keyboard navigation and screen reader support?
DLS shipped with strict API contracts, comprehensive Storybook documentation, visual regression testing using Chromatic, and a migration tool that could automatically update import paths when components moved between packages. Within 18 months, Airbnb reported a 34% reduction in CSS shipped to production and a measurable improvement in accessibility scores. The investment took over a year and involved a dedicated team of 8 engineers and 3 designers. The lesson: design systems are infrastructure projects, not side quests. They pay for themselves in engineering velocity, but only if you invest in them like you would invest in a database migration: with dedicated resources, clear milestones, and a multi-year commitment.
Google's Core Web Vitals Rollout -- When Performance Became a Ranking Signal
Part I — Frontend Architecture & Rendering
1. Component Architecture
Component architecture is the foundation of modern frontend engineering. Every major framework (React, Vue, Angular, Svelte, Solid) is built on the idea that UIs are compositions of reusable, encapsulated pieces. But “components” is where the easy part ends and the hard questions begin: How big should a component be? When should you split one? How do components communicate? What goes in a component versus a utility function versus a hook?
1.1 Mental Models Across Frameworks
- React
- Vue
- Angular
useState, useReducer). Side effects are declared with useEffect. The mental model is functional: given props and state, return a view. React re-renders a component whenever its state changes or its parent re-renders; understanding this is the single most important concept in React performance.
useMemo, and callbacks passed to children should be stabilized with useCallback, but only when you have measured a performance problem, not as a premature optimization.
1.2 Composition vs. Inheritance
In modern frontend development, composition won, and it won decisively. Inheritance-based component hierarchies (common in early Angular and pre-hooks React class components) created rigid, hard-to-refactor trees where changing a base class behavior affected every descendant. Composition (building complex components by combining simpler ones and sharing logic through hooks, composables, or utilities) is more flexible and easier to reason about.
Why composition wins:
- Flexibility: A component that accepts children or render props can be used in contexts the author never anticipated. An inherited component can only extend the base class.
- Testability: Composed components can be tested in isolation. Inherited components carry implicit behavior from their entire ancestor chain.
- Readability: With composition, you can see every behavior a component uses by reading its imports and hooks. With inheritance, you have to trace the entire class hierarchy.
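The three points above can be illustrated with a minimal, framework-free sketch. It assumes views are plain HTML strings, and every name in it (`Badge`, `Card`, `formatPrice`, `ProductCard`) is hypothetical; in a real framework the pieces would be components and the shared function would be a hook or composable.

```typescript
// All names here are hypothetical; views are plain HTML strings to keep the sketch framework-free.
type View = string;

// A small, focused piece.
function Badge(props: { label: string }): View {
  return `<span class="badge">${props.label}</span>`;
}

// A generic container that accepts children, so it can be reused in contexts its author never planned for.
function Card(props: { title: string; children: View }): View {
  return `<section class="card"><h2>${props.title}</h2>${props.children}</section>`;
}

// Shared logic lives in a plain function (the role a hook or composable plays in a framework).
function formatPrice(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

// The complex component is assembled from the pieces above; nothing is inherited.
function ProductCard(props: { name: string; priceCents: number; onSale: boolean }): View {
  const saleBadge = props.onSale ? Badge({ label: "Sale" }) : "";
  return Card({ title: props.name, children: formatPrice(props.priceCents) + saleBadge });
}
```

Every behavior `ProductCard` relies on is visible in its body, and each piece can be tested on its own without an ancestor chain.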
1.3 Component Patterns That Scale
Container / Presenter Pattern (also called Smart / Dumb)
Containers handle data fetching, state management, and business logic. Presenters receive data via props and render UI. This separation means presenters are trivially reusable and testable because they are pure functions of their props.
Interview Question: How do you decide when to split a component?
- Reuse — The same UI pattern appears (or will appear) in multiple places.
- Complexity — The component has grown past the point where I can hold its behavior in my head (~200-300 lines, or when it has more than 3-4 independent state variables).
- Independent testing — I need to test a specific piece of logic in isolation (e.g., a complex form validation section within a larger form).
- Performance — A section re-renders frequently due to local state changes, and extracting it prevents the parent from re-rendering unnecessarily.
- Saying “components should be less than 100 lines” as a rule without context
- Immediately reaching for Context or Redux to avoid any prop passing
- Decomposing a component that is only used once into 5+ tiny files that are harder to understand as a whole
2. Rendering Strategies
Rendering strategy is the single most impactful architectural decision in a frontend application. It determines how fast your users see content, how your application performs on low-end devices, how search engines index your pages, and how much infrastructure you need. Getting this wrong is expensive to fix — it often means rewriting the application layer.2.1 Client-Side Rendering (CSR)
The browser downloads a minimal HTML shell (often just a <div id="root"></div>) and a JavaScript bundle, and the JS creates the entire DOM. This is the default for create-react-app, vanilla Vite React apps, and most SPAs.
How it works:
1. Browser requests the page. Server sends a near-empty HTML document.
2. Browser downloads and parses the JavaScript bundle (50KB-2MB+ depending on the app).
3. JavaScript executes, makes API calls, builds the DOM.
4. User sees a blank page (or spinner) until step 3 completes.
When CSR works:
- Authenticated dashboards behind a login wall (SEO does not matter, users expect a loading state).
- Highly interactive applications like design tools (Figma), spreadsheets (Google Sheets), or IDEs (VS Code for Web) where the entire app is a single interactive surface.
- Internal tools where Time to Interactive matters more than First Contentful Paint.
When CSR fails:
- Content-heavy pages that need SEO (blogs, e-commerce product pages, marketing sites).
- Low-end devices and slow networks: a 500KB JS bundle can take 8-15 seconds to download, parse, and execute on a low-end Android phone on a 3G connection.
- Pages where first meaningful paint matters: users see nothing until all JS has downloaded and executed.
2.2 Server-Side Rendering (SSR)
The server runs the application code, generates the full HTML for the page, and sends it to the browser. The browser displays the HTML immediately (fast First Contentful Paint), then downloads JavaScript which “hydrates” the page, attaching event handlers and making it interactive.
How it works:
- Browser requests the page. Server runs the component tree, fetches data, generates complete HTML.
- Browser receives full HTML and renders it immediately — the user sees content.
- Browser downloads and executes JavaScript.
- JavaScript “hydrates” the existing HTML — attaching event listeners, restoring state, making it interactive.
- The page is now fully interactive (Time to Interactive).
2.3 Static Site Generation (SSG)
Pages are generated at build time: the HTML is pre-rendered and served from a CDN as static files. This is the fastest possible delivery because there is no server computation at request time. A CDN can serve a static HTML file in 5-20ms from an edge node, versus 50-500ms for an SSR response that requires server computation.
When SSG works:
- Content that changes infrequently (documentation, blogs, marketing pages).
- Pages that are the same for every user (no personalization in the initial HTML).
- Sites where build time is acceptable (a 10,000-page site might take 10-30 minutes to build).
When SSG fails:
- Frequently updated content (a news site with hundreds of articles per day cannot rebuild the entire site for each article).
- Personalized content (the page is different for every user — cannot pre-render at build time).
- Very large sites (100,000+ pages create build times measured in hours).
2.4 Incremental Static Regeneration (ISR)
A hybrid strategy pioneered by Next.js. Pages are statically generated but can be regenerated in the background after a configurable time interval. The first visitor after the interval gets the stale page (fast) while triggering a rebuild. Subsequent visitors get the fresh page.
2.5 Streaming SSR
Instead of waiting for the entire page to render before sending any HTML, the server streams HTML fragments as they become ready. The browser starts rendering the header and layout immediately while the server is still fetching data for the product grid below.
How React 18 Streaming SSR works:
The server sends the <Layout> and <Header> HTML immediately. When ProductDetails data is ready, it streams that HTML (replacing the skeleton). When ProductReviews data is ready, it streams that too. Each <Suspense> boundary is an independent streaming unit.
The key benefit: Time to First Byte (TTFB) is dramatically reduced because the server starts sending HTML before all data is ready. The user sees the page structure immediately, then content fills in progressively.
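The chunk ordering described above can be modeled with a toy sketch. This is not React's implementation or wire format: the `Boundary` shape, the `readyAtMs` field (a simulated time at which a boundary's data resolves), and the `<template data-for>` marker are all assumptions made for illustration.

```typescript
// Toy model of streaming order; the types and the <template data-for> marker are hypothetical.
interface Boundary {
  id: string;        // placeholder id used to match late content to its slot
  fallback: string;  // skeleton HTML sent with the shell
  html: string;      // final HTML once this boundary's data is available
  readyAtMs: number; // simulated time at which the data resolves
}

interface Chunk {
  atMs: number;
  html: string;
}

function planStream(shellHtml: string, boundaries: Boundary[]): Chunk[] {
  // 1. The shell goes out immediately, with a fallback in place of each pending boundary.
  const placeholders = boundaries
    .map((b) => `<div id="${b.id}">${b.fallback}</div>`)
    .join("");
  const chunks: Chunk[] = [{ atMs: 0, html: shellHtml + placeholders }];

  // 2. Each boundary is an independent unit: it streams as soon as its own data is ready,
  //    regardless of where it sits in the page.
  const byReadiness = boundaries.slice().sort((a, b) => a.readyAtMs - b.readyAtMs);
  for (const b of byReadiness) {
    chunks.push({ atMs: b.readyAtMs, html: `<template data-for="${b.id}">${b.html}</template>` });
  }
  return chunks;
}
```

In this model a slow reviews query never delays the product details: the first byte leaves at time 0, and each later chunk is gated only by its own data.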
2.6 React Server Components (RSC) — What They Actually Change
React Server Components are the most significant architectural shift in React since hooks. Understanding what they actually do (versus the marketing) is essential for senior-level interviews.
What RSCs are: Components that run only on the server. They are never shipped to the browser. They never hydrate. Their JavaScript is never included in the client bundle. They can directly access databases, file systems, and other server-side resources without an API layer.
What RSCs are NOT: They are not SSR. SSR renders components to HTML on the server but still sends the component JavaScript to the client for hydration. RSCs send the rendered output to the client as a serialized tree (React’s streaming format), and the client React runtime inserts this into the DOM without needing the component’s source code.
The actual change: In a traditional React SSR app, a product page component that fetches data from a database:
- Runs on the server to generate HTML (SSR).
- The component’s JavaScript is sent to the client (~5-50KB for the component + its dependencies).
- The component re-executes on the client during hydration.
With RSC, the same component:
- Runs on the server. Its rendered output (not its source code) is streamed to the client.
- Zero JavaScript is sent to the client for this component.
- No hydration needed for this component.
"use client".
Interview Question: When would you choose SSR over SSG, and when would you choose neither?
- Saying “SSR is always better than CSR” without mentioning the hydration cost
- Not knowing that SSG exists or treating it as “just for blogs”
- Not mentioning streaming SSR or React Server Components in a 2024+ interview
2.7 Framework Comparison for Rendering
| Framework | SSG | SSR | Streaming SSR | ISR | RSC | Edge Runtime | Key Trade-off |
|---|---|---|---|---|---|---|---|
| Next.js (App Router) | Yes | Yes | Yes | Yes | Yes | Yes | Most feature-complete but complex; App Router has a steep learning curve |
| Remix | No (by design) | Yes | Yes | No | No | Yes | Leans into web standards; no SSG by design (“just use a CDN cache”) |
| Nuxt 3 | Yes | Yes | Yes (via Nitro) | Yes | No (Vue ecosystem) | Yes | Vue’s answer to Next.js; excellent DX for Vue teams |
| Astro | Yes (default) | Yes (on-demand) | Yes | No | N/A (framework-agnostic) | Yes | Islands architecture by default; best for content-heavy sites |
| SvelteKit | Yes | Yes | Yes | No | No | Yes | Smallest runtime (~2KB); compiles away the framework |
| Gatsby | Yes (primary) | Limited | No | Yes (via DSG) | No | No | GraphQL data layer is powerful but heavy; declining community |
3. State Management
State management is where frontend architecture either stays clean or collapses into an unmaintainable mess. The core challenge: UI state, server state, URL state, and form state are fundamentally different things with different lifecycles, update patterns, and persistence needs, but they all affect what the user sees.
3.1 The State Management Hierarchy
Before reaching for any library, exhaust simpler solutions first:
- Local component state (useState / ref)
- Lifted state (shared parent)
- Composition (children, render props, slots)
- Context (React Context / Vue provide/inject)
- Server state library (TanStack Query / SWR)
- External state library (Zustand / Redux / Jotai)
3.2 Server State vs Client State
This distinction is the most important mental model in modern state management.
Server state is data that lives on the server and you are merely caching a copy. It has a source of truth outside your application. It can become stale. It can be updated by other users. Examples: user profile data, product listings, order history.
Client state is data that exists only in the browser. There is no server truth. It is ephemeral. Examples: whether a modal is open, the current tab in a tab bar, the items in an unsaved draft.
3.3 State Library Comparison
| Library | Mental Model | Bundle Size | Learning Curve | Best For |
|---|---|---|---|---|
| Redux Toolkit | Single store, actions, reducers, middleware | ~11KB | High | Large teams needing strict patterns, DevTools, middleware ecosystem |
| Zustand | Single or multiple stores, direct mutations | ~1.5KB | Low | Most applications — simpler API, less boilerplate than Redux |
| Jotai | Atomic state — each piece of state is an atom | ~3KB | Medium | Bottom-up state composition, derived state, avoiding re-renders |
| Recoil | Atoms + selectors (graph-based) | ~22KB | Medium | (Meta’s library, but development has stalled — prefer Jotai) |
| XState | State machines and statecharts | ~15KB | High | Complex workflows with explicit states and transitions (forms, wizards, auth flows) |
| Valtio | Proxy-based — mutate directly, re-render automatically | ~3KB | Low | Teams who want mutable-style API with immutable guarantees |
| TanStack Query | Server state cache with automatic lifecycle | ~12KB | Medium | All server data — this is not optional, it is the standard |
useState/useReducer for local state, and Zustand or Jotai if you genuinely need shared client state. Redux is still a fine choice for large teams with existing Redux codebases, but starting a new project with Redux when Zustand exists is choosing complexity without benefit.
3.4 State Machines (XState)
State machines are underused in frontend development. They shine when a component has explicit states with defined transitions and invalid states should be impossible. Consider an async operation button:
isLoading, isError, isSuccess) and invariant bugs are easy: what happens if isLoading and isError are both true? With a state machine, that state is impossible by construction.
if (isLoading && !isError && data !== null): that boolean soup is a state machine trying to escape.
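A minimal sketch of the async-button idea follows. It is a hand-rolled transition table, not the XState API; the state and event names (`ButtonState`, `ButtonEvent`, `SUBMIT`, and so on) are assumptions chosen for illustration.

```typescript
// Hand-rolled sketch, not XState. State and event names are hypothetical.
type ButtonState = "idle" | "loading" | "success" | "error";
type ButtonEvent = "SUBMIT" | "RESOLVE" | "REJECT" | "RESET";

// The button is in exactly one state at a time, so "loading and error at once" cannot be expressed.
const transitions: Record<ButtonState, Partial<Record<ButtonEvent, ButtonState>>> = {
  idle: { SUBMIT: "loading" },
  loading: { RESOLVE: "success", REJECT: "error" },
  success: { RESET: "idle" },
  error: { SUBMIT: "loading", RESET: "idle" },
};

function transition(state: ButtonState, event: ButtonEvent): ButtonState {
  const next = transitions[state][event];
  // Events that are not listed for the current state are ignored rather than corrupting it.
  return next === undefined ? state : next;
}
```

A double click while loading is a useful case to trace: `transition("loading", "SUBMIT")` stays in `loading`, so a second request is never started by accident.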
Interview Question: Your team's React application has performance problems -- components are re-rendering too frequently. How do you diagnose and fix this?
memo() everything?
Strong answer framework:
Diagnosis first:
- Open React DevTools Profiler and record a session. Identify which components re-render most frequently and why (parent re-render, context change, state change).
- Look for the “cascading re-render” pattern: a state change high in the tree causes re-renders of dozens of children, most of which did not need to update.
- Check for unstable references: objects or functions created in render that change identity on every render, defeating React.memo and useMemo in children.
onChange updates state in a top-level component, causing the entire page to re-render on every keystroke.
Fix: Move the search state down to the search component. Only lift the final query (on submit or debounced) to the parent.
Problem 2: Context providing a new object on every render.
ThemeContext and UserContext.
The senior nuance: React.memo, useMemo, and useCallback are not free; they add memory overhead and code complexity. I only add them when the Profiler shows a measurable problem. Premature memoization is a form of premature optimization: it makes code harder to read without evidence it improves performance.
What separates a Staff-level answer: “Before reaching for memoization, I ask whether the state architecture is wrong. If a state change in component A causes re-renders in components B through Z, the problem is usually not that B-Z need memo; the problem is that A should not own that state, or the state should be in a more granular store like Zustand or Jotai atoms that only notify the specific subscribers that care about the changed value.”
4. Micro-Frontends
Micro-frontends apply the microservices pattern to the frontend: instead of a single monolithic SPA, multiple independently deployed frontend applications compose a single user-facing page. Each “micro-frontend” is owned by a different team, has its own repository, build pipeline, and deployment cycle.
4.1 When Micro-Frontends Make Sense (Hint: Rarely)
Micro-frontends solve an organizational problem, not a technical one. They make sense when:
- You have 5+ teams shipping features to different sections of the same application.
- Teams need independent deployment — Team A cannot be blocked by Team B’s broken build.
- Teams use different frameworks (one team is Angular, another is React) and a full rewrite is not feasible.
- The application is large enough that a monolithic build takes 30+ minutes and affects developer velocity.
They do not make sense when:
- You have fewer than 3-4 frontend teams. The coordination overhead exceeds the independence benefit.
- Your teams can reasonably share a codebase (same framework, same repo, compatible release cadence).
- Performance is a top priority. Micro-frontends add overhead: multiple framework runtimes, duplicate dependencies, inter-app communication costs.
4.2 Implementation Approaches
- Module Federation (Webpack/Vite)
- Single-SPA
- iframe-Based
Part II — Performance & Core Web Vitals
5. Core Web Vitals Deep Dive
Core Web Vitals are Google’s metrics for measuring real user experience on the web. They are not vanity metrics: they are ranking signals in Google Search, and they correlate strongly with business outcomes. Every senior frontend engineer should be able to explain what each metric measures, diagnose common causes of poor scores, and fix them.
5.1 Largest Contentful Paint (LCP)
What it measures: The time from when the page starts loading to when the largest content element in the viewport finishes rendering. This is usually a hero image, a large text block, or a video poster frame. LCP captures the user’s perception of “this page has loaded.”
Targets: Good: < 2.5s. Needs improvement: 2.5-4.0s. Poor: > 4.0s.
Common causes of poor LCP and how to fix them:
| Cause | Diagnosis | Fix |
|---|---|---|
| Slow server response (TTFB > 600ms) | Check TTFB in WebPageTest or Chrome DevTools | SSG/ISR for cacheable pages, edge rendering, database query optimization, CDN caching |
| Render-blocking resources | Lighthouse flags “Eliminate render-blocking resources” | Inline critical CSS, defer non-critical CSS, async/defer on scripts |
| Slow resource load (LCP image takes 3s) | Network waterfall in DevTools shows late/slow image load | <link rel="preload"> for LCP image, responsive images with srcset, modern formats (WebP/AVIF), CDN |
| Client-side rendering delay | LCP element does not exist in initial HTML | Move to SSR or SSG for the LCP element; use React Server Components |
| Lazy-loaded LCP element | LCP image has loading="lazy" | NEVER lazy-load the LCP element — it should be in the initial HTML and eagerly loaded |
5.2 Interaction to Next Paint (INP)
What it measures: The latency of the slowest interaction (click, tap, key press) during the page visit, measured from the moment the user interacts to the moment the browser paints the next frame. On pages with many interactions, one worst outlier is discarded per 50 interactions so that a single hiccup does not define the score. INP replaced First Input Delay (FID) as a Core Web Vital in March 2024 because FID only measured the first interaction, while INP measures all interactions throughout the page lifecycle.
Targets: Good: < 200ms. Needs improvement: 200-500ms. Poor: > 500ms.
Why INP is harder than FID: FID only measured input delay, the time between the user’s click and when the browser starts processing the event handler. INP measures the complete lifecycle: input delay + processing time + presentation delay (time to render the visual update). This means your event handler’s execution time and the subsequent paint cost both count.
Common causes of poor INP:
- Long event handlers: An onClick that synchronously filters 10,000 items, re-sorts them, and updates a complex component tree can take 200-500ms. Fix: debounce heavy operations, use startTransition (React 18+) to mark non-urgent updates, move computation to a Web Worker.
- Layout thrashing in handlers: Interleaving DOM writes (changing styles) with layout reads (offsetHeight, getBoundingClientRect) in the same handler forces the browser to recalculate layout synchronously on each read. Fix: batch reads and writes separately, use requestAnimationFrame for DOM writes.
- Third-party scripts blocking the main thread: Analytics scripts, chat widgets, and ad scripts that register long-running event listeners. Fix: load third-party scripts with async/defer, use loading="lazy" on iframes, audit third-party impact with Chrome DevTools Performance panel.
- Hydration blocking interaction: On SSR pages, the user clicks a button before hydration completes. The browser queues the event, but the handler is not yet attached. Once hydration finishes and the handler fires, the time since the click counts as input delay. Fix: use selective hydration (React 18), islands architecture (Astro), or resumability (Qwik).
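To make the metric concrete, here is a minimal sketch of how a page-level INP value could be derived from recorded interaction latencies. It assumes the latencies have already been collected (for example from Event Timing entries) and follows the published rule of skipping one worst interaction per 50; the function names are hypothetical and this is not the code of any particular library.

```typescript
// Sketch only: function names are hypothetical; input is a list of interaction latencies in ms.
type InpRating = "good" | "needs-improvement" | "poor";

function estimateInp(durationsMs: number[]): number | null {
  if (durationsMs.length === 0) return null; // no interactions, no INP value
  const worstFirst = durationsMs.slice().sort((a, b) => b - a);
  // Skip one worst outlier for every 50 interactions recorded on the page.
  const skip = Math.floor(durationsMs.length / 50);
  return worstFirst[Math.min(skip, worstFirst.length - 1)];
}

function rateInp(inpMs: number): InpRating {
  if (inpMs <= 200) return "good";
  if (inpMs <= 500) return "needs-improvement";
  return "poor";
}
```

With only three interactions of 40, 650, and 120 ms, the estimate is 650 ms and the page rates as poor: on short visits a single slow handler decides the score.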
5.3 Cumulative Layout Shift (CLS)
What it measures: The largest burst of unexpected layout shifts that occurs during the lifespan of the page. Shifts are grouped into session windows (consecutive shifts less than 1 second apart, capped at 5 seconds per window), and CLS is the total of the worst window rather than a lifetime sum. A “layout shift” is when a visible element changes position after it has been rendered: text jumps down when an ad loads, a button moves when a font finishes loading, an image without dimensions pushes content below it.
Targets: Good: < 0.1. Needs improvement: 0.1-0.25. Poor: > 0.25.
Common causes and fixes:
| Cause | Example | Fix |
|---|---|---|
| Images without dimensions | <img src="photo.jpg"> with no width/height | Always set width and height attributes, or use CSS aspect-ratio |
| Dynamically injected content | An ad banner loads 2s after page load and pushes content down | Reserve space with min-height or a placeholder container |
| Web fonts causing FOUT | Text renders in fallback font, shifts when web font loads | font-display: optional (prevents FOUT entirely) or font-display: swap with size-adjusted fallback fonts |
| Late-loading components | A cookie consent banner slides in and shifts content | Use position: fixed or position: sticky so it overlays rather than displacing content |
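For calibration, the sketch below shows how a CLS value can be computed from individual layout-shift records. It assumes the session-window rule Chrome documents for the metric (a new window starts after a 1 second gap or once a window spans 5 seconds, and the worst window is reported); the `LayoutShiftRecord` shape is a simplified, hypothetical stand-in for the entries a PerformanceObserver would deliver.

```typescript
// Simplified stand-in for layout-shift entries; field names are hypothetical.
interface LayoutShiftRecord {
  startTimeMs: number;
  value: number;           // score of this individual shift
  hadRecentInput: boolean; // shifts right after user input are expected and excluded
}

function computeCls(shifts: LayoutShiftRecord[]): number {
  const ordered = shifts.slice().sort((a, b) => a.startTimeMs - b.startTimeMs);
  let worstWindow = 0;
  let windowValue = 0;
  let windowStartMs = 0;
  let lastShiftMs = 0;
  let windowOpen = false;

  for (const shift of ordered) {
    if (shift.hadRecentInput) continue;
    const gapTooLong = shift.startTimeMs - lastShiftMs >= 1000;
    const windowTooLong = shift.startTimeMs - windowStartMs >= 5000;
    if (!windowOpen || gapTooLong || windowTooLong) {
      // Start a new session window with this shift.
      windowValue = shift.value;
      windowStartMs = shift.startTimeMs;
      windowOpen = true;
    } else {
      windowValue += shift.value;
    }
    lastShiftMs = shift.startTimeMs;
    if (windowValue > worstWindow) worstWindow = windowValue;
  }
  return worstWindow;
}
```

Two shifts of 0.05 and 0.10 that land half a second apart count together (about 0.15, already over the 0.1 target), while a 0.02 shift several seconds later starts its own window and does not add to that total.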
Interview Question: Your e-commerce site has a CLS of 0.35 on product pages. How do you diagnose and fix it?
- Run a Lighthouse audit in Chrome DevTools; it highlights the top CLS-contributing elements.
- Record a trace in the Performance panel and enable “Layout Shift Regions” (in the DevTools Rendering tab) to see exactly which elements shifted and when.
- Check the Layout Instability API via PerformanceObserver in production (Real User Monitoring) to understand CLS in the field, not just in lab conditions.

- Hero image without dimensions: Add width and height attributes matching the aspect ratio. This alone often reduces CLS by 0.1-0.15.
- Above-the-fold ad slots: Reserve explicit space with a CSS container of the exact ad dimensions. If the ad does not load, the space remains empty, which is better than having content jump.
- Product image carousel: Set the container’s aspect-ratio: 4/3 (or whatever the image ratio is) so the space is reserved before images load.
- Late-loading review count/stars: If product reviews load asynchronously and push the “Add to Cart” button down, either SSR the review count or reserve space for it.
- Font-induced shift: Adjust the fallback font’s metrics (ascent-override, descent-override, line-gap-override in @font-face) to match the web font’s dimensions, eliminating the shift when the web font loads.

- Saying “just add width and height to images” without checking if the issue is actually caused by images
- Not distinguishing between above-the-fold shifts (critical) and below-the-fold shifts (less impactful on user experience)
- Not knowing that CLS is measured differently in lab (Lighthouse) vs field (CrUX): lab measures only load-time shifts, field measures the entire session
6. JavaScript Performance
6.1 Bundle Size — The Silent Killer
Every kilobyte of JavaScript has three costs: download (network transfer), parse (the browser’s JS engine reads the source), and execute (the engine runs the code). On a fast laptop with fiber internet, these costs are invisible. On a mid-range Android phone on a 3G connection, they are brutal.
Real numbers to calibrate your intuition:
| Bundle Size | Download (3G) | Parse (Mid-range Phone) | Total Before Interactive |
|---|---|---|---|
| 100 KB | ~1.0s | ~0.3s | ~1.3s |
| 300 KB | ~3.0s | ~0.9s | ~3.9s |
| 500 KB | ~5.0s | ~1.5s | ~6.5s |
| 1 MB | ~10.0s | ~3.0s | ~13.0s |
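The table rows follow a simple linear model, sketched below. The two rates are read off the table itself (about 1.0s of download and 0.3s of parse per 100 KB) and are rough calibration figures, not measurements; the function and constant names are hypothetical.

```typescript
// Rates implied by the table above; treat them as rough assumptions, not benchmarks.
const DOWNLOAD_SECONDS_PER_KB = 1.0 / 100; // ~1.0s per 100 KB on 3G
const PARSE_SECONDS_PER_KB = 0.3 / 100;    // ~0.3s per 100 KB on a mid-range phone

interface BundleCost {
  downloadSeconds: number;
  parseSeconds: number;
  totalSeconds: number;
}

function estimateBundleCost(sizeKb: number): BundleCost {
  const downloadSeconds = sizeKb * DOWNLOAD_SECONDS_PER_KB;
  const parseSeconds = sizeKb * PARSE_SECONDS_PER_KB;
  return { downloadSeconds, parseSeconds, totalSeconds: downloadSeconds + parseSeconds };
}
```

Under these assumptions every additional 100 KB costs roughly 1.3s before the page is interactive, which is a handy number to quote when reviewing a new dependency.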
import { debounce } from 'lodash-es', tree shaking (with proper ES module syntax) includes only the debounce function, not the entire 72KB lodash library. The same named import from the main lodash package pulls in the whole library, because that package ships CommonJS. Tree shaking only works with ES modules (import/export), not CommonJS (require/module.exports), and many older libraries do not support it.
Code splitting breaks your bundle into multiple chunks loaded on demand. Route-based splitting is the most common pattern: each page is a separate chunk loaded only when the user navigates to it.
webpack-bundle-analyzer, source-map-explorer, and rollup-plugin-visualizer (for Vite and Rollup builds) visualize exactly what is in your bundle and how much each dependency costs.
6.2 Runtime Performance
Main thread blocking: The browser’s main thread handles JavaScript execution, DOM updates, layout calculation, painting, and user input handling, all on a single thread. When your JavaScript runs for 100ms straight, the browser cannot respond to clicks, cannot update animations, and the page feels frozen. Any task that blocks the main thread for more than 50ms is a Long Task (flagged in Chrome DevTools Performance panel).
Breaking up long tasks:
6.3 Memory Leaks in SPAs
Single-page applications are particularly prone to memory leaks because the page never fully reloads. In a traditional multi-page site, navigating to a new page discards all JavaScript state. In an SPA, navigating between routes unmounts components but does not clear the JavaScript heap, so leaked references accumulate over the session.
Common SPA memory leak patterns:
- Uncleared event listeners
- Uncleared timers and intervals
- Stale closures holding large data
- Detached DOM nodes: Components that create DOM elements outside React’s tree (portals, tooltips, direct DOM manipulation) may not clean them up on unmount.
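The common remedy for the first three patterns is the same: every subscription made on mount returns a cleanup that runs on unmount. The sketch below shows that shape without a framework; `TinyEmitter` and `mountWidget` are hypothetical names, and in React the returned function corresponds to the cleanup returned from useEffect.

```typescript
// Framework-free sketch; TinyEmitter and mountWidget are hypothetical names.
type Listener = (payload: string) => void;

class TinyEmitter {
  private listeners: Listener[] = [];

  // Subscribing hands back the function that undoes the subscription.
  subscribe(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  emit(payload: string): void {
    for (const listener of this.listeners.slice()) listener(payload);
  }

  listenerCount(): number {
    return this.listeners.length;
  }
}

// "Mounting" registers a listener and returns the cleanup to call on unmount.
function mountWidget(bus: TinyEmitter, log: string[]): () => void {
  const unsubscribe = bus.subscribe((message) => log.push(message));
  return () => unsubscribe();
}
```

If the cleanup is skipped, each route visit leaves one more listener (and whatever its closure captured) attached to the long-lived bus, which is exactly how the leak accumulates over a session.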
7. Asset Optimization
7.1 Image Optimization
Images are typically the largest payload on a web page: 50-80% of total page weight on content-heavy sites. Choosing the right format and delivery strategy has more impact on page load time than any JavaScript optimization.
Format comparison:
| Format | Compression | Transparency | Animation | Browser Support | When to Use |
|---|---|---|---|---|---|
| JPEG | Lossy, good | No | No | Universal | Photographs where transparency is not needed |
| PNG | Lossless | Yes | No | Universal | Icons, logos, images requiring transparency and sharp edges |
| WebP | Lossy + lossless, 25-35% smaller than JPEG | Yes | Yes | 97%+ global | Default choice for most images in 2024+ |
| AVIF | Lossy + lossless, 50% smaller than JPEG | Yes | Yes | 92%+ global | Best compression but slower to encode; use for high-traffic pages |
| SVG | Vector (infinite scaling) | Yes | Yes (via CSS/JS) | Universal | Icons, logos, illustrations, diagrams — anything that is not a photograph |
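One way to apply the table when serving photographs is to negotiate the format from the request's Accept header, preferring AVIF, then WebP, then JPEG. The sketch below assumes an image server or CDN function that can see that header; `pickImageFormat` is a hypothetical helper, and it deliberately ignores q-values for brevity.

```typescript
// Hypothetical helper for server-side format negotiation; q-values are ignored for brevity.
type ImageFormat = "avif" | "webp" | "jpeg";

function pickImageFormat(acceptHeader: string): ImageFormat {
  // "image/avif,image/webp,*/*;q=0.8" -> ["image/avif", "image/webp", "*/*"]
  const accepted = acceptHeader
    .toLowerCase()
    .split(",")
    .map((part) => part.split(";")[0].trim());

  if (accepted.indexOf("image/avif") !== -1) return "avif"; // best compression
  if (accepted.indexOf("image/webp") !== -1) return "webp"; // broad support, smaller than JPEG
  return "jpeg";                                            // universal fallback
}
```

The same preference order can be expressed in markup with a picture element and multiple source entries; the server-side variant is useful when image URLs must stay stable.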
7.2 Font Loading Strategies
Web fonts are a common source of both CLS (layout shift when the font loads) and LCP delay (text is invisible until the font loads).
font-display values:
| Value | Behavior | CLS Risk | Use When |
|---|---|---|---|
| auto | Browser decides (usually block) | Medium | Never; always be explicit |
| block | Invisible text for ~3s, then fallback (FOIT) | Low | Rarely; only for icon fonts |
| swap | Fallback immediately, swap when font loads (FOUT) | High | Body text where readability beats aesthetics |
| fallback | Brief invisible (~100ms), fallback, swap if loaded within 3s | Medium | Good balance: text appears quickly, font swaps if fast |
| optional | Brief invisible (~100ms), uses fallback if font not loaded | None | Best for CLS: font is a progressive enhancement |
7.3 Critical CSS and Resource Hints
Critical CSS: Inline the CSS needed for above-the-fold content directly in the <head>. This eliminates the render-blocking CSS download for the initial viewport. Tools like critical (npm package) or critters (Webpack plugin) automate extraction.
Resource hints:
- <link rel="preconnect" href="https://fonts.googleapis.com">: Establish the TCP/TLS connection to a third-party origin early. Saves 100-300ms per origin.
- <link rel="preload" href="/critical.css" as="style">: Tell the browser to download this resource immediately at high priority.
- <link rel="prefetch" href="/next-page.js">: Download this resource at low priority during idle time.
- <link rel="dns-prefetch" href="https://analytics.example.com">: Resolve the DNS for this origin ahead of time.
preload resources compete with other high-priority resources (CSS, above-the-fold images). Preloading too many resources is counterproductive because it delays everything equally. Preload at most 2-3 critical resources. Prefetch is fine to use more liberally because it only downloads during idle time.
8. Performance Budgets
8.1 Setting Frontend Performance Budgets
A performance budget is a set of limits on metrics that affect user experience (bundle size, number of requests, LCP, INP, CLS, total page weight) that the team agrees not to exceed.
Example budget for an e-commerce product page:
| Metric | Budget | Rationale |
|---|---|---|
| Total JavaScript | < 200 KB (compressed) | ~3s parse+execute on mid-range phone |
| Total page weight | < 1.5 MB | Loads in < 5s on 3G |
| LCP | < 2.5s (P75, field data) | Google “Good” threshold |
| INP | < 200ms (P75, field data) | Google “Good” threshold |
| CLS | < 0.1 (P75, field data) | Google “Good” threshold |
| TTFB | < 600ms (P75, field data) | Server response time |
| Third-party script count | < 5 | Each script is a performance and security risk |
| Total image weight | < 500 KB | Responsive images with modern formats |
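A budget like the one above is only useful if something checks it. The sketch below is one possible shape for that check, assuming measurements arrive as a plain metric-to-number map (from a CI run or RUM export); the `Budget` type, the metric keys, and `checkBudgets` are hypothetical and not tied to any specific tool.

```typescript
// Hypothetical budget checker; metric keys and units are illustrative.
interface Budget {
  metric: string;
  max: number; // the budget is "strictly less than max", matching the table
  unit: string;
}

const productPageBudgets: Budget[] = [
  { metric: "javascriptKb", max: 200, unit: "KB" },
  { metric: "lcpMs", max: 2500, unit: "ms" },
  { metric: "inpMs", max: 200, unit: "ms" },
  { metric: "cls", max: 0.1, unit: "" },
];

function checkBudgets(budgets: Budget[], measured: Record<string, number>): string[] {
  const violations: string[] = [];
  for (const budget of budgets) {
    const value = measured[budget.metric];
    if (value === undefined) {
      // A missing measurement is reported instead of silently passing.
      violations.push(`${budget.metric}: no measurement`);
    } else if (value >= budget.max) {
      violations.push(`${budget.metric}: ${value}${budget.unit} exceeds budget of ${budget.max}${budget.unit}`);
    }
  }
  return violations;
}
```

A CI step could fail the build whenever the returned list is non-empty, which turns the table from a guideline into an enforced contract.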
8.2 Enforcing Budgets
Lighthouse CI runs Lighthouse in your CI pipeline and fails the build if scores drop below thresholds:
size-limit:
8.3 RUM vs Synthetic Monitoring
Synthetic monitoring (Lighthouse, WebPageTest) runs automated tests from controlled environments. It is reproducible and good for catching regressions in CI. But it does not represent real users. Real User Monitoring (RUM) collects performance data from actual users in production. It captures the full distribution of experiences. You need both. Synthetic catches regressions before deployment. RUM tells you what users actually experience. RUM tools: Google’s CrUX (free, aggregated data from Chrome users), Vercel Analytics, Datadog RUM, Sentry Performance, SpeedCurve.
Interview Question: Your team just added a new feature and Lighthouse performance score dropped from 95 to 72. How do you investigate?
- Compare Lighthouse reports. Run lighthouse --output=json before and after, then diff the reports. Look for new render-blocking resources, increased JS bundle size, or new layout shifts.
- Check the bundle. Run npx source-map-explorer dist/main.js to see what is in the bundle. Did the new feature add a large dependency? Did it break code splitting?
- Check the network waterfall. Open the Network panel in DevTools, throttle to “Fast 3G,” and trace the critical path. Is there a new API call blocking render?
- Profile with DevTools Performance panel. Record page load and look for long tasks. Did the new feature add computation to the critical rendering path?
- Check CLS. Enable “Layout Shift Regions” in DevTools. Did the new feature add a dynamically loaded element that shifts existing content?

Fixes by root cause:
- Bundle size increase: lazy-load the new feature’s code.
- New blocking API call: move it behind a Suspense boundary or load data after initial paint.
- Large images: add responsive sizing, modern formats, explicit dimensions.
- New third-party script: load with async/defer, or move to a Web Worker.
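
The "diff the reports" step can be scripted. This is a rough sketch that assumes two Lighthouse JSON reports already parsed into objects; it reads only the `audits[id].numericValue` field, and the `diffReports` helper is a hypothetical name. Because audits use different units (milliseconds, bytes, unitless scores), the sorted output is a starting point for investigation, not a ranking.

```typescript
// Minimal slice of a Lighthouse JSON report: only the fields this sketch reads.
interface LighthouseReport {
  audits: Record<string, { numericValue?: number }>;
}

interface AuditDelta {
  id: string;
  before: number;
  after: number;
  delta: number;
}

// Lists audits whose numeric value grew between two reports, largest increase first.
// Audits missing from either report, or without a numeric value, are ignored.
function diffReports(before: LighthouseReport, after: LighthouseReport): AuditDelta[] {
  const deltas: AuditDelta[] = [];
  for (const id of Object.keys(after.audits)) {
    const prev = before.audits[id]?.numericValue;
    const next = after.audits[id]?.numericValue;
    if (prev === undefined || next === undefined) continue;
    if (next > prev) {
      deltas.push({ id, before: prev, after: next, delta: next - prev });
    }
  }
  return deltas.sort((a, b) => b.delta - a.delta);
}
```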
Part III — Testing & Quality
9. Frontend Testing Strategy
9.1 The Testing Trophy for Frontend
Kent C. Dodds’ Testing Trophy (not the test pyramid) is the right model for frontend testing:

| Level | Proportion | Tools | What to Test |
|---|---|---|---|
| Static analysis (base) | Always-on | TypeScript, ESLint, Prettier | Type errors, import errors, obvious bugs caught at compile time |
| Unit tests | 15-20% | Vitest, Jest | Pure functions, hooks with complex logic, utility functions, state machines |
| Integration tests (largest) | 50-60% | Testing Library + Vitest/Jest | Component behavior from the user’s perspective — render, interact, assert |
| E2E tests (top) | 15-20% | Playwright, Cypress | Critical user journeys — signup, checkout, core workflow |
useState in isolation tells you almost nothing. What matters is: “When the user types in the search box and presses Enter, does the results list update?” That requires rendering the component, simulating user interaction, and asserting on the rendered output — an integration test. Frontend bugs live in the integration between components, not in individual functions.
9.2 What to Test (and What NOT to Test)
Test:
- User-facing behavior. “When I click ‘Add to Cart’, the cart count increases.”
- Error states. “When the API returns 500, the error message is displayed.”
- Accessibility. “The modal can be closed with the Escape key.”
- Edge cases in business logic. “When the discount exceeds the item price, the total is $0, not negative.”

Do not test:
- Implementation details. Do not test that useState was called with a specific value.
- Third-party library internals.
- Exact snapshot matching of large component trees.
- Pixel-perfect layouts. Use visual regression testing (Chromatic, Percy) instead.
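
The discount edge case from the "Test" list is the kind of pure business logic that deserves a direct unit test. Below is a small sketch, assuming prices are stored as integer cents; `totalAfterDiscount` and the `check` helper are hypothetical, and in a real suite the checks would be Vitest or Jest test cases.

```typescript
// Hypothetical pricing helper: amounts are integer cents to avoid float rounding.
function totalAfterDiscount(priceCents: number, discountCents: number): number {
  if (priceCents < 0 || discountCents < 0) {
    throw new RangeError("price and discount must be non-negative");
  }
  // Clamp at zero so an oversized discount never yields a negative total.
  return Math.max(0, priceCents - discountCents);
}

// Framework-free assertion so the sketch runs on its own.
function check(actual: number, expected: number, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

check(totalAfterDiscount(2000, 500), 1500, "partial discount");
check(totalAfterDiscount(2000, 2000), 0, "discount equals price");
check(totalAfterDiscount(2000, 3500), 0, "discount exceeds price");
```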
9.3 E2E Testing: Playwright vs Cypress
| Feature | Playwright | Cypress |
|---|---|---|
| Multi-browser | Chromium, Firefox, WebKit | Chromium, Firefox, WebKit (experimental, v10.8+) |
| Multi-tab/origin | Yes | Limited |
| Parallel execution | Built-in | Requires paid Cypress Cloud |
| Speed | Faster (out-of-process) | Slower (in-browser) |
| DX for debugging | Trace viewer, codegen | Time-travel debugging (excellent) |
| CI cost | Free, self-hosted | Free tier limited, Cloud is paid |
10. Accessibility Engineering
Accessibility is not a feature — it is a quality of your engineering. A button that cannot be activated with a keyboard is a broken button for millions of users.
10.1 WCAG 2.1 AA — What You Actually Need to Know
The non-negotiable AA requirements for frontend engineers:
- Color contrast: Normal text must have a contrast ratio of at least 4.5:1. Large text must have at least 3:1.
- Keyboard navigation: Every interactive element must be reachable and operable with a keyboard.
- Focus indicators: When an element receives keyboard focus, there must be a visible indicator. Use :focus-visible to show focus rings only for keyboard navigation.
- Alt text for images: Every <img> must have an alt attribute. Decorative images need alt="".
- Form labels: Every input must have an associated <label>. Placeholder text is not a label.
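
The contrast thresholds above come from a concrete formula, so they can be checked in code. The sketch below follows the WCAG 2.1 definitions of relative luminance and contrast ratio; the function names and the `Rgb` tuple type are illustrative choices, and colors are assumed to be opaque 8-bit sRGB values.

```typescript
type Rgb = [number, number, number]; // 8-bit sRGB channels, 0-255

// WCAG 2.1 relative luminance of an sRGB color.
function relativeLuminance([r, g, b]: Rgb): number {
  const linear = (channel: number): number => {
    const c = channel / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b);
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

// AA requires 4.5:1 for normal text and 3:1 for large text.
function meetsAA(foreground: Rgb, background: Rgb, largeText = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}
```

A check like this fits well in a design-token pipeline, where every foreground/background token pair can be validated before release.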
10.2 ARIA Patterns That Matter
The first rule of ARIA: do not use ARIA if a native HTML element does the job. A <button> is always better than <div role="button">.
10.3 Focus Management in SPAs
In SPAs, route changes do not trigger a full page load, so the browser does not reset focus. Without explicit focus management, screen reader users have no idea the page changed. The fix: On route change, move focus to the main content heading and announce the navigation.
10.4 Legal Requirements
- ADA (Americans with Disabilities Act): Major lawsuits: Domino’s Pizza (2019), Target ($6M settlement in 2008).
- European Accessibility Act (EAA): Takes effect June 28, 2025. Requires private sector digital products sold in the EU to meet accessibility standards.
Interview Question: How do you ensure accessibility in a complex SPA with custom components?
<button> for clickable actions, <a> for navigation, <nav> for navigation landmarks, <main> for primary content. These give you keyboard handling, screen reader semantics, and focus management for free.
2. Component level — ARIA when needed. For custom widgets, I follow the WAI-ARIA Authoring Practices Guide patterns.
3. Testing — Automated and manual. I run axe-core for automated checks (catches ~30-40% of issues). For the remaining 60-70%, I do manual testing: keyboard-only navigation, screen reader testing with VoiceOver or NVDA, and color contrast verification.
4. Process — Built into the workflow. Accessibility is a PR review checklist item, not a quarterly audit. Every component in Storybook has an accessibility panel via @storybook/addon-a11y.
Common mistakes:
- Saying “we use an accessibility overlay” (widely criticized as ineffective)
- Only testing with automated tools
- Not knowing the difference between role, aria-label, aria-labelledby, and aria-describedby
11. Design Systems
11.1 Token-Based Design
Design tokens are the atomic values of a design system — colors, spacing, typography — stored in a format-agnostic way and transformed into platform-specific outputs using tools like Style Dictionary or Tokens Studio.
11.2 Component API Design
- Composability over configuration. Prefer <Card><CardHeader /><CardBody /></Card> over <Card headerTitle="..." bodyContent="..." />.
- Sensible defaults, explicit overrides. A <Button> should look correct with zero props.
- Consistent prop naming. If <Button> uses variant, so should <Badge>, <Alert>, and <Tag>.
- Forward refs and spread rest props. Components should forward ref and spread additional HTML attributes.
11.3 Versioning and Adoption
Design systems should use semantic versioning. Adoption strategies: lint rules flagging non-system components, codemods for API migrations, Storybook as documentation, and a dedicated team treating the design system as a product.
Part IV — Frontend System Design
12. Frontend System Design Interview Patterns
12.1 Design a Real-Time Collaborative Text Editor
Full System Design Walkthrough
- Rendering engine: ProseMirror (used by The New York Times) or Slate.js. Do NOT build on raw contenteditable.
- Conflict resolution: CRDTs using Yjs or Automerge. CRDTs allow concurrent edits to merge automatically without a central server. Why CRDTs over OT: CRDTs work peer-to-peer and support offline editing with automatic merge on reconnect.
- Real-time transport: WebSocket connection relaying edits between clients.
- State architecture: Document state (CRDT), awareness state (cursor positions via Yjs awareness), UI state (local component state).
- Offline support: Yjs persists to IndexedDB. Offline edits merge automatically on reconnect because CRDTs are commutative.
- Performance: Virtualize rendering for large documents. Debounce awareness updates to 50-100ms. Batch CRDT updates for persistence.

Follow-up questions:
- “How do you handle undo/redo with multiple users?” (Local undo stack for your own operations.)
- “What happens when a user joins and the document is 50MB?” (Compressed snapshot, not full operation history.)
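
To make the "merge automatically because CRDTs are commutative" claim concrete, here is a sketch of one of the simplest CRDTs, a grow-only counter. It is far simpler than the sequence CRDTs that Yjs or Automerge use for text and is not their algorithm; it only illustrates why merge order does not matter. All names are hypothetical.

```typescript
// Grow-only counter (G-Counter): one slot per replica, merged by per-slot maximum.
type GCounter = Record<string, number>;

// A replica only ever increments its own slot.
function increment(counter: GCounter, replicaId: string, by = 1): GCounter {
  return { ...counter, [replicaId]: (counter[replicaId] ?? 0) + by };
}

// Merge is commutative, associative, and idempotent, so replicas can sync
// in any order, any number of times, and still converge.
function merge(a: GCounter, b: GCounter): GCounter {
  const merged: GCounter = { ...a };
  for (const id of Object.keys(b)) {
    merged[id] = Math.max(merged[id] ?? 0, b[id] ?? 0);
  }
  return merged;
}

function value(counter: GCounter): number {
  return Object.keys(counter).reduce((sum, id) => sum + (counter[id] ?? 0), 0);
}

// Two replicas edit while offline, then sync in either order.
const alice = increment(increment({}, "alice"), "alice");
const bob = increment({}, "bob", 3);
const aliceThenBob = merge(alice, bob);
const bobThenAlice = merge(bob, alice);
```

Both merge orders produce the same state, which is the property that lets offline edits reconcile on reconnect without a coordinating server.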
12.2 Design an Infinite Scroll Feed
Full System Design Walkthrough
- Virtualization: Only render visible items plus a buffer. Use react-virtuoso or @tanstack/react-virtual.
- Data fetching: Cursor-based pagination. IntersectionObserver on a sentinel element triggers the next page fetch.
- New content: “N new posts” banner at top — do NOT auto-prepend while scrolling (causes CLS).
- Image/video: Responsive srcset, lazy loading, blur-up placeholders. Videos use preload="none", autoplay only in viewport.
- Memory management: Remove items far from viewport. Keep ~500 most recent, refetch older items on scroll-back.
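
The core of virtualization is a small calculation: which item indices intersect the viewport, plus a buffer. The sketch below assumes every item has the same fixed height, which real feeds rarely do; libraries such as the ones named above also handle measured, variable heights. `visibleRange` and its parameters are hypothetical names.

```typescript
interface VisibleRange {
  start: number; // first index to render (inclusive)
  end: number; // last index to render (exclusive)
}

// Computes which items to render for a fixed-height list.
// `overscan` adds extra items above and below the viewport to hide blank flashes during fast scrolls.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  itemHeight: number,
  itemCount: number,
  overscan = 3,
): VisibleRange {
  const firstVisible = Math.floor(scrollTop / itemHeight);
  const endVisible = Math.ceil((scrollTop + viewportHeight) / itemHeight);
  return {
    start: Math.max(0, firstVisible - overscan),
    end: Math.min(itemCount, endVisible + overscan),
  };
}
```

With 10,000 items of 100px in a 600px viewport, only around a dozen rows are mounted at any scroll position, regardless of list length.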
12.3 Design a Complex Form Wizard
Full System Design Walkthrough
- State machine (XState): Each step is a state with defined transitions. Conditional steps are guards on transitions.
- Form library: React Hook Form with per-step Zod schemas.
- Persistence: sessionStorage on every field change (debounced). Hydrate on page load.
- Accessibility: Each step is a <fieldset> with <legend>. Focus moves to first field on step navigation. Error messages via aria-describedby.
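
The "steps as states, conditional steps as guards" idea can be shown without any library. This is a hand-rolled sketch, not XState's API; the step names, `WizardData` shape, and `nextStep` helper are all hypothetical.

```typescript
type Step = "account" | "business" | "payment" | "review";

interface WizardData {
  accountType: "personal" | "business";
}

interface Transition {
  to: Step;
  // Optional guard: the transition is only taken when it returns true.
  guard?: (data: WizardData) => boolean;
}

// Transitions are checked in order, so the guarded (conditional) step comes first.
const transitions: Record<Step, Transition[]> = {
  account: [
    { to: "business", guard: (data) => data.accountType === "business" },
    { to: "payment" },
  ],
  business: [{ to: "payment" }],
  payment: [{ to: "review" }],
  review: [],
};

// Returns the target of the first transition whose guard passes,
// or the current step when no transition applies (the final step).
function nextStep(current: Step, data: WizardData): Step {
  const match = transitions[current].find((t) => !t.guard || t.guard(data));
  return match ? match.to : current;
}
```

Keeping the flow in a table like this makes illegal transitions unrepresentable and lets the conditional logic be tested without rendering any UI.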
Interview Question: Design a frontend architecture for a dashboard with real-time updating charts
requestAnimationFrame render. Pause off-screen charts via IntersectionObserver.
The senior nuance: “The most common mistake is updating the DOM on every incoming message. If you get 50 messages/second across 10 charts, that is 500 render cycles/second — impossible at 60fps. The solution is always batching.”
13. Browser Internals That Matter
13.1 The Event Loop
- Macrotask queue: setTimeout, setInterval, I/O callbacks.
- Microtask queue: Promise.then/catch/finally, queueMicrotask, MutationObserver.
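
A short experiment makes the difference between the two queues visible. The sketch below records the order in which synchronous code, two microtasks, and one macrotask run; `observeOrder` is a hypothetical helper written for this illustration.

```typescript
// Records the order in which synchronous code, microtasks, and a macrotask run.
function observeOrder(): Promise<string[]> {
  const order: string[] = [];
  return new Promise((resolve) => {
    // Macrotask: runs only after the current task and all pending microtasks finish.
    setTimeout(() => {
      order.push("macrotask: setTimeout");
      resolve(order);
    }, 0);
    // Microtasks: run in FIFO order as soon as the current synchronous code completes.
    Promise.resolve().then(() => {
      order.push("microtask: promise.then");
    });
    queueMicrotask(() => {
      order.push("microtask: queueMicrotask");
    });
    // Synchronous code always runs first.
    order.push("sync");
  });
}
```

The resolved array lists the synchronous entry first, then both microtasks in the order they were queued, and the timer callback last, even though the timer was scheduled before either microtask.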
13.2 The Critical Rendering Path
Parse CSS -> Build CSSOM tree
<head> is parsed.
13.3 Reflow vs Repaint
Reflow: Recalculates geometry. Triggered by changing dimensions, adding/removing elements, reading layout properties. Expensive.
Repaint: Redraws pixels without changing geometry. Triggered by changing color, background, visibility. Cheaper.
Composite-only: transform and opacity are handled by the GPU without reflow or repaint. Only animate these for 60fps.
13.4 How V8 Optimizes JavaScript
V8 has a multi-tier pipeline: Ignition (interpreter) -> Sparkplug (baseline) -> Maglev (mid-tier) -> TurboFan (optimizing). Hot functions get progressively more optimized. If type assumptions are violated, V8 deoptimizes back to slower code.
Interview Question: Explain why CSS animations on 'transform' are faster than animations on 'left/top'
left triggers reflow (recalculate geometry) + repaint on every frame. At 60fps, that is 60 reflows per second.
Animating transform: translate() moves the element on a compositor layer handled by the GPU. No reflow, no repaint, no main thread involvement. The main thread is free for JavaScript.
Words that impress: “compositor layer,” “layout thrashing,” “the 16ms frame budget,” “main thread contention.”
14. Security in the Browser
14.1 XSS Prevention
Types: Stored, Reflected, DOM-based.
Prevention layers:
- Output encoding: React’s JSX auto-escapes by default. dangerouslySetInnerHTML bypasses this.
- Content Security Policy (CSP): HTTP header restricting script sources. script-src 'self' 'nonce-abc123'.
- Sanitization: DOMPurify for user-provided HTML.
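
To show what output encoding actually does, here is a minimal escaping function. It is an illustration of the idea, not React's implementation, and it is only appropriate for HTML text content and quoted attribute values; URLs, inline scripts, and CSS need their own context-specific handling. `escapeHtml` is a hypothetical name.

```typescript
// Characters that can change how an HTML parser interprets text or quoted attributes.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replaces each special character with its entity so the browser renders it
// as text instead of parsing it as markup.
function escapeHtml(untrusted: string): string {
  return untrusted.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}
```

After escaping, a payload such as an img tag with an onerror handler is displayed as literal text, so the handler never executes.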
14.2 CSRF, CORS, and Cookie Security
Cookie security attributes:
- HttpOnly — Not accessible from JavaScript.
- Secure — Only sent over HTTPS.
- SameSite=Strict — Never sent on cross-site requests.
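
On the server, these attributes end up as parts of a Set-Cookie header value. The sketch below assembles one with all three attributes applied by default; `buildSessionCookie` and `CookieOptions` are hypothetical names, and most frameworks provide their own helper for this.

```typescript
interface CookieOptions {
  maxAgeSeconds?: number;
  path?: string;
  sameSite?: "Strict" | "Lax" | "None";
}

// Builds a Set-Cookie header value that always includes HttpOnly and Secure,
// and defaults SameSite to Strict.
function buildSessionCookie(name: string, value: string, options: CookieOptions = {}): string {
  const { maxAgeSeconds, path = "/", sameSite = "Strict" } = options;
  const parts = [
    `${name}=${encodeURIComponent(value)}`,
    `Path=${path}`,
    "HttpOnly",
    "Secure",
    `SameSite=${sameSite}`,
  ];
  if (maxAgeSeconds !== undefined) {
    parts.push(`Max-Age=${maxAgeSeconds}`);
  }
  return parts.join("; ");
}
```

Making the secure attributes the default, rather than opt-in, means a forgotten option cannot silently produce a script-readable cookie.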
14.3 Third-Party Script Risks
Mitigation: Subresource Integrity (SRI) hashes, iframe sandboxing, CSP script-src restrictions.
Interview Question: Your application needs to render user-generated HTML safely. How?
<iframe> with sandbox="".
The nuance: The allowlist is the hard part. Think about javascript: URLs, onerror handlers, CSS injection, SVG XSS. DOMPurify handles all of these by default.
Part V — Modern Frontend & Career
15. Modern Frontend Trends
15.1 Edge Rendering
Server rendering at CDN edge nodes (~5-20ms vs ~50-200ms from origin). Trade-off: restricted runtime (no Node.js APIs). Database access via HTTP-based APIs (PlanetScale, Neon, Turso).
15.2 Islands Architecture
Render static HTML, hydrate only interactive components. 80-90% of a content site is static — islands architecture ensures you only pay JavaScript cost for the interactive parts. Popularized by Astro.
15.3 Resumability (Qwik)
Serializes application state into HTML. No hydration step. JavaScript loaded on interaction via global event delegation. Trade-off: first interaction has a module-loading latency cost.
15.4 WebAssembly in the Browser
Near-native speed for compiled code. Use cases: image processing (Squoosh), PDF rendering, data visualization, cryptography, game engines. Limitation: cannot directly access the DOM.
15.5 AI-Assisted UI Development
v0 (Vercel) for component generation. Figma-to-code tools. AI-assisted testing. AI excels at boilerplate (80% of UI code), struggles with the hard 20% (accessibility, edge cases, performance, state management).
16. Cross-Chapter Connections
16.1 Frontend and API Design
REST: Multiple requests, waterfall risk, over/under-fetching.
GraphQL: Exact data in one query. Complexity shifts to server.
tRPC: End-to-end TypeScript type safety. Best for full-stack TypeScript teams.
16.2 CDN Strategy
Static assets: immutable caching with content hashing. HTML: varies by rendering strategy. Edge functions for personalization at CDN speed.
16.3 Authentication Flows
SPA auth: Access tokens in memory, refresh tokens in HttpOnly cookies. NEVER store tokens in localStorage.
OAuth: Redirect flow with backend code exchange (client secret must not be exposed).
SSR auth: Session cookies read directly by server components.
16.4 Frontend Observability
Error tracking (Sentry, Bugsnag), RUM (Core Web Vitals, interaction data), and custom performance marks via the Performance API. Source maps for readable production error reports.
Interview Questions Compendium
Entry-Level Questions
What is the Virtual DOM and why do frameworks use it?
- Saying “the Virtual DOM makes React fast” (it makes React fast enough while enabling a declarative model)
- Confusing Virtual DOM with shadow DOM (shadow DOM is browser-native encapsulation for Web Components)
Explain the difference between localStorage, sessionStorage, and cookies
Mid-Level Questions
Walk me through your decision process for choosing between client-side and server state management
- Server data -> TanStack Query / SWR (cache with lifecycle management)
- URL state -> Router params/searchParams (shareable, survives refresh)
- Form state -> React Hook Form / local state
- Shared client state -> Zustand / Jotai / Context
- Local UI state -> useState/useReducer
How would you implement a performant search-as-you-type feature?
keepPreviousData (show previous results while loading), AbortController for request cancellation, and combobox ARIA patterns for accessibility.
Explain hydration and its problems
Senior-Level Questions
You're tech lead for a frontend platform serving 50M monthly users. Core Web Vitals are deteriorating. How do you approach this?
@next/bundle-analyzer, image optimization (Next.js <Image>, AVIF/WebP, preload LCP image), third-party script audit, TTFB investigation (N+1 data fetches in server components? Move to ISR? Edge rendering?).
Phase 4 — Sustain: Lighthouse CI in pipeline, bundle size checks, performance review in PRs, weekly CWV dashboard check.
The organizational insight: Performance deteriorated because nobody was accountable. The tech lead makes performance a first-class metric.
Compare micro-frontends with a modular monolith for 8 frontend teams
Design a frontend architecture for a product that needs to work offline
Interview: Your team is rolling out React Server Components across a large e-commerce site. The client bundle drops 40 percent but TTFB jumps from 180ms to 520ms. Product is ready to roll back. How do you respond?
Promise.all, push non-critical data to streamed Suspense boundaries so the shell flushes early, and introduce a per-request data-loader cache so three components asking for the same SKU hit the datastore once. Cache tags (Next.js revalidateTag or your framework equivalent) let you cache the server render at the edge for anonymous traffic, which kills the TTFB problem for 80 percent of visits.
Step 3 - Negotiate the rollout, don’t roll back wholesale: Keep RSC on routes where it won (product listings, category pages) and pin the still-slow personalized routes (cart, account) to client rendering behind a flag while you fix the waterfall. Report weekly: TTFB p75, LCP p75, JS bytes shipped, conversion delta. A partial rollout that preserves the bundle win is almost always better than the all-or-nothing rollback product is asking for.
Real-World Example: Vercel’s own case studies on Next.js App Router migrations (2023-2024) show the same pattern: large bundle wins, initial TTFB regressions, then recovery once teams adopt parallel data fetching and partial prerendering. Shopify’s Hydrogen team has publicly discussed the same waterfall trap when moving storefronts to server-first rendering.
Senior Follow-up Questions:
- “How do you decide what belongs in a server component versus a client component in a design system?” - Strong answer: Leaf interactive widgets (modals, dropdowns, forms) are client; layout and data-bound presentational components are server. The rule is “serializability and interactivity” — if a prop must be a function or state, it crosses a client boundary. Document boundaries explicitly in the DS.
- “How would you detect a regression where a server component accidentally imports a client-only library?” - Strong answer: A CI check that bundles the server graph and fails on window, document, or known client-only packages. Next.js catches some of this with the "use client" directive but large teams need an explicit lint rule and a bundle size guardrail per route.
- “What is your p99 TTFB budget for an RSC route and how do you enforce it?” - Strong answer: Budget per route tier (critical checkout: 200ms p75, 400ms p99; content pages: 400ms/800ms). Enforce via synthetic checks in CI, RUM alerting on a rolling 1-hour window, and a release gate that blocks deploys if the staging p99 regresses beyond 10 percent.
Common wrong answers:
- “Just roll back to pages router, RSC is not production-ready” - why it fails: Throws away a real bundle win and signals to leadership you can’t diagnose server performance. The regression is almost always a data-fetching pattern, not the framework.
- “Add more caching” - why it fails: Caching personalized pages without thinking about cache keys causes data leaks and stale-content incidents far worse than a TTFB regression. You need a cache strategy, not “more caching.”
Further reading:
- “Making Next.js App Router 50% Faster” - Vercel engineering blog (2024)
- React Server Components RFC - reactjs/rfcs on GitHub
- Related chapter: “Rendering Strategies” earlier in this file
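
The two server-side fixes discussed in this answer, parallel fetching with Promise.all and a per-request data-loader cache, can be sketched in a few lines. This is an illustration under simple assumptions, not a framework API: `createRequestLoader`, `loadProductPage`, and the key format are hypothetical, and the fetcher stands in for whatever datastore call the route makes.

```typescript
type Fetcher<T> = (key: string) => Promise<T>;

// Per-request loader: repeated or concurrent calls for the same key share one promise.
// Create a new loader for every incoming request so cached data never leaks between users.
function createRequestLoader<T>(fetcher: Fetcher<T>): Fetcher<T> {
  const inFlight = new Map<string, Promise<T>>();
  return (key) => {
    let pending = inFlight.get(key);
    if (!pending) {
      pending = fetcher(key);
      inFlight.set(key, pending);
    }
    return pending;
  };
}

// Hypothetical page loader: independent fetches start together instead of
// awaiting one after another, which removes the sequential waterfall.
async function loadProductPage(sku: string, load: Fetcher<string>) {
  const [product, price, related] = await Promise.all([
    load(sku), // product details
    load(sku), // a second component asking for the same SKU reuses the first call
    load(`related:${sku}`),
  ]);
  return { product, price, related };
}
```

Here three components request data but the underlying fetcher runs only twice, and total latency is bounded by the slowest call rather than the sum of all calls.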
Interview: You are shipping a breaking change to the design system's Button component used in 2,400 places across 12 apps. How do you ship it without breaking production?
Interview: After an SSR framework upgrade, you ship a hydration regression. LCP is fine but INP regresses from 180ms to 340ms on mobile. How do you find and fix it?
PerformanceObserver with event entries plus the Long Animation Frames API will tell you which interaction on which route is slow. Most “hydration-caused” INP regressions are actually long tasks on the main thread during the first few seconds after load — hydration work blocks the click handler even though the pixels are visible.
Step 2 - Find the long task, not the framework: Capture a mobile CPU-throttled trace in Chrome DevTools of a real user journey. Look for long tasks over 50ms in the first 5 seconds. Common culprits after an SSR upgrade: the framework now hydrates eagerly instead of lazily, a third-party script is being evaluated earlier, or a large client component was moved above the fold. React 18’s selective hydration helps, but only if your tree is actually split by Suspense boundaries; many codebases have one giant client root that hydrates as a single blocking unit.
Step 3 - Fix by splitting, deferring, and measuring: Split the client tree with Suspense boundaries around non-critical interactive regions (comments, recommended products, chat widget) so they hydrate after the primary content. Defer third-party scripts with next/script strategy="lazyOnload" or equivalent. For heavy leaf components, lazy-hydrate on visibility or interaction. Re-measure INP p75 on mobile after each change. Set an INP budget (200ms p75) as a CI gate using Lighthouse CI or WebPageTest.
Real-World Example: The Chrome Aurora team’s 2024 case studies on INP optimization at sites like eBay and Shopify show the exact pattern: hydration was the visible symptom, but the fix was almost always reducing main-thread work in the first 5 seconds. Vercel’s own partial prerendering work was motivated by the same class of regression.
Senior Follow-up Questions:
- “INP is a p75 metric. Why does that matter versus p50?” - Strong answer: p50 hides the regression entirely. Most interactions are fast; the bad ones happen during hydration or on low-end devices. INP’s “worst of the user session” design forces you to fix tail latency, which is what users remember.
- “What if the regression is caused by a third-party analytics script you can’t remove?” - Strong answer: Move it to a web worker using Partytown or the vendor’s own off-main-thread mode. If neither exists, defer it to requestIdleCallback after first interaction and accept the analytics loss from bounced sessions — product will prefer that to a 40 percent INP regression.
- “How would you prevent this regression class from landing again?” - Strong answer: CI budget on total JS execution time in the first 5 seconds (via WebPageTest or Calibre), a synthetic check that measures INP on a mid-tier Android profile, and a PR-level bundle diff that flags any new client component over 20KB gzipped.
Common wrong answers:
- “Just use use client less” - why it fails: Shows you don’t understand hydration. A server component tree that ends in one giant client boundary hydrates exactly as slowly as a fully client tree. The fix is splitting the tree, not reducing client components.
- “Upgrade to the newest React/Next canary, they fixed it” - why it fails: Canaries in production are a root-cause-avoidance tactic, not a fix. You still need to understand what changed and why.
Further reading:
- “Optimize Interaction to Next Paint” - web.dev (Philip Walton)
- “The cost of hydration” - Addy Osmani’s writings on partial and progressive hydration
- Related chapter: “Hydration Failures — Diagnosis, Root Causes, and Production Fixes” later in this file
Additional Questions
How do you handle error boundaries in React?
Controlled vs uncontrolled components -- when would you use each?
How would you migrate a large Angular app to React?
Tailwind CSS vs CSS Modules for a new project?
Follow-Up Question Handling
Buying Time Gracefully
- “That’s a great question — let me think about the rendering implications for a moment.”
- “Before I answer, are we optimizing for Time to Interactive or Largest Contentful Paint? The solutions are different.”
- “Let me trace through the browser’s behavior step by step…”
- “I want to be precise here rather than give a hand-wavy answer.”
Redirecting to Strength
- “I haven’t worked deeply with [framework], but the underlying problem works the same way. In React, I would…”
- “I know the observable behavior is [X], and I’ve used that to [optimization]. The exact V8 internals — I’d want to verify.”
Admitting Gaps with Confidence
- “I haven’t hit that edge case in production, but my instinct is [X] because [reasoning]. I’d validate by [investigation step].”
- “That’s at the boundary of my knowledge. What I do know is [related thing], and I’d learn this by [reading the spec / building a minimal repro].”
Professional Best Practices Checklist
Before — Planning and Setup
- Define rendering strategy (CSR/SSR/SSG/ISR) based on SEO, freshness, and personalization requirements
- Establish performance budget with numeric targets for LCP, INP, CLS, bundle size
- Set up TypeScript in strict mode
- Choose state management strategy by categorizing all state types
- Configure ESLint with accessibility rules (eslint-plugin-jsx-a11y)
- Set up Lighthouse CI and bundle size checks from day one
During — Execution
- Test from user perspective using Testing Library queries (getByRole, getByText)
- Lazy-load below-the-fold content and heavy dependencies
- Use semantic HTML first, ARIA second
- Set explicit dimensions on images and video
- Profile before optimizing — use React DevTools Profiler before adding useMemo
After — Review and Monitoring
- Monitor Core Web Vitals in production (RUM), segmented by device and geography
- Review error tracking weekly — fix top 3 errors by frequency
- Audit dependencies quarterly for security, unused packages, and bloat
- Run accessibility audits with keyboard and screen reader, not just automated tools
- Track bundle size trends over time
When Things Go Wrong
- JS errors in production: Check error tracker for stack trace, affected count, session replay. New deployment regression? Rollback.
- CWV regression: Check CrUX dashboard, compare deployments, audit third-party scripts.
- Blank page: JavaScript error preventing rendering. Check console. Common causes: uncaught promise rejection, failed module import.
- Hydration mismatch: Compare server/client output. Common causes: browser extensions, window checks, timezone differences.
Above & Beyond
Advanced Techniques
1. Partial Pre-rendering (PPR) — Next.js 14+: Static shell served instantly from CDN, dynamic holes stream in from server. SSG-speed initial paint with SSR-level dynamism.
2. View Transitions API: Browser-native page transition animations. Smooth crossfade and morph transitions without JavaScript animation libraries.
3. Signals: Fine-grained reactivity primitive (Angular v16+, Solid, Preact, TC39 proposal). Track exactly which DOM nodes depend on which values. Eliminates need for useMemo/useCallback/React.memo.
4. Server Actions (React / Next.js): Call server functions directly from client components without API endpoints. Eliminates boilerplate for form submissions and mutations.
5. Container Queries: CSS queries based on container size, not viewport. Enables truly reusable responsive components.
Cross-Domain Connections
- Frontend ↔ Distributed Systems: CRDTs, eventual consistency, and optimistic updates are direct applications of distributed systems theory.
- Frontend ↔ Networking: TCP, TLS, HTTP/2 multiplexing, and HTTP/3 QUIC directly inform performance optimization.
- Frontend ↔ Operating Systems: The browser event loop is analogous to an OS scheduler. Web Workers mirror thread offloading.
- Frontend ↔ Product Thinking: Every rendering/performance/accessibility decision is ultimately a product decision.
Emerging Trends (2025-2027)
- React Compiler (React Forget): Automatic memoization, eliminating manual useMemo/useCallback.
- TC39 Signals proposal: Native JavaScript reactivity primitive standardizing framework reactivity.
- AI-powered testing: Session replay + AI generating E2E test suites from real user behavior.
- WebGPU: Modern GPU API for compute shaders, ML inference, and graphics in the browser.
- Multi-page app transitions: View Transitions API for cross-document navigations.
Recommended Reading
Beginner
- web.dev Learn Performance — Best structured introduction to web performance fundamentals.
- React documentation (react.dev) — Genuinely excellent. Start with “Learn React.”
- Testing Library docs — The philosophy section is as important as the API docs.
Intermediate
- Patterns.dev by Lydia Hallie & Addy Osmani — Free book covering modern web development patterns.
- Web Performance in Action by Jeremy Wagner — Practical performance optimization.
- Inclusive Components by Heydon Pickering — Accessible component patterns.
Advanced
- High Performance Browser Networking by Ilya Grigorik (free) — Deep dive into TCP, TLS, HTTP/2, WebSocket from a browser perspective.
- CRDT resources by Martin Kleppmann — Essential for collaborative features.
- Chrome DevTools Performance docs — The frontend equivalent of learning to read a database EXPLAIN plan.
Part VI — Production Frontend Engineering
17. Hydration Failures — Diagnosis, Root Causes, and Production Fixes
Hydration mismatches are the most subtle class of frontend bugs because they only manifest when server and client environments diverge. The server renders HTML based on one reality, the client tries to reconcile against a different one, and React either silently patches the DOM (React 17) or throws an error and falls back to full client-side rendering (React 18+). Both outcomes are bad — silent patching means the UI is wrong, and client-side fallback means you lose all SSR benefits for that render.
17.1 Common Hydration Failure Patterns
Pattern 1 — Date/Time Divergence
The server renders at UTC, the client renders at the user’s local timezone. A timestamp that reads “April 11, 2026” on the server becomes “April 12, 2026” for a user in Tokyo.
suppressHydrationWarning on the <body> or <html> element, and ensuring error boundaries catch hydration fallback gracefully.
Pattern 3 — Conditional Rendering on window or navigator
Calling Math.random(), Date.now(), or crypto.randomUUID() during render produces different output on server and client. The fix is always the same: generate the value once on the server and pass it as a prop or serialized state.
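The date/time divergence in Pattern 1 can be avoided by never letting the runtime pick the time zone or locale. The sketch below is one way to do that with the standard `Intl.DateTimeFormat` API; the function name `formatDateForHydration` and its defaults are illustrative assumptions, not part of any framework.

```typescript
// Formats an instant with an explicit locale and time zone, so the server
// and the browser produce the same string during hydration.
function formatDateForHydration(
  isoInstant: string,
  timeZone: string = "UTC",
  locale: string = "en-US",
): string {
  const formatter = new Intl.DateTimeFormat(locale, {
    timeZone, // never rely on the runtime's default zone
    year: "numeric",
    month: "long",
    day: "numeric",
  });
  return formatter.format(new Date(isoInstant));
}

const instant = "2026-04-11T20:00:00Z";
const renderedInUtc = formatDateForHydration(instant); // "April 11, 2026"
const renderedInTokyo = formatDateForHydration(instant, "Asia/Tokyo"); // "April 12, 2026"
```

If the product needs the user's local date, one option is to render the UTC string on the server and swap in the localized value after hydration, so the first client render still matches the server HTML.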
17.2 Debugging Hydration Failures in Production
Capture the mismatch
error.message to Sentry or Datadog.
Reproduce with SSR disabled
Diff server and client HTML
curl the page), then compare against the client’s document.documentElement.outerHTML after hydration. The diff reveals exactly which element diverged.
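As a rough illustration of the diff step, the following sketch finds the first position at which two HTML strings differ and returns some context around it. The name `firstDivergence` is hypothetical. Real markup usually needs normalizing first (attribute order, whitespace, comment nodes), so treat this as a starting point rather than a complete tool.

```typescript
interface Divergence {
  index: number;
  serverSnippet: string;
  clientSnippet: string;
}

// Returns the first position where the two strings differ, with a little
// surrounding context, or null when they are identical.
function firstDivergence(
  serverHtml: string,
  clientHtml: string,
  context: number = 20,
): Divergence | null {
  const limit = Math.min(serverHtml.length, clientHtml.length);
  let index = 0;
  while (index < limit && serverHtml[index] === clientHtml[index]) index++;
  if (index === limit && serverHtml.length === clientHtml.length) return null;
  const start = Math.max(0, index - context);
  return {
    index,
    serverSnippet: serverHtml.slice(start, index + context),
    clientSnippet: clientHtml.slice(start, index + context),
  };
}
```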
Interview Question: Your SSR application works perfectly in development but users report seeing a flash of wrong content followed by a page reload in production. What is happening and how do you fix it?
- Check the browser console for React hydration warnings — they tell you exactly which element mismatched.
- The fact that it works in dev but not prod strongly suggests an environmental divergence: timezone differences (server in UTC, client in local time), CDN serving stale HTML, or feature flags evaluating differently on server vs client.
- Check if the issue is user-specific — if only some users see it, it is likely browser extensions, locale differences, or A/B test assignment diverging between server and client.
useEffect so it does not participate in hydration.
Red flag answers:
- “Just add suppressHydrationWarning everywhere” — this hides the bug, it does not fix it
- Not knowing that React 18 falls back to full CSR on mismatch instead of silently patching
Follow-up questions:
- “How would you prevent hydration mismatches from reaching production in the first place?” (E2E tests that compare server-rendered HTML with post-hydration DOM, CI checks with headless Chrome, monitoring hydration error rates as a deployment metric.)
- “What is the performance cost of a hydration bailout?” (Full CSR fallback means the user downloads all JS, the page flashes, and LCP/CLS both suffer. On mobile, this can add 3-5 seconds to interactive.)
18. SSR, RSC, and Client Boundary Trade-offs
18.1 The Boundary Decision Framework
In the Next.js App Router model, every component is a Server Component by default. Adding "use client" creates a boundary: that component and every module it imports become Client Components. This boundary placement is the single most consequential architectural decision in a modern Next.js application.
The trade-off matrix:
| Factor | Server Component | Client Component |
|---|---|---|
| Bundle size contribution | Zero — code never ships to client | Full — code included in client bundle |
| Data access | Direct database/filesystem access | Must go through API or server action |
| Interactivity | None — no useState, useEffect, event handlers | Full interactivity |
| Hydration cost | None | Full hydration required |
| Rendering latency | Adds to TTFB (server must execute) | Adds to TTI (client must execute + hydrate) |
| Caching | Can be cached at CDN/edge level | Client JS is cached via standard HTTP caching |
| Access to browser APIs | No window, document, navigator | Full browser API access |
18.2 Common Boundary Mistakes
Mistake 1 — Putting "use client" too high in the tree
useEffect or TanStack Query. Server-fetched data is available on first render — no loading state, no waterfall, no CLS from content popping in.
18.3 The RSC Mental Model for Senior Engineers
Think of RSC as a compilation boundary, not a rendering strategy. Server Components are not “SSR” — they are components that compile to serialized output and never exist as JavaScript in the browser. The client runtime receives a description of what the Server Component produced, not the code that produced it. This means:
- Server Components can be cached independently from client components
- Server Components can be streamed — the client can start rendering Client Components while Server Components are still executing
- Server Components and Client Components can be interleaved in the tree — a Server Component can render a Client Component child, which can render a Server Component grandchild (via the children prop pattern)
Interview Question: You are designing a product detail page in Next.js App Router. Walk me through where you place the client boundary and why.
"use client" only on interactive leaves.
Server Components (no JS shipped):
- Product description, specs, breadcrumbs, SEO metadata, related products, footer — all pure display
- Data fetching happens here, directly from the database or CMS, with no API round-trip
- Markdown/rich-text rendering happens here — libraries like remark never ship to the client
Client Components (interactive leaves):
- Add-to-cart button (needs onClick and state)
- Image carousel/gallery (needs swipe gestures and state)
- Quantity selector (needs onChange)
- Review submission form (needs form state and validation)
- Wishlist toggle (needs optimistic update)
The "use client" boundary should be as low in the tree as possible. I would never put "use client" on the page component itself. The interactive pieces are leaf nodes.
Red flag answers:
- Putting "use client" at the page level “because it’s easier”
- Not knowing that Server Components can have Client Component children
- Fetching data with useEffect in a Client Component when a parent Server Component could pass it as a prop
Follow-up questions:
- “What happens when a Client Component needs data that changes based on user interaction — like filtering reviews by star rating?” (The filter UI is a Client Component with local state. The filtered data can be fetched via a server action or TanStack Query. If the reviews dataset is small enough, pass all reviews from the Server Component and filter on the client.)
- “How does caching work differently for Server Components vs Client Components?” (Server Component output is cached at the RSC payload level — Next.js can revalidate individual server component segments. Client Component JS is cached via standard HTTP caching with content hashes.)
19. Frontend Observability — Beyond Error Tracking
Frontend observability is not just “install Sentry.” At scale, you need to answer: What are users actually experiencing? Where is the bottleneck — network, server, client JS, third-party scripts, or the user’s device? Can I correlate a frontend symptom with a backend root cause?
19.1 The Four Pillars of Frontend Observability
Pillar 1 — Error Monitoring
Capture JavaScript exceptions, unhandled promise rejections, and React error boundary catches. Source maps are non-negotiable — without them, minified stack traces are useless. Group errors by component stack, not just file/line (Sentry’s React integration does this automatically).
Key metrics to alert on:
- Error rate spike (> 2x baseline within 15 minutes) — likely a deployment regression
- New error type appearing for > 1% of sessions — likely a new bug
- Error concentrated in a single browser/OS combination — likely a compatibility issue
- Device class (high-end, mid-range, low-end)
- Connection type (4G, 3G, WiFi)
- Geography (CDN edge proximity matters)
- Page type (homepage vs product page vs checkout)
- User cohort (new vs returning, free vs paid)
fetch call through the API gateway, backend service, database query, and back. This is the only way to answer “is the problem frontend or backend?” definitively.
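One common way to carry a trace from the browser into backend services is the W3C Trace Context `traceparent` header. The sketch below only builds the header value; the helper names are hypothetical, and a production setup would more likely use an OpenTelemetry SDK than hand-rolled IDs.

```typescript
// Random lowercase hex string of the given length. Math.random keeps the
// sketch dependency-free; it is not cryptographically strong.
function randomHex(length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

// Builds a W3C Trace Context `traceparent` value:
// version "00", 32-hex trace id, 16-hex parent (span) id, flags "01" (sampled).
function createTraceparent(traceId: string = randomHex(32)): string {
  const spanId = randomHex(16);
  return `00-${traceId}-${spanId}-01`;
}

// Hypothetical helper: merges the header into the headers passed to fetch.
function withTraceHeaders(
  headers: Record<string, string>,
  traceId?: string,
): Record<string, string> {
  return { ...headers, traceparent: createTraceparent(traceId) };
}
```

Reusing one trace id for every request made during a page view (by passing `traceId` explicitly) lets the backend group all calls that belong to the same user action.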
19.2 Alerting Strategy for Frontend
| Signal | Threshold | Response |
|---|---|---|
| JS error rate | > 2x rolling 1-hour baseline | Page the on-call; check last deployment |
| LCP P75 | > 3.5s for 15 minutes | Investigate; likely CDN or backend TTFB issue |
| INP P75 | > 300ms for 15 minutes | Check for new long tasks; audit recent JS changes |
| CLS P75 | > 0.15 for 30 minutes | Check for new dynamically loaded content or ad changes |
| Hydration error rate | > 0.5% of page loads | Investigate server/client divergence; check CDN cache |
| API error rate (from frontend) | > 5% of requests to a specific endpoint | Correlate with backend monitoring; may be backend issue |
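
A few rows of the table above can be expressed as a small evaluation function. This is a sketch under simplifying assumptions: the samples passed in are already limited to the relevant time window, P75 uses the nearest-rank method, and the `WindowMetrics` shape is invented for illustration.

```typescript
// 75th percentile using the nearest-rank method on a sorted copy.
function p75(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(0.75 * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

interface WindowMetrics {
  lcpMs: number[]; // LCP samples from the alerting window
  inpMs: number[]; // INP samples from the alerting window
  hydrationErrors: number;
  pageLoads: number;
}

// Returns the names of the signals whose threshold is exceeded.
function triggeredAlerts(m: WindowMetrics): string[] {
  const alerts: string[] = [];
  if (p75(m.lcpMs) > 3500) alerts.push("LCP P75");
  if (p75(m.inpMs) > 300) alerts.push("INP P75");
  if (m.pageLoads > 0 && m.hydrationErrors / m.pageLoads > 0.005) {
    alerts.push("Hydration error rate");
  }
  return alerts;
}
```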
Interview Question: Your team has Sentry installed but keeps getting surprised by production issues users report. What is missing from your observability setup?
- RUM for performance degradation. A page that takes 8 seconds to load on mobile is not an “error” — Sentry will not catch it. You need RUM (Vercel Analytics, Datadog RUM, or web-vitals library reporting to your analytics) to monitor CWV in the field, segmented by device and connection.
- Session replay for UX bugs. A user clicks “Buy” and nothing happens because a race condition prevents the handler from firing. No error is thrown. Session replay lets you watch what the user experienced.
- Synthetic monitoring for availability. A Playwright test hitting your critical paths every 5 minutes from multiple regions catches outages before users report them.
- Frontend-to-backend trace correlation. When a user reports “the page is slow,” you need to know if the frontend is slow (heavy JS, layout thrashing) or the backend is slow (slow API response). Propagating trace IDs through fetch calls connects the dots.
- Custom business metrics. Track time-to-interactive for key flows (search-to-result, add-to-cart-to-confirmation), not just generic CWV. A checkout flow that takes 12 seconds is a revenue problem even if LCP is fine.
Red flag answers:
- “We have Sentry, so we’re covered”
- Not mentioning RUM or the distinction between lab and field metrics
- No awareness of distributed tracing across frontend/backend
Follow-up questions:
- “How do you avoid alert fatigue when monitoring both synthetic and RUM data?” (Separate alerts: synthetic failures are high-urgency pages, RUM regressions are investigated during business hours unless they cross critical thresholds. Use anomaly detection rather than static thresholds for RUM.)
- “How do you handle source maps in production securely?” (Upload source maps to Sentry/Datadog at build time, do not serve them publicly. Use artifact bundles with release versioning so stack traces resolve correctly across deployments.)
20. Auth and Session Edge Cases in the Browser
Authentication in SPAs and SSR applications has failure modes that backend engineers rarely consider. The browser is an adversarial environment — tokens expire mid-session, tabs share cookies, users have multiple accounts open, and third-party cookie restrictions break OAuth flows.
20.1 Token Lifecycle Edge Cases
Silent token refresh race condition: Two tabs open the same app. Both detect the access token is expired at the same time. Both send a refresh token request. If the backend invalidates the refresh token on use (rotation), the second tab’s request fails, logging the user out unexpectedly.
Fix: Use a BroadcastChannel or localStorage event listener to coordinate refresh across tabs. Only one tab performs the refresh; others wait for the new token.
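The "only one tab refreshes" rule can be sketched as a lease stored where every tab can see it. The version below abstracts the storage behind a hypothetical `SharedStore` interface (in a browser it might wrap localStorage) so the logic can be exercised outside a browser. A read followed by a write on localStorage is not atomic across tabs, so this narrows the race rather than eliminating it; the Web Locks API is one option when real mutual exclusion is needed.

```typescript
// Minimal key-value surface visible to all tabs.
interface SharedStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// In-memory stand-in for tests and for this sketch.
function createMemoryStore(): SharedStore {
  const data = new Map<string, string>();
  return {
    get: (key) => data.get(key),
    set: (key, value) => {
      data.set(key, value);
    },
  };
}

const LOCK_KEY = "auth:refresh-lock"; // hypothetical key name

// Returns true when this tab should perform the refresh. Tabs that get
// false should wait for the new token to be broadcast instead.
function tryAcquireRefreshLock(
  store: SharedStore,
  tabId: string,
  nowMs: number,
  ttlMs: number = 10000,
): boolean {
  const raw = store.get(LOCK_KEY);
  if (raw !== undefined) {
    const lock = JSON.parse(raw) as { owner: string; expiresAt: number };
    const heldByOther = lock.owner !== tabId && lock.expiresAt > nowMs;
    if (heldByOther) return false;
  }
  // The lease expires on its own, so a crashed tab cannot block refresh forever.
  store.set(LOCK_KEY, JSON.stringify({ owner: tabId, expiresAt: nowMs + ttlMs }));
  return true;
}
```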
20.2 SSR Auth Patterns and Pitfalls
The cookie-forwarding problem: In SSR, the server makes API calls on behalf of the user. The user’s auth cookies arrive on the incoming page request, but they are not attached automatically to the server’s outgoing API calls. The server must forward cookies from the incoming request to outgoing API calls — and this is easy to forget, resulting in every SSR page seeing “unauthenticated” data.
The flash-of-unauthenticated-content (FOUC) problem: If auth state is only available on the client (e.g., a token kept in localStorage or in memory, which is never sent with the page request), the server renders the “logged out” version of the page. The client hydrates, reads the token, and re-renders the “logged in” version. The user sees a flash of the wrong UI.
Fix: For SSR apps, auth state must be available to the server. Use HttpOnly cookies that the server can read, or a session store (Redis, database) that the server queries. Never rely on client-side-only auth state for SSR.
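A minimal sketch of cookie forwarding, assuming the incoming request headers are available as a plain object with lowercase names (as Node's HTTP server provides them). The function name `buildUpstreamHeaders` is hypothetical; frameworks expose the incoming headers through their own APIs.

```typescript
type HeaderMap = Record<string, string | undefined>;

// Copies only the auth-relevant headers from the incoming browser request
// onto the headers used for a server-side API call.
function buildUpstreamHeaders(incoming: HeaderMap): Record<string, string> {
  const upstream: Record<string, string> = { accept: "application/json" };
  const cookie = incoming["cookie"];
  if (cookie) upstream["cookie"] = cookie; // without this the API sees an anonymous user
  const authorization = incoming["authorization"];
  if (authorization) upstream["authorization"] = authorization;
  return upstream;
}
```

Forwarding an allowlist rather than every header avoids leaking hop-specific values such as `host` or `content-length` into the upstream call.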
20.3 Third-Party Cookie Restrictions
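Safari’s ITP already blocks third-party cookies by default, and other browsers restrict or partition them to varying degrees (Chrome’s Privacy Sandbox plans have shifted more than once). Do not assume third-party cookies are available. Their absence breaks:
- OAuth redirect flows that rely on third-party cookies for session linking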
- Embedded iframes (payment widgets, chat widgets) that need their own cookies
- Analytics that use cross-domain cookies for user identification
Interview Question: A user reports they keep getting logged out randomly. How do you investigate?
- Check if it is correlated with time. If logouts happen at consistent intervals (e.g., every 30 minutes), the access token TTL is shorter than the user’s session, and silent refresh is failing. Check the refresh token endpoint for errors.
- Check if it is correlated with multiple tabs. If the user has 2+ tabs open, refresh token rotation may be causing the race condition I described. Check server logs for “refresh token already used” errors.
- Check if it is browser-specific. Safari’s ITP aggressively purges cookies for domains it classifies as “trackers.” If your auth cookies are on a different subdomain than the page, Safari may delete them after 7 days or even 24 hours for some classifications.
- Check cookie attributes. A missing Secure flag lets the cookie travel over plain HTTP, and browsers reject SameSite=None cookies that lack it. SameSite=Strict withholds the cookie on navigation from another site (an email link, an OAuth redirect back), which looks like a logout; SameSite=Lax is the usual minimum. An incorrect Domain attribute means cookies are not sent to API subdomains.
- Check for deployment-related session invalidation. If sessions are stored in memory (not Redis/database), a server restart clears all sessions.
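The cookie-attribute check in the list above can be partly automated. The sketch below inspects one `Set-Cookie` header value and returns warnings; the function name and the warning wording are illustrative, and the parsing is intentionally simple (it does not handle quoted values).

```typescript
// Returns human-readable warnings for a single Set-Cookie header value.
function auditSetCookie(setCookie: string): string[] {
  const attributes = setCookie
    .split(";")
    .slice(1) // the first segment is name=value
    .map((part) => part.trim().toLowerCase());
  const has = (name: string): boolean =>
    attributes.some((a) => a === name || a.startsWith(name + "="));
  const sameSite = attributes.find((a) => a.startsWith("samesite="))?.split("=")[1];

  const warnings: string[] = [];
  if (!has("httponly")) warnings.push("HttpOnly missing: cookie is readable from JavaScript");
  if (!has("secure")) warnings.push("Secure missing: cookie can travel over plain HTTP");
  if (sameSite === undefined) warnings.push("SameSite not set: behavior depends on browser defaults");
  if (sameSite === "none" && !has("secure")) {
    warnings.push("SameSite=None without Secure is rejected by modern browsers");
  }
  if (sameSite === "strict") {
    warnings.push("SameSite=Strict: cookie is withheld on navigation from other sites");
  }
  return warnings;
}
```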
Red flag answers:
- “Just increase the token expiry time” — this weakens security without solving the root cause
- Not considering the multi-tab race condition
- Not knowing that Safari’s ITP affects cookie persistence
Follow-up questions:
- “How would you implement ‘remember me’ functionality securely?” (Long-lived refresh token in an HttpOnly, Secure, SameSite=Lax cookie. Short-lived access token in memory. The refresh token is rotated on each use and stored hashed in the database for revocation.)
- “The user is on a corporate network with a proxy that strips cookies. How do you handle auth?” (Fall back to Authorization: Bearer header with token storage in memory. Detect the condition by checking if a set cookie is readable on the next request.)
21. CDN Behavior and Caching Edge Cases
CDNs are not magic — they are distributed HTTP caches with their own failure modes. Understanding how CDN caching interacts with your rendering strategy, deployment pipeline, and user experience is essential for senior frontend engineers.
21.1 The CDN Mental Model
A CDN edge node makes a caching decision based on the response’s Cache-Control header, the URL, and optionally the Vary header. If it has a cached copy that is still “fresh,” it serves that copy without contacting your origin server.
The key insight: CDN caching operates on URLs. Two requests to the same URL get the same cached response — even if the users are different, even if the underlying data has changed. This is why personalized content and CDN caching are in tension.
21.2 Caching Strategy by Asset Type
| Asset Type | Cache-Control | Why |
|---|---|---|
| Static JS/CSS (hashed filenames) | public, max-age=31536000, immutable | Content-addressed — the hash guarantees the content matches the filename. Cache forever. |
| HTML pages (SSG) | public, max-age=3600, s-maxage=86400, stale-while-revalidate=86400 | CDN caches aggressively (s-maxage), browser cache is shorter. SWR allows serving stale while fetching fresh. |
| HTML pages (SSR, non-personalized) | public, s-maxage=60, stale-while-revalidate=300 | Short CDN cache. SWR ensures users always get a fast response. |
| HTML pages (SSR, personalized) | private, no-store or private, max-age=0 | NEVER cache personalized content at the CDN. The private directive prevents CDN caching. |
| API responses (public) | public, s-maxage=300, stale-while-revalidate=600 | CDN caches for 5 minutes. Origin is only hit every 5 minutes per edge node. |
| API responses (personalized) | private, no-cache | private keeps the response out of the CDN; no-cache means “revalidate every time”, so the browser sends a conditional request to origin before reusing its copy. |
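
The difference between `max-age` and `s-maxage` in the table can be made concrete with a small parser. This is a simplified sketch, not a full HTTP caching implementation: it ignores heuristic freshness, `Age`, and `stale-while-revalidate`, and treats a missing lifetime as 0. The function names are hypothetical.

```typescript
// Parses a Cache-Control value into a directive map,
// e.g. "public, max-age=60" -> { public: "", "max-age": "60" }.
function parseCacheControl(header: string): Map<string, string> {
  const directives = new Map<string, string>();
  for (const part of header.split(",")) {
    const [name, value = ""] = part.trim().toLowerCase().split("=");
    if (name) directives.set(name, value);
  }
  return directives;
}

// Freshness lifetime in seconds for the given cache, or null when the
// response must not be stored there at all.
function ttlSeconds(header: string, cache: "cdn" | "browser"): number | null {
  const d = parseCacheControl(header);
  if (d.has("no-store")) return null;
  if (cache === "cdn" && d.has("private")) return null; // shared caches must skip it
  if (d.has("no-cache")) return 0; // stored, but revalidated before each reuse
  const raw = cache === "cdn" ? (d.get("s-maxage") ?? d.get("max-age")) : d.get("max-age");
  return raw === undefined ? 0 : Number(raw);
}
```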
21.3 CDN Failure Modes That Break Production
Stale HTML + fresh JS: You deploy a new version. The CDN purge for HTML pages fails (or is delayed), but JS assets have new hashes. Users get old HTML that references old JS bundle hashes. The old JS files have been deleted from the CDN (or the origin). Result: white page, 404 errors for JavaScript.
Fix: Never delete old JS bundles immediately after deployment. Keep at least the previous 2-3 versions available for 24-48 hours. This is the “deployment overlap” strategy.
Cache poisoning from Vary misconfiguration: If your server sends Vary: *, the CDN never caches anything. If your server sends Vary: Cookie, the CDN creates a separate cache entry per unique cookie value — effectively disabling caching for authenticated users (which may be correct, but must be intentional).
Geographic inconsistency: CDN edge nodes are independent caches. After a deployment, the New York edge might serve the new version while the London edge still has the old version cached. For 60-300 seconds after deployment, different users see different versions.
Fix: For critical deployments, explicitly purge the CDN cache after deployment. Most CDN providers (Cloudflare, Fastly, CloudFront) support global purge through an API, but propagation is not instantaneous: allow 5-30 seconds, and longer on some providers.
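The "deployment overlap" rule (keep the previous 2-3 versions for 24-48 hours) can be written as a retention filter for a cleanup job. The `DeployedBuild` shape and the defaults below are assumptions chosen to match the numbers in the text.

```typescript
interface DeployedBuild {
  id: string;
  deployedAtMs: number;
}

const HOUR_MS = 60 * 60 * 1000;

// Returns the ids of builds whose assets can be deleted: a build is kept if
// it is among the newest `keepCount` builds OR was deployed within `keepForMs`.
function buildsSafeToDelete(
  builds: DeployedBuild[],
  nowMs: number,
  keepCount: number = 3,
  keepForMs: number = 48 * HOUR_MS,
): string[] {
  const newestFirst = [...builds].sort((a, b) => b.deployedAtMs - a.deployedAtMs);
  return newestFirst
    .filter((build, position) => position >= keepCount && nowMs - build.deployedAtMs > keepForMs)
    .map((build) => build.id);
}
```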
Interview Question: After a deployment, some users see the new version and some see the old version. It resolves itself after about 10 minutes. What is happening?
Cache-Control has a max-age or s-maxage of ~600 seconds (10 minutes). Each edge node cached the HTML at different times during the previous 10 minutes. After deployment, each node continues serving its cached copy until its local TTL expires.
Fixes (in order of preference):
- Explicit CDN purge after deployment. Add a cache invalidation step to the deployment pipeline. Cloudflare’s API purge propagates globally in under 5 seconds.
- Use stale-while-revalidate. The CDN serves the stale version immediately (fast) but asynchronously fetches the fresh version. The next request gets the new version. This reduces the inconsistency window to one request per edge node.
- Version-aware HTML. Include a version identifier in the HTML that the client checks. If the version does not match the expected version (from a separate version endpoint), force-refresh.
Red flag answers:
- “Just set Cache-Control: no-cache on everything” — this defeats the entire purpose of the CDN
- Not knowing the difference between max-age and s-maxage
- Not mentioning CDN purge as part of the deployment pipeline
Follow-up questions:
- “How do you handle CDN caching for A/B tests where different users should see different content?” (Use edge functions or Vary on a cookie that contains the experiment assignment. Or move experiment logic entirely to the client side with feature flags.)
- “What happens if your CDN purge API fails during deployment?” (This is why you need the “deployment overlap” strategy — old assets must remain available. Also, add CDN purge failure as an alert and manual step in the deployment runbook.)
22. TypeScript and API Contract Safety
22.1 Why TypeScript Matters for Production Frontend
TypeScript is not “just types for JavaScript.” At scale, TypeScript is the contract enforcement layer between frontend teams, backend teams, and the runtime. Without it, every API response is any, every prop is a guess, and refactoring is a prayer.
The contract chain:
22.2 Generated Types from API Schemas
res.json() returns any at runtime. TypeScript checks types at compile time but the API can return anything at runtime — a different shape, null values where you expected strings, or an error object instead of the expected data.
22.3 Runtime Validation with Zod
Cannot read property 'name' of undefined three layers deep in a component.
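To show what validating at the API boundary means without pulling in a dependency, here is a hand-written check for a hypothetical `User` payload. It is a stand-in for what a schema library such as Zod does, not Zod's API; the `User` fields and the `ParseResult` type are invented for illustration.

```typescript
interface User {
  id: string;
  name: string;
  email: string | null;
}

type ParseResult<T> = { ok: true; value: T } | { ok: false; issues: string[] };

// Checks the shape of an untrusted API payload before the UI touches it.
function parseUser(data: unknown): ParseResult<User> {
  if (typeof data !== "object" || data === null) {
    return { ok: false, issues: ["expected an object"] };
  }
  const record = data as Record<string, unknown>;
  const id = record["id"];
  const name = record["name"];
  const email = record["email"];
  const issues: string[] = [];
  if (typeof id !== "string") issues.push("id: expected string");
  if (typeof name !== "string") issues.push("name: expected string");
  if (email !== null && typeof email !== "string") issues.push("email: expected string or null");
  if (issues.length > 0) return { ok: false, issues };
  return {
    ok: true,
    value: { id: id as string, name: name as string, email: email as string | null },
  };
}
```

The failure branch carries a list of issues instead of throwing, which makes it straightforward to report the mismatch as a structured error and render a fallback instead of crashing deep in the component tree.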
22.4 End-to-End Type Safety Patterns
tRPC: Shares TypeScript types between a TypeScript backend (for example, Next.js) and the frontend with zero code generation. The backend defines a procedure, and the frontend gets full autocompletion and type checking on the call site. Best for full-stack TypeScript monorepos.
GraphQL Code Generator: Generates TypeScript types from your GraphQL schema and queries. Every useQuery call returns a typed result. Schema changes that break queries are caught at build time.
OpenAPI TypeScript: Generates types from OpenAPI/Swagger specs. Works with any backend language (Java, Go, Python, Rust) as long as it produces an OpenAPI spec. The spec becomes the contract artifact.
Interview Question: Your frontend team keeps getting broken by unannounced backend API changes. How do you solve this organizationally and technically?
- The backend publishes an OpenAPI spec (or GraphQL schema) as a CI artifact on every PR.
- The frontend generates TypeScript types from this spec automatically (CI step or git hook).
- The frontend CI runs a “contract check” that compares the current spec against the previous spec and flags breaking changes.
- Runtime validation (Zod) at the API boundary catches any remaining mismatches in production and reports them as structured errors to Sentry.
- Backend PRs that change API response shapes must include a tag/label (e.g., api-breaking-change) that triggers a notification to the frontend team’s Slack channel.
- Deprecation headers (Sunset, Deprecation) in API responses give the frontend time to migrate.
- API schemas live in a shared repository or are co-owned by frontend and backend teams.
- API design reviews include a frontend engineer.
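The "contract check" step described above can be illustrated with a deliberately tiny schema model. Real checks would run against an OpenAPI document or GraphQL schema with dedicated tooling; the `ResponseSchema` type and `findBreakingChanges` function here are assumptions made for the sketch.

```typescript
// A tiny stand-in for a response schema: field name -> type and optionality.
type FieldSpec = { type: string; required: boolean };
type ResponseSchema = Record<string, FieldSpec>;

// Lists changes between two schema versions that can break an existing client.
function findBreakingChanges(previous: ResponseSchema, current: ResponseSchema): string[] {
  const breaking: string[] = [];
  for (const [field, before] of Object.entries(previous)) {
    const after = current[field];
    if (after === undefined) {
      breaking.push(`${field}: removed`);
    } else if (after.type !== before.type) {
      breaking.push(`${field}: type changed from ${before.type} to ${after.type}`);
    } else if (before.required && !after.required) {
      breaking.push(`${field}: was always present, now optional`);
    }
  }
  // Fields that exist only in `current` are additive and are not reported.
  return breaking;
}
```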
Red flag answers:
- “Just communicate better” without any technical enforcement
- Not mentioning runtime validation — TypeScript alone does not catch runtime shape changes
- Not knowing about OpenAPI or GraphQL code generation
Follow-up questions:
- “What do you do when the backend returns a new optional field that your frontend does not handle?” (Zod’s passthrough() or strict() modes. Strict mode would throw on unknown fields, which is too aggressive. Passthrough keeps unknown fields without validating them, and the default object mode simply strips them; either is safe. The real question is whether to start using the new field — that is a product decision, not a technical one.)
- “How do you handle API versioning from the frontend perspective?” (Pin the frontend to a specific API version via URL path or header. Migrate to new versions explicitly. Never let the frontend float on “latest” — that is how unannounced changes break you.)
23. Production Debugging Paths
This section covers the systematic investigation paths a senior frontend engineer follows when production breaks. These are not theoretical — they are the actual steps you take when you get paged at 2am.
23.1 The Universal Triage Framework
When a production issue is reported, the first 5 minutes determine whether you fix it in 30 minutes or 3 hours. Use this framework:
Scope the blast radius
Correlate with deployments
Isolate the layer
Reproduce or find a session replay
23.2 Debugging Path: White Page / Blank Screen
Symptom: Users see a completely white page. No content renders.
Investigation path:
- Check the console. A white page in a React SPA almost always means an uncaught JavaScript error prevented rendering. The error is in the console.
- Check if HTML was delivered. View page source. If the HTML is present (SSR), the issue is a client-side JS error during hydration or initialization. If the HTML is empty (<div id="root"></div>), the issue is that JS never executed.
- Check if JS loaded. Network panel — did the main JS bundle return 200? If 404, the deployment deleted old assets while the CDN was still serving old HTML with old bundle references.
- Check for CSP violations. Console will show “Refused to execute inline script” if a Content Security Policy blocks your scripts. Common after CDN or infrastructure changes that alter script nonces.
- Check for chunk loading failures. If code-split chunks fail to load (CDN issue, ad blocker intercepting), React’s lazy() throws an error. Without an error boundary around the Suspense boundary, this crashes the entire app.
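The investigation path above is essentially a decision tree, which can be written down as a function for a runbook or an internal triage tool. The `WhitePageEvidence` fields are hypothetical names for the observations each step collects, and the ordering of the checks is one reasonable choice rather than a fixed rule.

```typescript
interface WhitePageEvidence {
  htmlHasContent: boolean; // "view source" shows server-rendered markup
  mainBundleStatus: number | null; // HTTP status of the main JS bundle, null if never requested
  cspViolationLogged: boolean;
  chunkLoadErrorLogged: boolean;
  uncaughtErrorLogged: boolean;
}

// Maps the collected evidence to the most likely cause.
function likelyCause(e: WhitePageEvidence): string {
  if (e.mainBundleStatus === 404) return "stale HTML references a deleted bundle";
  if (e.cspViolationLogged) return "Content Security Policy blocked a script";
  if (e.chunkLoadErrorLogged) return "code-split chunk failed to load";
  if (e.uncaughtErrorLogged) {
    return e.htmlHasContent
      ? "error during hydration or initialization"
      : "error before first client render";
  }
  return "no signal yet: reproduce with DevTools open";
}
```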
23.3 Debugging Path: Partial Rendering / Missing Sections
Symptom: The page loads but a section is missing — no reviews, no product images, no sidebar.
Investigation path:
- Check if the section’s data loaded. Network panel — did the API call for that section succeed? If it returned an error or empty data, the issue is backend.
- Check if the component rendered. React DevTools — is the component in the tree? If the component is in the tree but invisible, check CSS (display: none, opacity: 0, height: 0, overflow: hidden).
- Check for error boundaries. If the section’s component threw an error and an error boundary caught it, the section renders the fallback (which might be nothing). Check Sentry for error boundary catches on that route.
- Check for race conditions. If the section depends on data from a parent component, and the parent’s data arrived after the section’s Suspense timeout, the section might have fallen back to the loading state permanently. Check the waterfall timing.
- Check feature flags. Is the section behind a feature flag that was accidentally turned off?
23.4 Debugging Path: Bundle Size Regression
Symptom: Lighthouse performance score dropped. Bundle size increased by 150KB.
Investigation path:
- Identify what was added. Run npx source-map-explorer dist/main.js (or @next/bundle-analyzer) on the current and previous builds. Diff the treemaps.
- Common culprits:
  - A new dependency pulled in a transitive dependency (e.g., adding date-fns but the import syntax pulled the entire library instead of tree-shaking).
  - A dynamic import was accidentally changed to a static import, pulling a lazy chunk into the main bundle.
  - A dev-only dependency (Storybook, test utilities) leaked into the production build.
  - CSS-in-JS library added at the component level that duplicates styles.
- Fix by category:
  - Wrong import: change to an import from a subpath (import debounce from 'lodash/debounce' instead of import { debounce } from 'lodash').
  - Lost code splitting: verify React.lazy() and dynamic import() are used for route-level components.
  - Dev dependency in prod: check package.json for misplaced devDependencies.
- Prevent recurrence: Add size-limit to CI with per-chunk budgets. Any PR that exceeds the budget fails the check.
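The idea behind a per-chunk budget check can be sketched as a comparison between two builds. This is not how size-limit is configured; it is a dependency-free illustration, and the `ChunkSizes` shape (chunk name mapped to bytes) is an assumption about what a build-stats step would produce.

```typescript
type ChunkSizes = Record<string, number>; // chunk name -> size in bytes

interface BudgetViolation {
  chunk: string;
  previous: number;
  current: number;
  growth: number;
}

// Reports chunks that grew by more than `maxGrowthBytes`, largest growth first.
function chunksOverBudget(
  previous: ChunkSizes,
  current: ChunkSizes,
  maxGrowthBytes: number,
): BudgetViolation[] {
  const violations: BudgetViolation[] = [];
  for (const [chunk, size] of Object.entries(current)) {
    const before = previous[chunk] ?? 0; // a brand-new chunk counts fully as growth
    const growth = size - before;
    if (growth > maxGrowthBytes) {
      violations.push({ chunk, previous: before, current: size, growth });
    }
  }
  return violations.sort((a, b) => b.growth - a.growth);
}
```

A CI step could fail the build whenever the returned list is non-empty and print the entries so the author sees which chunk grew.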
23.5 Debugging Path: “Works on My Machine” — Browser-Specific Bugs
Symptom: Bug is reported by users but the engineering team cannot reproduce it.
Investigation path:
- Check Sentry for browser/OS distribution. Is the error concentrated in Safari 16? Chrome on Android? Samsung Internet?
- Check for missing polyfills. Features like structuredClone, AbortSignal.any(), or CSS @container queries are not available in all browsers. Check caniuse.com against your support matrix.
- Check for Safari-specific behavior. Safari handles Date parsing differently (older versions return Invalid Date for strings such as 2026-04-11 10:00:00, with a space instead of the T separator), has stricter ITP cookie policies, and implements some CSS differently (flexbox gap in older versions).
- Check for extension interference. Ask the user if they have ad blockers, privacy extensions, or corporate security software. These can block scripts, modify DOM, or intercept network requests.
- Test in BrowserStack/LambdaTest. If you cannot reproduce locally, use a cloud browser testing service to test the exact browser/OS/version combination.
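The missing-polyfill check can be turned into a small runtime probe whose result is attached to error reports. In this sketch the global object is passed in as a parameter so the function can be tested with fakes; the function name is hypothetical, and the two probed features (`structuredClone` and the static `AbortSignal.any` method) are examples, not a complete support matrix.

```typescript
// Returns the names of probed features that are absent from the given global object.
function missingFeatures(globalObject: Record<string, unknown>): string[] {
  const missing: string[] = [];
  if (typeof globalObject["structuredClone"] !== "function") {
    missing.push("structuredClone");
  }
  const abortSignal = globalObject["AbortSignal"] as { any?: unknown } | undefined;
  if (typeof abortSignal?.any !== "function") {
    missing.push("AbortSignal.any");
  }
  return missing;
}

// In an application this would typically be called once at startup:
// missingFeatures(globalThis as unknown as Record<string, unknown>)
```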
Interview Question: It is 2am and you get paged -- users are reporting that the checkout page shows a spinner that never resolves. Walk me through your first 10 minutes.
- Open Sentry — check for new errors on the checkout route. Check error count trend — is it spiking or gradual?
- Open Datadog/Grafana — check backend health. Is the checkout API responding? What are the response times and error rates?
- Check deployment timeline — was there a deploy in the last 2 hours? If yes, the deploy is the prime suspect.
- Open the checkout page in an incognito browser. Open Network panel.
- Is the checkout API call returning successfully? If the response is pending (never resolves), the issue is backend — the API is hanging.
- Is the API returning data but the spinner still shows? Then the frontend is not processing the response — a JavaScript bug.
- Is the API returning an error? Then the frontend might be stuck in a loading state because the error handler does not clear the loading flag.
- If correlated with a deployment: rollback immediately. Investigate later. Every minute of broken checkout is lost revenue.
- If not deployment-related: check if the backend team is aware. If the backend is down, the frontend fix is graceful degradation — show an error message instead of an infinite spinner.
- Post in the incident channel: what you know, what you have done (rolled back or identified root cause), and what is next.
Red flag answers:
- Starting by reading code instead of checking monitoring
- Not considering rollback as the first option
- Not checking backend health — assuming the problem is frontend because it manifests in the frontend
- No mention of communicating the incident status
Follow-up questions:
- “The checkout API is returning 200 with correct data, but the spinner persists. What now?” (Check the JS console for errors. Check if the response shape matches what the frontend expects — a new field or removed field could cause a Zod validation error that is silently caught. Check if a feature flag or A/B test changed the rendering path.)
- “After the incident, how do you prevent this from happening again?” (Add a timeout to the spinner with a “something went wrong, please try again” fallback. Add an alert for checkout conversion rate drops. Add an E2E test for the checkout flow in CI. Add a canary deployment step for checkout-related changes.)
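One of the failure modes discussed in this section is an error path that never clears the loading flag, leaving the spinner on screen forever. Modeling the request as a single state value, rather than separate `loading` and `error` booleans, makes that state unrepresentable. The state and event names below are illustrative assumptions.

```typescript
type CheckoutState =
  | { status: "idle" }
  | { status: "loading" }
  | { status: "ready"; orderId: string }
  | { status: "error"; message: string };

type CheckoutEvent =
  | { type: "submit" }
  | { type: "success"; orderId: string }
  | { type: "failure"; message: string }
  | { type: "timeout" };

// Every event that ends a request moves the state out of "loading".
function checkoutReducer(state: CheckoutState, event: CheckoutEvent): CheckoutState {
  switch (event.type) {
    case "submit":
      return { status: "loading" };
    case "success":
      return { status: "ready", orderId: event.orderId };
    case "failure":
      return { status: "error", message: event.message };
    case "timeout":
      // Only a request that is still pending can time out.
      return state.status === "loading"
        ? { status: "error", message: "Something went wrong, please try again" }
        : state;
    default:
      return state;
  }
}
```

The `timeout` event would be dispatched by a timer started on submit, which gives the user a retry message instead of an endless spinner when the API hangs.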
24. Rollout Safety and Experiment Impact
24.1 Progressive Rollout for Frontend Changes
Shipping a frontend change to 100% of users simultaneously is a gamble. Progressive rollout reduces blast radius.
Rollout ladder:
| Stage | Audience | Duration | Gate to next stage |
|---|---|---|---|
| Canary | 1% of traffic (or internal users only) | 1-4 hours | No new errors, CWV stable, business metrics flat |
| Small rollout | 5-10% of traffic | 4-24 hours | Error rate < baseline + 0.1%, conversion rate stable |
| Broad rollout | 50% of traffic | 24-48 hours | No customer-reported issues, A/B metrics comparable |
| Full rollout | 100% | Permanent | Remove feature flag after 1-2 weeks |
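
Percentage stages like the ones in the table are usually implemented with deterministic bucketing, so a user who saw the feature at 5% still sees it at 50%. The sketch below hashes the user id with 32-bit FNV-1a (over UTF-16 code units, for simplicity); the function names and the `featureKey` salt are assumptions, and feature-flag services ship their own bucketing.

```typescript
// 32-bit FNV-1a hash of a string, returned as an unsigned integer.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Stable bucket in [0, 100) for a user and a feature. Salting with the
// feature key keeps the same users from always landing in every canary.
function rolloutBucket(userId: string, featureKey: string): number {
  return fnv1a(`${featureKey}:${userId}`) % 100;
}

// True when the user falls inside the current rollout percentage.
function isInRollout(userId: string, featureKey: string, percent: number): boolean {
  return rolloutBucket(userId, featureKey) < percent;
}
```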
24.2 Measuring Experiment Impact on Frontend Metrics
When running A/B tests on frontend changes, you must measure both the business metric (conversion rate, engagement) and the technical metric (CWV, error rate, JS bundle size).
The hidden danger: An experiment that increases conversion by 2% but degrades INP by 150ms can be a net negative: the short-term conversion boost may erode as the page’s search ranking suffers and users develop learned avoidance of the slow interaction.
Experiment analysis checklist:
- Compare CWV (LCP, INP, CLS) between control and treatment, segmented by device class
- Compare JS error rates between variants — a new feature may work for 99% of users but crash for the 1% on older browsers
- Check for interaction effects with other experiments — two experiments modifying the same page section can create unexpected CLS
- Measure time-to-interactive for the specific feature, not just page-level metrics
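The first checklist item, comparing a vital between variants per device class, might look like the sketch below. It assumes INP samples have already been collected as plain records; the record shape, the device class labels, and the nearest-rank percentile method are all illustrative choices, not a prescribed pipeline.

```typescript
interface InpSample {
  variant: "control" | "treatment";
  deviceClass: string; // e.g. "low-end" or "high-end"; the segmentation scheme is an assumption
  inpMs: number;
}

interface SegmentComparison {
  deviceClass: string;
  controlP75: number;
  treatmentP75: number;
  deltaMs: number;
}

// Nearest-rank percentile; returns NaN for an empty array.
function percentile(values: number[], p: number): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)] ?? NaN;
}

function compareInpBySegment(samples: InpSample[]): SegmentComparison[] {
  const buckets = new Map<string, { control: number[]; treatment: number[] }>();
  for (const s of samples) {
    const bucket = buckets.get(s.deviceClass) ?? { control: [], treatment: [] };
    bucket[s.variant].push(s.inpMs);
    buckets.set(s.deviceClass, bucket);
  }
  const comparisons: SegmentComparison[] = [];
  buckets.forEach((b, deviceClass) => {
    // A segment seen in only one variant cannot be compared.
    if (b.control.length === 0 || b.treatment.length === 0) return;
    const controlP75 = percentile(b.control, 75);
    const treatmentP75 = percentile(b.treatment, 75);
    comparisons.push({ deviceClass, controlP75, treatmentP75, deltaMs: treatmentP75 - controlP75 });
  });
  return comparisons;
}

// Segments where treatment p75 INP is worse than control by more than budgetMs.
function inpRegressions(samples: InpSample[], budgetMs: number): SegmentComparison[] {
  return compareInpBySegment(samples).filter((c) => c.deltaMs > budgetMs);
}
```

Segmenting matters because a regression confined to low-end devices can disappear in a page-level aggregate dominated by fast hardware.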
24.3 Rollback Strategies
- Instant rollback via feature flag: Flip the flag. The next page load serves the old experience. This is the fastest rollback (seconds). Requires the feature flag to be in the critical rendering path.
- Deployment rollback: Redeploy the previous version. Takes 2-15 minutes depending on your CI/CD pipeline. Required when the issue is in the deployment artifact itself (wrong build configuration, missing assets).
- CDN purge: If the issue is stale cached content, purge the CDN and let fresh content populate. Takes 5-30 seconds. Required when the deployment is correct but the CDN is serving old content.

25. Accessibility Regressions -- Prevention and Detection
25.1 How Accessibility Regresses
Accessibility does not break all at once. It erodes through a series of small, individually reasonable changes:
- A developer replaces a `<button>` with a `<div onClick>` because they want custom styling. Keyboard and screen reader support vanish.
- A design change reduces contrast from 4.5:1 to 3.8:1. It looks “fine” visually but fails WCAG AA.
- A modal redesign removes the focus trap. Screen reader users can now interact with content behind the modal.
- A dynamic content loader does not announce new content via `aria-live`. Screen reader users do not know new items appeared.
- A team adds a custom dropdown using `<div>` elements instead of `<select>` and does not implement the ARIA listbox pattern.
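The contrast regression in the list above is one of the few that can be checked with plain arithmetic. The sketch below follows the WCAG 2.x relative luminance and contrast ratio definitions; the function names are hypothetical, and it only handles six-digit hex colors without alpha.

```typescript
// Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula).
function channelToLinear(channel8Bit: number): number {
  const c = channel8Bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#rrggbb" color, from 0 (black) to 1 (white).
function relativeLuminance(hex: string): number {
  const clean = hex.replace(/^#/, "");
  if (!/^[0-9a-f]{6}$/i.test(clean)) {
    throw new Error(`Expected a six-digit hex color, got "${hex}"`);
  }
  const n = parseInt(clean, 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const l1 = relativeLuminance(foreground);
  const l2 = relativeLuminance(background);
  const lighter = Math.max(l1, l2);
  const darker = Math.min(l1, l2);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA threshold for normal-size text.
function passesAaNormalText(foreground: string, background: string): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}
```

A check like this could run against design tokens in CI, so a palette change that drops a text color below 4.5:1 fails the build before it reaches a page.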
25.2 Automated Accessibility Testing in CI
Layer 1 -- Static analysis: catches missing `alt` text, missing form labels, non-interactive elements with click handlers, and other statically detectable issues. Coverage: ~20-25% of accessibility bugs.
Layer 2 -- axe-core in integration tests.
Layer 3 -- Manual testing:
- Navigate the entire flow using only keyboard
- Test with VoiceOver (macOS) and NVDA (Windows)
- Test with screen zoom at 200%
- Test with forced colors mode (Windows High Contrast)
25.3 Preventing Regressions Organizationally
- Storybook accessibility addon: Every component shows real-time accessibility violations in the Storybook panel. Developers see issues during development, not in code review.
- PR review checklist item: “Does this PR change any interactive element? If yes, has keyboard navigation been tested?”
- Design system as guardrail: If the design system’s `<Button>` component handles keyboard, focus, and ARIA correctly, individual teams cannot regress by using it. Regressions come from teams building custom components outside the design system.
- Accessibility champions: One engineer per team is designated as the accessibility point-of-contact. They review PRs that touch interactive components and escalate systemic issues.
Interview Question: A visually impaired user reports that your newly redesigned checkout flow is unusable with a screen reader. The old design worked fine. How do you handle this?
- Acknowledge the report and commit to a timeline.
- Test the checkout flow with NVDA (Windows) and VoiceOver (macOS) myself. Record the session.
- Compare the old and new checkout flow side-by-side with a screen reader — identify exactly which step breaks.
- Custom form controls (dropdowns, date pickers) replaced native ones without implementing ARIA patterns
- Focus order is wrong: the screen reader reads elements in an order that does not match the visual layout (check `tabindex` and DOM order)
- Form error messages are not announced: the old design used native validation, the new design uses custom validation without `aria-live` or `aria-describedby`
- Step transitions are not announced: the user completes step 1 but the screen reader does not announce that step 2 is now visible
- Fix the specific issues found.
- Add E2E accessibility tests for the checkout flow using `@axe-core/playwright`.
- Add a “screen reader walkthrough” step to the QA process for any future redesigns of critical flows.
- Establish a pre-launch accessibility review for any UX redesign of revenue-critical paths.
- “We’ll add ARIA labels” — this is the right direction but too vague. Which labels? On which elements?
- “We’ll install an accessibility overlay widget” — these are widely criticized as ineffective and sometimes make things worse
- Not testing with an actual screen reader
- “How do you prevent this from happening again on the next redesign?” (Accessibility acceptance criteria in the design spec. Storybook accessibility addon for component development. E2E accessibility tests in CI. Design system components that bake in accessibility.)
- “The PM pushes back — ‘only 0.1% of users use screen readers, we have higher priority bugs.’ How do you respond?” (Legal risk: ADA lawsuits have resulted in millions in settlements. Business reality: accessible design benefits everyone — keyboard navigation for power users, semantic HTML for SEO, high contrast for outdoor use. Engineering quality: if a feature breaks for screen readers, it is probably brittle in other ways too.)
26. Proving the Fault Layer — Frontend, Backend, CDN, or Browser
The most valuable skill a senior frontend engineer can have in an incident is the ability to definitively prove where the problem is. “I think it’s a backend issue” is opinion. “The API is returning 200 but the response body is an empty array instead of the expected product list, and here is the curl command that reproduces it” is evidence.

26.1 The Diagnostic Decision Tree
26.2 Definitive Evidence for Each Layer
Proving it is a backend issue:

Proving it is a browser issue:
- Reproduce in BrowserStack with the exact browser/OS/version
- Check `caniuse.com` for the CSS or JS feature being used
- Check the browser’s release notes for known regressions

Proving it is a frontend issue:
- The API returns correct data (verified by Network panel or curl)
- The HTML is correct (verified by View Source)
- The issue is visible in the rendered page — this means the frontend code is mishandling the correct data
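The evidence in this section can be folded into a single routing function that turns observations into a suspected layer. This is only a sketch: the field names are hypothetical, and the order of the checks (backend, then CDN, then browser, then frontend) is one plausible ordering, not a rule stated by any tool or standard.

```typescript
type SuspectedLayer = "backend" | "cdn" | "browser" | "frontend" | "inconclusive";

// Hypothetical evidence record; each field maps to one manual check.
interface FaultEvidence {
  apiResponseCorrect: boolean;       // verified with the Network panel or curl
  serverHtmlCorrect: boolean;        // verified with View Source
  cachedCopyPredatesChange: boolean; // e.g. inferred from CDN cache headers
  onlyFailsInOneBrowser: boolean;    // reproduced on one browser/OS/version only
  visibleInRenderedPage: boolean;    // the symptom shows up in the rendered page
}

function localizeFault(e: FaultEvidence): SuspectedLayer {
  // Wrong data at the source: nothing downstream can be blamed yet.
  if (!e.apiResponseCorrect) return "backend";
  // Correct data now, but the served copy is older than the change.
  if (e.cachedCopyPredatesChange) return "cdn";
  // Same code and data, different outcome per browser.
  if (e.onlyFailsInOneBrowser) return "browser";
  // Correct data and correct HTML, yet the rendered page is wrong.
  if (e.serverHtmlCorrect && e.visibleInRenderedPage) return "frontend";
  return "inconclusive";
}
```

Returning `"inconclusive"` instead of guessing keeps the function honest: it signals that more evidence is needed before the incident is routed to a team.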
26.3 Multi-Team Ownership and Blame Routing
In large organizations, the frontend, backend, and infrastructure are owned by different teams. When production breaks, the first 30 minutes are often wasted by teams pointing fingers. The antidote is evidence-based routing.

Best practice: The first responder (regardless of team) collects the diagnostic evidence above and routes the incident to the correct team with a structured handoff.
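One way to keep a handoff structured is to give it a fixed shape, so the receiving team always gets the same fields in the same order. The sketch below is illustrative: the `IncidentHandoff` fields, the `formatHandoff` helper, and the `api.example.com` URL are assumptions, not an established template.

```typescript
// Hypothetical handoff record filled in by the first responder.
interface IncidentHandoff {
  routedTo: "frontend" | "backend" | "infrastructure";
  observedAt: string;   // ISO timestamp of the observation
  symptom: string;
  evidence: string[];
  reproduction: string; // a command or steps anyone can rerun
  ruledOut: string[];
}

function formatHandoff(h: IncidentHandoff): string {
  const lines = [
    `Routed to: ${h.routedTo}`,
    `Observed at: ${h.observedAt}`,
    `Symptom: ${h.symptom}`,
    "Evidence:",
    ...h.evidence.map((item) => `  - ${item}`),
    `Reproduce: ${h.reproduction}`,
    `Ruled out: ${h.ruledOut.length > 0 ? h.ruledOut.join(", ") : "nothing yet"}`,
  ];
  return lines.join("\n");
}

const handoffText = formatHandoff({
  routedTo: "backend",
  observedAt: "2024-01-01T10:42:00Z",
  symptom: "Product list page renders empty",
  evidence: ["API returns 200 with an empty array instead of the product list"],
  reproduction: "curl -s https://api.example.com/products",
  ruledOut: ["frontend rendering", "CDN cache"],
});
```

The `reproduction` field carries most of the weight: a command the receiving team can rerun turns the handoff from a claim into evidence.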
Interview Question: The product team says 'the frontend is broken -- the product page shows wrong prices.' The backend team says 'our API returns correct prices.' How do you resolve this?
- Capture the API response. Open the product page, check the Network panel, and copy the exact JSON response for the product price endpoint. If the API returns `{"price": 29.99}` but the page shows `$19.99`, the question becomes: where does the transformation happen?
- Check for caching layers. Is the frontend caching an old API response? Check TanStack Query’s cache (TanStack Query Devtools). Is the CDN caching an old SSR response? Check `curl -sI` for cache headers and age.
- Check for data transformation. Does the frontend apply discounts, currency conversion, or A/B test pricing on the client side? Search the codebase for where the price value is read from the API response and rendered to the DOM.
- Check for hydration issues. If the page is SSR, the server might have fetched one price (at SSR time) and the client hydrates with a newer price from a different API call. The “wrong” price might be the stale SSR value that flashes before the client update.
- Present the evidence. “The API response at 10:42am returned `price: 29.99`. The CDN cache header shows `age: 7200` (2 hours). The page was SSR’d 2 hours ago when the price was 19.99. The CDN is serving stale HTML. This is a CDN TTL issue, not a frontend or backend bug.”
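The arithmetic behind the final step (an `Age` of 7200 seconds means the cached copy was generated two hours before it was observed) can be sketched as a small helper. The function names and the result shape are hypothetical; the header semantics used are the standard ones, where `Age` is in seconds and `s-maxage` takes precedence over `max-age` for shared caches.

```typescript
interface CacheDiagnosis {
  ageSeconds: number | null;
  sharedTtlSeconds: number | null; // s-maxage if present, otherwise max-age
  generatedAt: Date | null;        // observation time minus Age
  predatesChange: boolean;         // true if the copy is older than the data change
}

// Extract a numeric Cache-Control directive such as "max-age=60".
function directiveSeconds(cacheControl: string, directive: string): number | null {
  for (const part of cacheControl.split(",")) {
    const pieces = part.trim().split("=");
    const key = (pieces[0] ?? "").toLowerCase();
    const seconds = parseInt(pieces[1] ?? "", 10);
    if (key === directive && !isNaN(seconds)) return seconds;
  }
  return null;
}

function diagnoseCache(
  headers: Record<string, string>,
  observedAt: Date,
  dataChangedAt: Date
): CacheDiagnosis {
  // Header names are case-insensitive, so normalize them first.
  const normalized: Record<string, string> = {};
  for (const headerName of Object.keys(headers)) {
    normalized[headerName.toLowerCase()] = headers[headerName] ?? "";
  }
  const parsedAge = parseInt(normalized["age"] ?? "", 10);
  const ageSeconds = isNaN(parsedAge) ? null : parsedAge;
  const cacheControl = normalized["cache-control"] ?? "";
  const sharedTtlSeconds =
    directiveSeconds(cacheControl, "s-maxage") ?? directiveSeconds(cacheControl, "max-age");
  const generatedAt =
    ageSeconds === null ? null : new Date(observedAt.getTime() - ageSeconds * 1000);
  const predatesChange =
    generatedAt !== null && generatedAt.getTime() < dataChangedAt.getTime();
  return { ageSeconds, sharedTtlSeconds, generatedAt, predatesChange };
}
```

Fed the headers from `curl -sI`, the time of the observation, and the time the price changed, `predatesChange` being true supports the "CDN is serving stale HTML" conclusion with a timestamp instead of a hunch.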
- Taking either team’s word at face value without independent verification
- Not checking caching layers
- Not knowing how to read CDN cache headers
- “How do you prevent this class of bug permanently?” (For price-critical pages: reduce CDN TTL to 60 seconds, use `stale-while-revalidate`, or move to SSR with no-cache for personalized pricing. Add a monitoring check that compares rendered prices against API prices.)
- “What if the wrong price resulted in orders placed at the incorrect price? Who is responsible?” (This is a business and legal question, not just a technical one. The immediate technical fix is correcting the cache. The process fix is making price-sensitive pages exempt from aggressive caching. The organizational fix is a cross-team SLA for cache TTL on revenue-critical data.)
Interview Question: Three teams own different parts of the same page -- Team A owns the header, Team B owns the product grid, Team C owns the recommendation sidebar. The page is slow. How do you figure out which team's code is the problem?
Self-Assessment
Key Takeaways
- Rendering strategy is an architectural decision, not a framework default. Choose based on content, audience, and business requirements.
- Most state does not need a state library. `useState` + TanStack Query handles 90% of needs.
- Core Web Vitals are business metrics disguised as technical metrics. Treat them as SLAs, not suggestions.
- The browser is a constrained single-threaded runtime. Design for the 16ms frame budget.
- Accessibility is engineering quality, not a compliance checkbox.
- Micro-frontends solve organizational problems at the cost of technical complexity.
- Frontend security is defense in depth. XSS prevention requires encoding + CSP + sanitization.
Confidence Rating Guide
Beginner level -- you can:
- Explain CSR vs SSR and when to use each
- Describe the Virtual DOM
- Write React components with hooks
- Identify basic accessibility requirements
- Choose rendering strategies for given requirements and defend your choice
- Diagnose React re-rendering problems using DevTools
- Implement a testing strategy using the Testing Trophy model
- Explain hydration and its modern alternatives
- Set up and enforce performance budgets in CI
- Architect a frontend platform for 50M+ users with performance monitoring and CI enforcement
- Evaluate micro-frontends vs modular monolith for your team’s context
- Design real-time collaborative features with offline support and conflict resolution
- Lead a Core Web Vitals initiative from diagnosis through sustainable improvement
- Explain browser rendering pipeline internals to debug production performance issues
- Make architectural decisions balancing technical excellence with business impact