Engineering Career Growth — From Junior to Staff+

Your career is not a ladder. It is a series of increasingly complex problems you learn to solve — for yourself, your team, and your organization. This guide breaks down what actually matters at each stage and how to move through them with intention.
Think of your career like a tree, not a ladder. It branches in many directions, and the healthiest growth comes from deep roots (fundamentals) not just height. Some of the most impactful engineers in the industry took winding paths — lateral moves, open source detours, entire career pivots — and those branches became the source of their unique strengths.

Real-World Career Stories

Before we get into frameworks and checklists, here are four real stories that illustrate what career growth actually looks like in practice. None of these follow a neat, linear path — and that is the point.
Kelsey Hightower is one of the most respected engineers in the cloud-native ecosystem, and he never got a computer science degree. He started as a sysadmin, racking servers and writing Puppet manifests. He learned to code not in a classroom but because automation was the only way to manage the growing complexity of the systems he was responsible for.
What set Hightower apart was not raw technical talent; it was his relentless drive to teach and share. He became a fixture in the Kubernetes community early on, giving talks that made container orchestration accessible to people who had never touched it. His “Kubernetes the Hard Way” guide became legendary, not because it made things easy, but because it forced practitioners to understand every layer of the stack by building it from scratch.
Google hired him as a staff developer advocate, and he eventually became a principal engineer. He retired from Google in 2023 after a career that reshaped how an entire industry thinks about infrastructure.
The lesson: Your path does not need to match anyone else’s template. Hightower’s lack of a CS degree did not hold him back because he replaced credentials with demonstrated expertise, teaching ability, and community impact. If you can explain complex systems clearly and build trust across an industry, titles and degrees become secondary.
In 2015, Dan Abramov was a relatively unknown developer from Russia who had been coding professionally for just a couple of years. He had an idea about how to manage state in React applications, inspired by Elm’s architecture and Flux’s ideas, and built a library called Redux. More importantly, he gave a conference talk (“Hot Reloading with Time Travel” at React Europe) that demonstrated the concept in a way that made the entire audience’s jaws drop.
Redux took off immediately. But what got Abramov hired onto Facebook’s React core team was not just the code; it was his ability to communicate ideas, write clear documentation, and engage with the community with unusual patience and empathy. He spent enormous amounts of time answering GitHub issues and writing blog posts that explained not just how to use his tools but why they worked the way they did.
Later, Abramov became a central figure in developing React Hooks, fundamentally changing how millions of developers write React code. He has consistently been transparent about his own gaps in knowledge (his “Things I Don’t Know” blog post went viral for its honesty) and has pushed the industry toward a healthier relationship with expertise and imposter syndrome.
The lesson: Open source is one of the most powerful career accelerators available to any engineer. Abramov did not climb a corporate ladder; he built something useful, explained it clearly, engaged with the community generously, and the career opportunities followed. You do not need to build the next Redux. You need to solve a real problem, share it publicly, and be genuinely helpful to the people who use it.
In 2019, Tanya Reilly (then a principal engineer at Squarespace, previously at Google for over a decade) gave a talk called “Being Glue” that struck a nerve across the entire engineering industry. The talk told the story of a composite character, an engineer who was brilliant at the work that holds teams together: onboarding new hires, writing documentation, facilitating cross-team communication, reviewing design docs, running project coordination. This “glue work” was essential: without it, projects stalled and teams fell apart.
But here is the painful part: when promotion time came, this engineer was told they did not have enough “technical impact.” All the work that kept the team functional was invisible in the performance review process. Meanwhile, engineers who had been freed up by the glue worker’s efforts (because someone else handled the coordination, the docs, the onboarding) got promoted for their visible technical contributions.
Reilly’s talk forced companies to confront an uncomfortable truth: organizations systematically undervalue the work that makes everything else possible, and this burden falls disproportionately on women and underrepresented groups. The talk led to real changes in how many companies evaluate engineering contributions, broadening promotion criteria to include organizational impact.
Reilly went on to write The Staff Engineer’s Path (O’Reilly, 2022), one of the definitive guides to the IC leadership track.
The lesson: Glue work is real, valuable, and dangerous if it is not recognized. If you find yourself doing this work, make it visible: track it in your brag document, frame it in terms of team outcomes, and have explicit conversations with your manager about how it is valued. And if your organization refuses to recognize it, that tells you something important about whether you should stay.
Charity Majors has had one of the more unusual and instructive career paths in the engineering world. She spent years as a systems engineer and DBA at companies like Linden Lab (Second Life) and Parse (acquired by Facebook). At Facebook, she was a production engineering manager, and she hated it. Not because management is bad, but because she realized the management role at that particular moment was not where she could have the most impact.
She left Facebook, co-founded Honeycomb.io (an observability company), and became its CTO. As CTO of a startup, she was back to writing code, designing systems, and managing people, all at once. As Honeycomb grew, she transitioned back to a pure IC role, handing off management responsibilities to focus on technical strategy.
Majors has been unusually vocal about the IC-to-management-and-back journey. Her core argument: management is not a promotion, it is a career change. The skills are different. The day-to-day is different. And there is no shame in trying management, realizing it is not for you, and going back to IC. In fact, engineers who have done a stint in management often become better ICs because they understand organizational dynamics, prioritization, and the human side of engineering in ways that pure ICs often do not.
The lesson: The IC vs. management fork is not a one-way door. Try management if you are curious, but go in with your eyes open. It is a different job, not a better one. And if you go back to IC, you are not “demoting” yourself. You are choosing the role where you can have the most impact. The best organizations understand this. If yours does not, factor that into your career calculus.

1. Engineering Levels Demystified

The Level Map

| Dimension | Junior (L1-L2) | Mid (L3) | Senior (L4) | Staff (L5) | Principal (L6+) |
| --- | --- | --- | --- | --- | --- |
| Scope | Single task / function | Feature / small project | Entire system / service | Multiple systems / cross-team | Org-wide / company-wide |
| Autonomy | Needs guidance on approach | Needs guidance on direction | Self-directed, sets direction for others | Defines technical direction for a domain | Shapes company technical strategy |
| Influence | Own code | Team codebase | Team + adjacent teams | Engineering org | Entire company |
| Technical Depth | Learns patterns | Applies patterns correctly | Chooses and adapts patterns | Creates patterns others follow | Defines industry-level patterns |
| Ambiguity | Well-defined tasks | Loosely defined features | Ambiguous problems | Undefined problem spaces | “We don’t even know the question yet” |
| Failure Impact | Bug in a function | Broken feature | Service outage | Cross-team architectural debt | Company-wide strategic misalignment |
Levels are about impact, not years. A developer with 10 years of experience who has repeated year 1 ten times is not a senior engineer. Conversely, someone with 3 years of intense, ownership-driven experience can absolutely operate at a senior level. Stop counting years. Start measuring impact.

The One-Sentence Litmus Test at Each Level

If you remember nothing else from this section, remember this progression. It captures the single sharpest distinction between levels:
  • A junior engineer writes code that works.
  • A mid-level engineer finishes well-defined tasks reliably and independently.
  • A senior engineer defines the tasks. They look at an ambiguous problem, break it into solvable pieces, and own the outcome end-to-end.
  • A staff engineer defines what problems to solve. They look across teams and systems, identify the highest-leverage opportunities, and build consensus around pursuing them.
  • A principal engineer defines what problems are worth solving — and in what order — by connecting technical possibilities to business strategy.
Notice the pattern: at each level, the ambiguity increases and the scope of ownership expands. A mid-level engineer needs someone to hand them a well-scoped ticket. A senior engineer writes the tickets. A staff engineer writes the roadmap that determines which tickets exist. A principal engineer shapes the strategy that determines which roadmaps get funded. The further you go, the less your job looks like “writing code” and the more it looks like “making decisions that determine what code gets written.”

Calibration by Company Stage: How Levels Map Differently

The level map above is a useful abstraction, but it hides a critical reality: what “senior” or “staff” means varies enormously depending on company stage, size, and culture. An L4 at Google is not the same as an L4 at a 30-person startup, and neither is the same as an L4 at a Fortune 500 enterprise. Calibrating your self-assessment and your interview answers to the company context is one of the most overlooked career skills.
Why this matters for interviews: When an interviewer at a startup asks “Tell me about a time you operated at a senior level,” they mean something fundamentally different from when a Google interviewer asks the same question. The startup interviewer wants to hear about shipping under ambiguity with minimal process. The big-tech interviewer wants to hear about navigating organizational complexity at scale. Giving the wrong calibration story — even a good one — signals that you do not understand the environment you are interviewing for.
| Dimension | Startup (Seed to Series B, <50 eng) | Growth-Stage (Series C+, 50-500 eng) | Big Tech (FAANG-scale, 1000+ eng) | Enterprise (Traditional large co) |
| --- | --- | --- | --- | --- |
| What “Senior” means | You own entire systems, ship to production solo, make architecture decisions with minimal review | You own a significant domain, write design docs, mentor a small team, coordinate with 2-3 adjacent teams | You own a service or feature area within a larger system, navigate complex review processes, influence your team’s direction | You own a well-defined scope within established frameworks, follow rigorous change management, navigate approval chains |
| What “Staff” means | Often does not exist as a formal level. If it does, it means “the person who defines what we build and how”, essentially a CTO-lite | Cross-team technical leadership, authoring org-wide RFCs, defining engineering standards as the company scales | Deep technical influence across a large org, multi-year technical strategy, mentoring senior engineers, building consensus across 5-10+ teams | Architecture governance, standards bodies, cross-divisional technical alignment, often with formal authority structures |
| Scope of “ownership” | End-to-end. You designed it, built it, deployed it, and you get paged when it breaks at 3am | Significant but bounded. You own a domain but share infrastructure and platform with other teams | Deep but narrow. You own a specific area with well-defined interfaces to adjacent systems | Process-mediated. You own your area but changes require formal approvals, security reviews, compliance checks |
| Promotion signals | Ship impact. Revenue, users, uptime. Process is informal: if you are clearly making the company successful, you get promoted | Impact + influence + documentation. You need to show both technical delivery AND organizational contribution | Impact + scope + consensus. You need evidence of cross-team influence, written artifacts (RFCs, design docs), and peer recognition | Tenure + visibility + governance. You need formal documentation, management sponsorship, and often committee approval |
| Common failure mode | Confusing “busy” with “impactful”: shipping lots of code without strategic thinking | Failing to outgrow your startup habits: still trying to do everything yourself instead of building systems and mentoring | Getting lost in the machine: doing excellent work that nobody notices because you have not built organizational visibility | Optimizing for process compliance over genuine impact: checking boxes without moving meaningful metrics |
The calibration trap in interviews: Candidates from startups interviewing at big tech often undersell their impact because they lack the vocabulary of org-wide RFCs and cross-team alignment. Candidates from big tech interviewing at startups often oversell process and undersell scrappiness. Know your audience. If you are interviewing at a startup, emphasize speed, ownership, and direct business impact. If you are interviewing at a large company, emphasize how you built consensus, navigated ambiguity across teams, and documented your decisions.

How This Answer Changes: Startup vs. Big Tech vs. Enterprise

Throughout this chapter, many of the frameworks and interview answers assume a certain organizational context. Here is a quick reference for how key concepts shift depending on where you are:
At a startup: “We had no dedicated SRE team, so I built our observability stack from scratch: Datadog dashboards, PagerDuty alerting, runbooks for the three most common failure modes. I was an L3 engineer doing what a startup expects of its seniors: owning something end-to-end that was not in my job description because it needed to exist.”
At big tech: “I was an L4 engineer but I identified that three teams were independently building rate-limiting solutions. I wrote a cross-team RFC proposing a shared library, presented it at the architecture review with all three team leads, and coordinated the migration. That is L5-scope work: the cross-team influence and consensus-building that defines the senior-to-staff transition.”
At enterprise: “I was a senior developer but I noticed our change management process was adding five days to every deployment. I documented the bottlenecks, built a business case showing the cost in developer hours, and presented it to the architecture review board. The board approved a streamlined process that I then piloted with my team. That is principal-level work in our org: driving process change through governance structures.”
The pattern: The skill being demonstrated is the same (operating above your defined scope). The evidence looks completely different depending on the organizational context.
At a startup: “You cannot afford a dedicated tech debt sprint when you have 6 months of runway. I handle it by building quality into the delivery process: every feature PR includes one small improvement to the area I touched. I also maintain a ruthless priority: only fix debt that is actively slowing us down or creating risk. Aesthetic debt can wait.”
At big tech: “I maintain a tech debt register with quantified impact. Each quarter, I present the top 5 items to the engineering director with cost-of-inaction estimates. We allocate 15-20% of sprint capacity to debt reduction. The key is framing debt in terms that compete with feature work: ‘This refactor recovers 3 engineer-hours per week’ competes better than ‘This code is messy.’”
At enterprise: “Technical debt in enterprise environments is often entangled with compliance, vendor contracts, and organizational dependencies. I work through the architecture review board to get formal acknowledgment of debt items, which creates organizational commitment to resolution. I also fold debt reduction into required security and compliance updates: ‘While we are updating the auth library for the CVE, we should also address the token handling refactor that has been on the backlog.’”
The meta-point: The underlying judgment is identical (prioritize debt by business impact, quantify the cost, make it visible). The execution is completely different because the organizational levers are different.
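The “tech debt register with quantified impact” from the big-tech answer can be sketched in a few lines. This is a minimal illustration, not a prescribed tool: the item names, the hour figures, and the `paybackWeeks` and `rankDebtItems` helpers are all hypothetical, and payback time is only one plausible way to make debt items comparable with feature work.

```javascript
// Hypothetical tech debt register. All names and numbers are illustrative.
const debtRegister = [
  { name: 'Flaky integration tests', hoursLostPerWeek: 6, fixCostHours: 40 },
  { name: 'Manual deploy checklist', hoursLostPerWeek: 3, fixCostHours: 12 },
  { name: 'Legacy token handling', hoursLostPerWeek: 1, fixCostHours: 60 },
];

// Weeks until the fix pays for itself: fix cost divided by weekly savings.
function paybackWeeks(item) {
  return item.fixCostHours / item.hoursLostPerWeek;
}

// Rank items so the fastest payback comes first (copy to avoid mutating the input).
function rankDebtItems(items) {
  return [...items].sort((a, b) => paybackWeeks(a) - paybackWeeks(b));
}

const ranked = rankDebtItems(debtRegister);
for (const item of ranked) {
  console.log(`${item.name}: pays back in ${paybackWeeks(item).toFixed(1)} weeks`);
}
```

A statement such as “this fix recovers 3 engineer-hours per week and pays for itself in 4 weeks” is the kind of framing the answer describes; the exact scoring formula matters less than having one.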

The Key Transitions

Junior to Mid
What changes: You stop needing someone to check every decision. You write code that works, is tested, and can be reviewed without major rewrites.
Concrete signals you have made the transition:
  • You break down tasks yourself before starting
  • Your PRs rarely need architectural-level feedback anymore
  • You handle edge cases proactively, not after code review comments
  • You write tests without being asked
  • You can debug production issues in your area without hand-holding
Concrete behaviors that demonstrate the difference: A junior engineer gets a Jira ticket that says “Add pagination to the user list endpoint” and asks their lead how to implement it. A mid-level engineer gets the same ticket, investigates the current endpoint, checks the data volume, picks a pagination strategy (cursor-based vs. offset), implements it with tests, and opens a PR with a note about why they chose cursor-based pagination for this use case.
What holds people back:
  • Waiting for perfect specifications instead of clarifying ambiguity yourself
  • Not reading enough production code written by senior engineers
  • Avoiding unfamiliar parts of the codebase
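For readers unfamiliar with the pagination choice in the example above, here is a minimal sketch of cursor-based pagination over an in-memory list. The `listUsers` function, its option names, and the sample data are hypothetical; in a SQL database the same idea is typically expressed as a `WHERE id > :cursor ORDER BY id LIMIT :n` query.

```javascript
// In-memory stand-in for a users table, sorted by ascending id (an assumption).
const users = Array.from({ length: 7 }, (_, i) => ({ id: i + 1, name: `user${i + 1}` }));

// Cursor-based pagination: the cursor is the id of the last row the client saw.
// Unlike offset pagination, rows inserted or deleted before the cursor
// do not shift the contents of the next page.
function listUsers(rows, { afterId = 0, limit = 3 } = {}) {
  const page = rows.filter(row => row.id > afterId).slice(0, limit);
  const last = page[page.length - 1];
  // nextCursor is null when no rows remain after this page.
  const hasMore = last !== undefined && rows.some(row => row.id > last.id);
  return { items: page, nextCursor: hasMore ? last.id : null };
}

const first = listUsers(users);
const second = listUsers(users, { afterId: first.nextCursor });
const third = listUsers(users, { afterId: second.nextCursor });
console.log(first.items.map(u => u.id), second.items.map(u => u.id), third.items.map(u => u.id));
```

The PR note a mid-level engineer writes would explain exactly this trade-off: offset pagination is simpler but can skip or repeat rows when data changes between requests, while a cursor stays stable.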
Mid to Senior
What changes: You shift from “I completed the ticket” to “I own this system.” You think about the full lifecycle: design, implementation, deployment, monitoring, maintenance, and eventual deprecation.
Concrete signals you have made the transition:
  • You write the design doc before you write the code
  • You push back on requirements when they don’t make technical sense
  • You proactively identify and address tech debt, not just complain about it
  • Other engineers come to you with questions about your domain
  • You can estimate work for your team, not just yourself
  • You think about failure modes during design, not after the first outage
Concrete behaviors that demonstrate the difference: A mid-level engineer is told “we need a notification system” and builds exactly what the spec says. A senior engineer is told “we need a notification system” and asks: “What problem are we solving? What are the delivery requirements? Do we need exactly-once delivery or is at-least-once acceptable? What’s the budget for build vs. buy? What does failure look like for the user?” Only then do they write a design doc with three options, before writing a single line of code.
What holds people back:
  • Optimizing for code elegance over system reliability
  • Not developing opinions about architecture (always deferring to others)
  • Avoiding cross-functional conversations with PMs, designers, and stakeholders
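The delivery question in the notification example (exactly-once vs. at-least-once) has a concrete engineering consequence. With at-least-once delivery the same message can arrive more than once, so the consumer is usually made idempotent. The sketch below illustrates that idea under simplifying assumptions: `createNotificationHandler` is a hypothetical helper, and the in-memory `Set` stands in for durable storage.

```javascript
// At-least-once delivery means a notification may arrive more than once.
// An idempotent handler makes duplicates harmless by remembering processed ids.
function createNotificationHandler(send) {
  const processedIds = new Set(); // a real system would use durable storage
  return function handle(message) {
    if (processedIds.has(message.id)) {
      return 'duplicate-skipped';
    }
    send(message);
    // Record the id only after a successful send, so a failed send is retried
    // on redelivery instead of being silently dropped.
    processedIds.add(message.id);
    return 'sent';
  };
}

const sent = [];
const handle = createNotificationHandler(msg => sent.push(msg.body));

handle({ id: 'n-1', body: 'Your order shipped' });
handle({ id: 'n-1', body: 'Your order shipped' }); // redelivery of the same message
handle({ id: 'n-2', body: 'Your order arrived' });
console.log(sent); // two notifications, despite three deliveries
```

This does not give true exactly-once behavior (a crash between sending and recording the id can still produce a duplicate), which is why the senior engineer asks whether at-least-once is acceptable before designing anything.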
Senior to Staff
What changes: Your work is no longer contained within a single team. You identify problems that span team boundaries, propose solutions, build consensus, and drive execution across organizational lines.
Concrete signals you have made the transition:
  • You have authored RFCs or design docs that were adopted across multiple teams
  • You are pulled into architectural decisions outside your team
  • You mentor senior engineers, not just juniors
  • You identify systemic problems (tooling, patterns, processes) and fix them at the org level
  • Leadership asks for your input on technical strategy
  • You can articulate trade-offs to non-technical stakeholders
Concrete behaviors that demonstrate the difference: A senior engineer notices their service has a caching problem and fixes it. A staff engineer notices that five services across three teams all have the same caching problem because there is no shared caching strategy, then writes an RFC proposing a unified caching layer, gets buy-in from all three team leads, and coordinates the migration. The senior engineer defined the task. The staff engineer defined the problem.
What holds people back:
  • Staying heads-down in code and never building organizational influence
  • Not writing — design docs, RFCs, blog posts, postmortems
  • Inability to communicate technical decisions to non-engineers
  • Not building relationships outside your immediate team
Cross-chapter connections: The senior-to-staff transition depends heavily on skills covered in other chapters. See Communication & Soft Skills for frameworks on presenting technical decisions to non-engineers and running effective design reviews. See Leadership, Execution & Infrastructure for the staff+ skills around influence without authority and driving cross-team initiatives.
Staff to Principal
What changes: You operate at the intersection of technology and business. You don’t just solve the hard technical problems; you decide which problems are worth solving and in what order.
Concrete signals you have made the transition:
  • You define multi-year technical roadmaps that align with business goals
  • Your decisions affect hundreds of engineers
  • You identify existential technical risks before they become crises
  • You shape hiring, team structure, and engineering culture
  • Industry peers look to your work as a reference point
  • You can translate between CTO-level strategy and individual team execution
Concrete behaviors that demonstrate the difference: A staff engineer proposes migrating from a monolith to microservices because the current architecture is causing deployment bottlenecks. A principal engineer evaluates that proposal against the company’s 3-year business strategy, the current team size, the hiring plan, and the operational maturity of the organization, and concludes that the right move is actually a modular monolith with clear domain boundaries, because the company cannot hire enough platform engineers to operate 40 microservices responsibly. The principal engineer defines what problems are worth solving given the constraints that exist beyond the purely technical.
What holds people back:
  • Thinking “principal” means “the best coder” — it means the best technical decision-maker
  • Not understanding the business deeply enough
  • Inability to influence without authority
  • Not developing a point of view on where the industry is heading

Anti-Patterns at Every Level

Pattern: Has the title because they have been around for 5+ years. Does solid work within a narrow comfort zone. Avoids new technologies, hard problems, and cross-team initiatives. Never writes design docs. Never mentors.
Why it is a problem: They block the level for others and create a false ceiling. Junior engineers look at them and think seniority means doing the same thing for a long time.
How to avoid it: Every 6 months, ask yourself: “What can I do now that I could not do 6 months ago?” If the answer is nothing, you are stagnating. For more on developing the growth mindset that prevents stagnation, see the Engineering Mindset chapter, especially the sections on first-principles thinking and deliberate practice.
Pattern: Draws diagrams, attends meetings, writes documents, but hasn’t shipped code in months (or years). Their designs are theoretically sound but practically impossible. They don’t feel the pain of their own decisions.
Why it is a problem: Architecture disconnected from implementation reality leads to over-engineering, impractical abstractions, and resentment from the teams who have to build it.
How to avoid it: Even at the staff+ level, write code regularly. It doesn’t have to be feature work: write tooling, fix production bugs, contribute to critical path items. Stay connected to the developer experience.
Pattern: Produces enormous volumes of code. Solves hard problems solo. But their code is unreadable to anyone else. They don’t do code reviews. They don’t document. They don’t teach.
Why it is a problem: They are a single point of failure and a team bottleneck. When they leave, their systems become legacy code overnight. A true 10x engineer makes 10 people 2x more effective.
How to avoid it: Measure your impact by what the team ships, not what you personally ship.
Interview Question: “Tell me about a time you operated above your level.” This is one of the strongest signals in a behavioral interview. It shows self-awareness about leveling AND initiative. Prepare 2-3 examples where you took on scope beyond your title.

2. What Makes a Senior Engineer

Being senior is not about knowing every framework or having all the answers. It is about a mindset shift — from “I write code” to “I solve problems and enable others.”

The Six Pillars of Senior Engineering

What it means: Your code is not just correct; it is clear, well-structured, and maintainable by someone who has never seen it before. You optimize for readability over cleverness.
In practice:
  • Functions do one thing and have descriptive names
  • Error handling is explicit, not an afterthought
  • Complex logic has comments explaining why, not what
  • Tests document expected behavior, including edge cases
  • You follow established patterns in the codebase (consistency over personal preference)
The test: Can a mid-level engineer on your team understand, modify, and extend your code without asking you a question? If not, it is not senior-quality code.
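// Junior: clever but opaque
const r = d.filter(x => x.s > 2 && x.t !== 'b').map(x => ({...x, p: x.a * 0.15}));

// Senior: clear and maintainable
// Named constants replace the magic values used above (2, 'b', 0.15).
const MIN_ACTIVE_SESSIONS = 2;
const AccountType = Object.freeze({ BETA: 'b' });
const LOYALTY_DISCOUNT_RATE = 0.15;

const activeNonBetaUsers = users.filter(user => {
  const isActive = user.sessionsThisMonth > MIN_ACTIVE_SESSIONS;
  const isNotBetaTester = user.accountType !== AccountType.BETA;
  return isActive && isNotBetaTester;
});

const usersWithDiscountAmount = activeNonBetaUsers.map(user => ({
  ...user,
  // Price times rate is the amount discounted, not the final price.
  discountAmount: user.annualPrice * LOYALTY_DISCOUNT_RATE,
}));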
What it means: You don’t wait for someone to assign you the hard problems. When something breaks, you step up, even if it is not “your” code. You see gaps and fill them.
In practice:
  • You monitor your services proactively, not just when paged
  • You write runbooks for on-call scenarios before they happen
  • When you find a bug in another team’s service that affects yours, you fix it or file a detailed issue with a proposed solution — not just “it’s broken”
  • You own the outcome of a project, not just your individual tasks
  • You follow up after deployment to verify the feature works as expected in production
The anti-pattern: “That’s not my job.” / “That’s the platform team’s problem.” / “I just write the code, ops handles deployment.”
What it means: Your presence makes the entire team more productive, not just yourself. You invest in code reviews, mentoring, documentation, and tooling that saves everyone time.
In practice:
  • Your code reviews teach, not just gatekeep — you explain why something should change
  • You create shared utilities, templates, and tooling that the whole team uses
  • You document tribal knowledge so it is not locked in your head
  • You pair with struggling teammates instead of just solving it yourself
  • You create a “paved road” for common patterns so others don’t reinvent the wheel
How to measure it: If you went on vacation for two weeks, would the team slow down or keep moving? A true senior engineer’s team keeps moving because they have built systems and shared knowledge. A “senior” who is actually a bottleneck causes the team to stall.
What it means: You understand that the best code is often no code. You evaluate build-vs-buy, complexity-vs-value, and perfect-vs-good-enough trade-offs with nuance.
In practice:
  • You push back on features that add complexity without proportional value
  • You choose boring technology for critical systems
  • You can articulate: “We should use a third-party service for this because building it ourselves would take 3 months and maintaining it would take 0.5 FTE forever”
  • You avoid premature optimization AND premature abstraction
  • You know when a quick hack is the right call and when it is technical debt
The framework: For every technical decision, ask:
  1. What is the simplest thing that could work?
  2. What are the maintenance costs over 2 years?
  3. Who else will need to understand this?
  4. What happens when this fails at 10x scale?
  5. Is this reversible? If not, how much certainty do we need?
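The build-vs-buy statement quoted above (“3 months to build, 0.5 FTE to maintain”) becomes much more persuasive with numbers attached. The sketch below works through that arithmetic over the 2-year horizon from question 2. Every figure is a hypothetical placeholder (engineer cost, vendor fee, integration time), and the `buildCost` and `buyCost` helpers are illustrative names, not an established model.

```javascript
// All figures are hypothetical placeholders; substitute your own.
const ENGINEER_COST_PER_MONTH = 15000; // assumed fully loaded cost of one engineer
const HORIZON_MONTHS = 24;             // the 2-year window from question 2

// Cost of building in-house: initial build effort plus ongoing maintenance.
function buildCost({ buildMonths, buildEngineers, maintenanceFte }) {
  const initial = buildMonths * buildEngineers * ENGINEER_COST_PER_MONTH;
  const maintenanceMonths = HORIZON_MONTHS - buildMonths;
  const ongoing = maintenanceMonths * maintenanceFte * ENGINEER_COST_PER_MONTH;
  return initial + ongoing;
}

// Cost of buying: one-off integration effort plus the vendor fee over the horizon.
function buyCost({ vendorFeePerMonth, integrationMonths }) {
  const integration = integrationMonths * ENGINEER_COST_PER_MONTH;
  return integration + HORIZON_MONTHS * vendorFeePerMonth;
}

const build = buildCost({ buildMonths: 3, buildEngineers: 1, maintenanceFte: 0.5 });
const buy = buyCost({ vendorFeePerMonth: 2000, integrationMonths: 1 });
console.log({ build, buy, cheaper: build < buy ? 'build' : 'buy' });
```

With these assumed inputs the maintenance term dominates the build cost, which is the point of the “0.5 FTE forever” framing. The comparison ignores non-monetary factors such as vendor lock-in and reversibility, which question 5 covers.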
What it means: You design for production from day one. Monitoring, alerting, deployment strategy, rollback plans, and failure modes are part of your design, not afterthoughts.
In practice:
  • Every feature you build has metrics and alerts before it ships
  • You think about: What does graceful degradation look like?
  • You design for zero-downtime deployments (feature flags, blue-green, canary)
  • Your error messages are actionable, not just “something went wrong”
  • You write postmortems that actually prevent recurrence, not just document the outage
The checklist (ask yourself before any deploy):
  • Can I roll this back in under 5 minutes?
  • Do I have dashboards that will show me if this is broken?
  • What happens to users if this feature fails?
  • Have I tested this with production-like data volume?
  • Is there a runbook for when this goes wrong at 3 AM?
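Two items from this section, canary-style rollouts behind feature flags and a rollback in under 5 minutes, can be illustrated with a percentage-based flag check. This is a minimal sketch under stated assumptions: the flag object shape, `isEnabled`, and `hashToBucket` are hypothetical, and the hash is an FNV-1a-style function chosen only because it is short and deterministic. Real flag services offer far more.

```javascript
// Deterministic string hash (FNV-1a style, 32-bit) so a given user always
// lands in the same bucket across requests.
function hashToBucket(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash % 100; // bucket in [0, 99]
}

// A flag is on for a user when their bucket falls below the rollout percentage.
function isEnabled(flag, userId) {
  if (!flag.enabled) return false; // kill switch: turn the feature off without a deploy
  return hashToBucket(`${flag.name}:${userId}`) < flag.rolloutPercent;
}

// Hypothetical flag: start with roughly 10% of users, then raise the percentage.
const newCheckout = { name: 'new-checkout', enabled: true, rolloutPercent: 10 };

const sampleUsers = Array.from({ length: 1000 }, (_, i) => `user-${i}`);
const exposed = sampleUsers.filter(id => isEnabled(newCheckout, id)).length;
console.log(`${exposed} of ${sampleUsers.length} sample users see the new checkout`);
```

Setting `enabled` to `false` is the rollback path: it takes effect on the next flag evaluation rather than on the next deployment, which is what makes the 5-minute target realistic.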
What it means: You can explain complex technical concepts to different audiences: engineers, PMs, executives. You write clearly. Your design docs are actionable. Your Slack messages are concise.
In practice:
  • Your design docs have a clear problem statement, proposed solution, alternatives considered, and trade-offs
  • You can run a productive technical discussion without it becoming a debate
  • Your postmortems are blameless and focused on systemic improvements
  • You give feedback that is specific, actionable, and kind
  • You document decisions and their rationale so future engineers understand why
The test: Can you explain your system’s architecture to a new hire in 15 minutes and have them understand the key trade-offs?
For a deep dive into the specific communication frameworks that senior engineers use — including how to write design docs, run blameless postmortems, and present technical trade-offs to executives — see the Communication & Soft Skills chapter.

The Senior Engineer Task Checklist

Use this checklist for every meaningful task. It is not about checking every box every time — it is about consciously considering each dimension. Over time, this becomes automatic.
1. Correctness

Does it actually solve the problem? Have you validated the requirements, not just the implementation? Did you handle edge cases, error states, and invalid input?
2. Performance

Will this work at current scale AND 10x scale? Have you identified the hot path? Are there unnecessary database queries, N+1 problems, or missing indexes? Did you measure, not guess?
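The N+1 problem mentioned above is easiest to see by counting round trips. The following sketch uses a fake data layer (the `createDb` helper and its `getAuthor` and `getAuthors` methods are hypothetical stand-ins for real database calls) to compare a per-row lookup with a single batched lookup.

```javascript
// Fake data access layer that counts round trips (stand-in for a real database).
function createDb() {
  const authors = new Map([[1, 'Ada'], [2, 'Grace']]);
  const db = {
    queryCount: 0,
    getAuthor(id) { db.queryCount += 1; return authors.get(id); },
    getAuthors(ids) {
      db.queryCount += 1;
      return new Map(ids.map(id => [id, authors.get(id)]));
    },
  };
  return db;
}

const posts = [
  { title: 'A', authorId: 1 },
  { title: 'B', authorId: 2 },
  { title: 'C', authorId: 1 },
];

// N+1 pattern: one extra lookup per post on top of the query that loaded the posts.
function withAuthorsNaive(db, rows) {
  return rows.map(p => ({ ...p, author: db.getAuthor(p.authorId) }));
}

// Batched: collect the unique ids, fetch them in one call, join in memory.
function withAuthorsBatched(db, rows) {
  const ids = [...new Set(rows.map(p => p.authorId))];
  const byId = db.getAuthors(ids);
  return rows.map(p => ({ ...p, author: byId.get(p.authorId) }));
}

const naiveDb = createDb();
const batchedDb = createDb();
const naiveResult = withAuthorsNaive(naiveDb, posts);
const batchedResult = withAuthorsBatched(batchedDb, posts);
console.log({ naiveQueries: naiveDb.queryCount, batchedQueries: batchedDb.queryCount });
```

Both functions return the same data; only the number of round trips differs, and that difference grows linearly with the number of rows. Counting queries like this is one way to “measure, not guess.”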
3. Security

Is user input validated and sanitized? Are you following the principle of least privilege? Is sensitive data encrypted at rest and in transit? Are there any injection vectors?
4. Observability

Can you tell if this is working in production? Do you have metrics, logs, and traces? Are alerts set up for failure conditions? Can you debug a problem using only the telemetry data?
5

Maintainability

Will someone else understand this in 6 months? Is the code well-structured and documented? Are there tests that prevent regressions? Does this follow existing patterns in the codebase?
6

Operability

Can this be deployed safely? Is there a rollback plan? Does it need a feature flag? Are there any dependencies that could cause cascading failures?
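To make two of the checklist items concrete, the sketch below contrasts an N+1 query pattern with a single JOIN (Performance) and passes user input through query placeholders instead of string concatenation (Security). It uses Python's built-in sqlite3 module with an in-memory database. The authors/posts schema, the sample rows, and all function names are hypothetical and exist only for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            title TEXT NOT NULL
        );
        -- Index the column used for lookups and joins.
        CREATE INDEX idx_posts_author_id ON posts(author_id);
        """
    )
    conn.executemany(
        "INSERT INTO authors (id, name) VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace"), (3, "Linus")],
    )
    conn.executemany(
        "INSERT INTO posts (author_id, title) VALUES (?, ?)",
        [(1, "Notes on engines"), (1, "On computation"),
         (2, "Compilers"), (3, "Kernels")],
    )
    conn.commit()
    return conn


def titles_n_plus_one(conn):
    """Anti-pattern: one query for the authors, then one more per author."""
    queries = 1
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_query(conn):
    """Same data fetched with one JOIN, regardless of how many authors exist."""
    rows = conn.execute(
        """
        SELECT a.name, p.title
        FROM authors a
        LEFT JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
        """
    ).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:
            result[name].append(title)
    return result, 1


def find_author_ids(conn, name):
    """The placeholder keeps user input as data, never as SQL text."""
    rows = conn.execute("SELECT id FROM authors WHERE name = ?", (name,)).fetchall()
    return [author_id for (author_id,) in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(titles_n_plus_one(db)[1], "queries vs", titles_single_query(db)[1])
    print(find_author_ids(db, "Ada' OR '1'='1"))
```

With three authors the difference is small, but the first function issues one extra query per author while the second stays at one query as the table grows. This is the kind of thing to measure rather than guess.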

3. The Staff+ Engineer Path

IC Track vs. Management Track

The problem with a single track: If management is the only path to seniority and compensation, you force great engineers into management roles they don’t want and aren’t suited for. The result is bad managers AND lost technical talent.
IC (Individual Contributor) Track: Deep technical impact without managing people. Staff, Principal, Distinguished, Fellow.
Management Track: Impact through people and organizational design. Tech Lead Manager, Engineering Manager, Director, VP.
| Dimension | IC Track (Staff+) | Management Track (EM+) |
| --- | --- | --- |
| Primary lever | Technical decisions, architecture, code | People, process, organizational design |
| Day-to-day | Design docs, code, technical mentoring, cross-team alignment | 1:1s, hiring, performance reviews, roadmap planning |
| Measured by | Technical impact on systems and engineering quality | Team output, retention, growth, delivery |
| Failure mode | Ivory tower architect | Meeting-only manager disconnected from tech |
| Reversibility | Easier to move to management | Harder to return to IC (skills atrophy) |
The key insight: Both tracks require leadership. A staff engineer who cannot influence people is not operating at staff level. A manager who cannot understand technical trade-offs is not effective either.
1. The Tech Lead
  • Partners with a single team’s manager to set technical direction
  • Drives execution on the team’s most important projects
  • Day-to-day: code reviews, design docs, unblocking engineers, sprint planning
  • Risk: becomes a bottleneck if they don’t delegate effectively
2. The Architect
  • Responsible for technical direction of a broad area (e.g., “frontend architecture” or “data platform”)
  • Depth of knowledge in their domain, breadth of influence across teams
  • Day-to-day: RFCs, reviewing designs from multiple teams, defining standards
  • Risk: becomes disconnected from implementation reality
3. The Solver
  • Parachutes into the hardest technical problems across the org
  • Moves from team to team as needed
  • Day-to-day: deep technical work on the most ambiguous, high-stakes problems
  • Risk: never builds lasting relationships or institutional knowledge
4. The Right Hand
  • Extends a senior leader (VP, CTO) by taking on their hardest problems
  • Operates with borrowed authority
  • Day-to-day: organizational strategy, cross-cutting initiatives, special projects
  • Risk: impact is invisible and hard to attribute
Which one are you? Most staff engineers are a blend, but understanding these archetypes helps you articulate what kind of staff engineer you want to be and what kind your org needs. All four archetypes require strong leadership skills — for frameworks on influence without authority, consensus building, and organizational navigation, see Leadership, Execution & Infrastructure.
Design Docs and RFCs:
  • Write design docs that change how your org builds software
  • An RFC adopted by 3+ teams is a stronger signal than any amount of code
  • Focus on: problem framing, trade-off analysis, alternatives considered
  • Your RFCs should teach, not just propose
Cross-Team Projects:
  • Identify problems that affect multiple teams but nobody owns
  • Examples: shared authentication, observability standards, CI/CD improvements, API design guidelines
  • Lead the initiative without formal authority — this IS the staff skill
Technical Strategy Documents:
  • Write the “state of the world” for your technical domain
  • Include: where we are, where we need to be, how we get there, what we stop doing
  • Present to engineering leadership and get buy-in
Mentoring Senior Engineers:
  • Help senior engineers make the jump to staff
  • This is a multiplier on a multiplier
  • Share your frameworks for decision-making, not just your technical knowledge
One of the highest-leverage activities in your career is a well-run career conversation with your manager. Most engineers either never have this conversation, have it reactively (after a disappointing review), or waste it on vague sentiments like “I want to grow.” Here is a concrete template for making these conversations productive.
Before the Meeting — What to Prepare (spend 30-60 minutes on this):
  1. Your self-assessment: Where do you honestly sit on the level expectations? Which dimensions are you strong in, which are gaps? Be specific — “I’m strong at system design within my team but I haven’t demonstrated cross-team architectural influence yet.”
  2. Your impact summary: Bring your brag document. Highlight 3-5 items from the last quarter that demonstrate the level you are operating at (or aspiring to). Use the impact story formula — problem, action, measurable result.
  3. Your growth hypothesis: “I believe the gap between where I am and where I want to be is [specific skill or scope]. Here is my plan to close it: [concrete actions].”
  4. Your asks: What do you need from your manager? Specific project assignments? Introductions to engineers on other teams? Air cover to take on a cross-team initiative? Feedback on a specific dimension?
During the Meeting — The Conversation Flow:
1

Open with your self-assessment (5 minutes)

“Here’s where I think I am relative to the next level. I’d like to check my understanding against yours.” This sets a collaborative tone and shows self-awareness. Your manager will either confirm or correct — both are valuable.
2

Share your impact evidence (5 minutes)

Walk through your top 3 items from your brag document. Frame them in terms of the level criteria your company uses. “This project demonstrated staff-level scope because it required aligning three teams on a shared data model.”
3

Ask the calibration question (5 minutes)

The most important question you can ask: “If I were being put up for promotion today, what would be the strongest argument against it?” This question forces your manager to be specific about gaps. Vague feedback like “just keep doing great work” is useless. Push for specifics. “What specific evidence would you need to see to feel confident putting me forward?”
4

Propose your growth plan (5 minutes)

“Based on what we’ve discussed, here is what I plan to focus on over the next quarter. Does this seem like the right priority?” This shows initiative and gives your manager a chance to redirect if needed.
5

Agree on checkpoints (5 minutes)

“Can we check in on this progress monthly? I’d like to make sure I’m on track and course-correct early if needed.” This creates accountability on both sides and prevents surprises at review time.
After the Meeting — Follow Up:
  1. Send a written summary within 24 hours. “Here’s what we discussed, here’s what I committed to, here’s what I’m asking from you.” This creates a paper trail and ensures alignment.
  2. Update your brag document with the agreed-upon focus areas.
  3. Set calendar reminders for the monthly check-ins.
  4. Track progress against the specific gaps identified. When you make progress on a gap, document it with evidence.
What to do if your manager is unhelpful: Some managers are bad at career conversations. If yours gives only vague feedback, try these escalation strategies:
  • Ask for examples: “Can you point to someone who recently got promoted to [target level]? What did their impact look like?”
  • Ask for specificity: “You mentioned I need more visibility. Can you give me an example of what that would look like concretely?”
  • Seek other sponsors: Talk to skip-level managers, staff+ engineers, or mentors outside your team. Your manager is not the only person who can help you grow.
  • If all else fails, treat that as a signal about whether this organization can support your growth.
Cross-chapter connection: Career conversations are high-stakes communication moments. The frameworks in Communication & Soft Skills — especially the sections on giving and receiving feedback, framing technical decisions for non-technical audiences, and navigating difficult conversations — apply directly here.
What is glue work? (Tanya Reilly’s concept) Work that is essential for the team to function but doesn’t get recognized in performance reviews: onboarding new hires, improving documentation, facilitating meetings, coordinating cross-team dependencies, writing postmortems, reviewing design docs.
Why it is a problem: This work disproportionately falls on underrepresented groups. It is essential but invisible. If you do too much of it, you get passed over for promotion because you “didn’t have enough technical impact.”
How to handle it:
1

Make it visible

Track all glue work in your brag document. Quantify the impact: “Onboarded 3 new hires, reducing their ramp-up time from 8 weeks to 4 weeks.”
2

Get it recognized

Talk to your manager about it explicitly. Frame it as organizational impact. If your manager doesn’t value it, that is a red flag about the org.
3

Distribute it

Don’t be the only person doing glue work. Create systems (onboarding docs, runbooks, templates) so the work scales beyond you.
4

Balance it

Ensure you also have visible technical projects. The ratio should shift over your career: early on, more code; at staff level, more organizational work.
5

Choose your org wisely

Some companies recognize and reward glue work. Some don’t. Know which kind you are at.
What it is: A running document where you record your accomplishments, impact, and growth. Updated weekly or biweekly. Used for performance reviews, promotion packets, and resume updates.
Think of a brag document like a highlight reel. You don’t remember every shot you took, but you SHOULD remember the ones that mattered. Without a highlight reel, your entire season of work gets compressed into whatever your manager vaguely recalls from the last two weeks before review time. That is not a system for getting recognized — that is a lottery.
Template:
# Brag Document — [Your Name] — [Quarter/Year]

## Projects
- [Project Name]: [1-sentence description]
  - My role: [what you specifically did]
  - Impact: [quantified outcome]
  - Skills demonstrated: [technical, leadership, etc.]

## Design and Architecture
- [RFC/Design Doc Title]: [outcome — adopted? by how many teams?]
- Key decisions I drove: [list]

## Mentoring and Team Building
- Mentored [Name] on [Topic] — [outcome]
- Improved onboarding: [specific improvement and result]

## Operational Excellence
- On-call improvements: [what you changed, impact on MTTR/incident count]
- Reliability wins: [uptime improvements, toil reduction]

## Learning and Growth
- Learned [Technology/Skill] — applied in [Project]
- Conference talk / blog post / internal presentation: [title]

## What I Want to Do Next
- [Goals for next quarter]
Rules:
  1. Update it every Friday. Set a calendar reminder. If you wait until review time, you will forget 80% of your impact.
  2. Quantify everything. “Improved performance” means nothing. “Reduced p99 latency from 800ms to 200ms, saving an estimated $50K/year in compute” means everything.
  3. Include the work that was hard to do, even if it was invisible. This is your evidence.
  4. Share it with your manager before review season. Don’t make them guess your impact.
  5. Frame every item as a story, not an activity. Not “I built X” but “I built X, which resulted in Y, saving/generating Z.” The impact story formula (covered in the next section) is the difference between a list of tasks and a compelling promotion case.
Most engineers describe their work like this: “I built the new caching layer.” That is a description of activity, not impact. Nobody gets promoted for activity. You get promoted for outcomes.
The single most important career skill — more important than any framework, language, or architecture pattern — is the ability to tell the story of your impact. Not “I built X” but “I built X, which reduced incident response time by 40%, saving approximately $200K per year in engineering hours and preventing an estimated 15 customer-facing outages per quarter.”
The Impact Story Formula:
I noticed [PROBLEM] affecting [WHO/WHAT].
I proposed [SOLUTION] and got buy-in from [STAKEHOLDERS].
I led [SCOPE — team size, timeline, complexity].
The result was [MEASURABLE OUTCOME — revenue, cost savings, time saved, reliability improvement].
The second-order effect was [BROADER IMPACT — team velocity, engineering culture, customer satisfaction].
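If you keep your brag document in a script-friendly form, the formula above can be filled in mechanically. The following is a minimal sketch, not a prescribed tool: the `ImpactStory` class, its field names, and the digit-based `is_quantified` check are all assumptions made for illustration, and the sample values are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class ImpactStory:
    """One brag-document entry, with one field per slot in the formula."""
    problem: str
    affected: str
    solution: str
    stakeholders: str
    scope: str
    outcome: str
    second_order: str

    def render(self) -> str:
        """Fill the five-sentence impact story template."""
        return "\n".join([
            f"I noticed {self.problem} affecting {self.affected}.",
            f"I proposed {self.solution} and got buy-in from {self.stakeholders}.",
            f"I led {self.scope}.",
            f"The result was {self.outcome}.",
            f"The second-order effect was {self.second_order}.",
        ])

    def is_quantified(self) -> bool:
        """Crude heuristic: any digit in the outcome counts as a number."""
        return bool(re.search(r"\d", self.outcome))


if __name__ == "__main__":
    # Hypothetical example values.
    story = ImpactStory(
        problem="slow deploys",
        affected="6 product teams",
        solution="a rebuilt CI/CD pipeline",
        stakeholders="my manager and the platform lead",
        scope="a 3-person team over 6 weeks",
        outcome="average deploy time dropping from 45 minutes to 8 minutes",
        second_order="teams shipping smaller changes more often",
    )
    print(story.render())
    if not story.is_quantified():
        print("Warning: the outcome has no numbers yet.")
```

The check is deliberately simple. Its only job is to nudge you when an outcome reads like “it went well” instead of a measurable result.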
Why this matters so much: Your manager has 6-10 direct reports. Their manager has 5-8 managers reporting to them. When your promotion packet goes to a calibration committee, the people deciding your fate have never seen your code. They have never watched you debug a production issue at 2 AM. All they have is a narrative — a story about what you did and why it mattered. If you cannot tell that story compellingly, with numbers, you are leaving your career in someone else’s hands.
Practice this constantly. Every time you finish a project, write down the impact story in your brag document. When someone asks “what are you working on?” at a standup, practice the impact framing, not the activity framing. Instead of “I’m refactoring the authentication module,” say “I’m refactoring the authentication module to eliminate the class of session-expiry bugs that caused 12 support tickets last month.” Same work. Radically different framing.
For a deeper dive into how to communicate your impact in design reviews, postmortems, and stakeholder meetings, see the Communication & Soft Skills chapter. The frameworks there pair directly with the impact storytelling approach here.
Interview Question: “What does a staff engineer do that a senior engineer does not?” Strong answer: “A senior engineer owns a system. A staff engineer owns the problem space across systems. They identify which problems matter most, build consensus on solutions, and ensure execution across team boundaries. The shift is from technical depth in one area to technical leadership across areas.”

4. Building Your Technical Portfolio

Side Projects That Actually Matter

What impresses hiring managers and peers:
  • Tools that solve a real problem you personally had
  • Projects with actual users (even 10 users is meaningful)
  • Contributions to infrastructure, developer tooling, or open source
  • Projects that demonstrate systems thinking, not just UI building
  • Well-documented projects with clear READMEs, architecture docs, and deployment instructions
What does NOT impress:
  • A collection of tutorial follow-alongs (“I built a todo app in 15 frameworks”)
  • Repos with no README, no tests, and no documentation
  • Projects started but never finished (demonstrates nothing)
  • Copy-paste from a YouTube course with no original modifications
The litmus test: Can you explain every technical decision you made and every trade-off you accepted? If not, you didn’t build it — you copied it.
1

Start with documentation

Find a project you use. Read the docs. Find something confusing or missing. Fix it. This is the lowest-barrier entry point and maintainers love it.
2

Tackle 'good first issue' labels

Most major projects tag beginner-friendly issues. Pick one, read the contributing guide, and submit a PR. Expect feedback — that is the point.
3

Fix a bug you encountered

If you hit a bug in an open source tool, don’t just work around it — fix it upstream. This demonstrates real-world problem-solving.
4

Build tooling or plugins

Create a plugin, extension, or integration for a tool you use. This is often higher-impact than fixing typos and gives you ownership of something.
5

Contribute consistently, not heroically

One PR per month for a year is more impressive than 20 PRs in one weekend followed by silence. Consistency signals reliability.
Why it matters: Open source contributions demonstrate that you can read unfamiliar code, follow contribution guidelines, communicate with strangers professionally, handle code review feedback, and ship in someone else’s codebase. These are exactly the skills you need on any engineering team.
Types of technical writing that build your reputation:
  1. Blog posts that teach — Explain something you learned. The bar is not “original research.” The bar is “clearly explaining something that others struggle with.”
  2. Internal documentation — The engineer who writes the best internal docs is often the most valued on the team. Architecture decision records (ADRs), runbooks, onboarding guides.
  3. Conference talks — Start with lightning talks (5 min) at local meetups. Work up to full talks. You do NOT need to be a world expert. You need to explain one thing well.
  4. Postmortems — A well-written postmortem is one of the highest-leverage documents in engineering. It teaches the whole org.
  5. Design docs and RFCs — These are the primary “currency” at staff+ levels. Practice writing them even if your org doesn’t require them.
The compounding effect: A blog post you write today will generate inbound interest for years. A conference talk becomes a YouTube video. Internal docs get referenced in every onboarding. Writing is the highest-ROI career investment most engineers ignore.
The problem: Technology moves fast. You cannot learn everything. Trying to learn everything leads to burnout and shallow knowledge.
The system:
  1. Define your T-shape: Deep expertise in 1-2 areas. Broad familiarity with many. Decide what your “deep” areas are and protect that depth.
  2. Weekly learning budget: Allocate 3-5 hours per week. Block it on your calendar. Protect it like a meeting.
  3. Input sources (curate ruthlessly):
    • 2-3 newsletters (e.g., TLDR, Pointer, ByteByteGo)
    • 1 technical book per quarter (not per year)
    • Follow 10-15 high-signal engineers on social media
    • Read 1-2 RFCs/design docs from top companies per month
  4. Output requirement: Learning without output is just entertainment. For everything you learn, produce something:
    • Write a summary in your own words
    • Build a small prototype
    • Teach it to a teammate
    • Add it to your notes system
  5. Prune aggressively: Unsubscribe from noisy channels. Ignore hype cycles. Ask: “Will this matter to my work in 12 months?” If not, skip it.
The spaced repetition principle: Review your notes monthly. You will be surprised how much you forget — and how much sticks when you revisit.
Learning is the input. Your personal brand is the output. For strategies on turning your learning into visible career assets — blog posts, conference talks, and open-source contributions — see “Building Your Personal Brand as an Engineer” later in this chapter.
Exercise: Audit your GitHub profile right now. If a hiring manager spent 60 seconds on it, what would they conclude? If the answer is “nothing useful,” pick one project this month and make it portfolio-worthy: add a proper README, write tests, deploy it, and document the architecture.

5. Interview Strategy for Engineers

Evaluating Companies (Not Just Getting Hired)

Remember: an interview is a two-way evaluation. You are assessing whether this company deserves 2,000+ hours of your life per year.
Evaluate on these dimensions:
| Dimension | Green Flags | Red Flags |
| --- | --- | --- |
| Engineering Culture | Design docs, code review, blameless postmortems | “We move fast and break things” (without fixing them) |
| Technical Quality | CI/CD, automated testing, monitoring/observability | “We deploy by SSHing into production” |
| Growth | Promotion criteria are documented, mentorship exists | “Just keep doing great work and it’ll happen” |
| Work-Life Balance | Sustainable on-call, reasonable hours, PTO actually used | “We’re like a family” (meaning: no boundaries) |
| Team Health | Low turnover, engineers speak positively, candid answers | Interviewers dodge questions about culture |
| Technical Debt | Acknowledged and systematically addressed | “We’ll fix it later” (they never do) |
Most candidates walk into an interview knowing the company name and the job title. That is not preparation — that is showing up. The engineers who consistently get offers do 2-4 hours of research before every on-site. Here is exactly what to do:
Step 1: Understand Their Tech Stack (30 minutes)
  • Check their engineering blog (most mid-to-large companies have one). Search “[Company Name] engineering blog.”
  • Look at their open-source repos on GitHub. What languages? What frameworks? What problems are they solving publicly?
  • Check job postings for the team you are interviewing with — the required skills tell you what they use.
  • Look at StackShare or BuiltWith for their technology profile.
Step 2: Read Their Engineering Blog Posts (30-60 minutes)
  • Find 2-3 recent posts relevant to the role. Read them carefully.
  • This gives you ammunition for the interview: “I read your blog post about migrating to Kubernetes and I would love to discuss how you handled state management for the stateful services during that transition.” This single sentence signals more preparation and genuine interest than anything else you could say.
  • Note any technical challenges they mention — these might be problems you would be working on.
Step 3: Check Glassdoor and Blind (20 minutes)
  • Read the engineering-specific reviews. Look for patterns, not individual complaints.
  • Pay attention to: management quality, work-life balance, promotion velocity, on-call burden, tech debt attitudes.
  • Prepare questions that probe the patterns you see — but do NOT say “I read on Glassdoor that…” Instead, ask open-ended questions that let interviewers confirm or deny the patterns organically.
Step 4: Understand Their Business (20 minutes)
  • How does the company make money? Who are their customers? What is their competitive position?
  • What has been in the news about them recently? Funding rounds, product launches, layoffs, acquisitions?
  • Engineers who understand the business context of their work get promoted faster — and interviewers notice when you connect technical decisions to business outcomes.
Step 5: Prepare Your “Specific Reference” Moments (10 minutes)
  • From your research, prepare 2-3 specific things you can reference during the interview:
    • “I noticed your team open-sourced [tool] — I’ve been using it and I’m curious about the design decision to [specific detail].”
    • “Your recent blog post about [topic] resonated with me because at my current company, we faced a similar challenge with [specific parallel].”
    • “I saw that your team is working on [initiative from job posting or blog]. I have experience with [relevant skill] and I’d be excited to contribute to that.”
These references accomplish three things: they demonstrate genuine interest (not just “I need a job”), they show intellectual curiosity, and they give you natural conversation anchors that keep the interview flowing.
About Architecture and Engineering Practices:
  • “What does your deployment pipeline look like? How often do you deploy to production?”
  • “How do you handle incidents? Walk me through your last major outage.”
  • “What’s the ratio of feature work to tech debt / infrastructure work?”
  • “How are architectural decisions made? Who has input?”
  • “What’s the oldest, most painful part of your codebase, and what’s the plan for it?”
About Team Culture:
  • “How do code reviews work here? What’s the average turnaround time?”
  • “What does on-call look like? How often are people paged, and how is it compensated?”
  • “How does your team handle disagreements about technical direction?”
  • “What did the last person in this role go on to do?”
  • “Can you tell me about someone who was recently promoted and what they did to earn it?”
About Growth:
  • “What does the career ladder look like for ICs here?”
  • “How are engineers evaluated? What does the performance review process look like?”
  • “What learning budget or professional development support exists?”
  • “How much autonomy would I have in choosing what to work on?”
The power move: “What’s the one thing you’d change about working here if you could?” Genuine answers reveal a lot. Deflection reveals even more.
Core principles:
  1. Never give a number first. “I’m focused on finding the right fit. What’s the range for this role at this level?” If pressed: “I’d want compensation to be competitive with the market for this level in this area.”
  2. Negotiate total compensation, not just base salary.
| Component | What to negotiate | Notes |
| --- | --- | --- |
| Base Salary | The floor of your comp | Easiest to negotiate, smallest upside |
| Equity (RSUs/Options) | Often the largest component at tech companies | Understand vesting schedule, cliff, refresh grants |
| Signing Bonus | One-time, often used to bridge gaps | Can compensate for unvested equity from current job |
| Level | Your title and scope | A higher level is worth more than a higher salary at a lower level |
| Scope / Team | What you’ll actually work on | The right team can be worth more than extra comp |
  3. Get competing offers. This is the single most effective negotiation tool. You don’t have to be aggressive. “I have an offer from [Company] at [Level]. I’d prefer to work here — can you match?”
  4. Negotiate for level, not just comp. Starting at L5 vs L4 has compounding effects on your career trajectory. Fight for the right level.
  5. Everything is negotiable. PTO, remote work, start date, relocation, learning budget, conference attendance. Ask.
Advanced Negotiation Tactics:
Using Competing Offers Effectively:
  • You do not need to lie or bluff. Apply to multiple companies simultaneously and time your processes to align. Having two or more offers at the same time is the single most powerful negotiation lever.
  • Be transparent but strategic: “I have an offer from [Company] at [Level] with total comp of [Range]. I prefer your team and mission, but I need the compensation to be competitive.” This is not adversarial — recruiters expect it.
  • If you only have one offer, you can still reference market data: “Based on Levels.fyi data and conversations with peers, the market range for this level in this location is [X-Y]. I’d like to be in the upper half of that range.”
Total Compensation Calculation — What to Actually Compare:
| Component | How to Value It |
| --- | --- |
| Base Salary | Face value. Most straightforward. |
| RSUs | Use the current stock price, NOT the grant price or recruiter’s optimistic projection. Apply a 20-30% discount for risk if it is a pre-IPO company. |
| Options | Value = (current fair market value - strike price) x number of shares. For early-stage startups, multiply by the probability the company reaches a liquidity event (be honest — most don’t). |
| Signing Bonus | Divide by 4 to get the annualized value: it is paid once, and four years is a common vesting horizon to compare against. Factor in clawback clauses. |
| Annual Bonus | Use the target percentage, not the max. Many companies pay close to target in normal years, but ask how bonuses have actually paid out recently. |
| Benefits | 401k match, health insurance quality, and equity refresh grants are often worth $10-30K/year and frequently overlooked. |
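The valuation rules in the table can be combined into one annualized number for comparing offers side by side. The sketch below is one way to do that under stated assumptions: the `Offer` class, its field names, the default four-year vesting period, and the choice to spread equity evenly across vesting years are all illustrative, and the sample figures are made up.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    base_salary: float
    rsu_grant_value: float = 0.0        # whole grant, valued at the current price
    rsu_risk_discount: float = 0.0      # e.g. 0.25 for a pre-IPO company
    option_shares: int = 0
    option_fmv: float = 0.0             # current fair market value per share
    option_strike: float = 0.0
    liquidity_probability: float = 1.0  # your honest estimate, between 0 and 1
    signing_bonus: float = 0.0
    target_bonus_pct: float = 0.0       # target, not max (0.10 means 10%)
    benefits_value: float = 0.0         # estimated yearly value
    vesting_years: int = 4              # assumption: even vesting over 4 years


def annualized_total_comp(offer: Offer, horizon_years: int = 4) -> float:
    """Estimate yearly total compensation using the table's valuation rules."""
    rsus = offer.rsu_grant_value * (1 - offer.rsu_risk_discount) / offer.vesting_years
    # Underwater options (strike above fair market value) are worth nothing today.
    spread = max(offer.option_fmv - offer.option_strike, 0.0)
    options = (spread * offer.option_shares * offer.liquidity_probability
               / offer.vesting_years)
    signing = offer.signing_bonus / horizon_years  # one-time, so spread it out
    bonus = offer.base_salary * offer.target_bonus_pct
    return (offer.base_salary + rsus + options + signing + bonus
            + offer.benefits_value)


if __name__ == "__main__":
    # Hypothetical offers for illustration only.
    public_co = Offer(base_salary=180_000, rsu_grant_value=400_000,
                      signing_bonus=40_000, target_bonus_pct=0.10,
                      benefits_value=15_000)
    startup = Offer(base_salary=160_000, option_shares=40_000, option_fmv=5.0,
                    option_strike=1.0, liquidity_probability=0.25,
                    benefits_value=10_000)
    for name, offer in [("public_co", public_co), ("startup", startup)]:
        print(f"{name}: {annualized_total_comp(offer):,.0f} per year")
```

With these made-up inputs the public-company offer comes to about 323,000 per year and the startup offer to about 180,000. The gap comes almost entirely from how heavily the options are discounted, which is why the liquidity probability deserves an honest estimate.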
Level Negotiation vs. Salary Negotiation:
This is the most underrated negotiation move in engineering. Getting leveled one step higher (e.g., L5 instead of L4) typically results in:
  • A 20-40% higher total compensation band
  • Larger equity grants and refresh rates
  • More scope and autonomy from day one
  • A head start on your next promotion cycle
If a company offers you L4 and you believe you should be L5, say: “Based on my experience leading [specific cross-team project] and my track record of [staff-level scope], I believe L5 is the right level. I am happy to discuss the specifics of what I have done that maps to your L5 criteria.” Back this up with your brag document and specific examples of impact at that level.
The Negotiation Script When They Give You a Number:
  1. Pause. Do not react. Say: “Thank you for sharing this. I appreciate the offer.”
  2. Buy time. “I’d like to take a couple of days to review the full details.”
  3. Counter with reasoning, not just a number. “After reviewing the offer and considering [my competing offer / market data / the level discussion], I’d like to discuss adjusting the [base / equity / level]. Here is why…”
  4. Be willing to walk away. This is the hardest part and the most powerful. If you cannot walk away, you cannot negotiate. Always have a BATNA (Best Alternative To a Negotiated Agreement).
Never accept on the spot. Always say: “Thank you, I’m excited about this. I’d like to take a couple of days to review the full offer.” This is expected and professional. Companies that pressure you into immediate decisions are waving a red flag.
Cross-chapter connection: Negotiation is only one piece of the offer process. For strategies on managing multiple interview timelines, handling exploding offers, and the psychology of decision-making under pressure, see the Interview Meta-Skills chapter. For evaluating whether a company’s values and practices will actually match what they promised during the interview, see the Ethical Engineering chapter’s sections on organizational culture assessment.
Red flags about the company:
  • Interviewers are unprepared or late — shows they don’t value your time
  • They can’t explain what the team is building or why it matters
  • Nobody asks if you have questions — they’re not evaluating mutual fit
  • High turnover they explain away as “culture fit” issues
  • The tech stack is ancient AND there’s no migration plan
  • “We wear many hats” means “you’ll do 3 jobs for 1 salary”
  • They focus entirely on trick questions instead of real problem-solving
Red flags about the candidate (be aware of these in yourself):
  • Cannot explain past projects clearly — suggests limited ownership
  • Badmouths previous employers or teammates
  • No questions about the team, product, or engineering culture
  • Cannot articulate what they want to learn or where they want to grow
  • Over-indexes on compensation without curiosity about the actual work
  • Claims credit for team achievements without acknowledging others
Cross-chapter connections: For a comprehensive framework on spotting and reasoning about ethical red flags during company evaluation — beyond the technical and cultural signals above — see Ethical Engineering. For interview-day strategies on how to probe these red flags effectively without being adversarial, see Interview Meta-Skills.
1

First 30 Days: Learn and Listen

Goal: Understand the codebase, team dynamics, and business context. Ship something small.
  • Set up local development environment on day 1
  • Read every design doc, README, and runbook you can find
  • Do 1:1s with every team member — ask: “What’s the biggest problem we’re not talking about?”
  • Ship a small PR in week 1 (bug fix, doc improvement, small feature). This builds confidence and proves you can operate in the codebase.
  • Identify the team’s on-call pain points
  • Understand the deployment pipeline end-to-end
  • Start your brag document on day 1
2

Days 30-60: Contribute and Connect

Goal: Take ownership of a meaningful feature or project. Build relationships outside your team.
  • Own a medium-sized project end-to-end (design, implement, deploy, monitor)
  • Start doing code reviews with genuine, helpful feedback
  • Join on-call rotation (or shadow on-call if the team has a ramp-up period)
  • Meet engineers on adjacent teams — understand how your team’s work fits into the broader system
  • Identify one process improvement and propose it
  • Start forming opinions about the architecture — write them down
3

Days 60-90: Lead and Influence

Goal: Demonstrate ownership and begin shaping team direction.
  • Lead a design discussion or write a design doc for an upcoming project
  • Mentor a newer team member on something you’ve learned
  • Propose a technical improvement based on your fresh perspective (“beginner’s eyes” are valuable — use them before they fade)
  • Have a career conversation with your manager: “Here’s what I’ve observed, here’s what I want to work on, here’s where I think I can have the most impact”
  • Deliver your first significant project

Career Growth Interview Questions

These questions come up in behavioral rounds, manager screens, and skip-level interviews. They test self-awareness, ownership, and career intentionality — not just technical skill.
What they are really testing: Ownership, initiative, and end-to-end thinking. They want to see if you can identify an important problem, rally people around a solution, navigate obstacles, and deliver measurable results. The word “led” is doing heavy lifting — they are evaluating whether you drove the project or just participated in it.
How to answer well:
  1. Start with the problem — why did this project matter? What was the cost of not doing it?
  2. Explain how you identified the opportunity (did someone assign it, or did you see the gap yourself?)
  3. Walk through how you built alignment — who did you need to convince, and how?
  4. Describe the execution — what was your approach, what trade-offs did you make, what went wrong and how did you adapt?
  5. End with measurable impact — revenue, reliability, developer productivity, user metrics, whatever is relevant
  6. Briefly mention what you learned or what you would do differently
Common mistakes:
  • Describing a project you contributed to but did not drive — interviewers will probe and the distinction becomes obvious
  • Focusing entirely on the technical implementation without explaining the business context or stakeholder management
  • Not quantifying the impact — “it was really successful” is not an answer
  • Skipping the obstacles — the interesting part is how you navigated problems, not that everything went smoothly
Example framing: “I noticed our deployment pipeline was causing 3-4 hours of developer downtime per week across 6 teams. I wrote a proposal, got buy-in from my manager and the platform team lead, and led a 3-person team over 6 weeks to rebuild the CI/CD system. We reduced average deploy time from 45 minutes to 8 minutes, which recovered roughly 15 engineer-hours per week. The biggest challenge was migrating 30+ services without disrupting ongoing work — we handled this by running both pipelines in parallel for two weeks.”
Notice the structure of that example: it follows the impact story formula — problem (3-4 hours of downtime per week), stakeholder alignment (manager and platform lead buy-in), scope (3-person team, 6 weeks, 30+ services), measurable result (15 engineer-hours per week recovered). Every number makes it more credible. Practice converting your projects into this format until it becomes automatic.
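Numbers in an impact story should survive a quick sanity check, because interviewers often probe them. The sketch below is one rough way to check the example above. It rests on two assumptions that the example does not state: that the 3-4 hours of downtime is per team, and that downtime shrinks in proportion to deploy time. The function name is hypothetical.

```python
def recovered_hours_per_week(teams, downtime_hours_per_team, old_deploy_min, new_deploy_min):
    """Estimate weekly engineer-hours recovered by a faster deploy pipeline.

    Assumes downtime scales linearly with deploy time. This is an
    illustrative simplification, not a claim made by the example.
    """
    total_downtime = teams * downtime_hours_per_team
    reduction = 1 - new_deploy_min / old_deploy_min
    return total_downtime * reduction


# Figures taken from the example framing: 6 teams, 3-4 hours each,
# deploy time cut from 45 minutes to 8 minutes.
low = recovered_hours_per_week(6, 3, 45, 8)
high = recovered_hours_per_week(6, 4, 45, 8)
print(f"Estimated recovery: {low:.1f} to {high:.1f} engineer-hours per week")
```

Under those assumptions the low end of the estimate lands close to the "roughly 15 engineer-hours per week" in the example, so the story holds together. If your own numbers do not reconcile this way, fix them before the interview.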
What they are really testing: Self-awareness about career management, ability to advocate for yourself, and understanding of organizational dynamics. This question also tests whether you understand the difference between being busy and being impactful — and whether you can navigate a situation where the system is not working in your favor.
How to answer well:
  1. Show that you know what glue work is and why it matters — reference the concept directly
  2. Explain your approach to making invisible work visible: tracking it, quantifying it, framing it as organizational impact
  3. Describe how you would have the conversation with your manager — proactively, not resentfully
  4. Discuss distribution — how you would create systems so the glue work does not depend entirely on you
  5. Acknowledge the balance — you still need visible technical contributions alongside organizational work
  6. Show judgment about when to push back — if an organization consistently refuses to value this work, that is a signal about the org, not about you
Common mistakes:
  • Saying “I’d just stop doing it” — this is immature and shows you do not understand organizational needs
  • Being resentful or passive-aggressive about it — the interviewer wants to see mature problem-solving
  • Not knowing what “glue work” is — if this question catches you off guard, you have not thought about career management deeply enough
  • Pretending it has never happened to you — it happens to almost everyone at some point
Example framing: “I’ve experienced this firsthand. I was doing a lot of cross-team coordination, onboarding, and documentation that wasn’t showing up in my review. My approach was three-fold: first, I started tracking this work in a brag document with quantified impact — ‘reduced new hire ramp-up from 6 weeks to 3 weeks’ is a concrete outcome. Second, I had a direct conversation with my manager about how this work was valued in our promotion framework. Third, I worked on distributing the load — I created onboarding templates and self-serve docs so the work scaled beyond me. The key insight from Tanya Reilly’s ‘Being Glue’ talk is that this work is essential but you cannot let it crowd out your visible technical contributions entirely.”
What they are really testing: Career intentionality, self-awareness, and whether your goals align with what the role and company can offer. They are NOT looking for a specific title — they want to see that you have thought about your growth and that this role fits into a coherent plan.
How to answer well (as an engineer who understands the IC vs. management fork):
  1. Acknowledge the fork — show you know there are two paths and you have thought about which one fits you
  2. Be specific about the type of impact you want to have, not just the title you want to hold
  3. Connect your answer to this role — explain why this position is a logical step toward your goals
  4. Show depth over breadth — “I want to go deep on distributed systems and be the person my org trusts with our hardest scaling challenges” is better than “I want to be a tech lead”
  5. Be honest about what you are still figuring out — “I’m leaning toward the IC track but I want to try leading a small project team first to test that assumption” shows maturity
For the IC-track answer: “In three years, I want to be operating at a staff or senior staff level on the IC track. Specifically, I want to be the person who can take an ambiguous, cross-team technical problem — like designing our next-generation data pipeline — and drive it from problem definition through architecture through execution. I’ve seen that the jump from senior to staff is less about writing better code and more about influencing technical direction across team boundaries, and that’s the skill I’m deliberately building.”
For the open-ended answer: “Honestly, I’m still exploring the fork between deep IC work and engineering management. I know I love the technical side — designing systems and solving hard problems gives me energy. But I’ve also found that mentoring and cross-team coordination come naturally to me. My plan is to lean into technical leadership over the next year or two and see whether my impact scales more through architecture decisions or through building and growing a team. What I’m certain about is that I want to work on problems at the intersection of technical complexity and real user impact.”
Common mistakes:
  • “I want to be a manager” without explaining why or what kind of manager — sounds like you just want a title
  • “I just want to keep coding” — sounds like you have not thought about growth
  • Being so vague that it could apply to any role at any company — “I want to grow and learn” says nothing
  • Giving an answer that this company obviously cannot support — if it is a 20-person startup, do not say “I want to lead an organization of 200 engineers”

6. Common Career Mistakes

The trap: Taking the highest-paying offer out of college or bootcamp without considering what you’ll learn, who you’ll learn from, and how fast you’ll grow.
Why it hurts: Early career is when your learning curve is steepest. The difference between a team that pushes you and a team where you coast is 3-5 years of career acceleration. A $20K salary difference at 25 is irrelevant compared to the compound effect of developing senior-level skills 2 years earlier.
The rule: In your first 5 years, optimize for: rate of learning, quality of mentorship, scope of problems, and quality of your peer group. After that, optimize for whatever matters to you.
Exception: If you have significant financial obligations (debt, family), take the money. Financial stress impairs learning too.
The trap: You’ve mastered your current role. Work is easy. You’re the expert on your team. You get great performance reviews. Everything is comfortable. So you stay. For years.
Why it hurts: You stop growing. Your skills become specific to one company’s stack. You become a “big fish in a small pond” whose skills don’t transfer. When you finally leave (or get laid off), you discover the market has moved on.
The test: Are you learning something new every month? Are you regularly uncomfortable? Are you working on problems you don’t already know how to solve? If not, you’re coasting.
The fix: Either find new challenges at your current company (different team, new project, larger scope) or move. Comfort is the enemy of growth. For a structured decision framework on when staying becomes actively harmful to your career, see the “When to Leave a Job” checklist later in this chapter.
The trap: You do excellent work, but you never write it down. Review season comes, and you can’t remember what you did in Q1. Your manager has 8 direct reports and doesn’t remember either. You get a “meets expectations” review and wonder why.
Why it hurts: Promotions go to people who can articulate their impact, not just people who had impact. If you can’t demonstrate your value, it doesn’t exist in the organization’s eyes.
The fix: Start a brag document today. Update it every Friday. Before review season, send your manager a summary of your impact with quantified results. Do not rely on anyone else to track your contributions.
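A brag document can be as simple as a text file, but a little structure makes the pre-review summary nearly automatic. Here is a minimal sketch of one possible shape. The `BragEntry` fields and the `review_summary` helper are hypothetical names, the impact statements are borrowed from examples elsewhere in this guide, and the dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BragEntry:
    """One brag-document entry (hypothetical structure)."""
    when: date
    what: str    # what you did (the activity)
    impact: str  # the quantified result (the outcome)


def review_summary(entries, start, end):
    """Return outcome-first bullet lines for entries dated within [start, end]."""
    selected = sorted(
        (e for e in entries if start <= e.when <= end),
        key=lambda e: e.when,
    )
    return [f"- {e.impact} ({e.what}, {e.when.isoformat()})" for e in selected]


# Dates below are made up for the sketch.
entries = [
    BragEntry(date(2024, 2, 9), "added a covering index to the orders table",
              "Reduced p95 query latency by 40ms"),
    BragEntry(date(2024, 1, 12), "wrote onboarding templates",
              "Cut new-hire ramp-up from 6 weeks to 3"),
    BragEntry(date(2024, 5, 3), "rebuilt the CI/CD pipeline",
              "Reduced average deploy time from 45 to 8 minutes"),
]

# Everything logged in Q1, oldest first, ready to paste into a message.
q1 = review_summary(entries, date(2024, 1, 1), date(2024, 3, 31))
print("\n".join(q1))
```

Putting the impact first in each line is a deliberate choice: it mirrors the advice to report outcomes rather than activity, and it means the Friday update doubles as raw material for the review-season summary.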
The trap: “I’m an engineer, I just need to be good at coding.” You avoid writing, presenting, giving feedback, and having difficult conversations. You communicate through code and Slack messages.
Why it hurts: After mid-level, soft skills are the primary differentiator. Two engineers with equal technical skill — the one who communicates better gets promoted to senior. The one who can build consensus gets promoted to staff. The one who can influence an organization gets promoted to principal.
Soft skills that matter most for engineers:
  1. Written communication — design docs, emails, Slack messages, documentation
  2. Verbal communication — presenting ideas, running meetings, explaining trade-offs
  3. Giving and receiving feedback — code reviews, performance conversations
  4. Stakeholder management — working with PMs, designers, leadership
  5. Conflict resolution — technical disagreements, priority conflicts
  6. Teaching and mentoring — making others better
The fix: Treat soft skills like technical skills. Practice deliberately. Read one book on communication or leadership per quarter. Ask for feedback on your communication specifically. For a comprehensive breakdown of exactly which communication skills matter most and how to develop them, see the Communication & Soft Skills chapter.
The trap: You see a 26-year-old staff engineer at a FAANG company on social media and feel like a failure. You read about someone who built a startup in 6 months and exited for millions. You compare your chapter 3 to someone else’s chapter 10.
Why it hurts: Comparison leads to either discouragement or reckless career decisions (chasing titles, job-hopping for prestige, pursuing trendy technologies instead of building depth).
The reality:
  • You don’t see the 70-hour weeks, the failed projects, the lucky timing, or the privilege that enabled their path
  • Career trajectories are not linear — many successful engineers had “slow” periods that were actually foundational
  • “Staff at 28” at a startup that folds means nothing. “Senior at 35” at a company building meaningful technology means a lot.
The fix: Define YOUR success criteria. Write them down. Review them quarterly. Some people optimize for compensation, some for impact, some for work-life balance, some for learning. All are valid. None should be dictated by social media.
The trap: You learn the basics of many technologies but master none. You can set up a React project, spin up a Django API, deploy to AWS, write a bit of Go, and dabble in machine learning. But you can’t design a complex React application, optimize a Django API for high throughput, architect a multi-region AWS deployment, write production Go services, or train and deploy an ML model.
Why it hurts: The market rewards depth. A “full-stack developer who knows a bit of everything” is commoditized. A “backend engineer who can design and operate high-throughput distributed systems” is rare and valuable.
The progression:
| Stage | Description | Value |
| --- | --- | --- |
| Beginner | Learning the basics | Low (but everyone starts here) |
| Advanced Beginner | Can build basic things | Medium-low |
| Expert Beginner | Knows basics of many things, mastery of none | Plateau — dangerous |
| Competent | Deep knowledge in 1-2 areas | Medium-high |
| Proficient | Intuitive understanding, can handle novel situations | High |
| Expert | Defines best practices, others learn from you | Very high |
The fix: Pick 1-2 areas. Go deep. Spend a year going from “I can use this” to “I understand how this works internally.” Read the source code. Write about it. Teach it. Build something non-trivial with it. Breadth comes naturally over a career. Depth requires intentional investment.
The biggest career mistake of all: not being intentional. Drifting from job to job, task to task, without a plan is how you end up 10 years into a career wondering why you feel stuck. You don’t need a rigid plan. You need a direction, regular reflection, and the willingness to adjust.

7. Remote Engineering Career Tips

Remote work is not a perk anymore — it is a structural reality for a large portion of the engineering industry. But the career playbook that works in an office does not transfer directly to a distributed environment. The engineers who thrive remotely are not the ones who simply do the same things from home. They are the ones who deliberately redesign how they build visibility, relationships, and influence without the ambient context of a shared physical space.
The core challenge of remote work is not productivity — it is visibility. Most remote engineers are plenty productive. The problem is that nobody sees it. In an office, your manager watches you whiteboard a solution, overhears you helping a junior engineer, and bumps into you in the hallway after you’ve just debugged a production issue. Remotely, all of that is invisible unless you make it visible. The engineers who get promoted remotely are the ones who have mastered the art of intentional visibility without becoming annoying self-promoters.
The problem: In an office, your presence IS your visibility. People see you working, overhear your conversations, and form impressions passively. Remote work eliminates all of this. Your work only exists to others when it is explicitly communicated.
Tactical visibility practices:
  1. Write detailed PR descriptions — Not just “fixes bug.” Instead: “This fixes the race condition in the payment queue that caused 3 duplicate charges last week. Root cause was X, fix approach was Y, I considered Z but chose Y because [trade-off reasoning].” Every PR description is a micro-advertisement for your engineering judgment.
  2. Send weekly impact summaries — Every Friday, send your manager a 3-5 bullet update. Not what you did (activity), but what impact it had (outcome). “Reduced p95 query latency by 40ms by adding a covering index to the orders table” beats “worked on database optimization.” This takes 10 minutes and is the single highest-ROI visibility habit for remote engineers.
  3. Be the person who writes things down — After every meeting, post a summary in the team channel: decisions made, action items, owners, deadlines. The person who documents becomes the person who shapes the narrative. This is especially powerful because most people hate doing it.
  4. Demo your work — Volunteer to demo completed features in team meetings. A 3-minute live demo creates more visibility than a week of Slack messages. Record demos for async consumption when timezone differences make synchronous attendance impractical.
  5. Share learnings publicly — When you debug a tricky issue, write a short post in your team’s knowledge base or Slack channel. “TIL: Our Kafka consumer was silently dropping messages when the payload exceeded 1MB because of [config]. Here’s how I found it and how I fixed it.” This builds your reputation as someone who makes the team smarter.
What a senior engineer would say: “I treat every written artifact — PR descriptions, Slack messages, meeting summaries, design docs — as a chance to demonstrate engineering judgment. In a remote environment, your writing IS your presence.”
Why async matters: In a remote team spread across timezones, synchronous communication (meetings, real-time Slack) becomes a bottleneck. The engineers who master async communication can collaborate effectively with people they never overlap with in real-time.
Async communication principles:
  1. Write for someone reading it 8 hours later — Provide full context. Instead of “Can we talk about the auth issue?”, write “The auth service is returning 401s for 2% of requests since the deploy at 3pm UTC. I’ve traced it to the token refresh logic — the new code expects a refresh_token field that legacy clients don’t send. I see three options: (a) add a fallback path, (b) force-upgrade legacy clients, (c) roll back. I recommend (a) because [reasoning]. Thoughts?”
  2. Batch your communication — Do not send 12 Slack messages in a row. Collect your thoughts, write a single coherent message. Threaded, structured, with headers if it is long. Respect other people’s focus time by making your messages complete and self-contained.
  3. Use the right medium for the right message:
    • Slack/Chat: Quick questions, status updates, social interaction
    • Document/RFC: Anything requiring input from multiple people or lasting more than a day
    • Video (recorded): Complex explanations, demos, design walkthroughs
    • Meeting (synchronous): Decisions requiring real-time debate, sensitive conversations, relationship building
  4. Set expectations about response times — “I’ll review this by EOD Tuesday” is better than silence. Acknowledging a message even when you cannot respond fully builds trust across timezones.
  5. Over-communicate context in code reviews — Remote code reviews lack the ability to walk over and discuss. Write detailed review comments. When suggesting a change, explain the reasoning. When approving, say what you checked and what you liked. “LGTM” is lazy in any context; remotely, it is almost hostile.
Cross-chapter connection: The async communication principles here are the remote-specific application of the broader communication frameworks in Communication & Soft Skills. The design doc structure, feedback frameworks, and stakeholder communication techniques in that chapter all apply — they just need to be adapted for async delivery.
The problem: Relationships in an office form through a thousand small interactions — coffee runs, lunch conversations, watercooler chats, walking to meetings together. None of this exists remotely. If you do not deliberately build relationships, you will be effective but isolated — and isolated engineers do not get promoted to staff.
Relationship-building tactics for remote engineers:
  1. Schedule intentional 1:1s with people outside your team — 30 minutes, biweekly, with engineers on adjacent teams, your skip-level manager, engineers whose work you admire. The agenda is simple: “What are you working on? What’s challenging? How does your team think about [shared concern]?” These conversations build the cross-organizational network that is essential for staff+ visibility.
  2. Be generous in public channels — Answer questions in shared Slack channels, even when they are not about your area. Help debug issues. Share relevant articles. The goal is to become known as someone who is helpful and knowledgeable beyond their immediate team.
  3. Create virtual “watercooler” moments — Join optional social channels. Participate in team rituals (show-and-tell, Friday demos, book clubs). Share something personal occasionally. Remote relationships require more deliberate effort, but they are not less real.
  4. Pair program across timezone boundaries — Even 30-60 minutes of pairing per week builds stronger relationships than months of async-only interaction. You learn someone’s thinking style, build trust, and often solve problems faster.
  5. Travel for high-leverage moments — If your company has offsites or optional in-person gatherings, attend them. The ROI of spending 3 days in person with your team is enormous. Prioritize relationship-building over agenda items during in-person time — the documents can be written async, the trust cannot.
  6. Proactively share your working style — Create a personal “working with me” document: your timezone, preferred communication channels, when you are most responsive, how you prefer to receive feedback, what energizes or drains you. Share it with your team. This transparency reduces friction in a way that is uniquely valuable in remote settings.
The staff+ remote challenge: The senior-to-staff transition is already difficult because it requires cross-team influence. Remotely, it is harder because you cannot build casual relationships through proximity. Remote staff engineers must be twice as intentional about their network. Block time on your calendar specifically for relationship-building — it is not optional, it is part of the job.
| Risk | Why It Happens Remotely | Mitigation |
| --- | --- | --- |
| Proximity bias | Managers unconsciously favor people they see in person | Send regular impact summaries, request explicit evaluation criteria, ask for data-driven reviews |
| Timezone isolation | You are consistently left out of decisions made during other timezones’ business hours | Establish async decision-making norms, request that decisions are documented before they are final, rotate meeting times |
| Social disconnection | Without casual interactions, you become a “screen name” not a person | Invest in 1:1s, attend offsites, use video for important conversations (not just audio) |
| Skill stagnation | Without hallway learning, you miss ambient knowledge transfer | Create explicit learning channels, pair program, attend internal tech talks, subscribe to team knowledge bases |
| Burnout from always-on culture | Without a commute or physical boundary, work bleeds into everything | Set hard stop times, communicate them, close the laptop physically, create a dedicated workspace with a “shutdown ritual” |

8. Building Your Personal Brand as an Engineer

The phrase “personal brand” makes most engineers cringe — it sounds like marketing fluff. But here is the reality: every engineer has a reputation, whether they manage it or not. The question is whether that reputation accurately represents your capabilities and attracts the opportunities you want. Building your personal brand is not about self-promotion. It is about making it easy for the right opportunities to find you.
The compound interest of visibility: A single blog post might get 100 reads. But if one of those readers is a hiring manager at your dream company, or a conference organizer, or a senior engineer who remembers your name when their team has an opening — that one post has an outsized return. Personal brand compounds. Every artifact you create — a GitHub contribution, a blog post, a conference talk, a well-written answer on a forum — is a node in a network that grows over time. Five years of consistent, small contributions add up to a reputation that no resume can match.
Your GitHub profile is often the first thing a technical interviewer or hiring manager looks at — before your resume, before your LinkedIn. Most engineers’ profiles are wastelands of half-finished tutorial projects and forked repos they never touched. Here is how to make yours work for you.
The essentials:
  1. A profile README — Create a public repository with the same name as your GitHub username and add a README.md. Its contents are displayed at the top of your profile page. Include: a one-sentence bio, your areas of expertise, 2-3 pinned projects with brief descriptions, and links to your blog or talks. Keep it concise and professional — this is not a MySpace page.
  2. Pin your best 6 repositories — Choose projects that demonstrate the skills you want to be known for. A hiring manager for a backend role does not care about your React todo app. Pin: projects with real users, open-source contributions, tools that solve real problems, and well-documented codebases.
  3. Write real READMEs — Every pinned project should have a README that includes: what the project does and why it exists, how to set it up locally, architecture overview (even a brief one), screenshots or demo links if applicable, and what technical decisions you made and why. A well-written README is more impressive than the code itself because it demonstrates communication skills.
  4. Maintain a contribution streak (but don’t game it) — Consistent green squares signal active engagement with code. Do not pad the graph with trivial commits — experienced engineers can tell. What counts is genuine contributions: code, documentation, and reviews.
  5. Contribute to projects you actually use — A PR to a well-known open-source project is worth more than 10 personal repos. Even documentation fixes or bug reports demonstrate that you operate in the broader engineering ecosystem.
What NOT to do:
  • Do not pin tutorial follow-alongs (every bootcamp grad has the same ones)
  • Do not have dozens of repos with a single commit
  • Do not neglect issues and PRs in your own repos (if someone opens an issue, respond)
  • Do not use auto-generated profile readmes with flashy badges and no substance
Most engineers treat LinkedIn as an afterthought — a place they update once a year when job searching. That is a missed opportunity. Recruiters, hiring managers, and conference organizers all use LinkedIn. A strong profile generates inbound opportunities so you are never scrambling when you decide to look.
The high-impact changes:
  1. Headline — Not just your title. Include what you do and what you care about. “Senior Backend Engineer | Distributed Systems | Building reliable data pipelines at scale” tells a recruiter exactly what roles to send you. “Software Engineer at Company” tells them nothing.
  2. About section — Write 3-4 sentences in first person about what you do, what you are good at, and what kind of work excites you. Include specific technologies and domains. This is SEO for recruiters — they search by keywords.
  3. Experience descriptions — Do not list responsibilities. List impact. Use the impact story formula: “Led the migration of 200+ microservices from ECS to Kubernetes, reducing deployment time by 60% and infrastructure costs by $400K/year.” Numbers. Impact. Outcomes.
  4. Featured section — Pin your best blog posts, talks, or open-source projects. This is prime real estate that most engineers leave empty.
  5. Skills and endorsements — Add skills that reflect your target role, not just your current one. If you want to move into ML engineering, add those skills even if your current title is “Backend Engineer.”
  6. Engage minimally but consistently — Share one post per month: a learning, a project update, a thoughtful comment on an industry trend. You do not need to become a LinkedIn influencer. You need to be visible enough that your profile appears active.
The recruiters’ perspective: A senior recruiter at a major tech company typically reviews a LinkedIn profile for 10-15 seconds during an initial screen. They look at: current company and title, headline keywords, years of experience, and whether the profile has any substance beyond bare-bones entries. Make those 15 seconds count.
Giving a conference talk is one of the highest-leverage career activities available to any engineer. A single 30-minute talk can reach hundreds of people live and thousands more on YouTube. It establishes you as a domain expert, builds your network with other speakers and attendees, and creates a permanent artifact of your expertise.
The objection: “I’m not expert enough to give a talk.” This is almost always wrong. You do not need to be the world’s leading authority. You need to explain one thing clearly that others find valuable. If you solved an interesting problem at work, debugged a tricky issue, evaluated a technology and learned something surprising, or built something useful — you have a talk.
How to start:
  1. Lightning talks (5 minutes) — Find a local meetup and volunteer for a lightning talk slot. Five minutes is low-risk and forces you to be concise. Topics that work well: “A bug that taught me something,” “One thing I wish I knew about [technology],” “How we solved [specific problem].”
  2. Internal tech talks (15-30 minutes) — Most companies have internal presentation slots (brown bags, tech talks, show-and-tell). Present to your colleagues first. The stakes are low, the feedback is immediate, and the practice is invaluable.
  3. Regional conferences — Once you have 2-3 internal talks under your belt, submit to a regional conference CFP (Call for Papers). Target conferences with 200-500 attendees. Acceptance rates are higher than you think, especially for talks with concrete, specific topics (“How We Reduced Our P99 Latency by 80%”) rather than abstract ones (“Thoughts on Microservices”).
  4. Major conferences — After regional experience, aim for larger venues. Your talk proposal should include: a specific, outcome-oriented title, a clear description of what attendees will learn, your credibility to speak on this topic, and why this talk is timely.
Talk structure that works:
  • The hook (2 minutes) — A specific problem or story that grabs attention
  • Context (3 minutes) — What the audience needs to understand before the solution
  • The journey (15-20 minutes) — What you tried, what failed, what worked, and why
  • Key takeaways (3 minutes) — The 3-5 things the audience should remember
  • Q&A (5 minutes) — Where the real learning often happens
Cross-chapter connection: The communication frameworks in Communication & Soft Skills — especially the sections on presenting to different audiences and structuring technical arguments — apply directly to conference talks. A conference talk is essentially a design review presented to a larger audience.
Open source is covered in Section 4 from the contribution perspective. Here, we focus on the career strategy angle — how to use open source deliberately as a professional growth tool.
Three levels of open-source career impact:
| Level | Activity | Career Impact |
| --- | --- | --- |
| Consumer | Using open-source tools, filing issues, reading source code | Demonstrates awareness and curiosity |
| Contributor | Submitting PRs (docs, bugs, features) to existing projects | Demonstrates ability to work in unfamiliar codebases, follow processes, handle feedback |
| Creator/Maintainer | Creating or maintaining a project with users | Demonstrates technical leadership, communication, product thinking, and community management |
The career calculus: Contributing to a well-known project (React, Kubernetes, PostgreSQL, etc.) carries brand value that personal projects do not. A merged PR to a CNCF project tells a hiring manager: “This person can navigate a complex codebase, follow contribution standards, communicate with strangers, and ship code that meets production-quality requirements.” That is a strong signal.
The maintainer path: If you create a project that gets traction (even 100 GitHub stars), you develop skills that are rare and valuable: prioritization (what issues matter?), community management (how do you handle demanding users?), roadmap planning (where should this project go?), and documentation (how do you make this usable by strangers?). These are exactly the skills that distinguish staff engineers from seniors.
Caution: Open source can become an unpaid second job. Set boundaries. Contribute in bursts aligned with your learning goals. Do not let guilt about open issues drain your energy. Sustainable contribution beats heroic burnout.
You do not need an audience to start. You do not need original research. You need to explain something you learned in a way that helps someone else. That is the entire bar.
What to write about:
  • Something you just learned that was harder to find than it should have been
  • A debugging journey — the problem, what you tried, what failed, what worked
  • A comparison of two approaches with concrete trade-offs (not “X vs Y” hot takes, but “Here’s when X wins and here’s when Y wins and here’s how to decide”)
  • An internal tool or process your team built and why
  • A conference talk summary with your own commentary
Where to publish:
  • Personal blog (using a static site generator like Hugo, Gatsby, or Astro) — Maximum control, builds your personal domain authority
  • Dev.to or Hashnode — Built-in audience, lower setup friction, good for early posts
  • Company engineering blog — If your company has one, contribute. It carries the company’s brand weight and demonstrates internal leadership
The secret of technical blogging: Consistency beats quality. One post per month for a year builds a body of work. One “perfect” post that takes 6 months does not. Aim for useful, not perfect. Publish when it is 80% done. You can always update later.

9. When to Leave a Job — The Decision Checklist

One of the hardest career decisions is knowing when it is time to move on. Engineers tend to stay too long — out of loyalty, comfort, fear of the unknown, or a vague hope that things will improve. This section gives you a structured framework for evaluating whether your current role is still serving your growth, or whether the signals point to leaving.
No job is perfect. Every role has frustrations, boring stretches, and organizational dysfunction. The question is not “Is this job perfect?” — it is “Is this job still net-positive for my career, my skills, my values, and my well-being?” The checklist below is designed to distinguish between normal workplace friction and genuine signals that it is time to go.
The test: What have you learned in the last 6 months that you did not know before? Can you name specific new skills, technologies, or capabilities? If your answer is vague or empty, you are coasting.
Normal: Every role has periods of execution where you are applying existing skills more than learning new ones. A 2-3 month execution phase is fine.
Warning signal: You have been doing essentially the same work for 12+ months. You could do your job on autopilot. You are the expert on your team, but there is nobody who pushes you to grow. New challenges are not available because the org does not have them or will not give them to you.
Before leaving, try:
  • Request a transfer to a different team with harder problems
  • Propose a stretch project that requires skills you want to develop
  • Ask for sponsorship to lead a cross-team initiative
  • Set a personal learning goal with a 3-month deadline: if the org cannot support it, that is your answer
The honest question: “Am I not learning because of the environment, or because of me?” If you would coast at any job, changing companies will not fix it.
The test: Does the company’s actual behavior — not their stated values, their actual behavior — align with your ethical and professional standards?
Warning signals:
  • The company ships features you believe are harmful to users and dismisses concerns when raised
  • Leadership says one thing publicly and does another internally
  • Ethical shortcuts are normalized (“just ship it, we’ll fix the privacy issues later”)
  • You find yourself defending the company’s decisions to friends in ways that feel dishonest
  • The company’s business model fundamentally conflicts with your values (and this was not clear when you joined)
The nuance: Every company makes imperfect decisions. The question is whether the pattern is occasional misjudgment (which you can influence) or systemic misalignment (which you cannot). If you have raised concerns through appropriate channels and nothing changes, that is your signal.
Cross-chapter connection: The Ethical Engineering chapter covers the frameworks for reasoning about ethical decisions in engineering — when to push back, how to escalate, and when walking away is the right call. If you are feeling values misalignment, read that chapter’s section on ethical career decisions before making a move.
Before leaving, try:
  • Raise your concerns explicitly with your manager or skip-level
  • Find allies who share your concerns — collective voice is more powerful
  • Document specific instances where values were violated (for your own records, not as a threat)
  • Give the organization a reasonable window to change (3-6 months)
The test: Is your total compensation within 15-20% of what you could earn elsewhere for the same role and level? If you are significantly below market, your company is either unaware or uninterested in paying you fairly.
How to assess:
  • Check Levels.fyi for your level, location, and company type
  • Talk to peers at other companies (engineers are surprisingly open about comp in private)
  • If you have been at the same company for 3+ years without a significant adjustment, you are almost certainly below market — internal raises rarely keep pace with market movement
Warning signals:
  • You have been told “we don’t have budget” for two consecutive cycles
  • New hires at your level are being offered significantly more than you earn (inversion)
  • The company’s equity has declined significantly and there is no adjustment to base compensation
  • Cost-of-living adjustments are below actual inflation
Before leaving, try:
  • Have a direct conversation with your manager using market data: “Based on Levels.fyi data and conversations with peers, I believe my current comp of [X] is below market for my level. I’d like to discuss an adjustment to [Y range].”
  • If your manager is supportive but says they need approval, set a deadline: “I understand this needs to go through channels. Can we have an answer by [date]?”
  • Get a competing offer. This is the nuclear option but it works because it provides concrete proof of your market value. Only do this if you are genuinely willing to leave.
The math: Staying at a below-market salary for 3 years can cost $100K-300K+ in cumulative lost compensation, not counting the compounding effect on future offers (which are often anchored to current comp). Loyalty is admirable. Leaving six figures on the table is not loyalty — it is a financial mistake.
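The arithmetic behind that range is easy to check for your own situation. The sketch below is illustrative only: the $160K current salary, $200K market rate, 3% internal raise, and 6% market growth are assumed figures, not data from any source, and the function name `cumulative_gap` is hypothetical.

```python
def cumulative_gap(current: float, market: float, years: int,
                   raise_rate: float, market_growth: float) -> float:
    """Total pay missed over `years` while current pay trails market pay.

    All inputs are assumptions you supply; nothing here is real comp data.
    """
    total = 0.0
    for _ in range(years):
        total += market - current      # this year's shortfall
        current *= 1 + raise_rate      # internal raise for next year
        market *= 1 + market_growth    # market moves for next year
    return total


gap = cumulative_gap(current=160_000, market=200_000, years=3,
                     raise_rate=0.03, market_growth=0.06)
print(f"${gap:,.0f}")
```

With these assumed inputs the shortfall grows each year (about $40K, $47K, then $55K), for a three-year total of roughly $142K before any anchoring effect on future offers. Swap in your own numbers from Levels.fyi and peer conversations.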
The test: Does your work environment consistently drain your energy, undermine your confidence, or make you dread Monday mornings — not because the work is hard, but because the people or processes are dysfunctional?
Specific toxic patterns to watch for:
| Signal | What It Looks Like | Why It Matters |
| --- | --- | --- |
| Blame culture | Postmortems assign fault to individuals; people hide mistakes | You cannot learn or take risks in a blame culture |
| Chronic crunch | Sustained 50+ hour weeks normalized as “commitment” | Burnout is not a badge of honor; it is a management failure |
| Information hoarding | Key decisions made behind closed doors; engineers left out of context | You cannot grow if you do not have access to the information needed to make decisions |
| Gaslighting about problems | You raise concerns and are told “that’s not happening” or “you’re the only one who feels that way” | Trust your observations; document them |
| Retaliation for feedback | People who give honest feedback are punished, sidelined, or managed out | A culture that punishes honesty will never improve |
| Favoritism | Promotions and opportunities go to friends of leadership regardless of merit | Your growth ceiling is determined by politics, not performance |
| Constant reorgs | Teams are restructured every 6-12 months with no clear rationale | Instability prevents you from building deep expertise or meaningful relationships |
The crucial distinction: Difficult is not the same as toxic. A demanding job with high standards, hard problems, and direct feedback is challenging but healthy. A job where you are undermined, gaslit, or systematically deprived of growth opportunities is toxic. Know the difference.
Before leaving, try:
  • Talk to your skip-level manager (sometimes the toxicity is one manager, not the org)
  • Transfer to a different team (culture varies significantly between teams, even at the same company)
  • Use anonymous feedback channels if they exist
  • Set a personal deadline: “If X does not change by [date], I will start interviewing”
When to leave immediately (do not try to fix it):
  • You are being asked to do something illegal or deeply unethical
  • Your mental health is seriously affected (sustained anxiety, depression, sleep disruption)
  • You have experienced or witnessed harassment that HR has failed to address
  • You have retaliation concerns for having raised legitimate issues
The test: Is there a realistic path to your next career milestone at this company? Not a theoretical path on a career ladder document, but an actual path with precedents?
Warning signals:
  • Nobody at your target level exists on the IC track at this company (if you want to be staff, but the highest IC is senior, the path does not exist)
  • Promotions require “hero moments” rather than sustained impact
  • Your manager does not understand your target level or cannot articulate what you need to do to get there
  • The company is too small to offer the scope you need (you cannot do staff-level cross-team work at a 15-person startup with 3 engineers)
  • You have been told to wait for a specific project or reorg that keeps getting delayed
The question to ask yourself: “If I do everything right for the next 18 months, is there a realistic, supported path to [my goal] at this company?” If the answer is no, or if you are relying on hope rather than evidence, it is time to start looking.
When you notice multiple warning signals, run through this framework:
  1. Diagnose — Is the problem fixable from your position?
     Some problems are within your influence (team dynamics, your manager’s awareness of your work, specific process issues). Some are not (company strategy, organizational values, market compensation trends). Be honest about which category your concerns fall into.
  2. Communicate — Have you explicitly raised the issue?
     Many engineers leave without ever telling anyone what was wrong. Before leaving, have the direct conversation with your manager: “Here is what I’m experiencing. Here is what I need to change for me to see a future here.” This is not an ultimatum — it is honest communication. Sometimes the fix is straightforward and your manager simply did not know.
  3. Set a timeline — Give the fix a deadline
     “I’ve raised my concerns about [issue]. I’m going to give it 3 months to see meaningful progress. If I don’t see [specific change], I’ll start exploring other options.” Write this down for yourself. Hold yourself to it.
  4. Prepare in parallel — Do not wait for the deadline to start preparing
     Update your resume, your brag document, and your LinkedIn profile now. Start casual conversations with your network. Preparation is not disloyalty — it is prudence.
  5. Decide — Make the call and commit
     If the deadline passes without meaningful progress, start interviewing. Do not fall into the trap of extending your deadline repeatedly. If you have given an honest effort to fix the situation and it has not changed, it will not change.
The two-body problem of job decisions: Leaving a job involves both push factors (what is wrong with the current situation) and pull factors (what is attractive about the new opportunity). A strong decision has both. Leaving purely because of push factors often leads to lateral moves that do not solve the underlying issue. Leaving with strong pull factors — toward a specific opportunity, technology, team, or growth path — leads to better outcomes.
A senior engineer’s perspective on leaving: The best time to look for a new job is when you do not need one. When you are employed, happy, and in demand, you negotiate from strength. When you are desperate, burned out, or just laid off, you negotiate from weakness. Build the habit of one exploratory conversation per quarter — a coffee chat with a recruiter, a peer at another company, or an old colleague. Keep your network warm so that when the time comes to move, you have options already in motion.
How This Chapter Connects to Others
Career growth does not happen in isolation. The skills covered in this chapter intersect deeply with several other chapters in this series:
  • The Engineering Mindset — The growth mindset, first-principles thinking, and deliberate practice frameworks are the engine behind career progression. Without the right mental models, all the career tactics in the world will not help.
  • Communication & Soft Skills — Presenting your impact, writing design docs, giving and receiving feedback, and influencing stakeholders are the skills that differentiate senior from staff, and staff from principal. Communication is how your impact becomes visible. The section on career conversations there pairs directly with the career conversation template in this chapter.
  • Leadership, Execution & Infrastructure — The staff+ path is fundamentally a leadership path, even on the IC track. Influence without authority, driving cross-team alignment, and organizational thinking are covered there.
  • Interview Meta-Skills — The offer negotiation tactics in this chapter are the starting point; the meta-skills chapter covers how to perform well in the interviews that generate those offers, including time management, whiteboard strategies, and recovery techniques. The company evaluation framework here feeds directly into the “questions to ask” strategies there.
  • Ethical Engineering — Career decisions are not purely transactional. The ethical engineering chapter covers when to push back on decisions that conflict with your values, how to evaluate whether a company’s practices align with your ethical standards, and what to do when you discover your employer is causing harm. The “When to Leave a Job” checklist in this chapter includes values misalignment as a signal — that chapter explains how to reason about it.

Practice Exercises

Take 30 minutes and write answers to these questions:
  1. What are the 2-3 things I’m best at technically?
  2. What kind of problems do I enjoy solving most?
  3. Where do I want to be in 3 years — and what specific skills gap stands between here and there?
  4. Who do I admire in my field, and what specifically about their career path appeals to me?
  5. What am I avoiding that I know would accelerate my growth?
Review this quarterly. Your answers will change — that’s the point.
Look at the last 6 months of your work. For each significant project or contribution:
  1. What was the measurable impact? (If you can’t measure it, figure out how)
  2. Who benefited besides you? (Your team? Other teams? Users? The company?)
  3. What did you learn that you didn’t know before?
  4. What would you do differently if you did it again?
If you struggle to answer these, you have a documentation problem, an impact problem, or both.
Write a promotion packet for yourself — even if you’re not up for promotion. Include:
  1. Current level and target level
  2. 3-5 projects demonstrating target-level scope and impact
  3. Evidence of influence beyond your immediate team
  4. Evidence of mentoring or multiplier effects
  5. Areas of growth and how you’ve addressed them
If you can’t fill this out convincingly, you now know exactly what gaps to close.
Prepare 10 questions you would ask a company you’re interviewing with. For each:
  1. What are you actually trying to learn?
  2. What would a great answer sound like?
  3. What would a red-flag answer sound like?
This exercise sharpens your ability to evaluate opportunities even when you’re not actively job searching.
Pick a technical concept you understand well. Write a blog post, give a team presentation, or record a video explaining it. Target audience: someone one level below you.
Constraints:
  • Must include a concrete example or code sample
  • Must explain WHY, not just HOW
  • Must be understandable without specialized knowledge of your specific codebase
If you can teach it clearly, you truly understand it. If you struggle, you’ve found a gap in your own knowledge.

Curated Resources for Career Growth

These are not random links. Each one is specifically chosen because it offers a perspective or framework that is genuinely hard to find elsewhere. Prioritized for quality over quantity.
  • “An Elegant Puzzle” by Will Larson — The best book on engineering management and organizational design. Even if you are on the IC track, understanding how engineering orgs work gives you an unfair advantage in navigating your career. Larson writes from deep experience at Digg, Uber, and Stripe.
  • “Staff Engineer” by Will Larson — The companion to An Elegant Puzzle, focused specifically on the staff+ IC track. Includes detailed interviews with staff engineers about what they actually do day-to-day. Essential reading if you are targeting senior or staff level.
  • “The Staff Engineer’s Path” by Tanya Reilly (O’Reilly, 2022) — The definitive guide to operating as a staff+ IC. Covers the three pillars (big-picture thinking, execution, and leveling up) with practical frameworks drawn from Reilly’s experience at Google and Squarespace. More tactical and actionable than Larson’s book — the two complement each other well.
  • StaffEng.com — A collection of stories from staff+ engineers across the industry, organized by the four archetypes (Tech Lead, Architect, Solver, Right Hand). Read 5-10 of these stories and you will develop a much clearer picture of what staff-level work looks like in practice. Also includes guides on getting the title, operating at the level, and building a promotion packet.
  • Julia Evans’ blog (jvns.ca) — Julia Evans writes about learning, debugging, and demystifying technical topics with infectious enthusiasm and clarity. Her posts on “how to ask good questions,” “things that surprised me about management,” and her zine series on systems concepts are career-growth gold. Her approach to learning in public is a model worth emulating.
  • Levels.fyi — Crowdsourced compensation data across tech companies, broken down by level, location, and company. Essential for understanding where you stand in the market and for negotiation preparation. Also useful for understanding how different companies map their leveling systems.
  • The Pragmatic Engineer by Gergely Orosz — The best newsletter for understanding how the tech industry actually works. Orosz covers compensation, engineering culture, industry trends, and career strategy with insider knowledge from his years at Uber and Microsoft. The free tier is valuable; the paid tier is one of the few subscriptions worth paying for as a career investment.
  • “Don’t Call Yourself a Programmer” by Patrick McKenzie (patio11) — A foundational essay on how the software industry actually works, written over a decade ago but still deeply relevant. McKenzie’s core argument: companies do not hire engineers to write code, they hire engineers to increase revenue or reduce costs. Understanding this reframes every career decision you make. Read this early in your career and revisit it annually.
  • Charity Majors on the IC-to-management journey — Majors has written extensively on her blog and on social media about the transition between IC and management roles. Her core insight — that management is a career change, not a promotion — is one of the most important reframes in engineering career thinking. Search for her posts on “the engineer/manager pendulum” for the key pieces.
  • Tanya Reilly’s “Being Glue” talk — Available on YouTube and as a written blog post. If you have ever felt like your important contributions are invisible, this talk will validate your experience and give you a framework for addressing it. Required viewing for anyone on the IC track and for every engineering manager.
  • The career advice compendium from senior engineers — There is no single URL, but searching for career advice threads from staff+ engineers on Hacker News, the StaffEng blog, and the engineering blogs of companies like Dropbox, Stripe, and Netflix will surface recurring patterns. The consistent themes: write more, build relationships deliberately, own outcomes not just tasks, and make your impact measurable.
Final thought: The engineers who have the best careers are not the ones who write the most code or know the most technologies. They are the ones who consistently solve important problems, make the people around them better, and communicate their impact clearly. Technical skills get you in the door. Everything else determines how far you go.

Interview Deep-Dive Questions

These questions go beyond surface-level behavioral prompts. They are designed the way a seasoned engineering leader would actually probe during a bar-raiser or senior-panel interview round — starting broad, then drilling into the nuances that separate candidates who have truly operated at level from those who can only describe it in theory.

How do you decide when a problem is worth solving versus when to accept the status quo?

The core of this question is judgment, not ambition. The interviewer wants to see that you do not reflexively chase every problem you notice and that you can reason about opportunity cost.
The way I think about this is through a three-axis framework:
  1. Severity times frequency. A problem that causes a minor annoyance once a month is different from one that blocks a deployment pipeline three times a week. I start by quantifying the actual cost — in engineer-hours, customer impact, or revenue. At my last company, I noticed our test suite was flaky, failing about 8% of runs. Gut instinct said “fix it.” But when I measured the actual cost — roughly 2 hours of wasted CI time per week across the team — it was a medium-priority issue, not the urgent crisis it felt like during a frustrating retry.
  2. Trajectory. Is this problem getting worse, staying stable, or naturally resolving? A database that is 60% full and growing 5% per month is a very different problem from one that is 60% full and stable. I always ask: “If we do nothing for 6 months, what does this look like?” The problems worth solving are the ones that compound.
  3. Opportunity cost. What am I NOT doing if I work on this? The hardest part of senior engineering is not solving problems — it is choosing which ones to solve. I maintain a rough mental ranking of the 3-5 highest-leverage things I could be doing, and I evaluate new problems against that list. If fixing flaky tests means delaying a data pipeline migration that three teams are blocked on, the tests wait.
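The trajectory question in point 2 is plain arithmetic, and it is worth actually running rather than eyeballing. A minimal sketch using the database example: the function names are hypothetical, and because the text does not say whether "5% per month" means 5% of current usage or 5% of total capacity, both readings are modeled as assumptions.

```python
import math


def months_until_full(used: float, monthly_growth: float,
                      compounding: bool = True) -> float:
    """Whole months until usage reaches 100% of capacity.

    used: fraction of capacity in use today (0.6 means 60% full).
    compounding=True  -> usage grows by monthly_growth of its current size.
    compounding=False -> usage grows by monthly_growth of total capacity.
    """
    if monthly_growth <= 0:
        return math.inf  # stable or shrinking: it never fills up
    if compounding:
        return math.ceil(math.log(1 / used) / math.log(1 + monthly_growth))
    return math.ceil((1 - used) / monthly_growth)


def usage_after(used: float, monthly_growth: float, months: int,
                compounding: bool = True) -> float:
    """Projected fraction of capacity in use after `months` of doing nothing."""
    if compounding:
        return used * (1 + monthly_growth) ** months
    return used + monthly_growth * months


print(months_until_full(0.60, 0.05))                     # compounding growth
print(months_until_full(0.60, 0.05, compounding=False))  # linear growth
print(round(usage_after(0.60, 0.05, months=6), 2))       # "do nothing for 6 months"
print(months_until_full(0.60, 0.0))                      # the stable database
```

Under these assumptions the growing database has somewhere between 8 and 11 months of headroom and is already about 80% full after six months of inaction, while the stable one never fills. That difference is what "the problems worth solving are the ones that compound" looks like in numbers.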
The status quo is underrated. Accepting a known imperfection that is stable and manageable is often the right call. The anti-pattern is the engineer who wants to rewrite everything they see — that is a sign of poor prioritization, not high standards.
Real example: At a previous role, we had a legacy notification service that was ugly, poorly documented, and used a framework nobody liked. Multiple engineers wanted to rewrite it. I argued against it because it worked, it was stable, it had zero incidents in 18 months, and rewriting it would consume a quarter of eng effort with zero user-facing improvement. We documented it better instead and spent the time on the payment reliability project that actually moved revenue metrics. The notification service is still running and still ugly — and that is fine.
This is where organizational skill matters as much as technical judgment. The key is to not just say “no” but to show your work.
I make the case with data, not opinion. I present the cost of fixing it (engineer-hours, risk, opportunity cost) alongside the cost of living with it (current incident rate, developer friction, customer impact). Then I let the numbers make the argument.
When I was pushing back on the notification rewrite, I wrote a one-page document that compared the two paths: “Rewrite: 2 engineers, 8 weeks, zero customer-facing improvement, moderate risk of regression” versus “Document and monitor: 1 engineer, 1 week, same reliability.” I shared this with the team and let them decide. When you lay it out transparently, most reasonable engineers reach the same conclusion.
The harder case is when the person pushing for the fix has legitimate technical concerns you are underweighting. I have been wrong about “leave it alone” before — once I argued against migrating off a deprecated authentication library and six months later it had a critical CVE. I learned to give extra weight to security and compliance arguments even when the status quo appears stable.
The meta-skill is separating “I don’t want to do this boring work” from “this work is genuinely not worth doing.” Honest self-assessment matters here.
The mechanics of the framework stay the same, but the scope of what you are evaluating changes dramatically.
As a senior engineer, I am evaluating problems within my system or my team. “Should we add caching to this service?” or “Should we refactor this module?” The blast radius of both the problem and the fix is contained.
At staff level, the problems you evaluate span teams, and the cost of getting it wrong is organizational, not just technical. You are asking questions like: “Should we standardize on a single API gateway, or let each team choose their own?” Getting this wrong does not just waste your time — it wastes 40 engineers’ time for the next two years.
The other shift is that at staff level, you spend more time convincing others which problems are worth solving than actually solving them. Senior engineers mostly need to convince their manager. Staff engineers need to build consensus across multiple teams, each of whom has their own priorities. The document I write is no longer a one-pager for my team — it is an RFC that needs buy-in from three team leads and a director.
The biggest trap at staff level is solving interesting problems instead of important ones. The problems that are intellectually fascinating are often not the ones that move the business. Disciplining yourself to focus on boring-but-important work — like standardizing deployment practices or cleaning up a shared library’s API — is a sign of genuine staff-level judgment.

Walk me through a time you had to make a critical technical decision with incomplete information. How did you handle the uncertainty?

This question tests whether you can operate in ambiguity — the defining characteristic of senior-and-above engineers.
I will give a specific example. We were building a real-time event processing system and had to choose between Kafka and Amazon Kinesis. The decision had to be made within two weeks because downstream teams were blocked on our architecture choice. The problem: we had no prior experience operating either at the scale we anticipated (roughly 50K events per second at peak), and our team was split.
Here is how I handled it:
Step 1 — Define the decision criteria before evaluating options. I wrote down what actually mattered: operational complexity (we had a small platform team), cost at projected volume, latency requirements, ecosystem compatibility with our existing AWS stack, and the ability to replay events for debugging. I explicitly deprioritized features we did not need, like exactly-once semantics (we could handle at-least-once with idempotent consumers).
Step 2 — Time-box the research. I gave the team three days to build throwaway prototypes of both. Not full implementations — just enough to measure write latency, understand the operational model, and feel the developer experience. We ran load tests at 2x our projected volume.
Step 3 — Identify what was reversible and what was not. Switching messaging systems later would be painful but technically possible — the key was to put a clean abstraction layer between our application logic and the messaging transport. So I made the data model and consumer contract the hard commitment, and the messaging backend the softer one.
Step 4 — Make the call and document the reasoning. We chose Kafka because the prototype showed Kinesis shard management would become a significant operational burden at our scale, and two engineers on the team had some Kafka experience. I wrote a decision record that included: what we chose, why, what we explicitly traded off, and what signals would tell us we made the wrong call.
Step 5 — Build in checkpoints. I scheduled a 90-day review: “Are we seeing the operational patterns we expected? Is the cost where we projected? Are there surprises?” This made it safe to commit — we were not saying “Kafka forever,” we were saying “Kafka for now, with eyes open.”
The result: Kafka worked well for us. But the real lesson was not that we picked the right tool — it was that the framework for deciding under uncertainty was reusable. Define criteria, time-box research, separate reversible from irreversible decisions, document reasoning, and build in review points.
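The abstraction layer from Step 3 and the at-least-once trade-off from Step 1 can be sketched together. Everything named here is hypothetical: `Transport`, `InMemoryTransport`, `IdempotentConsumer`, and the `event_id` field are invented for illustration, and the in-memory backend merely stands in for a real Kafka or Kinesis client.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    event_id: str   # unique per logical event; used for de-duplication
    payload: dict


class Transport(ABC):
    """The seam between application logic and the messaging backend."""

    @abstractmethod
    def publish(self, event: Event) -> None: ...

    @abstractmethod
    def poll(self) -> list[Event]: ...


class InMemoryTransport(Transport):
    """Stand-in backend; a Kafka- or Kinesis-backed class would expose the same two methods."""

    def __init__(self) -> None:
        self._queue: list[Event] = []

    def publish(self, event: Event) -> None:
        self._queue.append(event)

    def poll(self) -> list[Event]:
        batch, self._queue = self._queue, []
        return batch


class IdempotentConsumer:
    """Applies each event_id once, so at-least-once redelivery is harmless."""

    def __init__(self, handler: Callable[[Event], None]) -> None:
        self._handler = handler
        self._seen: set[str] = set()  # a real system would keep this in durable storage

    def consume(self, transport: Transport) -> int:
        applied = 0
        for event in transport.poll():
            if event.event_id in self._seen:
                continue  # duplicate delivery: skip it
            self._handler(event)
            self._seen.add(event.event_id)
            applied += 1
        return applied


transport = InMemoryTransport()
amounts: list[int] = []
consumer = IdempotentConsumer(lambda e: amounts.append(e.payload["amount"]))

transport.publish(Event("evt-1", {"amount": 10}))
transport.publish(Event("evt-1", {"amount": 10}))  # simulated redelivery
transport.publish(Event("evt-2", {"amount": 5}))
applied = consumer.consume(transport)
print(applied, amounts)
```

The design point is that application code depends only on `Event` and `Transport`. Those are the hard commitment, and the class behind `Transport` is the part that stays swappable.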
This happens more often than people admit, and it is one of the hardest moments in technical leadership. Two competent engineers can look at the same evidence and reach opposite conclusions because they are weighting the criteria differently.
My approach has three phases:
First, make the disagreement explicit. I put the two positions side by side in a document: “Person A recommends X because they are weighting [criteria]. Person B recommends Y because they are weighting [different criteria].” Often, the disagreement is not about facts but about values — one person values operational simplicity, another values raw performance. Making this visible usually de-escalates the tension because people realize they are not actually disagreeing about reality.
Second, look for a decision-maker, not a vote. Consensus is overrated for technical decisions. I try to identify who will live with the consequences most directly. If it is a choice between two database technologies, the engineer who will operate it at 3 AM should have the heaviest vote. If neither person owns the consequences, I escalate to whoever does.
Third, if it is genuinely a coin flip, just flip the coin and move forward. I have seen teams waste weeks debating between two options that are 90% equivalent. The cost of delayed decision-making almost always exceeds the cost of picking the “wrong” 90%-good option. I will sometimes say: “Both options are defensible. We’re going with X. If we hit [specific failure signal], we’ll revisit. Let’s move.”
The thing I explicitly avoid is seeking false consensus. Getting everyone to say “I agree” when half the room is just tired of arguing is worse than making a clear decision that some people disagree with. Disagree-and-commit is a real and necessary pattern.
This is the meta-skill behind decision-making, and honestly, it is something I still work on.
The diagnostic I use is: “What specific new information would change my decision?” If I can name a concrete experiment, data point, or conversation that would genuinely shift my thinking, then more research is justified. If I cannot — if more research is just collecting confirming evidence for a decision I have already subconsciously made — then I am procrastinating.
Another signal: if the decision has been on the table for more than two weeks without new information surfacing, you are almost certainly in analysis paralysis. The information environment is not going to improve meaningfully. You have what you have. Decide.
I also pay attention to my emotional state. When I am excited about researching, I am usually learning something useful. When I feel anxious about deciding, that anxiety is often the thing to pay attention to — it usually means the stakes feel high and I am trying to achieve certainty in a domain where certainty is not available. The right response to that anxiety is not more research. It is accepting that you will be partially wrong and building the feedback loops that let you correct quickly.
The best principal engineer I ever worked with told me: “Your job is not to make the right decision. Your job is to make a good-enough decision quickly and create the conditions to detect and fix mistakes fast.” That reframe changed how I approach ambiguity.

Tell me about a time you had to influence a technical direction across teams without having direct authority over those teams.

This is the canonical staff-engineer question. If you cannot answer this concretely, with specifics, you have not operated at staff level regardless of your title.

Here is the situation: I was at a mid-size company (~300 engineers) where five teams had independently built their own solutions for rate limiting. Each one worked for its own service, but the inconsistency was causing real problems: customer-facing APIs had different rate limit headers, internal services had no rate limiting at all (leading to cascading failures during traffic spikes), and every new service reinvented the same logic.

How I built the case. I did not start by proposing a solution. I started by quantifying the problem. I pulled incident data from the last 6 months and found three outages directly caused by unbounded internal traffic, collectively costing about 14 hours of engineering time and impacting approximately 50,000 users. I also estimated that each new service was spending 1-2 weeks building its own rate limiting, and we were spinning up roughly one new service per month.

How I built alignment. I wrote a short problem statement (not a solution) and shared it with the tech leads of the five affected teams. I explicitly asked: “Am I seeing this right? Is this a real problem?” Three said yes immediately. One said “it’s a problem but not a priority.” One said “we’ve already solved it for our service.” The last response was the most important. I needed to understand why they did not see value, so I scheduled a 30-minute conversation and learned that their concern was being forced to adopt a one-size-fits-all solution that did not handle their edge cases.

The RFC. Armed with the feedback, I wrote an RFC that proposed a shared rate-limiting library (not a centralized service) with pluggable strategies. The library provided the common cases out of the box (fixed window, sliding window, token bucket) and let teams implement custom strategies when needed. Critically, I showed how each of the five existing implementations could be migrated with minimal effort. I included a migration plan with each team’s effort estimate.

The persuasion. I presented the RFC in an engineering-wide design review. I explicitly addressed the “not my priority” concern by showing the incident cost data. I addressed the “one-size-fits-all” concern by demonstrating the pluggable architecture. I offered to do the migration work for the first two teams myself, which lowered the adoption barrier.

The result. Four of five teams adopted the library within two months. The fifth adopted it six months later when they spun up a new service and the library was easier than building from scratch. Rate-limiting-related incidents dropped to zero over the next quarter.

What I learned. You do not influence by being right. You influence by understanding everyone’s constraints, doing the work to make adoption easy, and leading with the problem, not the solution.

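
To make the "pluggable strategies" idea concrete, here is a minimal Python sketch of what such a library's core might look like. It is an illustration under assumptions, not the library from the story: the names `RateLimitStrategy`, `TokenBucket`, `FixedWindow`, and `RateLimiter` are hypothetical, state is kept in process memory, and the injectable `clock` exists only to make the behavior testable.

```python
import time
from typing import Callable, Protocol


class RateLimitStrategy(Protocol):
    """Hypothetical plug-in interface: any object with allow() is a strategy."""

    def allow(self, key: str) -> bool: ...


class TokenBucket:
    """Each key holds up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self._state: dict[str, tuple[float, float]] = {}  # key -> (tokens, last_seen)

    def allow(self, key: str) -> bool:
        now = self.clock()
        tokens, last = self._state.get(key, (self.capacity, now))
        # Refill for the elapsed time, never beyond the bucket capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._state[key] = (tokens, now)
        return allowed


class FixedWindow:
    """At most `limit` requests per key in each `window`-second slot."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._counts: dict[str, tuple[int, int]] = {}  # key -> (slot, count)

    def allow(self, key: str) -> bool:
        slot = int(self.clock() // self.window)
        prev_slot, count = self._counts.get(key, (slot, 0))
        if prev_slot != slot:
            count = 0  # a new window has started, so reset the counter
        if count >= self.limit:
            self._counts[key] = (slot, count)
            return False
        self._counts[key] = (slot, count + 1)
        return True


class RateLimiter:
    """Thin facade: services depend on this, teams swap the strategy behind it."""

    def __init__(self, strategy: RateLimitStrategy) -> None:
        self.strategy = strategy

    def check(self, key: str) -> bool:
        return self.strategy.allow(key)
```

A team with unusual edge cases would write its own class with an `allow(key)` method and pass it to `RateLimiter`, which is one way to address the "one-size-fits-all" objection without forking the library.
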
This happens, and honestly, it is fine: sometimes they are right to refuse.

My first step is genuine curiosity: why? There are a few common reasons, and the response is different for each:

“It doesn’t fit our use case.” Take this seriously. Either your solution has a real gap you missed, or their constraint is unusual enough that a custom approach is justified. If it is a gap, fix it; this feedback makes the solution better for everyone. If their use case is genuinely unique, carve out an exception and document why. Forcing uniformity when it does not fit destroys trust.

“We don’t have time.” This is often real. Teams have roadmaps and deadlines. My response is to make adoption effortless: write the migration PR for them, offer to pair on the integration, or sequence the migration after their current deadline. If you are asking a team to prioritize your initiative over their committed work, you need to either get their manager on board or wait.

“We don’t think the problem is important enough.” Show them the data. If the data does not convince them, escalate to their manager or a shared engineering leader, not as a political move but as a legitimate “we need organizational alignment on priorities” conversation. Sometimes the answer really is “this is not important enough to prioritize right now,” and you have to accept that.

“We just don’t want to.” This is rare when you have done the work to build a good solution, but it happens. It is usually about autonomy or trust, not about the technical merits. The best approach is to keep building adoption momentum with the teams that are on board. Success is contagious: when four out of five teams are using the shared solution and reporting zero incidents, the holdout team usually comes around on their own timeline.

The key principle: influence without authority means you cannot force anyone. You can only make the right path the easy path.

Measurement has to be defined upfront. For the rate-limiting example, I set three success criteria before we started: (1) number of teams using the shared library (target: 4 of 5 within 3 months), (2) rate-limiting-related incidents per quarter (target: fewer than 1, down from 3-4), and (3) time for a new service to implement rate limiting (target: under 2 hours, down from 1-2 weeks).

I tracked these monthly and shared the numbers in a short update to the engineering mailing list. This served double duty: it kept the initiative visible to leadership and it gave adopting teams recognition for their contribution.

For when to walk away: I set a mental “kill criterion” before starting any cross-team initiative. Mine is usually: “If I cannot get 2+ teams meaningfully engaged within 6 weeks of the RFC, the problem is either not real enough or my solution is wrong.” At that point, I go back to the listening phase: maybe I misdiagnosed the problem, or maybe the solution needs to be fundamentally different.

The hardest walk-away is when the initiative is technically sound but organizationally doomed. If the company does not value infrastructure investment, or if there is active political resistance from a powerful team, no amount of good engineering will overcome it. Recognizing this early and redirecting your energy to a problem where you CAN make progress is itself a staff-level skill. Not every hill is worth dying on.

What is the difference between a senior engineer who is ready for staff and one who is not?

This is one of my favorite questions because the answer reveals how deeply someone understands leveling, not just for themselves but as a framework.

The short version: a senior engineer who is ready for staff has shifted from “I can solve any problem you give me” to “I can identify which problems are worth solving and make others successful at solving them.”

Let me be more specific about the five signals I look for:

1. They operate on problems, not tasks. A strong senior engineer takes a well-defined problem and delivers an excellent solution. A staff-ready engineer looks at the landscape and says, “Here are the three biggest problems we should be solving, here is why, and here is the order.” They write the problem statement, not just the solution. This is the single biggest differentiator.

2. Their impact is not bounded by their own output. A senior engineer’s impact is roughly proportional to the code they write and the systems they design. A staff-ready engineer has disproportionate impact: they write an RFC that changes how five teams build services, or they create a library that saves 100 engineer-hours per quarter, or they mentor three senior engineers who each become more effective. When I review their work, I see fingerprints across the organization, not just deep marks in one area.

3. They can communicate across audiences. Staff-ready engineers can explain the same decision three ways: to another senior engineer (technical depth), to a product manager (business impact), and to a VP (strategic alignment). If someone can only speak “engineer-to-engineer,” they are not ready. This communication versatility is what enables cross-team influence.

4. They have demonstrated judgment, not just skill. They have stories about times they chose NOT to build something: times they killed a project, simplified a design, or chose a boring technology over an exciting one because it was the right call. Senior engineers are proud of what they built. Staff-ready engineers are proud of the decisions they made, including the decisions to not build.

5. They already operate at staff scope. In my experience, promotions to staff are almost always lagging indicators: the engineer is already doing staff work before they get the title. If someone is waiting for the title to start doing the work, they are not ready. The question to ask is: “Are you already doing this?” not “Could you do this if given the chance?”

The engineer who is NOT ready typically has one or more of these gaps: they are technically brilliant but cannot build consensus across teams, they solve problems given to them brilliantly but never identify problems independently, or they operate as a “hero” individual contributor whose impact disappears when they go on vacation.

This is the most common gap I see, and it is the hardest to close because it requires a fundamentally different type of work than what made them successful as a senior engineer.

Start with low-stakes practice. I would not throw someone into leading a cross-org RFC on day one. Instead, I would start with something like: “Your team’s authentication library is being used informally by two other teams. Write a one-page proposal to formalize the API contract, get feedback from those teams, and present the result at our architecture review.” This is staff-level work (cross-team alignment, written proposal, consensus building) but with a small blast radius.

Teach the writing habit. Most senior engineers who struggle with influence have never written a document that was designed to persuade. They write design docs that are technically correct but do not make a compelling case. I have them read 2-3 well-written RFCs from the organization, and then we co-write one together. The shift from “here is my design” to “here is the problem, here are the options, here is what I recommend and why” is transformative.

Create exposure opportunities. I would pair them with a staff engineer on a cross-team initiative, not as a helper but as a co-lead. They need to see how influence without authority works in practice: how a staff engineer navigates disagreements, how they frame proposals, how they handle resistance. Shadowing is underrated.

Give direct feedback on communication. After every design review or cross-team meeting, I debrief with them. “When you presented your proposal, you jumped straight to the solution. The other team lead tuned out because they didn’t understand why they should care. Next time, start with the problem and its cost to their team.” This kind of specific, immediate feedback accelerates growth faster than anything else.

Set a timeline. I am honest: “Based on where you are, I think 6-9 months of deliberate work on cross-team influence would get you ready. Here are the three things I want to see. Let’s check in monthly.” Vagueness helps nobody.

Describe a situation where you had to balance shipping speed against technical quality. How did you decide where to draw the line?

Every interviewer who asks this is testing for nuance. The weak answer is “I always prioritize quality” (idealist who has never shipped under real constraints) or “I always prioritize speed” (cowboy who creates tech debt). The strong answer shows a framework for navigating the tension.

Here is a real example. We had a major customer expansion launching in six weeks, a contract worth about $2M ARR. The feature they needed was a multi-tenant reporting dashboard. We had a clean design that would take 10 weeks to build properly: normalized data model, proper access controls, optimized queries, the works. We also had a “good enough” version that could be built in four weeks: denormalized data, slightly slower queries, some manual steps for tenant provisioning.

How I decided:

First, I separated the irreversible from the improvable. Security and data isolation were non-negotiable: a multi-tenancy bug that leaks data between customers is an existential risk. So the access control layer was built to the standard of the full 10-week design, even within the 4-week timeline. We did not cut corners on the security model.

What we did cut: query optimization (acceptable because the initial customer had moderate data volume), automated tenant provisioning (acceptable because we were onboarding one customer, not a hundred), and the front-end polish (acceptable because we could iterate post-launch).

Second, I documented the tech debt explicitly. I wrote three tickets with clear descriptions of what we deferred and why, estimated the cost to fix later, and identified the trigger conditions. “When we onboard a third customer OR when query latency exceeds 2 seconds at p95, we need to implement the optimized data model. Estimated effort: 3 weeks.” This turns implicit debt into a planned roadmap item.

Third, I got stakeholder alignment. I presented both options to the product manager and the engineering director: “Option A ships in 4 weeks with these trade-offs. Option B ships in 10 weeks with a complete solution. Here is what we gain and lose with each.” They chose Option A with full awareness of the trade-offs. This is critical: the decision was shared, not unilateral.

The result: We shipped on time. The customer launched successfully. Three months later, when we onboarded the second customer, we did the optimization work, and it was easier because we understood the access patterns from real production usage. The tech debt tickets were addressed within one quarter.

The principle: Speed and quality are not opposites. The real question is: “What quality dimension can I defer without creating irreversible damage?” Security, data integrity, and core abstractions cannot be deferred. Performance optimization, UI polish, and automation often can.

This is the real challenge, because organizations have a strong gravitational pull toward “it works, don’t touch it.”

I use three mechanisms:

Explicit debt registration. Every shortcut gets a ticket immediately, not “someday” but with a specific trigger condition and a rough effort estimate. I tag these tickets distinctly (we used a “tech-debt-contract” label) and review them monthly with the team lead. This makes the debt visible in sprint planning, not buried in a wiki page nobody reads.

Degradation metrics. For performance-related debt, I set up monitoring that will alarm when the deferred optimization becomes necessary. In the reporting dashboard example, I set an alert for p95 query latency exceeding 1.5 seconds. When it fired (about 4 months later), the ticket was already written and estimated; we just pulled it into the sprint.

The 20% rule. I advocate for allocating 15-20% of every sprint to tech debt reduction, and I protect that allocation politically. When product managers push back, I frame it in their language: “If we skip tech debt this sprint, here is the feature velocity we will lose next quarter because of accumulated friction.” Making the cost tangible (“each sprint without this fix adds 30 minutes to every developer’s build time, which is 10 engineer-hours per week”) is more persuasive than abstract arguments about code quality.

The shortcut that actually becomes permanent is the one nobody wrote down. If the debt lives only in someone’s head, it will never be prioritized. Documentation is the immune system against permanent tech debt.

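
The degradation-metric mechanism can be sketched in a few lines of Python. This is an illustration under assumptions rather than the team's actual monitoring setup: the `DebtTrigger` class, the ticket ID, and the sample latencies are hypothetical, and in practice this check would usually live as an alert rule in a monitoring tool. The 1.5-second threshold comes from the reporting dashboard example, and the customer-count condition mirrors the kind of trigger written into the ticket.

```python
from dataclasses import dataclass


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latency samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer math
    return ordered[rank - 1]


@dataclass
class DebtTrigger:
    """Hypothetical link between a deferred-work ticket and its trigger conditions."""

    ticket: str
    p95_threshold_s: float
    customer_threshold: int

    def should_fire(self, latencies_s: list[float], customers: int) -> bool:
        # Fire on either condition: enough customers, or latency past the threshold.
        if customers >= self.customer_threshold:
            return True
        return bool(latencies_s) and p95(latencies_s) > self.p95_threshold_s


if __name__ == "__main__":
    trigger = DebtTrigger(ticket="DEBT-1", p95_threshold_s=1.5, customer_threshold=3)
    recent = [0.4] * 18 + [3.0, 3.0]  # made-up query latencies in seconds
    if trigger.should_fire(recent, customers=1):
        print(f"Pull {trigger.ticket} into the next sprint")
```

The point of the design is that the trigger is written down next to the ticket, so the decision to pay the debt is made when the shortcut is taken, not renegotiated when the alert fires.
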
This is a conversation I have had multiple times, and it is one of the defining moments for a senior or staff engineer’s credibility.

First, quantify the damage. Saying “we have too much tech debt” is meaningless to a product leader. Saying “our deployment frequency has dropped from 8 per week to 3 per week over the last quarter, and 40% of our sprint capacity is going to workarounds for known issues in the payment module” is a business argument. I have literally pulled commit history and Jira data to show that feature velocity was declining quarter-over-quarter because of accumulated shortcuts.

Second, propose a specific plan, not an open-ended pause. “We need to stop feature work and pay down tech debt” is terrifying to a product organization. “I propose a 3-week focused sprint where 4 engineers address the top 5 tech debt items. Here is the expected improvement in deployment frequency and developer velocity. Feature work resumes on [date].” This is a bounded ask with a clear return-on-investment story.

Third, tie it to a business outcome they care about. I once convinced a VP to approve a 4-week tech debt sprint by framing it as: “We cannot reliably ship the Q4 features on time if we do not fix the CI pipeline and the flaky integration test suite first. The risk of missing the Q4 deadline is higher if we keep pushing features than if we pause for a month now.” I was reframing tech debt from “engineering wants to clean up” to “this is a risk to the revenue plan.”

The nuclear option. If the organization consistently refuses to address tech debt despite evidence of harm, that is one of the signals from the “When to Leave” section of this chapter. An organization that does not invest in its engineering infrastructure is either unaware (fixable) or culturally opposed (not fixable). If you have made the case three times with data and been overruled, it is a cultural issue.

You join a new team and discover the architecture is deeply flawed. What do you do?

The trap here is the word “discover.” Junior engineers “discover” problems on day one. Senior engineers hold their judgment until they understand the full context.

My honest answer is: I do almost nothing for the first 4-6 weeks, and that is deliberate.

Phase 1: Listen and learn (weeks 1-4). Every architecture that looks “deeply flawed” to a newcomer made sense to someone at some point. My first job is to understand why. I read every design doc, ADR, and postmortem I can find. I talk to the engineers who built it. I ask: “What constraints existed when this was designed? What has changed since then? What have you already tried to fix?” Nine times out of ten, the “flawed” architecture was the right call given the constraints at the time: a smaller team, a tighter deadline, a different product direction, a technology that did not exist yet.

This listening phase also builds trust. Nothing destroys your credibility faster on a new team than a week-one “here’s everything wrong with your system” speech. People built this system with care, under constraints you do not yet understand. Respect that.

Phase 2: Validate the diagnosis (weeks 4-8). Once I understand the history, I can form an informed opinion about what is actually broken versus what is merely unfamiliar. I write down my observations, not as a critique but as a “state of the world” document. I share it with the team lead privately: “Here is what I am seeing. Am I missing anything? Does this match your understanding?” This collaborative framing turns my assessment from an outsider’s criticism into a shared diagnosis.

Phase 3: Propose incremental improvement (weeks 8+). I almost never propose a rewrite. Rewrites are high-risk, hard to scope, and often fail. Instead, I identify the single highest-leverage improvement, the thing that, if fixed, would unlock the most value. Maybe it is extracting a shared data model, or adding an abstraction layer between two tightly coupled services, or migrating one critical path off a deprecated library.

I write a short RFC for this one change. I scope it to something deliverable in 4-6 weeks. I show the expected impact. And I volunteer to do the work, not delegate it. Leading the first migration myself demonstrates commitment and builds trust that my ideas are practical, not ivory-tower architecture.

Real example: I joined a team that had a monolithic Node.js application where business logic, data access, and HTTP handling were all mixed together in route handlers, some of them 500+ lines long. My instinct was “this needs a total restructuring.” What I actually did was refactor the single most-changed route handler into a clean three-layer architecture (handler, service, repository), wrote tests for the extracted service layer, and presented it to the team as a pattern we could follow incrementally. Over the next two quarters, the team adopted this pattern for every new feature and backfilled the highest-churn files. No big-bang rewrite, no disruption, gradual improvement.

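
For readers who have not seen the handler/service/repository split, here is a minimal sketch of the shape such a refactor might take. It is written in Python rather than the Node.js of the original codebase, since the layering is language-agnostic, and everything in it is hypothetical: the `Order` domain, the class names, and the in-memory repository stand in for whatever the real route handler did.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Order:
    order_id: str
    quantity: int
    unit_price_cents: int


class OrderRepository(Protocol):
    """Data access only: no business rules, no HTTP concerns."""

    def get(self, order_id: str) -> Optional[Order]: ...


class InMemoryOrderRepository:
    """Stand-in for a real database-backed repository."""

    def __init__(self, orders: list[Order]) -> None:
        self._orders = {order.order_id: order for order in orders}

    def get(self, order_id: str) -> Optional[Order]:
        return self._orders.get(order_id)


class OrderNotFound(Exception):
    pass


class OrderService:
    """Business logic only: testable without a web server or a database."""

    def __init__(self, repo: OrderRepository) -> None:
        self._repo = repo

    def total_cents(self, order_id: str) -> int:
        order = self._repo.get(order_id)
        if order is None:
            raise OrderNotFound(order_id)
        return order.quantity * order.unit_price_cents


def get_order_total_handler(service: OrderService,
                            params: dict[str, str]) -> tuple[int, dict]:
    """HTTP-shaped layer: validates input and maps outcomes to status codes."""
    order_id = params.get("order_id")
    if not order_id:
        return 400, {"error": "order_id is required"}
    try:
        total = service.total_cents(order_id)
    except OrderNotFound:
        return 404, {"error": "order not found"}
    return 200, {"order_id": order_id, "total_cents": total}
```

Because `OrderService` only depends on the repository interface, the tests for the extracted service layer can pass in an in-memory repository, which is what makes a single refactored handler usable as a pattern for the rest of the codebase.
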
Rewrites get a bad reputation, and it is mostly deserved: most rewrites fail. But there are situations where incremental improvement is genuinely insufficient:

The system has a fundamental data model problem. If the core data model cannot represent the business domain correctly, no amount of refactoring around the edges will help. I worked on a system where customer billing was modeled as a single entity but the business had evolved to need per-service, per-region billing. Every feature required increasingly absurd workarounds. The incremental path would have taken longer than a targeted rewrite of the billing model.

The technology is end-of-life with no migration path. If you are running on a framework or runtime that will stop receiving security patches and there is no path to upgrade in-place, a rewrite (or, more precisely, a re-platforming) is unavoidable.

The system is so poorly understood that nobody can safely change it. If there are no tests, no documentation, no original authors, and every change causes unexpected failures, incremental improvement is actually riskier than a disciplined rewrite because you cannot reason about the impact of changes.

Even in these cases, I never rewrite everything at once. I use the strangler fig pattern: build the new system alongside the old one, migrate traffic gradually, and decompose the old system one piece at a time. This lets you deliver value incrementally, catch problems early, and revert if something goes wrong.

The key question: “Is the cost of understanding and safely modifying the existing system higher than the cost of replacing it?” If yes, rewrite. If no (and it usually is no), refactor.
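
The "migrate traffic gradually" part of the strangler fig pattern can be illustrated with a small routing sketch. This is one possible shape, under assumptions: the `StranglerRouter` class and its per-route percentage rollout are hypothetical, and real migrations often do this at a proxy or API gateway rather than in application code.

```python
import hashlib
from typing import Callable

Handler = Callable[[str], str]


def bucket_for(key: str) -> int:
    """Stable bucket in [0, 100) so the same caller always lands on the same side."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


class StranglerRouter:
    """Sends a configurable share of each route's traffic to the new system."""

    def __init__(self, legacy: Handler, replacement: Handler) -> None:
        self._legacy = legacy
        self._replacement = replacement
        self._rollout: dict[str, int] = {}  # route -> percent sent to replacement

    def set_rollout(self, route: str, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[route] = percent

    def handle(self, route: str, request_key: str) -> str:
        # Routes that have never been configured stay entirely on the legacy system.
        percent = self._rollout.get(route, 0)
        if bucket_for(request_key) < percent:
            return self._replacement(route)
        return self._legacy(route)


if __name__ == "__main__":
    router = StranglerRouter(lambda r: f"legacy:{r}", lambda r: f"new:{r}")
    router.set_rollout("/billing", 10)  # start with roughly 10% of callers
    print(router.handle("/billing", "customer-42"))
    router.set_rollout("/billing", 0)   # reverting is a one-line config change
```

Migrating one route at a time, with the percentage as the only moving part, is what provides the incremental delivery and the easy revert described above.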

How do you approach mentoring someone who is stuck at their current level?

The first thing I do is figure out WHY they are stuck, because the remedy is completely different depending on the cause.

There are typically four root causes, and most struggling engineers have a combination:

1. They do not know what the next level looks like. This is the most common and most fixable problem. Many engineers are stuck because they are optimizing for the wrong things: they think “write better code” is the path to senior, when the actual gap is “own a system end-to-end” or “influence without being asked.” My first move is to sit down with them and map their current work against the level expectations. I make it concrete: “Here is what a senior engineer on this team does that you are not yet doing. Let’s pick the one gap that would have the most impact to close.”

2. They lack opportunity, not ability. Sometimes the engineer has the skills but the team does not offer the scope. If someone needs cross-team impact to reach staff but their team operates in isolation, no amount of coaching will help; they need a different assignment. I work with their manager to find or create the right opportunity: a cross-team project, an RFC to write, a service to own end-to-end.

3. They have a specific skill gap they are not addressing. Maybe they avoid writing design documents. Maybe their code reviews are superficial. Maybe they cannot articulate trade-offs to non-engineers. I identify the specific gap and create a targeted development plan: “For the next month, I want you to write a 1-page design doc for every task before you start coding. I will review each one and give you feedback.”

4. They are doing the work but not making it visible. This is the “glue work” trap. They are operating at the next level, but nobody in a position of power knows it. Here, the coaching is about storytelling and self-advocacy: start a brag document, frame work in terms of impact, present at team meetings, get on leadership’s radar.

My mentoring mechanics: I do biweekly 30-minute 1:1s. Each one starts with “What did you work on that you are proud of?” (trains impact thinking) and “What is the hardest decision you had to make this week?” (trains judgment). I assign one concrete growth action per session, small enough to do in two weeks and specific enough to evaluate. “Give a thorough code review on the database migration PR and focus on operational concerns, not just correctness” is better than “do more thorough code reviews.”

I also model the behavior I want to see. If I want them to write better design docs, I write one with them and walk through my thought process. If I want them to push back on requirements, I bring them into a meeting where I do it and debrief afterward. Telling someone to “be more senior” is useless. Showing them what it looks like and letting them practice is effective.

This is one of the most delicate conversations in engineering leadership, and doing it well is a skill that separates good mentors from great ones.

The principle: be specific, be kind, and be honest, in that order.

I start by validating what they ARE doing well. “I can see the growth in your technical work over the last two quarters. Your system design skills have improved significantly, and the work you did on the caching layer was genuinely senior-level.” This is not flattery; it is accurate calibration, and it establishes that my assessment is nuanced, not dismissive.

Then I transition to the gap with a concrete example: “The area where I do not yet see senior-level work is in how you handle ambiguity. When the requirements for the notification system were unclear, you waited two weeks for clarification instead of going to the PM with a proposal. A senior engineer would have drafted three options, brought them to the PM, and driven the decision. That pattern of waiting for clarity instead of creating it is the biggest gap I see.”

I always anchor to observable behavior, never to identity. “You waited for clarification” is feedback. “You’re not proactive enough” is a character judgment. The first one is actionable. The second one is demoralizing.

Then I make it forward-looking: “Here is what I want to see over the next quarter. The next time you get an ambiguous requirement, I want you to write a short options document and bring it to the stakeholder within two days. Let’s do that three times and then reassess.”

The hardest part is when they push back: “But I’ve been here for three years” or “But other people who do less got promoted.” I acknowledge the frustration without conceding the point: “I understand that feels unfair, and tenure should count for something. But the criteria are about demonstrated impact, not time served. Let me help you close the specific gap so we can make the strongest possible case.”

What I never do: give false hope. “You’re almost there, just keep doing what you’re doing” when they are not close is cruel: it wastes their time and erodes trust when the promotion does not come. Honest assessment delivered with respect is always better than comfortable vagueness.

How do you evaluate whether to stay on the IC track or move into management?

The most important thing to understand is that this is a career change, not a promotion. Charity Majors’ framing of the “engineer/manager pendulum” is exactly right: management is a different job with different skills, different day-to-day activities, and different sources of satisfaction.

Here is how I think about the decision:

Start with energy, not prestige. Which activities give you energy versus drain you? I literally tracked this for a month. The things that gave me energy: designing systems, debugging hard problems, writing technical documents, code review, and teaching engineers how to think about architecture. The things that drained me: status meetings, navigating organizational politics, performance reviews, and resolving interpersonal conflicts on the team. My energy profile pointed clearly toward the IC track.

But I did not just trust my self-assessment. I tested the hypothesis.

Run an experiment. I took on a 3-month stint as a tech lead for a project with 4 engineers. This gave me a taste of management-adjacent work: running standups, doing 1:1s, coordinating across teams, shielding the team from distractions, and making people-level decisions about who should work on what. I learned two things: (1) I was decent at it, and (2) I missed the deep technical work every single day. After three months, I knew the IC track was right for me, not because management was bad, but because it was not where I would have the most impact or satisfaction.

The diagnostic questions I would ask anyone considering the switch:
  • When a project succeeds, what gives you more satisfaction — the elegant architecture, or watching a team member you coached deliver something they could not have done before?
  • Would you rather spend an afternoon debugging a complex distributed system issue or helping an engineer navigate a career crisis?
  • When there is a conflict between two engineers about a technical approach, do you want to resolve it by analyzing the options and deciding (IC), or by coaching them to resolve it themselves (management)?
  • Are you drawn to management because you want to have impact through people, or because you think it is the only path to seniority and higher compensation?
The last question is the critical one. If the answer is “it’s the only path to more comp or seniority,” you are at the wrong company, not making the wrong career choice. Good companies pay their staff ICs as well as their engineering managers.

The pendulum is real. I have worked with several excellent engineers who went into management, learned invaluable skills about organizational dynamics and people, and then came back to IC roles where they were dramatically more effective because they understood the human side. The worst thing you can do is treat this as a one-way door. Try it if you are curious. Come back if it is not right. The skills transfer in both directions.

I have seen three patterns repeatedly:

The “Player-Coach” who never stops playing. A strong IC becomes a manager but continues to own the hardest technical work personally. They do not delegate the interesting problems because (a) they can do it faster and (b) the technical work is what they enjoy. The result: they are a bottleneck on both the technical work AND the management work, their reports do not grow because they never get challenging assignments, and the manager burns out trying to do two full-time jobs.

The fix is painful but simple: stop writing production code in the critical path. You can review, advise, prototype, and pair, but the interesting work must go to your reports. Your job is now to make THEM excellent, not to be excellent yourself.

The “absent technical manager.” The opposite failure mode. They dive fully into management and lose touch with the technical reality. Their reports come to them with architectural decisions and they rubber-stamp everything because they do not have enough context to evaluate it. Over time, the team makes worse technical decisions because nobody in a leadership position is providing technical judgment.

The fix: stay technically engaged without being on the critical path. Read design docs carefully. Ask probing questions in reviews. Write internal tools or do small improvements during your focus time. You do not need to be the best coder on the team, but you need to be able to smell when something is wrong.

The “feedback-avoidant” manager. Many ICs transition to management without ever developing the skill of giving direct, constructive feedback about someone’s performance. They can critique code all day, but telling an engineer “your communication in meetings is undermining your credibility and holding back your promotion” feels terrifyingly personal. So they avoid it. Their reports stagnate, problems fester, and eventually someone quits because they never received the guidance they needed.

The fix: treat feedback as a skill to practice, not a personality trait. Have a framework; I use Situation-Behavior-Impact. “In yesterday’s design review (situation), when you dismissed Sarah’s suggestion without engaging with it (behavior), the rest of the team stopped offering ideas for the rest of the meeting (impact). Here is what I would like to see instead.” Practice it like you practiced writing code: deliberately, repeatedly, with coaching.

Tell me about a time you were wrong about a significant technical decision. What happened and what did you learn?

The interviewer is not looking for perfection — they are looking for self-awareness, intellectual honesty, and the ability to extract lessons from failure. If a candidate claims they have never been significantly wrong, that is either a lie or a sign they have never made decisions important enough to be wrong about.
Here is mine: Early in my time as a senior engineer, I championed a migration from REST APIs to GraphQL for our customer-facing platform. I had read extensively about GraphQL, built a proof-of-concept, and was genuinely convinced it would solve our growing over-fetching problem — mobile clients were making 6-8 REST calls per screen to assemble the data they needed.
I wrote a design doc, presented it enthusiastically, and got buy-in from leadership. We started the migration with two services.
What went wrong:
The technical merits of GraphQL were real — clients loved the flexible queries. But I had dramatically underestimated three things:
First, operational complexity. Our team had deep REST expertise and zero GraphQL operational experience. Query performance was unpredictable because clients could construct arbitrarily deep queries. We ended up building a query complexity analyzer, a cost estimator, and custom rate limiting — work I had not budgeted for.
Second, the N+1 problem at the resolver level. Our naive implementation hit the database once per field resolution, which was catastrophically slow. We had to learn and implement DataLoader patterns across every resolver. This was solvable but took 3 weeks I had not planned for.
Third, and most importantly, the team’s learning curve. I had assumed that because I found GraphQL intuitive, everyone would. I was wrong. Two mid-level engineers struggled significantly, and the code reviews took 3x longer during the transition because everyone was learning the idioms simultaneously. Velocity dropped by roughly 40% for two months.
What I learned:
The biggest lesson was not about GraphQL — it was about adoption cost accounting. I had evaluated the technology on its technical merits and ignored the organizational cost of the transition: training, velocity loss, operational learning, and tooling gaps. Now, whenever I propose a technology change, I include a “transition cost” section in the design doc with realistic estimates of the learning curve, the expected velocity dip, and the timeline to full productivity.
The second lesson was about piloting correctly. I should have migrated one non-critical service first, let the team develop operational intuition over 2-3 months, and THEN decided whether to expand. Instead, I committed two services simultaneously, which meant the learning pain was multiplied without a safety net.
The third lesson was about confirmation bias in technology evaluation. I was excited about GraphQL and unconsciously downweighted the risks. Now I assign a “devil’s advocate” for every significant technology decision — someone whose job is to make the strongest possible case against the proposal. If the case still holds after serious adversarial testing, I have more confidence in it.
We eventually completed the migration successfully and it did solve the original problem. But it took 5 months instead of the 3 I had estimated, and the first two months were painful. I was right about the destination and wrong about the journey.
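The resolver-level N+1 problem described above can be illustrated with a small, self-contained sketch. It is not the team's implementation and does not use a real GraphQL or DataLoader library: `FakeDB`, `AuthorLoader`, and the sample data are hypothetical stand-ins, and the explicit `dispatch()` call simplifies the per-tick batching that real DataLoader implementations perform.

```python
# Hypothetical in-memory data standing in for a real database.
AUTHORS = {1: "Ada", 2: "Grace", 3: "Linus"}
POSTS = [
    {"id": 10, "author_id": 1},
    {"id": 11, "author_id": 2},
    {"id": 12, "author_id": 1},
    {"id": 13, "author_id": 3},
]


class FakeDB:
    """Counts round trips so the two strategies can be compared."""

    def __init__(self):
        self.query_count = 0

    def get_author(self, author_id):
        self.query_count += 1  # one round trip per call
        return AUTHORS[author_id]

    def get_authors(self, author_ids):
        self.query_count += 1  # one round trip for the whole batch
        return {i: AUTHORS[i] for i in author_ids}


def resolve_naive(db, posts):
    # One query per post: the N+1 pattern.
    return [db.get_author(p["author_id"]) for p in posts]


class AuthorLoader:
    """Collects keys first, then fetches them in a single batched query."""

    def __init__(self, db):
        self.db = db
        self.pending = []
        self.cache = {}

    def load(self, author_id):
        # Deduplicate keys that are already cached or queued.
        if author_id not in self.cache and author_id not in self.pending:
            self.pending.append(author_id)

    def dispatch(self):
        if self.pending:
            self.cache.update(self.db.get_authors(self.pending))
            self.pending = []

    def get(self, author_id):
        return self.cache[author_id]


def resolve_batched(db, posts):
    loader = AuthorLoader(db)
    for p in posts:
        loader.load(p["author_id"])
    loader.dispatch()
    return [loader.get(p["author_id"]) for p in posts]
```

With four posts, the naive resolver issues four lookups while the batched version issues one. That difference grows with every nested field, which is why the pattern had to be applied across every resolver.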
This is fundamentally about building psychological safety, and it starts with modeling the behavior yourself.
I share my own mistakes publicly. When I am wrong about a technical call — an estimate that was off, an architecture that did not hold up, a technology choice that created unexpected pain — I write it up in a short postmortem-style document and share it with the team. Not as self-flagellation, but as “here is what I learned and here is how it changes my approach.” When the senior or staff engineer on the team does this, it gives everyone else permission.
Blameless postmortems are non-negotiable. When an incident happens or a decision turns out to be wrong, the question is never “who messed up?” It is always “what did we learn, and what system change prevents this from happening again?” I have literally interrupted a postmortem meeting where someone started pointing fingers and said: “We are not here to assign blame. We are here to make our systems better. If we needed to fire someone to fix this, our systems would still be broken.”
I celebrate well-reasoned decisions that turned out wrong. A teammate once argued for a particular database indexing strategy with a clear rationale and load test results. In production, the access patterns were different from what the load test simulated, and the index caused more harm than good. In our retro, I explicitly praised the decision process: “The reasoning was sound, the testing was thorough, and the fact that production surprised us is something we can learn from. This is exactly how good engineering works — sometimes you are wrong, and the goal is to detect and correct quickly.”
The failure mode to watch for: people who never admit mistakes are either not taking risks or are hiding problems. Both are dangerous. If nobody on my team has been wrong about anything in the last quarter, I am worried — it usually means they are only making safe, conservative decisions or they are covering up problems that will surface later.

How do you think about building your “T-shaped” expertise, and how has your T-shape evolved over your career?

The T-shape model is simple but the execution is subtle — most engineers get it backwards by going wide too early.
The idea is straightforward: you want deep expertise in one or two areas (the vertical bar of the T) and broad familiarity across many adjacent areas (the horizontal bar). The deep expertise is what makes you valuable. The broad knowledge is what makes you versatile and capable of connecting dots across domains.
Here is how my T-shape has evolved:
Early career (years 0-3): All vertical, almost no horizontal. I started as a backend engineer and went deep on one stack — Python, PostgreSQL, and the Django ecosystem. I could trace a request through every layer of the framework, understood the ORM’s SQL generation, and could optimize database queries at the index and query-plan level. I deliberately resisted the temptation to learn React, try Go, or experiment with Kubernetes during this phase. Depth first.
Mid-career (years 3-6): Expanding the vertical and starting the horizontal. My deep expertise expanded from “Django web apps” to “building and operating Python services at scale” — which naturally pulled in distributed systems concepts, message queues, caching strategies, and observability. At the same time, I started building the horizontal bar: I learned enough frontend to review React PRs intelligently, enough DevOps to understand our Terraform configs, and enough data engineering to have informed opinions about our ETL pipelines. I was not an expert in any of these — but I could have productive conversations with the specialists.
Senior and beyond (years 6+): A second vertical bar (the “pi-shaped” engineer). As I moved toward staff, I developed a second deep area: system design and architecture. This was not about any single technology but about the meta-skill of designing systems that are reliable, scalable, and maintainable. This second vertical bar is what enabled cross-team work — I could reason deeply about backend systems AND about the architectural patterns that connect them.
How I decide what to add to the horizontal bar: I follow a rule: learn the things that are adjacent to your deep expertise and that create the most leverage for collaboration. As a backend engineer, learning enough about databases to have a real conversation with a DBA was higher leverage than learning mobile development. Learning enough about observability to design my own dashboards was higher leverage than learning machine learning. Each horizontal extension should make your deep expertise more effective.
The common mistake: Going wide too early — knowing a little about 12 technologies but being the expert on none. In my experience, the market rewards the first deep vertical bar more than anything else. A “full-stack developer” who can build a basic app in five frameworks is less valuable than a backend engineer who can design, build, and operate a payment processing system that handles $10M per day without losing a transaction.
The career hack: Your horizontal knowledge does not need to be hands-on expertise. Reading architecture blogs, attending tech talks, and doing occasional code reviews in unfamiliar areas builds horizontal awareness efficiently. Your vertical expertise requires hands-on, sustained practice. Allocate your time accordingly.
The industry appears to reward breadth because breadth is visible — you can list 15 technologies on your resume and look impressive. But when you get to the interview room, depth is what separates “I’ve used Kafka” from “I can design a Kafka-based event pipeline that handles back-pressure, exactly-once semantics, and multi-region replication.” The second person gets the offer.
My decision framework is simple:
Deepen when you are building your competitive advantage. If you are not yet the go-to person for anything on your team, you need more depth. Being “the person you call when the payment system is broken” or “the person who designs our API contracts” is a career asset that breadth cannot replace.
Broaden when you hit collaboration bottlenecks. If you find yourself blocked in conversations with other teams because you do not understand their domain, that is a signal to broaden. When I was regularly working with the data team and could not evaluate their pipeline proposals because I did not understand Spark or Airflow, that was my cue to spend two weeks getting conversational in data engineering.
Broaden when you are pursuing the next level. The jump from senior to staff almost always requires broadening, because staff-level problems span domains. But this broadening is purposeful — it is driven by the specific cross-cutting problems you are trying to solve, not by a vague desire to “learn new things.”
The trap I warn people about: do not chase breadth because of fear of obsolescence. “What if my framework becomes irrelevant?” is a common anxiety, but deep skills are more transferable than people think. Someone who deeply understands relational databases can learn any new database quickly. Someone who deeply understands distributed systems can evaluate any new messaging framework. The fundamentals transfer. The syntax does not matter.

You have been asked to build a team’s technical strategy document for the next 12-18 months. Walk me through your process.

This is a staff-to-principal-level question that tests strategic thinking, organizational awareness, and writing ability simultaneously.
My process has five phases:
Phase 1: Gather inputs (1-2 weeks). I start by understanding the constraints before forming opinions:
  • Business context. Where is the company headed? What are the revenue goals, the product roadmap priorities, and the competitive pressures? I schedule 30-minute conversations with the product lead, the engineering director, and ideally a business stakeholder. The technical strategy must serve the business strategy, not exist in isolation.
  • Current state assessment. What is the honest state of our systems? I review incident data, deployment metrics, developer satisfaction surveys (if they exist), and the top pain points from the last three retrospectives. I also assess the team’s skills inventory — what can we do well and where are we weak?
  • Team input. I run a lightweight exercise with the engineers: “What are the three things that slow you down the most, and what are the three things you think we should invest in?” This surfaces ground-level reality that leadership often misses.
Phase 2: Identify themes (3-5 days). From the inputs, I identify 3-5 strategic themes — not specific projects, but problem areas that need investment. Examples: “Our deployment pipeline is a bottleneck that limits iteration speed,” “Our data model cannot support the multi-tenant roadmap,” “We lack observability to diagnose customer-facing issues quickly.”
For each theme, I articulate: the current state (with data), the desired state, the cost of inaction, and the rough investment required.
Phase 3: Prioritize ruthlessly (3-5 days). This is the hardest and most important step. Every team has more problems than capacity. I use a 2x2 framework: impact (how much does solving this improve our ability to deliver on the business roadmap) versus urgency (what happens if we defer this 6 months). I also consider dependencies — some investments unlock others.
I force-rank the themes into three tiers: must-do (cannot achieve business goals without this), should-do (significantly improves velocity or reliability), and nice-to-have (valuable but can be deferred). I am honest about what we are NOT doing and why — a strategy that tries to do everything is not a strategy.
Phase 4: Write the document (1 week). The structure I use:
  1. Executive summary (half a page) — What we are investing in, what we are not, and what business outcomes this enables
  2. Current state (1-2 pages) — Honest assessment of where we are, with data
  3. Strategic priorities (2-3 pages) — Each theme with problem statement, proposed direction, investment estimate, and success metrics
  4. What we are explicitly NOT doing (half a page) — And why. This section builds credibility
  5. Risks and dependencies (half a page)
  6. Timeline — Rough quarterly milestones, not a detailed project plan
The document should be readable by someone outside the team in 15 minutes. If it takes longer, it is too detailed.
Phase 5: Socialize and iterate (1-2 weeks). I share the draft with three audiences: the team (for ground-level feedback), peer tech leads (for cross-team alignment), and leadership (for strategic alignment). I expect significant feedback and plan two revision cycles. The goal is not to have MY strategy adopted — it is to build a shared understanding that everyone can execute against.
What makes a strategy document great versus mediocre: Mediocre strategy docs are wish lists. Great ones make trade-offs explicit and explain what you are sacrificing to focus on your priorities. The phrase “we will NOT invest in X this year because Y is higher leverage” is the hallmark of genuine strategic thinking.
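The Phase 3 force-ranking can be sketched in a few lines. This is only an illustration under stated assumptions: the 1-5 scoring scale, the `high` cutoff, and the example scores are invented for the sketch, and the source does not say how impact and urgency map to the three tiers.

```python
from dataclasses import dataclass


@dataclass
class Theme:
    name: str
    impact: int   # assumed scale: 1 (low) to 5 (high)
    urgency: int  # assumed scale: 1 (can wait) to 5 (cannot defer 6 months)


def tier(theme, high=4):
    """Map the 2x2 (impact vs. urgency) onto three tiers; the cutoff is an assumption."""
    if theme.impact >= high and theme.urgency >= high:
        return "must-do"
    if theme.impact >= high or theme.urgency >= high:
        return "should-do"
    return "nice-to-have"


def force_rank(themes):
    """Order themes by tier first, then by combined score, then by name for stability."""
    order = {"must-do": 0, "should-do": 1, "nice-to-have": 2}
    return sorted(
        themes,
        key=lambda t: (order[tier(t)], -(t.impact + t.urgency), t.name),
    )


# Illustrative scores only; real ones would come from the Phase 1 inputs.
themes = [
    Theme("observability gaps", impact=3, urgency=2),
    Theme("multi-tenant data model", impact=5, urgency=3),
    Theme("deployment pipeline bottleneck", impact=5, urgency=4),
]
ranked = [(t.name, tier(t)) for t in force_rank(themes)]
```

The scores matter less than the forced ordering: every theme lands in exactly one tier, which makes the "what we are NOT doing" section easy to write.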
Most strategy documents die within a month of being written because nobody revisits them. I treat the document as a living artifact with a specific review cadence.
Quarterly reviews. Every quarter, I schedule a 90-minute session with the team to review the strategy against reality. What has changed? Did our assumptions hold? Are the priorities still right? I bring data: “We said we would reduce deployment time to under 10 minutes — we are at 12 minutes. We said our data model migration would be complete by Q2 — we are on track but the testing effort is larger than estimated.”
Explicit change triggers. I define upfront what would cause a strategy change. For example: “If the company pivots to a new market segment,” “If we acquire a company with overlapping infrastructure,” or “If our team size changes by more than 25%.” These triggers guard against both unnecessary changes (reacting to every shiny new thing) and neglected necessary ones (refusing to adapt when conditions change).
The strategy should be boring by quarter two. If you are constantly surprised by what is in the strategy document, it was not socialized well enough. By the second quarter, the strategy should feel obvious — everyone should know the priorities, reference them in design decisions, and use them to evaluate new proposals. “Does this align with our strategy?” should be a common question in design reviews.
When to tear it up and start over: If more than two of your core assumptions have been invalidated — a major pivot, a significant reduction in team size, a competitor move that changes the landscape — do not try to patch the existing strategy. Acknowledge that conditions have changed, run a compressed version of the original process (2-3 weeks instead of 4-6), and produce a revised strategy. The willingness to abandon a strategy you authored when reality changes is a sign of maturity, not failure.

How do you manage your energy and prevent burnout while maintaining high performance — and what advice do you give to engineers you mentor about this?

Most career advice treats engineers as productivity machines. This question is about sustainability, and the honest answer requires vulnerability.
I will be direct: I have burned out once in my career, and it taught me more about sustainable high performance than any book. I was at a high-growth company, working 60+ hour weeks for about 8 months straight — not because anyone forced me to, but because I was deeply invested in the work and could not see a natural stopping point. The result was not dramatic collapse — it was a slow erosion of judgment, creativity, and empathy. I started making worse technical decisions, my code reviews became terse and unhelpful, and I dreaded opening my laptop in the morning. It took me three months to recover fully after I finally took a real break.
What I do now — the system, not the aspiration:
  1. I protect recovery time like I protect production uptime. I do not work weekends except for genuine incidents. I take every day of PTO I am allocated. I do not check Slack after 7 PM. These are not casual preferences — they are commitments I treat with the same seriousness as a deployment checklist. When a manager pushed back on my boundaries during a crunch period, I said: “I will deliver exceptional work during working hours. If the project cannot be done within those constraints, we have a scoping problem, not an effort problem.”
  2. I manage energy, not just time. Some tasks give me energy (system design, mentoring, debugging hard problems), and some drain me (long meetings, organizational politics, repetitive operational work). I deliberately structure my week so that draining activities are followed by energizing ones. I never stack five meetings in a row if I can help it.
  3. I have an explicit “shutdown ritual.” At the end of each workday, I write down where I left off and what I will start with tomorrow. This clears the mental cache and prevents the “lying in bed thinking about work” problem. It is a small habit that has an outsized effect on my ability to actually rest.
  4. I watch for early warning signals. The first sign of burnout for me is not exhaustion — it is irritability. When I start feeling resentful about code reviews, annoyed by reasonable questions, or cynical about projects, I know I am approaching my limit. I have learned to treat these emotional signals as seriously as I would treat a monitoring alert.
What I tell mentees:
The biggest thing I emphasize is that burnout is not a badge of honor, and rest is not laziness — it is maintenance. Just like you would not run a server at 100% CPU indefinitely, you should not run yourself at 100% capacity indefinitely. Sustained output of 80% is higher over a year than 110% for three months followed by 40% for three months, because that cycle averages out to only 75%.
I also tell them to be honest about what they need. If they need a mental health day, they should take it. If a project is unsustainable, they should say so early rather than waiting until they are breaking down to raise the alarm. The engineers who have the longest, most impactful careers are not the ones who work the hardest — they are the ones who work sustainably and are still doing excellent work a decade later.
The organizational responsibility: Individual coping strategies are necessary but not sufficient. If burnout is widespread on a team, that is a management failure — poor prioritization, inadequate staffing, unrealistic deadlines, or a culture that rewards heroics over sustainable delivery. As a senior engineer, I consider it part of my job to flag these patterns to leadership, not just cope with them individually.
This is a real tension, especially in companies that conflate availability with commitment.
My approach is to be highly responsive during working hours and explicitly unavailable outside them. The key word is “explicitly” — I communicate my availability clearly so that people know what to expect. My Slack status shows my working hours. My calendar has focus blocks. When I sign off, I post a short message: “Signing off for the day. If something urgent comes up, here’s the on-call runbook / escalation path.”
The important realization is that most “urgent” things are not actually urgent. In my experience, fewer than 5% of after-hours messages truly require immediate response. The rest can wait until morning. By having good documentation, runbooks, and on-call rotation, I ensure that genuine emergencies are handled without me being personally available 24/7.
When I see a team culture where everyone is expected to respond to Slack at 10 PM, I address it as a systemic issue, not a personal boundary negotiation. “If our system requires someone to be available at all times, that is an on-call rotation problem that should be formalized with proper compensation — not a distributed expectation that everyone is always on.”
The hardest scenario is when leadership is the source of the always-on expectation. In that case, I model the behavior I believe in and let the results speak. If my output is excellent during working hours and I am unavailable outside them, and the team still functions, that is data that the always-on culture is unnecessary. If leadership explicitly requires constant availability, that becomes one of the signals in the “when to leave” framework.

Advanced Interview Scenarios

These are the questions that separate engineers who have been through the fire from those who have only read about it. Each one targets a specific real-world crucible moment where the “textbook answer” is wrong, incomplete, or dangerously naive. Interviewers use these to detect whether you have operated in genuine ambiguity, survived production chaos, and made hard organizational calls with real consequences.

A critical production incident is happening right now. Walk me through exactly how you run the first 30 minutes.

“I would look at the logs, find the bug, fix it, and deploy.” This answer reveals someone who has never run a real incident. It skips triage, communication, blast radius assessment, and the organizational coordination that separates a contained incident from a multi-hour catastrophe. Some also say “I would roll back immediately” without considering that rollbacks can make things worse if the issue involves data corruption or schema migrations.
The first 30 minutes of an incident are not about fixing the bug. They are about stabilization, communication, and triage — in that order. I have run incidents at a company processing roughly 2M transactions per day, and here is exactly what happens:
Minutes 0-5 — Acknowledge and establish command. Someone declares themselves incident commander. In our team, we used PagerDuty’s incident response integration with a dedicated Slack channel auto-created per incident. My first message is always: “I am IC for this incident. Current impact: [what we know]. I need [roles: someone on database, someone on application logs, someone drafting customer comms].” Establishing clear command prevents five engineers from independently restarting the same service.
Minutes 5-15 — Assess blast radius, NOT root cause. This is where most engineers go wrong. They immediately start grepping logs for the root cause. Instead, I ask: “Who is affected? How many users? Is it getting worse or stable? Is data being corrupted or just delayed?” I pull up our Datadog dashboards — error rate by endpoint, p99 latency, queue depth, database connection pool utilization. At one incident, our error rate was spiking but only on one of four API gateway pods. That immediately told me it was not a code issue but likely a host-level or network issue, and we could drain that pod rather than roll back the entire fleet.
Minutes 15-25 — Mitigate, then diagnose. Mitigation is not the same as a fix. If one pod is bad, drain it. If a new deploy correlates with the incident, roll back to the last known good — but ONLY if you have confirmed the rollback is safe (no irreversible migration ran). If a downstream dependency is failing, flip the circuit breaker. I once worked an incident where everyone wanted to “find the bug” while customers were actively losing money. I cut through it: “We are going to disable the feature flag for the new checkout flow. That is our mitigation. We will diagnose when customers are no longer impacted.” We had the feature flag off in 90 seconds and bought ourselves two hours to diagnose calmly.
Minutes 25-30 — Communicate outward. Engineering comms go to the status page. I draft a customer-facing message: “We are aware of [issue]. [X]% of users are affected. We have mitigated the immediate impact and are investigating root cause. Next update in 30 minutes.” That “next update in 30 minutes” is critical — it creates a cadence that prevents stakeholders from pinging you every 2 minutes asking for updates.
The tools that matter: PagerDuty or Opsgenie for alerting and escalation. A dedicated Slack channel per incident (we used Jeli for incident management). Datadog or Grafana for real-time dashboards. Feature flags (LaunchDarkly or equivalent) as a kill switch. Runbooks in Notion or Confluence that engineers can follow at 3 AM without thinking. The runbook is the most underrated incident tool — I have seen 20-minute incidents become 2-hour incidents because nobody had written down “how to drain a pod” or “how to flip the circuit breaker for service X.”
War Story: The worst incident I ever ran was a cascading failure triggered by a Kafka consumer lag spike during a Black Friday traffic surge. The consumer fell behind, backpressure caused the producer queue to fill, which started rejecting writes from the API layer, which returned 500s to users. The obvious fix — scale up consumers — actually made it worse initially because the rebalancing protocol caused a 3-minute pause in all consumption. The real fix was to enable consumer throttling on the producer side (a config we had never tested under load), drain the toxic messages that had accumulated, and then scale up consumers one at a time. Total incident time: 47 minutes. We wrote a runbook afterwards that became the template for every Kafka-related incident. Six months later, a junior engineer resolved a similar issue in 8 minutes using that runbook. That is the multiplier effect of good incident documentation.
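The blast-radius reasoning from minutes 5-15 (one bad pod out of four points to a host or network issue, not a code issue) can be sketched as a small triage helper. This is a hypothetical illustration, not a Datadog or Kubernetes API: the pod names, the 5% threshold, and the "at most half the fleet" rule are all assumptions made for the example.

```python
def error_rates(counts):
    """counts maps pod name -> (error_count, request_count)."""
    return {
        pod: (errors / total if total else 0.0)
        for pod, (errors, total) in counts.items()
    }


def unhealthy_pods(counts, threshold=0.05):
    """Pods whose error rate exceeds the (assumed) threshold."""
    rates = error_rates(counts)
    return sorted(pod for pod, rate in rates.items() if rate > threshold)


def suggest_mitigation(counts, threshold=0.05):
    """Isolated failures suggest draining pods; widespread ones suggest a fleet-level cause."""
    bad = unhealthy_pods(counts, threshold)
    if not bad:
        return "no pod above threshold"
    # Assumed rule of thumb: if at most half the fleet is unhealthy,
    # treat it as a host-level problem and drain only those pods.
    if len(bad) * 2 <= len(counts):
        return "drain: " + ", ".join(bad)
    return "fleet-wide: consider rollback or feature flag"


# Hypothetical snapshot resembling the one-bad-pod incident described above.
snapshot = {
    "gw-1": (2, 1000),
    "gw-2": (3, 1000),
    "gw-3": (410, 1000),
    "gw-4": (1, 1000),
}
decision = suggest_mitigation(snapshot)
```

A helper like this does not replace the incident commander's judgment. It only encodes the first question to ask: is the failure isolated or everywhere?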

Follow-up: How do you handle disagreements about the right mitigation approach during a live incident?

During a live incident, democracy is the enemy. This is not the time for a design review. The incident commander makes the call, and everyone else executes. I have a rule I state explicitly at the start of every incident I run: “I am making mitigation decisions. If you disagree, say so once with your reasoning. If I still go with my call, we execute. We can debrief after the incident.”
I learned this the hard way. During one incident, two senior engineers spent 12 minutes debating whether to roll back or scale up while the error rate climbed. I now cut those debates at 60 seconds: “We have two options. Option A is reversible in 5 minutes, Option B is not. We are going with Option A. If it does not work, we try Option B.”
The postmortem is where disagreements belong. During the incident, speed beats perfection.

Follow-up: How do you write a postmortem that actually prevents recurrence rather than just documenting what happened?

Most postmortems fail because they produce action items like “be more careful” or “add more monitoring.” Those are wishes, not fixes.
My postmortem structure has teeth:
  1. Timeline — Minute-by-minute, no opinions, just facts. “14:03 UTC: deploy of commit abc123 began. 14:07: error rate exceeded 5% threshold. 14:09: PagerDuty alert fired.”
  2. Five Whys — But I stop when I hit a systemic cause, not a human one. “The engineer deployed without checking the dashboard” is NOT a root cause. “Our deployment pipeline does not have automated health checks that block rollout on error rate spikes” IS a root cause.
  3. Action items with owners, deadlines, and verification criteria. Not “improve monitoring” but “Add a Datadog monitor on payment-service error rate > 2% sustained for 3 minutes, alert to #payments-oncall, owner: @jane, deadline: 2024-03-15, verified when: alert fires correctly in staging load test.” Every action item must have a name, a date, and a definition of done. I track these in Linear and review them weekly until complete.
At my last company, we reduced repeat incident categories by 60% over two quarters by enforcing this structure. The single highest-leverage change was requiring that every postmortem action item be verified in staging before marking it complete.
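The "name, date, and definition of done" rule for action items, and the "sustained for 3 minutes" alert condition, can both be expressed as small checks. The sketch below is a hypothetical illustration, not Datadog's monitor API or Linear's data model: `ActionItem`, `missing_fields`, and `sustained_breach` are invented names, and it assumes one error-rate sample per minute.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    description: str
    owner: Optional[str] = None
    deadline: Optional[date] = None
    verified_when: Optional[str] = None

    def missing_fields(self):
        """Names of the required fields that are still empty."""
        missing = []
        if not self.owner:
            missing.append("owner")
        if self.deadline is None:
            missing.append("deadline")
        if not self.verified_when:
            missing.append("verified_when")
        return missing


def sustained_breach(samples, threshold, window):
    """True if `window` consecutive samples exceed `threshold`."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= window:
            return True
    return False


# A wish versus a fix, using the example wording from the list above.
wish = ActionItem("improve monitoring")
fix = ActionItem(
    description="Add monitor on payment-service error rate > 2% for 3 minutes",
    owner="@jane",
    deadline=date(2024, 3, 15),
    verified_when="alert fires correctly in staging load test",
)

# Assumed per-minute error rates; three consecutive samples above 2% trigger.
fires = sustained_breach([0.01, 0.03, 0.025, 0.021, 0.01], threshold=0.02, window=3)
```

Rejecting any action item whose `missing_fields()` is non-empty is one way to enforce the structure mechanically in whatever tracker the team uses.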

You are evaluating whether your team should adopt a hyped new technology (say, a new database, framework, or AI tool). The community is excited. Your CTO is enthusiastic. But your gut says the hype is ahead of the reality. How do you navigate this?

Either “I would adopt it because staying on the cutting edge is important” (hype-driven engineering) or “I would refuse because we should use boring technology” (blanket conservatism without analysis). Both answers reveal someone who does not have a real framework for technology evaluation. The worst variant is “I would do what the CTO wants” — pure abdication of technical judgment.
This is one of the most politically charged decisions in engineering, because saying “no” to a technology your CTO is excited about feels career-limiting. But adopting the wrong technology to please leadership is far more career-damaging when it fails in production six months later and your name is on the RFC.
My evaluation framework has four stages, and I make it transparent to all stakeholders:
Stage 1 — Separate the capability from the hype. I ask: “What specific problem does this solve that our current tools do not?” If the answer is vague (“it’s faster” or “it’s the future”), that is a red flag. If the answer is specific (“it reduces our query latency from 200ms to 15ms for graph traversals that currently require 4 JOINs”), that is worth investigating. I went through this exact exercise when our CTO wanted us to adopt a vector database for search. The hype said “AI-native search.” The reality was that our PostgreSQL full-text search with pg_trgm was handling 99.3% of our search queries under 50ms. The vector database would only help the 0.7% of queries that needed semantic similarity — and adding an entirely new database to our infrastructure for 0.7% of queries was not a trade-off I could justify.
Stage 2 — Evaluate operational readiness, not feature lists. The marketing page tells you what a technology can do. The GitHub issues tell you what it cannot. I check: How old is the project? How many production users at our scale? What does the on-call experience look like? Are there war stories from companies our size? For the vector database evaluation, I found three blog posts from companies using it in production. Two of them described “unexpected challenges” that were clearly euphemisms for data loss during compaction. That told me more than any benchmark.
Stage 3 — Run a time-boxed, production-realistic spike. Not a “hello world” proof of concept — a spike with realistic data volumes, failure injection, and operational scenarios. Can we back it up? Can we restore from backup? What happens when a node fails? What does the upgrade path look like? I give this 1-2 weeks with 1-2 engineers. If the spike raises more questions than it answers, that is a signal.
Stage 4 — Present findings with intellectual honesty. I write a one-page evaluation that includes: what the technology does well, where it falls short, the operational cost of adoption, the migration risk, and my recommendation. When I had to tell our CTO that the vector database was not ready for us, I framed it as: “The technology is promising but operationally immature for our scale. I recommend we revisit in 6 months. In the meantime, here is a lighter-weight approach using pgvector that covers our immediate needs with zero additional operational burden.” He appreciated the analysis because it was thorough, not dismissive. He would not have appreciated “no, it’s hype.”
War Story: The most expensive technology hype-adoption I witnessed was at a previous company that migrated from PostgreSQL to a distributed NewSQL database because “we needed to scale globally.” The migration took 9 months, cost roughly $800K in engineering time, and introduced a new class of consistency bugs that the team spent the next year debugging. The irony: a read-replica setup with PostgreSQL and a CDN would have handled their actual traffic patterns for about $2K/month in infrastructure. They did not need a globally distributed database. They needed someone brave enough to say “our current database is fine.” I use this story every time someone proposes a technology migration without first quantifying the actual problem they are solving.

Follow-up: How do you say “no” to a technology your CTO or VP of Engineering is personally championing without damaging the relationship?

The key is to never frame it as “you are wrong.” Frame it as “here is what I found when I investigated, and here is what I recommend.”
I always start by acknowledging their insight: “You are right that our search experience needs improvement, and the vector database approach is the right direction long-term.” Then I present my findings as due diligence, not opposition: “I ran a two-week evaluation, and here is what I found…” Data is your shield. If you show up with benchmarks, operational concerns, and a concrete alternative, you are being a responsible engineer. If you show up with opinions and no data, you are being difficult.
I also offer a compromise path when possible: “I recommend we start with pgvector as a low-risk first step. If we outgrow it within 6 months, we will have learned enough about our access patterns to make a much better decision about a dedicated vector store.” This gives the CTO a path to their vision while protecting the team from premature complexity.
The worst thing you can do is silently comply and let the project fail. That damages trust far more than respectful disagreement backed by evidence.

Tell me about a time you had to give difficult technical feedback to a peer — someone at your level or above — who was emotionally invested in their approach.

“I just told them they were wrong and showed them the data.” This reveals someone who has never navigated the interpersonal complexity of peer feedback. Others say “I let it go to avoid conflict” — equally bad, because it means flawed designs ship without challenge. The weakest answer is “I escalated to our manager,” which is appropriate as a last resort but not as a first move.
This is one of the hardest interpersonal situations in engineering, because technical disagreements between peers can easily become status competitions. The goal is to change the outcome without damaging the relationship.
Here is a specific example. A peer — a senior engineer I respected — had spent three weeks designing a custom event-sourcing system for our order processing pipeline. They had written a 12-page design doc, built a proof of concept, and were clearly proud of the work. The problem: our order volume was 50K per day, and event sourcing added enormous complexity for a scale we would not hit for years, if ever. A simpler approach — state-based persistence with an audit log — would have delivered the same business requirements in a third of the time with a fraction of the operational burden.
How I handled it:
Step 1 — Private conversation first, always. I never give critical feedback on someone’s design in a public review if I can avoid it. I sent a Slack message: “Hey, I have been reading your event-sourcing RFC and I have some thoughts. Can we grab 30 minutes?” Starting in private respects their investment and prevents them from feeling ambushed in front of the team.
Step 2 — Lead with genuine appreciation for what they got right. This is not empty flattery — it is calibration. “The way you modeled the domain events is really clean, and the replay mechanism for debugging is genuinely useful. I can tell you have thought deeply about this.” This establishes that my feedback is nuanced, not a blanket rejection.
Step 3 — Ask questions instead of making statements. Instead of “event sourcing is overkill,” I asked: “Help me understand the scaling scenario where we would need event replay. What is our projected order volume in 18 months, and at what volume does the simpler approach break down?” This forced both of us to look at the data together rather than debating architecture in the abstract. When they pulled the numbers, they realized the break-even point was at 500K orders per day — 10x our current volume and beyond any realistic 18-month projection.
Step 4 — Offer a bridge, not just a critique. “What if we implement the audit log pattern now, which gives us most of the debugging benefits you are going for, and design the data model so that migrating to full event sourcing later is straightforward? That way we ship in 3 weeks instead of 8, and we preserve the optionality.” Giving them a path that preserves the core of their idea makes it easier to adapt.
The result: They initially pushed back — “but we will need this eventually.” I acknowledged that was possible and suggested we document the event-sourcing design as a future RFC with a specific trigger: “When order volume exceeds 200K/day, revisit this design.” They agreed. We shipped the simpler version, and as of when I left that company 18 months later, order volume was at 65K/day. The event-sourcing RFC never needed to be revisited.
War Story: The one time I handled peer feedback poorly was early in my career. A colleague had designed a caching strategy that I thought was wrong, and I said so in a design review in front of eight people: “This will not work at scale because you have not considered cache invalidation for the user-session edge case.” I was technically right. But the way I delivered it — publicly, bluntly, without acknowledging the good parts of the design — damaged our working relationship for months. They stopped coming to me for design input, and I lost an ally on the team. I learned that being right about the technology and wrong about the delivery is worse than being wrong about both, because the relationship damage compounds long after the technical decision is forgotten.

Follow-up: What do you do when someone receives your feedback, appears to agree, but then does not change their approach?

This is more common than outright disagreement, and it is trickier to handle because the conflict is invisible.
First, I check whether my feedback was actually as clear as I thought. Sometimes “I will think about it” means they genuinely need time to process, not that they are ignoring me. I follow up after a few days: “Hey, I wanted to check in on the design discussion we had. Have you had time to consider the alternative approach? I am happy to pair on it if that would be helpful.”
If they have heard me, understood me, and still disagree — that is their prerogative. Peer feedback is input, not a mandate. Unless I believe the decision will cause serious harm (data loss, security vulnerability, massive cost), I document my concern in the design review comments for the record and let them own the outcome.
If I believe the decision IS seriously harmful, I escalate — but transparently. “I want to bring this to [tech lead/architect] for a second opinion. I am not trying to go around you — I genuinely think this needs another perspective because the stakes are high.” Transparent escalation preserves the relationship. Secret escalation destroys it.

You are a senior engineer on a platform team. Product teams keep complaining that your internal tools are slow, hard to use, and undocumented. They are partially right. How do you respond?

“Product teams do not understand the complexity of what we are building” (defensive). Or “We will add it to the backlog” (dismissive). The worst: “They should read the source code” (genuinely hostile to your users). These answers reveal an engineer who sees internal customers as interruptions rather than the reason the platform team exists.
This scenario is close to my heart because I spent two years on a platform team where this exact dynamic nearly destroyed our credibility with the engineering org. Here is what I learned about fixing it:
First, accept the feedback as valid signal, not personal attack. The platform team’s customers are the product engineers. If they say the tools are slow and undocumented, that is a usability bug — treat it with the same severity you would treat a customer-facing performance issue. I have seen platform teams dismiss internal complaints because “it works, they just need to learn it.” That is the same energy as a product designer saying “the users are holding it wrong.”
Second, quantify the pain. I spent one week embedded with two product teams, watching how they actually used our deployment tooling. What I saw was brutal: engineers were spending 20-30 minutes per deploy waiting for our CLI tool to run, and when it failed (which happened roughly 15% of the time), the error message was a raw stack trace with no guidance on how to fix it. One engineer had written a personal wiki page called “How to Actually Deploy” with workarounds for our tool’s quirks. That wiki page was more used than our official documentation. That was embarrassing and clarifying in equal measure.
Third, treat the feedback loop as a product discipline. I implemented three changes:
  1. A quarterly developer satisfaction survey — 5 questions, takes 2 minutes. “On a scale of 1-5, how would you rate the deployment experience?” We tracked the average rating across quarters. Starting score: 2.1 out of 5. After two quarters of focused improvement: 3.8.
  2. Office hours — We held 30 minutes of open office hours every Thursday. Any product engineer could show up with questions, complaints, or feature requests. The first session, two people came. By month three, we had 8-10 regulars. More importantly, we caught usability issues weeks earlier than we would have through formal bug reports.
  3. “Eat your own cooking” policy — Every platform engineer had to deploy a production service using our own tools once per sprint. Not their platform service — a product service, using the same workflow product engineers use. The bugs we found by being our own users were embarrassing. One of our engineers discovered that our retry logic silently swallowed errors if the auth token expired mid-deploy — an issue that had been reported three times by product teams and marked “cannot reproduce” because it only happened with long-running deploys.
War Story: The single most impactful change we made was not a technical improvement — it was adding human-readable error messages. I led a hackathon-style two-day sprint where we catalogued every error our CLI could produce (there were 47 distinct ones), wrote a plain-English explanation for each, and added a suggested fix. Error messages went from “ECONNREFUSED: connection refused at 10.0.3.42:8443” to “Cannot connect to the deployment controller at 10.0.3.42:8443. This usually means the controller pod is not running. Try: kubectl get pods -n deploy-system | grep controller. If the pod is in CrashLoopBackOff, escalate to #platform-oncall.” Deploy-related support requests in our Slack channel dropped by 70% in the following month. That two-day investment saved roughly 5 engineer-hours per week across the org — over 250 hours per year.
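The catalogue described in this war story can be sketched as a lookup table that maps each low-level error code to a plain-English explanation and a suggested fix. This is a minimal illustration, not the team's actual CLI code: `ErrorHelp`, `ERROR_CATALOG`, and `humanize` are hypothetical names, and only the one error quoted above is filled in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorHelp:
    explanation: str    # plain-English description, may contain {placeholders}
    suggested_fix: str  # what the engineer should try next


# Hypothetical catalogue keyed by low-level error code.
# Only the example quoted in the text is included here.
ERROR_CATALOG = {
    "ECONNREFUSED": ErrorHelp(
        explanation=(
            "Cannot connect to the deployment controller at {address}. "
            "This usually means the controller pod is not running."
        ),
        suggested_fix=(
            "Try: kubectl get pods -n deploy-system | grep controller. "
            "If the pod is in CrashLoopBackOff, escalate to #platform-oncall."
        ),
    ),
}


def humanize(code: str, raw_message: str, **context: str) -> str:
    """Return the catalogued message, or the raw error if the code is not catalogued."""
    entry = ERROR_CATALOG.get(code)
    if entry is None:
        return f"{code}: {raw_message}"
    return f"{entry.explanation.format(**context)} {entry.suggested_fix}"


print(humanize("ECONNREFUSED", "connection refused", address="10.0.3.42:8443"))
```

Falling back to the raw error for uncatalogued codes keeps the tool honest: a missing entry shows up as an ugly message that someone can then add to the catalogue.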

Follow-up: How do you prioritize platform improvements against product team feature requests when both are urgent?

This is the central tension of platform engineering, and anyone who tells you there is a formula is lying.
My heuristic: platform work should be prioritized by the number of teams it unblocks multiplied by the frequency of the pain. A deployment reliability improvement that affects 8 teams daily beats a custom feature request from one team, even if that one team is loud and politically powerful.
I make this visible by maintaining a “platform pain index” — a ranked list of issues weighted by (number of affected teams) x (frequency per week) x (time wasted per occurrence). I share this list with engineering leadership monthly. When a product team VP asks “why have you not built the custom deployment pipeline my team requested,” I can point to the index: “Your request affects 1 team. The items above it affect 6-8 teams. Here is where it sits in the queue and here is when we expect to reach it.”
The political key: never say “your request is not important.” Say “your request is important and here is when we will get to it, and here is why these other items come first.” Transparency about prioritization earns far more trust than opaque backlogs.
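The "platform pain index" weighting can be sketched in a few lines. The snippet below is an illustration under stated assumptions: the `PainItem` type is hypothetical, time is measured in minutes, frequency is read as occurrences per week per team, and the two backlog entries and their numbers are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PainItem:
    name: str
    affected_teams: int
    occurrences_per_week: float        # assumed to be per affected team
    minutes_wasted_per_occurrence: float

    @property
    def score(self) -> float:
        # (affected teams) x (frequency per week) x (time wasted per occurrence)
        return (
            self.affected_teams
            * self.occurrences_per_week
            * self.minutes_wasted_per_occurrence
        )


def rank(items):
    """Return items sorted from most to least painful."""
    return sorted(items, key=lambda item: item.score, reverse=True)


# Illustrative numbers only.
backlog = [
    PainItem("custom deployment pipeline for one team", 1, 5, 30),
    PainItem("deployment reliability", 8, 5, 20),
]

for item in rank(backlog):
    print(f"{item.score:>6.0f} min/week  {item.name}")
```

Expressing the score in minutes per week gives leadership a unit they can compare directly against the cost of the fix.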

Your team’s technical recommendation gets overruled by a business decision. The VP decides to ship the version you argued was architecturally unsound because of a partnership deadline. What do you do?

“I would refuse to implement something I know is wrong” (insubordination disguised as principle). Or “I would just do what they say” (doormat who has abdicated technical responsibility). Both are wrong. The strong answer lives in the messy middle.
This has happened to me, and it will happen to every engineer who works on anything commercially important. The answer is not about compliance or rebellion — it is about responsible execution of a decision you disagree with.
Here is the real scenario. We were building a data integration for a partnership that represented $4M in annual revenue. My team had designed a proper ETL pipeline with schema validation, error handling, and idempotent processing. The timeline for the full solution was 12 weeks. The VP told us we had 5 weeks because the partnership contract had a go-live date that was already negotiated.
I pushed back with specifics: “In 5 weeks, we can build the pipeline but without schema validation or idempotent processing. That means if the partner sends malformed data, it will silently corrupt our database. And if we need to replay data, we will get duplicates.” The VP heard me and said: “I understand the risk. The cost of missing this deadline is losing the partnership. Ship it in 5 weeks.”
What I did:
  1. I documented the decision and the risks in writing. Not as a CYA move — as a responsible engineering practice. I sent an email: “Per our conversation, we are proceeding with the 5-week timeline. The following risks are accepted: [list]. We will address these in a follow-up phase targeted for [date].” This protects the team and creates accountability on both sides.
  2. I identified the non-negotiable safety rails within the shortened timeline. We could not do full schema validation, but we could add basic type checking that would catch the most dangerous malformed data (wrong data types, null values in required fields). We could not build idempotent processing, but we could add a unique constraint on the database that would at least prevent exact-duplicate inserts and alert us. These took 2 days of the 5-week budget and prevented the worst failure modes.
  3. I built a concrete plan for the follow-up. I wrote tickets for the schema validation, idempotent processing, and monitoring work — with effort estimates and a proposed timeline. I scheduled a meeting for week 6 to review the follow-up plan with the VP. This turned the “tech debt” from a vague future concern into a committed roadmap item.
  4. I executed with full commitment. Once the decision was made, I did not passive-aggressively build a slow, resentful implementation. I built the best possible version within the constraints. The team shipped on time, the partnership launched, and the partner sent their first data file on day one.
The follow-up reality: The schema validation work was completed in week 8. The idempotent processing took until week 14. During weeks 5-8, we did encounter two data quality issues from the partner that schema validation would have caught — but our basic type checking caught the worst one, and the other was manually corrected in 30 minutes. The total cost of the “shortcut” was about 2 hours of manual cleanup and a slightly elevated anxiety level. Worth it for $4M in ARR? Absolutely.
War Story: The time this pattern went badly was at a different company where the VP overruled a security concern — we flagged that a rush integration was not going through our standard security review, and were told “we will do the review after launch.” I documented the risk but did not escalate to the CISO. Three months later, a penetration tester found an SSRF vulnerability in that integration that could have exposed customer PII. The incident response cost far more than the delay would have. I learned that there are business decisions that can overrule engineering preferences (timeline, scope, polish) and decisions that CANNOT be overruled by business pressure (security, data integrity, legal compliance). Knowing where that line is — and being willing to escalate when it is crossed — is a senior engineer’s duty.
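The two "safety rails" from this answer, basic type checking and a unique constraint against exact-duplicate inserts, might look roughly like the sketch below. It is not the pipeline from the story: the field names, the `partner_records` table, and the use of SQLite are assumptions chosen to keep the example self-contained.

```python
import sqlite3

# Hypothetical required fields and their expected Python types.
REQUIRED_FIELDS = {"record_id": str, "amount_cents": int}


def basic_type_errors(record: dict) -> list:
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing or null")
        elif isinstance(value, bool) or not isinstance(value, expected):
            # bool is a subclass of int in Python, so reject it explicitly.
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


def create_table(conn: sqlite3.Connection) -> None:
    # The UNIQUE constraint covers every column, so only exact duplicates are blocked.
    conn.execute(
        "CREATE TABLE partner_records ("
        "record_id TEXT NOT NULL, amount_cents INTEGER NOT NULL, "
        "UNIQUE (record_id, amount_cents))"
    )


def load(conn: sqlite3.Connection, records):
    """Insert valid records; count duplicates and collect rejected records."""
    inserted, duplicates, rejected = 0, 0, []
    for record in records:
        errors = basic_type_errors(record)
        if errors:
            rejected.append((record, errors))
            continue
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO partner_records (record_id, amount_cents) VALUES (?, ?)",
                    (record["record_id"], record["amount_cents"]),
                )
            inserted += 1
        except sqlite3.IntegrityError:
            duplicates += 1  # replayed row blocked by the unique constraint
    return inserted, duplicates, rejected
```

The `duplicates` count and the `rejected` list are the natural places to hang the alerting the answer mentions; neither replaces real schema validation or idempotent processing.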

Follow-up: How do you prevent the follow-up tech debt work from being perpetually deprioritized after the deadline passes?

This is where most “we will fix it later” plans die. My tactic is to make the debt self-reporting.
I set up monitoring that will start alerting when the shortcuts become painful. For the data integration example, I added a metric tracking “data quality exceptions per day.” When that metric crossed a threshold, it automatically created a PagerDuty alert and a Slack notification to the VP’s channel. This turned the tech debt from “engineering wants to clean up” to “the system is telling us it needs attention.”
I also tie the follow-up work to the next business milestone. “We have a second partner onboarding in Q3. The current pipeline cannot handle two partners without the schema validation work. We need to complete it by [date] or the Q3 onboarding is at risk.” Business stakeholders respond to business deadlines, not engineering ideals.
The nuclear option: if the tech debt is genuinely deprioritized past the point of safety, I write a risk assessment and send it to my skip-level. Not as a threat — as a “you should be aware that this exists and here is the probability and impact of failure.” Escalation to the right altitude is not politics. It is risk management.
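A "self-reporting" debt metric can be as small as a per-day counter with a threshold. The sketch below is an assumption-laden illustration: `DebtMonitor` is a hypothetical class, and the `notify` callable stands in for whatever alerting integration is in use, since the actual PagerDuty and Slack wiring is not described in the text.

```python
from collections import Counter
from datetime import date
from typing import Callable


class DebtMonitor:
    """Counts data quality exceptions per day and notifies once per day when over threshold."""

    def __init__(self, daily_threshold: int, notify: Callable[[str], None]):
        self.daily_threshold = daily_threshold
        self.notify = notify          # placeholder for a real alerting hook
        self._counts = Counter()
        self._alerted_days = set()

    def record_exception(self, day: date) -> None:
        self._counts[day] += 1
        crossed = self._counts[day] > self.daily_threshold
        if crossed and day not in self._alerted_days:
            self._alerted_days.add(day)
            self.notify(
                f"data quality exceptions on {day.isoformat()}: "
                f"{self._counts[day]} (threshold {self.daily_threshold})"
            )


# Usage: collect alerts in a list instead of paging anyone.
alerts = []
monitor = DebtMonitor(daily_threshold=2, notify=alerts.append)
for _ in range(4):
    monitor.record_exception(date(2024, 1, 15))
print(alerts)
```

Alerting once per day rather than on every exception keeps the signal credible; a channel that pages on each event gets muted quickly.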

You discover that a system you depend on, owned by another team, has a subtle bug that affects your service’s correctness. That team says it is not a priority for them. What do you do?

“I would file a bug and wait for them to fix it.” This is the answer of someone who has never been blocked by cross-team dependency issues in a real organization. At the other extreme: “I would just fix it myself in their codebase” — well-intentioned but potentially disastrous if you do not understand their system’s invariants and deployment pipeline.
Cross-team dependency bugs are one of the most common and most frustrating patterns in any organization with more than 3 engineering teams. I have a playbook for this, and it is deliberate about escalation sequencing:
Level 1 — Fix it locally if you can (hours, not days). Before involving the other team at all, I ask: can I work around this bug on my side? If their API returns slightly wrong data in an edge case, can my service detect and correct it? A defensive check on my side might take 30 minutes to implement and fully unblock me. This is not ideal — it is a band-aid — but it is pragmatic. I document the workaround with a comment: // WORKAROUND: payments-service returns null tax_rate for legacy accounts (JIRA-4521). Remove when upstream fix lands.
Level 2 — Offer to do the work (days). If the workaround is insufficient, I go to the owning team with a specific, actionable proposal: “I have identified the bug, I have written a failing test that reproduces it, and I have a draft PR with a proposed fix. Can you review it?” This dramatically lowers the barrier. Their “not a priority” usually means “we do not have capacity to investigate, diagnose, AND fix this.” By doing 80% of the work, I change the ask from “please investigate this” to “please review this PR.”
At one company, I fixed 6 cross-team bugs this way over a year. It earned me a reputation as someone who unblocks themselves — which turned out to be valuable social capital when I later needed those teams to prioritize something I could not fix myself.
Level 3 — Escalate with data (one week). If the bug genuinely needs the owning team’s attention and they will not engage, I escalate — but with evidence, not emotion. I write a short document: “Bug X in service Y causes Z impact to our customers. Frequency: N times per day. Customer impact: [specific]. Business cost: [estimated]. Our workaround is [description], but it is insufficient because [reason]. We need the owning team to prioritize this fix.” I send this to both team leads and, if necessary, their shared manager.
The data matters because “another team will not fix a bug” sounds like whining. “A bug in service Y is causing $15K/month in incorrect billing and affecting 200 customers” is a business problem that gets attention.
Level 4 — Architecture-level fix (long term). If cross-team dependency bugs are a recurring pattern, the root cause is often architectural — tight coupling, shared mutable state, or missing contracts. This is where staff-level thinking kicks in. After the third cross-team bug in 6 months involving the same two services, I proposed a contract-testing framework (using Pact) that would catch breaking changes at CI time. That took 3 weeks to implement but eliminated the class of problem entirely.
War Story: The most politically complex version of this I experienced was when the other team’s “bug” was actually a documented behavior that I disagreed with. Their service intentionally rounded currency values to 2 decimal places before returning them, which caused off-by-one-cent errors in our aggregation. They argued it was correct (per their spec). I argued it caused financial reporting discrepancies. The resolution required escalating to a VP who decided the financial accuracy requirement trumped the other team’s spec, and they changed the rounding behavior. The whole process took 6 weeks. I learned that sometimes a “bug fix” is actually a “cross-team contract negotiation,” and framing it correctly is half the battle.
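The off-by-one-cent effect in this war story is easy to reproduce: rounding each value before summing can give a different total than summing first and rounding once. The example below is a sketch with made-up amounts, and it assumes half-up rounding; the other team's actual rounding mode is not stated in the text.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def round_to_cents(value: Decimal) -> Decimal:
    """Round to 2 decimal places, half-up (an assumed rounding mode)."""
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


def total_round_each(values):
    """Upstream rounds every value; downstream sums the rounded values."""
    return sum((round_to_cents(v) for v in values), Decimal("0"))


def total_round_once(values):
    """Downstream sums full-precision values and rounds the total once."""
    return round_to_cents(sum(values, Decimal("0")))


# Hypothetical line items with sub-cent precision.
line_items = [Decimal("1.005"), Decimal("1.005")]

print(total_round_each(line_items))  # 2.02
print(total_round_once(line_items))  # 2.01
```

Neither total is a bug in isolation, which is why the disagreement was about the contract between the services and not about either implementation.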

Follow-up: When is it appropriate to contribute code directly to another team’s codebase versus going through their process?

It depends entirely on the organizational culture and the nature of the change.
At companies with strong inner-source culture (Google, Stripe, many modern orgs), contributing directly to another team’s codebase via their standard PR process is encouraged and expected. I have worked at a company where roughly 30% of PRs to any given service came from outside the owning team. The key is following their conventions — their style guide, their test patterns, their review process. You are a guest in their house.
At companies without this culture, sending an unsolicited PR can feel territorial. In that case, I write the code but do not submit it. Instead, I share it with the team lead: “I wrote a draft fix. Feel free to use it as-is, modify it, or ignore it and write your own. I wanted to lower the effort for you.”
The line I never cross: deploying code to another team’s service without their explicit approval. Even if I have commit access, deploying changes they have not reviewed and do not understand is a recipe for an incident they cannot debug. The owning team must always be in the loop for production changes.

Describe a time when the “obvious” architectural choice turned out to be wrong. What made the counterintuitive option the right one?

They cannot come up with an example, or they describe a situation where the “obvious” choice was obviously wrong to anyone with experience (like “we thought we should use a single server for everything”). The question is specifically designed to surface situations where the conventional wisdom failed and the candidate had to think independently.
The best example I have is when we were designing a notification system for a fintech application — transaction alerts, security notifications, marketing messages. The “obvious” architecture was microservices: separate services for email, SMS, push notifications, and in-app notifications, with a routing layer that dispatches based on notification type and user preferences. This is what every architecture blog recommends. It is also what we built.
And it was wrong for us.
Why the obvious answer failed: Our team was 4 backend engineers. We now had 5 services to deploy, monitor, and maintain (4 channel services + the router). Each service had its own deployment pipeline, its own health checks, its own retry logic, and its own failure modes. When the email service backed up due to a SendGrid rate limit, the router did not handle backpressure correctly and started dropping SMS notifications too. Debugging cross-service issues required correlating logs across 5 services with different log formats. Our on-call rotation became a nightmare because every incident required understanding the interactions between all five services.
The counterintuitive solution: a modular monolith. After 4 months of pain, I proposed consolidating back to a single service with internal modules for each channel. Same clean separation of concerns, same independent testability — but one deployment, one log stream, one health check, and shared retry/backpressure logic. The team pushed back initially: “But that is going backwards! Monoliths do not scale!”
I made the case with numbers: “Our notification volume is 200K per day. This service could handle 5M per day on a single instance. We do not have a scale problem. We have an operational complexity problem. We are spending 40% of our on-call time on inter-service communication issues that would not exist in a monolith.”
We migrated back over 3 weeks. On-call incidents related to the notification system dropped from an average of 4 per month to 0.5 per month. Developer velocity on notification features increased roughly 2x because engineers no longer had to coordinate deploys across 5 services.
The lesson: The “obvious” architectural choice is usually the one optimized for a scale or organizational size you do not have. Microservices are the right pattern when you have 50 engineers and millions of requests per second. They are the wrong pattern when you have 4 engineers and 200K notifications per day. The right architecture is the one that matches your actual constraints — team size, traffic volume, operational maturity — not the one that matches the conference talk you watched last month.
War Story: I later met an architect at a large bank who told me his team ran a microservices architecture with 160 services maintained by 12 engineers. Each engineer was responsible for roughly 13 services. Deployments took the entire team a full day because of cross-service dependency ordering. He called it “microservices hell” and was leading a multi-year initiative to consolidate back to 8 well-bounded services. His estimate of the total cost of the premature microservices adoption: over $3M in engineering time over 3 years. The lesson generalizes: an architecture that is ideal at Google-scale can be catastrophic at startup-scale, and vice versa.
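The modular-monolith shape described in this answer, one process with a module per channel and shared retry logic, can be sketched as follows. This is an illustration only: `NotificationService`, the `Sender` signature, and the simple fixed-attempt retry are assumptions, not the system from the story.

```python
from typing import Callable, Dict

# A channel module is any callable (recipient, message) -> None that raises on failure.
Sender = Callable[[str, str], None]


class NotificationService:
    """Single deployable unit; each channel is an internal module behind one interface."""

    def __init__(self, max_attempts: int = 3):
        self._channels: Dict[str, Sender] = {}
        self._max_attempts = max_attempts

    def register(self, channel: str, sender: Sender) -> None:
        self._channels[channel] = sender

    def send(self, channel: str, recipient: str, message: str) -> bool:
        """Shared retry logic: every channel gets the same behavior and the same log stream."""
        sender = self._channels[channel]
        for attempt in range(1, self._max_attempts + 1):
            try:
                sender(recipient, message)
                return True
            except Exception as exc:
                print(f"[{channel}] attempt {attempt} failed: {exc}")
        return False
```

Each channel stays independently testable because it is just a callable, while retries, logging, and deployment are defined once for all of them.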

Follow-up: How do you know when your team HAS outgrown a monolith and microservices become the right move?

There are three concrete signals, and you should need at least two of them before breaking up:
  1. Deploy contention. Multiple teams are regularly blocked waiting to deploy because another team’s changes are in the pipeline. If your deploy frequency is declining and the bottleneck is coordination (not test failures or pipeline speed), that is a real decomposition signal.
  2. Blast radius problems. A bug in one module causes an outage in an unrelated module because they share a process or a database connection pool. If you are regularly experiencing incidents where the root cause is “module A affected module B through shared resource C,” your modules need process-level isolation.
  3. Team ownership boundaries are clear. If you have distinct teams that own distinct business domains and they are stepping on each other’s code, a service boundary might help. But if the same 4 engineers work on everything, splitting into services just means 4 engineers now have to context-switch across 4 repos instead of navigating one.
The litmus test I use: “Will splitting this service reduce total engineering effort, or just move the complexity to the network layer?” If the complexity just moves, keep the monolith and add better internal boundaries (modules, packages, clear APIs).

Your team just went through a painful reorg. Half the engineers are new, the other half are demoralized. You are the most senior IC on the team. What do you do in the first month?

“I would focus on the technical work and let the manager handle the people stuff.” This reveals someone who does not understand that senior ICs — especially in disrupted teams — have a critical role in setting technical culture, providing continuity, and rebuilding trust. Others say “I would schedule 1:1s with everyone” which is a start but lacks any actionable framework.
I have been through this twice, and the second time I handled it much better than the first because I learned that post-reorg, the technical work is secondary to the human work for the first 2-4 weeks.
Week 1 — Stabilize through clarity. The biggest source of demoralization post-reorg is ambiguity: “What are we even working on? Who owns what? Is my project still alive?” As the senior IC, I create a “state of the team” document within the first 3 days:
  • What systems does this new team own? (List them explicitly — ambiguity about ownership is poison)
  • What are the active projects and their status? (In progress, paused, cancelled, unknown)
  • Who is the on-call contact for each system? (Even if it is temporary)
  • What decisions are pending that need a decision-maker?
I share this with the new team and the manager. It is not perfect — it is deliberately a working draft. But it gives everyone a shared reality to operate from instead of anxious speculation.
Week 2 — 1:1s with everyone, but with specific questions. I schedule 30 minutes with each engineer. Not generic “how are you doing” conversations. I ask three specific questions: (1) “What is the one thing you were working on before the reorg that you think should continue?” (2) “What context about our systems do you need that you do not have?” (3) “What is your biggest concern about the new team structure?” The answers to question 3 are the real gold — they surface fears and resentments that will fester if not addressed.
Week 3 — Create an early win. Demoralized teams need momentum, not strategy documents. I identify the smallest, most impactful thing we can ship as a team — a bug fix that has been annoying customers, an operational improvement, a piece of tech debt that everyone hates. The goal is to have a shared accomplishment within the first month that proves “this new team can ship.” At my last reorg, the early win was a 2-day project to add structured logging to our three most-paged services. It was not glamorous. But when on-call got better within a week, the team felt competent and united.
Week 4 — Establish working agreements. By now I have enough context to facilitate a team norming session: How do we do code reviews? What is our on-call rotation? How do we make architectural decisions? What does “done” mean for a feature? I do not dictate these — I facilitate the conversation. The new engineers bring fresh perspectives, and the existing engineers bring context. The synthesis is a set of working agreements that the whole team owns.
War Story: The first reorg I went through, I made the mistake of immediately trying to “fix” the technical architecture for the new team scope. I spent two weeks writing an ambitious RFC for how to reorganize our services along the new team boundaries. Nobody engaged with it because they were still processing the organizational change. I learned that timing matters — you cannot do architecture work on a team that has not yet formed a shared identity. The human infrastructure has to be rebuilt before the technical infrastructure can be rethought.
One thing I deliberately avoid in the first month: complaining about the reorg itself. Even if it was a bad decision (and sometimes it is), the senior IC who spends the first month saying “this reorg was stupid” poisons the team’s chance of succeeding. You can have that opinion privately. In the team context, you model forward motion: “This is our reality now. Here is how we make it work.”

Follow-up: How do you handle the situation where the reorg split domain knowledge — the engineers who understood the system are now on a different team?

This is one of the most common and most damaging reorg outcomes, and there is no quick fix, only mitigation.
Immediately, I set up cross-team knowledge transfer sessions. Not vague “tell us about the system” meetings, but specific, recorded walkthroughs: “Walk us through how the payment reconciliation job works, including the failure modes and the manual recovery procedure.” I schedule 2-3 of these in the first two weeks and record them in Loom for future reference.
Then I build the documentation that should have existed before the reorg. I pair with the remaining engineer who has the most context (there is always at least one) and we write runbooks for the top 5 on-call scenarios. This is painful and slow, but it is the only way to transfer tacit knowledge into explicit knowledge.
For the medium term, I negotiate a “transition support agreement” with the other team: for the next 3 months, their engineers will be available for 2 hours per week to answer questions about the systems they used to own. This is not charity. It is in their interest too, because if our team breaks their old system, the customers do not care about org charts.
The hard truth: some knowledge will be lost. No amount of documentation captures the intuition that comes from having operated a system for two years. Accept that the first 3 months will have more incidents and slower feature velocity. Set those expectations with leadership explicitly so the team is not penalized for a problem that was created by the reorg itself.

A junior engineer on your team ships a change that causes a production incident. In the postmortem, a director asks, “How did this get past code review?” You reviewed the PR. What do you say?

“The junior engineer should have caught it in testing” (blame-shifting downward — the worst answer). Or “I missed it, it is my fault” (takes all blame without examining the systemic issue, which feels noble but does not prevent recurrence). Some say “the CI pipeline should have caught it” (deflecting to tooling), which is partially right but avoids personal accountability.
I say exactly what happened, without defensiveness and without throwing the junior engineer under the bus: “I reviewed the PR and approved it. I did not catch the issue because [specific honest reason]. Here is what I think we should change to prevent this class of problem.”
Let me give you the real version of this that happened to me. A junior engineer on my team added a database migration that changed a column from nullable to non-nullable. The migration worked in staging because staging had clean data. In production, 3% of rows had null values in that column, which caused the migration to fail mid-deploy, locking the table for 8 minutes and causing a partial outage.
I had reviewed the PR. I approved it. I missed it.
In the postmortem, when the director asked how it got past review, I said: “I approved this PR. I reviewed the migration for syntax and logical correctness, but I did not check the production data distribution for null values in the affected column. That is a gap in my review process, and it is also a gap in our tooling: we do not have automated checks that validate migrations against production data characteristics.”
Then I pivoted to systemic fixes:
  1. My personal fix: I added a personal checklist for migration reviews that includes “check for null/default values in affected columns using a read-replica query.” I shared this checklist with the team.
  2. Team process fix: We added a required section to migration PRs: “Production data impact analysis” where the author must include a query result showing the data distribution for affected columns.
  3. Tooling fix: We built a CI step that runs migrations against an anonymized production data snapshot. This catches the “works in staging, fails in production” class of bugs automatically. It took one week to build and has prevented 4 similar issues since.
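The “production data impact analysis” and the CI check in the list above both come down to one pre-flight question: would any existing row violate the new constraint? A minimal sketch, assuming an in-memory SQLite database stands in for the read replica or anonymized snapshot. The names `null_stats`, `safe_to_add_not_null`, `build_snapshot`, and the `accounts.email` column are hypothetical and are not the tooling the team actually built:

```python
import sqlite3


def null_stats(conn, table, column):
    """Return (null_rows, total_rows) for one column of one table."""
    # Identifiers cannot be bound as SQL parameters, so validate them
    # before interpolating them into the query text.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    null_rows, total_rows = conn.execute(
        f"SELECT SUM({column} IS NULL), COUNT(*) FROM {table}"
    ).fetchone()
    # SUM() over an empty table yields NULL, which Python sees as None.
    return (null_rows or 0, total_rows)


def safe_to_add_not_null(conn, table, column):
    """True only if no existing row would violate a NOT NULL constraint."""
    null_rows, _ = null_stats(conn, table, column)
    return null_rows == 0


def build_snapshot(null_rows):
    """Hypothetical 100-row dataset in which `null_rows` emails are missing."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, email TEXT)")
    rows = [
        (i, None if i <= null_rows else f"user{i}@example.com")
        for i in range(1, 101)
    ]
    conn.executemany("INSERT INTO accounts (id, email) VALUES (?, ?)", rows)
    return conn


staging = build_snapshot(null_rows=0)     # clean data, like the staging DB
production = build_snapshot(null_rows=3)  # 3% nulls, like the incident

for label, conn in (("staging", staging), ("production", production)):
    nulls, total = null_stats(conn, "accounts", "email")
    ok = safe_to_add_not_null(conn, "accounts", "email")
    print(f"{label}: {nulls}/{total} null rows -> {'ok' if ok else 'BLOCK'}")
```

The same query result can be pasted into the PR description by the author, or run by CI as a gate before the migration step. A real check would also need to cover backfills and default values, which this sketch leaves out.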
The director was satisfied not because I took blame, but because I demonstrated that the incident led to concrete improvements at three levels: personal, process, and tooling.
What I explicitly did NOT do: I did not say anything that made the junior engineer feel responsible. They followed the correct process: they wrote the migration, added tests, and submitted it for review. The reviewer (me) is the last line of defense, and I missed it. More importantly, the system should not depend on a human catching every edge case in review. That is what tooling is for.
War Story: After that incident, the junior engineer came to me privately and said they felt terrible about the outage. I told them something a mentor once told me: “The outage was not caused by your migration. It was caused by a system that allowed an unsafe migration to reach production. You exposed a gap that existed before you joined the team. The fact that we fixed it means you made our system safer.” They stayed on the team for two more years and became one of our strongest engineers. How you handle the aftermath of an incident defines your team’s psychological safety more than any amount of “blameless culture” posters on the wall.

Follow-up: How do you balance thorough code review with not becoming a bottleneck for your team’s velocity?

This is a genuine tension, and I have seen both failure modes: the reviewer who rubber-stamps everything (fast but dangerous) and the reviewer who blocks every PR for days with 40 comments (thorough but paralyzing).
My approach is tiered review depth based on blast radius:
  • Migrations, security-sensitive code, and public API changes: I review with maximum depth. I block time for these. I run queries against staging data. This might take 1-2 hours for a complex migration. The team knows these PRs will take longer, and they plan accordingly.
  • Feature code in well-tested areas: I review for design, readability, and test coverage. I trust the CI pipeline to catch correctness issues. I aim for 30-minute turnaround. If I have more than 3 comments, I suggest a pairing session instead of async back-and-forth.
  • Documentation, config changes, dependency updates: I skim for obvious issues and approve quickly. These do not need deep review, and slow-rolling them is pure waste.
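The tiers above can be made mechanical enough for a bot to label incoming PRs. A rough sketch, assuming the changed file paths are available as strings. The path patterns and the `review_tier` function are invented for illustration, and a real repository would define its own:

```python
from fnmatch import fnmatch

# Hypothetical path patterns; adjust to the repository layout.
DEEP_PATTERNS = ("migrations/*", "auth/*", "api/public/*")
LIGHT_PATTERNS = ("docs/*", "*.md", "config/*", "requirements*.txt")


def review_tier(changed_paths):
    """Return 'deep', 'standard', or 'light' for a list of changed paths.

    The riskiest file decides the tier, so a single migration inside a
    docs-heavy PR still gets a deep review.
    """
    if any(fnmatch(p, pat) for p in changed_paths for pat in DEEP_PATTERNS):
        return "deep"
    if changed_paths and all(
        any(fnmatch(p, pat) for pat in LIGHT_PATTERNS) for p in changed_paths
    ):
        return "light"
    return "standard"


print(review_tier(["migrations/0042_add_not_null.py", "docs/changelog.md"]))
print(review_tier(["docs/guide.md", "config/app.yaml"]))
print(review_tier(["src/billing/invoice.py"]))
```

Path matching is only a proxy for blast radius. It routes attention; it does not replace the reviewer's judgment about what a change can break.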
I also invest in making reviews faster for everyone: I maintain a team style guide so we do not re-debate formatting in every PR, I enforce linting in CI so review comments focus on logic rather than style, and I have “review templates” for common PR types (migrations, API changes, feature flags) that prompt the author to include the context reviewers need.
The metric I track: median time from PR opened to first review. Our team target is under 4 hours during business hours. If it creeps above that, we discuss it in retro.
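The review-latency metric is easy to compute from PR timestamps. A minimal sketch, assuming each PR is reduced to an (opened, first review) pair of datetimes. It measures wall-clock hours rather than business hours, which is a simplification, and the function names and sample data are hypothetical:

```python
from datetime import datetime, timedelta
from statistics import median

TARGET_HOURS = 4.0  # the team target quoted above


def median_hours_to_first_review(prs):
    """prs: iterable of (opened_at, first_review_at) datetime pairs.

    PRs still waiting for a review carry None and are skipped.
    Returns None when nothing has been reviewed yet.
    """
    waits = [
        (reviewed - opened).total_seconds() / 3600
        for opened, reviewed in prs
        if reviewed is not None
    ]
    return median(waits) if waits else None


def needs_retro_discussion(prs, target_hours=TARGET_HOURS):
    """True when the median wait exceeds the target."""
    value = median_hours_to_first_review(prs)
    return value is not None and value > target_hours


opened = datetime(2024, 3, 4, 9, 0)  # hypothetical timestamp
sample = [
    (opened, opened + timedelta(hours=1)),
    (opened, opened + timedelta(hours=3)),
    (opened, opened + timedelta(hours=8)),
    (opened, None),  # not reviewed yet
]
print(median_hours_to_first_review(sample), needs_retro_discussion(sample))
```

The median is used rather than the mean so that one PR that sat over a weekend does not dominate the number.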

You have three competing stakeholders: the product manager wants a new feature, the security team wants you to fix a vulnerability, and the SRE team says your service’s reliability is below SLO. You can only do one this sprint. How do you decide?

“Security always comes first” or “I would do what the product manager wants because they set the roadmap” or “Reliability is non-negotiable.” All of these are dogmatic answers that ignore context. The correct answer depends entirely on specifics: How critical is the vulnerability? How far below SLO are you? How revenue-critical is the feature? Engineers who answer with a universal rule have not dealt with real prioritization pressure.
This is a triage problem, and triage requires data, not principles. Here is my actual decision process:
Step 1 - Quantify each ask on the same axes. I need all three requests expressed in comparable terms:
  • Security vulnerability: What is the CVSS score? Is it actively being exploited, or theoretical? Is it in a public-facing service or an internal tool? Is customer data at risk? A CVSS 9.1 RCE in a public API with known exploits in the wild is a “drop everything” situation. A CVSS 4.0 information disclosure in an admin tool used by 3 people can wait a sprint.
  • SLO breach: How far below SLO are we? Is it getting worse or stable? What is the customer impact? If we are at 99.5% against a 99.9% SLO and trending down, that is urgent — we are consuming error budget at an unsustainable rate. If we dipped to 99.85% due to a one-time incident that has been resolved and we are recovering, it can wait.
  • Feature request: What is the revenue or retention impact? Is there a contractual deadline? Is a customer churning without it? “Would be nice to have” is very different from “our largest customer’s renewal depends on this being live by end of month.”
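The SLO comparison in the list above is simple arithmetic. At 99.5% measured against a 99.9% target, observed unavailability is (100 − 99.5) / (100 − 99.9) = 5 times what the SLO allows. At 99.85% it is 1.5 times. A small sketch of that calculation; the function names are illustrative and the 30-day window is an assumption:

```python
def burn_rate(slo_percent, measured_percent):
    """How many times faster than sustainable the error budget is spent."""
    allowed = 100.0 - slo_percent        # unavailability the SLO permits
    if allowed <= 0:
        raise ValueError("SLO must be below 100%")
    observed = 100.0 - measured_percent  # unavailability actually seen
    return observed / allowed


def allowed_downtime_minutes(slo_percent, window_days=30):
    """Total downtime the SLO permits over the window, in minutes."""
    return (100.0 - slo_percent) / 100.0 * window_days * 24 * 60


for measured in (99.5, 99.85):
    rate = burn_rate(99.9, measured)
    print(f"{measured}% vs 99.9% SLO: burn rate {rate:.1f}x")
print(f"30-day budget at 99.9%: {allowed_downtime_minutes(99.9):.1f} minutes")
```

A burn rate above 1 means the budget runs out before the window ends. Whether that is urgent still depends on the trend, which is why a resolved one-time dip and a steady decline get different treatment.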
Step 2 - Assess reversibility and escalation risk. Security vulnerabilities tend to have step-function risk: they are fine until they are catastrophic. SLO breaches compound. Feature delays are usually linear. This means security issues often have hidden urgency that the CVSS score alone does not capture.
Step 3 - Make a recommendation, get alignment, commit. I write a brief priority proposal: “Given [vulnerability assessment], [SLO analysis], and [feature urgency], I recommend we address the security vulnerability this sprint for these reasons. SLO work is next sprint. Feature work begins in sprint N. Here is my reasoning.” I share this with all three stakeholders simultaneously. Transparency is critical: the worst outcome is one stakeholder feeling blindsided.
In my specific experience: The most common correct answer is to timebox the security fix (often a few days, not a full sprint), do the SLO work (because it affects all customers, not just one), and negotiate the feature timeline. But I have also done the feature first when the revenue impact was existential: at a startup where a single customer represented 30% of ARR and their contract renewal literally depended on a specific feature shipping by a specific date. Context dominates rules.
War Story: At one company, I got all three of these requests in the same standup on a Monday morning. The security team had found a SQL injection vulnerability in our search endpoint. The SRE team showed us we had burned through 80% of our monthly error budget in the first week. And the PM had a customer demo on Friday that required a new dashboard widget. I split the sprint: two engineers on the SQL injection fix (shipped by Tuesday), I personally tackled the SLO issue (a connection pool misconfiguration, fixed by Wednesday), and one engineer started the dashboard widget (shipped Thursday for the Friday demo). Was it a stressful week? Yes. But the triage was sound because I had assessed each item’s actual urgency instead of applying a blanket priority rule. The SQL injection could not wait, the SLO fix was a known root cause with a quick fix, and the feature had a hard external deadline. If any of those factors had been different, the prioritization would have been different too.

Follow-up: How do you handle the emotional dynamics when you tell a stakeholder their urgent request is not your top priority this sprint?

The key is to never say “your request is not important.” Say “your request is important, and here is when we will address it and why the other item comes first.”
I have a personal rule: I never communicate a prioritization decision over Slack or email. I do it in a synchronous conversation (video call at minimum) so the stakeholder can ask questions, push back, and feel heard. Being deprioritized feels bad. Being deprioritized by a Slack message feels dismissive.
I also share the full triage context: “Here is what we are facing this sprint. Here are the three competing requests. Here is how I assessed each one. I chose to prioritize the security fix because [concrete reason]. Your feature work starts next sprint, and here is the specific engineer who will work on it.” This level of transparency usually converts frustration into understanding.
The move that builds the most long-term trust: following through on the timeline you gave. If you said “next sprint,” deliver next sprint. If you repeatedly deprioritize a stakeholder and miss your own revised timelines, they will escalate, and they will be right to.
Strong Answer Framework:
Step 1 - Diagnose the real blocker honestly: “Close but not quite” is the most dangerous feedback in engineering because it is ambiguous by design; managers use it when they do not want to either promote you or tell you the truth. I run a three-way diagnostic: (a) ask my manager for the two specific artifacts a recently-promoted staff engineer produced that I have not produced, (b) read the actual staff-level rubric line-by-line and mark each criterion as “demonstrated,” “partial,” or “missing,” and (c) ask a staff engineer I trust (not my manager) to review my last six months of work and give me their unvarnished read. If all three sources converge on the same gap, that is the gap. If they diverge, the real problem is that no one has a clear picture of my impact, which is itself the problem to solve.
Step 2 - Decide between scope expansion and company change: Once I know the gap, I test whether my current company can realistically close it in the next 6-9 months. Staff impact usually requires a problem big enough to warrant it: a cross-team initiative, a multi-quarter technical strategy, a platform others depend on. If my team does not have a staff-sized problem and leadership cannot credibly point to one, I am not stuck because of my skills; I am stuck because of the scope available to me. At that point I have two options: negotiate a lateral move within the company to a team with bigger scope, or interview externally at a company that is hiring staff-level ICs. The brutal truth is that the fastest route to staff is often a title change at a new company, not waiting for a promo at the current one.
Step 3 - Ship one ‘promo-defining’ project and document it as if you were already a staff engineer: If I stay, I commit to one project that will serve as the unambiguous artifact, usually a technical strategy doc that influences roadmap across 2+ teams, or a migration that reduces cost/risk measurably. I write it the way Tanya Reilly describes staff engineers writing: problem statement, constraints, options considered, recommendation, explicit trade-offs, rollout plan. I socialize it in skip-level 1:1s before the promo cycle, not during. By cycle time, the question is not “did they show staff-level impact?” but “can we afford to lose this person if we do not promote them?”
Real-World Example: A senior engineer at Stripe in 2022 was stuck at L5 for nearly three years. Her manager kept citing “influence.” She wrote a six-page technical strategy on data pipeline consolidation that identified $2M in annual redundant spend, socialized it with three VPs individually before any formal review, and got named technical lead for the resulting initiative. She was promoted to L6 the next cycle, not because her skills changed, but because the artifact made the impact undeniable.
Senior Follow-up Questions:
  • “What if the feedback is ‘you are a strong IC but not a leader’ — how is that different to address?” - Strong answer: That is code for “you have not influenced people outside your direct team.” The fix is not to become more charismatic but to produce written artifacts (RFCs, strategy docs, post-mortems) that other teams reference when making decisions — influence scales through writing, not through meetings.
  • “How long should you stay stuck before job-hopping becomes the right move?” - Strong answer: My rule is two full promo cycles with concrete, documented feedback and a credible path forward. If after cycle two the feedback has shifted or the ‘path forward’ has not materialized, the company is signaling — intentionally or not — that they will not promote you. Staying a third cycle is optimism punishing your career.
  • “Is it ever right to threaten to leave to force a promotion?” - Strong answer: Only if you have a real offer and are genuinely prepared to take it. Fake leverage is the fastest way to permanently damage the relationship with your manager, and companies increasingly track ‘counter-offer’ promotions and penalize them at the next cycle.
Common Wrong Answers:
  • “I would just work harder and take on more projects.” - This is the trap. Volume of work does not produce staff-level impact; one well-chosen project does. More projects often means more glue work that stays invisible.
  • “I would ask my manager what I need to do differently.” - You have done this for two years and it has not worked. Asking the same person the same question expecting different information is not diagnosis; it is avoidance of the harder conversation with peers or a job search.
Further Reading:
  • “The Staff Engineer’s Path” by Tanya Reilly (O’Reilly, 2022), especially the chapter on scope and the three pillars of the staff role (big-picture thinking, execution, and leveling up)
  • Will Larson’s StaffEng.com interviews — patterns across engineers who made staff at Stripe, Slack, Google
  • Related chapter: Communication and Soft Skills on writing RFCs that influence across teams
Strong Answer Framework:
Step 1 - Separate the move from the narrative about the move: The internal move and the external story are two different problems. Internally, the question is whether your current company has a Senior+/Staff IC role with scope that matches what you were managing. If they do not, the move will feel like a demotion no matter how you frame it, because you will be reporting to someone who used to be your peer and working on scope smaller than your old team’s. I would not attempt the internal transition unless there is a concrete staff-track role with clear scope. Externally, the narrative is easier: Charity Majors has made the IC-to-manager-and-back path a respected pattern, and every hiring manager worth talking to knows managers-turned-ICs often become the strongest senior ICs on a team.
Step 2 - Use the ‘management sabbatical’ framing, not ‘stepping down’: The language matters. “I managed for two years and learned what I needed to, and now I am returning to the work I do best” is a confident, forward-looking frame. “I realized management was not for me” is defensive and invites follow-up questions about what went wrong. In interviews, I lead with the skills I gained: running roadmaps across multiple engineers, driving prioritization conversations with product and design, navigating performance management. Those are all staff-track skills. I am not walking back from management; I am bringing management skills into an IC seat where I can still write code and design systems.
Step 3 - Prepare for the ‘coding rust’ question and the ‘why would you report to a former peer’ question: These are the two questions every interviewer will ask. For coding rust: acknowledge it honestly (“I have been managing more than coding for 18 months, so my React patterns are rusty, but my systems design and architecture skills have sharpened because I was making those calls at the team level”) and be ready to demonstrate on the whiteboard. For the reporting question: “My old team’s manager is now someone I look forward to learning from. The IC ladder at a staff level runs parallel to management, and the best staff engineers I know have managers who enable them, not manage them.” If the company cannot separate ‘reporting line’ from ‘seniority,’ that is a red flag about the company, not about you.
Real-World Example: Charity Majors’ own story is the canonical one: she went from engineer to manager at Facebook, hated it, left, and co-founded Honeycomb, where she has served as both CEO and CTO. Her blog post “The Engineer/Manager Pendulum” (2017) is the definitive public essay on this transition and is frequently referenced in staff+ interviews. Hiring managers who have read it view the IC-manager-IC arc as a strength signal.
Senior Follow-up Questions:
  • “How do you convince a hiring manager you will not ‘stealth manage’ on an IC team — telling everyone what to do?” - Strong answer: I commit explicitly to the team norms: I do not run standup, I do not run 1:1s with teammates, I bring decisions to the manager and tech lead rather than making them unilaterally. I also say it out loud in my first week: “I used to manage, I am here to IC, please call me out if I slip into manager-mode.”
  • “What if the only IC role available is a Senior role, but you want Staff?” - Strong answer: I would not make the jump for a Senior role if I was managing team scope. Better to stay managing while I interview at companies that have open Staff IC roles, and time the move to a title that reflects my actual scope. Accepting a Senior role ‘to rebuild my coding’ almost always locks you into that level for 2+ years.
  • “How do you handle the pay conversation — managers usually earn more than ICs at the same level?” - Strong answer: I negotiate based on total comp and market data for Staff IC, not on my previous manager base. At most tech companies, Staff IC and Senior Manager are paid equivalently on the compensation bands; if a company cannot match that, it tells me their IC track is not actually respected.
Common Wrong Answers:
  • “I would just take any IC role to get back to coding.” - This undervalues your management experience and locks you into a level below your actual seniority. You will spend 2 years climbing back to where you were.
  • “I would not tell new employers I used to manage.” - Hiding it backfires when it comes up in reference checks, and it wastes the single biggest differentiator you have over other senior IC candidates.
Further Reading:
  • Charity Majors, “The Engineer/Manager Pendulum” (2017 blog post)
  • “The Manager’s Path” by Camille Fournier — the chapter on whether to manage is equally useful in reverse
  • Related chapter: Leadership, Execution, and Infrastructure on leading without authority, which maps directly to Staff IC expectations
Strong Answer Framework:
Step 1 - Accept that glue work is real labor but it is evaluated on two different axes: The first mistake is to frame this as “glue work is bad and I should do less of it.” Glue work is what keeps teams functional, and somebody has to do it. The second mistake is to assume that doing glue work is the same as being credited for it; Tanya Reilly’s “Being Glue” talk is about this exact asymmetry. The work is valuable but performance reviews reward visible technical artifacts, and glue work is structurally invisible. So the real diagnosis is not “am I doing the wrong work?” but “am I doing work that my org will credit me for at the next level?”
Step 2 - Audit what you are doing and split it into three buckets: I spend a week tracking my actual time in 30-minute increments and categorize every block: (a) glue work that I am doing because no one else will, but that could be done by a mid-level engineer or automated; (b) glue work that genuinely requires senior judgment, such as onboarding a new hire onto a complex system or mediating a cross-team design dispute; (c) technical work that is actually producing promotable artifacts. If bucket (a) is 30%+ of my week, that is the problem. I am the team’s default tool because delegating has a short-term cost, but the long-term cost is my career. I move bucket (a) work off my plate: build the runbook so onboarding does not require me, automate the flaky deploy so nobody needs to ‘ask the senior person,’ nominate a different meeting attendee so I stop being the default cross-team representative.
Step 3 - Convert the bucket (b) work into visible artifacts: The senior-judgment glue work is genuinely valuable, but it gets credited only if it is documented. Every cross-team mediation becomes a design doc summarizing the decision and rationale, which makes it citable in your promo packet. Every onboarding becomes a contribution to the team handbook, which makes it measurable impact. Every unblocking conversation becomes an RFC or a one-pager that lives in the team wiki. This is the Tanya Reilly move: do not stop doing glue work, make it legible. Then in the next perf review I have a specific claim: “I reduced new-hire time-to-first-commit from 4 weeks to 9 days by writing [document] and building [onboarding tool]. That unblocked [N] engineers representing [X] engineer-weeks of productivity.”
Real-World Example: A senior engineer at Squarespace (in Tanya Reilly’s actual team context, circa 2018-2019) was doing heavy glue work and kept getting “needs more technical impact” feedback. She stopped treating onboarding as a series of 1:1 conversations and built a structured onboarding program: documented, repeatable, with clear milestones. The program itself became the artifact: it was cited in her promotion packet as “reduced onboarding ramp-up time by 60% across the org,” and it got her promoted to staff. The work was the same; the framing and the artifact changed.
Senior Follow-up Questions:
  • “How do you decide which glue work to stop doing when saying no will make the team temporarily worse off?” - Strong answer: I pick the piece that has the highest ‘cost to me’ and ‘lowest switching cost to the team.’ If dropping it for one sprint would cause a critical failure, I keep doing it while I build the documentation or tool that replaces me. If dropping it for one sprint just means someone else is inconvenienced, I drop it now — the short-term pain trains the team to distribute the load.
  • “Isn’t refusing glue work just offloading it onto others, often women and underrepresented folks who feel more pressure to take it?” - Strong answer: Yes, and that is why the solution is not ‘refuse glue work’ but ‘make the work visible and distribute it equitably.’ I flag the pattern to my manager — “this work is essential and falling on a few people; we need to rotate it or fund it explicitly” — rather than silently dropping it for someone else to pick up.
  • “What if your manager explicitly asks you to do the glue work?” - Strong answer: Then I ask them to explicitly value it in my review. “You are asking me to prioritize onboarding and cross-team coordination — I will do it, and I want us to agree now that this work will count as technical leadership impact in my next review, with these specific artifacts as evidence.” If they will not commit in writing, I know what the review will say before it happens.
Common Wrong Answers:
  • “I would just refuse to do glue work and focus only on coding.” - This tanks team health, and at senior level, team health is part of your job. You will get feedback that you are ‘not a team player’ and the promotion will move further away.
  • “I would do the glue work and trust that my manager sees the impact.” - This is the exact failure mode Tanya Reilly’s talk warns about. Invisible work stays invisible no matter how much of it you do. Trust is not a substitute for legibility.
Further Reading:
  • Tanya Reilly, “Being Glue” (2019 talk and essay) — the definitive piece on this problem
  • “The Staff Engineer’s Path” by Tanya Reilly, Part I (“The Big Picture”) and its material on documentation
  • Related chapter: Communication and Soft Skills on making written artifacts the unit of influence