Python interviews span an unusually wide range, from basic syntax and data structure manipulation to CPython internals, concurrency models, and framework-specific architecture decisions. What makes Python interviews distinctive is that the language’s simplicity on the surface hides significant depth underneath. Interviewers use this gap to distinguish candidates who have genuine understanding from those who have only written scripts. The GIL, descriptor protocol, metaclasses, and memory management model are the topics where strong candidates separate themselves. The questions below are organized from fundamentals through advanced topics. For each question, the provided answer goes beyond the textbook definition to include the “why,” the trade-offs, and the production context that interviewers want to hear. Practice by explaining each concept out loud before reading: your ability to articulate clearly matters as much as knowing the right answer.
1. Python Basics
What is Python and its key features?
- Interpreted with bytecode compilation: Python is not purely interpreted. CPython compiles source to `.pyc` bytecode first, then the VM interprets that bytecode. This is why you see `__pycache__` directories. Understanding this distinction shows you know what actually happens when you run `python script.py`.
- Dynamic typing with strong typing: Python is dynamically typed (variables do not declare types) but strongly typed (you cannot add a string to an integer without explicit conversion). Many candidates confuse “dynamic” with “weak” typing: JavaScript is weakly typed, Python is not.
- First-class functions: Functions are objects. You can assign them to variables, pass them as arguments, return them from other functions. This enables decorators, closures, and functional patterns that are central to idiomatic Python.
- Extensive standard library (“batteries included”): `collections`, `itertools`, `functools`, `pathlib`, `dataclasses`, `typing`. Knowing these separates a Python developer from someone who just writes Python syntax.
- Memory management via reference counting + cyclic GC: Automatic memory management, but understanding it matters for long-running services where memory leaks from circular references can crash your app at 3 AM.
- GIL (Global Interpreter Lock): The elephant in the room. One thread executes Python bytecode at a time in CPython. This has massive implications for concurrency design.
- “You mentioned Python compiles to bytecode. What is the difference between CPython, PyPy, and Cython, and when would you choose each?”
- “How does Python’s strong dynamic typing affect how you design a large codebase with 50+ engineers? What tooling fills the gap?”
- “What is one feature of Python you think is a design mistake, and how do you work around it?”
- “Python 3.13 introduced a free-threaded build. What does this mean for the GIL and existing C extensions?”
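To make the bytecode and typing points above concrete, here is a minimal sketch using the standard `dis` module. The `add` function and the variable names are illustrative assumptions, not part of the original answer.

```python
import dis


def add(a, b):
    return a + b


# CPython compiled the function body to bytecode when the def statement ran.
bytecode_listing = dis.Bytecode(add).dis()   # human-readable instruction listing
raw_bytes = add.__code__.co_code             # the raw bytecode as a bytes object

# Dynamic typing: the same name can be rebound to objects of different types.
value = 42
value = "forty-two"

# Strong typing: mixing str and int raises instead of silently coercing.
try:
    "1" + 1
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Printing `bytecode_listing` shows the instructions the VM executes; the exact opcodes differ between CPython versions.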
What are Python's built-in data types?
- Numeric: `int` (arbitrary precision, no overflow), `float` (64-bit IEEE 754 double), `complex` (real + imaginary)
- Sequence: `list` (mutable, ordered), `tuple` (immutable, ordered), `range` (lazy integer sequence), `str` (immutable Unicode text)
- Mapping: `dict` (insertion-ordered since 3.7, hash map under the hood)
- Set: `set` (mutable, hash-based unique collection), `frozenset` (immutable, can be dict key)
- Boolean: `bool` (subclass of `int`: `True == 1`, `False == 0`)
- Binary: `bytes` (immutable), `bytearray` (mutable), `memoryview` (zero-copy buffer access)
- None Type: `NoneType` (singleton: only one `None` object exists)

- `bool` being a subclass of `int` means `True + True == 2`. This is by design (PEP 285) and occasionally causes bugs: `isinstance(True, int)` returns `True`.
- `dict` ordering is an implementation detail in CPython 3.6 and a language guarantee since 3.7. If you are writing code that must run on older Python, do not rely on it.
- `memoryview` is critical for high-performance work: it lets you slice bytes without copying data. NumPy uses this concept extensively.
- Python integers have arbitrary precision: `2**10000` works fine. There is no `int` overflow. Under the hood, CPython uses a C struct with a variable-length array of digits.
- “Why can you use a tuple as a dictionary key but not a list? What is the relationship between hashability and immutability?”
- “What happens when you put a `float('nan')` into a set? Can you have duplicate NaN values? Why does `float('nan') == float('nan')` return `False`?”
- “When would you use `bytearray` over `bytes`? Give a network protocol example.”
- “`sys.getsizeof(1)` returns 28 bytes for a single integer. Why is a Python `int` so large compared to a C `int`?”
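The type behaviors listed above can be checked directly. The following is a small sketch with made-up variable names and sample bytes; it touches `bool` arithmetic, big integers, NaN in sets, and zero-copy slicing with `memoryview`.

```python
# bool is a subclass of int, so booleans take part in arithmetic.
true_count = sum([True, False, True])

# int has arbitrary precision: no overflow, just more digits.
big = 2 ** 10000
big_bits = big.bit_length()

# NaN never equals itself, so two distinct NaN objects both survive in a set.
nan_a, nan_b = float("nan"), float("nan")
nan_set = {nan_a, nan_b}

# memoryview slices share memory with the underlying buffer (no copy).
packet = bytearray(b"header:payload")     # hypothetical wire data
body = memoryview(packet)[7:]             # view onto the bytes after "header:"
body[0] = ord("P")                        # writes through to packet
```

Because `body` is a view rather than a copy, the write on the last line changes `packet` itself.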
What is the difference between lists and tuples?
Mutability: Lists are mutable (`append`, `pop`, `sort` in place). Tuples are immutable (cannot change after creation). But a tuple can contain mutable objects: in `t = ([1, 2], [3, 4])` the tuple itself is immutable, but the lists inside it can change. This is a common gotcha.

Performance (real numbers; exact figures vary with CPython version and hardware):
- Tuple creation is ~5-10x faster than list creation for small sizes (CPython optimizes tuple allocation with a free list cache, and a tuple literal made only of constants is folded into a single constant at compile time)
- Tuples use less memory: a 3-element tuple is ~64 bytes vs ~88 bytes for a 3-element list (CPython 3.11), because lists over-allocate space for future `append()` calls
- Tuple element access is marginally faster (simpler C struct, no indirection for resizing metadata)
- `timeit` benchmark: creating `(1,2,3)` takes ~15ns vs `[1,2,3]` at ~50ns on typical hardware

Hashability: A tuple whose elements are all hashable is itself hashable, so you can use a tuple as a dict key for multi-dimensional lookups: `cache[(x, y, z)]`. A list can never be a dict key.

Semantic meaning: In idiomatic Python, tuples represent heterogeneous fixed-structure records (like a row from a database: `(name, age, email)`), while lists represent homogeneous variable-length collections (like a list of users). This is why `namedtuple` exists.

What interviewers are really testing: Whether you understand the why behind choosing one vs the other, not just the syntax difference.

Red flag answer: “Tuples use parentheses and lists use brackets.” This shows zero understanding of the engineering implications.

Follow-up:
- “If tuples are immutable, why can `t = ([1,2],)` have its inner list modified? What does immutability actually mean here?”
- “When would you use a `namedtuple` vs a `dataclass`? What are the trade-offs in terms of memory, mutability, and inheritance?”
- “You have a function returning multiple values. Should you return a tuple or a list? Why? What about a `dataclass`?”
- “Can you hash `(1, [2, 3])`? Why or why not? What exception is raised?”
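The tuple gotchas discussed above (mutable contents, hashability, memory) can be sketched as follows. Names such as `cache` and `point` are illustrative only.

```python
import sys

point = (1, 2, 3)
items = [1, 2, 3]

# A tuple of hashable elements works as a dict key.
cache = {(0, 0, 1): "cached result"}

# A tuple holding lists is still immutable as a container...
t = ([1, 2], [3, 4])
t[0].append(99)              # allowed: mutates the inner list, not the tuple

# ...but it is not hashable, because one of its elements is not.
try:
    hash(t)
    hashable = True
except TypeError:
    hashable = False

# Rebinding a slot of the tuple is what immutability actually forbids.
try:
    t[0] = [0]
    rebound = True
except TypeError:
    rebound = False

tuple_is_smaller = sys.getsizeof(point) < sys.getsizeof(items)
```

Immutability here is shallow: the tuple fixes which objects it refers to, not what those objects contain.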
Explain Python's mutable and immutable types
Mutable types (modified in place): `list`, `dict`, `set`, `bytearray`, and most user-defined objects.

Immutable types (create new objects on modification): `int`, `float`, `str`, `tuple`, `frozenset`, `bytes`, `bool`, `NoneType`.

Why this matters in real code:
- The mutable default argument trap, the single most common Python bug: a default such as `items=[]` is evaluated once, when the function is defined, and is then shared by every call. Fix: use `None` as the default and create a new list inside the function. This bug has caused production incidents at companies processing millions of requests where a shared default list accumulated data across HTTP requests.
- Dictionary keys must be hashable, which in practice means immutable. If you mutate an object used as a dict key in a way that changes its hash, the dict breaks silently: you can no longer find the key.
- Pass-by-object-reference implications: When you pass a list to a function, the function gets a reference to the same list object. Modifications inside the function affect the caller. With immutable types, operations create new objects, so the caller is unaffected.
- Thread safety: Immutable objects are inherently thread-safe because they cannot change. Mutable shared state requires locks.
- String building: Because `str` is immutable, concatenation in a loop creates a new string each time. Use `''.join()` instead for O(n) vs O(n^2).
- “Show me a bug caused by mutability that would be hard to catch in code review. How would you write a test for it?”
- “How does string immutability affect performance when building a large string in a loop? What is the fix, and what is the Big-O difference?”
- “Is a tuple always immutable? What about `tuple([mutable_list])`? Can you `hash()` a tuple that contains a list?”
- “You are designing a configuration object for a Flask app. Should its attributes be mutable or immutable? What pattern would you use?”
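The mutable default argument trap described above can be reproduced in a few lines. This is a minimal sketch; the function names are hypothetical.

```python
def add_item_buggy(item, bucket=[]):     # the default list is created once, at def time
    bucket.append(item)
    return bucket


def add_item_fixed(item, bucket=None):
    if bucket is None:                   # build a fresh list on every call
        bucket = []
    bucket.append(item)
    return bucket


first = add_item_buggy("a")
second = add_item_buggy("b")             # appends to the same list as the first call

fixed_first = add_item_fixed("a")
fixed_second = add_item_fixed("b")

# Building a string: collect the parts, then join once (linear time overall).
joined = "".join(str(i) for i in range(5))
```

A regression test for this bug only needs to call the function twice without passing the argument and compare the results.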
What are Python's identity operators?
`is` and `is not` check whether two variables point to the same object in memory (identity), not whether they have the same value (equality).

The critical distinction: `==` calls `__eq__` and compares values, while `is` compares object identity. Two separately built lists with the same contents are `==` but not `is`.

Where `is` should actually be used:
- `None` checks: Always use `if x is None`, never `if x == None`. Why? Because `==` calls `__eq__`, which a class could override to return `True` for `None` comparison. `is` checks identity directly and cannot be overridden. PEP 8 explicitly mandates this.
- Sentinel values: When you create a unique sentinel like `_MISSING = object()`, use `is` to check against it.

The interning trap: CPython caches small integers (-5 through 256), so `is` can appear to work on small numbers and then fail on larger ones. The result also depends on context: in a `.py` file, the compiler may constant-fold and reuse the same object, giving different results than the REPL.

What interviewers are really testing: Whether you know when to use `is` vs `==` and can explain the interning behavior.

Red flag answer: Using `is` for general value comparison, or not knowing about `None` check conventions.

Follow-up:
- “Why does PEP 8 require `is None` instead of `== None`? Give me a case where `== None` would give a wrong result.”
- “What is integer interning, and why does `257 is 257` give different results in the REPL vs a script?”
- “How would you create your own sentinel object, and why would you use `is` to compare against it?”
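As a sketch of the identity rules above: the `AlwaysEqual` class and the `lookup` helper below are hypothetical, written only to show why `is None` and `is _MISSING` are the safe checks.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

same_value = a == b        # equal contents
same_object = a is b       # two separate list objects
alias = c is a             # c is just another name for a


class AlwaysEqual:
    """Pathological class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()
eq_none = obj == None      # misleading: __eq__ says yes
is_none = obj is None      # identity cannot be overridden

_MISSING = object()        # unique sentinel, distinct from None


def lookup(mapping, key, default=_MISSING):
    """Return mapping[key]; None is a legal stored value, so use a sentinel."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:
        if default is _MISSING:
            raise KeyError(key)
        return default
    return value
```

The sentinel lets `lookup` tell "key is absent" apart from "key is present with the value `None`".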
What is the difference between global and local variables? (LEGB rule)
Python resolves a name by searching scopes in LEGB order:
- Local: inside the current function
- Enclosing: in the enclosing function (for nested functions/closures)
- Global: at the module level
- Built-in: in Python’s built-in namespace (`len`, `print`, `range`, etc.)

`global` and `nonlocal` keywords: `global` lets a function rebind a module-level name; `nonlocal` lets a nested function rebind a name in the nearest enclosing function scope.

`UnboundLocalError` trap: If there is an assignment to `x` anywhere in the function, `x` is treated as local for the entire function, even before the assignment. Reading it earlier in the function raises `UnboundLocalError`. This catches many developers off guard.

Production advice: Global mutable state is a code smell. In production systems, you should use dependency injection, configuration objects, or module-level constants (uppercase by convention). Flask’s `g` object, Django’s `settings` module: frameworks provide patterns to avoid raw globals.

What interviewers are really testing: Whether you know LEGB, understand `UnboundLocalError`, and have opinions about global state in real systems.

Red flag answer: Only explaining the `global` keyword without mentioning LEGB or the `UnboundLocalError` trap.

Follow-up:
- “What is an `UnboundLocalError` and when does it occur? Show me a 3-line function that triggers it. Why does Python decide scope at compile time, not runtime?”
- “When would you use `nonlocal` vs `global`? Can you give a real example where `nonlocal` is genuinely useful?”
- “In a production Flask app, how would you manage configuration instead of using global variables? Compare `app.config`, environment variables, and dependency injection approaches.”
- “A developer added `logging = custom_logger()` inside a function and now `logging.info()` fails earlier in the same function. What happened?”
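The scope rules above can be sketched with three small functions. All names here are illustrative.

```python
x = "global"


def outer():
    x = "enclosing"

    def inner():
        return x               # not local to inner, so the enclosing scope is used

    return inner()


def read_then_assign():
    value = x                  # x is local here because of the assignment below
    x = "local"
    return value


try:
    read_then_assign()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__


def make_counter():
    count = 0

    def increment():
        nonlocal count         # rebind the variable owned by make_counter
        count += 1
        return count

    return increment


counter = make_counter()
```

The closure returned by `make_counter` is a typical case where `nonlocal` is useful: it keeps private state without a class or a global.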
How does Python handle type casting?
- `int('123')` returns `123`. `int('0xff', 16)` returns `255` (base conversion). `int(3.9)` returns `3` (truncates, does not round).
- `float('3.14')` returns `3.14`. `float('inf')` and `float('nan')` are valid.
- `str(42)` returns `'42'`, and `repr(42)` also returns `'42'`. The difference matters for strings: `str('hello')` returns `hello`, `repr('hello')` returns `'hello'` (with quotes). Use `repr()` for debugging, `str()` for user display.
- `bool()` follows truthiness rules: `bool(0)`, `bool('')`, `bool([])`, `bool(None)` are all `False`. Zero-valued numbers, empty containers, `None`, and objects whose `__bool__` or `__len__` says so are falsy; everything else is truthy.
- `list('abc')` returns `['a', 'b', 'c']`: it iterates the string.

- `3 + 4.0` returns `7.0` (int promoted to float)
- `True + 2` returns `3` (bool is subclass of int)
- `if my_list:` implicitly calls `bool(my_list)`. This is the Pythonic way to check for empty collections

- `int('3.14')` raises `ValueError`: you must do `int(float('3.14'))` for string-to-int via float
- `float('nan') == float('nan')` is `False` (IEEE 754 spec). Use `math.isnan()` instead
- `bool([False])` is `True`: the list is not empty, even though its only element is falsy
- `json.loads()` returns Python `dict`/`list` automatically. JSON integers become `int` with full precision, but numbers with a fraction or exponent become `float`, which can lose precision; pass `parse_float=decimal.Decimal` when exact decimals matter

- “What is the difference between `str()` and `repr()`? When does it matter?”
- “Why does `int('3.14')` fail? How would you safely convert arbitrary user input to an integer?”
- “Explain Python’s truthiness rules. What objects are falsy?”
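The conversion gotchas above can be verified with a short script. The `safe_int` helper is a hypothetical sketch of one way to handle untrusted input, not a standard-library function.

```python
import json
import math
from decimal import Decimal


def safe_int(text, default=None):
    """Convert user input to int, returning default when it is not an integer."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return default


truncated = int(3.9)                  # truncates toward zero
hex_value = int("0xff", 16)
via_float = int(float("3.14"))        # two-step conversion for decimal strings

nan = float("nan")
nan_is_nan = math.isnan(nan)          # the reliable NaN check

non_empty_is_truthy = bool([False])   # the list has one element

payload = json.loads('{"id": 12345678901234567890, "price": 0.1}')
exact_price = json.loads("0.1", parse_float=Decimal)
```

Note that `payload["id"]` stays an exact `int`; only the fractional `price` becomes a `float`.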
What is PEP 8 and why is it important?
- Naming: `snake_case` for functions/variables/modules, `PascalCase` for classes, `UPPER_SNAKE_CASE` for constants, `_single_leading_underscore` for internal use, `__double_leading` for name mangling
- Indentation: 4 spaces per level. PEP 8 prefers spaces and allows tabs only to stay consistent with code that already uses them. In practice this is non-negotiable in the Python community.
- Line length: 79 characters (72 for docstrings). Many teams relax this to 88 (Black’s default) or 100-120 for modern wide monitors.
- Imports: Standard library first, then third-party, then local. One import per line. Tools like `isort` automate this.

- `black`: Opinionated auto-formatter. Zero configuration. “Any color you want, as long as it is black.” Eliminates style debates in code review.
- `ruff`: Blazing-fast linter (written in Rust), replaces `flake8`, `isort`, `pyupgrade`, and dozens of plugins. Adopted by major projects like FastAPI, pandas, and Airflow.
- `mypy` / `pyright`: Not PEP 8 per se, but type checking is now standard in serious Python projects. PEP 484 type hints are checked by these tools.
- Pre-commit hooks: Run `black` + `ruff` + `mypy` before every commit. This is the industry-standard setup for Python projects in 2024+.
- CI/CD enforcement: Fail the build if linting does not pass. No exceptions.

- “Walk me through the linting/formatting setup on your last project. What tools did you use?”
- “Your team disagrees on line length (79 vs 120). How do you resolve this?”
- “What is the difference between a linter (`ruff`/`flake8`) and a formatter (`black`)? Why do you need both?”
What is the Zen of Python?
The Zen of Python (PEP 20) is a collection of 19 guiding aphorisms by Tim Peters. You can read it by running `import this`.

The principles that actually matter in interviews (and why):
- “Explicit is better than implicit”: This is why Python does not have implicit type coercion like JavaScript (`"1" + 1` raises `TypeError` in Python, returns `"11"` in JS). This principle also explains why `import *` is discouraged.
- “Simple is better than complex”: Prefer `for item in items` over `for i in range(len(items))`. Prefer list comprehensions over manual loops when they are readable. But: “Complex is better than complicated”, so sometimes a class is clearer than a nested dict of functions.
- “Readability counts”: This is why Python uses significant whitespace, why `and`/`or`/`not` exist instead of `&&`/`||`/`!`, and why the community obsesses over naming.
- “Errors should never pass silently”: Never use bare `except: pass`. Log it, re-raise it, handle it specifically. This principle directly informs exception handling best practices.
- “There should be one, and preferably only one, obvious way to do it”: Contrast with Perl’s TMTOWTDI (“There is more than one way to do it”). Python values consistency.
- “If the implementation is hard to explain, it is a bad idea”: If your clever metaprogramming requires a 500-word comment, refactor it.
- “Give me an example of real code where ‘explicit is better than implicit’ changed your design decision.”
- “When does ‘simple is better than complex’ conflict with ‘practicality beats purity’? How do you decide?”
- “How would you use the Zen of Python to argue for or against using a metaclass in a code review?”
What is the difference between deep equality and structural pattern matching?
Deep equality (`==`) recursively compares values. Python’s `==` operator calls `__eq__` on objects, and built-in types implement deep comparison by default, so two separately built nested lists or dicts with the same contents compare equal.

Structural pattern matching (PEP 634, Python 3.10+) uses `match`/`case` statements to destructure and match against patterns. It is not just a switch statement; it is a fundamentally different tool.

When to use it: to replace chains of `isinstance` checks and nested `if`/`elif` blocks. Before 3.10, parsing a JSON API response with variable shapes required ugly conditional logic. With `match`/`case`, you can destructure the shape directly.

Guards (`if` clauses in `case`) add conditional logic to patterns. Capture variables bind matched values to names. OR patterns (`case "yes" | "y"`) match multiple options.

Gotcha: `case name:` where `name` is a variable does NOT match against the variable’s value. It is a capture pattern that always matches and binds the value to `name`. Use `case MyEnum.VALUE:` for value matching, or use a guard: `case x if x == name:`.

What interviewers are really testing: Whether you know modern Python features and can recognize when pattern matching simplifies code vs when it is overkill.

Red flag answer: “It is just Python’s version of a switch statement.” Pattern matching is far more powerful.

Follow-up:
- “When would pattern matching be clearer than if/elif chains? When would it be worse?”
- “What is the capture variable gotcha in `match`/`case`, and how do you avoid it?”
- “How would you use pattern matching to parse different shapes of JSON API responses?”
What is the walrus operator (:=) and when should you use it?
The walrus operator (`:=`, PEP 572, Python 3.8+) is an assignment expression: it assigns a value to a variable as part of an expression, rather than as a standalone statement.

Where it genuinely helps: `while` loops that read until the input is exhausted, `if` statements that test a match or lookup result and then use it, and comprehensions that filter on a computed value and keep that value.

Where to avoid it:
- When it hurts readability. `x := expr` inside a complex expression makes code harder to parse visually.
- In simple assignments. `x := 5` is worse than `x = 5`: it adds noise.
- PEP 572 itself warns against overuse. Guido van Rossum stepped down as BDFL partly because of the contentious debate around this feature.

- “Show me a case where the walrus operator makes code genuinely clearer, and one where it makes it worse.”
- “Why was PEP 572 so controversial in the Python community?”
- “How does `:=` differ from `=` in terms of scope inside comprehensions?”
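Here is a short sketch of three common walrus patterns. `slow_square` is a hypothetical stand-in for an expensive call, and the regular expression is only an example.

```python
import io
import re


def extract_version(line):
    # Test the match and use it in the same if statement.
    if (m := re.search(r"v(\d+)\.(\d+)", line)) is not None:
        return int(m.group(1)), int(m.group(2))
    return None


def chunk_sizes(stream, size=4):
    # Read fixed-size chunks until read() returns an empty (falsy) result.
    sizes = []
    while chunk := stream.read(size):
        sizes.append(len(chunk))
    return sizes


def slow_square(n):
    return n * n


# Compute once, filter on the result, and keep it.
big_squares = [sq for n in range(6) if (sq := slow_square(n)) > 5]
```

Unlike the loop variable `n`, the walrus target `sq` is bound in the scope that contains the comprehension, which is the scoping difference the last follow-up question asks about.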
2. Data Structures
How do you create and manipulate lists?
- `append(item)`: O(1) amortized. Uses over-allocation strategy: when the internal array is full, CPython allocates ~12.5% more space. This is why `append` is fast on average but occasionally triggers a resize.
- `insert(0, item)`: O(n). Every element must shift. If you are doing frequent left-inserts, use `collections.deque` instead (O(1) on both ends).
- `pop()`: O(1) from end, O(n) from beginning (`pop(0)`). Again, `deque` is better for FIFO patterns.
- `remove(item)`: O(n). Linear search + shift.
- `in` operator: O(n). If you need fast membership testing, use a `set` instead.
- `sort()`: O(n log n). Uses Timsort (hybrid merge sort + insertion sort). Stable sort. `sorted()` returns a new list; `sort()` modifies in place.
- Slice `lst[a:b]`: O(b-a). Creates a shallow copy of the slice.

- “Your code does `list.insert(0, item)` in a loop processing 1M items. What is the total complexity, and how do you fix it?”
- “What is the difference between `sort()` and `sorted()`? When would you use each?”
- “How does CPython’s list over-allocation strategy work, and why does it matter?”
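A brief sketch of the operations above, using made-up sample data: `deque` for cheap left-side operations, `sort()` versus `sorted()`, sort stability, and slicing.

```python
from collections import deque

# deque gives O(1) appends and pops at both ends; list.insert(0, x) is O(n).
queue = deque()
for i in range(5):
    queue.append(i)
first = queue.popleft()
queue.appendleft(-1)

nums = [3, 1, 2]
ordered = sorted(nums)        # returns a new list, nums is untouched
in_place_result = nums.sort() # sorts nums itself and returns None

# Stable sort: records with equal keys keep their original relative order.
records = [("bob", 2), ("alice", 1), ("carol", 2)]
by_score = sorted(records, key=lambda record: record[1])

tail = nums[1:]               # slicing builds a new (shallow) list
tail.append(99)
```

Replacing `list.insert(0, item)` with `deque.appendleft(item)` turns a quadratic loop into a linear one.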
What are list comprehensions?
A list comprehension builds a list from an iterable in a single expression: `[expression for item in iterable if condition]`

Why they are faster than equivalent loops: the comprehension appends with a dedicated `LIST_APPEND` bytecode instruction, avoiding the overhead of looking up and calling the `append` method on each iteration.

Advanced patterns: the same syntax supports nested loops, conditional expressions, and dict or set comprehensions.

When not to use them:
- When the logic requires multiple statements or side effects
- When the comprehension exceeds ~80 characters or nests more than 2 levels deep, because readability drops fast
- When you do not need the full list in memory. Use a generator expression `(x**2 for x in range(10_000_000))` instead to avoid allocating a massive list

Memory cost: `[process(x) for x in range(10_000_000)]` creates a 10M-element list in memory. If you are only iterating once, use a generator expression or `map()`.

What interviewers are really testing: Whether you know the performance implications and readability boundaries, not just the syntax.

Red flag answer: Writing deeply nested comprehensions as a flex. One-line does not mean readable.

Follow-up:
- “When should you use a generator expression `(...)` instead of a list comprehension `[...]`?”
- “This 3-level nested comprehension is hard to read. Refactor it to be clearer.”
- “What is the bytecode difference between a list comprehension and an equivalent for loop? Why is the comprehension faster?”
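To illustrate the trade-offs above, this sketch compares a list comprehension with a generator expression and shows a nested and a dict comprehension. The data is arbitrary.

```python
import sys

squares_list = [x * x for x in range(1000)]     # materializes all 1000 values
squares_gen = (x * x for x in range(1000))      # produces values lazily

# The generator object stays small no matter how long the range is.
gen_is_smaller = sys.getsizeof(squares_gen) < sys.getsizeof(squares_list)

# Aggregations can consume a generator directly, with no intermediate list.
total = sum(x * x for x in range(1000))

# Nested comprehension: the for clauses read left to right, like nested loops.
matrix = [[1, 2], [3, 4], [5, 6]]
flat = [cell for row in matrix for cell in row]

# Dict comprehension with a filter.
even_squares = {n: n * n for n in range(6) if n % 2 == 0}
```

Once a comprehension needs more than two `for` clauses, a plain loop or a helper function is usually easier to read.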
Explain Python dictionaries and their methods
- `get(key, default)`: Returns `default` (or `None`) if the key is missing. Avoids `KeyError`. Use this over `dict[key]` when the key might not exist.
- `setdefault(key, default)`: Returns the value if the key exists, otherwise sets the key to `default` and returns it. Atomic operation useful for building groupings, for example `groups.setdefault(key, []).append(item)`. `collections.defaultdict` is usually cleaner for this pattern.
- `update(other_dict)`: Merges another dict. Since Python 3.9, you can use the `|` operator: `merged = dict1 | dict2` (`dict2` wins on conflicts).
- `pop(key, default)`: Removes and returns the value. Raises `KeyError` if the key is missing and no default is given.
- `items()` / `keys()` / `values()`: Return view objects (not copies). Views reflect changes to the dict. Do not add or remove keys while iterating a view; doing so raises `RuntimeError`.

- Lookup, insert, delete: O(1) average
- Memory: ~50-70 bytes per key-value pair (CPython 3.11)
- Hash function: CPython uses SipHash for strings (PEP 456, Python 3.4+) with a per-process random seed, which resists hash-flooding DoS attacks (hash randomization was introduced in response to CVE-2012-1150)
- Keys must be hashable (implement `__hash__` and `__eq__`). Mutable objects like lists cannot be keys.

What interviewers are really testing: Whether you know when to reach for `dict`, `defaultdict`, `OrderedDict`, and `Counter`.

Red flag answer: “Dictionaries store key-value pairs.” This is the level of a beginner tutorial.

Follow-up:
- “What happens if two keys have the same hash? How does Python resolve collisions?”
- “When would you use `collections.defaultdict` vs `dict.setdefault()`?”
- “Your dict lookup is O(1) in theory but slow in practice. What could cause this?”
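The dictionary methods above can be sketched as follows. The merge operator requires Python 3.9 or newer, and the keys and values are invented sample data.

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Grouping with setdefault: create the list on first sight of each key.
groups = {}
for word in words:
    groups.setdefault(word[0], []).append(word)

defaults = {"debug": False, "port": 8000}
overrides = {"port": 9000}
merged = defaults | overrides              # right-hand side wins on conflicts

host = merged.get("host", "localhost")     # read-only fallback, no insertion
removed = merged.pop("missing", None)      # default avoids KeyError

keys_view = merged.keys()
merged["host"] = "example.internal"        # hypothetical value
view_sees_new_key = "host" in keys_view    # views are live, not snapshots

# Changing a dict's size while iterating it is an error.
scratch = {"a": 1, "b": 2}
try:
    for key in scratch:
        scratch[key + "_copy"] = 0
    resize_error = None
except RuntimeError as exc:
    resize_error = str(exc)
```

To modify a dict during a loop, iterate over a snapshot such as `list(scratch)` instead.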
What is the difference between sets and frozensets?
`set` is mutable and therefore unhashable; `frozenset` is its immutable, hashable counterpart. Both are hash-based and share the same performance profile:
- Membership testing (`in`): O(1) average. This is the primary reason to use sets: `if item in my_set` is O(1) vs O(n) for lists.
- Add/remove: O(1) average (sets only).
- Union/intersection/difference: O(min(len(s1), len(s2))) for intersection, O(len(s1) + len(s2)) for union.

Because a `frozenset` is hashable, it fits where a `set` cannot:
- Cache keys representing a set of features or permissions
- Dict keys for memoization where the input is a collection of unique items
- Elements of other sets (e.g., a set of sets for graph algorithms)
- Immutable configuration that should not be modified after initialization
- “Why cannot you use a regular set as a dictionary key?”
- “You need to check if a user has any of 10,000 banned words in their input. What data structure do you use?”
- “What happens to set performance when every element has the same hash?”
How do you perform set operations?
- Union (`s1 | s2` or `s1.union(s2)`): O(len(s1) + len(s2)). All elements from both sets.
- Intersection (`s1 & s2` or `s1.intersection(s2)`): O(min(len(s1), len(s2))). Only elements in both.
- Difference (`s1 - s2` or `s1.difference(s2)`): O(len(s1)). Elements in s1 but not s2.
- Symmetric difference (`s1 ^ s2`): O(len(s1) + len(s2)). Elements in either but not both.
- Subset/Superset (`s1 <= s2`, `s1 >= s2`): O(len(s1)). Check containment.

- “You have two lists of 1M user IDs each. How do you find users in both lists? What is the complexity?”
- “What is the difference between `s1 - s2` and `s2 - s1`? When does order matter?”
- “How would you use set operations to implement a simple permissions system?”
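As one possible answer to the permissions question, here is a minimal sketch built on set operations. The role names and permission strings are hypothetical.

```python
admin = frozenset({"read", "write", "delete"})
editor = frozenset({"read", "write"})
requested = {"read", "delete"}

admin_allows = requested <= admin            # subset test: every request is covered
editor_missing = requested - editor          # what the editor role lacks
shared = admin & editor                      # permissions both roles have
only_one_role = admin ^ editor               # permissions held by exactly one role

# frozenset is hashable, so a permission set can itself be a dict key.
role_names = {admin: "admin", editor: "editor"}

# A plain set is not hashable.
try:
    hash({"read"})
    set_hashable = True
except TypeError:
    set_hashable = False

# Finding common IDs in two lists: convert to sets, then intersect.
ids_a = [1, 2, 3, 4]
ids_b = [3, 4, 5]
common = set(ids_a) & set(ids_b)
```

Difference is not symmetric: `requested - editor` and `editor - requested` answer two different questions.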
What is the difference between list, tuple, and set?
| Feature | list | tuple | set |
|---|---|---|---|
| Ordered | Yes | Yes | No |
| Mutable | Yes | No | Yes (but elements must be hashable) |
| Duplicates | Allowed | Allowed | Not allowed |
| Indexable | Yes, O(1) | Yes, O(1) | No |
| Membership test | O(n) | O(n) | O(1) |
| Hashable | No | Yes (if elements are hashable) | No (use frozenset) |
| Memory | Medium (over-allocates) | Lowest (fixed size) | Highest (sparse hash table) |
- List: When you need ordered, mutable, indexed access. Default choice for collections. Example: a list of user records you will sort, filter, paginate.
- Tuple: When data is fixed and should not change. Function return values, dict keys, named records. Example: `(latitude, longitude)` coordinate pairs. Also used for heterogeneous data `(name, age, email)` vs lists for homogeneous data (list of names).
- Set: When you need fast membership testing or deduplication. Example: checking if a URL has been visited, deduplicating a list of emails, computing permission intersections.

- `collections.deque`: When you need O(1) append/pop from both ends (queue/stack)
- `collections.OrderedDict`: When you need a dict with explicit ordering control, such as `move_to_end()` for an LRU cache
- `collections.Counter`: When you need to count occurrences
- `collections.defaultdict`: When you want automatic default values

- “You are processing 10M log entries and need to find unique IPs. Which data structure and why?”
- “Your function returns `(success, data, error)`. Should this be a tuple, list, or dict?”
- “When would you reach for `collections.deque` instead of a list?”
What is the difference between deepcopy and shallow copy?
Shallow copy (`copy.copy()`, list slicing `lst[:]`, `list(lst)`, `dict.copy()`): Creates a new container object but fills it with references to the same nested objects. Only the top-level container is new.

Deep copy (`copy.deepcopy()`): Recursively copies the nested containers as well (immutable atoms such as ints and strings are reused), so the result is independent of the original.

- Circular references: `deepcopy` handles them correctly by tracking already-copied objects with a memo dict. A naive recursive copy would infinite-loop.
- Custom objects: You can customize copy behavior by implementing `__copy__()` and `__deepcopy__()` methods. SQLAlchemy models, for example, need special handling because they carry database session state.
- Performance: `deepcopy` is slow because it must traverse the entire object graph. On a deeply nested dict with 100K entries, it can take hundreds of milliseconds. In a hot path, consider whether you actually need a full deep copy or can restructure to avoid it.
- Hidden shallow copies: `lst[:]` slicing, `dict.copy()`, and the `list()` constructor all create shallow copies. Many developers assume these are deep copies.

When nested state must not be shared (for example, a config built from a shared template), use `config = copy.deepcopy(config_template)` or restructure to use immutable data.

What interviewers are really testing: Whether you have been burned by shallow copy bugs and can explain the reference model.

Red flag answer: Defining the difference without mentioning a real bug scenario or knowing about circular reference handling.

Follow-up:
- “Show me a production bug caused by shallow copy. How would you find it?”
- “`deepcopy` is too slow for your hot path. What alternatives exist?”
- “How does `deepcopy` handle circular references internally?”
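The difference between the two kinds of copy shows up as soon as a nested object is mutated. This sketch uses a made-up `template` dict and a self-referencing `node`.

```python
import copy

template = {"name": "svc", "tags": ["a"], "limits": {"cpu": 1}}

shallow = copy.copy(template)
deep = copy.deepcopy(template)

shallow["tags"].append("b")      # the inner list is shared with template
deep["limits"]["cpu"] = 8        # the deep copy owns its own nested dict

shares_inner_list = shallow["tags"] is template["tags"]
deep_is_independent = deep["limits"] is not template["limits"]

# deepcopy copes with circular references by remembering what it has copied.
node = {"name": "root"}
node["self"] = node
clone = copy.deepcopy(node)
cycle_preserved = clone["self"] is clone and clone is not node
```

The shallow-copy bug is visible in `template["tags"]`, which now contains `"b"` even though only `shallow` was modified.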
What is collections.defaultdict and when should you use it over a regular dict?
`defaultdict` (from `collections`) automatically creates a default value when you access a missing key, eliminating the need for existence checks.

How it works internally: `defaultdict` implements `__missing__()`. When `__getitem__` fails to find a key, it calls the factory function, stores the result, and returns it. This means accessing a missing key creates that key, which is a subtle difference from `dict.get()`.

`defaultdict` vs alternatives:
- `dict.setdefault(key, [])`: similar but more verbose. The default argument is evaluated on every call, so a new list is built even if the key exists.
- `dict.get(key, default)`: does NOT insert the key into the dict. Read-only fallback.
- `collections.Counter`: a `dict` subclass that behaves like `defaultdict(int)` for counting, with extra methods like `most_common()`. Unlike `defaultdict`, reading a missing key returns 0 without inserting it.

Gotcha: Because `defaultdict` creates entries on access, merely reading keys that do not exist (for example, `d[key]` inside a lookup loop) will pollute the dict with empty entries. For read-only lookups, use `dict.get()` instead.

What interviewers are really testing: Whether you know the right tool for grouping/counting patterns and understand the side effects.

Red flag answer: Not knowing `defaultdict` exists, or using manual `if key in dict` checks everywhere.

Follow-up:
- “What is the difference between `defaultdict(list)` and using `dict.setdefault(key, [])`?”
- “When would `defaultdict` create unexpected entries? How do you prevent it?”
- “How would you implement a `defaultdict` from scratch using `__missing__`?”
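The sketch below shows the insert-on-access side effect and one way to answer the last follow-up. `ListDefault` is a hypothetical, deliberately minimal class; the real `defaultdict` takes an arbitrary factory.

```python
from collections import Counter, defaultdict


class ListDefault(dict):
    """Minimal defaultdict(list) look-alike built on the __missing__ hook."""

    def __missing__(self, key):
        value = self[key] = []     # store the new default, then hand it back
        return value


groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)

# Reading a missing key with [] inserts it; .get() does not.
probe = defaultdict(int)
_ = probe["ghost"]
ghost_inserted = "ghost" in probe
other_value = probe.get("other")
other_inserted = "other" in probe

home_made = ListDefault()
home_made["x"].append(1)

# Counter returns 0 for unseen keys without storing them.
counts = Counter("banana")
top = counts.most_common(1)
unseen = counts["z"]
```

`dict.__getitem__` only calls `__missing__` for `d[key]` lookups, which is why `.get()` stays side-effect free.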
3. Object-Oriented Programming
What are the four pillars of OOP in Python?
1. Encapsulation: Python has no `private` or `protected` keywords. Instead:
- Single underscore `_name`: Convention for “internal use.” Not enforced. Other code can access it but should not.
- Double underscore `__name`: Triggers name mangling. Python rewrites it to `_ClassName__name`. This prevents accidental override in subclasses, but it is NOT true privacy (you can still access it if you know the mangled name).
- The philosophy: “We are all consenting adults here.” Python trusts developers to respect conventions.

2. Abstraction: Achieved through abstract base classes (`abc.ABC` or `abc.ABCMeta`). You cannot instantiate a class with unimplemented abstract methods. Also achieved through Python 3.8+ `Protocol` classes (structural subtyping, no inheritance required).

3. Inheritance: Python supports multiple inheritance with C3 linearization MRO. But the community strongly prefers composition over inheritance. Deep inheritance hierarchies are a code smell in Python. Use mixins sparingly.

4. Polymorphism: In Python, this is primarily duck typing: “If it quacks like a duck…” You do not need inheritance for polymorphism. Any object with a `read()` method works wherever a file-like object is expected. `typing.Protocol` formalizes this.

What makes Python’s OOP unique: It is opt-in and flexible. You can write purely procedural Python, purely functional Python, or full OOP. Most production Python is a pragmatic mix.

What interviewers are really testing: Whether you understand how Python’s OOP differs from Java/C++ and can articulate the philosophy, not just the definitions.

Red flag answer: Describing the four pillars exactly as a Java textbook would. Python’s implementation is fundamentally different.

Follow-up:
- “Python has no `private` keyword. How do you prevent other developers from accessing internal state?”
- “When would you use `abc.ABC` vs `typing.Protocol`?”
- “Give me an example where duck typing is more Pythonic than inheritance-based polymorphism.”
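A compact sketch of three of the ideas above: name mangling, abstract base classes, and `Protocol`-based duck typing. All class names are invented for illustration.

```python
from abc import ABC, abstractmethod
from typing import Protocol


class Account:
    def __init__(self, balance):
        self._balance = balance    # "internal" by convention only
        self.__pin = "0000"        # stored as _Account__pin (name mangling)


acct = Account(10)
mangled_pin = acct._Account__pin           # still reachable if you know the name
plain_name_exists = hasattr(acct, "__pin")


class Shape(ABC):
    @abstractmethod
    def area(self):
        ...


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


try:
    Shape()
    abstract_instantiated = True
except TypeError:
    abstract_instantiated = False


class SupportsRead(Protocol):
    def read(self) -> str:
        ...


class FakeFile:                    # no inheritance from SupportsRead needed
    def read(self) -> str:
        return "data"


def slurp(source: SupportsRead) -> str:
    return source.read()
```

`FakeFile` satisfies `SupportsRead` structurally, which a type checker can verify without any shared base class.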
How do you create a class in Python?
You define a class with the class keyword, but a strong answer covers the full lifecycle of Python object creation and modern alternatives.
What happens when you call Person("Alice", 30):
- Person.__new__(Person, "Alice", 30) is called to create the instance (allocates memory)
- Person.__init__(instance, "Alice", 30) initializes the instance
- The instance is returned
You rarely need to override __new__, but it is essential for immutable types (you cannot modify self in __init__ for immutable objects) and Singleton patterns.
Modern alternatives (what you should actually use in 2024+):
- @dataclass: Most data-holding classes. Default choice for DTOs, configs, records.
- NamedTuple: When you want immutability and tuple compatibility (hashable, can be dict key).
- Regular class: When you need complex behavior, custom __init__ logic, or non-trivial methods.
- attrs (third-party): More features than dataclass (validators, converters). Used by major projects like pytest.
What interviewers are really testing: Whether you reach for @dataclass instead of writing boilerplate __init__/__repr__/__eq__ by hand.
Red flag answer: Writing a class with manual __init__, __repr__, and __eq__ when @dataclass would do it in 3 lines.
Follow-up:
- “What is the difference between __new__ and __init__? When would you override __new__?”
- “Compare @dataclass vs NamedTuple vs attrs. When would you choose each?”
- “How does @dataclass(frozen=True, slots=True) work, and when would you use those options?”
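The creation steps and the @dataclass alternative described in this answer can be sketched as follows. The Person and PersonRecord classes are illustrative names, and the explicit __new__ is written out only to make the two-step lifecycle visible; normally you would not define it.

```python
from dataclasses import dataclass


class Person:
    def __new__(cls, *args, **kwargs):
        # Step 1: allocate the instance (object.__new__ accepts only cls here).
        return super().__new__(cls)

    def __init__(self, name: str, age: int) -> None:
        # Step 2: initialize the freshly created instance.
        self.name = name
        self.age = age


@dataclass(frozen=True)
class PersonRecord:
    # __init__, __repr__ and __eq__ are generated; frozen=True also adds __hash__.
    name: str
    age: int


alice = Person("Alice", 30)
record = PersonRecord("Alice", 30)
```

The frozen dataclass compares by value and can be used as a dict key, which the hand-written Person class does not do without extra methods.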
Explain instance, class, and static methods
Instance methods: Take self as first parameter. Operate on instance data. Can access both instance and class attributes.
Class methods: Decorated with @classmethod. Take cls as first parameter. Operate on the class itself, not instances. The killer use case is alternative constructors.
The cls parameter is crucial: if a subclass SpecialDate(Date) calls SpecialDate.from_string(), it returns a SpecialDate instance, not a Date, because cls is bound to the class the method was called on. This is why @classmethod exists instead of just using regular functions.
Static methods: Decorated with @staticmethod. Take neither self nor cls. Pure utility functions that logically belong to the class namespace.
Most @staticmethod uses should just be module-level functions. @staticmethod only makes sense when the function is conceptually bound to the class.
What interviewers are really testing: Whether you know when to use @classmethod (alternative constructors, factory methods) vs @staticmethod (rare) vs instance methods (default).
Red flag answer: Defining all three without giving the alternative constructor use case for @classmethod.
Follow-up:
- “Why does @classmethod use cls instead of the class name directly? What breaks if you hardcode the class name?”
- “When would a @staticmethod be better as a module-level function?”
- “How would you implement dict.fromkeys() using @classmethod?”
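A minimal sketch of the alternative-constructor pattern discussed in this answer. The Date and SpecialDate classes and the "YYYY-MM-DD" input format are assumptions made for illustration.

```python
class Date:
    def __init__(self, year: int, month: int, day: int) -> None:
        self.year, self.month, self.day = year, month, day

    @classmethod
    def from_string(cls, text: str) -> "Date":
        # cls is whichever class the method was called on, so subclasses
        # get instances of themselves without overriding this method.
        year, month, day = (int(part) for part in text.split("-"))
        return cls(year, month, day)

    @staticmethod
    def is_leap_year(year: int) -> bool:
        # Needs neither self nor cls: a utility kept in the class namespace.
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


class SpecialDate(Date):
    pass


special = SpecialDate.from_string("2024-02-29")
plain = Date.from_string("2024-01-01")
```

Had from_string hardcoded Date(...) instead of cls(...), the SpecialDate call would silently return a plain Date.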
What is inheritance in Python?
super() is NOT “call the parent”. It is “call the next class in the MRO chain”. In a diamond hierarchy where D inherits from B and C, and both inherit from A, super() in B (called on a D instance) calls C’s method, not A’s. This is cooperative multiple inheritance, and misunderstanding it causes bugs.
When to use inheritance vs composition:
- Inheritance: True “is-a” relationships. Dog IS an Animal. Interface implementation via ABCs.
- Composition: “has-a” relationships. Car HAS an Engine. Prefer this in most cases.
- Mixins: Small, focused classes that add specific behavior. LoggingMixin, SerializableMixin. Keep them single-purpose.
What interviewers are really testing: Whether you understand the MRO, super() in multiple inheritance, and the preference for composition.
Red flag answer: “Inheritance lets you reuse code.” Without understanding MRO or the composition alternative.
Follow-up:
- “In the diamond problem, what does super() in class B actually call? Walk me through the MRO.”
- “When would multiple inheritance cause problems that composition would not?”
- “How do mixins work, and what rules should you follow when designing them?”
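The claim that super() follows the MRO rather than the direct parent can be checked with a small diamond. The class names A, B, C, D and the greet method are placeholders for illustration.

```python
class A:
    def greet(self) -> list:
        return ["A"]


class B(A):
    def greet(self) -> list:
        # On a D instance, the next class after B in the MRO is C, not A.
        return ["B"] + super().greet()


class C(A):
    def greet(self) -> list:
        return ["C"] + super().greet()


class D(B, C):
    def greet(self) -> list:
        return ["D"] + super().greet()


diamond_order = D().greet()   # each class appears exactly once
plain_b_order = B().greet()   # on a plain B instance, super() does reach A
```

The same super() call in B resolves differently depending on the type of the instance, which is why every class in a cooperative hierarchy should call super().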
What are magic methods (dunder methods)?
Object lifecycle: __init__ (initializer), __new__ (constructor), __del__ (finalizer; avoid using this, use context managers instead).
String representations:
- __str__: called by str() and print(). Human-readable. Example: "Alice (age 30)".
- __repr__: called by repr() and in the REPL. Developer-readable, ideally eval-able. Example: "Person('Alice', 30)". Rule: always implement __repr__. If you only implement one, implement __repr__, because __str__ falls back to it.
Comparison: __eq__, __lt__, __le__, __gt__, __ge__, __ne__. Use @functools.total_ordering to implement only __eq__ and __lt__, and get the rest for free.
Container protocol: __len__, __getitem__, __setitem__, __delitem__, __contains__, __iter__. Implementing these makes your object work with len(), [] indexing, the in operator, and for loops.
Arithmetic: __add__, __sub__, __mul__, etc. Also reverse versions (__radd__) for when your object is on the right side of the operator.
Context manager: __enter__, __exit__ enable with statement support.
Callable: __call__ makes instances callable like functions. Useful for stateful callables and decorators implemented as classes.
Hashing: __hash__ is required for dict keys and set membership. Important rule: if you define __eq__ without defining __hash__, Python sets __hash__ to None (making objects unhashable). You must explicitly define __hash__ if you want hashability with custom equality.
What interviewers are really testing: Whether you can design classes that integrate naturally with Python’s ecosystem, not just classes with methods.
Red flag answer: Listing dunder methods without explaining the protocols they implement or the __repr__ vs __str__ distinction.
Follow-up:
- “You define __eq__ on your class. What happens to __hash__? What breaks?”
- “How would you make a custom class work with for loops? What methods do you need?”
- “What is the difference between __str__ and __repr__? Which should you always implement?”
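As an illustration of several of these protocols working together, here is a small container sketch. The Playlist class is a hypothetical example, not something from a library.

```python
class Playlist:
    def __init__(self, name, tracks):
        self.name = name
        self._tracks = tuple(tracks)   # immutable, so it is safe to hash

    def __repr__(self):
        return f"Playlist({self.name!r}, {list(self._tracks)!r})"

    def __eq__(self, other):
        if not isinstance(other, Playlist):
            return NotImplemented
        return (self.name, self._tracks) == (other.name, other._tracks)

    def __hash__(self):
        # Defining __eq__ alone would set __hash__ to None.
        return hash((self.name, self._tracks))

    def __len__(self):
        return len(self._tracks)

    def __iter__(self):
        return iter(self._tracks)

    def __contains__(self, track):
        return track in self._tracks


mix = Playlist("mix", ["a", "b"])
```

With these methods in place the object works with len(), for loops, the in operator, set membership, and prints a useful repr without any extra code at the call site.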
What are __slots__ in Python classes and why use them?
__slots__ is a class-level declaration that tells Python: “These are the only attributes instances of this class will ever have.” It replaces the per-instance __dict__ with a fixed-size struct.
How it works under the hood: By default, every instance carries a __dict__ (a dict storing its attributes). That dict itself costs ~100 bytes per instance (empty dict overhead) plus ~50-70 bytes per attribute. With __slots__, Python stores attributes in a fixed C struct, eliminating the dict entirely.
Real memory savings (CPython 3.11 benchmarks):
- Regular class with 3 attributes: ~152 bytes per instance
- __slots__ class with 3 attributes: ~72 bytes per instance
- At 1 million instances: ~152 MB vs ~72 MB. That is 80 MB saved.
- At 10 million instances (e.g., a graph with 10M nodes): ~800 MB saved. This can be the difference between OOM and running smoothly.
Trade-offs:
- Cannot add attributes dynamically: no monkey patching instance state
- Cannot use __dict__: some serialization libraries expect it
- Inheritance is tricky: if parent has __slots__, child must also declare __slots__ (can be empty __slots__ = ()) or it gets a __dict__ anyway
- Cannot use __weakref__ unless you include it in __slots__
- @dataclass(slots=True) (Python 3.10+): the modern way to get slots without manual declaration
When to use __slots__:
- Classes with millions of instances (nodes in a graph, points in a point cloud, rows in a dataset)
- Internal library classes where the API is fixed
- Performance-critical hot paths
Follow-up:
- “You add __slots__ to a class but it still has a __dict__. What went wrong?”
- “How does __slots__ interact with inheritance? What happens in a child class?”
- “When would __slots__ actually hurt you? Give me a scenario.”
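The behavioral side of these trade-offs is easy to observe directly. This sketch uses made-up class names (PointDict, PointSlots, Child) and deliberately avoids asserting byte counts, since those vary by interpreter version.

```python
class PointDict:
    def __init__(self, x, y):
        self.x, self.y = x, y


class PointSlots:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y


class Child(PointSlots):
    # No __slots__ declared here, so instances get a __dict__ again.
    pass


point = PointSlots(1, 2)
try:
    point.z = 3                      # not listed in __slots__
except AttributeError:
    dynamic_attribute_allowed = False
else:
    dynamic_attribute_allowed = True
```

The Child class shows the most common answer to “why does my slotted class still have a __dict__”: a class somewhere in the hierarchy did not declare __slots__.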
Explain property decorators (@property, setter, deleter)
@property turns method calls into attribute access, giving you controlled getters/setters without changing the API. This is Python’s answer to Java’s getName()/setName() pattern.
Why properties matter:
- API evolution without breaking changes: Start with a simple attribute self.name = name. Later, when you need validation, replace it with @property and all calling code stays the same. No need to change obj.name to obj.get_name().
- Computed attributes: A fahrenheit property can be derived from a stored celsius value. The caller does not need to know it is calculated.
- Lazy loading: Compute expensive values only when accessed, cache the result.
- Validation at the boundary: Enforce invariants when attributes are set.
Under the hood, @property is syntactic sugar for the descriptor protocol (__get__, __set__, __delete__). Understanding descriptors explains how properties, methods, and classmethod/staticmethod all work.
Gotcha: @property on a method that does I/O or expensive computation can surprise callers who expect attribute access to be cheap. If obj.data triggers a database query, it should probably be obj.get_data() or obj.load_data() to signal the cost.
What interviewers are really testing: Whether you understand the Pythonic way to evolve APIs and enforce invariants.
Red flag answer: Describing the syntax without explaining why you would use properties over plain attributes.
Follow-up:
- “You start with self.name = name and later need validation. How do you add it without breaking existing code?”
- “What is the relationship between @property and the descriptor protocol?”
- “When is @property a bad idea? Give me a case where a method is better.”
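A minimal sketch of the validation and computed-attribute uses described in this answer. The Temperature class and its absolute-zero check are illustrative assumptions.

```python
class Temperature:
    def __init__(self, celsius: float) -> None:
        self.celsius = celsius          # routed through the setter below

    @property
    def celsius(self) -> float:
        return self._celsius

    @celsius.setter
    def celsius(self, value: float) -> None:
        # Validation at the boundary: callers still write obj.celsius = x.
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = value

    @property
    def fahrenheit(self) -> float:
        # Computed attribute: derived on access, never stored.
        return self._celsius * 9 / 5 + 32


boiling = Temperature(100)
```

Because assignment in __init__ also goes through the setter, an invalid value is rejected at construction time as well as on later updates.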
How does Python implement Method Resolution Order (MRO)?
The MRO is the order in which Python searches a class hierarchy for attributes and methods, and it determines what is called next by super().
Python uses C3 linearization (since Python 2.3). The algorithm guarantees:
- Children come before parents
- If a class inherits from multiple parents, they are searched in the order listed
- A valid linearization exists (otherwise Python raises TypeError at class definition time)
The key insight about super(): super() does NOT mean “call the parent.” It means “call the next class in the MRO.” This is cooperative multiple inheritance.
Practical rules:
- Always use super() consistently in a class hierarchy, because mixing super() with direct parent calls breaks cooperative inheritance
- Mixins should always call super() in their methods
- Use ClassName.__mro__ or ClassName.mro() to debug method resolution issues
- Consider whether you actually need multiple inheritance or if composition is simpler
What interviewers are really testing: Whether you know that super() follows the MRO, not just the parent, and can trace through a diamond hierarchy.
Red flag answer: “MRO is the order Python searches for methods.” Without being able to trace through a concrete diamond example.
Follow-up:
- “Walk me through the MRO for class D(B, C) where both B and C inherit from A. What does super() in B call?”
- “When does C3 linearization fail? What causes a TypeError?”
- “How would you debug a method resolution issue in a codebase with deep inheritance?”
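Both the inspection tip and the failure case can be reproduced in a few lines. The classes below are empty placeholders, and the failing hierarchy is built with type() only so the error can be caught at runtime.

```python
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

# Inspect the linearization that C3 produced for the diamond.
mro_names = [cls.__name__ for cls in D.__mro__]

try:
    # Listing A before its own subclass B is contradictory: "bases in listed
    # order" wants A first, "children before parents" wants B first.
    type("Broken", (A, B), {})
except TypeError:
    c3_rejected = True
else:
    c3_rejected = False
```

Swapping the bases to (B, A) makes the hierarchy valid again, which is usually the fix when this TypeError shows up in real code.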
4. Functions and Decorators
What are *args and **kwargs?
*args collects extra positional arguments into a tuple. **kwargs collects extra keyword arguments into a dict. But the real depth is in understanding how they interact with Python’s full argument resolution system.
The complete argument order (must be in this order): positional-only parameters (before /), positional-or-keyword parameters, *args, keyword-only parameters, **kwargs.
The / and * separators: / (Python 3.8+) marks the parameters before it as positional-only, and a bare * marks the parameters after it as keyword-only. For example, the built-in len takes obj as positional-only so you can write len(obj) but not len(obj=mylist).
What interviewers are really testing: Whether you understand argument resolution order, unpacking, and real-world patterns like decorator forwarding.
Red flag answer: Only knowing the basic definition without unpacking (*/**) or positional-only/keyword-only parameters.
Follow-up:
- “What is the difference between * and ** in a function call vs a function definition?”
- “Why does Python 3.8 add positional-only parameters with /? Give a real API where this matters.”
- “How would you use *args and **kwargs to implement a decorator that works with any function?”
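To make the parameter order concrete, here is a small sketch showing collection in a definition and unpacking in a call. The describe and forward functions are hypothetical helpers written for this example.

```python
def describe(a, /, b, *args, c, **kwargs):
    # a: positional-only, b: positional-or-keyword,
    # args: extra positionals, c: keyword-only, kwargs: extra keywords
    return {"a": a, "b": b, "args": args, "c": c, "kwargs": kwargs}


collected = describe(1, 2, 3, 4, c=5, extra=6)

# In a call, * and ** do the opposite: they unpack into arguments.
point = (3, 4)
options = {"c": 9}
unpacked = describe(*point, **options)


def forward(func, *args, **kwargs):
    # The forwarding idiom decorators rely on: accept anything, pass it along.
    return func(*args, **kwargs)


forwarded = forward(describe, 1, b=2, c=3)
```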
What are lambda functions?
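Syntax: lambda arguments: expression (no statements, no assignments, no multi-line).
Where lambdas shine: short, throwaway callables passed as arguments, such as key functions for sorted(), min(), and max().
Why named functions (def) are usually better for anything else:
- They show up with their name in stack traces (lambdas show as <lambda>)
- They can have docstrings
- They can contain multiple statements and early returns
- ruff/flake8 flag lambda assignments (E731) for a reason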
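A short comparison of the three options for a sort key, using an invented list of (name, age) tuples. All three produce the same ordering; the difference is in readability and introspection.

```python
from operator import itemgetter

users = [("carol", 35), ("alice", 30), ("bob", 25)]

# Inline lambda: fine for a one-off key function.
by_age_lambda = sorted(users, key=lambda user: user[1])

# operator.itemgetter: same result without writing a function at all.
by_age_getter = sorted(users, key=itemgetter(1))


def age_of(user):
    """Return the age field of a (name, age) tuple."""
    return user[1]


# Named function: carries a real name and docstring into tracebacks and help().
by_age_named = sorted(users, key=age_of)
```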
Prefer the operator module for common operations such as itemgetter, attrgetter, and methodcaller.
Follow-up:
- “When would you use operator.itemgetter instead of a lambda?”
- “A lambda appears in a stack trace as <lambda>. Why is that a problem?”
- “Can a lambda contain an assignment? A conditional? Multiple expressions?”
What are decorators in Python?
What @decorator actually does: placing @decorator above def func is equivalent to writing func = decorator(func) after the definition.
Without @functools.wraps, the decorated function loses its name, docstring, and module, which breaks introspection, help(), Sphinx docs, and debugging. This is the #1 mistake in decorator implementations.
Decorator with arguments (the two-level pattern): an outer function takes the arguments and returns the actual decorator, which in turn wraps the function.
Decorators you will meet in real code:
- @functools.lru_cache(maxsize=128): memoization with LRU eviction. Use for expensive pure functions.
- @functools.cache (Python 3.9+): unbounded memoization (simpler than lru_cache).
- @app.route('/path') (Flask): URL routing.
- @login_required (Django): authentication enforcement.
- @pytest.fixture: test dependency injection.
- @contextmanager: turn a generator into a context manager.
What interviewers are really testing: Whether you use @functools.wraps, can write decorators with arguments, and have used decorators in production.
Red flag answer: Writing a decorator without @functools.wraps. This is the litmus test.
Follow-up:
- “What happens if you forget @functools.wraps? What breaks in production?”
- “Write a decorator that takes arguments (e.g., @retry(max_attempts=3)).”
- “How would you implement @lru_cache from scratch?”
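One possible shape for the @retry(max_attempts=3) follow-up, showing the argument-taking pattern together with functools.wraps. The retry decorator and the flaky_fetch function are hypothetical; a production version would normally add a delay or backoff between attempts.

```python
import functools


def retry(max_attempts=3, exceptions=(Exception,)):
    def decorator(func):
        @functools.wraps(func)          # keep func's name and docstring
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise           # out of attempts: propagate
        return wrapper
    return decorator


calls = {"count": 0}


@retry(max_attempts=3, exceptions=(ConnectionError,))
def flaky_fetch():
    """Fail twice, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


outcome = flaky_fetch()
```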
What is a closure?
A closure is a function that keeps access to variables from its enclosing scope after that scope has finished executing. The captured variables are stored in the function’s __closure__ as cell objects. The closure holds a reference to the variable, not a copy of the value. This leads to a famous gotcha.
The late-binding closure bug: in [lambda: i for i in range(5)], every lambda captures the variable i, not its value at closure creation time. When the lambda is called, it reads the current value of i, which is 4 after the loop finishes. The usual fix is to bind the value at definition time with a default argument (lambda i=i: i) or functools.partial.
Real-world closure use cases:
- Factory functions
- Callback registration with context
- Implementing decorators (decorators are closures)
- Data encapsulation (private state without classes)
- Partial application of functions
What interviewers are really testing: Whether you understand late binding and know what __closure__ contains.
Red flag answer: Defining closures correctly but not knowing about the loop variable capture bug.
Follow-up:
- “Why does [lambda: i for i in range(5)] produce five functions that all return 4? How do you fix it?”
- “What is stored in func.__closure__? How can you inspect it?”
- “How are closures related to decorators?”
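The late-binding behavior and the default-argument fix can be verified directly, along with what a closure cell holds. The make_counter factory is an illustrative example.

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count       # rebinding the captured variable needs nonlocal
        count += 1
        return count

    return increment


counter = make_counter()
counter()
cell_value = counter.__closure__[0].cell_contents   # current value of count

# Late binding: every lambda looks up i when called, after the loop ended.
late = [lambda: i for i in range(5)]
# Fix: a default argument is evaluated once, when each lambda is defined.
bound = [lambda i=i: i for i in range(5)]

late_results = [f() for f in late]
bound_results = [f() for f in bound]
```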
What are generator functions and yield?
Generator functions use yield to produce values lazily, one at a time, pausing execution between yields. They are the foundation of Python’s memory-efficient iteration.
How generators work under the hood: Calling a generator function does not run its body. It returns a generator object whose frame is frozen at each yield and thawed on next(). Local variables, instruction pointer, and stack are all preserved.
Why generators matter: A list holds every element in memory at once, while a generator holds only its current frame. This is why range() in Python 3 returns a lazy object, and why tools like csv.reader() yield rows one at a time.
yield vs return:
- return terminates the function and sends one value
- yield suspends the function and sends one value, resuming on next call
- A generator can yield many times. After the function body is exhausted, StopIteration is raised.
send() and bidirectional communication: generator.send(value) resumes the generator and makes value the result of the paused yield expression.
What interviewers are really testing: Whether you understand lazy evaluation, send() for advanced use, and pipeline patterns.
Red flag answer: Explaining yield without mentioning memory benefits or knowing about send().
Follow-up:
- “How would you process a 50GB log file without running out of memory?”
- “What does generator.send(value) do? When would you use it?”
- “What is the difference between a generator function and a generator expression?”
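A small sketch of both directions of generator communication: lazy production with yield and pushing values in with send(). The running_average and squares functions are invented for this example.

```python
def running_average():
    total, count, average = 0.0, 0, None
    while True:
        value = yield average      # receives whatever the caller send()s
        total += value
        count += 1
        average = total / count


averager = running_average()
next(averager)                     # prime it: run up to the first yield
first = averager.send(10)
second = averager.send(20)


def squares(limit):
    # Values are produced one at a time; no list of results is ever built.
    for n in range(limit):
        yield n * n


lazy_total = sum(squares(1_000))
```

The priming call to next() is required because a fresh generator has not reached a yield yet and so has nothing to receive the sent value.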
Explain function annotations and type hints
Type hints are optional annotations; the interpreter does not enforce them at runtime. What they enable:
- Static analysis: mypy, pyright, pytype catch type errors before running code
- IDE support: Autocomplete, refactoring, inline errors
- Documentation: Self-documenting function signatures
- Runtime validation: Libraries like pydantic use hints for runtime type checking and serialization
typing module essentials: typing.Protocol (Python 3.8+) is the game changer. It describes an interface structurally, so any class with matching methods satisfies it without inheriting from it.
Adoption in practice:
- Major projects like FastAPI and SQLAlchemy 2.0 ship with type annotations; Django relies on the separate django-stubs package
- mypy --strict catches entire categories of bugs (unhandled None, wrong type, missing attributes)
- Some teams anecdotally report 15-30% fewer production bugs after adopting strict type checking
- pyright (Microsoft, used in VS Code/Pylance) is generally faster than mypy and stricter by default
Follow-up:
- “What is the difference between mypy and pyright? Which have you used?”
- “How does typing.Protocol differ from abc.ABC? When would you use each?”
- “How does Pydantic use type hints differently than mypy?”
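Structural typing with typing.Protocol can be sketched as below. SupportsRead, InMemorySource, and shout are hypothetical names; @runtime_checkable is added only so the match can also be demonstrated with isinstance(), which checks method presence and not signatures.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsRead(Protocol):
    def read(self) -> str: ...


class InMemorySource:
    # No inheritance from SupportsRead: it matches by shape alone.
    def __init__(self, text: str) -> None:
        self._text = text

    def read(self) -> str:
        return self._text


def shout(source: SupportsRead) -> str:
    # A type checker accepts any object that has a compatible read().
    return source.read().upper()


shouted = shout(InMemorySource("hi"))
```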
How does duck typing work in Python?
Duck typing means an object is judged by the methods it provides, not by its declared type. Python’s core protocols all work this way:
- Iterable protocol: Has __iter__(), works with for loops
- Sequence protocol: Has __getitem__() and __len__(), works with indexing and len()
- Context manager protocol: Has __enter__() and __exit__(), works with with
- Callable protocol: Has __call__(), can be called like a function
- Hashable protocol: Has __hash__(), can be dict key or set member
Duck typing with type safety:
- Python 3.8+ typing.Protocol formalizes duck typing for static analysis
- You define the expected interface, and mypy/pyright verify it at check time, without requiring inheritance
- This gives you the flexibility of duck typing with the safety of static type checking
Trade-offs:
- Pro: Extremely flexible, easy testing (mock anything), no rigid hierarchies
- Con: Runtime AttributeError if object does not have required methods, harder to discover expected interfaces without type hints
What interviewers are really testing: Whether you understand the trade-offs of duck typing and how typing.Protocol bridges the gap.
Red flag answer: Explaining duck typing without connecting it to Python’s protocol system or typing.Protocol.
Follow-up:
- “How would you make duck typing safer in a large codebase? What tools help?”
- “What is the difference between typing.Protocol and abc.ABC? When do you want each?”
- “Give me an example where duck typing makes testing dramatically easier than inheritance-based polymorphism.”
5. File Handling and I/O
How do you read and write files?
Always open files with a with statement.
Why with is non-negotiable: Without with, if an exception occurs between open() and f.close(), the file handle leaks. In a long-running server processing thousands of requests, leaked file handles exhaust the OS limit (typically 1024 on Linux by default) and crash the process. The with statement guarantees close() via __exit__.
The encoding trap: open() without encoding uses the system default (often cp1252 on Windows, utf-8 on Linux). This causes cross-platform bugs. Always specify encoding='utf-8'. Python 3.10+ can emit an opt-in EncodingWarning for a missing encoding (PEP 597), and Python 3.15 makes UTF-8 the default (PEP 686).
Reading strategies by file size:
- Small files (<10MB): f.read() into memory
- Medium files (10MB-1GB): Iterate line-by-line with for line in f
- Large files (1GB+): Use f.read(chunk_size) in a loop, or mmap for random access
- Structured data: Use csv, json, pandas.read_csv() with chunksize parameter
Red flag answer: Using f = open(...) without with, or calling f.read() on a potentially large file.
Follow-up:
- “Your script works on Linux but produces garbled text on Windows. What is likely wrong?”
- “How would you read a 50GB log file without loading it into memory?”
- “What happens if you forget to close a file in a web server handling 10,000 requests/second?”
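The line-by-line and chunked strategies listed in this answer look roughly like this. The helper names and the 64 KiB default chunk size are assumptions; the sample file is written to a temporary directory so the sketch is self-contained.

```python
import os
import tempfile


def count_lines(path):
    # Text mode with an explicit encoding; iterating reads one line at a time.
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def count_bytes_in_chunks(path, chunk_size=64 * 1024):
    # Binary mode with fixed-size reads keeps memory use bounded.
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w", encoding="utf-8", newline="\n") as f:
        f.write("héllo\nworld\n")
    line_count = count_lines(sample)
    byte_count = count_bytes_in_chunks(sample, chunk_size=4)
```

The byte count is larger than the character count because é takes two bytes in UTF-8, which is exactly the kind of difference a wrong default encoding turns into garbled text.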
What is the difference between 'w' and 'a' file modes?
'w' (write): Creates file if it does not exist. Truncates to zero length if it does exist. All previous content is destroyed immediately on open(), not on write(). This is the #1 way developers accidentally delete data.
'a' (append): Creates file if it does not exist. Writes are always appended to end. On POSIX systems each write is positioned at the current end of file (O_APPEND), so multiple processes can append short records to the same file without overwriting each other. This is why log files use append mode.
The complete mode table (what most people miss):
- 'r': Read only. File must exist.
- 'w': Write only. Creates or truncates.
- 'a': Append only. Creates or appends.
- 'x': Exclusive creation. Fails if file exists. Use this for safe file creation, since it prevents overwriting.
- 'r+': Read and write. File must exist. Does not truncate.
- 'w+': Read and write. Creates or truncates.
- 'b' suffix: Binary mode ('rb', 'wb'). No encoding/decoding, returns bytes.
- 't' suffix: Text mode (default). Applies encoding, returns str.
Safe overwrite: write to a temporary file in the same directory, then move it over the target with os.replace(), so readers never see a half-written file.
What interviewers are really testing: Whether you understand the data-loss risk of 'w' mode and safer alternatives.
Red flag answer: Only knowing 'w' and 'a' without mentioning 'x' mode or atomic write patterns.
Follow-up:
- “Your app writes a config file with 'w' mode. It crashes mid-write. What state is the file in?”
- “How do you safely overwrite a file without risking data loss?”
- “What is 'x' mode and when would you use it?”
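One common way to answer the “safe overwrite” follow-up is the write-then-rename pattern sketched below, shown next to 'x' mode refusing to clobber an existing file. The atomic_write helper is a hypothetical name, and this is a sketch rather than a hardened implementation.

```python
import os
import tempfile


def atomic_write(path, text):
    # Write to a temp file in the same directory, then rename over the target.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())      # push the data to disk before renaming
        os.replace(tmp_path, path)      # atomic on POSIX when it succeeds
    except BaseException:
        os.unlink(tmp_path)             # do not leave the temp file behind
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "config.json")
    atomic_write(target, "v1")
    atomic_write(target, "v2")
    with open(target, encoding="utf-8") as f:
        final_text = f.read()
    try:
        open(target, "x").close()       # exclusive create on an existing file
    except FileExistsError:
        exclusive_create_refused = True
    else:
        exclusive_create_refused = False
```

If the process dies mid-write, the target still holds the previous complete version, which is the property plain 'w' mode cannot offer.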
How do you work with CSV files?
The csv module handles CSV parsing and writing, but production use involves edge cases that trip up most developers.
Basic usage: csv.reader and csv.DictReader for reading, csv.writer and csv.DictWriter for writing.
The newline='' trap: On Windows, without newline='', you get double line endings in CSV output because the csv module writes \r\n and Python’s text mode adds another \r. Always pass newline='' when opening files for the csv module, for reading as well as writing.
When to use csv vs pandas:
- csv module: Simple parsing, streaming large files (memory efficient), no dependencies
- pandas.read_csv(): Data analysis, type inference, handling missing values, chunksize for large files, much faster for large datasets (C engine)
- polars.read_csv(): Even faster than pandas for pure CSV loading, no GIL limitations
Edge cases such as commas inside quoted fields, embedded newlines, and escaped quotes break naive parsing. The csv module handles all of these correctly.
What interviewers are really testing: Whether you handle encodings, the newline parameter, and know when to graduate to pandas/polars.
Red flag answer: Parsing CSV by splitting on commas (line.split(',')) instead of using the csv module.
Follow-up:
- “Why does splitting on commas not work for real CSV files?”
- “How would you process a 10GB CSV file that does not fit in memory?”
- “When would you use csv.DictReader vs pandas.read_csv()?”
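The difference between naive splitting and the csv module shows up as soon as a field contains a comma or a quote. The sample data here is made up, and io.StringIO stands in for a real file opened with newline=''.

```python
import csv
import io

raw = 'name,comment\r\n"Smith, Jane","said ""hi"""\r\n'

# Naive approach: the comma inside the quoted field splits the name in two.
naive_fields = raw.splitlines()[1].split(",")

# csv.DictReader understands quoting and doubled-quote escapes.
rows = list(csv.DictReader(io.StringIO(raw, newline="")))

# Writing: csv.writer adds the quoting and the \r\n terminator itself.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(["Smith, Jane", 'said "hi"'])
written = buffer.getvalue()
```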
How do you work with JSON?
The json module serializes Python objects to JSON strings and back. But production use requires understanding its limitations and alternatives.
Core API: json.dumps() and json.loads() work with strings; json.dump() and json.load() work with file objects.
Type limitations:
- JSON has a single number type and no separate integer type. json.loads('{"n": 12345678901234567890}') converts to a Python int, but JavaScript loses precision for integers > 2^53.
- JSON has no datetime, bytes, set, or tuple type. You need custom serialization.
- json.dumps fails on non-serializable types (datetime, Decimal, numpy arrays). Fix with the default parameter, a function that converts unsupported objects into serializable ones.
Performance alternatives (throughput figures are rough and workload-dependent):
- json (stdlib): ~50MB/s. Fine for most use cases.
- orjson (third-party): ~1GB/s. Auto-serializes datetime, UUID, numpy. Available in FastAPI through ORJSONResponse.
- ujson: ~300MB/s. Drop-in replacement.
- msgpack: Binary format, faster and smaller than JSON. Good for internal service communication.
Security: Never use eval() to parse JSON. Always use json.loads(). eval() executes arbitrary code.
What interviewers are really testing: Whether you know the type limitations, custom serialization, and when to use faster alternatives.
Red flag answer: Not knowing that json.dumps fails on datetime or Decimal objects.
Follow-up:
- “Your API returns a Decimal and datetime. How do you serialize them to JSON?”
- “When would you use orjson instead of the stdlib json module?”
- “json.loads is slow for your 500MB response. What are your options?”
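A minimal sketch of the default hook for the Decimal and datetime follow-up. The encode_extra function and the choice to emit Decimal as a string are assumptions; some APIs prefer a float or an integer number of cents instead.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal


def encode_extra(obj):
    # Called by json.dumps only for objects it cannot serialize itself.
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return str(obj)                 # keeps the exact decimal value
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


payload = {
    "amount": Decimal("19.99"),
    "created": datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    "tags": {"b", "a"},
}
encoded = json.dumps(payload, default=encode_extra, sort_keys=True)
decoded = json.loads(encoded)
```

Note that the round trip is lossy by design: decoded values come back as plain strings and lists, so the consumer has to know which fields to convert back.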
What are context managers?
Context managers implement the with statement protocol, guaranteeing resource cleanup even when exceptions occur. They are Python’s answer to C++’s RAII and Java’s try-with-resources.
The protocol has two magic methods: __enter__ runs on entry and its return value is bound by as; __exit__ runs on exit, whether or not an exception was raised.
The @contextmanager decorator (from contextlib) turns a generator with a single yield into a context manager: code before the yield is setup, code after it is teardown.
Context managers you should know:
- open(): file cleanup
- threading.Lock(): lock acquire/release
- contextlib.suppress(ExceptionType): silently ignore specific exceptions
- tempfile.TemporaryDirectory(): auto-cleanup temp directories
- unittest.mock.patch(): mock scoping in tests
- decimal.localcontext(): temporary decimal precision changes
- contextlib.ExitStack(): manage dynamic number of context managers
The __exit__ return value matters: If __exit__ returns True, the exception is suppressed (swallowed). If False (default), the exception propagates. Almost always return False, because suppressing exceptions silently is dangerous.
What interviewers are really testing: Whether you can write context managers and understand when to use class-based vs @contextmanager.
Red flag answer: Knowing with works with files but not being able to create your own context manager.
Follow-up:
- “Write a context manager that temporarily changes the working directory and restores it.”
- “What does the return value of __exit__ control? When would you return True?”
- “How would you manage a dynamic number of resources with contextlib.ExitStack?”
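The working-directory follow-up can be answered in both styles. WorkingDirectory and working_directory are names chosen for this sketch; newer Python versions also ship contextlib.chdir for the same purpose.

```python
import os
import tempfile
from contextlib import contextmanager


class WorkingDirectory:
    """Class-based form: explicit __enter__ and __exit__."""

    def __init__(self, path):
        self.path = path

    def __enter__(self):
        self._previous = os.getcwd()
        os.chdir(self.path)
        return self

    def __exit__(self, exc_type, exc, tb):
        os.chdir(self._previous)
        return False                    # never swallow exceptions


@contextmanager
def working_directory(path):
    """Generator-based form: setup before yield, teardown in finally."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        inside = os.path.realpath(os.getcwd())
    with WorkingDirectory(tmp):
        inside_class_based = os.path.realpath(os.getcwd())
    expected_inside = os.path.realpath(tmp)
after = os.getcwd()
```

The try/finally in the generator version is what makes cleanup run when the body raises; leaving it out is the usual bug in hand-written @contextmanager functions.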
6. Exception Handling
What is exception handling?
- EAFP (Easier to Ask Forgiveness than Permission): the Pythonic way. Try the operation, catch the exception if it fails.
- LBYL (Look Before You Leap): check the preconditions first, then perform the operation.
Every except block should either handle the error meaningfully, log it, or re-raise it.
What interviewers are really testing: Whether you understand EAFP, catch specific exceptions, and never use bare except: pass.
Red flag answer: Using try/except as a replacement for input validation, or catching Exception everywhere.
Follow-up:
- “What is EAFP and why does Python prefer it over LBYL?”
- “When is LBYL actually better than EAFP? Give me a concrete case.”
- “What is wrong with except Exception as e: pass?”
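The two styles side by side, using a hypothetical config lookup with a fallback port of 8080. Both functions behave the same; the EAFP version states the happy path once and names the specific failures it tolerates.

```python
DEFAULT_PORT = 8080


def get_port_lbyl(config):
    # Look before you leap: every precondition is checked up front.
    value = config.get("port")
    if isinstance(value, str) and value.isdigit():
        return int(value)
    return DEFAULT_PORT


def get_port_eafp(config):
    # Ask forgiveness: attempt the conversion, catch only expected failures.
    try:
        return int(config["port"])
    except (KeyError, ValueError, TypeError):
        return DEFAULT_PORT
```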
What are common built-in exceptions?
All exceptions inherit from BaseException. Understanding the hierarchy matters more than memorizing names.
The hierarchy (what most people miss): BaseException is the root. SystemExit, KeyboardInterrupt, and GeneratorExit inherit from it directly, and everything you normally handle derives from Exception.
Always catch Exception, NEVER BaseException. Catching BaseException traps KeyboardInterrupt and SystemExit, making your program impossible to kill with Ctrl+C.
Exception groups (Python 3.11+): ExceptionGroup wraps multiple exceptions raised concurrently (e.g., from asyncio.TaskGroup). Use except* to catch specific types from the group. This is a fundamental change for async error handling.
What interviewers are really testing: Whether you know the hierarchy, especially the BaseException vs Exception distinction.
Red flag answer: Listing exceptions without understanding the hierarchy or catching BaseException.
Follow-up:
- “Why should you never catch BaseException? What goes wrong?”
- “What are ExceptionGroup and except* in Python 3.11+?”
- “When would you catch OSError vs FileNotFoundError specifically?”
What is try-except-else-finally?
The try statement has four blocks, each with a specific purpose. The else clause is the one most developers misunderstand.
Why else exists (the misunderstood clause): Code in else only runs if try succeeds with no exceptions. Why not just put it in try? Because if a follow-up call such as save_to_database() raises an exception, you do not want it caught by the except ConnectionError handler. That would hide a database bug as a connection error.
finally guarantees (and gotchas):
- finally runs even if try has a return statement
- finally runs even if an exception is raised and not caught
- Gotcha: If finally has a return, it silently overrides the try block’s return value and suppresses any active exception. Never put return in finally.
Exception chaining (the from keyword): raise NewError(...) from e records the original exception in __cause__. Without from e, the original context is still available in __context__ but the intent is less clear.
What interviewers are really testing: Whether you know what else is for and understand exception chaining.
Red flag answer: Not knowing what else does, or putting all logic in try instead of using else.
Follow-up:
- “What happens if finally contains a return statement?”
- “Why would you put code in else instead of at the end of try?”
- “What is exception chaining with raise ... from ...? When do you use it?”
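All four blocks and explicit chaining fit in one short function. ConfigError and parse_port are hypothetical names, and the events list exists only to make the execution order observable.

```python
class ConfigError(Exception):
    pass


events = []


def parse_port(text):
    try:
        port = int(text)                # only the risky call lives in try
    except ValueError as e:
        # Chain explicitly so the traceback shows the root cause.
        raise ConfigError(f"invalid port: {text!r}") from e
    else:
        events.append("parsed")         # runs only if try raised nothing
        return port
    finally:
        events.append("cleanup")        # runs on success and on failure


port = parse_port("8080")

cause = None
try:
    parse_port("eighty")
except ConfigError as err:
    cause = err.__cause__
```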
How do you create custom exceptions?
Best practices:
- Create a base exception for your library/app. Callers can catch AppError to handle all your exceptions.
- Always inherit from Exception, never BaseException.
- Add structured data (error codes, details dict) for machine-readable error handling, especially in APIs.
- Map to HTTP status codes in web apps: ValidationError -> 400, NotFoundError -> 404, AuthError -> 401.
- Keep the hierarchy shallow, 2-3 levels max. Deep exception hierarchies are as bad as deep inheritance.
Red flag answer: Raising a bare Exception with no additional context or structure.
Follow-up:
- “How would you design an exception hierarchy for a payment processing API?”
- “How do your custom exceptions map to HTTP status codes in a REST API?”
- “When is it better to return an error value (like Result type) vs raising an exception?”
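A shallow hierarchy following these practices might look like the sketch below. The class names come from the list above; the status_code attribute, the code and details fields, and the to_response helper are assumptions about how an application could wire them to HTTP.

```python
class AppError(Exception):
    """Base class: catching AppError handles every error this app raises."""

    status_code = 500

    def __init__(self, message, *, code="internal_error", details=None):
        super().__init__(message)
        self.message = message
        self.code = code                # machine-readable identifier
        self.details = details or {}    # structured context for clients


class ValidationError(AppError):
    status_code = 400


class NotFoundError(AppError):
    status_code = 404


def to_response(exc):
    # One handler maps any AppError to a status and a JSON-ready body.
    body = {"error": exc.code, "message": exc.message, "details": exc.details}
    return exc.status_code, body


try:
    raise NotFoundError("user 42 not found", code="user_not_found", details={"id": 42})
except AppError as caught:
    response = to_response(caught)
```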
What are assertions?
assert tests conditions that should ALWAYS be true if the code is correct. They are a debugging tool, not an error handling mechanism.
Syntax and behavior: assert condition, message raises AssertionError(message) when the condition is falsy.
The critical gotcha: assertions are disabled by python -O (optimize flag). Running python -O script.py sets __debug__ = False and strips all assert statements. This means any check written as an assert silently disappears in optimized runs.
When to use assert:
- Internal invariants: assert len(self._items) == self._count
- Preconditions in internal functions (not public APIs)
- Sanity checks during development
- Test assertions: assert result == expected (pytest uses this)
When NOT to use assert:
- User input validation (use ValueError, TypeError)
- Security checks (use explicit conditionals)
- Anything that must run in production (assertions are stripped by -O)
What interviewers are really testing: Whether you know when assert is appropriate and what happens under the -O flag.
Follow-up:
- “What happens to assertions when you run python -O? What breaks if you rely on them?”
- “When should you use assert vs raise ValueError?”
- “How does pytest use assertions differently from standard Python?”
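The split between validation and invariants can be seen in a single function. apply_discount is an invented example: the raise statements guard the public boundary, while the assert documents an assumption about the function's own arithmetic.

```python
def apply_discount(price, percent):
    # Boundary validation: real exceptions, which still run under python -O.
    if price < 0:
        raise ValueError(f"price must be non-negative, got {price}")
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be in [0, 100], got {percent}")

    discounted = price * (100 - percent) / 100

    # Internal invariant: if this fails, the code above is wrong, not the caller.
    assert 0 <= discounted <= price, "discount must not increase the price"
    return discounted
```

Removing the assert changes nothing for valid callers, which is the test for whether a check belongs in an assert at all.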
7. Modules and Packages
What is the difference between module and package?
Module: A single .py file. When imported, Python executes the file top-to-bottom and creates a module object. The module’s functions, classes, and variables become attributes of that object.
Package: A directory containing modules and (typically) an __init__.py file. Packages create namespaces for organizing related modules hierarchically.
What actually happens on import (the import machinery):
1. Python checks sys.modules (import cache). If already imported, returns cached module.
2. Python searches sys.path (list of directories) to find the module.
3. Python creates a module object and executes the module’s code.
4. The module is cached in sys.modules.
Key implications:
- Module code runs exactly once on first import, regardless of how many files import it. Subsequent imports return the cached module.
- Modules are effectively singletons. This is why module-level state is shared across the entire application.
- sys.path determines where Python looks. It includes the script’s directory, PYTHONPATH env var, and installed packages.
Namespace packages (PEP 420, Python 3.3+): Directories without __init__.py can be namespace packages, allowing a single logical package to be split across multiple directories on disk. Used by large frameworks with plugin systems.
What interviewers are really testing: Whether you understand the import machinery (caching, sys.modules, execution order).
Red flag answer: “A module is a file, a package is a directory.” Without understanding caching or sys.path.
Follow-up:
- “What happens if two different files import the same module? Does the module code run twice?”
- “How do you add a custom directory to Python’s import search path?”
- “What are namespace packages and when would you use them?”
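The caching step is easy to confirm from the interpreter. This sketch uses the stdlib json module purely as a convenient example.

```python
import sys

import json                       # first import: executed, then cached
cached = sys.modules["json"]      # the import cache holds the module object

import json as json_again         # later imports are served from the cache

same_object = (json is cached) and (json_again is cached)
```

Because every importer receives the same object, a value set on a module attribute in one file is visible to all other files that import that module.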
How do you import modules?
Why from module import * is dangerous:
- Pollutes the namespace with unknown names
- Can silently override existing variables
- Makes it impossible to know where a name came from during code review
- Only acceptable in __init__.py to re-export a public API, controlled by __all__
Import performance:
- First import executes the module (can be slow for heavy modules like pandas, ~200ms)
- Subsequent imports are O(1) dict lookups in sys.modules
- Lazy imports (importlib.import_module() or in-function imports) can improve startup time
What interviewers are really testing: Whether you understand import styles, import * dangers, and performance implications.
Red flag answer: Using from module import * in production code.
Follow-up:
- “How do you resolve circular imports? Give me three strategies.”
- “What is TYPE_CHECKING and when do you use it?”
- “Your application takes 5 seconds to start. You suspect heavy imports. How do you diagnose it?”
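Two of the techniques raised here, TYPE_CHECKING and in-function imports, can be combined as in this sketch. The parse_amount function is hypothetical, and decimal stands in for a genuinely heavy or circular dependency.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; never executed at runtime, so it
    # cannot create an import cycle or add startup cost.
    from decimal import Decimal


def parse_amount(text: str) -> "Decimal":
    # Deferred import: the cost is paid on the first call, not at startup.
    from decimal import Decimal
    return Decimal(text)


amount = parse_amount("1.50")
```

For diagnosing slow startup, python -X importtime prints per-module import timings and is a reasonable first step before deferring anything.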
What is __init__.py?
__init__.py is the initializer for a Python package. It runs when the package is first imported.
Key responsibilities:
- Marks directory as a package (required before Python 3.3, recommended after)
- Controls the public API via __all__
- Re-exports for convenience: Users write from mypackage import Engine instead of from mypackage.core import Engine
- Package-level initialization: Setup logging, register plugins, validate environment
__all__ does two things:
- Controls what from package import * exports
- Signals to type checkers and IDEs what the public API is
Common patterns:
- Empty __init__.py: Most common. Just marks as package.
- Re-export pattern: Flatten nested imports for cleaner public API.
- Lazy loading: Import submodules on first access to improve startup time (used by large libraries like numpy).
What interviewers are really testing: Whether you understand __all__, re-exporting, and API design via __init__.py.
Red flag answer: “It makes a directory a package.” Without knowing about __all__ or re-exporting.
Follow-up:
- “How does __all__ affect from package import *?”
- “How would you design __init__.py for a library with a clean public API?”
- “What are the pros and cons of an empty __init__.py vs one that re-exports?”
What is the __name__ variable?
__name__ is a special variable that Python sets depending on how the file is executed.- Executed directly (
python script.py):__name__ == '__main__' - Imported as module (
import script):__name__ == 'script'
- Testability:
import mymoduledoes not trigger side effects. Tests can import and test individual functions. - Reusability: The module can be both a script and a library.
- Multiprocessing on Windows:
multiprocessingspawns new Python processes that import the module. Without the guard, the spawned process re-runs the main code, causing infinite process spawning.
__main__.py convention:
For packages, python -m mypackage runs mypackage/__main__.py. This is how python -m pytest, python -m http.server, and python -m venv work.What interviewers are really testing: Whether you know why the guard exists, especially the multiprocessing and testability reasons.Red flag answer: Knowing the pattern without understanding why it prevents side effects on import.Follow-up:- “What happens on Windows with
multiprocessingif you forget theif __name__ == '__main__'guard?” - “What is
__main__.pyand how doespython -m package_nameuse it?” - “How does the guard pattern improve testability?”
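The guard discussed above can be sketched as follows. This is a minimal illustration, not a prescribed layout: the `main` function and its `argv` parameter are hypothetical names chosen so that tests can call the entry point without touching `sys.argv`.

```python
import sys


def main(argv=None):
    """Hypothetical entry point; returns the number of arguments it received."""
    args = sys.argv[1:] if argv is None else list(argv)
    print(f"received {len(args)} argument(s)")
    return len(args)


if __name__ == "__main__":
    # Runs for `python script.py`, but not for `import script`.
    main()
```

Because nothing executes at import time except the definitions, a test module can `import` this file and call `main([...])` directly.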
How do you manage Python dependencies and virtual environments?
`uv` (by Astral, the ruff creators), increasingly the default choice: `uv` is designed as a drop-in replacement for `pip`, `pip-tools`, `virtualenv`, and `venv`. Written in Rust, it is dramatically faster.

`poetry`: dependency management plus packaging in one tool.

`pip` + `requirements.txt`: the classic approach.

Why pinning matters: without pinned versions, `pip install flask` might install Flask 2.0 today and Flask 3.0 tomorrow, breaking your app. Always use lock files (`poetry.lock`, `requirements.txt` with pinned versions, `uv.lock`).

`pyproject.toml` (PEP 621) is the modern standard for project metadata, replacing `setup.py`, `setup.cfg`, and `requirements.txt` in many workflows.

What interviewers are really testing: Whether you have opinions about dependency management tooling and understand lock files.

Red flag answer: “I use pip install without a virtual environment.” This is a serious warning sign for a senior role.

Follow-up:
- “What is the difference between `poetry` and `uv`? When would you choose each?”
- “Why are lock files important? What breaks without them?”
- “How would you set up a reproducible development environment for a new team member?”
8. Advanced Python Concepts
What are iterators and iterables?
The iterator protocol is how `for` loops, comprehensions, `map`, `filter`, generators, and `zip` all work.

The protocol:
- Iterable: Any object with an `__iter__()` method that returns an iterator. Lists, strings, dicts, files, and generators are all iterables.
- Iterator: An object with both `__iter__()` (returns self) and `__next__()` (returns the next value or raises `StopIteration`).

What a `for` loop actually does: it calls `iter()` on the object (invoking `__iter__`), then calls `next()` on the resulting iterator (invoking `__next__`) until `StopIteration` is raised.

Follow-up:
- “What is the difference between an iterable and an iterator? Can something be both?”
- “Why can you loop over a list multiple times but only over a generator once?”
- “Build me an iterator class that yields Fibonacci numbers.”
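One possible answer to the Fibonacci follow-up, together with a hand-written version of what a `for` loop does. The class name `Fibonacci` and the helper `manual_for` are illustrative names, not part of any library.

```python
from itertools import islice


class Fibonacci:
    """Infinite iterator over Fibonacci numbers (illustrative example class)."""

    def __init__(self):
        self._a, self._b = 0, 1

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        value = self._a
        self._a, self._b = self._b, self._a + self._b
        return value


def manual_for(iterable):
    """Roughly what `for item in iterable` does under the hood."""
    collected = []
    iterator = iter(iterable)      # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break
        collected.append(item)
    return collected


first_ten = list(islice(Fibonacci(), 10))
```

The iterator never raises `StopIteration`, so it has to be bounded by the consumer, here with `itertools.islice`.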
What is multithreading vs multiprocessing?
| | Threading | Multiprocessing | asyncio |
|---|---|---|---|
| Best for | I/O-bound (network, disk) | CPU-bound (computation) | High-concurrency I/O |
| Memory | Shared (lightweight) | Separate (heavy ~30-50MB per process) | Shared (very lightweight) |
| GIL impact | Blocked for CPU work | No GIL issue (separate interpreters) | Single thread, no GIL issue |
| Communication | Shared objects + locks | Queues, Pipes (serialization overhead) | Coroutine chaining |
| Overhead per unit | ~100KB per thread | ~30-50MB per process | ~1KB per coroutine |
| Debugging | Hard (race conditions) | Medium (process isolation) | Medium (async stack traces) |
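The modern approach (`concurrent.futures`): thread pools and process pools share one executor interface, so switching between them is a one-line change.

Follow-up:
- “Your image processing is slow. You add threads but it does not speed up. Why?”
- “What are the risks of shared memory in multithreading?”
- “How does `concurrent.futures` simplify switching between threads and processes?”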
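A small sketch of how `concurrent.futures` lets the same code run on threads or processes. The workload `count_primes` and the helper `run_all` are made-up names for illustration; only the executor class changes between the two modes.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound toy workload: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_all(executor_cls, limits, max_workers=4):
    """Run count_primes over `limits` with whichever executor class is passed in."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    limits = [5_000, 10_000, 15_000]
    # Threads: fine for I/O-bound work, but no speedup for this CPU-bound loop under the GIL.
    print(run_all(ThreadPoolExecutor, limits))
    # For CPU-bound work, pass ProcessPoolExecutor instead; nothing else changes:
    # from concurrent.futures import ProcessPoolExecutor
    # print(run_all(ProcessPoolExecutor, limits))
```

The process-based variant is left commented out because it needs the `__main__` guard and a picklable, importable worker function to run correctly on spawn-based platforms.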
What is the Global Interpreter Lock (GIL)?
- Prevents: True parallel execution of Python bytecode across threads
- Does NOT prevent: I/O parallelism (the GIL is released during I/O operations — network, disk, sleep)
- Does NOT prevent: Parallel execution of C extensions (NumPy releases the GIL during array operations)
- Does NOT prevent: Multiprocessing (separate processes, separate GILs)
Free-threaded Python (PEP 703): Python 3.13 added an optional build without the GIL (CPython is compiled with the `--disable-gil` configure option; it is not a flag of the `python` command). This is the biggest change to CPython in decades. The `t` suffix in version tags (e.g., `python3.13t`) indicates the free-threaded build. As of 3.13 it is experimental, and making it the default is expected to take several more releases.

Workarounds used in production today:
- `multiprocessing` / `ProcessPoolExecutor` for CPU-bound work
- `asyncio` for I/O-bound concurrency
- C extensions (NumPy, Cython) that release the GIL
- Use a different Python implementation (PyPy has a GIL, and GraalPy currently does as well; Jython and IronPython do not)

Follow-up:
- “If the GIL limits threading, why does Python even have a `threading` module?”
- “What is PEP 703 (free-threaded Python) and what is its status?”
- “How does NumPy achieve parallelism despite the GIL?”
What are metaclasses?
A metaclass is the class of a class: it creates class objects the way a class creates instances. The default metaclass is `type`.

The mind-bending truth: when Python executes `class Foo(Base, metaclass=MyMeta):`, it calls `MyMeta('Foo', (Base,), namespace_dict)`. The metaclass controls:
- `__new__`: Creates the class object
- `__init__`: Initializes the class object
- `__call__`: Controls what happens when the class is instantiated

Real-world uses:
- ORMs: Django’s `Model` metaclass automatically collects database fields from class attributes
- API frameworks: Abstract base classes use `ABCMeta` to enforce abstract method implementation
- Singleton pattern: Metaclass that caches instances
- Automatic registration: Metaclass that registers all subclasses in a registry
- Validation: Ensure class definitions follow specific rules at definition time

Alternatives that usually make a metaclass unnecessary:
- `__init_subclass__` (Python 3.6+): hook called when a class is subclassed. Handles most registration/validation patterns without a metaclass.
- `__set_name__` (Python 3.6+): descriptor hook for attribute naming.
- `@dataclass` and `typing.Protocol`: handle many patterns that previously required metaclasses.

Red flag answer: Reaching for a metaclass when `__init_subclass__` would suffice (poor judgment).

Follow-up:
- “When would you use `__init_subclass__` instead of a metaclass?”
- “How does Django’s `Model` class use metaclasses to create database tables from class definitions?”
- “Write a metaclass that enforces all subclass method names are lowercase.”
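To make the registration use case concrete, here is the same idea written twice: once with a metaclass and once with `__init_subclass__`. All class names (`RegistryMeta`, `Plugin`, `Handler`, and so on) are invented for this sketch.

```python
class RegistryMeta(type):
    """Metaclass that records every subclass it creates (illustrative only)."""

    registry = {}

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        if bases:  # skip the root class itself
            mcls.registry[name] = cls
        return cls


class Plugin(metaclass=RegistryMeta):
    pass


class CsvPlugin(Plugin):
    pass


class Handler:
    """Same registration idea without a metaclass, via __init_subclass__."""

    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Handler.registry[cls.__name__] = cls


class JsonHandler(Handler):
    pass
```

The second version keeps `type` as the metaclass, which avoids metaclass conflicts when `Handler` is combined with other base classes.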
What are coroutines and async/await?
How it works:
- `async def` creates a coroutine function (calling it returns a coroutine object, does not execute it)
- `await` suspends the coroutine and yields control back to the event loop
- The event loop runs other ready coroutines while this one waits for I/O
- When I/O completes, the event loop resumes the coroutine

Key APIs:
- `asyncio.gather(*coros)`: Run multiple coroutines concurrently, return all results
- `asyncio.TaskGroup()` (Python 3.11+): Structured concurrency with proper error handling
- `asyncio.Queue`: Producer-consumer patterns
- `async for`: Iterate over async iterators
- `async with`: Async context managers

When to use async:
- Web servers handling thousands of concurrent connections (FastAPI, Starlette)
- API clients making many concurrent requests
- Chat applications, WebSocket servers
- NOT for CPU-bound work (async is single-threaded)

The function color problem: once one function is `async`, it infects your entire call stack. You cannot call an async function from a regular function without `asyncio.run()`. This leads to “async all the way up” or painful refactoring.

What interviewers are really testing: Whether you understand the event loop model and when async is the right choice.

Red flag answer: “Async makes things run in parallel.” It does not. It is concurrent, not parallel: one thread, cooperative scheduling.

Follow-up:
- “What is the difference between concurrency and parallelism? Where does asyncio fit?”
- “What is the ‘function color’ problem with async/await?”
- “When would you use `asyncio.TaskGroup` vs `asyncio.gather`?”
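The event loop model above can be illustrated with a short sketch. The coroutine `fetch` is a stand-in for real I/O (it only sleeps), and the names are hypothetical.

```python
import asyncio
import time


async def fetch(name, delay):
    """Stand-in for an I/O call; asyncio.sleep yields control to the event loop."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    start = time.perf_counter()
    # The three coroutines wait concurrently on a single thread.
    results = await asyncio.gather(
        fetch("a", 0.05), fetch("b", 0.05), fetch("c", 0.05)
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
```

`asyncio.gather` returns results in the order the awaitables were passed, regardless of which finishes first. The total wait should be close to one delay rather than three, since the sleeps overlap.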
What is the difference between comprehensions (list/dict/set/generator)?
When to use each:
- List comprehension: When you need the full collection (indexing, slicing, multiple iterations, `len()`)
- Dict comprehension: Transforming/filtering dicts, inverting key-value pairs
- Set comprehension: Extracting unique values from a collection
- Generator expression: When you only iterate once, especially for large/infinite sequences. Also when feeding directly into `sum()`, `min()`, `max()`, `any()`, `all()`, since no intermediate list is needed.

Follow-up:
- “Why does `sum(x**2 for x in range(10))` not need square brackets?”
- “You have a 100GB dataset. How do you process it with comprehensions?”
- “What is the scope of the loop variable in a comprehension? Does it differ between Python 2 and 3?”
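The four forms side by side, as a quick illustration. The sample data is made up.

```python
import sys

words = ["apple", "banana", "cherry", "apple"]

lengths_list = [len(w) for w in words]        # list: ordered, indexable
lengths_by_word = {w: len(w) for w in words}  # dict: duplicate keys collapse
unique_lengths = {len(w) for w in words}      # set: unique values only
total = sum(len(w) for w in words)            # generator expression, no brackets needed

# A generator expression stays small no matter how many items it will produce.
lazy = (n * n for n in range(1_000_000))
eager = [n * n for n in range(1_000)]
lazy_is_smaller = sys.getsizeof(lazy) < sys.getsizeof(eager)
```

The generator object holds only its current state, which is why it can describe a million values while occupying less memory than a thousand-element list.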
How does Python's garbage collector work?
1. Reference counting (primary mechanism): every object tracks how many references point to it and is freed as soon as the count reaches zero. You can inspect the count with `sys.getrefcount(obj)` (returns count + 1 because the function call itself creates a temporary reference).

2. Cyclic garbage collector (for circular references):
Reference counting cannot handle cycles, so CPython also runs a generational cycle collector:
- Gen 0: Newly created objects. Collected most frequently (default threshold: roughly 700 net allocations).
- Gen 1: Survived one collection. Collected less frequently.
- Gen 2: Long-lived objects. Collected rarely.

Objects that survive a collection are promoted to the next generation. This is based on the “generational hypothesis”: most objects die young.

- GC pauses can cause latency spikes in real-time systems. Instagram famously disabled the cyclic GC in their Django servers and reported roughly a 10% efficiency gain, largely because collector bookkeeping was defeating copy-on-write memory sharing between forked worker processes.
- `gc.disable()` is safe IF you avoid circular references (use weak references or restructure data).
- `gc.collect()` forces a collection, which is useful before memory-critical sections.
- `gc.get_threshold()` returns the generation thresholds, `(700, 10, 10)` by default on most CPython versions.
- `__del__` finalizers complicate GC (objects with `__del__` in a cycle were uncollectable before Python 3.4).

Follow-up:
- “Why did Instagram disable Python’s garbage collector? What did they do instead?”
- “How does the generational GC decide when to run?”
- “What happens if two objects in a reference cycle both have `__del__` methods?”
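A minimal CPython-oriented demonstration of why the cycle collector exists. The `Node` class is a throwaway example, and the automatic collector is disabled only to keep the demo deterministic.

```python
import gc
import weakref


class Node:
    """Throwaway class used only to build a reference cycle."""

    def __init__(self):
        self.partner = None


gc.disable()                 # keep the automatic collector out of the demo
a, b = Node(), Node()
a.partner, b.partner = b, a  # a <-> b form a cycle
probe = weakref.ref(a)       # observe `a` without keeping it alive

del a, b                     # refcounts stay above zero because of the cycle
alive_after_del = probe() is not None

gc.collect()                 # the cycle collector finds and frees the pair
alive_after_collect = probe() is not None
gc.enable()
```

After `del a, b` no name refers to either object, yet each still holds a reference to the other, so reference counting alone never frees them.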
9. Web Development with Python
What are popular Python web frameworks?
Django:
- Batteries-included: ORM, admin panel, auth, forms, templating, migrations, security middleware
- Opinionated: One way to do things. Enforces patterns (MTV: Model/Template/View).
- Best for: Content sites, admin-heavy apps, rapid prototyping, teams that want conventions over configuration
- Used by: Instagram (at enormous scale), Mozilla, Pinterest, Disqus
- Trade-off: Monolithic, harder to swap components, ORM can be a bottleneck at extreme scale

Flask:
- Minimalist core: routing + WSGI + Jinja2 templates. Everything else is extensions.
- Unopinionated: Choose your own ORM, auth, etc.
- Best for: APIs, microservices, projects where you want full control
- Used by: Netflix (some services), LinkedIn
- Trade-off: More decisions to make, less built-in security, can lead to inconsistent patterns in large teams

FastAPI:
- Built on Starlette (async) + Pydantic (validation). Native async support.
- Automatic OpenAPI/Swagger docs from type hints
- Best for: APIs (especially when performance matters), async workloads, ML model serving
- Performance: Often several times the throughput of Flask on I/O-bound workloads thanks to async, though results depend heavily on the workload and deployment
- Trade-off: Smaller ecosystem than Django, async complexity, less mature

Decision framework:
- Need admin panel, auth, ORM out of the box? Django
- Need lightweight API with full control? Flask
- Need async, auto-docs, type validation? FastAPI
- Need maximum performance? FastAPI or Starlette
- Existing team knows Django? Probably keep using Django

Follow-up:
- “You are starting a new API project. Walk me through how you choose between Flask, FastAPI, and Django REST Framework.”
- “When would you choose Django over FastAPI despite FastAPI being ‘faster’?”
- “What is WSGI vs ASGI? Why does it matter?”
How do you create a simple Flask application?
Production practices:
- Application factory pattern: Do not create `app` at module level. Use a factory function for testability (create a fresh app per test).
- Blueprints: Organize routes into modules. Without them, a 50-endpoint API becomes unmaintainable.
- WSGI server: Never use `app.run()` in production. Use Gunicorn: `gunicorn -w 4 "myapp:create_app()"`.
- Request validation: Use `marshmallow` or `flask-pydantic` for input validation.
- Configuration: Environment-based config (dev/staging/prod) via environment variables.

Follow-up:
- “What is the application factory pattern and why is it important for testing?”
- “How do you deploy a Flask app in production? What WSGI server do you use?”
- “How would you structure a Flask app with 50+ endpoints?”
What is Django ORM?
When the basic ORM is not enough:
- Complex aggregations: Use `.annotate()` and `F()` expressions
- Raw SQL: `User.objects.raw('SELECT ...')` for complex queries the ORM cannot express
- Database-specific features: Use `django.db.connection` for raw cursor access
- High-volume writes: Use `bulk_create()` and `bulk_update()` instead of saving one object at a time

What interviewers are really testing: `select_related`/`prefetch_related`, and when to bypass the ORM.

Red flag answer: Describing the ORM without mentioning N+1 queries or performance implications.

Follow-up:
- “What is the N+1 query problem? How do you detect and fix it in Django?”
- “What is the difference between `select_related` and `prefetch_related`?”
- “When would you use raw SQL instead of the ORM?”
How do you handle HTTP requests?
The `requests` library is the standard for synchronous HTTP calls, but production code requires handling timeouts, retries, connection pooling, and async alternatives.

Basic usage with production-grade error handling: always pass a `timeout` and handle request exceptions. Without a `timeout`, `requests.get()` will hang indefinitely if the server never responds. This has caused production outages at many companies.

Async alternative (`httpx` or `aiohttp`): `httpx` is a modern alternative that supports both sync and async, with HTTP/2 support.

What interviewers are really testing: Timeouts, retries, error handling, and connection pooling.

Red flag answer: Using `requests.get()` without timeout or error handling.

Follow-up:
- “What happens if you do not set a timeout on `requests.get()`?”
- “How would you implement exponential backoff retry logic?”
- “When would you use `httpx` or `aiohttp` instead of `requests`?”
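For the exponential backoff follow-up, a library-agnostic sketch is shown below. It uses only the standard library: `retry_with_backoff` and its parameters are hypothetical names, and the HTTP call itself is passed in as a plain callable so the retry logic can be tested without a network.

```python
import random
import time


def retry_with_backoff(call, *, attempts=4, base_delay=0.5, max_delay=8.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call `call()` until it succeeds, sleeping base_delay * 2**n (plus jitter) between tries."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

With `requests`, `call` could be something like `lambda: requests.get(url, timeout=5)`, with `retry_on` set to `(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`. The exception classes from `requests` are not subclasses of the built-in `ConnectionError` and `TimeoutError`, so the defaults above would not catch them.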
What is REST API?
- Client-Server: Separation of concerns
- Stateless: Each request contains all needed information (no server-side session)
- Cacheable: Responses must declare if cacheable
- Uniform Interface: Consistent URL structure, standard HTTP methods
- Layered System: Client cannot tell if connected directly to server
- Code on Demand (optional): Server can extend client with executable code
HTTP methods:
- `GET`: Read. Safe (no side effects). Idempotent. Cacheable.
- `POST`: Create. NOT idempotent (calling twice creates two resources).
- `PUT`: Full replace. Idempotent (calling twice produces the same result).
- `PATCH`: Partial update. NOT necessarily idempotent.
- `DELETE`: Remove. Idempotent (deleting twice, the second is a no-op).

Status codes:
- `200` OK, `201` Created, `204` No Content (successful DELETE)
- `400` Bad Request, `401` Unauthorized, `403` Forbidden, `404` Not Found, `409` Conflict
- `422` Unprocessable Entity (validation error, common with FastAPI)
- `429` Too Many Requests (rate limiting)
- `500` Internal Server Error, `503` Service Unavailable

Follow-up:
- “What does idempotent mean? Which HTTP methods are idempotent?”
- “Your API returns 200 for everything, even errors. What is wrong with this?”
- “How would you design a REST API for a file upload that supports resumable uploads?”
10. Data Science and Libraries
What is NumPy and why is it used?
NumPy provides the N-dimensional array (`ndarray`) that underlies virtually all numerical computing in Python. It is fast because array operations run in compiled C code, not interpreted Python.

Why NumPy is 10-100x faster than Python lists: elements live in one contiguous, homogeneously typed buffer, and loops over them run in compiled code instead of the interpreter.

Key concepts:
- Broadcasting: Operations between arrays of different shapes without copying
- Vectorization: Replace Python loops with array operations
- Views vs copies: Basic slicing creates a view (shared memory), not a copy. Modifying a slice modifies the original. Use `.copy()` for independent data.
- dtype: All elements share one type (`float64`, `int32`, etc.). This enables the performance gains.

Follow-up:
- “Explain broadcasting. What happens when you add a (3,4) array and a (4,) array?”
- “What is the difference between a NumPy view and a copy? When does it matter?”
- “Why is `np.sum(arr)` faster than `sum(arr)` for a NumPy array?”
What is pandas and its main data structures?
Performance tips:
- Never iterate with `for row in df.iterrows()`: it is often 100-1000x slower than vectorized operations. Use `.apply()` (better) or vectorized NumPy operations (best).
- `chunksize` for large files: `pd.read_csv('huge.csv', chunksize=10_000)` returns an iterator of DataFrames, enabling processing of files larger than RAM.
- `category` dtype for strings with few unique values: Can reduce memory by 90%+ for columns like “status” or “country”.
- `inplace=True` does NOT save memory: in most cases pandas still creates a copy internally. The pandas developers discourage its use.

Alternatives for larger or faster workloads:
- polars: Often several times faster than pandas on common operations. Written in Rust, multi-threaded (not limited by the GIL), with lazy evaluation. Increasingly popular for new dataframe work.
- Dask: Parallel pandas for datasets larger than RAM. Distributed computing.
- PySpark: For truly massive datasets (TB+) on clusters.

Red flag answer: Using `iterrows()` for DataFrame operations, or not knowing about polars.

Follow-up:
- “Why is `df.iterrows()` slow? What should you use instead?”
- “How would you process a 50GB CSV file with pandas?”
- “When would you choose polars over pandas?”
How do you handle missing data in pandas?
Strategies:
- Drop rows (`df.dropna()`): When missing data is <5% and random (MCAR). Risky if missing data is systematic.
- Fill with constant (`df.fillna(0)` or `df.fillna('Unknown')`): When a default value makes domain sense.
- Forward/backward fill (`df.ffill()` / `df.bfill()`; the older `df.fillna(method='ffill')` spelling is deprecated in recent pandas): For time series data where the last known value is a reasonable proxy.
- Fill with mean/median (`df.fillna(df.mean())`): When the distribution is roughly normal (mean) or skewed (median). Warning: reduces variance and can bias models.
- Interpolation (`df.interpolate()`): For continuous time series.
- Model-based imputation (scikit-learn’s `SimpleImputer`, `KNNImputer`): When relationships between features can predict missing values.

Missing value types:
- `NaN` (`np.nan`, a float): The default for missing numeric data. `NaN != NaN` is `True` (IEEE 754).
- `None`: Python’s null. Used in object-dtype columns.
- `pd.NA` (pandas 1.0+): The newer nullable marker. Works with integer columns without converting to float.
- With the default NumPy-backed dtypes, integer columns with missing values become floats because `NaN` is a float. `pd.Int64Dtype()` fixes this.

Red flag answer: “Just `fillna(0)` for everything.” This corrupts data when 0 has semantic meaning.

Follow-up:
- “Your temperature column has 20% missing values. `fillna(0)` would be wrong. What do you do?”
- “What is the difference between `NaN`, `None`, and `pd.NA` in pandas?”
- “How would you detect if missing data is random (MCAR) vs systematic (MNAR)?”
What is matplotlib?
- seaborn: Statistical visualizations (distributions, regressions, heatmaps). Built on matplotlib, much less code for complex plots.
- plotly: Interactive web-based charts. Zoom, hover, click. Great for dashboards and Jupyter notebooks.
- Altair: Declarative visualization based on Vega-Lite. Concise syntax, excellent for exploratory analysis.
- bokeh: Interactive web plots with Python backend. Similar to plotly.

Follow-up:
- “What is the difference between the pyplot and object-oriented matplotlib interfaces?”
- “When would you use plotly vs matplotlib? What are the trade-offs?”
- “How do you create publication-quality figures with matplotlib?”
What is scikit-learn used for?
scikit-learn’s consistent API (`fit`/`predict`/`transform`) and comprehensive algorithm collection make it the go-to for classical ML.

The universal API pattern: every estimator exposes the same methods, so you can switch from `RandomForestClassifier` to `LogisticRegression` by changing one line.

The Pipeline pattern (production-critical): chaining preprocessing and the model in one `Pipeline` ensures transformers are fit only on training data, which prevents data leakage.

When to reach for something else:
- Deep learning: Use PyTorch or TensorFlow
- Gradient boosting at scale: Use XGBoost, LightGBM, or CatBoost (typically faster than sklearn’s classic `GradientBoosting` estimators)
- Very large datasets (millions of rows): Some sklearn algorithms do not scale. Use `partial_fit` for incremental learning, or switch to Spark MLlib.

Follow-up:
- “What is data leakage and how do Pipelines prevent it?”
- “When would you use XGBoost instead of sklearn’s RandomForest?”
- “How would you handle a dataset with 100M rows that does not fit in memory?”
11. Additional Important Topics
What is list slicing?
Slicing uses the syntax `sequence[start:end:step]`. But the real depth is in understanding how slicing interacts with the object model.

The basics:
- `start` is inclusive, `end` is exclusive
- Negative indices count from the end: `lst[-3:]` gets the last 3 elements
- `step` controls stride: `lst[::2]` gets every other element
- Omitted values: `lst[:]` copies the entire list, `lst[::-1]` reverses

Under the hood, slicing calls `__getitem__` with a `slice` object: `lst[1:4]` is equivalent to `lst.__getitem__(slice(1, 4))`. You can implement slicing in custom classes by accepting `slice` objects in `__getitem__`.

Critical nuance (shallow copy): a slice such as `lst[:]` creates a new list, but the elements inside are the same objects, so nested mutable items are shared with the original.

Follow-up:
- “What does `lst[:]` actually do? Is it a deep copy or shallow copy?”
- “How would you implement slicing in a custom class?”
- “What happens when you assign to a slice: `lst[1:3] = [10, 20, 30, 40]`?”
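The three follow-up questions can be explored with a short sketch. `Playlist` is an invented class that shows one way to support slices in `__getitem__`.

```python
class Playlist:
    """Invented sequence wrapper that supports both indexing and slicing."""

    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __len__(self):
        return len(self._tracks)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Return the same type so a slice of a Playlist is a Playlist.
            return Playlist(self._tracks[key])
        return self._tracks[key]

    def __repr__(self):
        return f"Playlist({self._tracks!r})"


p = Playlist(["a", "b", "c", "d", "e"])
every_other = p[::2]

# lst[:] is a shallow copy: new outer list, shared inner objects.
nested = [[1, 2], [3, 4]]
shallow = nested[:]
shallow[0].append(99)

# Slice assignment can change the length of the list.
lst = [0, 1, 2, 3]
lst[1:3] = [10, 20, 30, 40]
```

Appending through `shallow[0]` is visible through `nested[0]` because both outer lists point at the same inner list. Slice assignment replaces two elements with four, so the list grows.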
What is the difference between '==' and 'is'?
`==` calls `__eq__` to compare values. `is` compares memory addresses (object identity). This is covered in the identity operators question above; the key addition here is when each goes wrong.

The dangerous cases: `is` with strings seems to work (but should not be relied on):
CPython interns string literals and identifier-like strings. So `"hello" is "hello"` is `True` in practice, but `"hello world" is "hello world"` may or may not be `True` depending on context. Never rely on this behavior.

What interviewers are really testing: Whether you use `is` correctly (only for `None` and sentinels).

Red flag answer: Using `is` for any value comparison besides `None`.

Follow-up:
- “Can `==` return `True` while `is` returns `False`? Give an example.”
- “Why does `a = 256; b = 256; a is b` return `True` but `a = 257; b = 257; a is b` might return `False`?”
What are Python's logical operators?
`and`, `or`, and `not` are logical operators, but what most people miss is that `and`/`or` return values, not booleans.

Short-circuit evaluation with value return: `and`/`or` do NOT always return booleans. `bool(x and y)` returns a boolean, but `x and y` returns the actual value. This matters when you store the result.

What interviewers are really testing: Whether you know about value return behavior, not just True/False logic.

Red flag answer: “They return True or False.” They return values.

Follow-up:
- “What does `[] or 'default'` return? What about `[1,2] or 'default'`?”
- “How does short-circuit evaluation work with `and`?”
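A few concrete cases of value return and short-circuiting. The helper `record` is a made-up function that logs which operands were actually evaluated.

```python
default_name = "" or "anonymous"      # left side is falsy, so the right side is returned
kept_list = [1, 2] or "default"       # left side is truthy, so it is returned as-is
first_falsy = 0 and "never reached"   # `and` returns the first falsy operand
last_truthy = "a" and "b"             # or the last operand if all are truthy

calls = []


def record(value):
    """Remember that this operand was evaluated, then pass it through."""
    calls.append(value)
    return value


result = record(0) or record("fallback")   # both operands are evaluated
skipped = record(1) or record("unused")    # second operand is never evaluated
```

The `calls` list shows short-circuiting directly: `"unused"` never appears in it.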
What is the enumerate() function?
`enumerate()` adds a counter to an iterable, yielding `(index, value)` tuples. It is essential for Pythonic code any time you need an index in a loop.

Why it exists: it replaces manual index bookkeeping such as `for i in range(len(items))`. `enumerate` is a lazy iterator. It does not create a list of tuples in memory. It yields one `(count, value)` tuple at a time.

Advanced pattern: `enumerate` with unpacking in comprehensions, for example `{value: index for index, value in enumerate(items)}`.

What interviewers are really testing: Whether you reach for `enumerate` instead of `range(len(...))`.

Red flag answer: Using `for i in range(len(list)):` when `enumerate` would be cleaner.

Follow-up:
- “How does `enumerate` interact with generators? Does it consume the entire generator?”
- “What does the `start` parameter do, and when would you use it?”
What is the zip() function?
`zip()` pairs up elements from multiple iterables, creating tuples of corresponding elements. It is fundamental for parallel iteration.

Core behavior: `zip` stops at the shortest input and silently drops the extra elements from longer iterables. In Python 3.10+ you can make a length mismatch raise an error with `strict=True`.

Red flag answer: Not knowing that `zip` silently drops elements from longer iterables.

Follow-up:
- “You zip two lists of different lengths. What happens to the extra elements?”
- “How do you unzip a list of tuples? Explain the `zip(*)` pattern.”
- “When would you use `zip_longest` vs `zip(..., strict=True)`?”
12. Core Programming Concepts
What is the difference between append() and extend()?
`append` treats its argument as a single element; `extend` treats its argument as an iterable of elements.
- `append(x)`: O(1) amortized. Single element.
- `extend(iterable)`: O(k) where k is the length of the iterable. Faster than calling `append` k times because it avoids k separate method calls and can resize once.
- `+= [items]` is equivalent to `extend`, not `append`.

The gotcha with `extend` and strings: a string is an iterable of characters, so `lst.extend('ab')` adds `'a'` and `'b'` as separate elements.

Follow-up:
- “What happens when you `extend` a list with a string?”
- “Is `lst += [1, 2]` the same as `lst.extend([1, 2])` or `lst.append([1, 2])`?”
What are default arguments in functions?
Default argument values are evaluated once, when the function is defined. The classic bug is a mutable default such as a `[]` parameter on a function like `add_item`; the fix is the `None` sentinel: default to `None` and create the list inside the function.

Why it happens: `[]` is evaluated once when `def` is executed (at import time). The list object is stored in `add_item.__defaults__`. Every call that uses the default shares the same list object.

When mutable defaults are intentional (rare but valid):
Caching/memoization patterns use mutable defaults deliberately, although in most cases you should use `@functools.lru_cache` instead.

What interviewers are really testing: Whether you know about the mutable default bug and the `None` sentinel pattern.

Red flag answer: Not knowing about this trap, or not using `None` as sentinel.

Follow-up:
- “Why are default arguments evaluated at definition time, not call time?”
- “How can you inspect a function’s default values?”
- “When would a mutable default argument be intentionally useful?”
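The bug and its fix, side by side. The name `add_item` follows the text above; `add_item_buggy` and the exact signatures are assumptions made for this sketch.

```python
def add_item_buggy(item, items=[]):
    """The default list is created once and shared by every call."""
    items.append(item)
    return items


def add_item(item, items=None):
    """None sentinel: build a fresh list on each call that omits `items`."""
    if items is None:
        items = []
    items.append(item)
    return items


first = add_item_buggy("a")
second = add_item_buggy("b")                    # same list object as `first`
shared_default = add_item_buggy.__defaults__[0]  # where the shared list lives

safe_first = add_item("a")
safe_second = add_item("b")
```

Inspecting `__defaults__` answers the follow-up about viewing default values: the tuple holds the very list that both buggy calls mutated.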
What is the difference between remove(), pop(), and del?
| | `remove(value)` | `pop(index)` | `del lst[index]` |
|---|---|---|---|
| Removes by | Value | Index | Index |
| Returns removed item | No | Yes | No |
| On missing | ValueError | IndexError | IndexError |
| Complexity | O(n) — linear search | O(n) from front, O(1) from end | O(n) from front, O(1) from end |
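When to use each:
- `remove(x)`: When you know the value but not the index. Only removes the first occurrence.
- `pop(i)`: When you need the removed value (e.g., stack operations: `stack.pop()`)
- `del lst[i]`: When you just want deletion by index without needing the value
- `del lst[1:3]`: Slice deletion, which removes a range

The gotcha with `remove` in a loop: removing items while iterating over the same list shifts the remaining elements left, so the loop skips items. Iterate over a copy or build a new list instead.

Red flag answer: Calling `remove()` during iteration.

Follow-up:
- “Why does removing items from a list during iteration skip elements?”
- “What is the time complexity of `remove()` vs `pop()`?”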
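The skipping behaviour is easiest to see on a small list. The data here is made up for the demonstration.

```python
numbers = [1, 2, 2, 3, 2, 4]

# Buggy: mutating the list while iterating over it skips elements.
buggy = list(numbers)
for n in buggy:
    if n == 2:
        buggy.remove(n)

# Safe: build a new list (iterating over a copy also works).
cleaned = [n for n in numbers if n != 2]

stack = [10, 20, 30]
top = stack.pop()   # returns the removed item
del stack[0]        # deletes by index, returns nothing
```

In the buggy loop, each removal shifts later items one position left while the loop's internal index still advances, so one of the `2` values is never visited and survives.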
What is string formatting?
The `f"{x = }"` trick (Python 3.8+) is a game-changer for debugging. It automatically shows the variable name alongside its value.

When to use each approach:
- f-strings: Default choice. Fast (compiled at parse time), readable, flexible.
- `str.format()`: When the format string is determined at runtime (e.g., loaded from a config file). f-strings cannot be reused as templates.
- `%` formatting: Legacy code only. Still common in the `logging` module for performance reasons (the string is not formatted if the log level is filtered out).
- `string.Template`: For user-provided format strings (safe, no arbitrary code execution).

Security warning: f-strings evaluate arbitrary expressions, for example `f"{__import__('os').system('rm -rf /')}"`. Never evaluate an f-string built from user input, and be careful with `str.format()` on user-supplied templates, which can expose object attributes. Use `string.Template` for user-provided format strings.

What interviewers are really testing: Whether you use f-strings, know the `=` debugging feature, and understand security implications.

Red flag answer: Using `%` formatting or `str.format()` when f-strings would be simpler.

Follow-up:
- “What is the `f'{x = }'` syntax in Python 3.8+? When would you use it?”
- “When would `str.format()` be better than an f-string?”
- “Why is f-string formatting a security risk with user input?”
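A compact comparison of the approaches listed above, assuming Python 3.8+ for the `=` specifier. The variable names and template text are invented.

```python
from string import Template

user = "ada"
balance = 1234.5

debug_view = f"{user = }"   # Python 3.8+: name and repr together
money = f"{balance:,.2f}"   # format spec: thousands separator, 2 decimals

# str.format(): the template can be chosen at runtime.
template = "Hello, {name}!"
greeting = template.format(name=user)

# string.Template: only simple $name substitution, so attribute lookups
# such as {0.__class__} in an untrusted template are left untouched.
untrusted = Template("Hi $name, see {0.__class__}")
rendered = untrusted.safe_substitute(name=user)
```

The last two lines are the reason `string.Template` is the conservative choice for user-supplied templates: it has no syntax for attribute access or expressions.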
What are the any() and all() functions?
`any()` and `all()` are built-in functions for checking conditions across iterables. They short-circuit for efficiency.

Behavior: `any()` and `all()` stop as soon as the result is determined. `any` stops at the first truthy value, `all` stops at the first falsy value. This means you can use them with expensive checks and they will not evaluate everything unnecessarily.

De Morgan’s equivalence:
- `not any(x)` is equivalent to `all(not x)`: “none are true” == “all are false”
- `not all(x)` is equivalent to `any(not x)`: “not all are true” == “some are false”

Follow-up:
- “Why does `all([])` return `True`? Is this intuitive?”
- “How does short-circuit evaluation work with `any(expensive_check(x) for x in items)`?”
- “Rewrite this nested if-chain using `any()` or `all()`.”
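Short-circuiting can be observed by recording which items a predicate actually inspects. `is_even` and the `checked` list are illustrative helpers.

```python
checked = []


def is_even(n):
    checked.append(n)   # record which items were actually inspected
    return n % 2 == 0


found_even = any(is_even(n) for n in [1, 3, 4, 5, 7])
inspected_by_any = list(checked)

checked.clear()
all_even = all(is_even(n) for n in [2, 4, 5, 6])
inspected_by_all = list(checked)

# Edge cases on empty iterables.
empty_all = all([])   # vacuously true: no element fails the test
empty_any = any([])   # no element passes the test
```

This only works because a generator expression is passed in. Wrapping the same expression in square brackets would evaluate the predicate for every item before `any` or `all` even starts.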
13. Python Internals and Memory Management
How does Python handle memory management?
CPython’s allocator layers:
- OS allocator (`malloc`/`free`): Large allocations (>512 bytes)
- Python object allocator (`pymalloc`): Small object allocations. Uses memory pools of fixed-size blocks (size classes up to 512 bytes). This avoids calling `malloc` for every small object, which would be slow.
- Object-specific allocators: `int`, `float`, `list`, `dict` each have optimized allocation strategies. For example, small ints (-5 to 256) are pre-allocated and reused.

`pymalloc` manages memory in arenas, divided into pools, divided into fixed-size blocks (historically 256KB arenas and 4KB pools; newer 64-bit builds use larger sizes). Objects of similar sizes share a pool. When all blocks in a pool are freed, the pool is returned to the arena for reuse.

Practical memory management:
- Use `sys.getsizeof(obj)` for the shallow size of one object
- Use `pympler.asizeof` for deep/recursive size (includes referenced objects)
- The `tracemalloc` module traces memory allocations to their source line
- `memory_profiler` shows line-by-line memory usage
- `gc.get_objects()` lists all tracked objects (useful for leak debugging)

What interviewers are really testing: Whether you know about `pymalloc`, memory profiling tools, and can debug memory issues.

Red flag answer: Only knowing about reference counting without understanding the allocator layers.

Follow-up:
- “How would you find a memory leak in a long-running Python service?”
- “What tools do you use to profile memory usage?”
- “Why does CPython have its own memory allocator instead of just using malloc?”
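As a starting point for the leak-hunting follow-up, this sketch compares two `tracemalloc` snapshots. The `payload` allocation simulates a leak and is purely illustrative.

```python
import sys
import tracemalloc

# Shallow size: counts the outer list only, not the 1000-element inner list.
shallow = sys.getsizeof([[0] * 1000])

tracemalloc.start()
before = tracemalloc.take_snapshot()
payload = [bytes(1000) for _ in range(1000)]   # roughly 1 MB of small allocations
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Largest allocation growth between the two snapshots, grouped by source line.
top = after.compare_to(before, "lineno")[0]
grown_bytes = top.size_diff
```

In a real service the two snapshots would be taken minutes apart, and the top entries of `compare_to` point at the source lines whose allocations keep growing.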
What is Python's memory model (pass-by-object-reference)?
Think of variables as sticky notes attached to objects: assignment with `=` moves the sticky note. Mutation changes the object the sticky note is on. Rebinding moves the sticky note to a different object.

The augmented assignment gotcha: `+=` behaves differently for mutable vs immutable types. For lists, it mutates. For tuples and strings, it creates new objects.

What interviewers are really testing: Whether you can predict what happens when you pass mutable vs immutable objects to functions.

Red flag answer: Saying Python is “pass by reference” or “pass by value.”

Follow-up:
- “Is Python pass-by-value or pass-by-reference? Neither. Explain what it actually is.”
- “Why does `lst += [4]` inside a function affect the caller but `num += 1` does not?”
- “How would you write a function that cannot accidentally modify its arguments?”
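The difference between mutating and rebinding, shown with three small functions whose names are invented for this sketch.

```python
def mutate(items):
    items.append(4)        # mutates the caller's list in place


def rebind(items):
    items = items + [4]    # builds a new list; the caller's name is untouched
    return items


def augmented(items, count):
    items += [4]           # list.__iadd__ mutates in place
    count += 1             # int is immutable: only the local name is rebound
    return count


data = [1, 2, 3]
mutate(data)
after_mutate = list(data)

rebind(data)
after_rebind = list(data)

n = 10
augmented(data, n)
```

Only `mutate` and the list half of `augmented` are visible to the caller. A function that must not modify its arguments can copy them first (for example `items = list(items)`) or accept immutable types such as tuples.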
What is object interning in Python?
What CPython interns automatically:
- Integers -5 to 256: Pre-allocated at interpreter startup. `a = 100; b = 100; a is b` is always `True` in CPython.
- Short strings that look like identifiers (alphanumeric + underscore): Automatically interned. `"hello" is "hello"` is `True`.
- Strings used as variable names, function names, etc.: Interned by the compiler.
- `True`, `False`, `None`: Singletons (always interned).

Manual interning with `sys.intern()`: use `sys.intern()` when you have many duplicate strings in memory (e.g., column names in a data processing pipeline with millions of rows).

Why this matters: Interning saves memory and makes `is` comparisons O(1). But never rely on interning for correctness; always use `==` for value comparison. Interning is a CPython optimization detail that may change.

What interviewers are really testing: Whether you know the interning boundaries and understand it is implementation-specific.

Red flag answer: “Integers are interned” without knowing the -5 to 256 range or the REPL vs script difference.

Follow-up:
- “Why does `257 is 257` give different results in the REPL vs a .py file?”
- “When would you use `sys.intern()` in production code?”
- “Why is interning an implementation detail you should never rely on?”
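A small sketch of manual interning. The strings are built at runtime so that the compiler cannot merge them as constants; the contents are arbitrary.

```python
import sys

# Built at runtime, so these start out as two separate string objects.
a = "".join(["customer", "_", "id"]) + "!"
b = "".join(["customer", "_", "id"]) + "!"

equal_values = a == b         # value comparison is always the correct check
same_object_before = a is b   # implementation detail: do not rely on the answer

a_interned = sys.intern(a)
b_interned = sys.intern(b)
same_object_after = a_interned is b_interned
```

After `sys.intern`, both names refer to the single canonical copy, which is what saves memory when the same string appears millions of times.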
What are weak references in Python?
What are weak references in Python?
A weak reference refers to an object without keeping it alive; dereferencing it returns `None` when the object is collected.

The problem weak references solve: a cache that holds ordinary (strong) references keeps every cached object alive. With `weakref.WeakValueDictionary`, once no strong references to `obj` remain, the GC collects it and the cache entry automatically disappears.

Use cases:
- Caches: Allow cached objects to be collected once nothing else is using them
- Observer pattern: Subjects hold weak references to observers, so being registered does not prevent an observer from being collected
- Circular reference avoidance: Break cycles by making one reference weak
- Object tracking/debugging: Monitor objects without affecting their lifecycle

Limitations: `int`, `str`, `tuple`, and other built-in immutable types cannot have weak references. `list` and `dict` instances cannot be weakly referenced directly either, although their subclasses can. Custom classes support them by default (unless `__slots__` is used without `__weakref__`).

What interviewers are really testing: Whether you understand the GC implications and can identify real use cases.

Red flag answer: Knowing the syntax without understanding why it matters, namely the garbage collection interaction.

Follow-up:
- “How does a `WeakValueDictionary` differ from a regular dict in terms of garbage collection?”
- “What types of objects cannot have weak references? Why?”
- “How would you implement an LRU cache that respects memory pressure using weak references?”
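A minimal sketch of the `WeakValueDictionary` behavior described above. The `Thumbnail` class and cache key are hypothetical; the explicit `gc.collect()` is there because only CPython frees the object immediately through reference counting.

```python
import gc
import weakref


class Thumbnail:
    """Stand-in for an expensive object (hypothetical class)."""

    def __init__(self, name):
        self.name = name


cache = weakref.WeakValueDictionary()


def cache_then_release():
    thumb = Thumbnail("logo.png")
    cache["logo.png"] = thumb
    alive_while_referenced = cache.get("logo.png") is thumb
    del thumb      # drop the only strong reference
    gc.collect()   # CPython frees on refcount; other runtimes need a GC pass
    return alive_while_referenced, cache.get("logo.png")


alive, after_release = cache_then_release()

# Built-in immutable types such as int reject weak references outright.
try:
    weakref.ref(42)
    int_supports_weakref = True
except TypeError:
    int_supports_weakref = False

print(alive, after_release, int_supports_weakref)
```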
How does CPython differ from PyPy or Jython?
CPython:
- Written in C. Reference implementation: defines what “Python” means.
- Compiles to bytecode, interpreted by CPython VM.
- Has the GIL. Uses reference counting + cyclic GC.
- Best ecosystem compatibility (C extensions, pip packages).
- Performance: baseline. ~10-100x slower than C for CPU-bound work.

PyPy:
- Written in Python (RPython). Uses JIT (Just-In-Time) compilation.
- Can be 4-10x faster than CPython for long-running programs (the JIT “warms up”).
- Has a GIL (same as CPython).
- Compatibility: Most pure-Python code works. C extensions may not (no full CPython C API compatibility guarantee). NumPy works via cpyext but is slower than on CPython.
- Best for: Long-running servers, computation-heavy pure Python code.
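GraalPython:
- Runs on GraalVM (polyglot runtime by Oracle).
- Can interop with Java, JavaScript, Ruby, R. Note that GraalPy currently runs with a GIL; Jython, the older JVM implementation, is the one without a GIL.
- Relatively new, limited ecosystem support.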
Cython:
- Compiles Python-like code to C extensions. Gives C-level performance for annotated code.
- Used by many high-performance libraries (NumPy, pandas, scikit-learn).

Which one to choose:
- Need C extensions (NumPy, pandas, TensorFlow)? CPython
- Need faster pure-Python execution? PyPy
- Need Java interop? GraalPython or Jython
- Need to write C-speed extensions? Cython

Follow-up:
- “When would you deploy PyPy instead of CPython? What would you lose?”
- “What is JIT compilation and why does it make PyPy faster?”
- “How does Cython differ from CPython?”
How can you inspect CPython bytecode?
The `dis` module disassembles Python functions into CPython bytecode instructions. This is invaluable for understanding performance and debugging mysterious behavior.

Instructions worth recognizing (exact names vary between CPython versions):
- `LOAD_FAST`/`STORE_FAST`: Local variable access (fastest)
- `LOAD_GLOBAL`/`STORE_GLOBAL`: Global variable access (slower, dict lookup)
- `LOAD_ATTR`: Attribute access (dict lookup on object)
- `CALL_FUNCTION`: Function call overhead (named `CALL` since Python 3.11)
- `BINARY_ADD`, `BINARY_MULTIPLY`: Arithmetic operations (merged into `BINARY_OP` since Python 3.11)
- `FOR_ITER`: Loop iteration

`dis.dis()` shows you exactly what the interpreter does differently.

What interviewers are really testing: Whether you can use low-level tools to understand and debug Python performance.

Red flag answer: Never having used `dis` or not understanding why local variables are faster than global ones.

Follow-up:
- “Why is accessing a local variable faster than a global variable in CPython?”
- “How would you use `dis` to explain why list comprehensions are faster than equivalent for loops?”
- “What bytecode instruction does a `for` loop compile to?”
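To make the local-versus-global distinction concrete, here is a small sketch using `dis`. The function `scaled` and the constant `SCALE` are made up for the example, and the exact opcode names printed depend on the CPython version.

```python
import dis

SCALE = 3  # module-level name, resolved through a global lookup


def scaled(x):
    # x is a local (fast slot access); SCALE is a global (dict lookup)
    return x * SCALE


def opnames(func):
    """Collect the opcode names of a function's bytecode."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


names = opnames(scaled)
dis.dis(scaled)  # human-readable listing of the same instructions
```

Reading the listing shows one instruction loading the local `x` and a different one loading the global `SCALE`, which is the difference the follow-up question is probing.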
14. Descriptors and Magic Methods
How do properties work under the hood? Explain the descriptor protocol
The descriptor protocol underlies much of Python's attribute machinery: `property`, `classmethod`, `staticmethod`, and `super()` all use it.

The protocol has three optional methods:
- `__get__(self, obj, objtype=None)`: Called when attribute is accessed
- `__set__(self, obj, value)`: Called when attribute is assigned
- `__delete__(self, obj)`: Called when attribute is deleted

Data vs non-data descriptors:
- Data descriptor: Defines `__set__` and/or `__delete__`. Takes priority over instance `__dict__`.
- Non-data descriptor: Defines only `__get__`. Instance `__dict__` takes priority.
- `@property` creates a data descriptor. Functions are non-data descriptors (which is why you can override methods on instances).

Functions implement `__get__` as well, which is why `obj.method` (a bound method) and `MyClass.method` (a plain function) behave differently.

Attribute lookup order (the full picture):
- Data descriptors on the class (e.g., properties)
- Instance `__dict__`
- Non-data descriptors on the class (e.g., methods)
- `__getattr__` (fallback)

What interviewers are really testing: Whether you understand the mechanism that makes `@property` and methods work.

Red flag answer: Knowing `@property` syntax without understanding descriptors.

Follow-up:
- “What is the difference between a data descriptor and a non-data descriptor?”
- “How does a function become a bound method when accessed on an instance?”
- “Build a descriptor that validates attribute types (e.g., must be `int`).”
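One possible answer to the last follow-up is sketched below. `Typed` and `Order` are hypothetical names; the descriptor stores the value under a private attribute and relies on `__set_name__` to learn which attribute it manages.

```python
class Typed:
    """Data descriptor that enforces a type on assignment (illustrative sketch)."""

    def __init__(self, expected_type):
        self.expected_type = expected_type

    def __set_name__(self, owner, name):
        # Called once at class creation with the attribute name
        self.public_name = name
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not an instance
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(
                f"{self.public_name} must be {self.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        setattr(obj, self.private_name, value)


class Order:
    quantity = Typed(int)

    def __init__(self, quantity):
        self.quantity = quantity  # goes through Typed.__set__


order = Order(3)
# Because Typed defines __set__, it is a data descriptor and wins over
# an entry of the same name placed directly in the instance __dict__.
order.__dict__["quantity"] = 99
print(order.quantity)
```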
What is the difference between __getattr__ and __getattribute__?
`__getattr__(self, name)`: The fallback. Only called when normal attribute lookup fails (not found in `__dict__`, class hierarchy, or data descriptors).

`__getattribute__(self, name)`: Total control. Called for EVERY attribute access.

Inside `__getattribute__`, any `self.x` access calls `__getattribute__` again. You MUST use `object.__getattribute__(self, name)` to break the recursion.

Use cases:
- `__getattr__`: Proxy objects, lazy loading, default values, API wrappers
- `__getattribute__`: Logging, access control, transparent proxies (rare, use with extreme caution)

`__getattribute__`.

Follow-up:
- “How do you avoid infinite recursion in `__getattribute__`?”
- “Build a proxy object using `__getattr__` that delegates all attribute access to a wrapped object.”
- “When would you use `__getattribute__` instead of `__getattr__`?”
How can you create a class without using the class keyword?
Creating a class with `type()` as a metaclass demonstrates that classes are objects created by calling a metaclass.
- The `class` statement is syntactic sugar for a `type()` call
- `type` is both a function (returning an object’s type) and a metaclass (creating new classes)
- Classes are first-class objects that can be created, modified, passed as arguments, and returned from functions

Use cases:
- ORMs that create model classes from database schemas
- Plugin systems that register classes at runtime
- Testing frameworks that generate test classes dynamically
- Serialization libraries that create classes from schemas (Pydantic, marshmallow)

Red flag answer: Not knowing that `type()` can create classes.

Follow-up:
- “If `type` creates classes, what creates `type`? (Answer: `type` creates itself)”
- “How would you dynamically create a class with attributes determined at runtime?”
- “How does this relate to metaclasses?”
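A minimal sketch of the three-argument `type(name, bases, namespace)` call follows. The `User` class and its attributes are invented for the example.

```python
def describe(self):
    return f"{type(self).__name__}(id={self.id})"


# Attributes decided at runtime, e.g. read from a schema
namespace = {"id": 0, "describe": describe}

# Equivalent in effect to: class User(object): id = 0; def describe(self): ...
User = type("User", (object,), namespace)

user = User()
user.id = 7
print(user.describe())
print(type(User) is type, type(type) is type)
```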
Explain monkey patching with an example
Legitimate uses:
- Testing: `unittest.mock.patch` is structured monkey patching with automatic cleanup
- Hotfixing third-party bugs: Patch a broken method in a library you cannot update immediately
- Feature flags: Replace behavior at runtime based on configuration
- gevent/eventlet: Monkey-patches the standard library to make blocking I/O non-blocking

Dangers:
- Code behavior depends on import order and patch order
- Breaks IDE navigation and static analysis (the patched function is not the one in the source)
- Library updates may change the patched interface, breaking your patch silently
- Makes debugging extremely difficult (“why is requests.get returning a mock in production?”)

Follow-up:
- “How does `unittest.mock.patch` improve on raw monkey patching?”
- “What is gevent’s monkey patching and why is it necessary?”
- “You need to fix a bug in a third-party library. Monkey patch or fork the repo?”
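As an illustration of scoped patching with automatic cleanup, the sketch below uses `unittest.mock.patch.object`. `PaymentClient` and `checkout` are hypothetical stand-ins for real application code.

```python
from unittest import mock


class PaymentClient:
    def charge(self, amount):
        raise RuntimeError("would hit the network")


def checkout(client, amount):
    return client.charge(amount)


# The patch is active only inside the with-block and is undone on exit,
# even if the block raises.
with mock.patch.object(PaymentClient, "charge", return_value="ok") as fake_charge:
    patched_result = checkout(PaymentClient(), 10)
    fake_charge.assert_called_once_with(10)

# Outside the block the original method is back.
try:
    checkout(PaymentClient(), 10)
    restored = False
except RuntimeError:
    restored = True

print(patched_result, restored)
```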
15. Advanced Generators and Iterators
What is the yield from statement (PEP 380)?
`yield from` delegates to a sub-generator, but it is far more than syntactic sugar for a `for` loop. It creates a bidirectional channel between the caller and the sub-generator.

What `yield from` handles that a manual loop does not:
- `send()` forwarding: Values sent to the outer generator are forwarded to the sub-generator
- `throw()` forwarding: Exceptions thrown into the outer generator are thrown into the sub-generator
- `close()` forwarding: Closing the outer generator closes the sub-generator
- Return value capture: The sub-generator’s `return` value becomes the value of the `yield from` expression

Without `yield from`, you would need ~40 lines of boilerplate to handle all edge cases correctly. PEP 380 formalized this. A common real-world use case is flattening nested structures.

`yield from` was the original mechanism for coroutine composition before `async`/`await` was introduced in Python 3.5. `await` is essentially `yield from` for coroutines.

What interviewers are really testing: Whether you understand the bidirectional channel, not just the iteration delegation.

Red flag answer: “It is the same as `for x in gen: yield x`.” It is not: `send()`, `throw()`, and `close()` forwarding are the whole point.

Follow-up:
- “What is the difference between `yield from gen` and `for x in gen: yield x`?”
- “How does `yield from` relate to `await` in asyncio?”
- “How do you capture the return value of a sub-generator?”
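The sketch below shows `send()` forwarding, return value capture, and recursive flattening in one place. The generator names (`averager`, `collector`, `flatten`) are illustrative choices, not taken from PEP 380.

```python
def averager():
    """Sub-generator: receives numbers via send(), returns their mean."""
    total = 0
    count = 0
    while True:
        value = yield
        if value is None:   # sentinel: stop accumulating
            break
        total += value
        count += 1
    return total / count if count else 0.0


def collector(results):
    # send() calls on collector are forwarded to averager;
    # averager's return value becomes the value of the expression.
    average = yield from averager()
    results.append(average)


def flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)   # delegate to the recursive call
        else:
            yield item


results = []
gen = collector(results)
next(gen)            # prime: advance to the first yield inside averager
gen.send(10)
gen.send(20)
try:
    gen.send(None)   # averager returns, collector finishes
except StopIteration:
    pass

flat = list(flatten([1, [2, [3, 4]], 5]))
print(results, flat)
```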
How does generator execution and frame persistence work?
A generator keeps its stack frame alive between calls: execution is frozen at `yield` and thawed on `next()`.

Frame details:
- Generator objects hold a reference to their frame object (`gen.gi_frame`)
- The frame stores: local variables, instruction pointer (`f_lasti`), operand stack, block stack
- `gen.gi_frame` is `None` after the generator is exhausted
- You can inspect the current state: `gen.gi_frame.f_locals`, `gen.gi_frame.f_lineno`

Follow-up:
- “How can you inspect the current state of a suspended generator?”
- “Why can you have millions of generators but only thousands of threads?”
- “What happens to the generator frame when it is exhausted?”
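The frame attributes listed above can be observed directly. This is a small sketch with a made-up `countdown` generator; `inspect.getgeneratorstate` is used to name the lifecycle states.

```python
import inspect


def countdown(n):
    while n > 0:
        yield n
        n -= 1


gen = countdown(2)
state_before = inspect.getgeneratorstate(gen)   # created, not yet started

first = next(gen)                               # runs up to the first yield
paused_locals = dict(gen.gi_frame.f_locals)     # locals survive the pause
paused_line = gen.gi_frame.f_lineno             # line of the suspended yield

remaining = list(gen)                           # resume until exhausted
state_after = inspect.getgeneratorstate(gen)
frame_after = gen.gi_frame                      # frame is released

print(state_before, first, paused_locals, remaining, state_after, frame_after)
```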
How does itertools enable memory-efficient data processing?
The `itertools` module provides composable, lazy building blocks for creating efficient data processing pipelines. It is one of the most underused modules in Python.

Follow-up:
- “How would you process a 100GB log file using `itertools` and generators?”
- “What is the gotcha with `itertools.groupby`? (Must be sorted first)”
- “When would `itertools.product` be better than nested for loops?”
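A short sketch of a lazy pipeline and of the `groupby` gotcha mentioned in the follow-up. The log lines and the `parse` helper are invented for the example; note that `sorted()` materializes its input, so for very large inputs the grouping key would need to arrive pre-sorted.

```python
import itertools


def parse(lines):
    """Lazily split 'LEVEL message' lines into (level, message) pairs."""
    for line in lines:
        level, _, message = line.partition(" ")
        yield level, message


log_lines = ["ERROR disk full", "INFO started", "ERROR timeout", "INFO stopped"]

# groupby only merges ADJACENT equal keys, so unsorted input splits groups.
unsorted_keys = [key for key, _ in itertools.groupby(parse(log_lines), key=lambda r: r[0])]

# Sorting first gives one group per level.
by_level = {
    level: [message for _, message in group]
    for level, group in itertools.groupby(sorted(parse(log_lines)), key=lambda r: r[0])
}

# islice takes a bounded slice from an infinite iterator without building a list first.
first_three = list(itertools.islice(itertools.count(1), 3))

print(unsorted_keys, by_level, first_three)
```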
16. Advanced Concurrency and Parallelism
Why does the threading module exist if GIL prevents true parallelism?
The GIL is released during blocking I/O (network calls, file reads, `time.sleep()`). While one thread waits for I/O, other threads can execute Python code.

Threading is also the right tool for:
- GUI applications (separate thread for UI responsiveness)
- Background tasks (periodic cleanup, monitoring)
- Real-time systems (audio playback while processing)
- Libraries that release the GIL in C extensions (NumPy, PIL, cryptography)

The GIL does not make your code thread-safe: operations like `counter += 1` are NOT atomic (they compile to multiple bytecodes: LOAD, ADD, STORE). You still need `threading.Lock` for shared mutable state.

What interviewers are really testing: Whether you understand that the GIL is released during I/O and that threading is valuable for I/O-bound work.

Red flag answer: “Threading is useless in Python because of the GIL.”

Follow-up:
- “Is `counter += 1` thread-safe in Python? Why or why not?”
- “When is the GIL released? Give me specific examples.”
- “How many threads should you use for a web scraper hitting 1000 URLs?”
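Both points (I/O overlap and the need for a lock) can be sketched briefly. `fake_download` simulates a blocking call with `time.sleep`, and `SafeCounter` is a hypothetical helper; the timing printed will vary by machine.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(i):
    time.sleep(0.05)   # blocking wait: the GIL is released while sleeping
    return i * 2


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fake_download, range(8)))
elapsed = time.perf_counter() - start   # roughly one sleep, not eight in a row


class SafeCounter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:       # makes the read-modify-write atomic
            self.value += 1


counter = SafeCounter()


def work():
    for _ in range(1000):
        counter.increment()


threads = [threading.Thread(target=work) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(results, round(elapsed, 2), counter.value)
```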
When to choose ThreadPoolExecutor vs ProcessPoolExecutor vs asyncio?
- “Your task involves both API calls (I/O) and data processing (CPU). What architecture do you use?”
- “What happens if you use ThreadPoolExecutor for CPU-bound work? Show me the math.”
- “Why is asyncio more efficient than threading for 10,000 concurrent connections?”
What are the resource implications (memory, CPU) of each concurrency approach?
| Approach | Memory per unit | Creation time | Max practical count |
|---|---|---|---|
| Thread | ~100KB (stack) | ~1ms | ~1,000-5,000 |
| Process | ~30-50MB (full Python interpreter) | ~50-100ms | ~CPU count (4-64) |
| Coroutine | ~1KB | ~0.01ms | ~100,000+ |
Threads:
- Default stack size: 8MB on Linux, reserved as virtual memory and mostly unused, which is why the resident cost per thread is far lower. Set `threading.stack_size(65536)` for low-memory threads.
- Context switching: ~1-10 microseconds (OS-managed, preemptive)
- Shared memory: Pro for data sharing, con for needing locks
- GIL contention: Under heavy CPU load, threads fight for the GIL, adding ~5% overhead per thread

Processes:
- Each process gets a full Python interpreter copy
- IPC (Inter-Process Communication) requires serialization: `pickle` for `multiprocessing`, which is slow for large objects
- Copy-on-write after `fork()`: Memory is shared until modified. Linux is efficient here; macOS less so, because `multiprocessing` defaults to `spawn` there since Python 3.8 and spawned workers share nothing with the parent.
- `initializer` functions in `ProcessPoolExecutor` run once per worker: use them to load large models/data per worker instead of passing through pickle

Coroutines (asyncio):
- Coroutines are lightweight Python objects (~1KB)
- Event loop overhead: ~10 microseconds per coroutine switch
- No parallelism: All coroutines share one CPU core
- Blocking call in any coroutine blocks ALL coroutines (the “foot gun”)

Follow-up:
- “You need to handle 50,000 concurrent WebSocket connections. Calculate the memory needed for threads vs asyncio.”
- “Your process pool workers each load a 500MB ML model. How much total memory?”
- “How does copy-on-write affect memory usage after forking?”
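The first two follow-ups are back-of-the-envelope calculations, worked through below using the per-unit figures from the table. The worker count of 8 is an assumption for illustration, and decimal units (1 MB = 1000 KB) are used to keep the arithmetic simple.

```python
KB_PER_MB = 1000  # decimal units, for a rough estimate only


def total_mb(units, kb_per_unit):
    """Estimated memory in MB for `units` items costing `kb_per_unit` each."""
    return units * kb_per_unit / KB_PER_MB


connections = 50_000
threads_mb = total_mb(connections, 100)    # ~100KB per thread
coroutines_mb = total_mb(connections, 1)   # ~1KB per coroutine

workers = 8                                # assumed pool size
model_mb = 500
pool_models_mb = workers * model_mb        # each worker holds its own copy

print(f"threads: {threads_mb:.0f} MB, coroutines: {coroutines_mb:.0f} MB")
print(f"model copies across {workers} workers: {pool_models_mb} MB")
```

Under these assumptions threads need about 5 GB against about 50 MB for coroutines, and the worker pool holds about 4 GB of model copies unless pages stay shared through copy-on-write.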
How does async/await differ from threading?
Threading (preemptive):
- The OS decides when to switch threads (can happen at any bytecode instruction)
- You need locks to protect shared data (race conditions are possible)
- Blocking I/O is fine: the OS blocks the thread and switches to another
- Simple mental model: write sequential code, wrap in threads

Asyncio (cooperative):
- Coroutines explicitly yield control at `await` points (you control when switching happens)
- No locks needed for coroutine-local data (no preemptive switching between awaits)
- Blocking I/O is FORBIDDEN: one blocking call freezes the entire event loop
- Requires async-compatible libraries for everything (aiohttp, asyncpg, aiofiles)

Once a function is `async def`, the entire call stack above it must also be async. You cannot `await` from a regular function. This creates a viral effect that can require rewriting large portions of code.

When threading wins: Quick scripts, code that uses sync-only libraries, teams without async experience.

When asyncio wins: High-concurrency servers (>1000 connections), WebSocket servers, API gateways.

What interviewers are really testing: Whether you understand the trade-offs, especially the cooperative nature of async.

Red flag answer: “Async is always faster than threading.” It is more efficient for many connections but adds complexity.

Follow-up:
- “What happens if you accidentally call `time.sleep(10)` inside an async handler?”
- “How do you run blocking code from within an async function?”
- “What is the ‘function color’ problem and how does it affect your architecture decisions?”
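One way to run blocking code from a coroutine without freezing the event loop is `asyncio.to_thread` (available since Python 3.9). The sketch below uses a hypothetical `blocking_lookup` function that stands in for a sync-only library call.

```python
import asyncio
import time


def blocking_lookup(key):
    time.sleep(0.05)        # stands in for a sync-only library call
    return key.upper()


async def handler(key):
    # Runs the blocking function in a worker thread, so the event loop
    # stays free to schedule other coroutines in the meantime.
    return await asyncio.to_thread(blocking_lookup, key)


async def main():
    return await asyncio.gather(
        handler("a"),
        handler("b"),
        asyncio.sleep(0.01, result="tick"),   # keeps running during the lookups
    )


results = asyncio.run(main())
print(results)
```

Calling `blocking_lookup` directly inside `handler` would still work, but every other coroutine would stall for the duration of the sleep.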
How do you profile and solve real-world performance bottlenecks?
- Reproduce the slow behavior with realistic data
- Profile to find the hot path (usually 5% of code is 95% of runtime)
- Optimize the hot path using appropriate techniques:
  - Algorithm improvement (O(n^2) to O(n log n))
  - Data structure change (list to set for lookups)
  - Caching (`functools.lru_cache`)
  - Vectorization (NumPy instead of Python loops)
  - Concurrency (appropriate model for task type)
  - C extension (Cython, pybind11) as last resort
- Measure again to verify improvement
- Add regression benchmarks to prevent future regressions

`py-spy` is the most practical tool for production profiling: it attaches to a running process with very low overhead (it samples from outside the process), requires no code changes, and produces flame graphs. It has saved hours of debugging in my experience.

What interviewers are really testing: Whether you follow a systematic profiling workflow or guess at optimizations.

Red flag answer: Optimizing without profiling first, or only knowing `print(time.time())`.

Follow-up:
- “How would you profile a production service without restarting it?”
- “Your function is slow. `cProfile` shows it spends 80% of time in one line. What is your next step?”
- “What is a flame graph and how do you read one?”
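The profile-then-fix loop can be sketched with the standard library alone. The functions `slow_lookup` and `fast_lookup` are made-up examples of the list-to-set change named in the workflow above.

```python
import cProfile
import io
import pstats


def slow_lookup(items, targets):
    # `in` on a list scans linearly: O(len(items)) per target
    return sum(1 for target in targets if target in items)


def fast_lookup(items, targets):
    lookup = set(items)   # O(1) average membership test
    return sum(1 for target in targets if target in lookup)


items = list(range(2000))
targets = range(0, 4000, 2)

profiler = cProfile.Profile()
profiler.enable()
slow_hits = slow_lookup(items, targets)
profiler.disable()

# Render the five most expensive entries by cumulative time
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()

fast_hits = fast_lookup(items, targets)
print(report)
```

After the change, the same measurement should be repeated on `fast_lookup` to confirm the improvement rather than assuming it.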
What are the real-world use cases for multithreading, multiprocessing, and asyncio?
Multithreading (I/O-bound work):
- Web scraping: Fetch 100 URLs concurrently. Each thread blocks on network I/O while others proceed. ThreadPoolExecutor with 10-50 workers.
- File processing: Read/write multiple files simultaneously. Disk I/O is the bottleneck, not CPU.
- Database operations: Execute multiple queries in parallel. Database drivers release the GIL during network waits.
- GUI applications: Keep the UI responsive while doing background work.

Multiprocessing (CPU-bound work):
- Image/video processing: Resize 10,000 images using all 8 CPU cores. Each worker process handles a batch.
- Data transformation: ETL pipelines with heavy computation (parsing, transforming, aggregating).
- Scientific computing: Monte Carlo simulations, numerical integration, matrix operations (when NumPy is insufficient).
- ML model training: Parallel hyperparameter search with `ProcessPoolExecutor`.

Asyncio (high-concurrency I/O):
- Web servers: FastAPI/Starlette handling 10,000+ concurrent HTTP connections.
- WebSocket servers: Real-time chat, live dashboards, gaming servers.
- API gateways: Fan-out requests to multiple backend services concurrently.
- Message queue consumers: Process Kafka/RabbitMQ messages with async handlers.
- Web crawlers: Crawl thousands of pages with aiohttp, rate-limited.

Follow-up:
- “Design the concurrency architecture for an image-processing API that receives uploads (I/O), processes them (CPU), and stores results (I/O).”
- “How would you handle a mixed workload that is both I/O-bound and CPU-bound?”
- “What is the maximum number of concurrent WebSocket connections a single asyncio server can handle? What is the bottleneck?”
17. Design Patterns and Architecture
How do you apply SOLID principles in Python?
`NotImplementedError`.

I (Interface Segregation): Use `typing.Protocol` (Python 3.8+) for narrow, focused interfaces. Do not force classes to implement methods they do not need.

D (Dependency Inversion): Depend on abstractions (Protocols, ABCs), not concrete implementations. In Python, this is often as simple as passing a function or Protocol-typed parameter.

What interviewers are really testing: Whether you can translate SOLID from Java-speak to Pythonic patterns.

Red flag answer: Describing SOLID exactly as a Java textbook would without Python-specific adaptations.

Follow-up:
- “How does `typing.Protocol` implement the Interface Segregation principle in Python?”
- “Give me a real example of Liskov Substitution violation in Python.”
- “How do you implement Dependency Inversion in Python without a DI framework?”
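A sketch of Interface Segregation and Dependency Inversion using `typing.Protocol`. All class names (`Notifier`, `InMemoryNotifier`, `SignupService`) are hypothetical.

```python
from typing import Protocol


class Notifier(Protocol):
    """Narrow interface: only what the service actually needs."""

    def send(self, recipient: str, message: str) -> None: ...


class InMemoryNotifier:
    # Satisfies Notifier structurally; no inheritance required
    def __init__(self) -> None:
        self.sent = []

    def send(self, recipient: str, message: str) -> None:
        self.sent.append((recipient, message))


class SignupService:
    def __init__(self, notifier: Notifier) -> None:
        # Depends on the abstraction, passed in by the caller
        self._notifier = notifier

    def register(self, email: str) -> str:
        self._notifier.send(email, "Welcome!")
        return email


notifier = InMemoryNotifier()
service = SignupService(notifier)
registered = service.register("ana@example.com")
print(registered, notifier.sent)
```

Swapping in a real email sender requires no change to `SignupService`, only a different object passed to its constructor.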
Why prefer composition over inheritance? Provide a practical refactoring example
- “When IS inheritance the right choice in Python?”
- “How do mixins fit into the composition vs inheritance debate?”
- “Refactor this 5-level inheritance chain into composition.”
How would you implement a Factory pattern in Python?
- Direct instantiation: When the class to create is known at the call site
- Factory function: When the class depends on runtime configuration (format type, environment)
- Abstract factory: When you need families of related objects (rare in Python)
Follow-up:
- “How does the registry decorator pattern work?”
- “When is a factory function overkill and you should just use `@classmethod`?”
- “How would you make this factory extensible for plugins?”
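One common shape for a factory function backed by a registry decorator is sketched below. The exporter classes and the `register` and `create_exporter` names are assumptions for the example.

```python
import json

_EXPORTERS = {}   # format name -> class


def register(format_name):
    """Class decorator that records the class under a format name."""
    def decorator(cls):
        _EXPORTERS[format_name] = cls
        return cls
    return decorator


@register("json")
class JsonExporter:
    def export(self, rows):
        return json.dumps(rows)


@register("csv")
class CsvExporter:
    def export(self, rows):
        return "\n".join(",".join(map(str, row)) for row in rows)


def create_exporter(format_name):
    # The class is chosen at runtime from configuration or user input
    try:
        return _EXPORTERS[format_name]()
    except KeyError:
        raise ValueError(f"unknown format: {format_name!r}") from None


csv_output = create_exporter("csv").export([[1, 2], [3, 4]])
json_output = create_exporter("json").export([[1, 2]])
print(csv_output, json_output)
```

A plugin only needs to import the module and apply `@register("xml")` to its own class; the factory itself does not change.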
What are four different ways to implement Singleton in Python, and their trade-offs?
1. Module-level instance: Modules are cached in `sys.modules` and only executed once. Most Pythonic approach. Used by the standard library.

2. `__new__` override: `__init__` runs on every call, which can re-initialize state.

3. Decorator: The class name is rebound to a wrapper function, which breaks `isinstance` checks.

4. Metaclass: `isinstance` works. But complex.

The honest answer: In Python, just use a module-level instance or a function that lazily creates one. The other approaches are mostly academic exercises. “The Borg pattern” (sharing state across instances) is another alternative.

What interviewers are really testing: Whether you know multiple approaches AND can articulate that the simple one is usually best.

Red flag answer: Reaching for the metaclass approach first.

Follow-up:
- “Why is a module-level singleton considered the most Pythonic approach?”
- “Are any of these approaches thread-safe? How would you make them safe?”
- “What is the Borg pattern and when is it better than a traditional Singleton?”
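A sketch of the "function that lazily creates one" option, made thread-safe with a lock. `Settings` and `get_settings` are hypothetical names.

```python
import threading


class Settings:
    def __init__(self):
        self.values = {"debug": False}


_instance = None
_instance_lock = threading.Lock()


def get_settings():
    """Return the shared Settings object, creating it on first use."""
    global _instance
    if _instance is None:                 # fast path, no lock once created
        with _instance_lock:
            if _instance is None:         # re-check inside the lock
                _instance = Settings()
    return _instance


seen = []


def worker():
    seen.append(get_settings())


threads = [threading.Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(len({id(obj) for obj in seen}))
```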
How do functions as first-class objects make design patterns simpler in Python?
Follow-up:
- “Which GoF patterns are unnecessary in Python? Why?”
- “When IS a class-based strategy better than a function in Python?”
- “How does `functools.partial` relate to the Command pattern?”
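As a brief illustration, the Strategy and Command patterns can both collapse into plain callables. The pricing functions and the command queue below are invented for the example.

```python
from functools import partial


def no_discount(total):
    return total


def percent_off(total, percent):
    return total * (100 - percent) / 100


def checkout(total, pricing=no_discount):
    # Strategy pattern: the "strategy" is just a callable argument
    return pricing(total)


ten_percent_off = partial(percent_off, percent=10)

full_price = checkout(200)
discounted = checkout(200, ten_percent_off)

# Command pattern: a partial bundles a call with its arguments for later
audit_log = []
commands = [partial(audit_log.append, "created"), partial(audit_log.append, "shipped")]
for command in commands:
    command()

print(full_price, discounted, audit_log)
```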
How do you write a context manager from scratch (class-based and @contextmanager)?
Approaches to know:
- Generator-based with `@contextmanager` (concise)
- `contextlib.ExitStack` for dynamic resource management
- Async context managers (`async with`)

What interviewers are really testing: Whether you can write both forms and know `ExitStack` for dynamic cases.

Red flag answer: Only knowing `with open()` without being able to create custom context managers.

Follow-up:
- “When would you use `ExitStack` instead of nested `with` statements?”
- “How do async context managers differ from sync ones?”
- “What happens if code inside a `@contextmanager`’s `try` block raises an exception?”
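The class-based form, the `@contextmanager` form, and `ExitStack` are sketched side by side below. The resource names and the shared `events` list are illustrative only.

```python
from contextlib import ExitStack, contextmanager

events = []


class Managed:
    """Class-based context manager."""

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        events.append(f"open {self.name}")
        return self

    def __exit__(self, exc_type, exc, tb):
        events.append(f"close {self.name}")
        return False   # do not swallow exceptions


@contextmanager
def managed(name):
    """Generator-based equivalent."""
    events.append(f"open {name}")
    try:
        yield name
    finally:
        # Runs even when the with-body raises: the exception is
        # re-raised inside the generator at the yield.
        events.append(f"close {name}")


with Managed("a"):
    pass

try:
    with managed("b"):
        raise ValueError("boom")
except ValueError:
    pass

# ExitStack: number of resources decided at runtime, closed in reverse order
with ExitStack() as stack:
    names = [stack.enter_context(managed(n)) for n in ("c", "d")]

print(events)
```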
How do you resolve circular dependencies?
A circular import leaves one module only partially initialized when the other tries to use it, which surfaces as an `AttributeError` or `ImportError`.

Solution 1: Restructure (best). Extract shared dependencies into a third module. If A needs something from B and B needs something from A, there is probably a shared concept that deserves its own module.

`TYPE_CHECKING` guard (for type hints only): place the import under `if TYPE_CHECKING:` so that it is seen by type checkers but never executed at runtime.

What interviewers are really testing: Whether you know the `TYPE_CHECKING` pattern.

Red flag answer: “I just move the import into the function” without understanding why the cycle exists or knowing about `TYPE_CHECKING`.

Follow-up:
- “How does `from __future__ import annotations` help with circular imports?”
- “What is the `TYPE_CHECKING` constant and how does it work?”
- “When is restructuring better than lazy imports?”
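A single-file sketch of the `TYPE_CHECKING` guard. The standard-library `fractions` module stands in here for a module that would otherwise create an import cycle; that substitution is an assumption made to keep the example runnable.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers. At runtime TYPE_CHECKING is
    # False, so this import never executes and cannot take part in a cycle.
    from fractions import Fraction


def halve(value: "Fraction") -> "Fraction":
    # The annotation is a string, so the name does not need to exist at runtime
    return value / 2


guard_is_active_at_runtime = TYPE_CHECKING
name_imported_at_runtime = "Fraction" in globals()
print(guard_is_active_at_runtime, name_imported_at_runtime)
```

With `from __future__ import annotations` at the top of the module, the quotes around the annotations become unnecessary because annotations are no longer evaluated at definition time.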
18. Testing and Debugging
How do you decide what to mock in tests?
Mock (external boundaries):
- Network calls (HTTP APIs, database queries, message queues)
- File system operations
- System clock (`datetime.now()`, `time.time()`)
- Environment variables
- Third-party services (payment gateways, email providers)

Do not mock:
- Your own business logic classes
- Data transformations
- Validation functions
- Anything that does not cross a process/network boundary

The `patch` target rule (most common mistake): `patch` replaces the name at the import location (where the code under test looks it up), not the definition location. Getting this wrong is the #1 source of “my mock is not working” bugs.

What interviewers are really testing: Whether you know the boundary rule and the patch target rule.

Red flag answer: Mocking internal methods or not knowing where to target `patch`.

Follow-up:
- “Why do you `patch('mymodule.get')` instead of `patch('requests.get')`?”
- “When would you use a real database in tests instead of mocking?”
- “How do you test code that depends on `datetime.now()`?”
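The patch target rule can be demonstrated in one file by building two throwaway modules at runtime. Everything here (`fake_http`, `fake_app`, `fetch_status`) is a hypothetical stand-in for a library module and an application module that does `from library import get`; the `exec` call is only a device to keep the example self-contained.

```python
import sys
import types
from unittest import mock

# Hypothetical "library" module exposing get()
fake_http = types.ModuleType("fake_http")


def _real_get(url):
    raise RuntimeError("would hit the network")


fake_http.get = _real_get
sys.modules["fake_http"] = fake_http

# Hypothetical "application" module. `from fake_http import get` copies the
# name `get` into the application's own namespace.
fake_app = types.ModuleType("fake_app")
exec(
    "from fake_http import get\n"
    "def fetch_status(url):\n"
    "    return get(url)['status']\n",
    fake_app.__dict__,
)
sys.modules["fake_app"] = fake_app


def status_when_patching(target):
    with mock.patch(target, return_value={"status": 200}):
        try:
            return fake_app.fetch_status("https://example.invalid")
        except RuntimeError:
            return "real get() was called"


wrong_target = status_when_patching("fake_http.get")   # definition site
right_target = status_when_patching("fake_app.get")    # lookup site
print(wrong_target, right_target)
```

Patching the definition site leaves the application's already-imported name untouched, which is exactly the "my mock is not working" symptom.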
What is the difference between mocking and stubbing?
In `unittest.mock`, `Mock()` does both, but the distinction matters for test design.

Stubbing: provides data, does not verify behavior.

Mocking: verifies behavior (which calls were made, and with which arguments).

- Stub when you care about the output of the function under test
- Mock when you care about side effects (was the email sent? was the cache invalidated?)

Follow-up:
- “When does verifying mock calls become an anti-pattern?”
- “How does `unittest.mock.ANY` help you write less brittle mock assertions?”
- “What is `spec=True` on a Mock and why should you use it?”
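The same `Mock` class can play either role, as the sketch below shows. The functions under test (`price_with_tax`, `send_welcome`) and their collaborators are hypothetical.

```python
from unittest.mock import ANY, Mock


def price_with_tax(rate_provider, amount):
    return round(amount * (1 + rate_provider.rate_for("DE")), 2)


def send_welcome(mailer, user):
    mailer.send(to=user, subject="Welcome", body=f"Hi {user}")


# Stub: supplies canned data; the test asserts on the OUTPUT only
rates = Mock()
rates.rate_for.return_value = 0.19
total = price_with_tax(rates, 100)

# Mock: the test asserts on the INTERACTION (the side effect)
mailer = Mock(spec=["send"])    # spec limits the mock to a known interface
send_welcome(mailer, "ana@example.com")
mailer.send.assert_called_once_with(
    to="ana@example.com", subject="Welcome", body=ANY   # ANY keeps this less brittle
)

print(total, mailer.send.call_count)
```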
What are the dangers of over-mocking in test suites?
Symptoms of over-mocking:
- Tests break on refactoring but code still works: you changed internal structure, not behavior, but 50 tests fail because mocks expected specific method calls
- Mocks return mocks return mocks: `mock.get.return_value.json.return_value.data` is a code smell
- Tests are harder to read than the code: 40 lines of mock setup for 5 lines of actual test
- Bugs slip through: the real database driver returns `Decimal`, your mock returns `float`, causing a production bug that tests never caught

A healthier test mix:
- Unit tests (70%): Test pure logic, no I/O. No mocks needed because there is nothing to mock.
- Integration tests (20%): Test with real database, real file system. Use `pytest-docker` or `testcontainers` for reproducible environments.
- End-to-end tests (10%): Test the full system. Slow but catches integration issues.

Alternatives to mocking:
- Fakes: In-memory implementations of interfaces (e.g., `dict`-based repository instead of database repository)
- Test databases: SQLite for unit tests, Docker-based PostgreSQL for integration tests
- Dependency injection: Design code so dependencies are passed in, not imported

Follow-up:
- “How do you decide between a mock and a fake?”
- “How do you test database interactions without mocking?”
- “Your team has 5,000 unit tests that all pass, but production bugs keep slipping through. What is likely wrong?”
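A fake combined with dependency injection might look like the sketch below. `InMemoryUserRepo` and `rename_user` are hypothetical; a production repository would expose the same two methods backed by a database.

```python
class InMemoryUserRepo:
    """Fake: a working dict-backed implementation of the repository interface."""

    def __init__(self):
        self._rows = {}

    def save(self, user_id, name):
        self._rows[user_id] = name

    def get(self, user_id):
        return self._rows.get(user_id)


def rename_user(repo, user_id, new_name):
    # The repository is passed in, so tests can supply the fake
    if repo.get(user_id) is None:
        raise KeyError(user_id)
    repo.save(user_id, new_name.strip())


repo = InMemoryUserRepo()
repo.save(1, "Ana")
rename_user(repo, 1, "  Ana Maria ")
print(repo.get(1))
```

The test asserts on resulting state rather than on which methods were called, so it survives internal refactoring.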
How do you debug Python code effectively?
Approach by problem type:
- Obvious error with traceback: Read the traceback bottom-up. The last frame shows where the error occurred. The cause is usually 1-3 frames above.
- Logic error (wrong result, no crash): Use `breakpoint()` (Python 3.7+) or `import pdb; pdb.set_trace()` to inspect state at the problem point.
- Intermittent/timing issue: Add structured logging with context. Use `logging.debug()` with correlation IDs.
- Performance issue: Profile with `py-spy` or `cProfile` before optimizing.
- Memory issue: Use `tracemalloc` to track allocations.

Essential pdb commands (at a `breakpoint()` prompt):
- `n` (next): execute current line, step over function calls
- `s` (step): step INTO function calls
- `c` (continue): run until next breakpoint
- `l` (list): show source code around current line
- `p expr`: print expression value
- `pp expr`: pretty-print (great for dicts/lists)
- `w` (where): show full stack trace
- `u`/`d` (up/down): navigate stack frames
- `!statement`: execute arbitrary Python code

What interviewers are really testing: Whether you know `breakpoint()` and have a systematic debugging approach.

Red flag answer: “I add print statements everywhere.” This works but is the least efficient approach.

Follow-up:
- “What is `breakpoint()` and how does `PYTHONBREAKPOINT` environment variable work?”
- “How do you debug an issue that only happens in production?”
- “Walk me through how you would debug a function that returns an incorrect result.”
How do you use logging for debugging and production monitoring?
Log levels:
- `DEBUG`: Detailed diagnostic info. Disabled in production.
- `INFO`: Normal operation milestones. “User created”, “Payment processed”.
- `WARNING`: Something unexpected but not an error. “Retry attempt 2/3”, “Disk 80% full”.
- `ERROR`: Something failed but the service continues. “Failed to send email, will retry”.
- `CRITICAL`: Service is about to crash. “Database connection pool exhausted”.

The `%s` vs f-string performance trick: with `%s` formatting, the string is only built if the log level is active. With f-strings, it is always built. This matters in hot loops.

Use correlation IDs for distributed tracing.

What interviewers are really testing: Whether you know the `%s` performance pattern.

Red flag answer: Using `print()` in production code, or only using `logging.basicConfig()`.

Follow-up:
- “Why use `%s` formatting instead of f-strings in logging calls?”
- “How do you correlate logs across microservices?”
- “How would you set up logging that outputs JSON for a production ELK stack?”
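Both ideas can be sketched with the standard library. The logger name, the `Probe` class, and the use of a `contextvars` variable to carry the correlation ID are assumptions for this illustration; real services often take the ID from an incoming request header.

```python
import contextvars
import logging

correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    def filter(self, record):
        # Attach the current correlation ID to every record
        record.correlation_id = correlation_id.get()
        return True


class ListHandler(logging.Handler):
    """Collects formatted lines in memory (stands in for a real handler)."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


class Probe:
    """Counts how often it is converted to a string."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "expensive state dump"


logger = logging.getLogger("payments_demo")
logger.setLevel(logging.WARNING)
logger.propagate = False
handler = ListHandler()
handler.setFormatter(logging.Formatter("%(correlation_id)s %(levelname)s %(message)s"))
handler.addFilter(CorrelationFilter())
logger.addHandler(handler)

probe = Probe()
logger.debug("state: %s", probe)     # DEBUG is disabled: never formatted
calls_after_lazy = probe.str_calls
logger.debug(f"state: {probe}")      # f-string is built before the call
calls_after_fstring = probe.str_calls

correlation_id.set("req-42")
logger.warning("payment failed for %s", "order-7")
print(handler.lines)
```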
How do you debug performance issues in Python?
First, classify the bottleneck:
- CPU-bound: High CPU usage, slow computation. Profile with `cProfile` or `py-spy`.
- I/O-bound: Low CPU, waiting for network/disk. Profile with `asyncio` debug mode or logging timestamps.
- Memory-bound: Growing RSS, GC pauses. Profile with `tracemalloc`.

Optimization hierarchy (biggest wins first):
- Algorithm change: O(n^2) to O(n log n) is the biggest win possible
- Data structure change: `list` `in` check to `set` `in` check (O(n) to O(1))
- Caching: `@functools.lru_cache` for pure functions called repeatedly
- Vectorization: Replace Python loops with NumPy/pandas operations
- Concurrency: Right model for the task (threading/async for I/O, multiprocessing for CPU)
- C extension: Cython, pybind11, or Rust via PyO3 (last resort)

Follow-up:
- “How do you read a flame graph? What pattern indicates a performance issue?”
- “Your function is slow but `cProfile` shows no single slow function. What could cause this?”
- “How does `py-spy` attach to a running process without restarting it?”
How do you debug memory leaks in Python?
Common causes:
- Unbounded cache: `@lru_cache(maxsize=None)` or plain dict used as cache. Fix: set `maxsize` or use `cachetools.TTLCache`.
- Closures capturing large objects: Lambda in a loop captures a DataFrame. Fix: extract the value, do not close over the large object.
- Global collections: Module-level lists/dicts that `append` but never `pop`. Fix: use bounded collections or weak references.
- C extension leak: `tracemalloc` cannot see it. Monitor RSS via `/proc/self/status` and compare with `tracemalloc` totals. The difference is C-level allocations.
- Circular references with `__del__`: Before Python 3.4, these were uncollectable. After 3.4, the GC handles them, but `__del__` can still cause issues by resurrecting objects.

Red flag answer: “Call `gc.collect()` in a loop” (treats the symptom, not the cause).

Follow-up:
- “How do you distinguish a Python-level memory leak from a C extension leak?”
- “Your service RSS grows by 1GB/day. Walk me through your debugging approach.”
- “Why might `gc.collect()` not free memory, and why might freed Python memory not reduce RSS?”
Advanced Scenario-Based Questions
Scenario: Your CPU-bound Python service is only using one core despite running 8 threads. What is happening and how do you fix it?
ThreadPoolExecutor, yet htop shows a single core pegged at 100% while the other 7 sit idle. P99 latency is climbing toward your SLA. Your manager asks why the 8-core box is basically a single-core machine.What weak candidates say:- “Just add more threads” or “Increase the thread pool size to 64.”
- Vaguely mention the GIL without explaining why it applies here or what concrete alternatives exist.
- Suggest rewriting the whole service in Go or Rust as the only option.
- “This is the GIL in action. Because the workload is CPU-bound (NumPy scoring that drops back into Python between calls, or pure-Python feature preprocessing), the GIL serializes bytecode execution across all 8 threads. You get concurrency but not parallelism.”
- “First, I would profile with
py-spyorcProfileto confirm the hot path is actually CPU-bound Python, not I/O. If it is, the fix depends on context:”- Quick win: Switch from
ThreadPoolExecutortoProcessPoolExecutor. Each worker process gets its own GIL. Serialization cost for IPC is the trade-off — measure it. - Medium effort: Offload the hot loop to a C extension, Cython, or ensure the NumPy/SciPy calls release the GIL (most do via
Py_BEGIN_ALLOW_THREADS). Verify withnogil=Truein Cython. - Structural fix: Move to a multi-process architecture with
gunicorn --workers 8or Celery workers, one per core, with a Redis/RabbitMQ broker. - Nuclear option: Evaluate the free-threaded CPython 3.13+ build (
python --disable-gil), but acknowledge it is experimental and many C extensions are not yet safe.
- Quick win: Switch from
- “In a previous role, we had a similar issue with a feature-engineering pipeline. Switching to
ProcessPoolExecutorwith 4 workers on a 4-core box dropped P99 from 850ms to 210ms. The gotcha was that our model object was 400MB and pickling it for each task killed throughput — we fixed that by loading the model once per worker process using an initializer function.”
- “If your NumPy scoring already releases the GIL, but the feature preprocessing between calls is pure Python, how do you handle the mixed workload?”
- “What is the memory overhead of `ProcessPoolExecutor` with 8 workers each loading a 500MB model? How would you mitigate it?”
- “Explain what `Py_BEGIN_ALLOW_THREADS` actually does at the C level and why it is safe for NumPy but dangerous for code that touches Python objects.”
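The `ProcessPoolExecutor` fix with a per-worker initializer, as described in the answer above, might look roughly like this. It is a minimal sketch under stated assumptions: `load_model`, `init_worker`, `score`, and `score_batch` are hypothetical names, and the small dictionary stands in for the large model object.

```python
from concurrent.futures import ProcessPoolExecutor

# Per-process global. Each worker fills it once in init_worker(), so the
# model is never pickled and shipped along with individual tasks.
_MODEL = None


def load_model():
    """Hypothetical stand-in for an expensive model load (e.g. from disk)."""
    return {"weights": [0.5, 1.5, 2.0]}


def init_worker():
    """Runs once in every worker process when the pool starts it."""
    global _MODEL
    _MODEL = load_model()


def score(features):
    """CPU-bound scoring against the worker-local model."""
    return sum(w * x for w, x in zip(_MODEL["weights"], features))


def score_batch(batch, workers=4):
    """Fan the batch out across processes; each process has its own GIL."""
    with ProcessPoolExecutor(max_workers=workers, initializer=init_worker) as pool:
        # Only the small feature lists and float results cross the IPC boundary.
        return list(pool.map(score, batch))
```

Call `score_batch(...)` from under an `if __name__ == "__main__":` guard, which `multiprocessing` needs on platforms that start workers with `spawn`. Each worker still holds its own copy of the model, so the memory cost scales with the worker count.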
Scenario: A long-running Python process is slowly eating all available memory over 48 hours. How do you find and fix the leak?
- “Use `del` on variables” or “Call `gc.collect()` in a loop.”
- Blame Python for being a memory-hungry language.
- Suggest just restarting the service on a cron schedule (treating the symptom, not the cause).
- “I would approach this systematically with three tools in sequence:”
  - Step 1, `tracemalloc`: Enable it in production with `tracemalloc.start(25)` (25 frames deep). Take snapshots every 10 minutes and compare with `snapshot.compare_to(old_snapshot, 'lineno')`. This shows which lines are allocating the most new memory between snapshots.
  - Step 2, `objgraph`: Once I have a suspect module, use `objgraph.show_most_common_types()` to see which object types are accumulating. Then `objgraph.show_backrefs(objgraph.by_type('SuspectClass')[:5])` to visualize what is holding references.
  - Step 3, `gc` module: Check `gc.garbage` for uncollectable objects (before Python 3.4 this included objects with `__del__` methods in reference cycles). Run `gc.set_debug(gc.DEBUG_SAVEALL)` in a staging environment so every unreachable object the collector finds is kept in `gc.garbage` for inspection.
- “The most common culprits I have seen in production:”
  - Caches without eviction: An `lru_cache` with `maxsize=None` (unbounded), or a plain dict used as a cache that grows forever. Fix: set a maxsize or use `cachetools.TTLCache`.
  - Closures capturing large objects: A lambda or callback inside a loop that accidentally closes over a large DataFrame. The DataFrame stays alive as long as any callback references it.
  - Circular references with `__del__`: Pre-Python-3.4, these were uncollectable. Post-3.4, the GC handles them, but custom `__del__` methods can still prevent collection if they resurrect objects.
  - C extension leaks: If using libraries like `lxml` or database drivers, memory may leak in C code that `tracemalloc` cannot see. Use `memory_profiler` alongside `/proc/self/smaps` to compare Python-tracked vs OS-tracked RSS.
- “At a previous company, we traced a 10GB/day leak to a logging handler that appended formatted log records to an in-memory list (someone left a `MemoryHandler` with no target). `objgraph` showed 8 million `str` objects. Two-line fix, saved us $400/month in oversized EC2 instances.”
- “What is the difference between RSS, VMS, and USS, and which one should you actually monitor for a Python memory leak?”
- “Why might `gc.collect()` not reclaim memory, and why might the process RSS not drop even after Python frees objects?”
- “How would you detect a memory leak in a C extension that `tracemalloc` cannot track?”
Scenario: Your asyncio web server freezes under load and stops responding to health checks. What went wrong?
- “Add more asyncio tasks” or “Increase the connection limit.”
- Confuse asyncio with threading and suggest adding locks.
- Cannot explain what “blocking the event loop” actually means.
- “Classic event loop starvation. The CPU is at 5% which rules out a CPU-bound bottleneck — something is blocking the single event loop thread, preventing it from servicing other coroutines. The three most common causes:”
  - Synchronous I/O in an async handler: Someone called `requests.get()`, `time.sleep()`, or a synchronous database driver inside an `async def` handler instead of using `aiohttp`, `asyncio.sleep()`, or an async driver. Even one blocking call stalls every coroutine.
  - CPU-bound work on the event loop: JSON serialization of a 50MB payload, or a regex over a massive string, running directly in a coroutine without offloading.
  - Deadlock from incorrect `await` chains: Two coroutines waiting on each other’s `asyncio.Event` or `asyncio.Lock`, or calling `loop.run_until_complete()` from inside an already-running loop (common when mixing sync and async code).
- “To diagnose, I would:”
  - Enable `asyncio` debug mode (`PYTHONASYNCIODEBUG=1` or `loop.set_debug(True)`), which logs callbacks and coroutine steps that take longer than 100ms without yielding.
  - Use `py-spy` to get a live stack trace of the stuck event loop thread; it will show exactly which blocking call it is sitting on.
  - Set `loop.slow_callback_duration = 0.05` to lower that threshold to 50ms, and watch the warnings.
- “The fix depends on the root cause:”
  - Blocking I/O: Replace with an async equivalent, or wrap with `asyncio.to_thread()` (Python 3.9+) / `loop.run_in_executor()` to push it to a thread pool.
  - CPU-bound work: Offload to `ProcessPoolExecutor` via `loop.run_in_executor()`.
  - Deadlock: Restructure the `await` chain. Never call `loop.run_until_complete()` from within a running loop; use `await` directly.
- “I once debugged a production freeze caused by a single `dns.resolver.resolve()` call (synchronous `dnspython`) inside an async handler. Under load, DNS lookups took 2-5 seconds each, completely blocking the event loop. Replacing it with `aiodns` fixed the freeze and dropped P99 latency from timeout to 40ms.”
- “What exactly happens internally when you `await asyncio.sleep(0)`, and why do people sprinkle it in long-running coroutines?”
- “Explain the difference between `asyncio.to_thread()`, `loop.run_in_executor()`, and just spawning a raw thread. When would you pick each?”
- “Can you have a deadlock in asyncio without using any explicit locks? Describe a scenario.”
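To make "blocking the event loop" concrete, the sketch below measures how long a background ticker goes without running while a handler executes. The names `blocking_lookup`, `bad_handler`, `good_handler`, and `max_loop_lag` are hypothetical, and `time.sleep` stands in for any synchronous call such as a blocking DNS or HTTP request.

```python
import asyncio
import time


def blocking_lookup(delay):
    """Stand-in for a synchronous call such as requests.get() or a blocking DNS query."""
    time.sleep(delay)
    return "done"


async def bad_handler(delay):
    # Runs the blocking call on the event loop thread: nothing else can run.
    return blocking_lookup(delay)


async def good_handler(delay):
    # Python 3.9+: run the blocking call in a worker thread and await the result.
    return await asyncio.to_thread(blocking_lookup, delay)


async def max_loop_lag(handler, delay=0.3, interval=0.01):
    """Run `handler` next to a ticker task and return the longest gap between ticks."""
    gaps = []

    async def ticker():
        last = time.perf_counter()
        while True:
            await asyncio.sleep(interval)
            now = time.perf_counter()
            gaps.append(now - last)
            last = now

    task = asyncio.create_task(ticker())
    await asyncio.sleep(interval * 3)  # let the ticker record a few normal ticks
    await handler(delay)
    await asyncio.sleep(interval * 3)  # let the ticker observe any stall
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return max(gaps)


if __name__ == "__main__":
    print(f"blocking handler lag: {asyncio.run(max_loop_lag(bad_handler)):.3f}s")
    print(f"to_thread handler lag: {asyncio.run(max_loop_lag(good_handler)):.3f}s")
```

With the blocking handler the ticker stalls for roughly the full delay, which is what a starved health check looks like from the inside; with `asyncio.to_thread()` the ticker keeps its normal cadence.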
Scenario: Adding a new module causes a working application to crash on import with an AttributeError. The attribute clearly exists. What is going on?
`utils/cache.py` to your Django project. Suddenly, an unrelated view that imports from `utils.helpers` crashes with `AttributeError: module 'utils' has no attribute 'helpers'`. The attribute clearly exists: `utils/helpers.py` is right there, unchanged. Reverting `cache.py` fixes it. What is happening?
What weak candidates say:
- “There is a typo in the import” or “The file is not saved.”
- Cannot explain how Python’s import system actually resolves modules.
- Suggest deleting `__pycache__` and restarting (sometimes works, but they cannot explain why).
- “This is almost certainly a circular import. The import system is one of the trickiest parts of CPython. Here is what is likely happening:”
  - When Python imports `utils.cache`, that module’s top-level code runs. If `cache.py` imports something from `utils.helpers` at the top level, and `helpers.py` (or something it imports) tries to import from `utils.cache`, you get a partially-initialized module.
  - Python’s import system uses `sys.modules` as a cache. When a module is being imported, a partially-initialized module object is placed in `sys.modules`. If another module tries to import from it before its top-level code finishes, it gets the half-built module, which may be missing attributes that have not been defined yet.
  - The `AttributeError` says `'utils' has no attribute 'helpers'` because the `utils` package `__init__.py` has not finished executing: it is mid-import and the `helpers` submodule has not been bound to the package namespace yet.
- “To debug: add `print(f'Importing {__name__}')` at the top of each suspect module and read the order. Or use `python -v` to see the import sequence. The cycle will be obvious.”
- “Fixes, in order of preference:”
  - Refactor: Extract the shared dependency into a new module that both `cache.py` and `helpers.py` import. Break the cycle architecturally.
  - Lazy import: Move the problematic import inside the function that uses it, so it runs at call time, not import time.
  - `importlib.import_module()`: Defer the import programmatically. This is the same idea as a lazy import but sometimes cleaner for dynamic cases.
  - Restructure `__init__.py`: If `__init__.py` does `from .helpers import *`, that is often the trigger. Keep `__init__.py` minimal.
- “One subtle variant: if you name a file `utils/cache.py` and there is also a third-party package called `cache`, you can get shadowing where Python imports your local file instead of the library, or vice versa, depending on `sys.path` order. Always check `module.__file__` to confirm you are importing what you think you are.”
- “Walk me through the exact sequence of steps CPython takes when it encounters `import utils.helpers`: what does it check, and in what order?”
- “What is the difference between a regular package (with `__init__.py`) and a namespace package (PEP 420)? How does it affect import resolution?”
- “You mentioned `sys.modules`. What happens if you manually delete an entry from `sys.modules` and re-import the module? What are the dangers?”
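The partially-initialized-module failure and the deferred-lookup fix can be reproduced in isolation. The sketch below is not the Django project from the scenario: it writes two throwaway modules with the hypothetical names `cache_demo` and `helpers_demo` into a temporary directory, triggers the cycle, and then shows the same pair importing cleanly once the cross-module access moves to call time.

```python
import contextlib
import importlib
import sys
import tempfile
from pathlib import Path

CACHE_SRC = (
    "import helpers_demo\n"
    "\n"
    "def cache_key(name):\n"
    "    return 'key:' + helpers_demo.slugify(name)\n"
)

# Cyclic version: touches cache_demo at import time, before cache_key exists.
HELPERS_CYCLIC = (
    "import cache_demo\n"
    "\n"
    "DEFAULT_KEY = cache_demo.cache_key('default')\n"
    "\n"
    "def slugify(text):\n"
    "    return text.lower().replace(' ', '-')\n"
)

# Fixed version: the same lookup is deferred until the function is called.
HELPERS_FIXED = (
    "import cache_demo\n"
    "\n"
    "def slugify(text):\n"
    "    return text.lower().replace(' ', '-')\n"
    "\n"
    "def default_key():\n"
    "    return cache_demo.cache_key('default')\n"
)


@contextlib.contextmanager
def temp_modules(files):
    """Write throwaway modules to a temp dir and make them importable."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in files.items():
            Path(tmp, name).write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            yield
        finally:
            sys.path.remove(tmp)
            for name in files:
                sys.modules.pop(Path(name).stem, None)


def demo():
    """Return (error message from the cyclic pair, result from the fixed pair)."""
    with temp_modules({"cache_demo.py": CACHE_SRC, "helpers_demo.py": HELPERS_CYCLIC}):
        try:
            importlib.import_module("cache_demo")
            error = None
        except AttributeError as exc:
            error = str(exc)
    with temp_modules({"cache_demo.py": CACHE_SRC, "helpers_demo.py": HELPERS_FIXED}):
        helpers = importlib.import_module("helpers_demo")
        key = helpers.default_key()
    return error, key


if __name__ == "__main__":
    error, key = demo()
    print("cyclic import failed with:", error)
    print("fixed import returned:", key)
```

The two modules still import each other in the fixed version; the cycle is harmless because neither touches the other's attributes until both have finished executing.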
Scenario: A metaclass-based ORM is producing bizarre behavior where model fields from one class leak into another. How do you debug this?
`UserModel.fields` contains fields from `OrderModel`: models are somehow sharing state. The bug is intermittent and only appears when both models are imported in the same module. Test suites pass because each test file imports only one model.
What weak candidates say:
- “I would not use metaclasses” (avoids the question entirely).
- Cannot explain what a metaclass `__new__` or `__init__` does during class creation.
- Suggest adding print statements with no systematic approach.
- “This is a classic mutable-default-on-the-metaclass bug. The most common cause: the metaclass stores field definitions in a shared mutable object (like a class-level list or dict on the metaclass itself) instead of creating a fresh one per class. Here is the typical pattern:”
- “The fix is to create a new list per class in `__new__`:”
- “To debug this in the wild, I would:”
  - Check `id(UserModel.fields) == id(OrderModel.fields)`. If True, they are literally the same object in memory.
  - Use `UserModel.__class__` to confirm the metaclass, then inspect the metaclass’s `__new__` and `__init__` for shared mutable state.
  - Check if the metaclass inherits from another metaclass that introduces shared state.
  - Look for `__init_subclass__` hooks that might mutate a parent class’s attributes.
- “This same pattern bites people with Django `ModelForm.Meta` if you accidentally mutate `fields` at class creation time instead of copying. The broader lesson: any time you are in `__new__` or `__init__` of a metaclass, every mutable object must be explicitly copied or freshly created per class. Never assign a metaclass-level mutable to a class attribute by reference.”
- “What is the difference between `__new__` and `__init__` on a metaclass versus on a regular class? When does each run?”
- “How does `__init_subclass__` (PEP 487) reduce the need for metaclasses? Give an example where you would refactor a metaclass to use it instead.”
- “What happens if two metaclasses conflict, for example your ORM metaclass and an ABC metaclass? How does Python resolve metaclass conflicts?”
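A minimal sketch of the shared-mutable-state bug and the per-class fix discussed in this scenario is shown below. The names `Field`, `LeakyMeta`, `FixedMeta`, and the four model classes are hypothetical and are not taken from any real ORM.

```python
class Field:
    """Hypothetical field marker for the toy ORM below."""


class LeakyMeta(type):
    fields = []  # BUG: one list on the metaclass, visible from every model

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        for attr, value in namespace.items():
            if isinstance(value, Field):
                # cls has no `fields` of its own, so this lookup falls back to
                # LeakyMeta.fields and mutates the shared list.
                cls.fields.append(attr)
        return cls


class FixedMeta(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        # Fresh list per class. Inherited fields are ignored to keep the sketch short.
        cls.fields = [a for a, v in namespace.items() if isinstance(v, Field)]
        return cls


class LeakyUser(metaclass=LeakyMeta):
    name = Field()


class LeakyOrder(metaclass=LeakyMeta):
    total = Field()


class User(metaclass=FixedMeta):
    name = Field()


class Order(metaclass=FixedMeta):
    total = Field()


if __name__ == "__main__":
    print(LeakyUser.fields, LeakyUser.fields is LeakyOrder.fields)
    print(User.fields, Order.fields)
```

The identity check in the last lines is the same `id(...) == id(...)` test from the debugging steps above, and it explains why the bug only shows up once both models have been created in the same process.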
Scenario: Your Python service segfaults intermittently in production but never in development. How do you approach a crash in a C extension?
- “Rewrite the C extension in pure Python.”
- “Add try/except around the C call” (SIGSEGV is not a Python exception — it kills the process).
- Have no idea how to read a core dump or use debugging tools for C extensions.
- “Intermittent segfaults in C extensions that only appear under multi-process/multi-worker conditions usually point to one of three things:”
  - Thread safety: If gunicorn uses `gthread` workers, multiple threads may call the C extension concurrently. If the C code uses static/global variables without locks, you get data races and memory corruption.
  - Fork safety: gunicorn is a pre-fork server. If the C extension is initialized in the master before workers are forked (for example with `--preload`), the children inherit that state. Some C libraries (e.g., OpenSSL, certain database drivers) are not fork-safe: they hold file descriptors, mutexes, or memory-mapped regions that become invalid after `fork()`. Fix: reinitialize in gunicorn’s `post_fork` hook, or drop `--preload` so each worker initializes the extension after the fork.
  - Buffer overflow / use-after-free: The EXIF parser reads variable-length metadata. A malformed JPEG with unexpected EXIF data could cause the C code to read past a buffer.
- “My debugging approach:”
  - Step 1: Analyze the core dump with `gdb python core` followed by `bt` (backtrace). With debug symbols (`python3-dbg` package), this shows the exact C function and line number.
  - Step 2: Run under Valgrind in staging: `valgrind --tool=memcheck python your_script.py`. This catches invalid reads/writes, use-after-free, and leaks. Expect it to be 20-50x slower.
  - Step 3: If it is a data-dependent crash, collect the JPEG that triggered it from the request logs and write a regression test.
  - Step 4: Use the `faulthandler` module (`python -X faulthandler`) to get a Python-level traceback even on SIGSEGV; this shows you which Python function was calling into the C code.
  - Step 5: Compile the C extension with AddressSanitizer (`-fsanitize=address`) and run the test suite. ASAN catches buffer overflows, use-after-free, and stack corruption with far less overhead than Valgrind.
- “In a real incident, we had a C extension that cached parsed results in a `static PyObject*`. It was initialized in the first worker before fork, and after fork all child processes shared the pointer. When one child’s GC collected the object, the others segfaulted on access. Fix: move initialization into `post_fork` and remove the static cache.”
- “What is the difference between `faulthandler.enable()` and a regular Python traceback? Why can `faulthandler` print something useful during a segfault when a normal except block cannot?”
- “Explain the GIL’s role in C extension safety. If your C extension calls `Py_BEGIN_ALLOW_THREADS`, what are you now responsible for that the GIL previously protected?”
- “Your C extension needs to create Python objects. Walk me through the reference counting rules: when do you call `Py_INCREF` vs `Py_DECREF`, and what is a ‘borrowed reference’ vs a ‘new reference’?”
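The `faulthandler` step can be tried safely by crashing a child interpreter instead of the current process. In this sketch, which assumes a POSIX system, the child sends itself `SIGSEGV` as a stand-in for a real fault inside a C extension; `parse_exif` and `run_with_faulthandler` are hypothetical names.

```python
import subprocess
import sys

# Child script. os.kill(..., SIGSEGV) is a stand-in for a real crash inside a
# C extension; the function name parse_exif is hypothetical.
CRASHING_SCRIPT = """
import os
import signal

def parse_exif():
    os.kill(os.getpid(), signal.SIGSEGV)

parse_exif()
"""


def run_with_faulthandler(source):
    """Run `source` in a child interpreter with faulthandler enabled (POSIX)."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", source],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_with_faulthandler(CRASHING_SCRIPT)
    print("exit status:", result.returncode)  # non-zero: the child was killed
    print(result.stderr)  # faulthandler's dump names parse_exif
```

The child cannot be rescued with `try`/`except`, but the dump on stderr still names the Python function that was executing when the signal arrived, which is the starting point for the `gdb` and Valgrind steps.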
Scenario: Your team cannot install the same package versions across dev, CI, and production. Builds break randomly. How do you fix Python packaging?
`requirements.txt` has unpinned transitive dependencies. Every week someone’s build breaks because numpy 2.0 dropped and broke pandas, or a compiled wheel is not available for one platform. A new hire spent two days just getting the dev environment working.
What weak candidates say:
- “Just `pip freeze > requirements.txt`” (captures one platform’s resolved versions, breaks on another).
- “Use Docker” as the only answer (does not address the dependency management problem itself).
- Cannot distinguish between `requirements.txt`, `setup.py`, `pyproject.toml`, `Pipfile`, and `poetry.lock`.
- “This is a dependency resolution and lockfile problem. The Python packaging ecosystem has historically been terrible at this compared to npm/yarn or Cargo, but modern tools solve it. Here is what I would do:”
  - Step 1, single source of truth: Move to `pyproject.toml` (PEP 621) for declaring dependencies with loose constraints (e.g., `numpy>=1.24,<2.0`). This replaces `setup.py`, `setup.cfg`, and `requirements.txt` for dependency declaration.
  - Step 2, lockfile with platform awareness: Use uv (fastest, Rust-based), pip-tools (`pip-compile`), or Poetry to generate a lockfile with pinned transitive dependencies. `uv lock` generates a universal lockfile that works across platforms. `pip-compile --generate-hashes` pins versions AND validates integrity.
  - Step 3, per-platform resolution (if needed): For packages with compiled wheels (NumPy, SciPy), you may need separate lock files per platform: `requirements-linux-x86.txt`, `requirements-macos-arm.txt`. `uv` handles this natively with its universal resolver.
  - Step 4, CI enforcement: CI should install from the lockfile (`uv sync --locked` or `pip install -r requirements.txt --require-hashes`), never resolve fresh. If the lockfile is out of date, CI fails.
  - Step 5, Docker for parity: Use a multi-stage Dockerfile where the build stage resolves/installs dependencies and the runtime stage copies the installed virtualenv. This ensures production matches CI.
- “The specific tool choice matters:”
  - uv: My current recommendation. 10-100x faster than pip, universal lockfile, drop-in pip replacement. Handles virtualenvs, resolution, and installation.
  - Poetry: Mature, good lockfile, but slower and sometimes fights with pip-installed packages.
  - pip-tools: Minimal, just `pip-compile` and `pip-sync`. Works if you want to stay close to pip.
- “For the monorepo specifically: use workspace support (uv workspaces or Poetry monorepo plugins) so the 3 services share a single lockfile but declare their own dependencies. This prevents version conflicts where service A needs `pandas 1.5` and service B needs `pandas 2.0`.”
- “At a previous company, switching from bare `requirements.txt` to `uv lock` with hashes cut our ‘broken build’ tickets from 5/week to zero and reduced CI install time from 4 minutes to 18 seconds.”
- “What is the difference between a wheel and an sdist? Why does `numpy` install in 2 seconds on some platforms but takes 5 minutes compiling on others?”
- “Explain what `--require-hashes` protects against. What attack vector does it mitigate that pinned versions alone do not?”
- “Your data scientist needs to add a package that conflicts with an existing pinned dependency. Walk me through how you would resolve this without breaking other services.”
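The idea behind hash pinning can be illustrated in a few lines. This is a simplified sketch of the concept, not pip's or uv's actual implementation; `verify_artifact`, the package name `example-pkg`, and the byte strings standing in for wheel files are all hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of the artifact bytes, as recorded in a hashed lockfile."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(requirement: str, data: bytes, pinned: dict) -> str:
    """Accept an artifact only if its digest is pinned for this requirement."""
    digest = sha256_hex(data)
    if digest not in pinned.get(requirement, set()):
        raise ValueError(f"hash mismatch for {requirement}: sha256:{digest}")
    return digest


# Hypothetical artifact bytes and lock data, standing in for a downloaded wheel
# and the `--hash=sha256:...` entries of a compiled requirements file.
GOOD_WHEEL = b"contents of example_pkg-1.0 wheel"
TAMPERED_WHEEL = b"contents of example_pkg-1.0 wheel + injected code"
PINNED = {"example-pkg==1.0": {sha256_hex(GOOD_WHEEL)}}

if __name__ == "__main__":
    print(verify_artifact("example-pkg==1.0", GOOD_WHEEL, PINNED))
    try:
        verify_artifact("example-pkg==1.0", TAMPERED_WHEEL, PINNED)
    except ValueError as exc:
        print("rejected:", exc)
```

A version pin alone would accept both artifacts because they claim the same name and version; the digest is what distinguishes the file that was locked from one that was swapped afterwards.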
Scenario: You are rolling out mypy strict mode on a 200k-line Python codebase and the team is pushing back. How do you approach type checking at scale?
`str` where an `int` was expected. You enable `mypy --strict` and get 14,000 errors. Three senior engineers say it is a waste of time. The intern says “just add `type: ignore` everywhere.” How do you proceed?
What weak candidates say:
- “Just fix all 14,000 errors” (unrealistic timeline, team will revolt).
- “Add `type: ignore` to everything” (defeats the purpose).
- Cannot explain the difference between `mypy`, `pyright`, `pytype`, or when to use each.
- Think type hints are enforced at runtime by default.
- “Adopting strict typing on a large untyped codebase is a migration, not a switch flip. I would use an incremental approach:”
  - Phase 1: Baseline without breaking CI (Week 1). Set up `mypy` in CI in non-blocking mode. Use `mypy.ini` with `disallow_untyped_defs = false` globally. Generate a baseline error count. Configure per-module overrides so new modules are strict from day one.
  - Phase 2: Gradual strictness with per-module config (Weeks 2-4). Use mypy’s `[[tool.mypy.overrides]]` in `pyproject.toml` to enable strict mode per module. Start with the module that caused the production incident. Use `monkeytype` or `pytype` to auto-generate annotations (MonkeyType from runtime traces, pytype by static inference); this handles 60-70% of annotations automatically.
  - Phase 3: CI enforcement on new code (Month 2). Use `mypy --warn-unused-ignores` so `type: ignore` comments do not linger. Block PRs that add new untyped functions to already-typed modules. Use a `pre-commit` hook that runs mypy in incremental mode (the default), which reuses its cache so unchanged files are not rechecked.
  - Phase 4: Ratchet mechanism (Ongoing). Track total mypy error count in CI. New PRs must not increase the count. This naturally drives the number to zero over time without requiring a dedicated sprint. Tools like `mypy-baseline` or a simple `wc -l` on mypy output make this easy.
- “For the pushback from senior engineers, I would address their actual concerns:”
  - “It slows me down”: `monkeytype run` + `monkeytype apply` auto-annotates from tests. `pyright` is faster than mypy for editor integration. Show them that typed code catches bugs before code review.
  - “Python is not supposed to be typed”: Show PEP 484, 526, 544. Typing is opt-in and gradual by design. Protocols (PEP 544) give you structural typing that respects duck typing.
  - “It does not catch real bugs”: Share the production incident that triggered this initiative. Show a demo where mypy catches `Optional` misuse (`None` passed where a value is expected), one of the most common production error classes in untyped Python.
- “Tool selection matters at scale:”
  - mypy: Most mature, best ecosystem. Daemon mode (`dmypy`) for fast incremental checks. Use `--install-types` to auto-install type stubs.
  - pyright: Faster, better IDE integration (VS Code / Pylance), stricter by default. Written in TypeScript, harder to extend.
  - pytype: Google’s tool. Can infer types without annotations, which is great for generating initial annotations on a legacy codebase.
- “At a 150k-line codebase, we went from 0% to 85% typed coverage in 6 months using the ratchet approach. Mypy caught 23 bugs in code review that quarter — 3 of which would have been P1 production incidents based on the code paths involved.”
- “What is the difference between `Protocol` (PEP 544) and `ABC`? When would you choose one over the other for type checking?”
- “Explain `TypeVar`, `Generic`, and `ParamSpec`. How would you type a decorator that preserves the decorated function’s signature?”
- “Your codebase uses dynamic patterns like `setattr`, `getattr`, and `**kwargs` heavily. How does mypy handle these, and what are your options when it cannot infer types?”
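The `Optional` misuse mentioned in this scenario is easy to demonstrate. In the sketch below, all function names are hypothetical and Python 3.9+ is assumed for the `dict[int, str]` annotations. A type checker such as mypy should report an incompatible argument type inside `domain_unsafe`, while the runtime failure only appears for users that are missing from the mapping.

```python
from typing import Optional


def find_email(user_id: int, emails: dict[int, str]) -> Optional[str]:
    """Return the email for user_id, or None when the user is unknown."""
    return emails.get(user_id)


def domain_of(email: str) -> str:
    """Return the part of the address after the first '@'."""
    return email.split("@", 1)[1]


def domain_unsafe(user_id: int, emails: dict[int, str]) -> str:
    # A type checker rejects this call: find_email may return None, but
    # domain_of requires a str. At runtime it fails only for unknown users.
    return domain_of(find_email(user_id, emails))


def domain_safe(user_id: int, emails: dict[int, str]) -> Optional[str]:
    email = find_email(user_id, emails)
    if email is None:  # narrowing: below this line the checker knows email is str
        return None
    return domain_of(email)


if __name__ == "__main__":
    emails = {1: "ada@example.com"}
    print(domain_safe(1, emails), domain_safe(2, emails))
    try:
        domain_unsafe(2, emails)
    except AttributeError as exc:
        print("runtime failure the checker would have flagged:", exc)
```

The point for skeptical reviewers is that the unsafe version passes every test that only uses known users, which is why the defect tends to surface in production rather than in CI.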
Conclusion and Interview Tips
This guide covers 150+ essential Python interview questions across all major categories: basics, data structures, OOP, functions, file handling, exception handling, modules, advanced concepts, Python internals, memory management, descriptors, generators, concurrency, design patterns, architecture, testing, web development, and data science. Together, they represent the full range of what you will encounter in Python-focused technical interviews from startups through FAANG.
Key Interview Preparation Tips
- Master Python fundamentals before diving into frameworks. An interviewer can tell within minutes whether you truly understand Python or just know Django’s API. The GIL, mutability vs immutability, and the difference between `is` and `==` are tested at every level.
- Practice coding problems in Python specifically. Python’s idioms (list comprehensions, generator expressions, `collections` module) let you write cleaner solutions than direct translations from other languages. Using `Counter`, `defaultdict`, and `heapq` naturally signals Python fluency.
- Build real projects and be ready to discuss the decisions you made. A deployed Flask/FastAPI service with proper error handling, logging, and tests demonstrates more than 100 solved LeetCode problems. Be prepared to explain your architecture choices and what you would change with hindsight.
- Understand memory management and the GIL deeply. These are the two topics that most reliably separate mid-level from senior Python candidates. Know when the GIL matters (CPU-bound work), when it does not (I/O-bound work), and the practical implications for choosing between threading, multiprocessing, and asyncio.
- Study the standard library thoroughly. Python’s “batteries included” philosophy means the standard library covers an enormous range. Knowing about `functools.lru_cache`, `contextlib.contextmanager`, `itertools`, `pathlib`, and `dataclasses` without needing to look them up signals deep familiarity.
- Prepare for both theoretical and practical questions. Some interviews ask “explain the descriptor protocol,” others ask “write a function that…” Be ready for both. The strongest candidates connect theory to practice: “The descriptor protocol is what makes `@property` work, and here is how I used it to…”
During the Interview
- Ask clarifying questions before coding. “Should this handle concurrent access?” or “What is the expected input size?” shows you think about context before writing code.
- Explain your thought process continuously. “I am choosing a dictionary here because lookups are O(1) and we need to check membership frequently” is far stronger than silently writing code.
- Write Pythonic code, not Java-in-Python. Using list comprehensions over manual loops, `with` statements for resource management, and f-strings over string concatenation signals Python fluency. Interviewers notice this.
- Consider edge cases and error handling. What happens with empty input? What about None values? What if the file does not exist? Mentioning these proactively demonstrates production-level thinking.
- Discuss time and space complexity for every solution. Even when not asked, briefly stating the complexity shows algorithmic maturity. “This is O(n) time and O(n) space due to the hash set” takes 3 seconds and makes a strong impression.
- Be honest about gaps but show your reasoning. “I have not used asyncio in production, but I understand it uses an event loop similar to Node.js, and I would choose it over threading for I/O-bound workloads because of the GIL” shows that you can reason from principles even about unfamiliar territory.
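As one illustration of the "Pythonic code, not Java-in-Python" tip, the sketch below solves the same small problem twice. The function names and sample data are made up for the example; both versions return the same result.

```python
from collections import Counter


def top_words_manual(lines, n):
    """Index-driven loops and manual bookkeeping: correct, but reads like translated Java."""
    counts = {}
    for i in range(len(lines)):
        words = lines[i].split()
        for j in range(len(words)):
            word = words[j].lower()
            if word in counts:
                counts[word] = counts[word] + 1
            else:
                counts[word] = 1
    items = list(counts.items())
    items.sort(key=lambda pair: pair[1], reverse=True)
    return items[:n]


def top_words(lines, n):
    """Same result with a generator expression and collections.Counter."""
    counts = Counter(word.lower() for line in lines for word in line.split())
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ["the cat sat", "The cat", "the"]
    for word, count in top_words(sample, 2):
        print(f"{word}: {count}")
```

In an interview, the second version also leaves more time to talk about edge cases and complexity, since there is less bookkeeping to explain.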
What Separates Good from Great Python Candidates
| Good Candidate | Great Candidate |
|---|---|
| Knows list vs tuple | Explains when tuple’s immutability enables hashability and use as dict keys |
| Can write a class | Understands `__slots__`, descriptors, and when to use `@dataclass` instead |
| Knows about the GIL | Can explain when the GIL does and does not matter, and chooses the right concurrency model accordingly |
| Uses `try/except` | Follows EAFP over LBYL and can explain why, with concrete examples |
| Knows `pip install` | Understands virtual environments, dependency resolution, and why `poetry` or `uv` exist |