Python Interview Questions (Fundamentals)

A comprehensive guide to Python interview questions, organized by category. This collection (2025 edition) covers everything from fundamental syntax to advanced concepts.

1. Python Basics

Answer: Python is a high-level, interpreted, dynamically-typed, general-purpose programming language created by Guido van Rossum and first released in 1991. It emphasizes code readability and developer productivity over raw execution speed.
Key Features:
  • Easy to Learn/Read: Enforces indentation-based blocks, which eliminates brace debates and produces visually consistent code. Python’s “executable pseudocode” reputation means onboarding junior engineers to a Python codebase takes days, not weeks.
  • Interpreted: CPython (the reference implementation) first compiles source code to bytecode, then executes that bytecode on its virtual machine. There is no separate build step for the developer, and the bytecode of imported modules is cached as .pyc files in __pycache__ directories for faster subsequent imports.
  • Dynamically Typed: Variable types are determined at runtime, not at declaration. This enables rapid prototyping but means type errors surface at runtime. Production codebases increasingly use mypy or pyright with type hints (PEP 484) to get static analysis benefits without losing flexibility.
  • Object-Oriented: Everything in Python is an object — even int, str, and functions. Supports classes, multiple inheritance, and the full OOP toolkit, but also embraces functional patterns (map, filter, first-class functions).
  • Extensive Standard Library: “Batteries included” — json, os, pathlib, collections, itertools, unittest, http.server, and 200+ modules ship with CPython. This means fewer external dependencies for common tasks.
  • Cross-platform: Runs on Windows, Linux, and macOS with largely identical behavior for pure-Python code, and even on embedded systems (via MicroPython, a separate implementation). This portability is one reason DevOps tooling (Ansible, SaltStack) and data science libraries standardized on Python.
  • Garbage Collected: Uses reference counting as the primary mechanism plus a cyclic garbage collector for handling circular references. You almost never manage memory manually, but understanding this matters when debugging memory leaks in long-running services.
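The "Interpreted" and "Garbage Collected" points above can be observed with the standard library. This is a minimal sketch, not a definitive demonstration: add and Node are made-up names for illustration, and the exact opcode names printed vary between CPython versions.

```python
import dis
import gc


def add(a, b):
    return a + b


# CPython compiled add() to bytecode when the def statement ran;
# dis exposes the instructions the virtual machine will execute.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)  # opcode names differ across CPython versions


class Node:
    """Hypothetical class used only to build a reference cycle."""

    def __init__(self):
        self.partner = None


gc.disable()  # keep automatic collection from running mid-demo
left, right = Node(), Node()
left.partner, right.partner = right, left  # each object keeps the other alive
del left, right  # reference counts never reach zero because of the cycle

# The cyclic collector finds the unreachable pair; collect() returns
# the number of unreachable objects it found.
unreachable = gc.collect()
gc.enable()
print(unreachable >= 2)
```

Reference counting alone would never free the two Node objects, which is why the cyclic collector exists.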
What interviewers are really testing: Whether you can go beyond “Python is easy” and articulate why Python makes the trade-offs it does (developer speed vs. runtime speed), and whether you understand the ecosystem around CPython.
Red flag answer: Only saying “Python is easy and popular.” That tells the interviewer nothing about your understanding. Another red flag: not knowing Python is interpreted or confusing it with compiled languages.
Follow-up:
  1. “What is the difference between CPython, PyPy, and Cython, and when would you choose each?”
  2. “If Python is interpreted and dynamically typed, how do teams enforce type safety in large codebases?”
  3. “What are the performance implications of Python being interpreted, and how have you worked around them in production?”
Answer: Python provides several built-in types, and understanding their internal implementations is what separates a junior answer from a senior one:
| Category | Types | Internal Implementation |
|---|---|---|
| Numeric | int, float, complex | int is arbitrary precision (no overflow!). float is a C double (IEEE 754, 64-bit). complex stores its real and imaginary parts as two floats. |
| Sequence | list, tuple, range, str | list is a dynamic array (over-allocates by roughly 12.5% as it grows). tuple is a fixed-size array. range stores only start, stop, and step. str is immutable Unicode stored with 1, 2, or 4 bytes per character (Latin-1/UCS-2/UCS-4, per PEP 393) depending on content. |
| Mapping | dict | Hash table using open addressing (insertion-ordered as an implementation detail in CPython 3.6; guaranteed since 3.7). |
| Set | set, frozenset | Hash table (like dict but keys-only). Average O(1) lookup. |
| Boolean | bool | Subclass of int. True == 1, False == 0. This means True + True == 2. |
| Binary | bytes, bytearray, memoryview | bytes is immutable, bytearray is mutable. memoryview provides zero-copy slicing of binary data. |
| None Type | NoneType | Singleton: there is exactly one None object in memory. That’s why is None works and is preferred over == None. |
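Several of these properties can be checked directly in an interpreter. The snippet below is a small sketch using only built-ins; the variable names are illustrative.

```python
# int is arbitrary precision: no overflow, just more digits.
big = 2 ** 10000
digit_count = len(str(big))
print(digit_count)  # 3011

# float is an IEEE 754 double, so decimal fractions are approximations.
print(0.1 + 0.2 == 0.3)  # False

# bool is a subclass of int.
print(issubclass(bool, int), True + True)  # True 2

# memoryview gives a window onto existing bytes without copying them.
buffer = bytearray(b"hello world")
tail = memoryview(buffer)[6:]  # views the bytes of "world"; no copy is made
tail[0] = ord("W")             # writes through to the underlying bytearray
print(buffer)  # bytearray(b'hello World')
```

Because the slice is a view rather than a copy, the write through tail is visible in buffer.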
Key nuance most candidates miss: Python’s int has arbitrary precision. 2 ** 10000 works perfectly and returns a massive integer with no overflow. This is unlike the fixed-width integer types in C or Java, which are limited to 32 or 64 bits. The trade-off is performance: big-int operations are significantly slower than fixed-width integer arithmetic.
What interviewers are really testing: Whether you understand the behavior of these types beyond just naming them. Can you explain why dict is ordered? Why bool is a subclass of int? Why memoryview exists?
Red flag answer: Only listing the types without any understanding of their properties or internal representation. Not mentioning bytes/bytearray (critical for any network or file I/O work).
Follow-up:
  1. “Why is bool a subclass of int in Python, and what surprising behaviors does this cause?”
  2. “When would you use memoryview and what problem does it solve?”
  3. “What happens internally when a Python int exceeds 64 bits?”
Answer:
| Feature | Lists | Tuples |
|---|---|---|
| Mutability | Mutable (can change in place) | Immutable (cannot change after creation) |
| Syntax | Square brackets [] | Parentheses (), though it’s the comma that makes a tuple, not the parens |
| Performance | Slower creation, more memory overhead | Faster creation (roughly 5-12% in micro-benchmarks, varying by version), smaller memory footprint |
| Hashability | Not hashable (can’t be dict keys) | Hashable if all elements are hashable (can be dict keys) |
| Memory | Over-allocates for amortized O(1) appends | Exact allocation, no slack space |
| Use Case | Collections that change (shopping cart items) | Fixed records (database row, coordinate pair, function return values) |
The deeper story:
  • Why tuples are faster: CPython keeps small tuples (up to length 20) on a free list, so creating them can often avoid a malloc call. Tuples also have a smaller memory footprint: on 64-bit CPython, sys.getsizeof reports about 136 bytes for a 10-element list versus about 120 bytes for the equivalent tuple. A list object carries an extra pointer to a separately allocated item array plus a capacity field, and it may also over-allocate slack space as it grows.
  • Named tuples: In production, raw tuples are often replaced by collections.namedtuple or typing.NamedTuple for readability: Point = namedtuple('Point', ['x', 'y']). This gives you tuple performance with attribute access.
  • The gotcha: ([1, 2],) is a tuple containing a mutable list. The tuple is immutable (you can’t replace the list), but the list inside can still be modified. This trips up many candidates.
t = ([1, 2],)
t[0].append(3)  # Works! t is now ([1, 2, 3],)
t[0] = [4, 5]   # TypeError: 'tuple' object does not support item assignment
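The hashability consequences of the points above can be sketched as follows. Point and the sample values are illustrative names only.

```python
from collections import namedtuple

# A named tuple behaves like a tuple but adds attribute access.
Point = namedtuple("Point", ["x", "y"])
origin = Point(0, 0)

# Tuples of hashable elements can be dict keys.
labels = {origin: "origin", (1, 2): "plain tuple"}
print(labels[Point(0, 0)])   # origin
print(origin.x, origin[0])   # 0 0

# The comma makes the tuple, not the parentheses.
single = 1,
print(type(single).__name__)  # tuple

# A tuple that contains a list is immutable but NOT hashable.
try:
    hash(([1, 2],))
    hashable = True
except TypeError:
    hashable = False
print(hashable)  # False
```

So "tuples can be dict keys" holds only when every element is itself hashable.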
What interviewers are really testing: Whether you understand why immutability matters (hashability, thread safety, intent signaling) and not just the mechanical difference.
Red flag answer: Only saying “lists are mutable, tuples are immutable” without explaining the practical implications. Not knowing that tuples can be dict keys. Not knowing the comma-creates-a-tuple rule (x = 1, is a tuple).
Follow-up:
  1. “Can you have a tuple that contains mutable objects? What implications does that have for hashing?”
  2. “When would you use namedtuple vs. a dataclass vs. a regular tuple?”
  3. “In a multithreaded application, why might you prefer tuples over lists for shared data?”
Answer:
  • Mutable: Objects whose internal state can be modified after creation without creating a new object. The object’s id() stays the same.
    • Examples: list, dict, set, bytearray, and custom class instances (by default).
    • Example: after l = [1, 2]; l.append(3), id(l) hasn’t changed: it is the same object in memory.
  • Immutable: Objects that cannot be changed once created. Any “modification” actually creates a brand new object with a new id().
    • Examples: int, float, str, tuple, frozenset, bytes.
    • Example: s = 'hello'; s = s.upper() — this creates a new string 'HELLO' and rebinds s to it. The original 'hello' still exists in memory (until garbage collected).
Why this matters in real-world code:
  1. Default argument trap, one of the most common Python bugs in production:
# DANGEROUS: mutable default argument
def add_item(item, items=[]):
    items.append(item)
    return items

add_item(1)  # [1]
add_item(2)  # [1, 2] -- NOT [2]! The default list is shared across calls.

# CORRECT:
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items
  2. Dict keys must be hashable (which in practice usually means immutable): You can use (1, 2) as a dict key but not [1, 2]. Hash values must be stable: if an object’s contents could change, its hash would change and the dict’s internal hash table would break.
  3. Interning and caching: CPython caches small integers (-5 to 256) and interns some strings for performance. This means a = 'hello'; b = 'hello'; a is b returns True, but only because of an implementation optimization, NOT because the language guarantees it. Never rely on is for value comparison.
  4. Copy implications: Mutability directly affects whether you need shallow vs. deep copies. With immutable objects, “copying” is essentially free, because Python can just share the reference.
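The difference between mutating an object and rebinding a name is easiest to see across a function call. The following is a minimal sketch; the function and variable names are invented for illustration.

```python
def append_in_place(items):
    # Mutation: the caller's list object is modified.
    items.append("added")


def rebind_locally(items):
    # Rebinding: "+" builds a NEW list; the caller's list is untouched.
    items = items + ["added"]
    return items


cart = ["apple"]
identity_before = id(cart)
append_in_place(cart)
print(cart)                         # ['apple', 'added']
print(id(cart) == identity_before)  # True: same object, new contents

original = ["apple"]
result = rebind_locally(original)
print(original)  # ['apple']
print(result)    # ['apple', 'added']
```

Arguments are passed as references to objects, so the caller sees in-place mutations but never sees a local name being rebound.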
What interviewers are really testing: Whether you’ve been bitten by mutability bugs in real code (especially the mutable default argument trap) and whether you understand the relationship between mutability, hashing, and memory management.
Red flag answer: Not knowing the mutable default argument gotcha. Confusing rebinding (x = x + 1) with mutation. Thinking strings are mutable because you can do s += 'more'.
Follow-up:
  1. “Explain the mutable default argument bug and how it has affected you in real code.”
  2. “Why does CPython intern small integers, and what range does it intern?”
  3. “If I pass a list to a function and modify it inside, does the caller see the change? Why?”
Answer: Identity operators check if two variables reference the same object in memory (same id()), not just if they have equal values.
a = [1, 2, 3]
b = a          # b points to the SAME object as a
c = [1, 2, 3]  # c is a NEW object with the same values

print(a is b)      # True  -- same object (id(a) == id(b))
print(a is c)      # False -- different objects, even though values match
print(a == c)      # True  -- values are equal
print(a is not c)  # True  -- confirms they're different objects
Critical real-world usage — None checks:
# CORRECT: Always use 'is' for None checks
if value is None:
    handle_missing()

# WRONG: Don't use == for None
if value == None:  # This calls value.__eq__(None), which could be overridden!
    handle_missing()
Why is for None: None is a singleton, so there’s exactly one None object in the entire Python process. Using is checks identity directly (one pointer comparison, extremely fast). Using == triggers the __eq__ method, which could be overridden by a custom class to return True even when the value isn’t None.
The integer caching trap:
a = 256
b = 256
print(a is b)  # True -- CPython caches integers -5 to 256

a = 257
b = 257
print(a is b)  # Often False in the REPL, where each statement is compiled separately.
               # May be True in a .py file, where the compiler can reuse one constant object.
This is implementation-specific (CPython), and relying on it is a bug. Always use == for value comparison.
What interviewers are really testing: Whether you know the is vs == distinction AND that you’ve internalized the “always use is for None” convention. Bonus points for knowing why (singleton pattern, avoiding __eq__ override issues).
Red flag answer: Using == for None checks. Not knowing about integer caching. Thinking is and == are interchangeable for immutable types.
Follow-up:
  1. “Can you write a class where x == None returns True but x is None returns False? Why is this dangerous?”
  2. “Why does CPython cache integers from -5 to 256 specifically?”
  3. “In what scenarios would a is b return True for strings but not always reliably?”
Answer: PEP 8 is not just style — it’s a communication protocol between Python developers. Violating it signals either inexperience or a deliberate (and documented) exception.
  • Variables/Functions: snake_case (e.g., calculate_total_price, get_user_by_id).
  • Classes: PascalCase (e.g., HttpClient, UserRepository).
  • Constants: UPPER_SNAKE_CASE (e.g., MAX_RETRIES = 3, DEFAULT_TIMEOUT = 30).
  • Private by convention: Single underscore prefix _internal_method — signals “don’t touch this from outside,” but Python does NOT enforce it. It’s a gentleman’s agreement.
  • Name mangling: Double underscore prefix __private_attr — CPython mangles this to _ClassName__private_attr to avoid name collisions in inheritance hierarchies. This is NOT security — it’s collision avoidance.
  • Dunder/Magic: Double underscore on both sides __init__, __str__ — reserved for Python’s protocol methods. Never invent your own dunder names.
  • Throwaway variables: Single underscore _ for values you don’t need: for _ in range(10):.
  • Module-level “exports”: __all__ list controls what from module import * exposes.
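The single- versus double-underscore distinction is easier to remember once you have seen mangling happen. This is an illustrative sketch; Account and its attributes are hypothetical names.

```python
class Account:
    def __init__(self, pin):
        self._note = "internal by convention"  # single underscore: no enforcement
        self.__pin = pin                       # stored as _Account__pin

    def check(self, pin):
        # Inside the class body, __pin is rewritten to _Account__pin.
        return self.__pin == pin


account = Account(1234)
print(account.check(1234))        # True
print(account._note)              # accessible; the underscore is only a signal
print(hasattr(account, "__pin"))  # False: no attribute with that literal name
print(account._Account__pin)      # 1234: mangled name is still reachable
```

The last line shows why mangling is collision avoidance rather than access control.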
In practice (what tools enforce this):
  • flake8 or ruff for linting (ruff is written in Rust and is reported by its maintainers to be 10-100x faster)
  • black or ruff format for auto-formatting
  • isort or ruff for import ordering
  • Pre-commit hooks running these tools catch violations before code review
What interviewers are really testing: Whether you follow PEP 8 naturally and understand why, particularly the single vs. double underscore distinction, which reveals understanding of Python’s access model.
Red flag answer: Not knowing the difference between _private and __mangled. Calling double underscore “private” without understanding name mangling. Never having used a linter.
Follow-up:
  1. “What is the actual mechanism behind double-underscore name mangling? Can you still access a __private attribute from outside?”
  2. “How do you enforce PEP 8 across a team of 20 developers? What tooling do you use?”
  3. “When would you deliberately violate PEP 8, and how would you document that decision?”
Answer:
  • == checks value equality: “Do these two objects have the same content?” It calls __eq__() under the hood, which means classes can customize what “equal” means.
  • is checks identity: “Are these literally the same object in memory?” It compares id() values — one pointer comparison, zero method calls.
x = [1, 2]
y = [1, 2]
z = x

print(x == y)  # True  -- same values
print(x is y)  # False -- different objects in memory
print(x is z)  # True  -- z was assigned to the same object as x
When to use which:
  • Use is for: None checks (if x is None), sentinel values, and checking against singleton objects.
  • Use == for: Everything else — comparing values, numbers, strings, collections.
  • Never use is for string or number comparison in production code, even if it “works” due to interning.
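Sentinel values are the other everyday use of is. The sketch below assumes a hypothetical get_setting helper where None is a legitimate stored value, so a unique object is needed to mean "no value supplied".

```python
_MISSING = object()  # unique sentinel; nothing else can be identical to it


def get_setting(settings, key, default=_MISSING):
    """Hypothetical lookup that distinguishes 'stored None' from 'absent'."""
    value = settings.get(key, _MISSING)
    if value is _MISSING:          # identity check, cannot be spoofed by __eq__
        if default is _MISSING:
            raise KeyError(key)
        return default
    return value


config = {"timeout": None}
print(get_setting(config, "timeout"))              # None (explicitly stored)
print(get_setting(config, "retries", default=3))   # 3
```

Using == here would be wrong in principle, since any object may define __eq__ to claim equality with the sentinel.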
The __eq__ override trap:
class Sneaky:
    def __eq__(self, other):
        return True  # Claims to be equal to everything!

s = Sneaky()
print(s == None)   # True (!)  -- __eq__ was overridden
print(s == 42)     # True (!)
print(s is None)   # False     -- identity can't be faked
This is exactly why is None is safer: identity cannot be spoofed.
Performance difference: is is a single pointer comparison (essentially free). == may trigger arbitrarily complex __eq__ logic; for large nested dicts, this could be expensive.
What interviewers are really testing: Whether you default to is for None checks (a strong Python habit), whether you understand __eq__ customization, and whether you know about CPython’s interning optimization without relying on it.
Red flag answer: Using is for string comparison (“it works for me”). Not knowing that == calls __eq__. Using == None instead of is None.
Follow-up:
  1. “What happens when you compare two objects with == and neither has defined __eq__?”
  2. “How would you implement __eq__ and __hash__ correctly for a custom class that you want to use as dict keys?”
  3. “Why does float('nan') == float('nan') return False?”
Answer: The in operator tests whether a value exists in a container. The critical nuance is that time complexity varies dramatically by container type:
# String containment (substring search)
'app' in 'apple'          # True -- O(n*m) substring search

# List/Tuple (linear scan)
5 in [1, 2, 3, 4, 5]      # True -- O(n), must check each element

# Set (hash lookup)
5 in {1, 2, 3, 4, 5}      # True -- O(1) average case

# Dict (checks keys, not values!)
'key' in {'key': 'val'}   # True -- O(1) hash lookup on keys
'val' in {'key': 'val'}   # False! Checks keys only.

# To check dict values:
'val' in {'key': 'val'}.values()  # True -- but O(n)!
Performance-critical insight: If you’re doing repeated membership checks against a large collection, converting a list to a set first gives you O(1) lookups instead of O(n):
# Slow: O(n) per check, O(n*m) total
blocklist = ['spam@x.com', 'junk@y.com', ...]  # 100K items
for email in incoming_emails:  # 50K emails
    if email in blocklist:     # O(100K) each time
        block(email)

# Fast: O(1) per check after O(n) set construction
blocklist_set = set(blocklist)  # One-time O(n) cost
for email in incoming_emails:
    if email in blocklist_set:  # O(1) each time
        block(email)
This optimization turned a 45-minute ETL job into a 3-second one at a company I worked with.
Custom __contains__: You can define in behavior for custom classes:
class NumberRange:
    def __init__(self, start, end):
        self.start, self.end = start, end
    def __contains__(self, item):
        return self.start <= item <= self.end

print(5 in NumberRange(1, 10))  # True
What interviewers are really testing: Whether you think about time complexity when choosing data structures, and specifically whether you know the list-vs-set performance difference for in operations.
Red flag answer: Not knowing in checks dict keys, not values. Not being aware of the O(n) vs O(1) difference between list and set membership tests. Never having optimized a hot loop by converting to a set.
Follow-up:
  1. “You have 10 million records and need to check membership frequently. What data structure do you use and why?”
  2. “What is the worst-case time complexity of in for a set, and when does it degrade?”
  3. “How does Python’s in operator work on a custom object that defines neither __contains__ nor __iter__?”

2. Data Structures

Answer: Common operations for list, with their time complexities (what most candidates forget):
| Method | Behavior | Time Complexity |
|---|---|---|
| append(x) | Adds x to the end | O(1) amortized |
| insert(i, x) | Inserts x at index i | O(n), shifts all elements after i |
| remove(x) | Removes first occurrence of x | O(n), linear search + shift |
| pop() | Removes and returns last element | O(1) |
| pop(i) | Removes and returns element at index i | O(n), shifts elements |
| extend(iter) | Appends all elements from iterable | O(k) where k = len(iter) |
| sort() | Sorts in-place (Timsort) | O(n log n) |
| reverse() | Reverses in-place | O(n) |
| index(x) | Returns index of first x | O(n) |
| count(x) | Counts occurrences of x | O(n) |
Key distinctions interviewers test:
  • append vs extend: l.append([1,2]) adds [1,2] as a single element; l.extend([1,2]) adds 1 and 2 separately. Confusing these is a common bug.
  • sort() vs sorted(): sort() mutates in-place and returns None. sorted() returns a new list. Writing x = my_list.sort() sets x to None — a very common mistake.
  • remove() only removes the first occurrence: If you need to remove all occurrences, use a list comprehension: [x for x in items if x != target].
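The three distinctions above, shown side by side as a short sketch with illustrative values:

```python
base = [3, 1, 2]

# append adds ONE element (here, a nested list); extend adds each item.
appended = base.copy()
appended.append([4, 5])
extended = base.copy()
extended.extend([4, 5])
print(appended)  # [3, 1, 2, [4, 5]]
print(extended)  # [3, 1, 2, 4, 5]

# sort() mutates and returns None; sorted() returns a new list.
in_place = base.copy()
returned = in_place.sort()
print(returned, in_place)  # None [1, 2, 3]
print(sorted(base), base)  # [1, 2, 3] [3, 1, 2]

# remove() drops only the first match; a comprehension drops them all.
values = [1, 2, 1, 3, 1]
values.remove(1)
print(values)                         # [2, 1, 3, 1]
print([v for v in values if v != 1])  # [2, 3]
```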
Why insert(0, x) is expensive: Python lists are dynamic arrays (contiguous memory). Inserting at the front requires shifting every element right by one position — O(n). If you need frequent front insertions, use collections.deque which gives O(1) for both ends.
from collections import deque
d = deque([1, 2, 3])
d.appendleft(0)  # O(1) -- not O(n) like list.insert(0, x)
What interviewers are really testing: Whether you know the time complexities of list operations (especially that insert(0) and pop(0) are O(n)), and whether you know when to reach for deque instead.
Red flag answer: Not knowing sort() returns None. Not knowing insert(0, x) is O(n). Confusing append and extend.
Follow-up:
  1. “Why is list.pop(0) O(n) but deque.popleft() O(1)? What’s the underlying data structure difference?”
  2. “How does Python’s list over-allocation strategy work, and why does append have amortized O(1) complexity?”
  3. “You’re building a queue. Why should you NOT use a regular list?”
Answer: A concise, Pythonic way to create lists that’s both more readable and faster than equivalent for loops.
Syntax: [expression for item in iterable if condition]
# Basic
squares = [x**2 for x in range(10)]

# With filter
evens = [x for x in range(20) if x % 2 == 0]

# Nested (flattening a matrix)
matrix = [[1,2], [3,4], [5,6]]
flat = [num for row in matrix for num in row]  # [1,2,3,4,5,6]

# With conditional expression (ternary)
labels = ['even' if x % 2 == 0 else 'odd' for x in range(5)]
Why comprehensions are faster than loops: CPython compiles a list comprehension to use a dedicated LIST_APPEND bytecode instruction, which avoids the method lookup and call overhead of list.append() in a regular loop. In micro-benchmarks, comprehensions are often around 20-30% faster than the equivalent for loop + append pattern, though the gap varies with the Python version and the work done per element.
Dict and set comprehensions (often overlooked):
# Dict comprehension
word_lengths = {word: len(word) for word in ['hello', 'world']}

# Set comprehension
unique_lengths = {len(word) for word in ['hi', 'hey', 'hello']}
Generator expression (when you don’t need the full list):
# List comprehension: builds entire list in memory
total = sum([x**2 for x in range(10_000_000)])  # list alone holds 10M pointers (~80MB), plus the int objects

# Generator expression: produces values lazily, constant memory
total = sum(x**2 for x in range(10_000_000))    # ~zero extra memory
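The memory difference can be measured on a smaller input with sys.getsizeof. This is a rough sketch: getsizeof reports only the container itself (not the integers it references), and exact byte counts depend on the Python build.

```python
import sys

squares_list = [x * x for x in range(100_000)]
squares_gen = (x * x for x in range(100_000))

list_bytes = sys.getsizeof(squares_list)  # grows with the number of elements
gen_bytes = sys.getsizeof(squares_gen)    # small and independent of the range size

print(gen_bytes < list_bytes)  # True

# Both produce the same values; the generator just yields them one at a time.
list_total = sum(squares_list)
gen_total = sum(squares_gen)
print(list_total == gen_total)  # True
```

A generator can be consumed only once, so keep the list when you need to iterate repeatedly or index into the results.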
When NOT to use comprehensions:
  • When the logic requires multiple statements or side effects
  • When nesting gets beyond 2 levels (readability drops off a cliff)
  • When you need error handling (try/except inside a comprehension is not possible)
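When per-item error handling is needed, a plain loop (or a helper function called from a comprehension) is the usual alternative. A minimal sketch, with parse_ints and the sample data invented for illustration:

```python
def parse_ints(raw_values):
    """Convert what can be converted; skip values int() rejects."""
    parsed = []
    for raw in raw_values:
        try:
            parsed.append(int(raw))
        except ValueError:
            continue  # a comprehension has no place to put this except clause
    return parsed


raw = ["10", "abc", "32", "", "-7"]
print(parse_ints(raw))  # [10, 32, -7]
```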
What interviewers are really testing: Whether you use comprehensions idiomatically, whether you know about generator expressions for memory efficiency, and whether you know when comprehensions become less readable than a plain loop.
Red flag answer: Writing a 3-level nested comprehension and calling it “Pythonic.” Not knowing generator expressions exist. Not knowing comprehensions are faster than loops at the bytecode level.
Follow-up:
  1. “What is the difference between a list comprehension and a generator expression in terms of memory and performance?”
  2. “Can you use walrus operator (:=) inside a comprehension? Give an example.”
  3. “At what point do you stop using comprehensions and switch to a regular loop?”
Answer: Dictionaries are Python’s most important data structure. They’re used everywhere internally: module namespaces, class attributes, function kwargs, and global/local variable scopes are all backed by dicts.
Implementation details (CPython 3.7+):
  • Hash table using open addressing with a compact, insertion-ordered layout
  • Keys must be hashable (immutable types, or objects with stable __hash__)
  • Average case: O(1) for get/set/delete. Worst case: O(n) if all keys hash-collide (extremely rare in practice)
  • Dicts are insertion-ordered as of Python 3.7 (guaranteed by spec, was implementation detail in 3.6)
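The hashability requirement applies to your own classes too. The sketch below uses a hypothetical Version class to show the usual pairing of __eq__ and __hash__ over the same fields, and what happens when __hash__ is left out.

```python
class Version:
    """Hypothetical value object usable as a dict key."""

    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return (self.major, self.minor) == (other.major, other.minor)

    def __hash__(self):
        # Hash the same fields that __eq__ compares.
        return hash((self.major, self.minor))


class EqOnly:
    """Defining __eq__ without __hash__ makes instances unhashable."""

    def __eq__(self, other):
        return isinstance(other, EqOnly)


release_notes = {Version(3, 12): "latest"}
print(release_notes[Version(3, 12)])  # latest: equal objects hash equally

try:
    hash(EqOnly())
    eq_only_hashable = True
except TypeError:
    eq_only_hashable = False
print(eq_only_hashable)  # False
```

The fields used for hashing should not change while the object is stored as a key, otherwise lookups can silently fail.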
Key methods and their nuances:
d = {'name': 'Alice', 'age': 30}

# get() vs direct access
# d['missing']          # Would raise KeyError!
d.get('missing')        # Returns None (safe)
d.get('missing', 0)     # Returns 0 (custom default)

# setdefault() -- atomic get-or-set (thread-safe with GIL)
d.setdefault('role', 'engineer')  # Sets AND returns

# update() -- merge another dict (last value wins on conflicts)
d.update({'age': 31, 'city': 'NYC'})

# Merge operators (Python 3.9+)
merged = d | {'new_key': 'val'}   # Creates new dict
d |= {'new_key': 'val'}          # Updates in-place

# Dictionary unpacking
defaults = {'theme': 'light', 'lang': 'en'}  # illustrative values
overrides = {'theme': 'dark'}
combined = {**defaults, **overrides}  # overrides win on conflicts

# items(), keys(), values() return VIEW objects (not lists!)
for key, value in d.items():  # View reflects live changes to dict
    print(key, value)
defaultdict — the production workhorse:
from collections import defaultdict

# Instead of checking if key exists before appending:
groups = defaultdict(list)
for item in data:
    groups[item.category].append(item)  # No KeyError, auto-creates []

# Counting pattern:
counts = defaultdict(int)
for word in words:
    counts[word] += 1  # No KeyError, auto-creates 0
Counter for frequency analysis:
from collections import Counter
c = Counter(['a', 'b', 'a', 'c', 'a'])
c.most_common(2)  # [('a', 3), ('b', 1)]
What interviewers are really testing: Whether you reach for defaultdict/Counter instead of writing manual existence checks. Whether you know dicts are ordered in Python 3.7+. Whether you understand hashability requirements for keys.
Red flag answer: Using if key in dict: dict[key].append(...) instead of defaultdict(list). Not knowing dicts are ordered in modern Python. Not knowing about the | merge operator.
Follow-up:
  1. “What happens if you modify a dictionary while iterating over it? How do you handle that safely?”
  2. “Explain how Python’s dict achieves O(1) average lookup. What causes worst-case O(n)?”
  3. “When would you use OrderedDict from collections now that regular dicts are ordered?”
Answer:
  • Set: Mutable, unordered collection of unique, hashable elements. Uses a hash table internally (like a dict with only keys). Supports add(), remove(), discard(), pop().
  • Frozenset: Immutable version of a set. Because it’s immutable, it’s hashable — meaning it can be used as a dict key or an element of another set.
When frozenset matters in real code:
# You need a set of sets (e.g., tracking unique combinations)
# This fails:
set_of_sets = {  {1,2}, {3,4}  }  # TypeError: unhashable type: 'set'

# This works:
set_of_frozensets = { frozenset({1,2}), frozenset({3,4}) }

# Dict key example: caching results for a combination of features
cache = {}
features = frozenset(['age', 'income', 'location'])
cache[features] = model.predict(features)
Performance characteristics:
  • add(): O(1) average
  • remove(x): O(1) average, raises KeyError if missing
  • discard(x): O(1) average, does NOT raise if missing (prefer this in production)
  • x in set: O(1) average — this is the primary reason to use sets
  • Set operations: intersection is O(min(len(s1), len(s2))) on average; union is O(len(s1) + len(s2))
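The difference between remove() and discard() in practice, as a short sketch with made-up data:

```python
active_users = {"alice", "bob"}

# discard() is a no-op when the element is absent.
active_users.discard("carol")

# remove() raises KeyError for a missing element.
try:
    active_users.remove("carol")
    remove_raised = False
except KeyError:
    remove_raised = True
print(remove_raised)  # True

# add() silently ignores duplicates, so the size is unchanged.
active_users.add("bob")
print(len(active_users))  # 2
```

Prefer remove() only when a missing element indicates a real bug you want surfaced.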
Real-world use case — deduplication:
# Deduplicate while preserving order (works on any Python version)
seen = set()
unique = []
for item in items:
    if item not in seen:
        seen.add(item)
        unique.append(item)

# Or use the dict.fromkeys() trick (relies on insertion-ordered dicts, Python 3.7+):
unique = list(dict.fromkeys(items))
What interviewers are really testing: Whether you know when to use sets (deduplication, fast membership testing, set math) and whether you understand hashability constraints.
Red flag answer: Not knowing why you can’t put a set inside a set. Never having used frozenset. Using a list for membership checks when a set would be O(1).
Follow-up:
  1. “How would you find duplicate elements in a list of 10 million items efficiently?”
  2. “What happens to set performance when you have a custom class with a bad __hash__ that always returns the same value?”
  3. “When would you use a set vs. a dict with dummy values?”
Answer: Set operations map directly to mathematical set theory and are heavily used in data processing, permissions systems, and graph algorithms.
| Operation | Operator | Method | What it returns |
|---|---|---|---|
| Union | A \| B | A.union(B) | All elements from both sets |
| Intersection | A & B | A.intersection(B) | Only elements in both sets |
| Difference | A - B | A.difference(B) | Elements in A but not in B |
| Symmetric Diff | A ^ B | A.symmetric_difference(B) | Elements in either but not both |
| Subset | A <= B | A.issubset(B) | True if all of A is in B |
| Superset | A >= B | A.issuperset(B) | True if A contains all of B |
| Disjoint | (none) | A.isdisjoint(B) | True if no overlap |
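A compact check of each row with concrete values, offered as a sketch. It also shows one practical difference between the two spellings: the methods accept any iterable, while the operators require both operands to be sets.

```python
a = {1, 2, 3}
b = {3, 4}

print(a | b)  # {1, 2, 3, 4}
print(a & b)  # {3}
print(a - b)  # {1, 2}
print(a ^ b)  # {1, 2, 4}
print({1, 2} <= a, a >= {1, 2})  # True True
print(a.isdisjoint({9, 10}))     # True

# Methods accept any iterable...
print(a.union([3, 4]))  # {1, 2, 3, 4}

# ...but operators raise TypeError for non-set operands.
try:
    a | [3, 4]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
print(operator_accepts_list)  # False
```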
Real-world example — permission system:
user_permissions = {'read', 'write', 'delete'}
required_permissions = {'read', 'write'}

# Check if user has all required permissions
if required_permissions.issubset(user_permissions):
    allow_action()

# What extra permissions does user have?
extra = user_permissions - required_permissions  # {'delete'}
Real-world example — finding changed records:
yesterday_ids = set(fetch_ids('2024-01-01'))
today_ids = set(fetch_ids('2024-01-02'))

new_records = today_ids - yesterday_ids        # Added
deleted_records = yesterday_ids - today_ids    # Removed
unchanged = yesterday_ids & today_ids          # Still present
all_affected = yesterday_ids ^ today_ids       # Changed in either direction
What interviewers are really testing: Whether you can apply set theory to solve real problems (permissions, diffing, deduplication) rather than just reciting the operators.
Red flag answer: Only knowing union and intersection. Never having applied set operations to a real problem. Not knowing the difference between - (difference) and ^ (symmetric difference).
Follow-up:
  1. “How would you use set operations to implement a simple role-based access control system?”
  2. “What is the time complexity of set intersection? Does the order of operands matter?”
  3. “How would you find all users who are in Group A but not in Group B using set operations?”
Answer: Slicing is one of Python’s most powerful features and follows the pattern sequence[start:stop:step]. All three parameters are optional.
text = 'Hello, World!'

# Basic indexing
text[0]       # 'H' -- zero-indexed
text[-1]      # '!' -- negative indexes count from the end

# Slicing: [start:stop) -- stop is EXCLUSIVE
text[0:5]     # 'Hello'
text[7:]      # 'World!' -- from index 7 to the end
text[:5]      # 'Hello' -- from start to index 5 (exclusive)

# Step
text[::2]     # 'Hlo ol!' -- every 2nd character
text[::-1]    # '!dlroW ,olleH' -- reverse

# Slice assignment (lists only, not strings/tuples)
nums = [0, 1, 2, 3, 4]
nums[1:3] = [10, 20, 30]  # [0, 10, 20, 30, 3, 4] -- replacement can differ in length!
Key nuances:
  • Slicing never raises IndexError: [1,2,3][100:] returns [], not an error. Indexing does raise: [1,2,3][100] is IndexError.
  • Slices create copies: new = old[:] creates a shallow copy. This is important — modifying new won’t affect old (for the top level; nested objects are still shared).
  • Named slices for readability:
# Instead of magic numbers:
NAME = slice(0, 20)
AGE = slice(20, 24)
record = 'John Smith          0030'
print(record[NAME].strip())  # 'John Smith'
print(record[AGE])            # '0030'
  • slice() objects: The slice builtin creates reusable slice objects. s = slice(1, 10, 2); lst[s] is equivalent to lst[1:10:2]. Useful for dynamic slicing.
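Slice objects are also what a custom class receives when it is sliced. The sketch below uses a hypothetical Playlist wrapper to show one common way of handling both integer and slice indexes in __getitem__.

```python
class Playlist:
    """Hypothetical sequence wrapper used to illustrate slice handling."""

    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __len__(self):
        return len(self._tracks)

    def __getitem__(self, index):
        # songs[0:2] arrives here as slice(0, 2, None).
        if isinstance(index, slice):
            return Playlist(self._tracks[index])
        return self._tracks[index]


songs = Playlist(["intro", "verse", "chorus", "bridge", "outro"])
first_two = slice(0, 2)  # reusable, named slice

print(songs[first_two]._tracks)  # ['intro', 'verse']
print(songs[-1])                 # outro

# slice.indices() clamps start/stop/step to a given length.
print(slice(1, 100, 2).indices(len(songs)))  # (1, 5, 2)
```

Returning a new Playlist for slices mirrors how built-in sequences return their own type when sliced.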
What interviewers are really testing: Whether you know slicing is exclusive of the stop index, that slicing never raises IndexError, and that you can use negative indices and steps fluently.
Red flag answer: Off-by-one confusion about whether stop is inclusive or exclusive. Not knowing negative indexing. Not knowing that slicing creates a copy.
Follow-up:
  1. “What is the difference between lst[:] and lst.copy() and list(lst)?”
  2. “How would you implement __getitem__ to support slicing in a custom class?”
  3. “Explain what happens internally when you do lst[1:3] = [10, 20, 30] on a list.”
Answer: Use sorted() with a key function. This is tested frequently because it touches lambdas, operator module, stability, and custom comparisons.
students = [
    {'name': 'John', 'age': 25, 'gpa': 3.5},
    {'name': 'Jane', 'age': 22, 'gpa': 3.9},
    {'name': 'Bob', 'age': 22, 'gpa': 3.7},
]

# Sort by single key
by_age = sorted(students, key=lambda x: x['age'])

# Sort by multiple keys (age ascending, then gpa descending)
by_age_gpa = sorted(students, key=lambda x: (x['age'], -x['gpa']))

# Using operator.itemgetter (faster than lambda for large lists)
from operator import itemgetter
by_age = sorted(students, key=itemgetter('age'))
by_age_name = sorted(students, key=itemgetter('age', 'name'))

# Reverse sort
by_age_desc = sorted(students, key=itemgetter('age'), reverse=True)
Why operator.itemgetter is preferred in production:
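  • It’s implemented in C, so it avoids a Python-level function call per element. It is typically somewhat faster than an equivalent lambda on large lists, though the exact gain depends on the workload and Python version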
  • It’s more readable for multi-key sorts: itemgetter('age', 'name') vs lambda x: (x['age'], x['name'])
  • It’s picklable (lambdas are not), which matters for multiprocessing
Sort stability: Python’s sort (Timsort) is stable — elements that compare equal maintain their original order. This means you can sort by multiple keys by sorting multiple times (least important key first):
# Sort by GPA, then by age (stable sort trick)
result = sorted(students, key=itemgetter('gpa'))  # First: GPA
result = sorted(result, key=itemgetter('age'))     # Then: age (preserves GPA order within same age)
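The multi-pass trick is most useful when keys need different directions and one of them cannot simply be negated, such as a string. A minimal sketch, assuming a hypothetical `people` list (not from the original data):

```python
from operator import itemgetter

people = [
    {'name': 'Ann', 'age': 22},
    {'name': 'Bob', 'age': 25},
    {'name': 'Cid', 'age': 22},
]

# Pass 1: least important key (name, descending)
by_name_desc = sorted(people, key=itemgetter('name'), reverse=True)

# Pass 2: most important key (age, ascending).
# Stability keeps the pass-1 order among people with equal age.
result = sorted(by_name_desc, key=itemgetter('age'))

names = [p['name'] for p in result]
print(names)  # ['Cid', 'Ann', 'Bob']
```

A single key such as `(x['age'], -x['gpa'])` only works when every descending key is numeric, which is why the two-pass form is worth knowing.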
What interviewers are really testing: Whether you know sorted() vs .sort(), whether you reach for operator.itemgetter over lambdas, and whether you understand sort stability.
Red flag answer: Only knowing the lambda approach. Not knowing about operator.itemgetter. Not understanding that list.sort() returns None. Implementing a custom bubble sort instead of using built-in Timsort.
Follow-up:
  1. “What sorting algorithm does Python use internally, and why was it chosen?”
  2. “How would you sort a list of objects that don’t have a natural ordering? What about using functools.total_ordering?”
  3. “You need to get the top 10 items from a list of 10 million. Is sorted() the best approach?”
Answer: This is one of the most practically important concepts in Python, and misunderstanding it causes insidious production bugs.
  • Shallow Copy: Creates a new outer container, but elements inside still reference the same objects. Changes to nested objects are visible in both copies.
  • Deep Copy: Creates a new outer container AND recursively copies every nested object. The result is completely independent.
import copy

original = [[1, 2], [3, 4], {'key': 'value'}]

shallow = copy.copy(original)
deep = copy.deepcopy(original)

# Modify a nested object
original[0].append(999)

print(shallow[0])  # [1, 2, 999] -- AFFECTED! Shares the nested list
print(deep[0])     # [1, 2]      -- NOT affected, independent copy
Ways to make shallow copies:
# All of these are SHALLOW copies:
b = a[:]              # Slice copy
b = list(a)           # Constructor copy
b = a.copy()          # Method copy
b = copy.copy(a)      # Explicit shallow copy

# For dicts:
d2 = d.copy()
d2 = {**d}            # Unpacking (shallow)
d2 = dict(d)          # Constructor
When deep copy gets tricky:
  • Circular references: deepcopy handles them via a memo dict that tracks already-copied objects
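  • Performance: Deep copying a large nested structure is expensive because every nested object is visited and copied, so the cost grows with the total number of objects, while a shallow copy only duplicates the top-level container. As a rough illustration, deep copying a 10MB nested dict can take on the order of 100ms versus about 1ms for a shallow copy; actual numbers vary with the structure and the machine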
  • Unpicklable objects: deepcopy may fail on objects with file handles, database connections, or locks. You can customize behavior with __copy__ and __deepcopy__ methods.
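The last two points can be sketched together. The `Cache` class below is hypothetical: it holds ordinary data plus a `threading.Lock`, which `deepcopy` cannot duplicate on its own (it typically raises TypeError), so `__deepcopy__` copies the data and creates a fresh lock.

```python
import copy
import threading

class Cache:
    """Hypothetical class: copyable data plus a lock that deepcopy cannot duplicate."""

    def __init__(self, data):
        self.data = data
        self._lock = threading.Lock()

    def __deepcopy__(self, memo):
        clone = type(self).__new__(type(self))       # create the object without running __init__
        memo[id(self)] = clone                       # register early so cycles resolve to the clone
        clone.data = copy.deepcopy(self.data, memo)  # recurse into the copyable part
        clone._lock = threading.Lock()               # fresh lock instead of copying the old one
        return clone

original = Cache({'users': ['ann']})
clone = copy.deepcopy(original)
clone.data['users'].append('bob')

# Circular reference: the memo dict maps the list to its copy,
# so the copy points back at itself rather than recursing forever.
loop = [1]
loop.append(loop)
loop_copy = copy.deepcopy(loop)

print(original.data, clone.data, loop_copy[1] is loop_copy)
```

Whether a copy should get a new lock, share the old one, or refuse to be copied at all is a design decision; this sketch assumes a new lock is acceptable.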
Real production bug:
# A config object shared across request handlers
default_config = {'retries': 3, 'headers': {'auth': 'token123'}}

# BUG: shallow copy means 'headers' dict is shared
handler_config = default_config.copy()
handler_config['headers']['custom'] = 'value'
# default_config['headers'] now also has 'custom'!

# FIX: deep copy
handler_config = copy.deepcopy(default_config)
What interviewers are really testing: Whether you’ve been bitten by this in real code. Whether you know that dict.copy(), list[:], etc., are all SHALLOW. Whether you understand the performance implications of deep copy.
Red flag answer: Thinking .copy() creates a deep copy. Not knowing what happens with nested mutable objects. Never having encountered this bug in real code.
Follow-up:
  1. “You have a deeply nested config dict that gets passed to 100 microservices. How do you prevent accidental mutation?”
  2. “What happens when deepcopy encounters a circular reference?”
  3. “How would you implement __deepcopy__ on a custom class that holds a database connection?”

3. Object-Oriented Programming

Answer: The four pillars are a framework for thinking about code organization, not just vocabulary words. Here’s what they mean in practice in Python:
  1. Encapsulation: Bundling data and methods that operate on that data into a single unit (class), and controlling access to internal state.
    • In Python, there are no true private members (unlike Java/C++). Convention uses _private (single underscore) and __mangled (double underscore, triggers name mangling).
    • Real-world example: A BankAccount class where _balance is internal. External code uses deposit() and withdraw() methods that enforce business rules (no negative balance) instead of directly modifying _balance.
  2. Abstraction: Hiding complex implementation details and exposing only the necessary interface.
    • Python uses abc.ABC and @abstractmethod for abstract base classes.
    • Real-world example: A PaymentProcessor ABC defines process_payment() as abstract. StripeProcessor and PayPalProcessor implement the details. Calling code only knows the interface.
  3. Inheritance: Creating new classes that inherit behavior from existing ones (code reuse and specialization).
    • Python supports multiple inheritance (unlike Java). This is powerful but dangerous — the Method Resolution Order (MRO) determines which parent’s method gets called.
    • Real-world preference: Composition over inheritance in most modern Python code. “Has-a” relationships (a Car has an Engine) are usually better than “is-a” (ElectricCar is-a Car), because inheritance hierarchies become rigid and fragile.
  4. Polymorphism: Different objects responding to the same method call differently.
    • Python achieves this through duck typing: “If it walks like a duck and quacks like a duck, it’s a duck.” You don’t need a shared base class — just implement the same method.
    • Real-world example: len() works on strings, lists, dicts, and any object with __len__. That’s polymorphism without inheritance.
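Three of the pillars can be sketched in a few lines. The class names follow the examples mentioned above, but the method bodies and the `Playlist` class are illustrative assumptions, not a real payment integration.

```python
from abc import ABC, abstractmethod

class PaymentProcessor(ABC):
    """Abstraction: callers depend on this interface only."""

    @abstractmethod
    def process_payment(self, amount):
        ...

class StripeProcessor(PaymentProcessor):
    def process_payment(self, amount):
        # Placeholder body; a real implementation would call a payment API
        return f"stripe charged {amount:.2f}"

class Playlist:
    """Polymorphism via duck typing: no shared base class, just __len__."""

    def __init__(self, tracks):
        self._tracks = list(tracks)   # Encapsulation by convention (single underscore)

    def __len__(self):
        return len(self._tracks)

try:
    PaymentProcessor()                # abstract classes cannot be instantiated
    instantiated = True
except TypeError:
    instantiated = False

receipt = StripeProcessor().process_payment(12.5)
sizes = [len(x) for x in ("abc", [1, 2], Playlist(["a", "b", "c", "d"]))]

print(instantiated, receipt, sizes)
```

Note that `len()` treats the string, the list, and the `Playlist` uniformly even though they share no ancestor other than object.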
What interviewers are really testing: Whether you can go beyond reciting definitions and explain how each pillar manifests specifically in Python (duck typing, no true private, ABC module, MRO). Bonus: whether you know when inheritance is the wrong choice (composition over inheritance).
Red flag answer: Reciting textbook definitions without Python-specific examples. Not mentioning duck typing when discussing polymorphism. Not knowing about abc.ABC. Blindly advocating deep inheritance hierarchies.
Follow-up:
  1. “Explain Python’s MRO (Method Resolution Order) and the C3 linearization algorithm.”
  2. “When would you choose composition over inheritance? Give a concrete example.”
  3. “How does duck typing relate to the EAFP (Easier to Ask Forgiveness than Permission) principle in Python?”
Answer: __init__ is the initializer (not the constructor — __new__ is the actual constructor that creates the object). __init__ sets up the object’s initial state after __new__ has created it.
class Person:
    species = 'Homo sapiens'  # Class attribute (shared by all instances)

    def __init__(self, name, age):
        self.name = name  # Instance attribute (unique per instance)
        self.age = age
        self._id = id(self)  # Convention: "private" attribute

    def greet(self):
        return f"Hi, I'm {self.name}, age {self.age}"

    def __repr__(self):  # For debugging
        return f"Person(name='{self.name}', age={self.age})"

    def __str__(self):  # For user-facing output
        return self.greet()

p1 = Person("Adam", 30)
p2 = Person("Eve", 28)
Key distinctions tested in interviews:
  • __init__ vs __new__: __new__ creates the instance (allocates memory), __init__ initializes it. You almost never override __new__ unless implementing singletons, immutable types, or metaclass patterns.
  • Class vs instance attributes: species is shared, so reassigning Person.species changes what every instance sees (unless an instance has shadowed it with its own attribute). name is per-instance. If you accidentally define a mutable class attribute (like a list), all instances share it, which is a classic bug.
  • self is explicit: Unlike Java/C++ where this is implicit, Python forces you to declare self as the first parameter. This is a deliberate design choice for readability.
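The mutable class attribute bug mentioned above is easy to reproduce. A minimal sketch, using hypothetical `SharedTeam` and `SafeTeam` classes:

```python
class SharedTeam:
    members = []                    # class attribute: one list for every instance

    def add(self, name):
        self.members.append(name)   # mutates the shared list

class SafeTeam:
    def __init__(self):
        self.members = []           # instance attribute: one list per instance

    def add(self, name):
        self.members.append(name)

a, b = SharedTeam(), SharedTeam()
a.add("ann")
leaked = b.members                  # b never added anyone, yet sees 'ann'

c, d = SafeTeam(), SafeTeam()
c.add("ann")
isolated = d.members

print(leaked, isolated)  # ['ann'] []
```

The fix is simply to create the mutable object inside `__init__`, so each instance gets its own.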
Modern alternative — dataclass (Python 3.7+):
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    species: str = 'Homo sapiens'  # Default value

    def greet(self):
        return f"Hi, I'm {self.name}"

# Auto-generates __init__, __repr__, __eq__, and more
p = Person("Adam", 30)
print(p)  # Person(name='Adam', age=30, species='Homo sapiens')
What interviewers are really testing: Whether you know __init__ vs __new__, class vs instance attributes, and whether you’d reach for dataclass in modern code.
Red flag answer: Calling __init__ a “constructor.” Not knowing about class attributes vs instance attributes. Not knowing dataclass exists.
Follow-up:
  1. “When would you override __new__ instead of __init__?”
  2. “What happens if you define a mutable default like items=[] as a class attribute?”
  3. “Compare dataclass vs NamedTuple vs attrs — when would you use each?”
Answer: This is about what data the method has access to, which determines its role in the class:
| Method Type | Decorator | First Param | Can Access |
| --- | --- | --- | --- |
| Instance | (none) | self | Instance + class state |
| Class | @classmethod | cls | Class state only (no instance) |
| Static | @staticmethod | (none) | Nothing (pure utility) |
class Pizza:
    base_price = 10  # Class attribute

    def __init__(self, toppings):
        self.toppings = toppings  # Instance attribute

    # Instance method: needs self to access toppings
    def price(self):
        return self.base_price + len(self.toppings) * 2

    # Class method: alternative constructor (factory pattern)
    @classmethod
    def margherita(cls):
        return cls(['mozzarella', 'tomato', 'basil'])

    # Static method: utility, no access to instance or class
    @staticmethod
    def is_valid_topping(topping):
        return topping.lower() in ['mozzarella', 'pepperoni', 'mushroom', 'basil', 'tomato']

# Usage
p = Pizza.margherita()      # Class method as factory
print(p.price())            # Instance method
Pizza.is_valid_topping('x') # Static method (no instance needed)
Why @classmethod matters for inheritance:
class FancyPizza(Pizza):
    base_price = 15

# Because margherita uses cls (not Pizza), it creates a FancyPizza!
fp = FancyPizza.margherita()  # Returns FancyPizza, not Pizza
print(type(fp))  # <class '__main__.FancyPizza'> when run as a script
If margherita had been a static method calling Pizza(...) directly, it would always return a Pizza, breaking the inheritance chain.
When to use each:
  • Instance method: 95% of the time. When you need access to instance data.
  • Class method: Alternative constructors (from_json, from_csv, create_default), factory patterns, and methods that need to work correctly with inheritance.
  • Static method: Pure utility functions that logically belong to the class but don’t need any class/instance state. Controversial — some argue these should just be module-level functions.
What interviewers are really testing: Whether you understand the factory pattern use case for @classmethod, and whether you know that cls enables correct inheritance behavior (vs. hardcoding the class name).
Red flag answer: Saying static methods are “the same as regular functions.” Not knowing the factory pattern use of @classmethod. Not understanding how cls interacts with inheritance.
Follow-up:
  1. “Why might you choose a @classmethod factory over __init__ with different parameter sets?”
  2. “Should @staticmethod even exist, or should those always be module-level functions?”
  3. “How do these method types interact with Python’s descriptor protocol?”
Answer: Inheritance allows code reuse and specialization. super() is the mechanism for calling methods from parent classes, and understanding it properly requires knowing the MRO (Method Resolution Order).
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        return "..."

class Dog(Animal):
    def __init__(self, name, breed):
        super().__init__(name)  # Calls Animal.__init__
        self.breed = breed

    def speak(self):
        return "Woof!"

class ServiceDog(Dog):
    def __init__(self, name, breed, task):
        super().__init__(name, breed)  # Calls Dog.__init__
        self.task = task
Multiple Inheritance and the Diamond Problem:
class A:
    def method(self): return 'A'

class B(A):
    def method(self): return 'B'

class C(A):
    def method(self): return 'C'

class D(B, C):
    pass

d = D()
print(d.method())  # 'B' -- follows MRO
print(D.__mro__)   # (D, B, C, A, object) -- C3 linearization
Python uses the C3 linearization algorithm to compute MRO. The key rules: children come before parents, and the left-to-right order of bases is preserved. super() follows the MRO, not just “the parent class.”
Why super() follows MRO, not just the direct parent: In cooperative multiple inheritance, super() calls the next class in the MRO, not necessarily the direct parent. This is critical for the diamond pattern to work correctly:
class Base:
    def __init__(self):
        print('Base')

class Left(Base):
    def __init__(self):
        print('Left')
        super().__init__()  # For a Child instance this calls Right.__init__, NOT Base.__init__!

class Right(Base):
    def __init__(self):
        print('Right')
        super().__init__()

class Child(Left, Right):
    def __init__(self):
        super().__init__()

Child()  # Prints: Left, Right, Base (each called exactly once!)
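The chain only works if every class cooperates. As a sketch of the failure mode, the variant below records calls in a hypothetical `calls` list and has `Left` skip its `super().__init__()` call, which silently cuts off `Right` and `Base`:

```python
calls = []

class Base:
    def __init__(self):
        calls.append('Base')

class Left(Base):
    def __init__(self):
        calls.append('Left')
        # No super().__init__() here, so the MRO chain stops at Left

class Right(Base):
    def __init__(self):
        calls.append('Right')
        super().__init__()

class Child(Left, Right):
    def __init__(self):
        super().__init__()

Child()
mro_names = [cls.__name__ for cls in Child.__mro__]

print(calls)      # ['Left']
print(mro_names)  # ['Child', 'Left', 'Right', 'Base', 'object']
```

No error is raised; `Right` and `Base` are simply never initialized, which is why this bug tends to surface far from its cause.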
Modern best practice — composition over inheritance:
# Instead of deep inheritance:
class Engine:
    def start(self): ...

class Car:
    def __init__(self):
        self.engine = Engine()  # Composition: Car HAS-A Engine

    def start(self):
        self.engine.start()
What interviewers are really testing: Whether you understand super() follows MRO (not just parent), whether you can explain the diamond problem, and whether you know when to use composition instead.
Red flag answer: Thinking super() always calls the direct parent. Not knowing about MRO or C3 linearization. Creating 5+ level deep inheritance hierarchies without considering composition.
Follow-up:
  1. “Explain how C3 linearization works and why Python chose it over depth-first search.”
  2. “What happens if you call super() in a class that uses multiple inheritance and one parent doesn’t call super()?”
  3. “You’re designing a plugin system. Would you use inheritance or composition? Why?”
Answer: Dunder (double underscore) methods are Python’s protocol system: they let your custom classes integrate seamlessly with Python’s built-in operations. When you write len(obj), Python calls obj.__len__(). When you write a + b, Python calls a.__add__(b).
Core protocols:
| Category | Methods | Triggered By |
| --- | --- | --- |
| Creation | __new__, __init__, __del__ | Object lifecycle |
| String | __str__, __repr__, __format__ | str(), repr(), f"{}" |
| Comparison | __eq__, __lt__, __le__, __gt__, __ge__ | ==, <, <=, >, >= |
| Arithmetic | __add__, __sub__, __mul__, __truediv__ | +, -, *, / |
| Container | __len__, __getitem__, __setitem__, __contains__ | len(), [], in |
| Context | __enter__, __exit__ | with statement |
| Callable | __call__ | obj() |
| Iteration | __iter__, __next__ | for loops |
| Hashing | __hash__ | hash(), dict keys, set membership |
__repr__ vs __str__ — the rule you must know:
  • __repr__ is for developers (debugging). Should be unambiguous. Ideally, eval(repr(obj)) recreates the object.
  • __str__ is for end users. Human-readable.
  • If only one is defined, define __repr__. Python falls back to __repr__ when __str__ is missing, but NOT the reverse.
class Money:
    def __init__(self, amount, currency='USD'):
        self.amount = amount
        self.currency = currency

    def __repr__(self):
        return f"Money({self.amount}, '{self.currency}')"

    def __str__(self):
        return f"${self.amount:.2f} {self.currency}"

    def __add__(self, other):
        if self.currency != other.currency:
            raise ValueError("Cannot add different currencies")
        return Money(self.amount + other.amount, self.currency)

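    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented  # Let Python try the other operand instead of raising AttributeError
        return self.amount == other.amount and self.currency == other.currency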

    def __hash__(self):
        return hash((self.amount, self.currency))

m = Money(10.50)
print(repr(m))  # Money(10.5, 'USD')
print(str(m))   # $10.50 USD
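Money defines `__hash__` alongside `__eq__` for a reason. The sketch below uses hypothetical `Point` and `HashablePoint` classes to show what happens when `__hash__` is left out, and how restoring it makes equal objects collapse in a set:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)
    # No __hash__ defined here, so Python sets Point.__hash__ to None

class HashablePoint(Point):
    def __hash__(self):
        return hash((self.x, self.y))   # equal objects produce equal hashes

try:
    {Point(1, 2)}
    usable_in_set = True
except TypeError:
    usable_in_set = False

unique = {HashablePoint(1, 2), HashablePoint(1, 2)}

print(Point.__hash__, usable_in_set, len(unique))  # None False 1
```

Hashing the same tuple of fields that `__eq__` compares is a simple way to keep the two methods consistent.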
The __eq__/__hash__ contract: If you define __eq__, you MUST also define __hash__ if you want the object to be usable as a dict key or set element. If you define __eq__ without __hash__, Python sets __hash__ to None, making the object unhashable. Objects that are equal MUST have the same hash.
What interviewers are really testing: Whether you know the __repr__ vs __str__ distinction, the __eq__/__hash__ contract, and whether you can use dunders to make pythonic, operator-friendly classes.
Red flag answer: Only knowing __init__ and __str__. Not knowing the __eq__/__hash__ contract. Defining __eq__ without __hash__ and then wondering why your objects can’t be dict keys.
Follow-up:
  1. “What happens if two objects are __eq__ but have different __hash__ values?”
  2. “How would you make a custom class work with the with statement?”
  3. “Explain the difference between __add__ and __radd__. When is __radd__ called?”
Answer: @property lets you define methods that are accessed like attributes, providing a clean API while encapsulating validation, computation, or lazy loading behind the scenes.
class User:
    def __init__(self, first_name, last_name, birth_year):
        self._first_name = first_name
        self._last_name = last_name
        self._birth_year = birth_year

    @property
    def full_name(self):
        """Computed property -- no stored state."""
        return f"{self._first_name} {self._last_name}"

    @property
    def age(self):
        """Computed from birth_year -- always current."""
        from datetime import date
        return date.today().year - self._birth_year

    @property
    def first_name(self):
        return self._first_name

    @first_name.setter
    def first_name(self, value):
        if not value or not value.strip():
            raise ValueError("Name cannot be empty")
        self._first_name = value.strip()

    @first_name.deleter
    def first_name(self):
        raise AttributeError("Cannot delete name")

u = User("John", "Doe", 1990)
print(u.full_name)    # "John Doe" -- accessed like attribute, computed like method
print(u.age)          # 36 (in 2026)
u.first_name = "Jane" # Calls setter with validation
Why @property matters in production:
  1. API stability: You start with a simple attribute, then later add validation/computation without changing the caller’s code. user.name stays the same whether it’s a raw attribute or a property.
  2. Lazy loading: Expensive computations can be deferred until first access.
  3. Caching with functools.cached_property (Python 3.8+):
from functools import cached_property

class DataSet:
    @cached_property
    def processed_data(self):
        """Expensive computation, run only once."""
        return heavy_processing(self.raw_data)
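A runnable variant makes the "run only once" behavior observable. This is a sketch with a hypothetical `Report` class whose `compute_calls` counter exists purely to show how often the body executes:

```python
from functools import cached_property

class Report:
    def __init__(self, rows):
        self.rows = rows
        self.compute_calls = 0      # illustration only: counts how often total is computed

    @cached_property
    def total(self):
        self.compute_calls += 1
        return sum(self.rows)

report = Report([1, 2, 3])
first = report.total
second = report.total               # served from the instance __dict__, body not re-run
calls_after_two_reads = report.compute_calls

del report.total                    # drop the cached value to force a recompute
third = report.total
calls_after_invalidation = report.compute_calls

print(first, calls_after_two_reads, calls_after_invalidation)  # 6 1 2
```

Because the cached value is never refreshed automatically, `cached_property` suits values that are effectively immutable for the lifetime of the instance.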
How @property works under the hood: @property is a descriptor. It implements __get__, __set__, and __delete__. When you access obj.x and x is a descriptor on the class, Python calls x.__get__(obj, type(obj)) instead of returning the descriptor itself. This is the same mechanism behind @classmethod, @staticmethod, and bound methods.
What interviewers are really testing: Whether you use properties to build clean APIs, whether you know about cached_property, and whether you understand the descriptor protocol that powers properties.
Red flag answer: Not knowing about setters/deleters. Using Java-style get_name() / set_name() methods instead of properties. Not knowing about cached_property for expensive computations.
Follow-up:
  1. “What is the descriptor protocol and how does @property use it?”
  2. “What’s the difference between @property and @cached_property? When is each appropriate?”
  3. “How would you implement a property that validates type, not just value (e.g., ensuring an attribute is always an integer)?”
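The descriptor protocol described above can be sketched with a small reusable validator. `Typed` and `Order` are hypothetical names; this is one possible approach to type-checked attributes, not the only one.

```python
class Typed:
    """Hypothetical data descriptor that enforces a type on assignment."""

    def __init__(self, expected_type):
        self.expected_type = expected_type

    def __set_name__(self, owner, name):
        self.private_name = '_' + name          # where the value is stored on the instance

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self                         # accessed on the class, not an instance
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(f"expected {self.expected_type.__name__}, got {type(value).__name__}")
        setattr(obj, self.private_name, value)

class Order:
    quantity = Typed(int)

    def __init__(self, quantity):
        self.quantity = quantity                # goes through Typed.__set__

order = Order(3)

try:
    Order("3")
    rejected = False
except TypeError:
    rejected = True

print(order.quantity, rejected)  # 3 True
```

Unlike a property, the same descriptor instance can be reused for any number of attributes across classes.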

4. Functions and Decorators

Answer: *args and **kwargs provide flexible function signatures that accept arbitrary numbers of arguments.
  • *args: Collects extra positional arguments into a tuple.
  • **kwargs: Collects extra keyword arguments into a dict.
def flexible(required, *args, **kwargs):
    print(f"Required: {required}")
    print(f"Extra positional (tuple): {args}")
    print(f"Extra keyword (dict): {kwargs}")

flexible('hello', 1, 2, 3, debug=True, mode='fast')
# Required: hello
# Extra positional (tuple): (1, 2, 3)
# Extra keyword (dict): {'debug': True, 'mode': 'fast'}
The unpacking operators (the other side):
def add(a, b, c):
    return a + b + c

args = [1, 2, 3]
add(*args)  # Unpacks list into positional args: add(1, 2, 3)

kwargs = {'a': 1, 'b': 2, 'c': 3}
add(**kwargs)  # Unpacks dict into keyword args: add(a=1, b=2, c=3)
Parameter order (must follow this or SyntaxError):
  1. Regular positional parameters
  2. *args
  3. Keyword-only parameters (after *args or after bare *)
  4. **kwargs
def example(pos, *args, keyword_only, **kwargs):
    pass  # keyword_only MUST be passed by name
Keyword-only arguments (the bare * trick):
def connect(host, port, *, timeout=30, retries=3):
    # timeout and retries MUST be keyword arguments
    pass

connect('localhost', 8080, timeout=60)   # OK
connect('localhost', 8080, 60)           # TypeError!
This is heavily used in library design to prevent positional argument confusion. Django, Flask, and SQLAlchemy use this pattern extensively.
Real-world use: decorator forwarding:
def logging_decorator(func):
    def wrapper(*args, **kwargs):  # Accept anything
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)  # Forward everything
    return wrapper
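Alongside the bare * marker for keyword-only arguments, Python 3.8+ also accepts a / marker that makes the parameters before it positional-only. A minimal sketch, using a hypothetical `clamp` function:

```python
def clamp(value, /, low=0, high=100, *, strict=False):
    """value is positional-only; low/high may be either; strict is keyword-only."""
    if strict and not low <= value <= high:
        raise ValueError(f"{value} outside [{low}, {high}]")
    return max(low, min(high, value))

capped = clamp(150)
floored = clamp(-5, low=0)

try:
    clamp(value=5)                  # positional-only parameter passed by name
    by_name_allowed = True
except TypeError:
    by_name_allowed = False

print(capped, floored, by_name_allowed)  # 100 0 False
```

Positional-only parameters let a library rename `value` later without breaking callers, since nobody can have passed it by name.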
What interviewers are really testing: Whether you know the unpacking operators (not just the collection side), the parameter order rules, and keyword-only arguments.
Red flag answer: Not knowing that *args is a tuple (not a list). Not knowing about keyword-only arguments. Never having used *args/**kwargs in decorator patterns.
Follow-up:
  1. “What is the difference between * and ** when used in function calls vs function definitions?”
  2. “How would you use keyword-only arguments to design a safer API?”
  3. “Can you have positional-only parameters in Python? How?”
Answer: Lambdas are anonymous, single-expression functions. They exist for short, throwaway functions where defining a full def would be overkill.
# Lambda syntax: lambda parameters: expression
square = lambda x: x * x
add = lambda x, y: x + y

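# Most common use: as key functions (sample data for illustration)
names = ['bob', 'Alice', 'carol']
students = [{'grade': 10, 'age': 16}, {'grade': 10, 'age': 15}]
sorted(names, key=lambda name: name.lower())
sorted(students, key=lambda s: (s['grade'], -s['age']))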

# With filter and map (though comprehensions are preferred)
evens = list(filter(lambda x: x % 2 == 0, range(20)))
squared = list(map(lambda x: x**2, range(10)))
Limitations of lambdas:
  • Single expression only — no statements, no assignments, no multi-line logic
  • Cannot contain try/except, if/else statements (only ternary expressions)
  • No docstrings, no type hints
  • Not picklable (can’t be serialized for multiprocessing)
  • Harder to debug: tracebacks show <lambda> instead of a meaningful name
When to use lambdas vs. named functions:
  • Use lambda: Short sort keys, quick filter/map callbacks, simple event handlers
  • Use def: Anything reused, anything needing documentation, anything with complex logic
  • Use operator module instead of lambda: operator.itemgetter('key') is faster and clearer than lambda x: x['key']
The lambda closure gotcha:
# BUG: All lambdas capture the same variable 'i'
funcs = [lambda: i for i in range(5)]
print([f() for f in funcs])  # [4, 4, 4, 4, 4] -- NOT [0, 1, 2, 3, 4]!

# FIX: Use default argument to capture current value
funcs = [lambda i=i: i for i in range(5)]
print([f() for f in funcs])  # [0, 1, 2, 3, 4]
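The default-argument fix is not the only option. Two alternatives that bind the value at creation time are sketched below; `identity` and `make_getter` are hypothetical helper names.

```python
from functools import partial

def identity(value):
    return value

# Option 1: functools.partial stores the argument when the partial is built
partial_funcs = [partial(identity, i) for i in range(5)]
partial_results = [f() for f in partial_funcs]

# Option 2: a factory function gives each lambda its own enclosing scope
def make_getter(n):
    return lambda: n

factory_funcs = [make_getter(i) for i in range(5)]
factory_results = [f() for f in factory_funcs]

print(partial_results, factory_results)
```

Both avoid the extra default parameter in the signature, which some callers could otherwise override by accident.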
This is a late binding closure issue. The lambda captures the variable i, not the value at the time of creation. By the time the lambda runs, the loop is done and i == 4.
What interviewers are really testing: Whether you know the closure gotcha, whether you know when lambdas are appropriate vs. overkill, and whether you’d prefer operator.itemgetter for sort keys.
Red flag answer: Using lambdas for complex logic. Not knowing the closure capture gotcha. Not knowing that lambdas can’t contain statements.
Follow-up:
  1. “Explain the late-binding closure bug with lambdas in loops. How do you fix it?”
  2. “Why would you prefer operator.itemgetter or operator.attrgetter over a lambda?”
  3. “Are lambdas more or less efficient than regular functions? Why?”
Answer: Decorators are callables (functions or classes) that take a function as input and return a replacement for it, typically a wrapper that adds behavior, without changing the original function’s source code. In the spirit of the classic Decorator pattern, they are Python’s primary mechanism for cross-cutting concerns such as logging, caching, and access control.
import functools
import time

def timing_decorator(func):
    @functools.wraps(func)  # CRITICAL: preserves __name__, __doc__, etc.
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f}s")
        return result
    return wrapper

@timing_decorator
def slow_function():
    time.sleep(1)
    return "done"
functools.wraps, the detail that separates juniors from seniors: Without @functools.wraps(func), the wrapper function replaces the original’s metadata. slow_function.__name__ would be 'wrapper' instead of 'slow_function'. This breaks debugging, logging, documentation generation, and any code that introspects function names. Always use @functools.wraps.
Decorators with arguments (the double-wrapper pattern):
def retry(max_attempts=3, backoff=1.0):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception:  # Broad for illustration; catch narrower exceptions in real code
                    if attempt == max_attempts - 1:
                        raise
                    time.sleep(backoff * (2 ** attempt))
        return wrapper
    return decorator

@retry(max_attempts=5, backoff=0.5)
def call_api():
    ...
Class-based decorators:
class CountCalls:
    def __init__(self, func):
        functools.update_wrapper(self, func)
        self.func = func
        self.count = 0

    def __call__(self, *args, **kwargs):
        self.count += 1
        return self.func(*args, **kwargs)

@CountCalls
def my_func():
    pass

my_func()
print(my_func.count)  # 1
Real-world decorator examples you should know:
  • @functools.lru_cache — memoization with LRU eviction
  • @functools.cached_property — lazy, cached attribute computation
  • @contextlib.contextmanager — turns a generator into a context manager
  • @dataclasses.dataclass — auto-generates boilerplate methods
  • @app.route (Flask) — URL routing
  • @login_required (Django) — authentication enforcement
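Of the decorators listed above, `functools.lru_cache` is the easiest to try out. A minimal sketch, with hypothetical `fib` and `total` functions, that also shows the hashable-arguments requirement:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

value = fib(30)
info = fib.cache_info()          # each n in 0..30 is computed once, so 31 misses

@lru_cache(maxsize=None)
def total(items):
    return sum(items)

try:
    total([1, 2, 3])             # lists are unhashable, so they cannot be cache keys
    list_accepted = True
except TypeError:
    list_accepted = False

tuple_total = total((1, 2, 3))   # tuples are hashable and work fine

print(value, info.misses, list_accepted, tuple_total)
```

The cache holds references to arguments and results for as long as the entries live, so it is a poor fit for functions whose results go stale or whose arguments are large.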
Stacking decorators:
@decorator_a
@decorator_b
def func(): ...
# Equivalent to: func = decorator_a(decorator_b(func))
# decorator_b is applied FIRST (bottom-up)
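The bottom-up application order is easiest to see when each decorator leaves a visible mark. A sketch using a hypothetical `tag` decorator factory:

```python
import functools

def tag(label):
    """Hypothetical decorator factory that wraps the result in a <label> element."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return f"<{label}>{func(*args, **kwargs)}</{label}>"
        return wrapper
    return decorator

@tag("b")      # applied second, so it is the outermost wrapper
@tag("i")      # applied first, so it sits closest to greet
def greet(name):
    return f"hi {name}"

html = greet("ada")
print(html)            # <b><i>hi ada</i></b>
print(greet.__name__)  # greet, preserved through both layers by functools.wraps
```

At call time the outermost wrapper runs first and finishes last, which is why the `<b>` element ends up on the outside.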
What interviewers are really testing: Whether you always use @functools.wraps. Whether you can write decorators with arguments. Whether you understand stacking order. Whether you’ve used decorators for real cross-cutting concerns (caching, retries, auth).
Red flag answer: Forgetting @functools.wraps. Not knowing how to write decorators with arguments. Only knowing @staticmethod/@classmethod as decorators.
Follow-up:
  1. “What happens when you stack multiple decorators? What’s the execution order?”
  2. “How would you write a decorator that works on both sync and async functions?”
  3. “Explain functools.lru_cache — how does it work, what are the gotchas, and when would you NOT use it?”
Answer: A closure is a function that captures and remembers variables from its enclosing (lexical) scope, even after that outer function has finished execution. The inner function “closes over” the variables.
def make_multiplier(factor):
    def multiply(x):
        return x * factor  # 'factor' is captured from the enclosing scope
    return multiply

double = make_multiplier(2)
triple = make_multiplier(3)

print(double(5))   # 10
print(triple(5))   # 15
print(double.__closure__[0].cell_contents)  # 2 (the captured value)
How closures work internally: Python stores closure variables in cell objects. The inner function’s __closure__ attribute is a tuple of cell objects, each containing one captured variable. The __code__.co_freevars tuple lists the names of the captured variables.
The nonlocal keyword (modifying closure variables):
def make_counter():
    count = 0
    def increment():
        nonlocal count  # Without this, 'count += 1' would create a LOCAL variable
        count += 1
        return count
    return increment

counter = make_counter()
print(counter())  # 1
print(counter())  # 2
print(counter())  # 3
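For contrast, here is a sketch of the same counter with nonlocal omitted, followed by a look at the cell that the working version captures. `make_broken_counter` is a hypothetical name used only for this comparison.

```python
def make_broken_counter():
    count = 0
    def increment():
        count += 1        # no nonlocal: 'count' is treated as a local of increment
        return count
    return increment

try:
    make_broken_counter()()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

def make_counter():
    count = 0
    def increment():
        nonlocal count
        count += 1
        return count
    return increment

counter = make_counter()
counter()
free_vars = counter.__code__.co_freevars            # names captured from the enclosing scope
cell_value = counter.__closure__[0].cell_contents   # current value stored in the cell

print(error_name, free_vars, cell_value)  # UnboundLocalError ('count',) 1
```

The error appears only when the function is called, not when it is defined, which is why it often slips past a quick read of the code.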
Without nonlocal, assigning to count inside increment creates a new local variable, and the += 1 operation would raise UnboundLocalError because it reads before assigning.
Real-world use cases:
  1. Factory functions (like make_multiplier above)
  2. Decorators (the wrapper function is a closure over the decorated function)
  3. Callback registration with pre-bound parameters
  4. Lightweight alternative to classes when you need just state + one or two functions
Closures vs. classes:
# Closure approach (lighter, simpler)
def make_account(balance):
    def deposit(amount):
        nonlocal balance
        balance += amount
        return balance
    return deposit

# Class approach (heavier, more features)
class Account:
    def __init__(self, balance):
        self.balance = balance
    def deposit(self, amount):
        self.balance += amount
        return self.balance
Use closures when you need simple state with 1-2 functions. Use classes when you need multiple methods, inheritance, or complex state.
What interviewers are really testing: Whether you understand how closures capture variables (cells, late binding), whether you know nonlocal, and whether you can explain the relationship between closures and decorators.
Red flag answer: Not knowing the late binding gotcha (closures capture variables, not values). Not knowing nonlocal. Confusing closures with regular nested functions that don’t capture anything.
Follow-up:
  1. “What is the difference between global and nonlocal?”
  2. “Explain the late binding behavior of closures and how it causes bugs in loops.”
  3. “When would you use a closure instead of a class, and vice versa?”
Answer: Generators are functions that produce a sequence of values lazily, one at a time, using yield. They don’t compute or store the entire sequence in memory — they produce each value on demand.
def fibonacci():
    a, b = 0, 1
    while True:  # Infinite sequence -- no problem with generators!
        yield a
        a, b = b, a + b

# Only computes values as requested
fib = fibonacci()
print(next(fib))  # 0
print(next(fib))  # 1
print(next(fib))  # 1
print(next(fib))  # 2
Why generators matter — memory efficiency:
# List: ALL 10 million items in memory at once (~80MB for the list's pointers alone on 64-bit CPython, plus the int objects)
data = [x**2 for x in range(10_000_000)]

# Generator: ONE item in memory at a time (~0 extra memory)
data = (x**2 for x in range(10_000_000))
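The difference can be measured on a smaller scale with `sys.getsizeof`. This is a rough sketch: `getsizeof` reports only the container itself, not the integers it refers to, and exact byte counts vary by Python version and platform.

```python
import sys

n = 100_000
as_list = [x ** 2 for x in range(n)]
as_gen = (x ** 2 for x in range(n))

list_bytes = sys.getsizeof(as_list)   # grows with n (container only)
gen_bytes = sys.getsizeof(as_gen)     # small and independent of n

total = sum(as_gen)                   # consumes the generator one value at a time
leftover = list(as_gen)               # the generator is now exhausted

print(list_bytes > gen_bytes, leftover)  # True []
```

The trade-off is that a generator can be consumed only once and does not support len() or indexing.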
If you’re processing a 50GB log file, you can’t load it all into memory. Generators let you process line by line:
def read_large_file(path):
    with open(path) as f:
        for line in f:  # File objects are already iterators!
            yield line.strip()

# Process 50GB file with constant memory usage
for line in read_large_file('huge.log'):
    if 'ERROR' in line:
        handle_error(line)
yield from — delegating to sub-generators (Python 3.3+):
def flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)  # Delegate to recursive call
        else:
            yield item

list(flatten([1, [2, [3, 4]], 5]))  # [1, 2, 3, 4, 5]
send() — two-way communication with generators (coroutines):
def accumulator():
    total = 0
    while True:
        value = yield total  # Receive value from send()
        total += value

acc = accumulator()
next(acc)        # Prime the generator (advance to first yield)
acc.send(10)     # Returns 10
acc.send(20)     # Returns 30
acc.send(5)      # Returns 35
Generator vs. Iterator: Every generator is an iterator, but not every iterator is a generator. Generators are a convenient way to create iterators without writing a class with __iter__ and __next__.
What interviewers are really testing: Whether you understand the memory implications, whether you’ve used generators for real data processing, and whether you know about yield from and send().
Red flag answer: Not knowing the memory difference between [x for x in ...] and (x for x in ...). Not knowing generators are single-use (can’t iterate twice). Not knowing yield from.
Follow-up:
  1. “What happens if you try to iterate over a generator a second time?”
  2. “How would you implement a generator-based pipeline for processing streaming data?”
  3. “Explain how yield from differs from a for loop with yield inside it — what extra functionality does it provide?“
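As one possible answer to the pipeline question, here is a small sketch that chains generators so each line flows through every stage before the next line is read. The stage names and sample log lines are invented for illustration.

```python
def strip_newlines(lines):
    for line in lines:
        yield line.rstrip("\n")

def only_errors(lines):
    for line in lines:
        if "ERROR" in line:
            yield line

def extract_message(lines):
    for line in lines:
        # Keep whatever follows the first "ERROR" marker
        yield line.split("ERROR", 1)[1].strip()

raw = ["INFO start\n", "ERROR disk full\n", "INFO ok\n", "ERROR timeout\n"]
pipeline = extract_message(only_errors(strip_newlines(raw)))

first_pass = list(pipeline)
second_pass = list(pipeline)  # The generator is already exhausted
print(first_pass)   # ['disk full', 'timeout']
print(second_pass)  # []
```

The second pass also shows the single-use behavior: to iterate again, build a new pipeline.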

5. File Handling and I/O

Answer: Always use the with statement (context manager) to ensure files are closed properly, even if exceptions occur. This is non-negotiable in production code.
# Reading: choose ONE method per pass (after read(), the cursor sits at end-of-file)
with open('file.txt', 'r', encoding='utf-8') as f:
    content = f.read()       # Read entire file into one string (careful with large files!)
    f.seek(0)                # Rewind; otherwise readlines() below would return []
    lines = f.readlines()    # Read all lines into a list (also loads everything into memory)

# Line-by-line reading (memory efficient for large files):
with open('large.log', 'r', encoding='utf-8') as f:
    for line in f:           # File object is an iterator -- constant memory
        process(line)

# Writing ('w' truncates, 'a' appends, 'x' exclusive create):
with open('output.txt', 'w', encoding='utf-8') as f:
    f.write("Hello\n")
    f.writelines(['line1\n', 'line2\n'])  # Does NOT add newlines automatically!
Critical details most candidates miss:
  1. Always specify encoding='utf-8': Without it, Python uses the system default encoding (often cp1252 on Windows, utf-8 on Linux/Mac). This causes cross-platform bugs. PEP 686 in Python 3.15 will make UTF-8 the default.
  2. File modes explained:
    • 'r' — read (default). File must exist.
    • 'w' — write. Truncates (erases) existing content!
    • 'a' — append. Adds to end.
    • 'x' — exclusive create. Fails if file exists (prevents accidental overwrites).
    • 'b' — binary mode ('rb', 'wb'). For images, PDFs, any non-text data.
    • '+' — read+write ('r+', 'w+').
  3. pathlib is the modern way:
from pathlib import Path

path = Path('data') / 'output.txt'  # OS-independent path joining
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text("Hello", encoding='utf-8')
content = path.read_text(encoding='utf-8')
  4. Buffering and flushing:
with open('log.txt', 'a', encoding='utf-8') as f:
    f.write("event happened\n")
    f.flush()  # Push Python's buffer to the OS now; add os.fsync(f.fileno()) if it must reach the disk
What interviewers are really testing: Whether you always use with, whether you specify encoding, whether you use pathlib in modern code, and whether you know the difference between file modes (especially 'w' truncating vs 'a' appending).
Red flag answer: Not using the with statement. Not specifying encoding. Using os.path.join instead of pathlib. Using f.read() on a multi-GB file.
Follow-up:
  1. “What happens if an exception occurs inside a with block? Is the file still closed?”
  2. “How would you safely write to a file without risking data corruption if the process crashes mid-write?”
  3. “How does Python’s file buffering work, and when would you need to change the buffer size?”
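One common approach to the crash-safety question is to write to a temporary file in the same directory and then rename it over the target. The helper below is a sketch under that assumption; atomic_write_text is a hypothetical name, not a stdlib function.

```python
import os
import tempfile
from pathlib import Path

def atomic_write_text(path, text, encoding="utf-8"):
    """Write text so readers see either the old file or the new one, never a partial write."""
    path = Path(path)
    # The temp file must live on the same filesystem for the rename to be atomic
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp:
            tmp.write(text)
            tmp.flush()             # Python buffer -> OS
            os.fsync(tmp.fileno())  # OS cache -> disk
        os.replace(tmp_name, path)  # Atomically swap the new file into place
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)     # Do not leave a stray temp file behind
        raise
```

If the process dies before os.replace, the original file is untouched and only a temporary file is left over.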
Answer: JSON is the lingua franca of web APIs and config files. Python’s json module handles serialization/deserialization, but there are important nuances for production use.
import json

# Python dict to JSON string
data = {"id": 1, "name": "Alice", "active": True, "score": None}
json_string = json.dumps(data, indent=2)  # Pretty-printed

# JSON string to Python dict
parsed = json.loads(json_string)

# File I/O (use dump/load, not dumps/loads)
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, indent=2, ensure_ascii=False)  # ensure_ascii=False for Unicode

with open('data.json', 'r', encoding='utf-8') as f:
    loaded = json.load(f)
Type mapping gotchas:
| Python | JSON | Gotcha |
| --- | --- | --- |
| dict | object | JSON keys are always strings. {1: 'a'} becomes {"1": "a"} |
| list, tuple | array | Tuples become arrays, so a round trip loses the tuple type |
| True/False | true/false | Case difference matters |
| None | null | |
| int/float | number | JSON has no int/float distinction |
| set | (none) | Not serializable by default! |
| datetime | (none) | Not serializable by default! |
Handling non-serializable types (custom encoder):
from datetime import datetime
from decimal import Decimal

class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return str(obj)  # Preserve precision
        if isinstance(obj, set):
            return list(obj)
        return super().default(obj)

json.dumps({"ts": datetime.now(), "price": Decimal("19.99")}, cls=CustomEncoder)
Performance: json vs orjson vs ujson: For high-throughput APIs (processing thousands of requests/sec), the stdlib json module can become a bottleneck. orjson (Rust-based) is typically several times faster and serializes datetime and dataclass instances natively (NumPy arrays via an opt-in option). The actual saving depends on payload size and shape, so benchmark with your own data before switching.
What interviewers are really testing: Whether you know the type mapping gotchas (sets, datetime, int keys), whether you can write custom encoders, and whether you know about performance alternatives for high-throughput scenarios.
Red flag answer: Not knowing the dumps/dump vs loads/load distinction. Not knowing that sets and datetimes aren’t serializable. Never having heard of orjson.
Follow-up:
  1. “How would you handle serializing a Python object with circular references to JSON?”
  2. “What is the security risk of json.loads on untrusted input? Compare to pickle.loads.”
  3. “How would you validate the structure of incoming JSON in a REST API? (JSON Schema, Pydantic, etc.)“
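The round-trip gotchas from the type mapping table can be checked directly. This short sketch also shows the stdlib parse_float hook as one way to avoid float rounding on the decoding side; the sample data is illustrative.

```python
import json
from decimal import Decimal

original = {1: "a", "point": (1, 2)}
round_tripped = json.loads(json.dumps(original))
print(round_tripped)  # {'1': 'a', 'point': [1, 2]}: int key became str, tuple became list

# parse_float receives the raw number text, so Decimal keeps it exact
order = json.loads('{"price": 19.99}', parse_float=Decimal)
print(order["price"])  # 19.99 as a Decimal, not a float
```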

6. Exception Handling

Answer: Exception handling is Python’s primary error recovery mechanism. The philosophy is EAFP (Easier to Ask Forgiveness than Permission) rather than LBYL (Look Before You Leap).
# Basic try-except
try:
    result = 10 / 0
except ZeroDivisionError:
    result = 0
    print("Cannot divide by zero")

# Catching multiple exceptions
try:
    data = json.loads(user_input)
except (json.JSONDecodeError, TypeError) as e:
    print(f"Invalid input: {e}")

# Catching the exception object for logging
try:
    connect_to_db()
except ConnectionError as e:
    logger.error(f"DB connection failed: {e}", exc_info=True)
    raise  # Re-raise after logging! Don't silently swallow.
Critical production rules:
  1. NEVER use bare except: — it catches EVERYTHING, including KeyboardInterrupt and SystemExit, making your program unkillable:
# TERRIBLE: catches Ctrl+C, SystemExit, MemoryError
try:
    do_something()
except:  # Never do this!
    pass

# BAD BUT LESS BAD: catches too broadly
try:
    do_something()
except Exception:
    pass

# GOOD: catch specific exceptions
try:
    do_something()
except (ValueError, TypeError) as e:
    handle_specific_error(e)
  2. Don’t silently swallow exceptions: except: pass is called “error swallowing” and it hides bugs for weeks. At minimum, log the exception.
  3. Use raise to re-raise: if you can’t fully handle the error, log it and re-raise:
try:
    do_something()
except SomeError as e:
    logger.error(f"Failed: {e}")
    raise  # Preserves the original traceback
  4. Exception chaining (Python 3):
try:
    data = parse_config(raw)
except ValueError as e:
    raise ConfigError("Bad config format") from e  # Chains the original cause
What interviewers are really testing: Whether you write bare except: (instant red flag), whether you silently swallow exceptions, and whether you understand EAFP vs LBYL.
Red flag answer: Using bare except:. Using except: pass without logging. Not knowing about exception chaining with from e. Wrapping entire functions in try-except instead of targeting specific operations.
Follow-up:
  1. “What is the difference between raise and raise e inside an except block?”
  2. “Explain EAFP vs LBYL with a concrete example. Which does Python prefer?”
  3. “How would you create a custom exception hierarchy for a library?”
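A custom exception hierarchy combined with chaining might look like the sketch below. LibraryError, ConfigError, and parse_port are hypothetical names chosen for the example.

```python
class LibraryError(Exception):
    """Base class: callers can catch everything this library raises."""

class ConfigError(LibraryError):
    """Raised when configuration values are invalid."""

def parse_port(raw):
    try:
        return int(raw)
    except ValueError as e:
        # `from e` stores the original error on __cause__ for the traceback
        raise ConfigError(f"Invalid port: {raw!r}") from e

try:
    parse_port("80a")
except LibraryError as err:  # Catching the base class also catches ConfigError
    caught = err

print(type(caught).__name__)            # ConfigError
print(type(caught.__cause__).__name__)  # ValueError
```

A single base class lets users of the library write one except clause without resorting to except Exception.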
Answer: The full try statement has four blocks, and knowing when each runs is critical:
try:
    result = dangerous_operation()   # May raise
except SpecificError as e:
    handle_error(e)                  # Runs ONLY if exception occurred
else:
    use_result(result)               # Runs ONLY if NO exception occurred
finally:
    cleanup()                        # ALWAYS runs, no matter what
Why else matters (and why most devs get it wrong): The else block runs only when the try block succeeds without exceptions. The key benefit: code in else is NOT protected by the except — so if use_result() raises, it’s NOT caught here. This keeps error handling precise.
# WITHOUT else -- imprecise error handling
try:
    data = fetch_from_api()       # We want to catch errors here
    processed = process(data)     # But NOT catch errors here!
except ConnectionError:
    handle_network_error()        # Also runs if process() raises ConnectionError!

# WITH else -- precise error handling
try:
    data = fetch_from_api()       # Only this is protected
except ConnectionError:
    handle_network_error()
else:
    processed = process(data)     # If this raises, it propagates normally
finally guarantees (and the return value gotcha): finally runs even if:
  • An exception was raised and not caught
  • A return statement was executed in try or except
  • A break or continue was executed
# GOTCHA: finally can override return values!
def tricky():
    try:
        return 1
    finally:
        return 2  # This wins! Returns 2, not 1.

print(tricky())  # 2 -- the finally return overrides the try return
Never put return in a finally block: it silently swallows exceptions and overrides previous returns.
Real-world pattern (database transactions):
connection = get_db_connection()
try:
    connection.begin()
    execute_queries(connection)
except DatabaseError:
    connection.rollback()  # Undo on failure
    raise
else:
    connection.commit()    # Commit only on success
finally:
    connection.close()     # Always close the connection
What interviewers are really testing: Whether you know the else block exists and why it’s important (most juniors don’t), and whether you understand the finally + return gotcha.
Red flag answer: Not knowing else exists. Putting all post-success logic inside the try block. Putting return in a finally block. Not understanding that finally always runs.
Follow-up:
  1. “What happens if both the except block and the finally block raise exceptions?”
  2. “How does finally interact with generators and yield?”
  3. “When would you use finally vs. a context manager (with statement)?“
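To make the execution order of the four blocks concrete, this small sketch records which blocks run on the success path and on the failure path. The trace helper is invented for the demonstration.

```python
def trace(operation):
    """Run operation() and return the names of the blocks that executed, in order."""
    events = []
    try:
        events.append("try")
        operation()
    except ZeroDivisionError:
        events.append("except")
    else:
        events.append("else")
    finally:
        events.append("finally")
    return events

print(trace(lambda: 1 + 1))  # ['try', 'else', 'finally']
print(trace(lambda: 1 / 0))  # ['try', 'except', 'finally']
```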

8. Advanced Concepts

Answer: This distinction is fundamental to Python’s for loop machinery and understanding it unlocks generators, custom iteration, and lazy evaluation.
  • Iterable: Any object that can return an iterator. It implements __iter__() which returns a fresh iterator. Examples: list, tuple, str, dict, set, range, file objects.
  • Iterator: An object that tracks position during iteration. It implements both __iter__() (returns self) and __next__() (returns next value or raises StopIteration).
The for loop protocol:
# When you write:
for item in [1, 2, 3]:
    print(item)

# Python actually does:
iterator = iter([1, 2, 3])  # Calls list.__iter__() -> returns list_iterator
while True:
    try:
        item = next(iterator)  # Calls iterator.__next__()
        print(item)
    except StopIteration:
        break
Key distinction:
  • Iterables can be iterated multiple times (each call to __iter__ returns a fresh iterator)
  • Iterators are single-use — once exhausted, calling next() keeps raising StopIteration
my_list = [1, 2, 3]  # Iterable
for x in my_list: pass  # Works
for x in my_list: pass  # Works again! Fresh iterator each time.

my_iter = iter(my_list)  # Iterator
list(my_iter)  # [1, 2, 3]
list(my_iter)  # [] -- exhausted!
Building a custom iterator:
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self  # Iterator returns itself

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

for num in Countdown(5):
    print(num)  # 5, 4, 3, 2, 1
In practice, use generators instead of manual iterators — they’re cleaner and less error-prone:
def countdown(start):
    while start > 0:
        yield start
        start -= 1
What interviewers are really testing: Whether you understand the for-loop protocol, the single-use nature of iterators, and the difference between iter() and next().
Red flag answer: Confusing iterables and iterators. Not knowing iterators are single-use. Not understanding the StopIteration protocol.
Follow-up:
  1. “How does itertools.chain work internally? Is it lazy?”
  2. “What is the difference between __iter__ returning self vs returning a new iterator object?”
  3. “Name three itertools functions you use regularly and explain their use cases.”
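The difference between __iter__ returning self and returning a new iterator can be sketched with a reusable variant of the countdown. CountdownIterable is a hypothetical name; here __iter__ is a generator method, so every call produces a fresh iterator.

```python
class CountdownIterable:
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Generator method: each call creates a new, independent iterator
        n = self.start
        while n > 0:
            yield n
            n -= 1

countdown = CountdownIterable(3)
first = list(countdown)
second = list(countdown)  # Works again because iteration state is not stored on the object
print(first, second)      # [3, 2, 1] [3, 2, 1]
```

A class whose __iter__ returns self, by contrast, is exhausted after one pass.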
Answer: This is one of the most misunderstood topics in Python, and getting it right in production is the difference between a 10x speedup and a frustrating debugging session.
| Aspect | Multithreading | Multiprocessing |
| --- | --- | --- |
| Memory | Shared memory space | Separate memory per process |
| GIL Impact | Blocked for CPU-bound (only one thread runs Python bytecode at a time) | Bypasses GIL entirely (each process has its own interpreter) |
| Best for | I/O-bound tasks (network calls, disk reads, DB queries) | CPU-bound tasks (math, image processing, ML training) |
| Overhead | Low (threads are lightweight) | High (process creation, IPC serialization) |
| Communication | Shared objects, queue.Queue, locks | multiprocessing.Queue, Pipe, shared memory |
| Debugging | Race conditions, deadlocks | Serialization errors, zombie processes |
When to use which — the decision tree:
  1. I/O-bound (waiting on network/disk): Use threading or, better yet, asyncio
  2. CPU-bound (number crunching): Use multiprocessing or concurrent.futures.ProcessPoolExecutor
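  3. Mixed: Use asyncio, and offload the CPU-bound parts with loop.run_in_executor() backed by a ProcessPoolExecutor (the default executor is a thread pool, which does not escape the GIL)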
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# I/O-bound: Thread pool (e.g., downloading 100 URLs)
with ThreadPoolExecutor(max_workers=20) as pool:
    results = pool.map(download_url, urls)

# CPU-bound: Process pool (e.g., resizing 1000 images)
with ProcessPoolExecutor(max_workers=8) as pool:
    results = pool.map(resize_image, images)
Why concurrent.futures over raw threading/multiprocessing:
  • Uniform API for both thread and process pools
  • Built-in Future objects for tracking completion
  • Easier exception handling
  • as_completed() for processing results as they finish
The asyncio alternative (modern Python): For I/O-bound work with thousands of concurrent tasks, asyncio is often better than threads because it avoids the overhead of OS thread context switching:
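import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()  # Read the body before the connection is released

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        return await asyncio.gather(*tasks)  # Results come back in the same order as urls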
What interviewers are really testing: Whether you know the GIL’s impact on threading, whether you choose the right concurrency model for the workload, and whether you’ve used concurrent.futures in production.
Red flag answer: Using threads for CPU-bound work. Not knowing what the GIL is. Using raw threading.Thread instead of concurrent.futures. Not knowing asyncio exists.
Follow-up:
  1. “You have a web scraper that needs to fetch 10,000 URLs. How do you design it?”
  2. “What are the common pitfalls of shared state in multithreaded Python?”
  3. “How does asyncio differ from threading, and when would you choose one over the other?”
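The as_completed() pattern mentioned above can be sketched like this. fake_download is a stand-in that sleeps instead of doing real network I/O, and the URLs are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_download(url):
    time.sleep(0.05)  # Simulated network wait; the GIL is released while sleeping
    return url, len(url)

urls = [f"https://example.com/page/{i}" for i in range(8)]
results = {}

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fake_download, url) for url in urls]
    for future in as_completed(futures):  # Yields each future as soon as it finishes
        url, size = future.result()       # Re-raises here if the worker raised
        results[url] = size

print(len(results))  # 8
```

Unlike pool.map, results arrive in completion order, which helps when some tasks are much slower than others.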
Answer: The GIL (Global Interpreter Lock) is a mutex (mutual exclusion lock) in CPython that allows only one thread to execute Python bytecode at a time. It exists because CPython’s memory management (reference counting) is not thread-safe.
What the GIL actually does:
  • Prevents multiple threads from executing Python bytecode simultaneously
  • Protects CPython’s internal data structures (reference counts, object allocations) from race conditions
  • Is released during blocking I/O (file reads, network calls, time.sleep) and by C extensions that explicitly release it, including many NumPy operations
Why it exists (the trade-off):
  • Pro: Makes single-threaded code faster (no locking overhead on every object operation). Makes C extension writing simpler.
  • Con: CPU-bound multithreaded Python code gets zero speedup from multiple cores. A 4-thread CPU-bound program runs at ~1x speed, not ~4x.
What the GIL does NOT prevent: The GIL does NOT prevent race conditions in your application code:
# This is still a race condition despite the GIL:
counter = 0

def increment():
    global counter
    for _ in range(1_000_000):
        counter += 1  # NOT atomic! Read-modify-write can be interrupted between bytecodes

# Two threads doing this can produce counter < 2_000_000
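The GIL only guarantees that one thread executes bytecode at a time. counter += 1 compiles to multiple bytecodes (LOAD_GLOBAL, LOAD_CONST, an in-place add, STORE_GLOBAL; exact opcode names vary by CPython version, e.g. BINARY_ADD no longer exists on 3.11+), and a thread switch between the load and the store loses an update. Exactly where CPython allows a switch also varies by version, so the race may be rare on some releases but is never ruled out.
Workarounds for CPU-bound parallelism: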
  1. multiprocessing — separate processes, each with its own GIL
  2. C extensions (NumPy, Pandas) — release the GIL during computation
  3. Cython with nogil — write C-speed Python that releases the GIL
  4. concurrent.futures.ProcessPoolExecutor — simplest process-based parallelism
Python 3.13+ (PEP 703), experimental no-GIL mode: Python 3.13 introduced an experimental free-threaded build (configured with --disable-gil). This removes the GIL entirely, enabling true multithreaded parallelism for CPU-bound tasks. It is a separate, opt-in build, and not all C extensions support it yet.
What interviewers are really testing: Whether you understand why the GIL exists (not just that it exists), that it doesn’t prevent application-level race conditions, and what the concrete workarounds are.
Red flag answer: Thinking the GIL prevents all race conditions. Not knowing that the GIL is released during I/O. Saying “Python can’t do parallel programming” (it can, via multiprocessing). Not knowing about the PEP 703 no-GIL work.
Follow-up:
  1. “Does the GIL make Python thread-safe? If not, what race conditions can still occur?”
  2. “How do NumPy and Pandas achieve parallel performance despite the GIL?”
  3. “What is PEP 703 and what’s the status of removing the GIL from CPython?“
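Since the GIL does not make read-modify-write operations atomic, shared counters still need an explicit lock. A minimal sketch follows; SafeCounter and worker are illustrative names.

```python
import threading

class SafeCounter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:     # Only one thread at a time may run the read-modify-write
            self.value += 1

def worker(counter, n):
    for _ in range(n):
        counter.increment()

counter = SafeCounter()
threads = [threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 40000, deterministic because every increment holds the lock
```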

10. Data Science & Numerical Python

Answer: NumPy (Numerical Python) is the foundation of Python’s scientific computing ecosystem. Pandas, SciPy, and scikit-learn are built on NumPy arrays, and TensorFlow and PyTorch tensors interoperate with them.
Why NumPy is fast (the real answer):
  1. Contiguous memory: ndarray stores elements in a contiguous C-array, unlike Python lists which store pointers to scattered objects. This means CPU cache lines are used efficiently.
  2. Vectorized operations: Operations happen in compiled C/Fortran, not interpreted Python. arr * 2 on an ndarray is a single call into a compiled loop, not a Python loop.
  3. No type checking per element: All elements share one dtype, so no per-element type dispatch.
  4. BLAS/LAPACK integration: Linear algebra ops use highly optimized libraries that leverage CPU SIMD instructions.
import time

import numpy as np

# Python list vs NumPy: performance difference (absolute timings vary by machine)

# Python list: the comprehension runs 10M iterations in the interpreter
python_list = list(range(10_000_000))
start = time.perf_counter()
result = [x * 2 for x in python_list]
print(f"Python: {time.perf_counter() - start:.3f}s")

# NumPy: one vectorized multiply, typically one to two orders of magnitude faster
np_array = np.arange(10_000_000)
start = time.perf_counter()
result = np_array * 2
print(f"NumPy: {time.perf_counter() - start:.3f}s")
Key concepts:
  • Broadcasting: Arrays of different shapes can operate together: np.array([1,2,3]) + 10 adds 10 to each element.
  • Views vs copies: Basic slicing of a NumPy array returns a view (no copy!). Modifying the view modifies the original. Boolean and integer-array (fancy) indexing return copies. Use .copy() for independent arrays.
  • dtype: Specify data type for memory control: np.array([1,2,3], dtype=np.float32) uses 4 bytes per element instead of 8.
What interviewers are really testing: Whether you understand why NumPy is fast (not just “it’s written in C”) and whether you know the view-vs-copy gotcha.
Red flag answer: Only saying “NumPy is fast.” Not knowing about broadcasting, views, or dtypes. Using Python loops over NumPy arrays (defeats the purpose).
Follow-up:
  1. “What is broadcasting and when does it fail?”
  2. “How do you avoid accidental data corruption from NumPy views?”
  3. “When would you use float32 vs float64?”
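The view-vs-copy and broadcasting points can be verified with a few lines. This sketch assumes NumPy is installed; the array names are illustrative.

```python
import numpy as np

base = np.arange(6)              # [0 1 2 3 4 5]
view = base[1:4]                 # Basic slice: shares memory with base
view[0] = 99                     # Also changes base[1]

independent = base[1:4].copy()   # Explicit copy: safe to modify
independent[1] = -1              # base[2] is untouched

print(base)                          # [ 0 99  2  3  4  5]
print(np.shares_memory(base, view))  # True

# Broadcasting: shape (3,) + shape (2, 1) -> shape (2, 3)
row = np.array([1, 2, 3])
col = np.array([[10], [20]])
grid = row + col
print(grid.tolist())  # [[11, 12, 13], [21, 22, 23]]
```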
Answer: Pandas is the de facto standard third-party library for tabular data manipulation in Python (it is not part of the standard library). It wraps NumPy arrays in labeled, SQL-like structures.
Core structures:
  • Series: 1D labeled array (like a column in a spreadsheet)
  • DataFrame: 2D labeled table (like a spreadsheet or SQL table)
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [30, 25, 35],
    'salary': [70000, 50000, 90000]
})

# SQL-like operations
filtered = df[df['age'] > 28]                          # WHERE
grouped = df.groupby('age')['salary'].mean()           # GROUP BY
sorted_df = df.sort_values('salary', ascending=False)  # ORDER BY
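other_df = pd.DataFrame({'name': ['Alice', 'Bob'], 'team': ['core', 'infra']})  # Illustrative second table
merged = pd.merge(df, other_df, on='name')             # JOIN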
Performance pitfalls (real-world lessons):
  1. Avoid iterating rows with iterrows(): it is often 100-1000x slower than vectorized ops. Prefer vectorized column operations or NumPy functions on .to_numpy(); .apply(axis=1) still runs a Python-level loop per row, so treat it as a fallback.
  2. Use appropriate dtypes: category dtype for low-cardinality strings saves 90%+ memory. int8 for small integers. Downcasting a 10GB DataFrame to proper dtypes can reduce it to 2GB.
  3. query() for readable filtering: df.query('age > 28 and salary > 60000') beats chained boolean indexing for readability.
  4. Chained indexing warning: df['col'][0] = val may not modify the DataFrame (creates a copy). Use df.loc[0, 'col'] = val instead.
When Pandas isn’t enough:
  • Over ~10GB: Use Polars (Rust-based, 5-10x faster) or Dask (parallel Pandas)
  • For SQL-heavy workflows: DuckDB can query DataFrames directly with SQL
  • For streaming data: Pandas is batch-only; use Spark or Flink
What interviewers are really testing: Whether you’ve hit real Pandas performance issues and know the alternatives. Whether you use vectorized operations vs row-wise loops.
Red flag answer: Using for index, row in df.iterrows() for everything. Not knowing about loc vs iloc. Not knowing when to move beyond Pandas to Polars/Dask.
Follow-up:
  1. “How would you handle a 50GB CSV file that doesn’t fit in memory?”
  2. “Explain the difference between loc, iloc, and at in Pandas.”
  3. “What is Polars and when would you choose it over Pandas?”
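To contrast row-wise loops with vectorized operations, here is a small sketch that assumes pandas is installed. The bonus column and the 10% rule are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [30, 25, 35],
    "salary": [70000, 50000, 90000],
})

# Row-wise loop: builds a Series object per row, slow on large frames
bonus_loop = [row["salary"] // 10 for _, row in df.iterrows()]

# Vectorized: one operation over the whole column
df["bonus"] = df["salary"] // 10

# Single-cell update with .loc instead of chained indexing
df.loc[0, "salary"] = 72000

print(df["bonus"].tolist())  # [7000, 5000, 9000]
```

Both approaches give the same numbers on three rows; the difference only shows up as the frame grows.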

Additional Important Topics

Answer: These are two of Python’s most-used built-in functions, and fluency with them is a signal of Pythonic code.
enumerate(iterable, start=0): Wraps an iterable and returns (index, element) tuples. Replaces the anti-pattern of for i in range(len(items)).
# Anti-pattern (C-style):
for i in range(len(names)):
    print(f"{i}: {names[i]}")

# Pythonic:
for i, name in enumerate(names):
    print(f"{i}: {name}")

# Custom start index (useful for 1-based output):
for rank, name in enumerate(leaderboard, start=1):
    print(f"#{rank}: {name}")
zip(*iterables): Pairs up elements from multiple iterables. Stops at the shortest iterable (use itertools.zip_longest to pad).
names = ['Alice', 'Bob', 'Charlie']
scores = [95, 87, 92]
grades = ['A', 'B+', 'A-']

# Combine multiple iterables
for name, score, grade in zip(names, scores, grades):
    print(f"{name}: {score} ({grade})")

# Dict from two lists (very common pattern)
name_to_score = dict(zip(names, scores))

# Unzipping (transpose):
pairs = [('a', 1), ('b', 2), ('c', 3)]
letters, numbers = zip(*pairs)  # ('a','b','c'), (1,2,3)
zip gotcha with unequal lengths:
list(zip([1, 2, 3], ['a', 'b']))  # [(1, 'a'), (2, 'b')] -- silently drops 3!

# Python 3.10+: strict mode raises on length mismatch
list(zip([1, 2, 3], ['a', 'b'], strict=True))  # ValueError!

# For padding instead of truncating:
from itertools import zip_longest
list(zip_longest([1, 2, 3], ['a', 'b'], fillvalue='?'))
# [(1, 'a'), (2, 'b'), (3, '?')]
Both are lazy (iterators, though not generators): zip and enumerate return iterators, not lists. They produce values on demand, so zip(range(1_000_000), range(1_000_000)) uses constant memory.
What interviewers are really testing: Whether you write Pythonic loops (enumerate over range(len(…))), whether you know about strict=True in zip, and whether you can use zip for dict construction and transposing.
Red flag answer: Using range(len(items)) instead of enumerate. Not knowing zip truncates silently. Not knowing about zip_longest.
Follow-up:
  1. “How would you zip three lists together but raise an error if they have different lengths?”
  2. “What does zip(*matrix) do and why is it useful?”
  3. “Are enumerate and zip lazy or eager? What are the memory implications?”
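For the zip(*matrix) question, a short sketch: unpacking passes each row as a separate argument, so zip pairs up the first elements, then the second elements, and so on. The matrix is sample data.

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]

# zip(*matrix) is zip([1, 2, 3], [4, 5, 6]): each output tuple is one column
transposed = [list(column) for column in zip(*matrix)]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]

# zip returns a one-shot iterator, not a list
pairs = zip("ab", [1, 2])
print(list(pairs))  # [('a', 1), ('b', 2)]
print(list(pairs))  # [] because the iterator is exhausted
```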
Answer: range() generates a sequence of numbers lazily. In Python 3, it returns a range object (not a list), which is a lazy, immutable sequence that computes values on demand.
Syntax: range(stop), range(start, stop), range(start, stop, step)
range(5)          # 0, 1, 2, 3, 4 (stop is exclusive)
range(2, 7)       # 2, 3, 4, 5, 6
range(0, 10, 2)   # 0, 2, 4, 6, 8 (step of 2)
range(10, 0, -1)  # 10, 9, 8, ..., 1 (counting down)
Why range is special (not just a generator):
  • O(1) membership testing: 999_999 in range(1_000_000) is instant — it computes arithmetically, doesn’t iterate.
  • O(1) length: len(range(1_000_000_000)) is instant.
  • O(1) indexing: range(1_000_000)[999_999] is instant.
  • Hashable and comparable: Two ranges are equal if they produce the same sequence, even when built from different arguments: range(0, 3, 2) == range(0, 4, 2) is True (both yield 0, 2).
# This is O(1), NOT O(n) -- range computes membership arithmetically
if 500_000 in range(1_000_000):
    print("Found!")  # Instant

# Compare to a generator (this WOULD be O(n)):
gen = (x for x in range(1_000_000))
# 500_000 in gen  # O(n) -- must iterate through values
Common patterns:
# Repeat something N times (when you don't need the index)
for _ in range(10):
    retry_operation()

# Generate indices for parallel iteration (prefer enumerate/zip instead)
for i in range(len(items)):  # Anti-pattern! Use enumerate instead
    process(items[i])

# Creating sequences with list()
indices = list(range(0, 100, 5))  # [0, 5, 10, ..., 95]
What interviewers are really testing: Whether you know range is lazy and supports O(1) membership testing (not a generator that must iterate).
Red flag answer: Saying range creates a list. Not knowing that in checks are O(1) for range. Using range(len(...)) instead of enumerate.
Follow-up:
  1. “How does range achieve O(1) membership testing?”
  2. “What is the difference between range in Python 2 vs Python 3?”
  3. “Can you create a range-like class for floats? What challenges would you face?”
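The arithmetic behind O(1) membership can be sketched in pure Python. This is an illustration of the idea for integers, not CPython's actual implementation; in_int_range is a hypothetical helper.

```python
def in_int_range(value, start, stop, step=1):
    """Return True if value would be produced by range(start, stop, step)."""
    if step == 0:
        raise ValueError("step must not be zero")
    if step > 0:
        in_bounds = start <= value < stop
    else:
        in_bounds = stop < value <= start
    # In bounds AND a whole number of steps away from start
    return in_bounds and (value - start) % step == 0

print(in_int_range(500_000, 0, 1_000_000))  # True
print(in_int_range(7, 0, 10, 2))            # False: 7 is not reachable in steps of 2
print(in_int_range(4, 10, 0, -2))           # True: 10, 8, 6, 4, 2
```

Two comparisons and one modulo replace the iteration, which is why the check does not depend on the size of the range.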
Answer: PEP 8 is Python’s official style guide. It’s not about aesthetics; it’s about reducing cognitive load when reading code across teams and projects. Consistency enables faster code review, easier onboarding, and fewer “style” debates.
Key rules:
  • Indentation: 4 spaces (never tabs). This is non-negotiable in the Python community.
  • Line length: 79 characters for code, 72 for docstrings/comments. Many teams extend to 88 (black default) or 100 in practice.
  • Blank lines: 2 between top-level functions/classes, 1 between methods inside a class.
  • Imports: Always at the top of the file, grouped and ordered:
    1. Standard library (import os, import sys)
    2. Third-party (import requests, import numpy)
    3. Local/project (from myapp import utils)
  • Naming conventions: (See Question 6 for full details.)
  • Whitespace: One space around operators (x = 1 + 2), no space inside brackets (func(arg) not func( arg )).
  • Comparisons: Use is / is not for None. Don’t compare booleans to True/False at all; write if flag: or if not flag:. Use if items: instead of if len(items) > 0:.
Tooling (how real teams enforce PEP 8):
  • ruff: Modern, Rust-based linter+formatter. Replaces flake8, isort, pycodestyle, pyflakes — all in one tool, 10-100x faster. Rapidly becoming the standard.
  • black: Opinionated auto-formatter. “Any color you like, as long as it’s black.” Eliminates all formatting debates.
  • mypy / pyright: Static type checkers (not PEP 8, but same category of code quality).
  • Pre-commit hooks: Run ruff and black automatically before every commit. This is the real enforcement mechanism — not code review.
What interviewers are really testing: Whether you follow PEP 8 naturally and use tooling to enforce it, rather than relying on manual review.
Red flag answer: Not knowing what PEP 8 is. Using tabs. Not using any linter or formatter. Arguing about style instead of using black/ruff to settle it automatically.
Follow-up:
  1. “How do you set up automated PEP 8 enforcement in a CI/CD pipeline?”
  2. “When is it acceptable to violate PEP 8?”
  3. “What is the difference between ruff, flake8, and black? How do they complement each other?”
Answer: Context managers ensure that setup and teardown logic always runs, even if exceptions occur. They implement the __enter__ / __exit__ protocol.
The problem they solve:
# Without context manager -- if process() raises, file never closes:
f = open('data.txt')
data = f.read()
process(data)     # If this crashes...
f.close()         # ...this never runs. File handle leaks.

# With context manager -- file always closes:
with open('data.txt') as f:
    data = f.read()
    process(data)  # Even if this crashes, __exit__ runs and closes the file
Building custom context managers:
# Class-based:
class DatabaseConnection:
    def __enter__(self):
        self.conn = create_connection()
        return self.conn

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.conn.close()
        return False  # Don't suppress exceptions (True would suppress)

with DatabaseConnection() as conn:
    conn.execute("SELECT * FROM users")

# Generator-based (simpler, using contextlib):
from contextlib import contextmanager

@contextmanager
def timer(label):
    import time
    start = time.perf_counter()
    try:
        yield  # Code inside 'with' block runs here
    finally:
        # Runs even if the 'with' body raises, so the timing is always reported
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.4f}s")

with timer("data processing"):
    heavy_computation()
Real-world use cases:
  • File handles, database connections, network sockets
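  • Acquiring/releasing locks: with lock: (where lock is a shared threading.Lock(); with threading.Lock(): would create a fresh lock each time and protect nothing)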
  • Temporary directory: with tempfile.TemporaryDirectory() as tmp:
  • Changing directory: with contextlib.chdir('/tmp'): (Python 3.11+)
  • Suppressing exceptions: with contextlib.suppress(FileNotFoundError):
  • Redirecting stdout: with contextlib.redirect_stdout(f):
__exit__ parameters explained: __exit__(self, exc_type, exc_val, exc_tb): if no exception occurred, all three are None. If an exception occurred, they contain the exception info. Returning True suppresses the exception (use sparingly).
What interviewers are really testing: Whether you use with by default for resource management, whether you can build custom context managers, and whether you know the contextlib shortcuts.
Red flag answer: Not using with for file operations. Not knowing how to write a custom context manager. Not knowing about contextlib.contextmanager.
Follow-up:
  1. “What happens if __enter__ raises an exception? Does __exit__ still run?”
  2. “How would you create an async context manager?”
  3. “When would you return True from __exit__ to suppress an exception?”
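The first and third follow-ups can be explored with a small experiment. The sketch below uses a hypothetical Tracked class (not part of any library) that records what __exit__ receives in three situations: a clean body, a body that raises, and an __enter__ that raises.

```python
events = []

class Tracked:
    """Hypothetical context manager that records the exc_type passed to __exit__."""

    def __init__(self, fail_on_enter=False, suppress=False):
        self.fail_on_enter = fail_on_enter
        self.suppress = suppress

    def __enter__(self):
        if self.fail_on_enter:
            raise RuntimeError("setup failed")
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        events.append(exc_type)
        return self.suppress  # True swallows the exception, False re-raises it

# 1. Clean body: __exit__ receives None for all three arguments
with Tracked():
    pass

# 2. Body raises and the manager suppresses it: execution continues after the block
with Tracked(suppress=True):
    raise ValueError("boom")

# 3. __enter__ raises: the body never runs and __exit__ is never called
try:
    with Tracked(fail_on_enter=True):
        pass
except RuntimeError:
    events.append("enter failed")

print(events)
```

Because __exit__ is skipped when __enter__ fails, any resource acquired partway through __enter__ has to be released inside __enter__ itself.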
Answer: Understanding memory management is essential for building long-running services (web servers, data pipelines, ML training loops) that don’t slowly leak memory.
Two-tier garbage collection:
  1. Reference counting (primary): Every object has a reference count. When it drops to zero, the object is immediately deallocated. This reclaims the large majority of objects, because most never take part in a reference cycle.
import sys
a = [1, 2, 3]
print(sys.getrefcount(a))  # 2 (a + the getrefcount argument)
b = a
print(sys.getrefcount(a))  # 3 (a + b + getrefcount)
del b
print(sys.getrefcount(a))  # 2
  2. Cyclic garbage collector (secondary): Handles circular references that reference counting can’t:
# Circular reference: a -> b -> a
a = []
b = [a]
a.append(b)  # Now a references b and b references a
del a, b     # Refcounts are 1 (not 0!) -- only cyclic GC can clean this
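To see the cyclic collector do its job, the cycle can be observed through a weak reference. This is a sketch that assumes CPython; Node and make_cycle are illustrative names, and automatic collection is paused only to make the result deterministic.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.partner = None

def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a   # a -> b -> a
    return weakref.ref(a)         # a weak reference does not keep 'a' alive

gc.disable()  # pause automatic cyclic collection for a deterministic demo
try:
    ref = make_cycle()
    alive_before = ref() is not None   # refcounts never reached zero
    unreachable = gc.collect()         # run the cyclic collector by hand
    alive_after = ref() is not None
finally:
    gc.enable()

print(alive_before, alive_after, unreachable)
```

After make_cycle returns, no name refers to either node, yet both survive until gc.collect() runs.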
CPython memory optimizations:
  • Small object allocator (pymalloc): Objects of 512 bytes or less are served from Python’s internal arena allocator, which avoids a separate malloc call for every small object.
  • Free lists: CPython keeps recently deallocated objects of some built-in types (such as float, tuple, list, dict) for reuse. The exact set varies between versions.
  • Interning: Small integers (-5 to 256) and certain strings are cached as singletons.
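Interning is observable with the is operator. The sketch below assumes CPython; the values are built at runtime so that the compiler cannot merge identical constants, and the specific numbers are only illustrative.

```python
import sys

# Build the integers at runtime rather than writing literals
small_a, small_b = int("256"), int("256")
large_a, large_b = int("1000000"), int("1000000")

same_small = small_a is small_b   # CPython: both names point at the cached 256
same_large = large_a is large_b   # CPython: two separate int objects

# sys.intern returns one canonical object per distinct string value
s1 = sys.intern("".join(["user", "_", "id"]))
s2 = sys.intern("user_id")
same_string = s1 is s2

print(same_small, same_large, same_string)
```

This is an implementation detail, which is why value comparisons should use == rather than is.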
Debugging memory leaks:
import tracemalloc

tracemalloc.start()
# ... your code ...
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:10]:
    print(stat)  # Shows which lines allocated the most memory
Tools: tracemalloc (stdlib), objgraph (visualize references), memory_profiler (@profile decorator), pympler (detailed object tracking).
Common memory leak patterns:
  1. Growing caches without eviction (use functools.lru_cache with maxsize)
  2. Circular references involving __del__ methods (GC can’t collect these before Python 3.4)
  3. Closures capturing large objects unintentionally
  4. Appending to global lists in long-running processes
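The first pattern, an unbounded cache, has a simple stdlib remedy. This is a minimal sketch using functools.lru_cache; load_profile is a hypothetical stand-in for an expensive lookup.

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def load_profile(user_id: int) -> dict:
    # Hypothetical expensive lookup; a real one might hit a database
    return {"id": user_id, "name": f"user-{user_id}"}

# Request far more distinct keys than the cache can hold
for user_id in range(1000):
    load_profile(user_id)

info = load_profile.cache_info()
print(info)  # currsize is capped at maxsize; older entries were evicted
```

With maxsize=None the same loop would keep all 1000 results alive for the lifetime of the process.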
What interviewers are really testing: Whether you’ve debugged memory leaks in production, whether you know the two-tier GC system, and whether you can name specific tools.
Red flag answer: Thinking Python has no garbage collection issues. Not knowing about circular references. Never having profiled memory usage.
Follow-up:
  1. “How would you debug a Python service that’s slowly leaking 100MB/hour?”
  2. “What is the __slots__ optimization and how does it affect memory usage?”
  3. “How does gc.disable() affect your application? When might you want to do this?”
Answer: Type hints (PEP 484, Python 3.5+) add optional type annotations to Python code. They’re not enforced at runtime — they’re metadata consumed by static analysis tools like mypy and pyright.
# Basic type hints
def greet(name: str, times: int = 1) -> str:
    return (f"Hello, {name}! " * times).strip()

# Collections (Python 3.9+ uses built-in types)
def process(items: list[int], config: dict[str, str]) -> tuple[int, ...]:
    ...

# Optional (value or None)
from typing import Optional
def find_user(user_id: int) -> Optional[dict]:  # Or: dict | None (3.10+)
    ...

# Union types
from typing import Union
def parse(value: Union[str, int]) -> float:  # Or: str | int (3.10+)
    ...
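That hints are not enforced at runtime is easy to check directly. In this small sketch, double is an illustrative function; a static checker such as mypy would be expected to reject the call, while the interpreter runs it.

```python
def double(x: int) -> int:
    return x * 2

# A str is passed where the hint says int; Python does not object
result = double("ab")

# The hints are stored as plain metadata on the function object
annotations = double.__annotations__

print(result, annotations)
```

The annotations are available for tools to inspect, but nothing consults them when the function is called.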
Why type hints matter in production:
  1. Catch bugs before runtime: mypy finds type mismatches, None-safety issues, and incorrect function calls at CI time.
  2. Self-documenting code: def fetch(url: str, timeout: float = 30.0) -> Response tells you everything.
  3. IDE intelligence: Autocomplete, refactoring, and navigation all improve dramatically.
  4. API contracts: When multiple teams work on a codebase, types serve as machine-checkable documentation.
Advanced patterns:
from typing import TypeVar, Protocol, Generic

# Generics
T = TypeVar('T')
def first(items: list[T]) -> T:
    return items[0]

# Protocol (structural subtyping / duck typing with types)
class Drawable(Protocol):
    def draw(self) -> None: ...

def render(shape: Drawable) -> None:  # Any object with .draw() works
    shape.draw()

# TypedDict (type dicts with specific keys)
from typing import TypedDict
class UserDict(TypedDict):
    name: str
    age: int
    email: str
Tooling ecosystem:
  • mypy: The original type checker, strict and well-established.
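  • pyright (Microsoft): Faster, powers VS Code Pylance. By default it also checks unannotated functions, which mypy skips unless configured otherwise.
  • Pydantic: Runtime type validation (for API inputs, configs). mypy checks types statically before the code runs; Pydantic validates actual values while it runs.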
What interviewers are really testing: Whether you use type hints in practice, whether you can distinguish static analysis from runtime validation, and whether you know about Protocol for duck typing.
Red flag answer: Thinking type hints are enforced at runtime. Not knowing any type checking tools. Using Any everywhere to “satisfy” the type checker.
Follow-up:
  1. “What is the difference between mypy and Pydantic? When do you need each?”
  2. “How do you handle gradual typing when adding type hints to a large existing codebase?”
  3. “What is Protocol and how does it preserve Python’s duck typing philosophy while adding type safety?”
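For the third follow-up, a Protocol can be made checkable with isinstance through typing.runtime_checkable. The sketch below is a variation on the Drawable example in which draw returns a string so the result is visible; Circle and Invoice are illustrative classes.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Drawable(Protocol):
    def draw(self) -> str: ...

class Circle:  # does not inherit from Drawable
    def draw(self) -> str:
        return "circle"

class Invoice:  # has no draw() method
    def total(self) -> float:
        return 0.0

def render(shape: Drawable) -> str:
    return shape.draw()

rendered = render(Circle())
circle_matches = isinstance(Circle(), Drawable)
invoice_matches = isinstance(Invoice(), Drawable)

print(rendered, circle_matches, invoice_matches)
```

The runtime check only looks for the presence of the named methods, not their signatures, so the static checker remains the stricter of the two.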
Answer: Python offers multiple ways to define data-holding classes, and choosing the right one signals experience.
dataclass (Python 3.7+), the modern default:
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    age: int
    email: str = ""
    tags: list[str] = field(default_factory=list)  # Mutable default done right

    def is_adult(self) -> bool:
        return self.age >= 18

# Auto-generates: __init__, __repr__, __eq__
# Optional: frozen=True (immutable), order=True (comparison), slots=True (memory efficient, 3.10+)
NamedTuple — when you want immutability:
from typing import NamedTuple

class Point(NamedTuple):
    x: float
    y: float
    label: str = "origin"

p = Point(1.0, 2.0)
print(p.x)     # Attribute access
print(p[0])    # Tuple indexing
x, y, _ = p    # Tuple unpacking
# p.x = 3     # AttributeError -- immutable!
Comparison:
| Feature | dataclass | NamedTuple | Regular class |
| --- | --- | --- | --- |
| Mutable by default | Yes | No (immutable) | Yes |
| Auto __init__ | Yes | Yes | No (write manually) |
| Auto __repr__ | Yes | Yes | No |
| Auto __eq__ | Yes (value-based) | Yes (value-based) | No (identity-based) |
| Hashable | Only if frozen=True | Yes (immutable) | Only if __hash__ defined |
| Inheritance | Full support | Limited | Full support |
| Memory with __slots__ | slots=True (3.10+) | Automatic | Manual |
| Tuple unpacking | No | Yes | No |
| Performance | Good | Excellent (C-backed) | Depends |
When to use each:
  • dataclass: Default choice for most data-holding classes. Mutable, feature-rich, familiar.
  • NamedTuple: When immutability is important (configs, records, API responses), or when you need tuple interop.
  • Regular class: When you need complex __init__ logic, metaclasses, or heavy customization.
  • attrs (third-party): When you need features beyond dataclass (validators, converters, more control). Many large codebases use attrs over dataclass.
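Some of the differences in the comparison can be checked in a few lines. This is a sketch with illustrative class names (PointNT, PointDC, FrozenPoint).

```python
from dataclasses import dataclass
from typing import NamedTuple

class PointNT(NamedTuple):
    x: float
    y: float

@dataclass
class PointDC:
    x: float
    y: float

@dataclass(frozen=True)
class FrozenPoint:
    x: float
    y: float

nt = PointNT(1.0, 2.0)
x, y = nt                                   # tuple unpacking works
equals_plain_tuple = nt == (1.0, 2.0)       # a NamedTuple is a tuple
dc_equals_tuple = PointDC(1.0, 2.0) == (1.0, 2.0)  # dataclass __eq__ only matches its own class

# Immutable variants can be used as dict keys
lookup = {nt: "named tuple", FrozenPoint(1.0, 2.0): "frozen dataclass"}

# A default (mutable, eq=True) dataclass is unhashable
try:
    hash(PointDC(1.0, 2.0))
    mutable_dc_hashable = True
except TypeError:
    mutable_dc_hashable = False

print(equals_plain_tuple, dc_equals_tuple, mutable_dc_hashable)
```

The equality result is worth remembering: a NamedTuple compares equal to any tuple with the same values, which is convenient for interop but can hide type mix-ups.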
What interviewers are really testing: Whether you reach for dataclass instead of writing boilerplate, whether you know when immutability matters, and whether you know the field(default_factory=...) pattern for mutable defaults.
Red flag answer: Manually writing __init__, __repr__, __eq__ for simple data classes. Not knowing about frozen=True. Using a regular class when dataclass or NamedTuple would be cleaner.
Follow-up:
  1. “How does dataclass(frozen=True) enforce immutability? Can it be bypassed?”
  2. “What does __slots__ do and why does dataclass(slots=True) exist?”
  3. “When would you choose attrs over dataclass?”
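The first follow-up can be answered experimentally. In this sketch, Config is an illustrative class; the point is that frozen=True blocks ordinary assignment by raising FrozenInstanceError, while object.__setattr__ still gets through.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Config:
    host: str
    port: int = 5432

cfg = Config("localhost")

# Normal assignment goes through the generated __setattr__, which raises
try:
    cfg.port = 9999
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

# Calling the base implementation directly sidesteps the check
object.__setattr__(cfg, "port", 9999)

print(assignment_blocked, cfg.port)
```

Frozen dataclasses therefore guard against accidental mutation, not against code that deliberately works around the check.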
Answer: asyncio is Python’s built-in framework for concurrent I/O-bound programming using a single-threaded event loop. It enables handling thousands of concurrent network connections without the overhead of OS threads.
Core concepts:
import asyncio

# async def creates a coroutine function
async def fetch_data(url: str) -> str:
    # Simulate network I/O
    await asyncio.sleep(1)  # Non-blocking sleep
    return f"Data from {url}"

# Running coroutines concurrently
async def main():
    # Sequential (slow: 3 seconds)
    r1 = await fetch_data("url1")
    r2 = await fetch_data("url2")
    r3 = await fetch_data("url3")

    # Concurrent (fast: 1 second)
    r1, r2, r3 = await asyncio.gather(
        fetch_data("url1"),
        fetch_data("url2"),
        fetch_data("url3"),
    )

asyncio.run(main())
How it works (event loop mental model):
  1. When a coroutine hits await, it yields control back to the event loop
  2. The event loop runs other ready coroutines while the awaited operation completes
  3. When the I/O completes, the event loop resumes the paused coroutine
  4. Only one coroutine runs at a time (single-threaded) — concurrency, not parallelism
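The hand-off described in these steps can be made visible by logging from two coroutines that yield to the loop between steps. This is a minimal sketch; worker and the log list are illustrative, and asyncio.sleep(0) stands in for a real I/O wait.

```python
import asyncio

async def worker(name: str, log: list[str]) -> None:
    for step in (1, 2):
        log.append(f"{name}{step}")
        await asyncio.sleep(0)  # yield control back to the event loop

async def main() -> list[str]:
    log: list[str] = []
    # Both workers share one thread; the loop switches between them at each await
    await asyncio.gather(worker("A", log), worker("B", log))
    return log

order = asyncio.run(main())
print(order)
```

The entries alternate between the two workers, which shows that switching happens only at await points and never in the middle of ordinary code.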
Real-world pattern — async web client:
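import asyncio

import aiohttp

# str | BaseException uses the 3.10+ union syntax
async def fetch_all(urls: list[str]) -> list[str | BaseException]:
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_one(session, url) for url in urls]
        # return_exceptions=True: failures come back as exception objects in the list
        return await asyncio.gather(*tasks, return_exceptions=True)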

async def fetch_one(session, url):
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
        return await resp.text()
When to use asyncio vs threads vs multiprocessing:
  • asyncio: Many concurrent I/O operations (web scraping, API calls, chat servers). The payoff grows with the number of tasks waiting on I/O at the same time (hundreds or more).
  • threading: Simpler I/O concurrency with existing sync libraries that can’t be made async.
  • multiprocessing: CPU-bound work that needs true parallelism.
Common pitfalls:
  1. Blocking the event loop: Calling time.sleep() (blocking) instead of await asyncio.sleep() (non-blocking) freezes ALL coroutines.
  2. Forgetting to await: result = fetch_data("url") returns a coroutine object, not the result.
  3. Mixing sync and async: Use loop.run_in_executor() (or asyncio.to_thread() on Python 3.9+) to call blocking sync functions from async code.
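The first and third pitfalls are related: a blocking call only freezes the loop if it runs on the loop's own thread. The sketch below moves a blocking function to the default thread pool with loop.run_in_executor() while a heartbeat coroutine keeps ticking; blocking_io and heartbeat are illustrative names and the delays are arbitrary.

```python
import asyncio
import time

def blocking_io(delay: float) -> str:
    time.sleep(delay)  # stands in for a synchronous library call
    return "done"

async def heartbeat(ticks: list[int]) -> None:
    while True:
        ticks.append(1)
        await asyncio.sleep(0.01)

async def main() -> tuple[str, int]:
    ticks: list[int] = []
    beat = asyncio.create_task(heartbeat(ticks))
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor
    result = await loop.run_in_executor(None, blocking_io, 0.2)
    beat.cancel()
    return result, len(ticks)

result, tick_count = asyncio.run(main())
print(result, tick_count)
```

Calling blocking_io(0.2) directly inside main would instead stall the heartbeat for the full 0.2 seconds.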
What interviewers are really testing: Whether you understand the event loop model, when asyncio is appropriate (I/O-bound only), and common pitfalls like blocking the loop.
Red flag answer: Thinking asyncio provides true parallelism. Using time.sleep in async code. Not knowing the difference between asyncio.gather and sequential await.
Follow-up:
  1. “What happens if you accidentally call a blocking function inside an async coroutine?”
  2. “How does asyncio.gather differ from asyncio.TaskGroup (Python 3.11+)?”
  3. “How would you add asyncio to an existing synchronous Flask application?”
Answer: Descriptors are the mechanism behind @property, @classmethod, @staticmethod, __slots__, and Python’s bound method system. Understanding descriptors means understanding how Python’s attribute access actually works.
What is a descriptor? Any object that defines __get__, __set__, or __delete__ is a descriptor. When placed on a class, it intercepts attribute access on instances of that class.
class Validated:
    """A descriptor that validates values on assignment."""
    def __init__(self, min_val=None, max_val=None):
        self.min_val = min_val
        self.max_val = max_val

    def __set_name__(self, owner, name):
        # Called when the descriptor is assigned to a class attribute
        self.name = name
        self.private_name = f'_{name}'  # Key used for per-instance storage

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # Accessed on class, not instance
        return getattr(obj, self.private_name, None)

    def __set__(self, obj, value):
        if self.min_val is not None and value < self.min_val:
            raise ValueError(f"{self.name} must be >= {self.min_val}")
        if self.max_val is not None and value > self.max_val:
            raise ValueError(f"{self.name} must be <= {self.max_val}")
        setattr(obj, self.private_name, value)

class Product:
    price = Validated(min_val=0)
    quantity = Validated(min_val=0, max_val=10000)

    def __init__(self, price, quantity):
        self.price = price        # Calls Validated.__set__
        self.quantity = quantity

p = Product(9.99, 100)
p.price = -1  # ValueError: price must be >= 0
Two types of descriptors:
  • Data descriptor: Defines __set__ and/or __delete__. Takes priority over instance __dict__.
  • Non-data descriptor: Only defines __get__. Instance __dict__ takes priority.
This distinction explains why @property (data descriptor) can intercept assignment, while regular methods (non-data descriptors) can be overridden by instance attributes.
The attribute lookup chain:
  1. Data descriptors on the class (e.g., @property)
  2. Instance __dict__
  3. Non-data descriptors and plain attributes on the class (e.g., methods)
  4. __getattr__ (fallback)
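The lookup order can be verified by planting same-named entries directly in an instance __dict__. This is a sketch with illustrative classes (Data, NonData, Demo).

```python
class NonData:
    """Non-data descriptor: defines only __get__."""
    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"

class Data:
    """Data descriptor: defines __get__ and __set__."""
    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")

class Demo:
    nd = NonData()
    d = Data()

    def __getattr__(self, name):
        # Reached only when every other lookup step has failed
        return f"fallback for {name}"

demo = Demo()
# Write to the instance dict directly, bypassing Data.__set__
demo.__dict__["nd"] = "from instance dict"
demo.__dict__["d"] = "from instance dict"

print(demo.d)        # data descriptor beats the instance dict
print(demo.nd)       # instance dict beats the non-data descriptor
print(demo.missing)  # nothing found anywhere, so __getattr__ answers
```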
What interviewers are really testing: Whether you understand the mechanism behind @property and Python’s method binding. This is a staff-level question; most seniors don’t fully understand descriptors.
Red flag answer: Never having heard of descriptors. Not understanding why @property works.
Follow-up:
  1. “How does Python’s method binding work? Why is obj.method a bound method but Class.method is a function?”
  2. “What is the difference between a data descriptor and a non-data descriptor in attribute lookup priority?”
  3. “How would you implement a caching descriptor similar to functools.cached_property?”
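One possible answer to the third follow-up relies on the lookup order: a non-data descriptor can store its result in the instance __dict__ under its own name, after which the instance dict shadows it. This is a simplified sketch, similar in spirit to functools.cached_property but without its error handling; CachedAttribute and Report are illustrative names.

```python
class CachedAttribute:
    """Minimal non-data descriptor that computes a value once per instance."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # Accessed on the class
        value = self.func(obj)
        # Store under the same name; later lookups hit the instance dict first
        obj.__dict__[self.name] = value
        return value

class Report:
    def __init__(self, rows):
        self.rows = rows
        self.computations = 0

    @CachedAttribute
    def total(self):
        self.computations += 1
        return sum(self.rows)

report = Report([1, 2, 3])
first = report.total   # computed by the descriptor
second = report.total  # served from report.__dict__
print(first, second, report.computations)
```

Adding a __set__ method would turn this into a data descriptor and break the caching, because the descriptor would then take priority over the stored value.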
Answer: Recognizing and fixing anti-patterns is what distinguishes a senior Python developer from someone who just writes Python that works.
1. Mutable default arguments:
# BAD
def add_to(element, target=[]):
    target.append(element)
    return target

# GOOD
def add_to(element, target=None):
    if target is None:
        target = []
    target.append(element)
    return target
2. Using type() for type checking:
# BAD -- breaks with inheritance
if type(x) == dict:
    ...

# GOOD -- respects inheritance
if isinstance(x, dict):
    ...

# ALTERNATIVE (EAFP) -- duck typing, when any object with .items() is acceptable
try:
    items = x.items()
except AttributeError:
    handle_non_dict()
3. Catching too broadly:
# BAD -- hides bugs, catches KeyboardInterrupt
try:
    do_something()
except:
    pass

# GOOD -- catch specific exceptions
try:
    do_something()
except (ValueError, KeyError) as e:
    logger.warning("Expected error: %s", e)  # Lazy formatting: built only if the message is emitted
4. Using is for value comparison:
# BAD -- relies on CPython integer caching, breaks with large numbers (SyntaxWarning since Python 3.8)
if x is 0:
    ...

# GOOD
if x == 0:
    ...
5. Not using context managers:
# BAD -- file may not close on exception
f = open('data.txt')
data = f.read()
f.close()

# GOOD
with open('data.txt') as f:
    data = f.read()
6. String concatenation in loops:
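# BAD -- worst case O(n^2): strings are immutable, so each += may copy the whole string
# (CPython can sometimes extend in place, but that is an optimization, not a guarantee)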
result = ""
for item in items:
    result += str(item) + ", "

# GOOD -- O(n) using join
result = ", ".join(str(item) for item in items)
7. Not using enumerate:
# BAD
for i in range(len(items)):
    print(i, items[i])

# GOOD
for i, item in enumerate(items):
    print(i, item)
8. Wildcard imports:
# BAD -- pollutes namespace, makes code unreadable, hides dependencies
from module import *

# GOOD -- explicit imports
from module import SpecificClass, specific_function
What interviewers are really testing: Whether you write idiomatic Python naturally and can spot anti-patterns in code review.
Red flag answer: Writing any of these anti-patterns in whiteboard code. Not knowing why mutable defaults are dangerous. Using type() instead of isinstance().
Follow-up:
  1. “You’re reviewing a PR and see except Exception: pass. What do you say?”
  2. “Why is string concatenation in a loop O(n^2)? What does CPython do internally?”
  3. “What other anti-patterns have you seen in production Python code?”
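Why mutable defaults are dangerous is easiest to explain by running the two versions side by side. In this sketch the functions are renamed add_to_bad and add_to_good so both can coexist in one module.

```python
def add_to_bad(element, target=[]):
    # The default list is created once, when the def statement runs
    target.append(element)
    return target

def add_to_good(element, target=None):
    if target is None:
        target = []  # A fresh list on every call
    target.append(element)
    return target

bad_first = add_to_bad(1)
bad_second = add_to_bad(2)
shared = bad_first is bad_second             # both calls returned the same list
stored_default = add_to_bad.__defaults__[0]  # the default lives on the function object

good_first = add_to_good(1)
good_second = add_to_good(2)

print(bad_second, shared, good_first, good_second)
```

Inspecting __defaults__ shows the mechanism: the default value is an attribute of the function, so mutations persist across calls.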