Skip to main content

Documentation Index

Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt

Use this file to discover all available pages before exploring further.

Python Data Structures

Data Structures

Python’s built-in data structures are incredibly powerful. Unlike C++ or Java where you might need to import special libraries for hash maps or dynamic arrays, Python has them built into the core language syntax. Choosing the right data structure is like choosing the right container in a kitchen. A spice rack (dictionary) gives you instant lookup by name. A queue at a deli counter (list/deque) maintains order. A cookie cutter set (set) guarantees no duplicates. Using the wrong container does not ruin the meal — but it makes cooking slower and messier than it needs to be.

1. Lists (Dynamic Arrays)

A list is an ordered, mutable sequence of items. It can hold mixed types (integers, strings, other lists). Under the hood, a Python list is a dynamic array — a contiguous block of pointers to objects. This means index access is O(1), but inserting at the beginning is O(n) because every pointer must shift.
fruits = ["apple", "banana", "cherry"]

# Access -- O(1), instant lookup by index
print(fruits[0])   # apple
print(fruits[-1])  # cherry (Negative indexing: -1 is last, -2 is second-to-last)

# Modify
fruits.append("date")      # O(1) amortized -- add to end
fruits.insert(1, "berry")  # O(n) -- shifts everything after index 1
fruits.pop()               # O(1) -- remove and return last item
fruits.pop(0)              # O(n) -- remove from front shifts everything

# If you need fast insert/remove from both ends, use collections.deque instead
Performance Pitfall: list.insert(0, x) and list.pop(0) are O(n) operations. If you are building a queue (FIFO), use collections.deque which gives O(1) append and popleft. This is a common interview question and a real production performance trap.
from collections import deque
queue = deque()
queue.append("first")    # Add to right -- O(1)
queue.appendleft("zero") # Add to left -- O(1)
queue.popleft()           # Remove from left -- O(1)

Slicing [start:end:step]

Slicing allows you to extract sub-parts of a list. It creates a new list.
nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(nums[0:5])   # [0, 1, 2, 3, 4] (Start is inclusive, End is exclusive)
print(nums[:5])    # [0, 1, 2, 3, 4] (Start defaults to 0)
print(nums[5:])    # [5, 6, 7, 8, 9] (End defaults to length)
print(nums[::2])   # [0, 2, 4, 6, 8] (Step 2: every second item)
print(nums[::-1])  # [9, 8, ..., 0]  (Reverse the list!)

List Comprehensions

This is the “Pythonic” way to create lists based on existing iterables. It is not just syntactic sugar — comprehensions are actually faster than the equivalent for-loop with .append() because CPython optimizes the bytecode for this pattern.
# The Old Way (explicit loop with append)
squares = []
for x in range(10):
    squares.append(x**2)

# The Pythonic Way -- list comprehension
# Pattern: [expression for item in iterable]
squares = [x**2 for x in range(10)]

# With a condition (filter items)
evens = [x for x in range(10) if x % 2 == 0]

# Nested comprehension (flatten a matrix)
matrix = [[1, 2], [3, 4], [5, 6]]
flat = [num for row in matrix for num in row]  # [1, 2, 3, 4, 5, 6]
# Read it as: "for each row in matrix, for each num in row, give me num"

# With if-else (note: goes BEFORE the for, not after)
labels = ["even" if x % 2 == 0 else "odd" for x in range(5)]
# ["even", "odd", "even", "odd", "even"]
Comprehension Pitfall: Do not nest more than two levels deep. A triple-nested comprehension is technically valid but nearly unreadable. If you need three loops, use a regular for loop — readability counts.
# Bad -- unreadable triple nesting
result = [f(x, y, z) for x in xs for y in ys for z in zs if condition(x, y, z)]

# Good -- explicit loops with clear names
result = []
for x in xs:
    for y in ys:
        for z in zs:
            if condition(x, y, z):
                result.append(f(x, y, z))

2. Dictionaries (Hash Maps)

A dict stores Key-Value pairs. Keys must be hashable (immutable types: strings, numbers, tuples of immutables) and unique. Lookups are O(1) average case because dictionaries are implemented as hash tables. Since Python 3.7, dictionaries preserve insertion order as a language guarantee (it was an implementation detail in 3.6). This means you can rely on iteration order matching the order you inserted keys.
user = {
    "name": "Alice",
    "age": 30,
    "is_admin": True
}

# Access -- O(1) average case
print(user["name"])
# print(user["email"])  # KeyError! Key does not exist.

# Safe Access -- the Pythonic way to handle missing keys
print(user.get("email"))          # Returns None (no crash)
print(user.get("email", "N/A"))   # Returns "N/A" (custom default)

# Modify
user["email"] = "alice@example.com"  # Add or overwrite a key
del user["is_admin"]                  # Remove a key (KeyError if missing)
user.pop("is_admin", None)            # Safer remove -- returns None if missing

# Iterate -- .items() gives (key, value) pairs
for key, value in user.items():
    print(f"{key}: {value}")

Merging Dictionaries

defaults = {"color": "blue", "size": "medium"}
overrides = {"size": "large", "weight": "heavy"}

# Python 3.9+ -- the merge operator (cleanest)
merged = defaults | overrides  # {"color": "blue", "size": "large", "weight": "heavy"}

# Python 3.5+ -- unpacking (works everywhere)
merged = {**defaults, **overrides}

# In-place update (mutates defaults)
defaults.update(overrides)

The setdefault and Walrus Operator Patterns

# setdefault: get a value, or insert a default if missing (one atomic operation)
# Useful for building up groups without checking "is key present?"
groups = {}
for name, dept in [("Alice", "Eng"), ("Bob", "Eng"), ("Carol", "Sales")]:
    groups.setdefault(dept, []).append(name)
# {"Eng": ["Alice", "Bob"], "Sales": ["Carol"]}

# Walrus operator (:=) with dict -- Python 3.8+
# Assign and test in one expression
if (email := user.get("email")) is not None:
    send_notification(email)

Dict Comprehensions

Just like lists, you can build dictionaries dynamically.
# Map number to its square
squares = {x: x**2 for x in range(5)}
# Result: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

3. Sets (Unique Collection)

A set is an unordered collection of unique elements. It is implemented as a hash table without values — essentially a dictionary where you only care about the keys. Use Cases: Removing duplicates, fast membership testing (O(1) vs O(n) for lists), and mathematical set operations.
numbers = {1, 2, 2, 3, 3, 3}
print(numbers)  # {1, 2, 3} (Duplicates removed automatically)

a = {1, 2, 3}
b = {3, 4, 5}

print(a | b)  # Union: {1, 2, 3, 4, 5} (Items in either)
print(a & b)  # Intersection: {3} (Items in both)
print(a - b)  # Difference: {1, 2} (Items in a but not b)
print(a ^ b)  # Symmetric difference: {1, 2, 4, 5} (Items in one but not both)

When to Reach for a Set

# Membership testing: set is O(1), list is O(n)
# This matters when checking against large collections in a loop
valid_ids = {1001, 1002, 1003, 1004, 1005}  # Use a set, not a list
if user_id in valid_ids:                      # O(1) lookup
    grant_access()

# Deduplication while preserving order (Python 3.7+)
items = [3, 1, 4, 1, 5, 9, 2, 6, 5]
unique_ordered = list(dict.fromkeys(items))  # [3, 1, 4, 5, 9, 2, 6]
# dict.fromkeys preserves insertion order and drops duplicates (keys are unique)
Set Pitfall: You cannot create an empty set with {} — that creates an empty dictionary. Use set() instead.
empty_dict = {}      # This is a dict, not a set!
empty_set = set()    # This is the correct way to create an empty set
type({})             # <class 'dict'>
type(set())          # <class 'set'>

4. Tuples (Immutable Lists)

A tuple is like a list, but it cannot be changed after creation (immutable). Think of a tuple as a sealed record — once created, the structure is frozen. This makes tuples ideal for representing fixed collections of related values, like coordinates, database rows, or function return values. Why use tuples?
  1. Safety: Guarantees the container has not been modified. Useful for function returns and configuration.
  2. Performance: Tuples are smaller in memory and slightly faster to create than lists because Python can optimize immutable objects.
  3. Hashable: Can be used as dictionary keys or set elements (lists cannot, because mutable objects cannot be hashed).
point = (10, 20)
# point[0] = 30  # TypeError! Tuples are immutable.

# Unpacking -- assign tuple elements to individual variables
x, y = point
print(x, y)  # 10 20

# Unpacking with * (star) to capture the rest
first, *rest = (1, 2, 3, 4, 5)
print(first)  # 1
print(rest)   # [2, 3, 4, 5] -- note: rest is a list, not a tuple

# Named Tuples -- give your tuples readable field names
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(10, 20)
print(p.x, p.y)  # 10, 20 -- much clearer than p[0], p[1]

# Modern alternative: typing.NamedTuple (Python 3.6+)
from typing import NamedTuple

class Point(NamedTuple):
    x: float
    y: float
    z: float = 0.0  # Default values supported

p = Point(1.0, 2.0)
print(p)  # Point(x=1.0, y=2.0, z=0.0)
Tuple Gotcha: A tuple containing mutable objects is still “immutable” in the sense that you cannot reassign its slots — but you can mutate the objects inside. The container is frozen; the contents may not be.
t = ([1, 2], [3, 4])
# t[0] = [5, 6]     # TypeError -- cannot reassign the slot
t[0].append(99)      # This works! The list inside is still mutable.
print(t)             # ([1, 2, 99], [3, 4])

5. The collections Module

The standard library collections module provides specialized data structures that go beyond the basics.

Counter

A dictionary subclass for counting hashable objects.
from collections import Counter

colors = ["red", "blue", "red", "green", "blue", "blue"]
c = Counter(colors)

print(c) # Counter({'blue': 3, 'red': 2, 'green': 1})
print(c.most_common(1)) # [('blue', 3)]

defaultdict

A dictionary that provides a default value for missing keys. This avoids KeyError and the pattern of checking if key in dict before every access. The argument is a factory function (a callable that returns the default value) — not a value itself.
from collections import defaultdict

# Default value is list() -> [] (called each time a missing key is accessed)
groups = defaultdict(list)

groups["A"].append("Alice")  # Key "A" did not exist -- created [], then appended
groups["B"].append("Bob")

# Common patterns:
word_count = defaultdict(int)     # Missing keys default to 0
nested = defaultdict(dict)         # Missing keys default to {}

# Counting pattern -- cleaner than manual if/else
for word in ["apple", "banana", "apple", "cherry", "banana", "apple"]:
    word_count[word] += 1
# defaultdict(<class 'int'>, {'apple': 3, 'banana': 2, 'cherry': 1})
defaultdict Pitfall: Accessing a missing key creates that key with the default value as a side effect. This can cause subtle bugs when you are just trying to check if a key exists.
d = defaultdict(list)
if d["missing"]:          # This check CREATES the key "missing" with value []
    print("found")
print(dict(d))            # {"missing": []} -- the key now exists!

# Use "in" to check without side effects:
if "missing" in d:
    print("found")

Choosing the Right Data Structure

NeedUseWhy
Ordered collection, frequent appendslistO(1) append, O(1) index access
Fast lookup by keydictO(1) average lookup by key
Unique elements, membership testingsetO(1) membership test, auto-dedup
Fixed record, dictionary keytupleImmutable, hashable, memory-efficient
Queue (FIFO)collections.dequeO(1) append and popleft
Count occurrencescollections.CounterBuilt-in counting with most_common()
Group items by keycollections.defaultdictAuto-creates missing keys

Summary

  • Lists: Ordered, mutable. Use for sequences. Watch out for O(n) inserts at the front.
  • Dicts: Key-Value with O(1) lookup. Insertion-ordered since Python 3.7. Use .get() for safe access.
  • Sets: Unique elements with O(1) membership testing. Cannot use {} for an empty set.
  • Tuples: Immutable. Use for fixed records and dictionary keys. Consider NamedTuple for readability.
  • Comprehensions: Faster than manual loops. Keep them to one or two levels of nesting.
  • Shallow vs Deep: .copy() and slicing create shallow copies. Use copy.deepcopy() for nested structures.
Next, we’ll organize our code using Object-Oriented Programming.

Interview Deep-Dive

Strong Answer:
  • Python dictionaries are implemented as hash tables. When you insert a key, Python computes the key’s hash (via __hash__), uses that hash to determine a slot index in an internal array, and stores both the key and value there. Lookups work the same way: hash the key, jump to the slot, compare for equality (via __eq__). This gives O(1) average-case lookups, insertions, and deletions.
  • Hash collisions are inevitable — two different keys can produce the same slot index. CPython uses open addressing with probing (not chaining with linked lists like Java’s HashMap). When a collision occurs, it probes to the next available slot using a perturbation scheme that mixes in higher bits of the hash to distribute entries more evenly.
  • Before Python 3.6, dictionaries did not preserve insertion order. The internal layout was a single sparse array where most slots were empty (hash tables need ~33-66% empty space to maintain O(1) performance). This wasted memory because each empty slot still reserved space for a hash, key pointer, and value pointer.
  • In Python 3.6, the implementation was redesigned into a compact dict. It uses two arrays: a dense array that stores entries in insertion order (hash, key, value), and a sparse indices array that maps hash slots to positions in the dense array. This reduced memory usage by 20-25% and made insertion order preservation a side effect of the implementation. In Python 3.7, this ordering guarantee was elevated from an implementation detail to a language specification.
  • The practical implication is that modern Python dicts are simultaneously a hash map and an ordered sequence, which eliminated most use cases for collections.OrderedDict. However, OrderedDict still has a niche: it supports move_to_end(), its equality comparison considers order (two OrderedDicts with the same keys in different order are not equal, whereas regular dicts with the same key-value pairs are always equal regardless of insertion order), and it is a subclass you can use for type-based dispatch.
Follow-up: What makes a good dictionary key, and what happens if you use a mutable object as a key?
  • A dictionary key must be hashable, which means it must implement __hash__ and __eq__, and the hash must remain constant for the object’s lifetime. All immutable built-in types (str, int, float, tuple of immutables, frozenset) are hashable. Mutable types (list, dict, set) are not hashable because their contents can change, which would change their hash after insertion, making the key unfindable in the table.
  • If you define a custom class, instances are hashable by default — they inherit __hash__ from object, which returns a value derived from id() (memory address). But if you define __eq__, Python automatically sets __hash__ = None (making instances unhashable) because the contract requires that objects which compare equal must have the same hash. You must explicitly define a consistent __hash__ if you define __eq__ and want your objects to be usable as dict keys.
  • A subtle production gotcha: using floats as dict keys. hash(1) == hash(1.0) and 1 == 1.0, so d = {1: "int"}; d[1.0] = "float" overwrites the entry. This can cause extremely confusing bugs when mixing int and float keys.
Strong Answer:
  • The choice is driven by two axes: mutability (do you need to modify it?) and semantics (what does the data represent?).
  • Lists are for ordered, mutable sequences where you might append, remove, or reorder elements. They are backed by a dynamic array (contiguous memory with amortized O(1) append, O(n) insert/delete at arbitrary positions). Use them when the data is a collection of similar things: a list of user IDs, a queue of tasks, search results.
  • Tuples are for ordered, immutable records. They use less memory than lists (no over-allocation for potential growth) and are hashable (so they can be dict keys or set members). Use them when the data is a fixed structure: a coordinate (x, y), a database row, or a function returning multiple values. The immutability is a semantic signal: “this will not change.”
  • Sets are for unordered collections of unique elements backed by a hash table. They provide O(1) membership testing (x in s), which is dramatically faster than x in list (O(n)). Use them for deduplication, membership checks, and mathematical set operations (union, intersection, difference).
  • Performance comparison for membership testing: checking x in my_list scans every element — 1 million elements means up to 1 million comparisons. x in my_set does a hash lookup — constant time regardless of size. I have seen production code with response times drop from 800ms to 2ms by converting a list to a set for a membership check inside a loop. This is one of the most common performance wins in Python code reviews.
  • A nuance: tuples are not just “immutable lists.” They have different optimization paths in CPython. Small tuples (length 1-20) are cached in a free list, so creating them is faster than creating lists. Tuple packing/unpacking is also used internally by Python for multiple assignment, function returns, and the iteration protocol, making it a first-class citizen in the bytecode.
Follow-up: What is the time complexity of list.sort() versus sorted(), and when would you pick one over the other?
  • Both use Timsort, a hybrid merge-sort/insertion-sort algorithm with O(n log n) worst case and O(n) best case (when the data is already partially sorted). Timsort is specifically designed to exploit existing order in real-world data, which is why it was adopted by Java and Android as well.
  • list.sort() sorts in place and returns None. It is more memory-efficient because it does not allocate a new list. sorted() returns a new sorted list, leaving the original unchanged. Use sort() when you own the list and do not need the original order. Use sorted() when you need to preserve the original or when you are sorting a non-list iterable (generator, set, dict keys).
  • A subtle point: list.sort() is slightly faster because it can use an optimized path that temporarily decouples the list from the interpreter (the list is marked as “being sorted” and raises ValueError if you try to mutate it during sort). sorted() must first materialize the iterable into a list and then sort that.
Strong Answer:
  • A list comprehension ([x**2 for x in range(n)]) eagerly evaluates every element and stores them all in memory as a list. A generator expression ((x**2 for x in range(n))) produces elements one at a time, on demand. The generator holds only the current state (roughly the loop variable and the expression to evaluate), not the entire result set.
  • For range(100), the difference is negligible — 100 integers is roughly 800 bytes. But for range(100_000_000), the list comprehension allocates about 800MB of memory, while the generator expression uses a constant ~100 bytes regardless of how many elements it will produce. This is the difference between your process running normally and getting OOM-killed in production.
  • The memory difference matters when three conditions are met: (1) the input is large, (2) you only need to iterate once, and (3) you do not need random access or the len(). Processing lines from a large log file, streaming rows from a database cursor, or piping data through a transformation chain are all generator territory.
  • However, generators have overhead per element (they must suspend and resume the frame on each yield/next() call), so for small collections, list comprehensions are actually faster. If you are iterating over the result multiple times, a list is also better — generators are exhausted after one pass.
  • A common pattern in production data pipelines is chaining generators: filtered = (line for line in lines if "ERROR" in line); parsed = (json.loads(line) for line in filtered); timestamps = (entry["ts"] for entry in parsed). Each step adds near-zero memory overhead, and no intermediate lists are created. The entire pipeline processes one line at a time, which means you can process a 50GB log file in constant memory.
Follow-up: What is the yield from syntax, and how does it differ from iterating and yielding manually?
  • yield from iterable delegates iteration to a sub-generator or any iterable. Instead of writing for item in iterable: yield item, you write yield from iterable. It is not just syntactic sugar — it properly propagates .send(), .throw(), and .close() calls to the sub-generator, which is essential for coroutine-based code.
  • Without yield from, if you have a recursive generator (like flattening a nested structure), you lose the ability to send values into or throw exceptions into the inner generator. yield from creates a transparent bidirectional channel between the caller and the sub-generator.
  • Practically, yield from is also slightly faster than a manual loop because it avoids the overhead of repeatedly calling __next__ on the sub-iterator from Python code — the delegation happens at the C level in CPython. For deeply nested generators (tree traversal, recursive flattening), this performance difference compounds.
Strong Answer:
  • The first question is: what does “slow” mean and where is the bottleneck? If the current code is [u for u in users if u["city"] == target_city], that is O(n) — a single pass over 10 million records. That is about 1-2 seconds in pure Python, which may or may not be acceptable depending on the context.
  • If this query runs frequently with different cities, the answer is to build an index. A defaultdict(list) keyed by city, built once: by_city = defaultdict(list); for u in users: by_city[u["city"]].append(u). Now lookups are O(1) for the hash lookup plus O(k) to retrieve k matching records. This is essentially what a database index does.
  • If the data does not fit in memory, this is no longer a data-structure problem — it is an architecture problem. Move the data to a database (PostgreSQL, SQLite for local use) and let it handle indexing, query optimization, and disk-based storage. Python’s sqlite3 module is built in and handles millions of rows efficiently.
  • If the data is in memory and you need maximum speed, consider Pandas or Polars. df[df["city"] == target_city] in Pandas uses vectorized C operations and is 10-50x faster than a Python loop. Polars is even faster because it uses Rust and supports lazy evaluation with query optimization.
  • The meta-lesson: the answer to “how do I make this Python code faster” is almost never “rewrite the loop to be cleverer.” It is usually “use the right data structure” (dict/set for lookups), “use a vectorized library” (NumPy/Pandas/Polars), or “use a database.” Optimizing the algorithm and data access pattern gives orders-of-magnitude improvements. Micro-optimizing Python syntax gives single-digit percentage improvements.
Follow-up: What is the difference between collections.defaultdict and using dict.setdefault(), and when would you choose one over the other?
  • defaultdict(list) creates the default value automatically on any missing key access, including reads via d[key]. dict.setdefault(key, []) only creates the default when you explicitly call setdefault. The behavioral difference is that defaultdict modifies itself on read — accessing d[missing_key] adds that key to the dict, which can be surprising.
  • setdefault is better for one-off cases where you want to populate a default and immediately use it: d.setdefault(key, []).append(value). defaultdict is better when you are building up a data structure in a loop and want clean syntax throughout.
  • A gotcha with defaultdict: if you pass it to code that checks key in d after accidentally accessing d[key], the key will now exist (with the default value). This can mask bugs. Converting a defaultdict to a regular dict before passing it to external code is a defensive practice.