# Modules & Packages

> Imports, Standard Library, and Virtual Environments

Real-world Python projects aren't single files. They are split across multiple files (modules) and directories (packages). Understanding how to organize code and manage dependencies is crucial.

Think of modules as individual LEGO bricks -- each one does something specific. Packages are the organized trays that hold related bricks together. And `pip` is the store where you buy bricks other people made. Without this organization system, every Python project would be a single 10,000-line file, and collaboration would be impossible.

***

## 1. Modules

A **Module** is simply a Python file (`.py`). Any Python file can be imported by another.

```python
# math_utils.py
def add(a, b):
    return a + b

PI = 3.14159
```

### Importing

You can import the whole module or specific parts.

```python
# main.py

# Option 1: Import entire module
import math_utils
print(math_utils.add(1, 2))

# Option 2: Import specific items (Cleaner)
from math_utils import add, PI
print(add(1, 2))

# Option 3: Alias (Common for libraries like pandas as pd)
import math_utils as mu
print(mu.PI)
```

### `__name__ == "__main__"`

This is a common idiom. It checks if the file is being run directly (like `python main.py`) or imported as a module.

```python
# utils.py
def helper():
    return "I am helpful"

if __name__ == "__main__":
    # This block runs ONLY when you execute: python utils.py
    # It does NOT run when another file does: import utils
    print(helper())
    print("Running tests or demos here...")
```

This pattern is useful for putting test code, demos, or CLI entry points at the bottom of a module without affecting importers. You will see it in virtually every well-structured Python project.
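To see the two values of `__name__` side by side, here is a minimal sketch. It writes a throwaway module into a temporary directory, imports it, and then runs the same file as a script using the standard library's `runpy`. The file name `demo_utils.py` and the variable `SEEN_NAME` are hypothetical names chosen for illustration.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical one-line module: it records whatever __name__ is at load time.
MODULE_SOURCE = "SEEN_NAME = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "demo_utils.py"
    module_path.write_text(MODULE_SOURCE, encoding="utf-8")

    # Case 1: import the file as a module.
    sys.path.insert(0, tmp)
    try:
        imported = importlib.import_module("demo_utils")
        imported_name = imported.SEEN_NAME
    finally:
        # Undo the changes so the demo leaves no trace behind.
        sys.path.remove(tmp)
        sys.modules.pop("demo_utils", None)

    # Case 2: run the same file as a script, roughly what
    # `python demo_utils.py` does.
    script_globals = runpy.run_path(str(module_path), run_name="__main__")
    script_name = script_globals["SEEN_NAME"]

print(imported_name)  # demo_utils
print(script_name)    # __main__
```

The same source file sees a different `__name__` depending on how it was loaded, which is exactly what the `if __name__ == "__main__":` guard tests.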
**Import Pitfall: Circular Imports** -- If module A imports module B, and module B imports module A, you get a circular import error (or worse, silently incomplete modules). This is one of the most common structural bugs in growing Python projects.

```python
# BROKEN: circular dependency
# models.py
from services import process_user  # imports services, which imports models...

# services.py
from models import User  # imports models, which imports services...

# FIX 1: Import inside the function (deferred import)
# services.py
def handle_request():
    from models import User  # Import happens at call time, not module load time
    return User()

# FIX 2: Restructure to break the cycle (better long-term solution)
# Extract shared code into a third module that both can import.
```

***

## 2. Packages

A **Package** is a directory containing Python modules. A regular package includes an `__init__.py` file (since Python 3.3, a directory without one can still be imported as a "namespace package," but including the file is still strongly recommended -- it is the explicit signal that "this directory is a Python package").

```text
my_package/
    __init__.py       # Marks this folder as a package (can be empty)
    module1.py
    module2.py
    sub_package/
        __init__.py
        helpers.py
```

```python
# Absolute imports (preferred -- always unambiguous)
from my_package import module1
from my_package.module2 import some_function
from my_package.sub_package.helpers import utility

# Relative imports (use within a package to refer to siblings)
# In my_package/module2.py:
from .module1 import something     # . means "current package"
from .sub_package import helpers   # Import from a sub-package
from ..other_package import thing  # .. means "parent package"; valid only if
                                   # my_package is itself nested inside another
                                   # package (use sparingly)
```

**Best Practice**: Prefer absolute imports over relative imports. Absolute imports are always unambiguous about where a module lives. Relative imports can be confusing and break if you reorganize your package structure.
The main exception is within tightly coupled sub-packages where relative imports clarify "this is an internal dependency."

### What goes in `__init__.py`?

The `__init__.py` file runs when the package is imported. Use it to define the package's public API:

```python
# my_package/__init__.py
from .module1 import PublicClass
from .module2 import public_function

# Now users can do: from my_package import PublicClass
# Instead of: from my_package.module1 import PublicClass

# Control what "from my_package import *" exports:
__all__ = ["PublicClass", "public_function"]
```

***

## 3. The Standard Library

Python is famous for being "Batteries Included": it ships with a large standard library.

### `pathlib` (Modern File Paths)

Stop using `os.path.join`. Use `pathlib`. It treats file paths as objects rather than strings, which makes path manipulation safer and more readable. It works on Windows, Mac, and Linux seamlessly -- no more worrying about `/` vs `\`.

```python
from pathlib import Path

# Create paths using the / operator (reads naturally)
p = Path("data") / "subdir" / "file.txt"

# Read and write files directly (no open() needed for simple cases)
if p.exists():
    content = p.read_text(encoding="utf-8")  # Always specify encoding!

# write_text does not create missing directories, so create them first
p.parent.mkdir(parents=True, exist_ok=True)
p.write_text("new content", encoding="utf-8")

# Find files with glob patterns
for py_file in Path(".").glob("**/*.py"):  # ** means recursive
    print(py_file)

# Useful properties
print(p.name)       # file.txt
print(p.stem)       # file (name without extension)
print(p.suffix)     # .txt
print(p.parent)     # data/subdir (data\subdir on Windows)
print(p.resolve())  # Absolute path
```

### `json` (Data Serialization)

JSON is the language of the web. Python handles it natively.
```python
import json

data = {"name": "Alice", "age": 30}

# Serialize to String
json_str = json.dumps(data)

# Save to File
with open("data.json", "w") as f:
    json.dump(data, f)
```

### `datetime` (Dates & Times)

```python
from datetime import datetime, timedelta

now = datetime.now()
tomorrow = now + timedelta(days=1)
print(now.strftime("%Y-%m-%d"))
```

***

## 4. Virtual Environments (`venv`)

**The Golden Rule of Python**: Never install packages globally. Always use a virtual environment.

A virtual environment creates an isolated folder for your project's dependencies. Think of it as giving each project its own private copy of Python and its libraries. This prevents "Dependency Hell" where Project A needs `requests==2.28` and Project B needs `requests==2.31`, and installing one breaks the other.

### Setup

```bash
# 1. Create the environment (run once per project)
# This creates a folder named '.venv' (dot prefix keeps it hidden on Unix)
python -m venv .venv

# 2. Activate it (run each time you open a new terminal)
# Windows:
.venv\Scripts\activate
# Mac/Linux:
source .venv/bin/activate
```

Once activated, your terminal prompt will change (e.g., `(.venv) C:\Project>`). Now, when you run `pip install`, packages go into this folder, not your system Python.

**Virtual Environment Pitfalls:**

* Always add `.venv/` (or `venv/`) to your `.gitignore`. Virtual environments are machine-specific and should never be committed to version control.
* If you rename or move your project folder, the virtual environment may break because it contains hardcoded absolute paths. The fix is to delete it and recreate: `python -m venv .venv`.
* On some systems, `python` points to Python 2. Use `python3 -m venv .venv` explicitly to ensure you get Python 3.

### Modern Alternative: `uv`

The Python packaging ecosystem is evolving. `uv` (from Astral, the makers of `ruff`) is a fast, Rust-based replacement for `pip`, `venv`, and `pip-tools` combined.
It creates virtual environments, resolves dependencies, and, according to Astral's own benchmarks, installs packages 10-100x faster than pip.

```bash
# Install uv
pip install uv

# Create a venv, then install dependencies into it
uv venv
uv pip install requests fastapi

# uv can also compile a fully pinned requirements.txt for reproducible builds
uv pip compile requirements.in -o requirements.txt
```

***

## 5. Package Management (`pip`)

`pip` is the package installer for Python. It fetches packages from PyPI (Python Package Index).

```bash
# Install a package
pip install requests

# List installed packages
pip list

# Save your dependencies to a file
pip freeze > requirements.txt

# Install dependencies from a file (Crucial for collaboration)
pip install -r requirements.txt
```

### Example: Using `requests`

`requests` is one of the most popular Python libraries. It makes HTTP requests simple.

```python
import requests

# requests has no default timeout, so set one to avoid hanging forever
response = requests.get("https://api.github.com", timeout=10)
print(response.status_code)
print(response.json())
```

***

## Summary

* **Modules**: `.py` files.
* **Packages**: Folders with `__init__.py`.
* **Standard Library**: Learn `pathlib`, `json`, `datetime`.
* **venv**: Always isolate your projects.
* **pip**: The tool to install external libraries.

Next, we'll tackle **Advanced Python** concepts like decorators and async programming.

***

## Interview Deep-Dive

**Strong Answer:**

* When you write `import foo`, Python executes a multi-step process. First, it checks `sys.modules` -- a cache of all previously imported modules. If `foo` is already there, it returns the cached module object immediately. No file is read, no code is executed. This is why importing the same module in 50 files does not run the module code 50 times.
* If the module is not cached, Python searches for it using `sys.path` -- an ordered list of directories.
  `sys.path` includes: the directory of the script being run, directories set in the `PYTHONPATH` environment variable, the standard library paths, and the `site-packages` directory (where pip installs packages). Python searches these in order, and the first match wins.
* Once found, Python compiles the module to bytecode (if a cached `.pyc` is not up to date), executes the module's top-level code (all statements at the module level run during import), and stores the resulting module object in `sys.modules`. This is why putting side effects (print statements, database connections, API calls) at the module level is dangerous -- they execute on import, which might be at test collection time, CI startup, or other unexpected moments.
* Absolute imports (`from package.sub import module`) use the full path from the project root. Relative imports (`from .sub import module` or `from ..sibling import func`) use dots to navigate relative to the current package. Relative imports only work inside packages (not in top-level scripts). One dot means "current package," two dots mean "parent package."
* A common production issue: circular imports. Module A imports module B, and module B imports module A. This does not always fail -- Python handles it by returning a partially-initialized module from `sys.modules`. But if B tries to access a name from A that has not been defined yet (because A's top-level code has not finished executing), you get an `ImportError` or `AttributeError`. The fix is to restructure the code (extract shared code to a third module), use lazy imports (import inside the function that needs it), or use `TYPE_CHECKING` blocks for type-hint-only imports.

**Follow-up: What is `if __name__ == "__main__"` actually doing, and what is the `__name__` variable?**

* Every Python module has a `__name__` attribute. When a file is run directly (`python script.py`), `__name__` is set to the string `"__main__"`.
  When the same file is imported as a module (`import script`), `__name__` is set to the module's qualified name (e.g., `"script"` or `"package.script"`).
* The `if __name__ == "__main__"` guard prevents code from running when the module is imported. Without it, any top-level code (test runs, demo output, server startup) would execute on import, which breaks test discovery, IDE introspection, and module reuse.
* This pattern also makes your module both a library and a script. The functions and classes are importable by other code, and the `__main__` block provides a command-line entry point. This is a foundational Python pattern that every production module should use for any executable behavior.

**Strong Answer:**

* `pip` is the standard package installer that ships with Python. It installs packages from PyPI and supports `requirements.txt` for reproducibility. Its historical weakness was dependency resolution: before the backtracking resolver became the default in pip 20.3, if package A needed `requests>=2.28` and package B needed `requests<2.25`, pip could silently install a version that broke one of them. Modern pip reports such conflicts, but it still has no lockfile of its own and does not distinguish between direct dependencies and transitive dependencies, making a frozen `requirements.txt` a flat dump that is hard to audit.
* `poetry` provides a full project management experience: dependency resolution, lockfiles (`poetry.lock`), virtual environment management, and package building/publishing. It uses `pyproject.toml` for configuration (Poetry 2.0 and later support the standard PEP 621 `[project]` table; earlier versions use a `[tool.poetry]` table). Its dependency resolver is deterministic -- it computes a locked set of versions that satisfy all constraints and records them. The downside is that Poetry is opinionated (it manages your virtualenv for you, which can conflict with Docker workflows) and its resolution can be slow for complex dependency trees.
* `pipenv` was an earlier attempt at combining pip and virtualenv with a `Pipfile`/`Pipfile.lock` workflow. It fell out of favor due to slow resolution, inconsistent maintenance, and confusing behavior around lock file generation.
  Many teams have since migrated to Poetry or uv.
* `uv` is the newest contender, built in Rust by the creators of `ruff`. It is designed as a drop-in replacement for pip and pip-tools, and Astral's benchmarks report it as 10-100x faster. It handles dependency resolution, virtual environment creation (`uv venv`), and lockfile generation (`uv lock`). It supports `pyproject.toml` and is increasingly recommended for new projects due to its speed and compatibility.
* For a new production project today, I would use `uv` for speed and simplicity, with `pyproject.toml` for project metadata. For teams already on Poetry with established workflows, there is no urgency to migrate. The key principle is: always have a lockfile (deterministic builds), always separate direct dependencies from transitive ones, and always use a virtual environment.

**Follow-up: What is the difference between `requirements.txt` and a lockfile, and why does it matter for production deployments?**

* A hand-maintained `requirements.txt` with pinned versions (`requests==2.31.0`) typically pins the direct dependencies but not the exact versions of their transitive dependencies. If `requests` depends on `urllib3`, and you do not pin `urllib3`, you might get different versions on different machines or at different times, leading to "works on my machine" problems.
* A lockfile (Poetry's `poetry.lock`, uv's `uv.lock`) records the exact version of every package in the dependency tree -- direct and transitive -- along with content hashes. This guarantees that `uv sync` on your laptop, in CI, and on the production server installs byte-for-byte identical packages. Without a lockfile, you are one `pip install` away from a surprise breaking change in a transitive dependency.
* The practical workflow: `pyproject.toml` declares what you need (loose constraints), the lockfile records what you got (exact versions). Developers update constraints in `pyproject.toml`, regenerate the lockfile, test, and commit both. Production deployments install from the lockfile only.
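To make the direct-versus-transitive distinction concrete, here is a small sketch that walks a dependency graph and separates what you asked for from what came along with it. The graph is hand-written for illustration (real package metadata also carries version ranges and platform markers), and `dependency_closure` is a hypothetical helper, not part of pip, Poetry, or uv.

```python
def dependency_closure(direct, graph):
    """Return every package reachable from the direct dependencies."""
    seen = set()
    stack = list(direct)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        # Queue this package's own requirements for a later visit.
        stack.extend(graph.get(name, ()))
    return seen


# Illustrative graph: package -> packages it requires (versions omitted).
TOY_GRAPH = {
    "requests": ["urllib3", "certifi", "idna", "charset-normalizer"],
    "urllib3": [],
    "certifi": [],
    "idna": [],
    "charset-normalizer": [],
}

direct = {"requests"}
locked = dependency_closure(direct, TOY_GRAPH)
transitive = locked - direct

print(sorted(direct))      # ['requests']
print(sorted(transitive))  # ['certifi', 'charset-normalizer', 'idna', 'urllib3']
```

A requirements file that pins only the direct set fixes one version here; a lockfile fixes all five, which is why the two behave so differently over time.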
**Strong Answer:**

* Historically, `__init__.py` was required to mark a directory as a Python package. Without it, Python would not recognize the directory as importable. The file is executed when the package is first imported, and the resulting module object becomes the package itself. So `import mypackage` runs `mypackage/__init__.py` and gives you access to whatever names are defined there.
* `__init__.py` serves several practical purposes: it controls the public API of the package (by importing select names from submodules), it can run package initialization code (setting up logging, loading configuration), and it defines `__all__` (which controls what `from package import *` exports). A common pattern is to import the most-used classes into `__init__.py` so users can write `from mypackage import MyClass` instead of `from mypackage.submodule import MyClass`.
* Python 3.3 introduced namespace packages (PEP 420), which allow packages without `__init__.py`. The motivation was to allow a single logical package to be split across multiple directories or distributions. For example, the `google` namespace package lets `google-cloud-storage` and `google-auth` both install into the `google/` directory without conflicting `__init__.py` files.
* In practice, most application code should still use `__init__.py`. Namespace packages are primarily for large library ecosystems (Google Cloud, Azure SDK) where multiple independent teams publish sub-packages under a shared namespace. Omitting `__init__.py` in application code causes confusion and breaks some tooling (test discovery, IDE imports, some linters).
* A production best practice: keep `__init__.py` files minimal. Heavy initialization code (database connections, config parsing) should not live there because it runs on import. If importing a package triggers a database connection, you cannot import it in tests, scripts, or type-checking contexts without that side effect.
  Lazy initialization patterns or explicit `init()` functions are better.

**Follow-up: What is the `__all__` variable and how does it affect imports?**

* `__all__` is a list of strings that defines the public API of a module. It controls two things: what `from module import *` exports, and what tools like `mypy` and IDEs consider "public."
* Without `__all__`, `from module import *` imports every name that does not start with an underscore. With `__all__ = ["ClassA", "function_b"]`, only those specific names are imported. This is important for packages with many internal helper functions that should not leak into the user's namespace.
* `__all__` does not prevent direct access -- `from module import _private_func` still works. It is a convention, not an access control mechanism. But it is respected by documentation generators (Sphinx), linters, and IDE autocompletion, making it a valuable tool for API design.

**Strong Answer:**

* Step one: understand the current state. Run `pip list` on the production machine (or whichever machine has the "working" environment) to see the exact set of installed packages. Save this with `pip freeze > requirements-snapshot.txt`. This is your baseline -- it captures the exact versions that are known to work, including transitive dependencies.
* Step two: create a virtual environment on a development machine. Run `python -m venv .venv`, activate it, and install from the snapshot: `pip install -r requirements-snapshot.txt`. Run the full test suite and verify the application works identically. If there are no tests, run the application manually and verify key flows.
* Step three: separate direct dependencies from transitive ones. Go through the snapshot and identify which packages the code actually imports (grep the codebase for `import` statements). Create a `pyproject.toml` or `requirements.in` with only the direct dependencies and their version constraints.
  Use `pip-compile` (from pip-tools) or `uv pip compile` to regenerate a lockfile from the direct dependencies. Verify the resolved versions match the working snapshot.
* Step four: set up the virtual environment in CI/CD. The build pipeline should create a fresh venv and install from the lockfile. If the deployment is containerized (Docker), the Dockerfile should create a venv inside the container -- even in Docker, a venv keeps system Python clean and makes it clear what is application code versus OS dependencies.
* Step five: document the process and add it to the contribution guide. The most common reason projects end up in this state is that the setup process was never documented, so each developer improvised. A `Makefile` or `justfile` with targets like `make setup`, `make test`, `make lock` prevents regression.
* The key principle: do not change anything in production until the new setup is proven equivalent. Freeze the current state first, reproduce it in isolation, and only then start improving.

**Follow-up: What is the difference between `pip freeze` output and a proper lockfile?**

* `pip freeze` outputs every installed package at its current version, but it does not record which packages are direct dependencies versus transitive, it does not include content hashes (so you cannot verify package integrity), and it does not record the Python version or platform constraints. And once the file is edited by hand so that some packages are unpinned or given ranges, two different runs of `pip install -r requirements.txt` can resolve those packages to different versions.
* A proper lockfile (from pip-tools, Poetry, or uv) records the full dependency graph with hashes, distinguishes direct from transitive dependencies, and is deterministic -- installing from it produces the exact same environment every time. The lockfile is what you deploy; the direct dependency list is what you edit.