Documentation Index
Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
Use this file to discover all available pages before exploring further.
Databases for AI Engineers
If the LLM is the brain of your AI application, the database is the filing cabinet that makes it useful. Without a database, every conversation starts from scratch, every document is re-processed on each request, and you have no way to tell User A’s data from User B’s. AI applications need databases for:- User data: Authentication, preferences, usage tracking
- Vector storage: Embeddings with pgvector — this is the one that separates AI apps from traditional ones
- Conversations: Chat history, messages (because users expect “remember what I said earlier”)
- Documents: Uploaded files, chunks, metadata
Setup
SQLAlchemy Basics
Connection Setup
Async Connection (Recommended)
Async is recommended because AI applications spend most of their time waiting — waiting for the LLM API, waiting for embedding generation, waiting for vector search. Async connections let your server handle other requests during those waits instead of blocking a thread. Think of it as the difference between a restaurant with one waiter who stands at your table until your food arrives vs. one who takes your order and serves other tables while the kitchen works.Defining Models
CRUD Operations
CRUD (Create, Read, Update, Delete) is the bread and butter of database work. If you have used any web framework before, this will feel familiar. The key difference in AI applications is that you will often need to create records in bulk (hundreds of embedding chunks at once) and read with complex filters (vector similarity search with user scoping). The patterns below cover both simple and batch operations.Create
Read
Update
Delete
Repository Pattern
The repository pattern puts all database logic for a model in one place, keeping your API routes clean. Without it, you end up with SQL scattered across your route handlers, making it impossible to test database logic independently or swap storage backends. Think of it as the “one class per table” rule for database access. Organize database operations cleanly:pgvector for Embeddings
This is the section that makes PostgreSQL an AI database. pgvector adds avector column type and similarity search operators directly to Postgres, so you can store embeddings right next to your relational data. The practical benefit: a single query can find “the 5 most similar document chunks belonging to user X that were uploaded after January 1st” — try doing that with a standalone vector database that knows nothing about your user model.
Essential for AI applications with semantic search:
Vector Similarity Search
Complete RAG Repository
Migrations with Alembic
Transactions
Transactions ensure that a group of database operations either all succeed or all fail — there is no in-between state. This is critical in AI applications when you store a document and its chunks: if the chunk insertion fails halfway through, you don’t want a document record pointing to half its chunks. Thedb.begin() context manager handles this automatically: commit on success, rollback on any exception.
Connection Pooling
Database connections are expensive to create (TCP handshake, authentication, SSL negotiation). A connection pool keeps a set of connections open and reuses them, like a carpool lane for database traffic. Without pooling, every API request opens and closes a connection — under load, this causes connection exhaustion and your database starts rejecting new connections entirely.FastAPI Integration
Complete example:pgvector vs. Dedicated Vector Databases
One of the most common architecture questions in AI engineering: should you use pgvector or a dedicated vector database? Here is an honest comparison.| Factor | pgvector (PostgreSQL) | Dedicated Vector DB (Pinecone, Qdrant, Weaviate) |
|---|---|---|
| Setup complexity | Add an extension to your existing Postgres | New service to deploy, monitor, and pay for |
| Query speed (under 1M vectors) | Good (10-50ms with HNSW index) | Excellent (1-10ms) |
| Query speed (over 10M vectors) | Degrades without careful tuning | Designed for this scale |
| Filtering | Full SQL — JOIN, WHERE, GROUP BY on relational data | Limited metadata filtering, varies by vendor |
| Transactions | ACID transactions with your relational data | Eventual consistency in most cases |
| Cost | Free (part of Postgres) | $70-700+/month depending on scale |
| Backup and recovery | One backup strategy for everything | Separate backup system to manage |
| Hybrid queries | Natural — “find similar chunks for user X uploaded after Jan 1” in one query | Requires pre-filtering or post-filtering, often slower for complex filters |
- Under 1M vectors, simple metadata filters, already using Postgres? Use pgvector. The operational simplicity wins.
- Over 5M vectors, need sub-5ms p99 latency, minimal relational data? Use a dedicated vector DB.
- Between 1M-5M vectors? Benchmark both with your actual queries. The answer depends on your filter complexity and latency requirements.
Common Database Pitfalls in AI Applications
Forgetting to index the user_id filter on vector search. Your similarity search query includesWHERE user_id = $1, but if user_id is not indexed, Postgres does a sequential scan before the vector search. At 100K+ rows, this turns a 20ms query into a 2-second query. Always create a composite index or ensure your HNSW index is partitioned by user.
Using datetime.utcnow in SQLAlchemy defaults. The default=datetime.utcnow in the model definitions above calls utcnow() at class definition time, not at row creation time. Use default=datetime.utcnow (without parentheses, passing the function reference) or better, use server_default=sa.func.now() to let PostgreSQL handle timestamps consistently.
Connection pool exhaustion under load. AI applications make slow external calls (LLM APIs, embedding APIs) while holding database connections. If your pool has 20 connections and 20 requests are all waiting on OpenAI for 5 seconds each, the 21st request gets a pool_timeout error. Either increase the pool size, release connections before making external calls, or use a task queue for long-running operations.
Storing embeddings as JSON instead of vector type. Without the vector column type, you cannot use HNSW indexes or the <=> distance operator. Storing embeddings as JSONB arrays forces you to load every row into Python and compute similarity manually — which is orders of magnitude slower.
Quick Reference
| Operation | Code |
|---|---|
| Create | db.add(obj); await db.commit() |
| Read one | select(Model).where(Model.id == id) |
| Read many | select(Model).limit(10) |
| Update | obj.field = value; await db.commit() |
| Delete | await db.delete(obj) |
| Eager load | .options(selectinload(Model.relation)) |
| Transaction | async with db.begin(): |
Interview Deep-Dive
You are building a RAG application and need to store embeddings. Your team debates using pgvector in PostgreSQL versus a dedicated vector database like Pinecone. What are the trade-offs, and when would you choose each?
You are building a RAG application and need to store embeddings. Your team debates using pgvector in PostgreSQL versus a dedicated vector database like Pinecone. What are the trade-offs, and when would you choose each?
- pgvector wins on operational simplicity. You have one database, one connection string, one backup strategy, one thing that pages you at 3 AM. Your vector search query can join directly against relational data: “find the 5 most similar chunks belonging to user X that were uploaded after January 1st” is a single SQL query. With a dedicated vector DB, you search vectors in one system, then join against Postgres for user filtering, which means two network round trips and application-level data stitching.
- Dedicated vector databases (Pinecone, Weaviate, Qdrant) win on scale and specialized performance. At 10M+ vectors, they offer sub-5ms p99 latency with purpose-built indexing algorithms, horizontal sharding, and managed infrastructure. pgvector with HNSW can handle millions of vectors, but tuning m and ef_construction parameters for that scale requires expertise, and you are limited to vertical scaling on a single Postgres instance.
- The decision framework is straightforward: under 1M vectors with existing Postgres and moderate latency requirements, use pgvector. The 20ms query time is fine for most applications, and you avoid the operational overhead of a second database. Over 5M vectors with sub-5ms latency requirements and minimal relational join needs, use a dedicated vector DB. Between 1M-5M, benchmark with your actual data and queries.
- The hidden cost of dedicated vector DBs is data synchronization. When a user deletes a document, you need to delete its chunks from both Postgres (metadata) and Pinecone (vectors). If the Pinecone delete fails, you have orphaned vectors that return results for a deleted document. With pgvector, a single CASCADE DELETE handles both. This operational burden is real and often underestimated.
EXPLAIN ANALYZE on your query. If you see “Seq Scan,” create the HNSW index: CREATE INDEX ON document_chunks USING hnsw (embedding vector_cosine_ops) WITH (m=16, ef_construction=64). Second, check your WHERE filters. If you filter by user_id without a separate B-tree index on that column, Postgres might skip the HNSW index entirely. Create a composite approach: index user_id with B-tree, embeddings with HNSW, and let the query planner combine them. Third, check your ef_search parameter at query time (SET hnsw.ef_search = 40) — higher values improve recall but increase latency. At 500K vectors with a proper HNSW index, queries should be 10-30ms.Explain the N+1 query problem in the context of an AI application that loads conversations with their messages. How do you detect it and fix it with SQLAlchemy?
Explain the N+1 query problem in the context of an AI application that loads conversations with their messages. How do you detect it and fix it with SQLAlchemy?
- The N+1 problem is the most common database performance bug. You load 50 conversations with
select(Conversation).limit(50)— that is 1 query. Then for each conversation, you accessconversation.messages, which triggers a lazy load — that is 50 more queries. Total: 51 queries when you needed 2. At 5ms per query, that is 255ms instead of 10ms. In an AI application serving real-time chat, this latency is unacceptable. - Detection: enable SQLAlchemy’s echo mode (
echo=Trueon the engine) during development and watch the SQL logs. If you see repeated SELECT statements with different IDs, you have an N+1. In production, use SQLAlchemy’sjoinedloadwarnings or a query counter middleware that flags any request making more than 10 database queries. - Fix with eager loading. SQLAlchemy offers three strategies:
selectinload(issues a second query withIN (id1, id2, ...)to fetch all related records at once — this is the safest default),joinedload(uses a JOIN to fetch in a single query — efficient for one-to-one relationships but can explode result sets for one-to-many), andsubqueryload(uses a subquery). For conversations with messages,selectinloadis the right choice:select(Conversation).options(selectinload(Conversation.messages)).limit(50). This executes 2 queries total regardless of how many conversations you load. - In AI applications specifically, the N+1 trap often hides in the document-chunk relationship. Loading a user’s documents, then accessing chunks for each document to run similarity search, can trigger hundreds of queries. Always eager-load the relationships you know you will access.
ROW_NUMBER() OVER (PARTITION BY conversation_id ORDER BY created_at DESC) <= 10. Alternatively, load conversations first, then batch-load recent messages with a separate query filtered by conversation IDs and ordered with a LIMIT. This avoids loading 10,000 objects and instead loads 500 (50 conversations times 10 messages each).Your AI application stores documents, chunks, and embeddings. A user uploads a 100-page PDF that gets split into 500 chunks. The chunk insertion fails at chunk 347 due to a database error. What happens to chunks 1-346, and how do you design this to be robust?
Your AI application stores documents, chunks, and embeddings. A user uploads a 100-page PDF that gets split into 500 chunks. The chunk insertion fails at chunk 347 due to a database error. What happens to chunks 1-346, and how do you design this to be robust?
- Without explicit transaction management, what happens depends on your session configuration. If autocommit is off (the SQLAlchemy default), chunks 1-346 are in the session’s uncommitted buffer. The error on chunk 347 means the entire session needs to be rolled back — chunks 1-346 are discarded. The document record might have been committed earlier in a separate transaction, leaving an orphaned document with no chunks. This is the worst outcome: inconsistent state.
- The correct design wraps the entire operation in a single transaction: create the document record, generate all embeddings, insert all chunks, and commit once. If any step fails, everything rolls back to a clean state. In SQLAlchemy:
async with db.begin():as a context manager that auto-commits on success and auto-rolls-back on exception. - The subtlety is that embedding generation is an external API call that should NOT be inside the transaction. You do not want to hold a database transaction open while waiting 10 seconds for OpenAI to return embeddings. The pattern is: (1) create the document with status=“processing” and commit, (2) generate embeddings outside any transaction, (3) open a new transaction, insert all chunks with embeddings, update document status to “ready”, and commit. If step 2 or 3 fails, the document stays in “processing” status and can be retried.
- For extra robustness, use
db.flush()instead ofdb.commit()to get the document ID without committing, then commit everything at the end. But be aware that flush still executes SQL — it just does not commit the transaction. If your embedding call fails after flush, you still need to rollback.
pool_timeout (default 30 seconds) and eventually throws an error. The fix is to never hold a database connection during external API calls. The pattern is: acquire connection, create document, release connection, call OpenAI for embeddings (no DB connection held), acquire connection again, insert chunks, release. In FastAPI with SQLAlchemy async, this means structuring your endpoint as multiple async with session: blocks rather than one long session. Alternatively, offload the entire upload pipeline to a background task queue (Celery, Dramatiq, or even FastAPI’s BackgroundTasks) so the API endpoint returns immediately with a “processing” status and the worker handles the slow parts without holding up the connection pool.