# Chapter 9: Deployment & Production

> Deploying and running your NestJS app in production requires careful planning. This chapter covers Dockerization, CI/CD, environment management, health checks, logging, monitoring, scaling, and troubleshooting. We'll walk through practical steps and explain how to make your app production-ready.

***

## 9.1 Preparing for Production

Before deploying, ensure your application is production-ready.

### Production Checklist

**Environment Configuration:**

* Set `NODE_ENV=production`
* Use environment variables for all secrets
* Remove hardcoded credentials
* Validate all environment variables

**Security:**

* Restrict CORS to specific origins
* Set secure HTTP headers (helmet)
* Use HTTPS
* Validate all inputs
* Enable rate limiting

**Performance:**

* Build optimized bundle (`npm run build`)
* Remove dev dependencies
* Enable compression
* Optimize database queries
* Use connection pooling

**Monitoring:**

* Health checks configured
* Logging set up
* Error tracking (Sentry, etc.)
* Metrics collection

**Testing:**

* All tests passing
* Tested in staging environment
* Load testing completed
* Security audit done

### Deployment Target Comparison

| Target                             | Setup Complexity | Scaling               | Cost Model          | Best For                           |
| ---------------------------------- | ---------------- | --------------------- | ------------------- | ---------------------------------- |
| **VPS (DigitalOcean, Linode)**     | Low              | Manual (PM2 cluster)  | Fixed monthly       | Small projects, tight budgets      |
| **PaaS (Railway, Render, Fly.io)** | Very Low         | Auto-scaling          | Per-usage           | Startups, quick deployment         |
| **AWS ECS / Fargate**              | Medium           | Auto-scaling          | Per-task-second     | AWS shops, moderate scale          |
| **Kubernetes (EKS, GKE, AKS)**     | High             | Full control, HPA     | Per-node + overhead | Large teams, complex microservices |
| **Serverless (Lambda + API GW)**   | Medium           | Infinite auto-scaling | Per-invocation      | Event-driven, sporadic traffic     |

**Decision Framework:**

```text theme={null}
How many developers will maintain infrastructure?
  0 (no DevOps) --> PaaS (Railway, Render) or Serverless
  1-2           --> Docker on VPS or AWS ECS
  3+            --> Kubernetes

How predictable is your traffic?
  Steady        --> VPS or containers (predictable cost)
  Spiky         --> Serverless or auto-scaling containers
  Unknown       --> Start with PaaS, migrate when you understand the pattern

Is cold start latency acceptable?
  YES           --> Serverless is fine
  NO            --> Containers (always running)
```

<Warning>
  **NestJS and Serverless**: NestJS can run in AWS Lambda using `@codegenie/serverless-express` (formerly published as `@vendia/serverless-express`), but cold starts typically take 1-3 seconds because NestJS bootstraps the DI container on every cold start. For latency-sensitive APIs, use provisioned concurrency or stick with containers. Serverless NestJS works well for internal tools, webhooks, and background processing where cold starts are acceptable.
</Warning>
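
One common way to limit the cost described in the warning above is to bootstrap once per container and reuse the result across warm invocations. The sketch below shows only that caching pattern in plain TypeScript. `bootstrapApp`, `RequestHandler`, and the event shape are hypothetical stand-ins for whatever your chosen adapter returns, not that adapter's API.

```typescript
// Hypothetical handler type; a real adapter defines its own signature.
type RequestHandler = (event: { path: string }) => Promise<{ statusCode: number; body: string }>;

// Counts bootstraps so the effect of caching is observable.
let bootCount = 0;

// Module-level cache: it survives between invocations while the container stays warm.
let cachedHandler: Promise<RequestHandler> | undefined;

// Stand-in for the expensive step (creating the Nest app and its DI container).
async function bootstrapApp(): Promise<RequestHandler> {
  bootCount += 1;
  return async (event) => ({ statusCode: 200, body: `handled ${event.path}` });
}

// Entry point: only the first (cold) invocation pays the bootstrap cost.
async function handler(event: { path: string }): Promise<{ statusCode: number; body: string }> {
  if (!cachedHandler) {
    // Cache the promise itself so concurrent first calls share one bootstrap.
    cachedHandler = bootstrapApp();
  }
  const handle = await cachedHandler;
  return handle(event);
}
```

This does not remove the cold start itself; it only ensures the cost is paid once per container rather than once per request.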

***

## 9.2 Dockerizing Your App

Containerization makes deployment consistent and portable. Docker packages your app and dependencies into a single image.

### Basic Dockerfile

This Dockerfile uses multi-stage builds -- a critical Docker optimization. The builder stage has all dev dependencies (TypeScript compiler, etc.) to build the app, but the final image only contains the compiled JavaScript and production dependencies. This typically reduces image size from \~500MB to \~150MB.

```dockerfile theme={null}
# Stage 1: Build -- install everything, compile TypeScript
FROM node:18-alpine AS builder

WORKDIR /app

# Copy package files first. Docker caches this layer, so if dependencies
# haven't changed, npm ci is skipped on rebuild -- saving minutes.
COPY package*.json ./

# npm ci is preferred over npm install for CI/Docker:
# - It installs exact versions from package-lock.json (deterministic)
# - It is faster because it skips the dependency resolution step
# - It errors if package-lock.json is out of sync (catches mistakes)
RUN npm ci

# Copy source code after dependencies (better layer caching)
COPY . .

# Build the TypeScript into JavaScript in the /dist folder
RUN npm run build

# Stage 2: Production -- minimal image with only what's needed to run
FROM node:18-alpine

WORKDIR /app

COPY package*.json ./

# --omit=dev skips devDependencies (TypeScript, Jest, etc.).
# The older --only=production spelling is deprecated in current npm versions.
RUN npm ci --omit=dev

# Copy the compiled JavaScript from the builder stage
COPY --from=builder /app/dist ./dist

# SECURITY: Never run containers as root. Create a dedicated non-root user.
# If an attacker exploits a vulnerability, they get limited permissions.
RUN addgroup -g 1001 -S nodejs && \
    adduser -S -u 1001 -G nodejs nestjs

USER nestjs

EXPOSE 3000

CMD ["node", "dist/main"]
```

### Optimized Dockerfile

```dockerfile theme={null}
FROM node:18-alpine AS deps

WORKDIR /app

COPY package*.json ./
RUN npm ci --omit=dev && \
    npm cache clean --force

FROM node:18-alpine AS builder

WORKDIR /app

COPY package*.json ./
RUN npm ci

COPY . .
RUN npm run build

FROM node:18-alpine

WORKDIR /app

# Copy production dependencies
COPY --from=deps /app/node_modules ./node_modules

# Copy built application
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package*.json ./

# Security: Run as non-root
RUN addgroup -g 1001 -S nodejs && \
    adduser -S -u 1001 -G nodejs nestjs && \
    chown -R nestjs:nodejs /app

USER nestjs

EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"

CMD ["node", "dist/main"]
```

### .dockerignore

```dockerignore theme={null}
node_modules
npm-debug.log
dist
.git
.gitignore
.env
.env.local
*.md
.vscode
.idea
coverage
.nyc_output
test
*.spec.ts
```

### Docker Compose

```yaml theme={null}
version: '3.8'

services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgresql://user:password@db:5432/mydb
    depends_on:
      - db
    restart: unless-stopped

  db:
    image: postgres:14-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=mydb
    volumes:
      - postgres_data:/var/lib/postgresql/data
    restart: unless-stopped

volumes:
  postgres_data:
```

**Best Practices:**

* Use multi-stage builds for smaller images
* Keep images minimal (alpine base, no dev dependencies)
* Use `.dockerignore` to exclude unnecessary files
* Run as non-root user
* Add health checks
* Use specific version tags

***

## 9.3 Environment Variables

Store secrets and configuration in environment variables. Never commit secrets to version control.

### Using @nestjs/config

```bash theme={null}
npm install @nestjs/config joi
```

```typescript theme={null}
// app.module.ts
import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';
import * as Joi from 'joi';

@Module({
  imports: [
    ConfigModule.forRoot({
      isGlobal: true,
      envFilePath: ['.env.local', '.env'],
      validationSchema: Joi.object({
        NODE_ENV: Joi.string()
          .valid('development', 'production', 'test')
          .default('development'),
        PORT: Joi.number().default(3000),
        DATABASE_URL: Joi.string().required(),
        JWT_SECRET: Joi.string().required(),
      }),
    }),
  ],
})
export class AppModule {}
```

### Environment Files

```bash theme={null}
# .env.example (committed)
NODE_ENV=development
PORT=3000
DATABASE_URL=postgresql://localhost:5432/mydb
JWT_SECRET=your-secret-key

# .env (not committed)
NODE_ENV=production
PORT=3000
DATABASE_URL=postgresql://user:password@db:5432/mydb
JWT_SECRET=super-secret-key-change-in-production
```

### Using Config Service

```typescript theme={null}
import { Injectable } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';

@Injectable()
export class AppService {
  constructor(private configService: ConfigService) {}

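  // getOrThrow() fails loudly if the variable is missing, instead of
  // returning undefined the way get() does.
  getDatabaseUrl(): string {
    return this.configService.getOrThrow<string>('DATABASE_URL');
  }

  getJwtSecret(): string {
    return this.configService.getOrThrow<string>('JWT_SECRET');
  }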
}
```

**Tip:** Use schema validation (e.g., with `joi`) to ensure required environment variables are set and validate their values.
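
If you would rather not add a schema library, `ConfigModule.forRoot()` also accepts a `validate` function that receives the raw environment object and either returns the validated config or throws. A minimal sketch, assuming the same four variables as the Joi schema above; `validateEnv` and `AppEnv` are illustrative names.

```typescript
interface AppEnv {
  NODE_ENV: 'development' | 'production' | 'test';
  PORT: number;
  DATABASE_URL: string;
  JWT_SECRET: string;
}

const NODE_ENVS: readonly string[] = ['development', 'production', 'test'];

// Collects every problem before throwing, so one failed deploy reports all of them.
function validateEnv(raw: Record<string, unknown>): AppEnv {
  const errors: string[] = [];

  // Environment values arrive as strings; apply defaults, then check.
  const nodeEnv = String(raw.NODE_ENV ?? 'development');
  if (!NODE_ENVS.includes(nodeEnv)) {
    errors.push(`NODE_ENV must be one of: ${NODE_ENVS.join(', ')}`);
  }

  const port = Number(raw.PORT ?? 3000);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    errors.push('PORT must be an integer between 1 and 65535');
  }

  for (const key of ['DATABASE_URL', 'JWT_SECRET']) {
    if (typeof raw[key] !== 'string' || raw[key] === '') {
      errors.push(`${key} is required`);
    }
  }

  if (errors.length > 0) {
    throw new Error(`Invalid environment: ${errors.join('; ')}`);
  }

  return {
    NODE_ENV: nodeEnv as AppEnv['NODE_ENV'],
    PORT: port,
    DATABASE_URL: raw.DATABASE_URL as string,
    JWT_SECRET: raw.JWT_SECRET as string,
  };
}
```

Passing it as `ConfigModule.forRoot({ isGlobal: true, validate: validateEnv })` makes the app fail at startup rather than at first use of a missing variable.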

***

## 9.4 Health Checks

Health checks help load balancers and orchestrators know if your app is healthy.

### Installing Terminus

```bash theme={null}
npm install @nestjs/terminus
```

### Health Check Controller

Health checks come in two flavors, and understanding the difference is critical for Kubernetes deployments:

* **Liveness**: "Is the process alive?" If this fails, Kubernetes restarts the pod. Keep it simple -- do not check external dependencies here, or a database outage will cascade into pod restart storms.
* **Readiness**: "Can this instance handle traffic?" If this fails, Kubernetes removes the pod from the load balancer but does not restart it. Check database connectivity and other dependencies here.

```typescript theme={null}
// health/health.controller.ts
import { Controller, Get } from '@nestjs/common';
import { HealthCheck, HealthCheckService, TypeOrmHealthIndicator } from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private health: HealthCheckService,
    private db: TypeOrmHealthIndicator,
  ) {}

  // General health check -- used by simple load balancers
  @Get()
  @HealthCheck()
  check() {
    return this.health.check([
      () => this.db.pingCheck('database'),
    ]);
  }

  // Readiness probe: "Can I handle traffic?"
  // Checks dependencies. If the database is down, this returns unhealthy,
  // and the load balancer stops sending requests to this instance.
  @Get('readiness')
  @HealthCheck()
  readiness() {
    return this.health.check([
      () => this.db.pingCheck('database'),
    ]);
  }

  // Liveness probe: "Am I alive?"
  // Does NOT check dependencies. Just confirms the process is responsive.
  // If this fails, something is fundamentally wrong (deadlock, OOM).
  @Get('liveness')
  @HealthCheck()
  liveness() {
    return { status: 'ok', timestamp: new Date().toISOString() };
  }
}
```

**Common Mistake:** Including database checks in the liveness probe. If the database goes down temporarily, Kubernetes will restart all your pods simultaneously, creating a thundering herd that makes recovery harder. Only check the process itself in liveness probes.

### Custom Health Indicators

```typescript theme={null}
import { Injectable } from '@nestjs/common';
import { HealthIndicator, HealthIndicatorResult, HealthCheckError } from '@nestjs/terminus';

@Injectable()
export class CustomHealthIndicator extends HealthIndicator {
  async isHealthy(key: string): Promise<HealthIndicatorResult> {
    const isHealthy = await this.checkExternalService();
    const result = this.getStatus(key, isHealthy);

    if (isHealthy) {
      return result;
    }
    throw new HealthCheckError('External service check failed', result);
  }

  private async checkExternalService(): Promise<boolean> {
    // Check external service
    return true;
  }
}
```
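
The `checkExternalService()` placeholder above will usually wrap a network call, and a dependency that hangs should not be able to hang the health endpoint with it. One possible approach, sketched in plain TypeScript, is to race the check against a timer. `withTimeout` and `isReachable` are illustrative helpers, not part of `@nestjs/terminus`.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Returns true/false instead of throwing, which is the shape
// a boolean check like checkExternalService() needs.
async function isReachable(ping: () => Promise<unknown>, ms = 1000): Promise<boolean> {
  try {
    await withTimeout(ping(), ms, 'dependency ping');
    return true;
  } catch {
    return false;
  }
}
```

Keeping the timeout shorter than the probe's own timeout (3 seconds in the Docker `HEALTHCHECK` earlier) means the endpoint reports "unhealthy" rather than timing out itself.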

**Diagram: Health Check Flow**

```text theme={null}
Load Balancer/Orchestrator
    ↓
GET /health
    ↓
Health Check Service
    ↓
Check Database, External Services, etc.
    ↓
Return Status (healthy/unhealthy)
```

***

## 9.5 Logging & Monitoring

Proper logging and monitoring are essential for production applications.

### NestJS Logger

The built-in `Logger` class is context-aware -- passing the class name to the constructor means every log line includes the service name, making it easy to filter logs in production.

```typescript theme={null}
import { Injectable, Logger } from '@nestjs/common';

@Injectable()
export class UsersService {
  // Passing UsersService.name as the context means logs appear as:
  // [Nest] 12345 - 04/10/2026 [UsersService] Creating user: alice@example.com
  // This is invaluable when you have 50 services and need to find
  // where a specific log message came from.
  private readonly logger = new Logger(UsersService.name);

  async create(dto: CreateUserDto) {
    this.logger.log(`Creating user: ${dto.email}`);
    
    try {
      const user = await this.userRepository.create(dto);
      this.logger.log(`User created successfully: ${user.id}`);
      return user;
    } catch (error) {
      // .error() takes the message as the first argument and the stack trace
      // as the second. Always include the stack trace -- without it, you only
      // know WHAT failed, not WHERE in the code it failed.
      this.logger.error(`Failed to create user: ${error.message}`, error.stack);
      throw error;
    }
  }
}
```

**Production Tip:** The default logger outputs human-readable text, which is great for local development but terrible for cloud log aggregation. In production, replace it with a JSON logger (Winston, Pino) so tools like ELK, Datadog, or CloudWatch can parse and index your log fields automatically.

### Structured Logging

```typescript theme={null}
import { Injectable, Logger } from '@nestjs/common';

@Injectable()
export class LoggerService {
  private readonly logger = new Logger();

  log(message: string, context?: string, metadata?: any) {
    this.logger.log(JSON.stringify({
      message,
      context,
      metadata,
      timestamp: new Date().toISOString(),
    }));
  }

  error(message: string, trace?: string, context?: string) {
    this.logger.error(JSON.stringify({
      message,
      trace,
      context,
      timestamp: new Date().toISOString(),
    }));
  }
}
```

### Winston Integration

```bash theme={null}
npm install nest-winston winston
```

```typescript theme={null}
import { Module } from '@nestjs/common';
import { WinstonModule } from 'nest-winston';
import * as winston from 'winston';

@Module({
  imports: [
    WinstonModule.forRoot({
      transports: [
        new winston.transports.File({
          filename: 'error.log',
          level: 'error',
        }),
        new winston.transports.File({
          filename: 'combined.log',
        }),
      ],
    }),
  ],
})
export class AppModule {}
```

### Error Tracking with Sentry

```bash theme={null}
npm install @sentry/node @sentry/profiling-node
```

```typescript theme={null}
// main.ts
import * as Sentry from '@sentry/node';
import { nodeProfilingIntegration } from '@sentry/profiling-node';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  integrations: [nodeProfilingIntegration()],
  // 1.0 samples every transaction; consider lowering both rates under heavy traffic.
  tracesSampleRate: 1.0,
  profilesSampleRate: 1.0,
});
```

**Best Practices:**

* Log errors and warnings
* Use structured logs (JSON) for cloud platforms
* Monitor logs and metrics in real time
* Set up alerts for critical errors
* Don't log sensitive information
* Use log levels appropriately
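
To make the "structured logs" and "don't log sensitive information" points concrete, here is a small sketch of a formatter that emits one JSON object per line and masks fields by key name. The key list and the `toLogLine` name are assumptions for illustration; adapt them to your own payloads.

```typescript
// Keys whose values should never reach the logs (compared case-insensitively).
const SENSITIVE_KEYS = new Set(['password', 'token', 'authorization', 'secret']);

// Recursively copies a value, replacing sensitive fields with a marker.
function redact(value: unknown): unknown {
  if (value instanceof Date) return value.toISOString();
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : redact(inner);
    }
    return out;
  }
  return value;
}

// Builds a single-line JSON log entry that aggregators can parse and index.
function toLogLine(
  level: 'log' | 'warn' | 'error',
  message: string,
  context: string,
  metadata: unknown = {},
): string {
  return JSON.stringify({
    level,
    message,
    context,
    metadata: redact(metadata),
    timestamp: new Date().toISOString(),
  });
}
```

Key-based masking only catches fields you have listed; it does not detect secrets embedded in free-text messages.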

***

## 9.6 CI/CD Pipelines

Automate build, test, and deployment with CI/CD pipelines.

### GitHub Actions Workflow

```yaml theme={null}
# .github/workflows/deploy.yml
name: Deploy

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'
      - run: npm ci
      - run: npm run lint
      - run: npm run test
      - run: npm run test:e2e

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
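      - name: Build Docker image
        run: docker build -t ${{ secrets.DOCKER_USERNAME }}/myapp:${{ github.sha }} .
      - name: Push to registry
        run: |
          echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u "${{ secrets.DOCKER_USERNAME }}" --password-stdin
          docker push ${{ secrets.DOCKER_USERNAME }}/myapp:${{ github.sha }}

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      # Assumes the runner already has cluster credentials (kubeconfig);
      # add your cloud provider's authentication step before this one.
      - name: Deploy to production
        run: |
          kubectl set image deployment/nestjs-app app=${{ secrets.DOCKER_USERNAME }}/myapp:${{ github.sha }}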
```

### GitLab CI

```yaml theme={null}
# .gitlab-ci.yml
stages:
  - test
  - build
  - deploy

test:
  stage: test
  script:
    - npm ci
    - npm run lint
    - npm run test
    - npm run test:e2e

build:
  stage: build
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login "$CI_REGISTRY" -u "$CI_REGISTRY_USER" --password-stdin
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

deploy:
  stage: deploy
  script:
    - kubectl set image deployment/nestjs-app app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
```

***

## 9.7 Kubernetes Deployment

Deploy NestJS applications to Kubernetes for scalability and reliability.

### Deployment Manifest

```yaml theme={null}
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nestjs-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nestjs-app
  template:
    metadata:
      labels:
        app: nestjs-app
    spec:
      containers:
      - name: app
        image: myapp:latest  # placeholder; pin a specific tag (e.g. the commit SHA) in real deployments
        ports:
        - containerPort: 3000
        env:
        - name: NODE_ENV
          value: "production"
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: database-url
        livenessProbe:
          httpGet:
            path: /health/liveness
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health/readiness
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
```

### Service Manifest

```yaml theme={null}
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: nestjs-app-service
spec:
  selector:
    app: nestjs-app
  ports:
  - port: 80
    targetPort: 3000
  type: LoadBalancer
```

***

## 9.8 Scaling & High Availability

Scale your application to handle increased load.

### Horizontal Scaling

Run multiple instances behind a load balancer. This is the primary scaling strategy for NestJS applications.

**Key Requirements for Horizontal Scaling:**

Your NestJS application must be stateless to scale horizontally. This means:

* No in-memory sessions (use Redis for sessions)
* No in-memory caches that are not shared (use Redis)
* No file uploads stored on the local filesystem (use S3 or equivalent)
* No WebSocket connections without a Redis adapter (Socket.io with `@socket.io/redis-adapter`)

| Scaling Strategy       | Trigger                | Tool                                                       | Typical Scale-Up Delay              |
| ---------------------- | ---------------------- | ---------------------------------------------------------- | ----------------------------------- |
| **Manual**             | Developer decision     | PM2 cluster mode, `pm2 start dist/main.js -i max`          | N/A                                 |
| **Kubernetes HPA**     | CPU/memory threshold   | `kubectl autoscale deployment nestjs-app --min=2 --max=10` | 30-60s                              |
| **Cloud Auto-scaling** | Request count, latency | AWS ALB + ECS/Fargate auto-scaling                         | 60-120s                             |
| **Serverless**         | Per-request            | AWS Lambda, Cloud Functions                                | Near 0s (warm) to 1-3s (cold start) |

### Vertical Scaling

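Increase instance size (more CPU/memory). This is a valid first step before going horizontal -- a single well-provisioned instance can handle thousands of requests per second. Keep in mind that one Node.js process runs JavaScript on a single core, so extra cores only help if you run several processes (for example, PM2 cluster mode). Only scale horizontally when vertical scaling becomes cost-prohibitive or you need fault tolerance.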

### Database Scaling

| Strategy                       | When to Use                              | Complexity |
| ------------------------------ | ---------------------------------------- | ---------- |
| **Connection Pooling**         | Always (default pool is often too small) | Low        |
| **Read Replicas**              | Read-heavy workloads (>80% reads)        | Medium     |
| **Caching (Redis)**            | Hot data accessed repeatedly             | Medium     |
| **Sharding**                   | Very large datasets (100M+ rows)         | High       |
| **CQRS with separate read DB** | Different read/write performance needs   | High       |

### Caching

```typescript theme={null}
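import { Module } from '@nestjs/common';
import { CacheModule } from '@nestjs/cache-manager';
// Store option shapes differ between cache-manager and store package versions;
// check the versions you install before copying these options.
import { redisStore } from 'cache-manager-redis-store';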

@Module({
  imports: [
    CacheModule.register({
      store: redisStore,
      host: 'localhost',
      port: 6379,
    }),
  ],
})
export class AppModule {}
```
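
Whichever store you configure, the usual access pattern is cache-aside: look in the cache, fall back to the database on a miss, then populate the cache with a TTL. The sketch below uses an in-memory `Map` as a stand-in for Redis so it runs on its own. `TtlCache` and `getOrLoad` are illustrative names, not `cache-manager` APIs.

```typescript
interface CacheEntry {
  value: unknown;
  expiresAt: number; // epoch milliseconds
}

class TtlCache {
  private readonly entries = new Map<string, CacheEntry>();

  // Returns the cached value if it is still fresh; otherwise calls `load`,
  // stores the result for `ttlMs`, and returns it.
  async getOrLoad<T>(
    key: string,
    ttlMs: number,
    load: () => Promise<T>,
    now: () => number = () => Date.now(), // injectable clock, handy in tests
  ): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > now()) {
      return hit.value as T; // cache hit: no database round trip
    }
    const value = await load(); // miss or expired: go to the source of truth
    this.entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  }

  // Call after writes so readers do not see stale data until the TTL expires.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

A process-local `Map` like this is exactly what the horizontal-scaling requirements above warn against in production; the point here is the read path, which stays the same when the `Map` is swapped for a shared Redis store.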

***

## 9.9 Performance Optimization

Optimize your application for production performance.

### Enable Compression

```typescript theme={null}
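// main.ts
import { NestFactory } from '@nestjs/core';
import * as compression from 'compression';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  // Register compression before any routes so responses are compressed
  app.use(compression());
  await app.listen(3000);
}
bootstrap();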
```

### Connection Pooling

```typescript theme={null}
TypeOrmModule.forRoot({
  // ... other options
  extra: {
    max: 10,
    min: 2,
    idleTimeoutMillis: 30000,
  },
})
```
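
Pool size interacts with horizontal scaling: every replica opens its own pool, so the database sees roughly replicas × `max` connections. A quick arithmetic sketch follows. The function name and the reserved-connection figure are assumptions, and 100 is used only because it is PostgreSQL's default `max_connections`.

```typescript
// Largest per-replica pool size that keeps the total under the database limit,
// leaving `reserved` connections for migrations, admin sessions, and so on.
function maxPoolSizePerReplica(dbMaxConnections: number, replicas: number, reserved = 10): number {
  if (replicas < 1) {
    throw new Error('replicas must be at least 1');
  }
  const usable = dbMaxConnections - reserved;
  return Math.max(0, Math.floor(usable / replicas));
}

// 3 replicas: (100 - 10) / 3 = 30, so `max: 10` leaves plenty of headroom.
const perReplicaAtThree = maxPoolSizePerReplica(100, 3); // 30

// 10 replicas: (100 - 10) / 10 = 9, so `max: 10` would now exceed the limit.
const perReplicaAtTen = maxPoolSizePerReplica(100, 10); // 9
```

If autoscaling can raise the replica count, size the pool for the maximum replica count or put a connection pooler in front of the database.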

### Query Optimization

* Add indexes on frequently queried columns
* Eliminate N+1 queries (load relations with joins or batched queries)
* Use `select` to fetch only the columns you need
* Implement pagination
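
For the pagination point, one common pattern is to clamp client-supplied `page` and `limit` values before turning them into the `skip`/`take` options that TypeORM's `find()` accepts. `toSkipTake`, the default of 20, and the cap of 100 are illustrative choices.

```typescript
interface PageQuery {
  page?: number;
  limit?: number;
}

// Converts 1-based page/limit input into skip/take, guarding against
// missing, negative, non-numeric, or excessively large values.
function toSkipTake(query: PageQuery, maxLimit = 100): { skip: number; take: number } {
  const page = Math.max(1, Math.floor(Number(query.page) || 1));
  const take = Math.min(maxLimit, Math.max(1, Math.floor(Number(query.limit) || 20)));
  return { skip: (page - 1) * take, take };
}
```

The cap matters as much as the default: without it, a single request with `limit=1000000` can load an entire table into memory.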

***

## 9.10 Production Edge Cases

**Edge Case 1: Graceful shutdown and in-flight requests**

When Kubernetes sends SIGTERM to your pod, your NestJS app needs to stop accepting new requests, finish processing in-flight requests, close database connections, and then exit. Without graceful shutdown, users get dropped connections and database transactions may be left in an inconsistent state.

```typescript theme={null}
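// main.ts
import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);

  // Listen for SIGTERM/SIGINT and run the shutdown lifecycle hooks
  app.enableShutdownHooks();

  await app.listen(3000);
}
bootstrap();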
```

NestJS will call `onModuleDestroy()` and `onApplicationShutdown()` lifecycle hooks on your providers. The PrismaService example in Chapter 5 uses `onModuleDestroy()` to close the database connection. Kubernetes gives you 30 seconds by default (configurable via `terminationGracePeriodSeconds`).
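
The HTTP server drains its own connections, but background work your providers started (queued emails, outgoing webhooks) is only awaited if a shutdown hook does so. A minimal sketch of that idea follows. `InFlightTracker` is a hypothetical class written without Nest imports so it runs standalone; in a real app it would be an `@Injectable()` provider implementing `OnApplicationShutdown`.

```typescript
class InFlightTracker {
  private readonly pending = new Set<Promise<unknown>>();

  // Register background work so shutdown can wait for it.
  track<T>(work: Promise<T>): Promise<T> {
    this.pending.add(work);
    const forget = (): void => {
      this.pending.delete(work);
    };
    work.then(forget, forget); // remove on success or failure
    return work;
  }

  // Same method name as the NestJS lifecycle hook. Waits for outstanding
  // work, but gives up after `deadlineMs` so shutdown cannot hang forever.
  async onApplicationShutdown(signal?: string, deadlineMs = 10_000): Promise<void> {
    const drained = Promise.allSettled([...this.pending]).then(() => 'drained' as const);
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timedOut = new Promise<'timeout'>((resolve) => {
      timer = setTimeout(() => resolve('timeout'), deadlineMs);
    });

    const outcome = await Promise.race([drained, timedOut]);
    if (timer !== undefined) clearTimeout(timer);

    if (outcome === 'timeout') {
      console.warn(`Shutdown (${signal ?? 'unknown signal'}): ${this.pending.size} task(s) still running`);
    }
  }
}
```

Keep the deadline comfortably below `terminationGracePeriodSeconds`, otherwise Kubernetes kills the process before the hook finishes.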

**Edge Case 2: Memory leaks from event listeners**

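If your service subscribes to events (EventEmitter, RxJS subjects, WebSocket events) in the constructor but never unsubscribes, every hot reload in development and every new request-scoped instance in production leaks a listener. For singleton providers, implement `OnModuleDestroy` and clean up subscriptions there. Nest does not trigger lifecycle hooks for request-scoped providers, so avoid subscribing to long-lived emitters from them in the first place.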
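
A sketch of the cleanup pattern using Node's built-in `EventEmitter`. `OrderNotifier` and the `order.created` event name are made up for the example, and the class is written without Nest decorators so it runs on its own; in an application it would be an `@Injectable()` provider implementing `OnModuleDestroy`.

```typescript
import { EventEmitter } from 'node:events';

class OrderNotifier {
  readonly received: string[] = [];

  // Keep a reference to the exact function passed to on(),
  // because off() can only remove that same function object.
  private readonly onOrder = (orderId: string): void => {
    this.received.push(orderId);
  };

  constructor(private readonly bus: EventEmitter) {
    this.bus.on('order.created', this.onOrder);
  }

  // Same method name as the NestJS lifecycle hook: undo the subscription.
  onModuleDestroy(): void {
    this.bus.off('order.created', this.onOrder);
  }
}
```

The detail that most often goes wrong is passing an inline arrow function to `on()`: there is then no reference left to hand to `off()`, and the listener can never be removed.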

**Edge Case 3: Node.js single-thread and CPU-bound operations**

NestJS runs on Node.js, which executes your JavaScript on a single main thread. If a service method does CPU-intensive work (parsing a 50MB JSON file, image processing, heavy cryptographic operations), it blocks the event loop and every other request stalls until it finishes. Solutions: (1) Use worker threads (`worker_threads` module); (2) Offload to a separate microservice; (3) Use a job queue (Bull) that processes CPU-intensive work in a separate process.
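
As a rough illustration of option (1), the sketch below moves a CPU-bound loop into a worker thread and exposes it as a promise. The summation is a stand-in for real work, `sumInWorker` is an illustrative name, and the worker body is inlined as a string with `eval: true` only to keep the example in one file; a real project would normally point `Worker` at a compiled worker file.

```typescript
import { Worker } from 'node:worker_threads';

// Plain JavaScript executed inside the worker thread.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  let sum = 0;
  for (let i = 1; i <= workerData.n; i++) sum += i; // stand-in for heavy CPU work
  parentPort.postMessage(sum);
`;

// Runs the computation off the main thread, so the event loop
// stays free to serve other requests while it executes.
function sumInWorker(n: number): Promise<number> {
  return new Promise<number>((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', (result: number) => resolve(result));
    worker.once('error', reject);
    worker.once('exit', (code) => {
      // No effect if the promise already resolved with a result.
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}
```

Spawning a thread per call has its own startup cost, so for frequent tasks a small pool of long-lived workers or a job queue is usually the better fit.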

**Edge Case 4: Docker image size and startup time**

A NestJS Docker image with all `node_modules` can be 500MB+. This matters for Kubernetes pod startup time (image pull takes 10-30 seconds) and serverless cold starts. The multi-stage Dockerfile in section 9.2 reduces this to \~150MB. Further optimization: use `pnpm` with `--prod` flag or `node-prune` to strip test files and documentation from `node_modules`.

***

## 9.11 Troubleshooting & Maintenance

Monitor and maintain your production application.

### Monitoring

* Monitor CPU, memory, and response times
* Track error rates
* Monitor database performance
* Set up alerts for anomalies

### Logging

* Centralize logs (ELK, CloudWatch, etc.)
* Search and filter logs
* Set up log retention policies
* Monitor log volumes

### Backup & Recovery

* Regular database backups
* Test restore procedures
* Document recovery steps
* Store backups securely

### Updates

* Regularly update dependencies
* Test updates in staging
* Use semantic versioning
* Document breaking changes

***

## 9.12 Summary

You've learned how to deploy and maintain NestJS applications in production:

**Key Concepts:**

* **Docker**: Containerize applications
* **CI/CD**: Automate deployment
* **Health Checks**: Monitor application health
* **Logging**: Track application behavior
* **Monitoring**: Observe production systems
* **Scaling**: Handle increased load
* **Kubernetes**: Orchestrate containers

**Best Practices:**

* Use environment variables for configuration
* Containerize with Docker
* Implement health checks
* Set up proper logging
* Monitor production systems
* Scale horizontally
* Regular backups and updates

**Next Chapter:** Learn about advanced patterns like CQRS, GraphQL, WebSockets, and event sourcing.