Production Best Practices
Building production-ready Go applications requires more than functional code. This chapter covers the “last mile” that separates a working prototype from a service you can deploy with confidence: structured logging, configuration management, graceful shutdown, health checks, metrics, containerization, and security hardening. These concerns often distinguish senior engineers from junior ones, and they are where Go truly shines as an operations-friendly language.
Structured Logging
Using slog (Go 1.21+)
Go 1.21 introduced log/slog for structured logging.
Custom Logger with Context
Using Zap (High Performance)
Configuration Management
Using Environment Variables
Using Viper (Full-Featured)
Config File Example
Graceful Shutdown
Graceful shutdown is the difference between “we deployed with zero downtime” and “some requests got 502 errors during the deploy.” When your service receives SIGTERM (which Kubernetes sends before killing a pod), it needs to stop accepting new requests, wait for in-flight requests to complete, close database connections cleanly, and then exit. The standard pattern is to wait for the signal, call Shutdown on the http.Server with a timeout context, and then release the remaining resources in reverse dependency order.
Health Checks
Metrics with Prometheus
Docker Deployment
Dockerfile
Docker Compose
Kubernetes Deployment
Build and Version Info
Makefile
Security Best Practices
Interview Questions
How do you handle configuration in production Go applications?
How do you handle configuration in production Go applications?
- Use environment variables for secrets and deployment-specific values
- Use config files (YAML/JSON) for application defaults
- Libraries like Viper for unified configuration
- Never commit secrets to version control
- Support configuration hot-reloading for non-sensitive values
What should happen during graceful shutdown?
What should happen during graceful shutdown?
- Stop accepting new connections
- Wait for in-flight requests to complete (with timeout)
- Close database connections
- Close cache connections
- Flush logs and metrics
- Exit cleanly
How do you implement health checks?
How do you implement health checks?
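- Liveness: Is the process alive? Return a simple 200 OK without checking external dependencies; otherwise a database outage makes Kubernetes restart every pod, which does not fix the database.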
- Readiness: Can the service handle traffic? Check dependencies (DB, cache, etc.)
What metrics should you collect for a Go service?
What metrics should you collect for a Go service?
- HTTP request count/rate by status code
- Request latency (histogram)
- Active connections
- Database query latency
- Cache hit/miss rate
- Goroutine count
- Memory usage
- Error rates
Summary
| Area | Key Points |
|---|---|
| Logging | Structured (slog/zap), request IDs, context propagation |
| Configuration | Env vars + files, secrets management, validation |
| Shutdown | Graceful, proper ordering, timeouts |
| Health Checks | Liveness + readiness probes |
| Metrics | Prometheus, business + technical metrics |
| Deployment | Multi-stage Docker, K8s manifests |
| Security | Headers, rate limiting, input validation |
Interview Deep-Dive
Design the graceful shutdown sequence for a Go service that has an HTTP server, a database connection pool, a Redis client, and background worker goroutines. What is the correct order and why?
Design the graceful shutdown sequence for a Go service that has an HTTP server, a database connection pool, a Redis client, and background worker goroutines. What is the correct order and why?
- The shutdown order must be the reverse of the dependency order: stop the components that use a resource before the resource itself, and application-level resources before infrastructure-level resources.
- Step 1: Stop accepting new HTTP connections by calling srv.Shutdown(ctx). This closes the listeners (so new connection attempts are refused rather than answered with a 503), closes idle connections, and waits for in-flight requests to complete, up to the context deadline. This is the first step because you want to stop new work from arriving.
- Step 2: Signal background workers to stop by cancelling their context or closing their input channel, then wait for them to finish (with a timeout). This is second because workers might have in-progress database writes that need to complete.
- Step 3: Close the database connection pool with db.Close(), which prevents new queries from starting and waits for queries already in progress to finish before closing the connections. This must happen after the workers finish because they might still be using database connections.
- Step 4: Close the Redis client. The reasoning is the same: it must outlive anything that uses it.
- Step 5: Flush logs and metrics. This is last because you want to capture logs from all the shutdown steps above.
- The timeout is critical. Each step should have a deadline. If a step hangs (a worker is stuck on a blocking call), the overall shutdown context should expire and force exit. A typical total shutdown timeout is 30 seconds.
- In Go, you receive the shutdown signal by calling signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM). Kubernetes sends SIGTERM, waits terminationGracePeriodSeconds (default 30s), then sends SIGKILL. Your shutdown must complete within that window, so keep your own shutdown timeout a few seconds shorter than the grace period.
You are setting up structured logging for a Go microservice. Compare slog (standard library) versus zap (uber). What are the trade-offs, and how do you ensure request IDs appear in every log line?
You are setting up structured logging for a Go microservice. Compare slog (standard library) versus zap (uber). What are the trade-offs, and how do you ensure request IDs appear in every log line?
- slog (Go 1.21+) is the standard library's structured logging package. Advantages: zero dependencies, coverage by the Go 1 compatibility promise, performance that is good enough for most services, and a standard slog.Handler interface that allows pluggable backends. Disadvantages: slower than zap for very high-throughput logging, and fewer features out of the box.
- zap (Uber) is a high-performance structured logger. Advantages: it is typically faster than slog in JSON-encoding benchmarks thanks to its low-allocation encoder, it offers both a typed logger (zap.Logger) and a sugared logger (zap.SugaredLogger) for convenience, and it has a large ecosystem of integrations. Disadvantages: an external dependency, a more complex API, and a speed advantage that is irrelevant if your service is not CPU-bound on logging.
- My recommendation: use slog for new projects unless profiling shows logging is a bottleneck (rare). Use zap if your team already uses it or if you genuinely need maximum throughput (services logging 100K+ lines per second).
- For request IDs in every log line: create a logging middleware that generates or extracts a request ID, creates a child logger with the request ID as a field (slog.With("request_id", id)), and stores it in the context using context.WithValue. Every function that logs retrieves the logger from the context, so every log line for a request includes the request ID without any manual effort at each call site.
You are containerizing a Go service with Docker. Walk me through the multi-stage Dockerfile, explain each decision, and describe how you would optimize the image size.
You are containerizing a Go service with Docker. Walk me through the multi-stage Dockerfile, explain each decision, and describe how you would optimize the image size.
- Stage 1 (builder): Use golang:1.21-alpine as the build image. Copy go.mod and go.sum first, then RUN go mod download; this caches dependencies as a Docker layer so they are not re-downloaded on every code change. Then copy the source code and build with CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" -o /app/server ./cmd/server. CGO_ENABLED=0 produces a statically linked binary with no C dependencies, and -ldflags="-w -s" strips the DWARF debug information and symbol table for a smaller binary.
- Stage 2 (runtime): Use alpine:3.18 (a few MB) or scratch (an empty image) as the runtime image. Copy only the compiled binary from the builder stage. Add ca-certificates if the service makes HTTPS calls and tzdata for timezone support. Create a non-root user and run the binary as that user.
- Image size optimization: the final image is typically 10-20MB (alpine plus the binary) versus 800MB+ if you used the golang image as the runtime. With scratch, it can be under 10MB, but you lose a shell for debugging and the ability to exec into the container, and you must copy CA certificates and timezone data in yourself. DNS is not a problem for a CGO_ENABLED=0 binary, because it uses Go's pure-Go resolver rather than libc.
- Additional optimizations: use .dockerignore to exclude tests, docs, and development files from the build context, which speeds up docker build. Inject version info via build args and ldflags. Add a HEALTHCHECK instruction so Docker (and Docker Compose) can monitor the container's health natively.
- Security: never run as root in production; use USER appuser in the Dockerfile. Do not bake secrets into the image; pass them at runtime via environment variables or a secret manager.