# 17. Configuration Management

> Master centralized configuration, feature flags, secrets management, and environment-specific settings for microservices

In a microservices architecture, managing configuration across dozens or hundreds of services is a significant challenge. Imagine 40 services each with their own `.env` file, and you need to rotate a database password. With decentralized config, that is 40 deploys, 40 chances for human error, and an anxious hour hoping you did not miss one. Centralized configuration turns that into a single update that propagates everywhere. This chapter covers patterns and tools for centralized, dynamic configuration -- including feature flags, which are arguably the most underrated tool in a microservices toolkit.

<Info>
  **Learning Objectives:**

  * Implement centralized configuration management
  * Set up dynamic configuration with hot reload
  * Design feature flags for progressive rollouts
  * Manage environment-specific configurations
  * Handle secrets securely across services
</Info>

***

## The Configuration Challenge

Before looking at solutions, it helps to see exactly how painful decentralized configuration becomes at scale. Every service owning its own `.env` creates three compounding problems: drift (staging and production quietly diverge), secrets leakage (credentials end up in git history), and operational fragility (changing a shared value requires coordinating N deploys). The "fix it in all 40 places" approach only works until someone misses one, and then you have a service pointing at the old database while the other 39 point at the new one -- usually discovered in production, at night, under pressure.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                      CONFIGURATION CHALLENGES                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  WITHOUT CENTRALIZED CONFIG:                                                 │
│                                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐                       │
│  │  Service A   │  │  Service B   │  │  Service C   │                       │
│  │ ┌──────────┐ │  │ ┌──────────┐ │  │ ┌──────────┐ │                       │
│  │ │.env file │ │  │ │.env file │ │  │ │.env file │ │                       │
│  │ │DB_HOST=..│ │  │ │DB_HOST=..│ │  │ │DB_HOST=..│ │                       │
│  │ │API_KEY=..│ │  │ │API_KEY=..│ │  │ │API_KEY=..│ │                       │
│  │ └──────────┘ │  │ └──────────┘ │  │ └──────────┘ │                       │
│  └──────────────┘  └──────────────┘  └──────────────┘                       │
│                                                                              │
│  ⚠️ Problems:                                                                │
│  • Configuration scattered across services                                  │
│  • Hard to update consistently                                              │
│  • Requires redeployment for changes                                        │
│  • Secrets in plain text files                                              │
│  • No audit trail                                                           │
│                                                                              │
│  ══════════════════════════════════════════════════════════════════════════ │
│                                                                              │
│  WITH CENTRALIZED CONFIG:                                                    │
│                                                                              │
│                    ┌────────────────────────┐                               │
│                    │   Config Server        │                               │
│                    │   ┌────────────────┐   │                               │
│                    │   │ Consul / etcd  │   │                               │
│                    │   │ Vault / AWS SM │   │                               │
│                    │   └────────────────┘   │                               │
│                    └───────────┬────────────┘                               │
│                                │                                             │
│            ┌───────────────────┼───────────────────┐                        │
│            ▼                   ▼                   ▼                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐                       │
│  │  Service A   │  │  Service B   │  │  Service C   │                       │
│  │  (watches)   │  │  (watches)   │  │  (watches)   │                       │
│  └──────────────┘  └──────────────┘  └──────────────┘                       │
│                                                                              │
│  ✅ Benefits:                                                                │
│  • Single source of truth                                                   │
│  • Hot reload without redeployment                                          │
│  • Encrypted secrets                                                        │
│  • Audit logging for changes                                                │
│  • Environment-specific overrides                                           │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

***

### Caveats & Common Pitfalls: The Silent Killers of Config Management

<Warning>
  **The traps that cause 2 AM incidents:**

  * **Config drift between environments.** Staging says `max_retries: 3`, production says `max_retries: 5`, and nobody remembers why. Bugs that reproduce in staging but not production (or vice versa) frequently trace back to drift like this. Teams discover it when a staging test passes, the change deploys to production, and production breaks, because staging was exercising a different config path.
  * **Secrets in git history.** Someone commits an API key, notices an hour later, and force-pushes the "fix." The key is still in every clone and fork, may still be reachable on the hosting service by its commit SHA, and may already have been picked up by secret scanners. Once a secret touches a repo, assume it is compromised: rotation is mandatory, not optional.
  * **Feature flag explosion.** A healthy feature flag system at a mature org keeps active flags few enough that each has a known owner and purpose, typically dozens rather than hundreds. An unhealthy one has 500+ flags, and nobody remembers what most of them do. The code becomes a maze of nested `if (flag.enabled)` branches, and removing any flag is risky because its behavior interacts with a dozen others.
  * **Silent config failure modes.** A new config key is misspelled (`max_retires`), your code reads it with a default of 0, and retries now run zero times. Or a boolean flag is set to the string `"false"`, which is truthy in JavaScript. Both bugs pass every health check and surface only as elevated error rates that take an hour to trace.
</Warning>

<Tip>
  **Solutions & Patterns:**

  * **GitOps for config, not just code.** All environment configs live in a single repo with a clear `defaults/`, `staging/`, `production/` structure. Changes go through PR review. A drift-detection job runs daily and flags unexpected differences.
  * **Pre-commit secret scanning.** Use `gitleaks` or `trufflehog` in a pre-commit hook plus a CI check. If a secret ever lands in git history, rotate it immediately; rewriting history afterward does not un-publish it.
  * **Enforce feature flag lifecycle.** Every flag has an owner, a creation date, and an expiration date in its metadata. A nightly job creates cleanup tickets for flags older than 90 days. A linter fails CI if a flag is referenced in code without a corresponding definition with required metadata.
  * **Strongly-typed config loading with startup validation.** Tools like `pydantic-settings` (Python) or `convict` (Node) refuse to boot the service if a required key is missing or a value has the wrong type, and their strict modes also reject unknown keys in config files. A failure at startup, caught before the service takes traffic, is far cheaper than a runtime failure buried in production traffic.
</Tip>
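
The flag-lifecycle rule can be enforced with a small nightly audit. The sketch below is illustrative Python, not the API of LaunchDarkly, Unleash, or any other flag service: `FlagMetadata`, `audit_flags`, and the sample flags are hypothetical, and the 90-day threshold comes from the policy above. A real job would read flag definitions from your flag store and open a ticket per finding instead of printing.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class FlagMetadata:
    """Hypothetical metadata every flag must carry under the lifecycle policy."""
    name: str
    owner: Optional[str]
    created: date
    expires: Optional[date]


def audit_flags(flags: list[FlagMetadata], today: date, max_age_days: int = 90) -> list[str]:
    """Return one cleanup finding per lifecycle-policy violation."""
    findings = []
    for flag in flags:
        if not flag.owner:
            findings.append(f"{flag.name}: missing owner")
        if flag.expires is None:
            findings.append(f"{flag.name}: missing expiration date")
        elif flag.expires < today:
            findings.append(f"{flag.name}: expired on {flag.expires.isoformat()}")
        if today - flag.created > timedelta(days=max_age_days):
            findings.append(f"{flag.name}: older than {max_age_days} days, schedule removal")
    return findings


# Example run with made-up flags; a real job would open a ticket per finding.
flags = [
    FlagMetadata("new-checkout", "payments-team", date(2024, 1, 10), date(2024, 3, 1)),
    FlagMetadata("dark-mode", None, date(2024, 5, 1), None),
]
for finding in audit_flags(flags, today=date(2024, 6, 1)):
    print(finding)
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the audit deterministic and easy to test.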

## The 12-Factor App Configuration

### Factor III: Config

Store config in the environment, not in code. This is one of the most violated 12-factor principles in practice. The test is simple: could you open-source your codebase right now without exposing a single credential? If the answer is no, you have config leaking into code.

**Production pitfall:** A common mistake is having different config loading logic per environment (`if (env === 'production')`). This means your staging environment is not actually testing the same config path as production, which defeats the purpose of having staging at all.

The first step toward centralization is making your service read every configuration value from the environment rather than hardcoding it. This seems trivial, but it's where most teams start accumulating debt: a hardcoded localhost here, a default password there, and six months later you have three PRs open just to change an endpoint URL. The rule: if a value could ever differ between environments (dev, staging, prod, CI, a teammate's laptop), it belongs in the environment -- not in code.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
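    // ❌ Bad: Hardcoded configuration
    const hardcodedConfig = {
      database: {
        host: 'localhost',
        port: 5432,
        password: 'secretpassword'  // Never do this!
      },
      api: {
        timeout: 5000,
        retries: 3
      }
    };

    // Read an integer env var. Fall back only when the variable is unset, so an
    // explicit "0" is respected (parseInt(...) || 3 would silently turn 0 into 3)
    // and garbage like "abc" fails loudly instead of silently using the default.
    function intFromEnv(name, fallback) {
      const raw = process.env[name];
      if (raw === undefined || raw === '') return fallback;
      const value = Number(raw);
      if (!Number.isInteger(value)) {
        throw new Error(`${name} must be an integer, got "${raw}"`);
      }
      return value;
    }

    // ✅ Good: Environment-based configuration
    const config = {
      database: {
        host: process.env.DB_HOST || 'localhost',
        port: intFromEnv('DB_PORT', 5432),
        password: process.env.DB_PASSWORD  // Must be set externally
      },
      api: {
        timeout: intFromEnv('API_TIMEOUT', 5000),
        retries: intFromEnv('API_RETRIES', 3)
      }
    };

    // Fail at startup, not on the first query, if the secret was never provided
    if (!config.database.password) {
      throw new Error('DB_PASSWORD must be set');
    }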
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # ❌ Bad: Hardcoded configuration
    config = {
        "database": {
            "host": "localhost",
            "port": 5432,
            "password": "secretpassword",  # Never do this!
        },
        "api": {"timeout": 5000, "retries": 3},
    }

    # ✅ Good: Environment-based configuration with pydantic-settings
    from pydantic import Field
    from pydantic_settings import BaseSettings, SettingsConfigDict


    class DatabaseSettings(BaseSettings):
        model_config = SettingsConfigDict(env_prefix="DB_")

        host: str = "localhost"
        port: int = 5432
        password: str  # required; no default means startup fails if missing


    class ApiSettings(BaseSettings):
        model_config = SettingsConfigDict(env_prefix="API_")

        timeout: int = 5000
        retries: int = 3


    class AppSettings(BaseSettings):
        database: DatabaseSettings = Field(default_factory=DatabaseSettings)
        api: ApiSettings = Field(default_factory=ApiSettings)


    config = AppSettings()
    ```
  </Tab>
</Tabs>
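
The pydantic example above already fails fast when `DB_PASSWORD` is missing. To make that behavior concrete, and to show what a hand-rolled `os.environ` grab-bag misses, here is a minimal standard-library sketch. `RetrySettings`, `load_settings`, and the `ORDERS_` prefix are hypothetical names chosen for illustration; a library like `pydantic-settings` or `convict` gives you this and much more.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Mapping


class ConfigError(ValueError):
    """Raised at startup when config is missing, misspelled, or mistyped."""


def parse_bool(name: str, raw: str) -> bool:
    # Accept only explicit spellings: bool("false") is True in Python too.
    lowered = raw.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise ConfigError(f"{name} must be a boolean, got {raw!r}")


@dataclass(frozen=True)
class RetrySettings:  # hypothetical settings for an "orders" service
    max_retries: int
    retry_enabled: bool = True


def load_settings(env: Mapping[str, str], prefix: str = "ORDERS_") -> RetrySettings:
    # Only variables in this service's namespace are considered, so PATH and
    # friends are ignored while a typo like ORDERS_MAX_RETIRES is caught.
    raw = {k[len(prefix):].lower(): v for k, v in env.items() if k.startswith(prefix)}
    known = {f.name: f for f in fields(RetrySettings)}

    unknown = sorted(set(raw) - set(known))
    if unknown:
        raise ConfigError(f"unknown config keys: {unknown}")

    values = {}
    for name, field in known.items():
        if name not in raw:
            if field.default is MISSING:
                raise ConfigError(f"required config key {prefix}{name.upper()} is not set")
            continue  # optional key: keep the dataclass default
        if field.type is bool:
            values[name] = parse_bool(name, raw[name])
        elif field.type is int:
            try:
                values[name] = int(raw[name])
            except ValueError:
                raise ConfigError(f"{name} must be an integer, got {raw[name]!r}") from None
        else:
            values[name] = raw[name]
    return RetrySettings(**values)
```

At startup you would call `load_settings(os.environ)` once and let any `ConfigError` crash the process, so the deploy fails instead of the first request.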

### Configuration Hierarchy

Hierarchy is how you keep centralized config sane as it grows. Without structure, a flat key-value store becomes a dumping ground: thousands of untyped strings with no clear ownership. A good hierarchy expresses three things: *what* the value is (database/host), *which scope* it applies to (service-specific vs global), and *which environment* it targets (dev/staging/prod). You also want validation on load -- misspelling a key or passing a string where a number is expected should fail at startup, loudly, not silently become "undefined" halfway through handling a request. Tools like `convict` (Node) and `pydantic-settings` (Python) do this automatically, which is why they are worth using over a hand-rolled `process.env` grab-bag.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // config/index.js - Hierarchical configuration with validation
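    const convict = require('convict');
    const path = require('path');

    // convict 6+ moved the 'url', 'email', and 'ipaddress' formats into a
    // separate package; register them before building the schema below.
    convict.addFormats(require('convict-format-with-validator'));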

    const config = convict({
      env: {
        doc: 'The application environment',
        format: ['production', 'staging', 'development', 'test'],
        default: 'development',
        env: 'NODE_ENV'
      },
      
      server: {
        port: {
          doc: 'The port to bind to',
          format: 'port',
          default: 3000,
          env: 'PORT'
        },
        host: {
          doc: 'The host to bind to',
          format: 'ipaddress',
          default: '0.0.0.0',
          env: 'HOST'
        }
      },
      
      database: {
        host: {
          doc: 'Database host',
          format: String,
          default: 'localhost',
          env: 'DB_HOST'
        },
        port: {
          doc: 'Database port',
          format: 'port',
          default: 5432,
          env: 'DB_PORT'
        },
        name: {
          doc: 'Database name',
          format: String,
          default: 'myapp',
          env: 'DB_NAME'
        },
        username: {
          doc: 'Database username',
          format: String,
          default: '',
          env: 'DB_USERNAME',
          sensitive: true
        },
        password: {
          doc: 'Database password',
          format: String,
          default: '',
          env: 'DB_PASSWORD',
          sensitive: true
        },
        pool: {
          min: {
            doc: 'Minimum pool size',
            format: 'nat',
            default: 2,
            env: 'DB_POOL_MIN'
          },
          max: {
            doc: 'Maximum pool size',
            format: 'nat',
            default: 10,
            env: 'DB_POOL_MAX'
          }
        }
      },
      
      redis: {
        host: {
          doc: 'Redis host',
          format: String,
          default: 'localhost',
          env: 'REDIS_HOST'
        },
        port: {
          doc: 'Redis port',
          format: 'port',
          default: 6379,
          env: 'REDIS_PORT'
        },
        password: {
          doc: 'Redis password',
          format: String,
          default: '',
          env: 'REDIS_PASSWORD',
          sensitive: true
        }
      },
      
      services: {
        payment: {
          url: {
            doc: 'Payment service URL',
            format: 'url',
            default: 'http://payment-service:3000',
            env: 'PAYMENT_SERVICE_URL'
          },
          timeout: {
            doc: 'Payment service timeout (ms)',
            format: 'nat',
            default: 5000,
            env: 'PAYMENT_SERVICE_TIMEOUT'
          }
        },
        inventory: {
          url: {
            doc: 'Inventory service URL',
            format: 'url',
            default: 'http://inventory-service:3000',
            env: 'INVENTORY_SERVICE_URL'
          }
        }
      },
      
      features: {
        newCheckout: {
          doc: 'Enable new checkout flow',
          format: Boolean,
          default: false,
          env: 'FEATURE_NEW_CHECKOUT'
        },
        darkMode: {
          doc: 'Enable dark mode',
          format: Boolean,
          default: true,
          env: 'FEATURE_DARK_MODE'
        }
      },
      
      logging: {
        level: {
          doc: 'Log level',
          format: ['error', 'warn', 'info', 'debug'],
          default: 'info',
          env: 'LOG_LEVEL'
        }
      }
    });

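    // Load environment-specific overrides (e.g. config/production.json).
    // A missing file is fine, but a malformed one should crash startup, so
    // check for the file instead of swallowing every loadFile() error.
    // Env vars still take precedence over values loaded from the file.
    const env = config.get('env');
    const configPath = path.join(__dirname, `${env}.json`);

    if (require('fs').existsSync(configPath)) {
      config.loadFile(configPath);
    } else {
      console.log(`No config file found for ${env}, using defaults and env vars`);
    }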

    // Validate configuration
    config.validate({ allowed: 'strict' });

    module.exports = config;
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # config/settings.py - Hierarchical configuration with validation
    import json
    from pathlib import Path
    from typing import Literal

    from pydantic import Field, HttpUrl, SecretStr, field_validator
    from pydantic_settings import BaseSettings, SettingsConfigDict


    class ServerSettings(BaseSettings):
        # Empty prefix: fields map straight to the PORT and HOST env vars
        # (pydantic-settings matches env var names case-insensitively by default).
        model_config = SettingsConfigDict(env_prefix="")

        port: int = Field(default=3000, ge=1, le=65535)
        host: str = "0.0.0.0"


    class DatabasePoolSettings(BaseSettings):
        model_config = SettingsConfigDict(env_prefix="DB_POOL_")

        min: int = Field(default=2, ge=0)
        max: int = Field(default=10, ge=1)


    class DatabaseSettings(BaseSettings):
        model_config = SettingsConfigDict(env_prefix="DB_")

        host: str = "localhost"
        port: int = Field(default=5432, ge=1, le=65535)
        name: str = "myapp"
        username: str = ""
        password: SecretStr = SecretStr("")  # wrapped so it never leaks in logs
        pool: DatabasePoolSettings = Field(default_factory=DatabasePoolSettings)


    class RedisSettings(BaseSettings):
        model_config = SettingsConfigDict(env_prefix="REDIS_")

        host: str = "localhost"
        port: int = Field(default=6379, ge=1, le=65535)
        password: SecretStr = SecretStr("")


    class PaymentServiceSettings(BaseSettings):
        model_config = SettingsConfigDict(env_prefix="PAYMENT_SERVICE_")

        url: HttpUrl = Field(default="http://payment-service:3000")
        timeout: int = Field(default=5000, ge=0)


    class InventoryServiceSettings(BaseSettings):
        model_config = SettingsConfigDict(env_prefix="INVENTORY_SERVICE_")

        url: HttpUrl = Field(default="http://inventory-service:3000")


    class ServiceUrls(BaseSettings):
        payment: PaymentServiceSettings = Field(default_factory=PaymentServiceSettings)
        inventory: InventoryServiceSettings = Field(default_factory=InventoryServiceSettings)


    class FeatureSettings(BaseSettings):
        model_config = SettingsConfigDict(env_prefix="FEATURE_")

        new_checkout: bool = False  # FEATURE_NEW_CHECKOUT
        dark_mode: bool = True  # FEATURE_DARK_MODE


    class LoggingSettings(BaseSettings):
        model_config = SettingsConfigDict(env_prefix="LOG_")

        level: Literal["error", "warn", "info", "debug"] = "info"  # LOG_LEVEL


    class AppSettings(BaseSettings):
        """Top-level config object. Reads from env vars with an optional
        per-environment JSON overlay (e.g. config/production.json)."""

        # populate_by_name lets load_config() re-validate a dump keyed by field
        # name ("env") even though the env var is NODE_ENV. extra="forbid" makes a
        # misspelled key in the overlay fail at startup. No env_file here: the
        # nested settings classes read os.environ directly and would not see it.
        model_config = SettingsConfigDict(populate_by_name=True, extra="forbid")

        env: Literal["production", "staging", "development", "test"] = Field(
            default="development", validation_alias="NODE_ENV"
        )

        server: ServerSettings = Field(default_factory=ServerSettings)
        database: DatabaseSettings = Field(default_factory=DatabaseSettings)
        redis: RedisSettings = Field(default_factory=RedisSettings)
        services: ServiceUrls = Field(default_factory=ServiceUrls)
        features: FeatureSettings = Field(default_factory=FeatureSettings)
        logging: LoggingSettings = Field(default_factory=LoggingSettings)

        @field_validator("env", mode="before")
        @classmethod
        def _lowercase_env(cls, v: str) -> str:
            return v.lower() if isinstance(v, str) else v


    def _deep_merge(base: dict, override: dict) -> dict:
        """Recursively merge `override` into a copy of `base`."""
        merged = dict(base)
        for key, value in override.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = _deep_merge(merged[key], value)
            else:
                merged[key] = value
        return merged


    def load_config() -> AppSettings:
        settings = AppSettings()

        # Deep-merge the per-environment JSON overlay if it exists, so a partial
        # override like {"database": {"pool": {"max": 20}}} keeps sibling keys.
        # Note: unlike convict, the overlay here wins over environment variables.
        overlay = Path(__file__).parent / f"{settings.env}.json"
        if overlay.exists():
            merged = _deep_merge(settings.model_dump(), json.loads(overlay.read_text()))
            settings = AppSettings.model_validate(merged)

        return settings


    config = load_config()
    ```
  </Tab>
</Tabs>
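
Both loaders above implement the same idea: an ordered stack of layers in which the highest-precedence layer that defines a key wins. When chasing drift, it also helps to know *which* layer supplied each value. The sketch below is illustrative Python; the layer names, the `order-service` scope, and values such as `db.prod.internal` are hypothetical, and keys are flat slash-separated paths in the style of a Consul KV tree.

```python
from typing import Any, Mapping, NamedTuple


class Resolved(NamedTuple):
    value: Any
    source: str  # name of the layer that supplied the value


Layers = list[tuple[str, Mapping[str, Any]]]


def resolve(key: str, layers: Layers) -> Resolved:
    """Return `key` from the highest-precedence layer that defines it."""
    for name, layer in layers:  # layers are ordered highest precedence first
        if key in layer:
            return Resolved(layer[key], name)
    raise KeyError(f"config key {key!r} is not defined in any layer")


def explain(layers: Layers) -> dict[str, Resolved]:
    """Resolve every known key and record which layer won (useful for drift audits)."""
    all_keys = {key for _, layer in layers for key in layer}
    return {key: resolve(key, layers) for key in sorted(all_keys)}


# Hypothetical layers for order-service in production, highest precedence first.
layers: Layers = [
    ("env vars", {"database/pool/max": 20}),
    ("production/order-service", {"api/timeout": 8000}),
    ("production/global", {"database/host": "db.prod.internal", "api/timeout": 5000}),
    ("defaults", {"database/host": "localhost", "database/pool/max": 10, "api/timeout": 5000}),
]

for key, resolved in explain(layers).items():
    print(f"{key} = {resolved.value!r} (from {resolved.source})")
```

Running `explain` against staging and production layer stacks and diffing the two reports is one simple way to surface drift, because it shows not only that a value differs but which layer introduced the difference.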

***

## Configuration Tool Comparison

Before choosing a tool, understand what each is optimized for. The most common mistake is using a general-purpose key-value store for secrets management, or paying for a dedicated feature flag service when Consul can handle your simple boolean flags.

| Capability                 | Consul                     | etcd                           | Spring Cloud Config  | AWS Parameter Store           | Vault                                          |
| -------------------------- | -------------------------- | ------------------------------ | -------------------- | ----------------------------- | ---------------------------------------------- |
| **Primary purpose**        | Service discovery + config | Distributed KV store           | App config server    | Cloud-native config           | Secrets management                             |
| **Hot reload**             | Yes (watches)              | Yes (watches)                  | Yes (bus refresh)    | No (polling only)             | No (app must re-fetch)                         |
| **Secret management**      | Basic (ACLs)               | Basic (RBAC)                   | Encrypt/decrypt      | Yes (SecureString)            | Excellent (dynamic secrets, leasing, rotation) |
| **Feature flags**          | Manual (KV structure)      | Manual (KV structure)          | Manual               | Manual                        | Not designed for this                          |
| **Kubernetes native?**     | Helm chart available       | Built into K8s (backing store) | Helm chart available | AWS only                      | Helm chart, K8s auth                           |
| **Multi-datacenter**       | Yes (built-in)             | Requires federation            | No                   | Multi-region with replication | Yes (replication)                              |
| **Operational complexity** | Medium                     | Low-Medium (if using K8s etcd) | Low                  | Low (managed)                 | High                                           |
| **Cost**                   | Free (OSS) / Enterprise    | Free (OSS)                     | Free (OSS)           | Standard tier free; \$0.05 per 10K API calls for higher throughput | Free (OSS) / Enterprise                        |

**Decision framework:**

* **Startup with 5 services on Kubernetes:** Use K8s ConfigMaps + Secrets (already there, zero extra infrastructure)
* **Growing team needing feature flags:** Add LaunchDarkly or Unleash (purpose-built; do not build your own if you can avoid it)
* **Enterprise with compliance requirements:** Vault for secrets + Consul for config (audit trails, dynamic credentials, RBAC)
* **AWS-native shop:** Parameter Store + Secrets Manager (managed, integrates with IAM)

***

## Consul for Configuration

Consul is one of the most popular choices for centralized config because it combines service discovery with a hierarchical KV store. The key-value API lets you model configuration as a tree (`config/production/order-service/database/host`), and every service watches its subtree for changes. The killer feature is **blocking queries**: instead of polling on a timer, your client sends the last index it saw and says, in effect, "give me this key, but hold the request until it changes or 60 seconds pass." When the key changes, Consul responds immediately with the new value and a new index; when the wait expires, it returns the unchanged value and the client simply re-issues the request. This gives you hot reload without the latency and load of tight polling.
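
The blocking-query loop can be sketched against Consul's HTTP KV endpoint, `GET /v1/kv/<key>`, which accepts `index` and `wait` query parameters and reports the current index in the `X-Consul-Index` response header. This is a minimal standard-library sketch under stated assumptions: `http_fetch` and `watch_key` are hypothetical names, it assumes an unauthenticated agent at `localhost:8500`, and a production client would add ACL tokens, retries with backoff, and recursive (prefix) reads. The transport is injected so the loop logic can run without Consul.

```python
import base64
import json
import urllib.error
import urllib.request
from typing import Callable, Iterator, Optional, Tuple

# A fetcher performs one read: (key, last_index) -> (new_index, value or None).
Fetcher = Callable[[str, int], Tuple[int, Optional[str]]]


def http_fetch(base_url: str = "http://localhost:8500", wait: str = "60s") -> Fetcher:
    """Build a fetcher that does Consul blocking reads over the HTTP KV API."""

    def fetch(key: str, index: int) -> Tuple[int, Optional[str]]:
        url = f"{base_url}/v1/kv/{key}?index={index}&wait={wait}"
        try:
            # The client-side timeout must exceed the server-side wait.
            with urllib.request.urlopen(url, timeout=90) as resp:
                new_index = int(resp.headers["X-Consul-Index"])
                entries = json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise
            # Missing key: Consul still reports the current index in a header.
            return int(err.headers.get("X-Consul-Index", index)), None
        raw = entries[0].get("Value")  # KV values come back base64-encoded
        return new_index, base64.b64decode(raw).decode("utf-8") if raw else None

    return fetch


def watch_key(key: str, fetch: Fetcher) -> Iterator[Optional[str]]:
    """Yield the current value once, then again every time it changes."""
    index = 0  # index 0 makes the first read return immediately
    last: object = object()  # sentinel so the first value is always yielded
    while True:
        new_index, value = fetch(key, index)
        if new_index < index:
            index = 0  # index went backwards (e.g. snapshot restore): start over
        else:
            index = max(new_index, 1)  # keep index >= 1 so later reads block
        if value != last:
            last = value
            yield value
```

Typical use is `for value in watch_key("config/production/order-service/api/timeout", http_fetch()): ...`, reloading the affected component inside the loop. Comparing values rather than trusting the index alone avoids spurious reloads: a timed-out wait returns the unchanged value, and rewriting a key with the same value still bumps its index.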

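The trade-off to understand: within a single datacenter, Consul's KV store is Raft-backed, and in the default consistency mode reads are served by the leader, so a read issued right after a write in the same DC sees the new value in practice (the `consistent` mode makes that a strict guarantee, while `stale` mode trades freshness for read throughput). Across datacenters, KV data is not replicated by default: each DC has its own store, and syncing it requires a tool such as `consul-replicate` or Consul Enterprise, which introduces replication lag. For config, this is almost always fine; for coordinating distributed locks, it matters more.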

### Setup and Connection

The client below shows the full lifecycle for Consul-based configuration: load an initial snapshot on startup, then register watches that fire whenever values change. Notice how we merge a global config layer with a service-specific layer -- this is the "hierarchy" pattern in action. Global values (like the company SMTP server) live once; service-specific overrides (like the order service's custom timeout) live under the service's own prefix. If you flatten everything into one namespace, you lose the ability to reason about scope and end up with copies of the same value under different keys.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // config/consul-config.js
    const Consul = require('consul');
    const EventEmitter = require('events');

    class ConsulConfig extends EventEmitter {
      constructor(options = {}) {
        super();
        
        this.consul = new Consul({
          host: process.env.CONSUL_HOST || 'localhost',
          port: process.env.CONSUL_PORT || 8500,
          promisify: true
        });
        
        this.serviceName = options.serviceName || process.env.SERVICE_NAME;
        this.environment = options.environment || process.env.NODE_ENV || 'development';
        this.prefix = `config/${this.environment}`;
        
        this.config = {};
        this.watchers = new Map();
      }

      async load() {
        // Load global config
        const globalConfig = await this.getPrefix(`${this.prefix}/global`);
        
        // Load service-specific config (overrides global)
        const serviceConfig = await this.getPrefix(`${this.prefix}/${this.serviceName}`);
        
        // Merge configs
        this.config = this.deepMerge(globalConfig, serviceConfig);
        
        console.log(`Loaded configuration for ${this.serviceName} in ${this.environment}`);
        return this.config;
      }

      async getPrefix(prefix) {
        try {
          const result = await this.consul.kv.get({
            key: prefix,
            recurse: true
          });
          
          if (!result) return {};
          
          const config = {};
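          for (const item of result) {
            // Skip "folder" entries (keys ending in '/'), which carry no value
            if (item.Key.endsWith('/')) continue;
            const key = item.Key.replace(`${prefix}/`, '');
            const value = this.parseValue(item.Value);
            this.setNestedValue(config, key, value);
          }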
          
          return config;
        } catch (error) {
          console.error(`Failed to load config from ${prefix}:`, error.message);
          return {};
        }
      }

      parseValue(value) {
        if (!value) return null;
        
        try {
          return JSON.parse(value);
        } catch {
          // Not JSON, return as string
          return value;
        }
      }

      setNestedValue(obj, path, value) {
        const keys = path.split('/');
        let current = obj;
        
        for (let i = 0; i < keys.length - 1; i++) {
          if (!(keys[i] in current)) {
            current[keys[i]] = {};
          }
          current = current[keys[i]];
        }
        
        current[keys[keys.length - 1]] = value;
      }

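      deepMerge(target, source) {
        const result = { ...target };
        const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

        for (const key of Object.keys(source)) {
          if (isPlainObject(source[key]) && isPlainObject(target[key])) {
            result[key] = this.deepMerge(target[key], source[key]);
          } else {
            // Scalars and arrays from the service layer replace global values outright
            result[key] = source[key];
          }
        }

        return result;
      }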

      get(path, defaultValue = undefined) {
        const keys = path.split('.');
        let value = this.config;
        
        for (const key of keys) {
          if (value && typeof value === 'object' && key in value) {
            value = value[key];
          } else {
            return defaultValue;
          }
        }
        
        return value;
      }

      // Watch for configuration changes
      watch(key, callback) {
        const fullKey = `${this.prefix}/${this.serviceName}/${key}`;
        
        const watcher = this.consul.watch({
          method: this.consul.kv.get,
          options: { key: fullKey }
        });
        
        watcher.on('change', (data) => {
          const newValue = data ? this.parseValue(data.Value) : null;
          const oldValue = this.get(key.replace(/\//g, '.'));
          
          if (JSON.stringify(newValue) !== JSON.stringify(oldValue)) {
            // Update local config
            this.setNestedValue(this.config, key, newValue);
            
            // Emit change event
            this.emit('change', { key, oldValue, newValue });
            callback(newValue, oldValue);
          }
        });
        
        watcher.on('error', (err) => {
          console.error(`Watch error for ${key}:`, err);
        });
        
        this.watchers.set(key, watcher);
        return watcher;
      }

      // Set configuration value
      async set(key, value) {
        const fullKey = `${this.prefix}/${this.serviceName}/${key}`;
        const stringValue = typeof value === 'object' ? JSON.stringify(value) : String(value);
        
        await this.consul.kv.set(fullKey, stringValue);
        this.setNestedValue(this.config, key, value);
      }

      // Close all watchers
      close() {
        for (const watcher of this.watchers.values()) {
          watcher.end();
        }
        this.watchers.clear();
      }
    }

    module.exports = ConsulConfig;
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # config/consul_config.py
    import asyncio
    import json
    import logging
    import os
    from typing import Any, Awaitable, Callable

    import consul.aio

    logger = logging.getLogger(__name__)

    ChangeCallback = Callable[[Any, Any], Awaitable[None] | None]


    class ConsulConfig:
        """Hierarchical configuration backed by Consul KV with hot-reload."""

        def __init__(
            self,
            service_name: str | None = None,
            environment: str | None = None,
            host: str | None = None,
            port: int | None = None,
        ) -> None:
            self.service_name = service_name or os.environ["SERVICE_NAME"]
            self.environment = environment or os.environ.get("NODE_ENV", "development")
            self.prefix = f"config/{self.environment}"

            self._host = host or os.environ.get("CONSUL_HOST", "localhost")
            self._port = port or int(os.environ.get("CONSUL_PORT", 8500))
            self._consul: consul.aio.Consul | None = None

            self.config: dict[str, Any] = {}
            self._watch_tasks: dict[str, asyncio.Task] = {}
            self._listeners: list[Callable[[dict], None]] = []

        @property
        def client(self) -> consul.aio.Consul:
            if self._consul is None:
                self._consul = consul.aio.Consul(host=self._host, port=self._port)
            return self._consul

        async def load(self) -> dict[str, Any]:
            """Load global config, then overlay service-specific config."""
            global_cfg = await self._get_prefix(f"{self.prefix}/global")
            service_cfg = await self._get_prefix(f"{self.prefix}/{self.service_name}")
            self.config = self._deep_merge(global_cfg, service_cfg)
            logger.info(
                "Loaded configuration for %s in %s", self.service_name, self.environment
            )
            return self.config

        async def _get_prefix(self, prefix: str) -> dict[str, Any]:
            try:
                _, items = await self.client.kv.get(prefix, recurse=True)
            except Exception as exc:
                logger.error("Failed to load config from %s: %s", prefix, exc)
                return {}

            if not items:
                return {}

            result: dict[str, Any] = {}
            for item in items:
                if item["Key"].endswith("/"):
                    continue  # skip "folder" entries, which carry no value
                key = item["Key"].replace(f"{prefix}/", "")
                value = self._parse_value(item.get("Value"))
                self._set_nested_value(result, key, value)
            return result

        @staticmethod
        def _parse_value(raw: bytes | None) -> Any:
            if raw is None:
                return None
            text = raw.decode("utf-8") if isinstance(raw, (bytes, bytearray)) else raw
            try:
                return json.loads(text)
            except (TypeError, ValueError):
                return text

        @staticmethod
        def _set_nested_value(obj: dict, path: str, value: Any) -> None:
            keys = path.split("/")
            cursor = obj
            for key in keys[:-1]:
                cursor = cursor.setdefault(key, {})
            cursor[keys[-1]] = value

        def _deep_merge(self, target: dict, source: dict) -> dict:
            result = dict(target)
            for key, value in source.items():
                if isinstance(value, dict) and isinstance(result.get(key), dict):
                    result[key] = self._deep_merge(result[key], value)
                else:
                    result[key] = value
            return result

        def get(self, path: str, default: Any = None) -> Any:
            cursor: Any = self.config
            for key in path.split("."):
                if isinstance(cursor, dict) and key in cursor:
                    cursor = cursor[key]
                else:
                    return default
            return cursor

        async def set(self, key: str, value: Any) -> None:
            full_key = f"{self.prefix}/{self.service_name}/{key}"
            payload = json.dumps(value) if not isinstance(value, str) else value
            await self.client.kv.put(full_key, payload)
            self._set_nested_value(self.config, key, value)

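        def watch(self, key: str, callback: ChangeCallback) -> None:
            """Register a callback to fire whenever `key` changes in Consul.

            Watches a single KV entry (no recursion), so store each watched value
            as one JSON document rather than as a folder of sub-keys.
            """
            full_key = f"{self.prefix}/{self.service_name}/{key}"
            # Cancel any previous watcher for the same key before replacing it
            if key in self._watch_tasks:
                self._watch_tasks[key].cancel()
            task = asyncio.create_task(self._watch_loop(full_key, key, callback))
            self._watch_tasks[key] = task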

        async def _watch_loop(
            self, full_key: str, local_key: str, callback: ChangeCallback
        ) -> None:
            index: str | None = None
            while True:
                try:
                    # Consul blocking query: returns immediately if index changes,
                    # or after wait timeout if nothing changed.
                    index, item = await self.client.kv.get(full_key, index=index, wait="60s")
                    new_value = self._parse_value(item["Value"]) if item else None
                    old_value = self.get(local_key.replace("/", "."))

                    if json.dumps(new_value, sort_keys=True) != json.dumps(
                        old_value, sort_keys=True
                    ):
                        self._set_nested_value(self.config, local_key, new_value)
                        result = callback(new_value, old_value)
                        if asyncio.iscoroutine(result):
                            await result
                except Exception as exc:
                    logger.error("Watch error for %s: %s", local_key, exc)
                    await asyncio.sleep(5)

        async def close(self) -> None:
            for task in self._watch_tasks.values():
                task.cancel()
            self._watch_tasks.clear()
            if self._consul is not None:
                await self._consul.close()
    ```
  </Tab>
</Tabs>

### Using Consul Config in Services

Here is where the payoff shows up: your application startup code loads config once, registers watches for anything that might change at runtime (database pool sizes, feature flag states, third-party endpoints), and then calls `config.get(...)` freely throughout the codebase. When the value changes in Consul, your watch callback fires and gracefully reconfigures whatever needs to change -- no restart required. The common mistake is caching a value from `config.get()` at startup and holding it in a local variable forever; always re-read from `config.get()` at request time, or wire up a watch to refresh your local copy.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // app.js
    const express = require('express');
    const ConsulConfig = require('./config/consul-config');

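    const app = express();
    const config = new ConsulConfig({ serviceName: 'order-service' });

    // Placeholder hooks (hypothetical): replace with your real pool rebuild
    // and rate-limiter update logic.
    async function reconnectDatabase(dbConfig) { /* rebuild the DB pool */ }
    function updateRateLimiter(limit) { /* reconfigure the limiter */ }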

    async function startServer() {
      // Load initial configuration
      await config.load();
      
      // Watch for configuration changes
      config.watch('database', (newValue, oldValue) => {
        console.log('Database config changed:', { oldValue, newValue });
        // Reconnect to database with new config
        reconnectDatabase(newValue);
      });
      
      config.watch('features/rateLimit', (newValue) => {
        console.log('Rate limit changed to:', newValue);
        updateRateLimiter(newValue);
      });
      
      // Listen for any config changes
      config.on('change', ({ key, oldValue, newValue }) => {
        console.log(`Config changed: ${key}`, { oldValue, newValue });
      });
      
      // The listen port is read once: changing it requires a restart anyway.
      // Other values are read with config.get() at request time (see /health).
      const port = config.get('server.port', 3000);
      
      app.get('/health', (req, res) => {
        res.json({
          status: 'healthy',
          config: {
            environment: config.environment,
            features: config.get('features')
          }
        });
      });
      
      app.listen(port, () => {
        console.log(`Server running on port ${port}`);
      });
    }

    startServer().catch(console.error);
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # app.py -- FastAPI app wired to ConsulConfig
    import logging
    from contextlib import asynccontextmanager
    from typing import Any

    from fastapi import FastAPI

    from config.consul_config import ConsulConfig

    logger = logging.getLogger(__name__)
    config = ConsulConfig(service_name="order-service")


    async def reconnect_database(new_value: dict[str, Any]) -> None:
        logger.info("Reconnecting database with new config: %s", new_value)
        # actual pool rebuild goes here


    async def update_rate_limiter(new_value: Any) -> None:
        logger.info("Rate limit changed to: %s", new_value)


    @asynccontextmanager
    async def lifespan(app: FastAPI):
        # Load initial configuration
        await config.load()

        # Watch for runtime config changes
        config.watch("database", lambda new, old: reconnect_database(new))
        config.watch("features/rateLimit", lambda new, old: update_rate_limiter(new))

        yield
        await config.close()


    app = FastAPI(lifespan=lifespan)


    @app.get("/health")
    async def health() -> dict[str, Any]:
        return {
            "status": "healthy",
            "config": {
                "environment": config.environment,
                "features": config.get("features"),
            },
        }


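    if __name__ == "__main__":
        import os

        import uvicorn

        # ConsulConfig is only loaded inside lifespan, after the server starts, so
        # config.get("server.port") would always return its default here. Read the
        # port from an environment variable (PORT is an assumed name) instead.
        uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("PORT", "3000")))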
    ```
  </Tab>
</Tabs>
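To make the caching pitfall concrete, here is a minimal Python sketch. `FakeConfig` is a hypothetical stand-in for `ConsulConfig` (just a dict plus the same dotted-path `get()`), and the assignment near the end plays the role of a watch callback applying a change pushed from Consul.

```python theme={null}
from typing import Any


class FakeConfig:
    """Hypothetical stand-in for ConsulConfig: a dict plus dotted-path get()."""

    def __init__(self, data: dict[str, Any]) -> None:
        self.config = data

    def get(self, path: str, default: Any = None) -> Any:
        cursor: Any = self.config
        for key in path.split("."):
            if isinstance(cursor, dict) and key in cursor:
                cursor = cursor[key]
            else:
                return default
        return cursor


config = FakeConfig({"features": {"rateLimit": 100}})

# Anti-pattern: captured once at startup and never refreshed
CACHED_LIMIT = config.get("features.rateLimit", 50)


def limit_cached() -> int:
    return CACHED_LIMIT


# Preferred: look the value up each time a request needs it
def limit_fresh() -> int:
    return config.get("features.rateLimit", 50)


# Simulate a watch callback writing a new value pushed from Consul
config.config["features"]["rateLimit"] = 200
print(limit_cached(), limit_fresh())  # 100 200
```

The stale handler keeps enforcing 100 until the next restart, while the fresh one picks up 200 immediately, at the cost of a dict walk per call.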

***

## Feature Flags

Feature flags are the secret weapon of high-performing engineering teams. They decouple deployment from release -- you can merge and deploy code that is not yet visible to users, then turn it on gradually (1% of users, then 10%, then 50%, then 100%). If something goes wrong, you flip the flag off in seconds instead of rolling back a deployment. Netflix, Google, and Facebook all use feature flags extensively. The trade-off: they add complexity and create technical debt if you do not clean up old flags. Set a rule: every feature flag gets a cleanup ticket with a deadline when it is created.
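One way to make that deadline enforceable rather than aspirational is a small CI check. The sketch below assumes a hypothetical registry (`FLAG_REGISTRY`) kept in the repo that records an owner and an expiry date for each flag; it is an illustration, not part of the Consul setup in this chapter.

```python theme={null}
# Hypothetical flag registry kept in the repo next to the code that uses it.
# Each entry records who owns the flag and when it must be removed.
from datetime import date

FLAG_REGISTRY: dict[str, dict] = {
    "newCheckoutFlow": {"owner": "payments", "expires": date(2099, 1, 1)},
    "legacyExperiment": {"owner": "growth", "expires": date(2020, 6, 30)},
}


def expired_flags(registry: dict[str, dict], today: date) -> list[str]:
    """Names of flags whose cleanup deadline has passed, sorted for stable output."""
    return sorted(name for name, meta in registry.items() if meta["expires"] < today)


def check_flags(registry: dict[str, dict], today: date) -> int:
    """Print one line per expired flag and return a CI-style exit code."""
    stale = expired_flags(registry, today)
    for name in stale:
        print(f"Flag {name!r} (owner: {registry[name]['owner']}) is past its expiry date")
    return 1 if stale else 0
```

A CI step could run `raise SystemExit(check_flags(FLAG_REGISTRY, date.today()))`; extending a flag then means changing its `expires` date in a reviewed commit, which is exactly the explicit extension review the pattern calls for.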

### Caveats & Common Pitfalls: Feature Flag Technical Debt

<Warning>
  **Feature flags promise agility but accumulate sharp costs if untended:**

  * **The "temporary" flag that never goes away.** A flag added for a 2-week experiment in 2022 is still in the code in 2026. The code branches have diverged. Removing the flag now requires reading and testing both branches carefully -- a 1-week task for what should have been 10 minutes of cleanup.
  * **Flag interactions creating untested state-space.** 10 boolean flags = 1024 possible combinations. You cannot test them all. A user who happens to have 4 specific flags enabled hits a code path nobody has ever executed. This manifests as "impossible" bugs that QA could never reproduce.
  * **Flag evaluation latency on hot paths.** Each flag check is cheap in isolation (a map lookup, perhaps a hash or a regex), but a request that evaluates 20 flags pays that cost 20 times, and on a hot path it adds up. Worse: if the flag SDK falls back to a network call when the local cache is cold, you have just coupled your latency to an external service.
  * **Inconsistent flag state across services.** The Order service evaluates `newCheckout` as true for user X; the Inventory service evaluates it as false. The request gets half the new flow and half the old one, which can corrupt data. This happens when services run different SDK versions or refresh their rule caches at different moments.
</Warning>

<Tip>
  **Solutions & Patterns:**

  * **Treat flag cleanup as part of the feature's definition of done.** The ticket that creates the flag also creates the ticket to remove it. CI fails if a flag lives past its expiration date without an explicit extension review.
  * **Pass flag evaluation results downstream.** When the Order service evaluates a flag, include the result in a request header (`X-Feature-NewCheckout: true`). Downstream services read the header rather than re-evaluating, which keeps the whole request consistent as long as every service honors the header.
  * **Kill switches separate from feature flags.** A kill switch is a flag owned by ops/SRE with 1-click disable. It should be independent from product flags and should never be tangled with experimentation logic. When prod is on fire, you do not want to navigate an experimentation UI.
  * **Evaluate once per request, cache for the request scope.** Middleware evaluates all flags at request entry, attaches them to the request context, and every downstream call reads from the context. Avoids re-evaluating the same flag 20 times per request.
</Tip>
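The last two patterns combine naturally: evaluate once at request entry, then forward the results. A minimal sketch, assuming the `X-Feature-NewCheckout` style of header from the tip and a caller-supplied `is_enabled` function (for example `FeatureFlags.is_enabled` from later in this chapter); all helper names here are hypothetical.

```python theme={null}
# Minimal sketch: evaluate each flag once per request, then forward results
# as X-Feature-* headers. Helper names are hypothetical.
from __future__ import annotations

from typing import Callable, Mapping

HEADER_PREFIX = "X-Feature-"


def evaluate_all(
    flag_names: list[str],
    is_enabled: Callable[[str, dict], bool],
    context: dict,
) -> dict[str, bool]:
    """Evaluate every flag exactly once; store the result in the request context."""
    return {name: is_enabled(name, context) for name in flag_names}


def header_name(flag_name: str) -> str:
    # newCheckout -> X-Feature-NewCheckout
    return HEADER_PREFIX + flag_name[:1].upper() + flag_name[1:]


def to_headers(flags: Mapping[str, bool]) -> dict[str, str]:
    """Headers to attach to every outgoing call made while handling this request."""
    return {header_name(name): "true" if on else "false" for name, on in flags.items()}


def read_flag(headers: Mapping[str, str], flag_name: str) -> bool | None:
    """Downstream lookup; None means the caller did not forward this flag."""
    wanted = header_name(flag_name).lower()
    for key, value in headers.items():
        if key.lower() == wanted:  # HTTP header names are case-insensitive
            return value.strip().lower() == "true"
    return None
```

`read_flag` compares names case-insensitively because many frameworks lowercase incoming header names. When it returns `None`, the downstream service can fall back to evaluating the flag itself, ideally with the same bucketing hash so both sides agree.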

### Feature Flag System

The core of any feature flag system is consistent user bucketing -- given the same user ID and the same flag, you must always return the same answer. Otherwise a user could see feature X on one page load and not on the next, which is worse than not having the feature at all. The standard technique is hashing the user ID plus the flag name and mapping it to a number in \[0, 100). If the number falls below the flag's rollout percentage, the user is in the experiment. This is stateless (no database of user assignments) and stable across service restarts.

Beyond simple boolean and percentage rollouts, the patterns below cover user allowlists (beta testers), time-based gradual rollouts (ramp from 0% to 100% over a week), and attribute-based targeting (only premium subscribers see this). You do not need all of these on day one -- but building the abstraction upfront means adding new strategies later is a one-function change rather than a rewrite.
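The bucketing properties are easy to check directly. This sketch reuses the MD5 scheme from the implementations below and verifies two things over 10,000 synthetic user IDs: raising the percentage never removes a user who was already in, and the observed share tracks the configured percentage. The helper names are illustrative.

```python theme={null}
import hashlib


def bucket(user_id: str, flag_name: str) -> int:
    """Stable bucket in [0, 100): same hashing scheme as the implementations below."""
    digest = hashlib.md5(f"{user_id}{flag_name}".encode()).hexdigest()
    return int(digest[:8], 16) % 100


def in_rollout(user_id: str, flag_name: str, percentage: float) -> bool:
    return bucket(user_id, flag_name) < percentage


users = [f"user-{i}" for i in range(10_000)]

# Raising the rollout only adds users: nobody in the 10% cohort drops out at 25%.
at_10 = {u for u in users if in_rollout(u, "newCheckoutFlow", 10)}
at_25 = {u for u in users if in_rollout(u, "newCheckoutFlow", 25)}
assert at_10 <= at_25

# The observed shares should land close to the configured percentages.
print(f"{len(at_10) / len(users):.3f} {len(at_25) / len(users):.3f}")
```

Because the flag name is part of the hash, the same user lands in unrelated buckets for different flags, so separate experiments do not keep hitting the same cohort. One caveat of plain concatenation: `user-1` + `2fa` and `user-12` + `fa` hash identically. A separator such as `:` avoids that, but changing the scheme reshuffles every user, so settle it before the first rollout.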

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // features/feature-flags.js
    const ConsulConfig = require('../config/consul-config');
    const crypto = require('crypto');

    class FeatureFlags {
      constructor(options = {}) {
        this.config = options.config || new ConsulConfig({ serviceName: 'features' });
        this.flags = new Map();
        this.overrides = new Map(); // For testing
      }

      async initialize() {
        await this.config.load();
        
        // Watch for feature flag changes
        this.config.watch('flags', (newFlags) => {
          console.log('Feature flags updated:', newFlags);
          this.updateFlags(newFlags);
        });
        
        this.updateFlags(this.config.get('flags', {}));
      }

      updateFlags(flags) {
        this.flags.clear();
        for (const [name, config] of Object.entries(flags || {})) {
          // The map key is authoritative, so percentage hashing always has a name
          this.flags.set(name, { ...this.parseFlag(config), name });
        }
      }

      parseFlag(config) {
        if (typeof config === 'boolean') {
          return { enabled: config, type: 'boolean' };
        }
        return config;
      }

      // Check if feature is enabled
      isEnabled(flagName, context = {}) {
        // Check overrides first (for testing)
        if (this.overrides.has(flagName)) {
          return this.overrides.get(flagName);
        }
        
        const flag = this.flags.get(flagName);
        if (!flag) return false;
        
        switch (flag.type) {
          case 'boolean':
            return flag.enabled;
          
          case 'percentage':
            return this.checkPercentage(flag, context);
          
          case 'userList':
            return this.checkUserList(flag, context);
          
          case 'gradualRollout':
            return this.checkGradualRollout(flag, context);
          
          case 'userAttribute':
            return this.checkUserAttribute(flag, context);
          
          default:
            return flag.enabled || false;
        }
      }

      // Percentage-based rollout
      checkPercentage(flag, context) {
        const userId = context.userId || context.sessionId || 'anonymous';
        const hash = crypto.createHash('md5').update(userId + flag.name).digest('hex');
        const percentage = parseInt(hash.substring(0, 8), 16) % 100;
        return percentage < flag.percentage;
      }

      // User allowlist
      checkUserList(flag, context) {
        if (!context.userId) return false;
        return flag.users.includes(context.userId);
      }

      // Gradual rollout based on time
      checkGradualRollout(flag, context) {
        const now = Date.now();
        const start = new Date(flag.startDate).getTime();
        const end = new Date(flag.endDate).getTime();
        
        if (now < start) return false;
        if (now >= end) return true;
        
        const progress = (now - start) / (end - start);
        return this.checkPercentage({ ...flag, percentage: progress * 100 }, context);
      }

      // User attribute matching
      checkUserAttribute(flag, context) {
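        const userValue = context[flag.attribute];
        // Only a missing attribute fails the match (same rule as the Python version)
        if (userValue === undefined || userValue === null) return false;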
        
        switch (flag.operator) {
          case 'equals':
            return userValue === flag.value;
          case 'contains':
            return userValue.includes(flag.value);
          case 'in':
            return flag.values.includes(userValue);
          case 'regex':
            return new RegExp(flag.pattern).test(userValue);
          default:
            return false;
        }
      }

      // Get flag value (for non-boolean flags)
      getValue(flagName, defaultValue, context = {}) {
        const flag = this.flags.get(flagName);
        if (!flag) return defaultValue;
        
        if (!this.isEnabled(flagName, context)) {
          return defaultValue;
        }
        
        return flag.value !== undefined ? flag.value : defaultValue;
      }

      // Set override for testing
      setOverride(flagName, value) {
        this.overrides.set(flagName, value);
      }

      clearOverrides() {
        this.overrides.clear();
      }

      // Get all flags for debugging
      getAllFlags() {
        const result = {};
        for (const [name, config] of this.flags) {
          result[name] = config;
        }
        return result;
      }
    }

    // Singleton instance
    let instance = null;

    module.exports = {
      FeatureFlags,
      
      async getFeatureFlags() {
        if (!instance) {
          instance = new FeatureFlags();
          await instance.initialize();
        }
        return instance;
      }
    };
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # features/feature_flags.py
    import hashlib
    import re
    from datetime import datetime, timezone
    from typing import Any, Literal

    from pydantic import BaseModel, Field

    from config.consul_config import ConsulConfig


    FlagType = Literal[
        "boolean", "percentage", "userList", "gradualRollout", "userAttribute"
    ]


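    class FlagDefinition(BaseModel):
        """One feature flag's configuration. Extra fields (such as `description`)
        are allowed because each flag type uses a different shape."""

        # The JSON in Consul uses camelCase startDate/endDate; the aliases map them
        # onto snake_case fields, and populate_by_name still accepts the field names.
        model_config = {"extra": "allow", "populate_by_name": True}

        name: str = ""
        type: FlagType = "boolean"
        enabled: bool | None = None
        percentage: float | None = None
        users: list[str] = Field(default_factory=list)
        start_date: datetime | None = Field(default=None, alias="startDate")
        end_date: datetime | None = Field(default=None, alias="endDate")
        attribute: str | None = None
        operator: str | None = None
        value: Any = None
        values: list[Any] = Field(default_factory=list)
        pattern: str | None = None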


    class FeatureFlags:
        def __init__(self, config: ConsulConfig | None = None) -> None:
            self.config = config or ConsulConfig(service_name="features")
            self.flags: dict[str, FlagDefinition] = {}
            self.overrides: dict[str, bool] = {}  # for tests

        async def initialize(self) -> None:
            await self.config.load()
            self.config.watch("flags", lambda new, old: self._update_flags(new or {}))
            self._update_flags(self.config.get("flags", {}))

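        def _update_flags(self, flags: dict[str, Any]) -> None:
            parsed: dict[str, FlagDefinition] = {}
            for name, raw in flags.items():
                if isinstance(raw, bool):
                    raw = {"enabled": raw, "type": "boolean"}
                # The dict key is authoritative; merging avoids a duplicate `name`
                # keyword when the stored JSON also contains a "name" field.
                parsed[name] = FlagDefinition(**{**raw, "name": name})
            # Swap in one assignment so readers never see a half-built dict
            self.flags = parsed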

        def is_enabled(self, flag_name: str, context: dict[str, Any] | None = None) -> bool:
            context = context or {}

            # Overrides win (test support)
            if flag_name in self.overrides:
                return self.overrides[flag_name]

            flag = self.flags.get(flag_name)
            if flag is None:
                return False

            if flag.type == "boolean":
                return bool(flag.enabled)
            if flag.type == "percentage":
                return self._check_percentage(flag, context)
            if flag.type == "userList":
                return self._check_user_list(flag, context)
            if flag.type == "gradualRollout":
                return self._check_gradual_rollout(flag, context)
            if flag.type == "userAttribute":
                return self._check_user_attribute(flag, context)
            return bool(flag.enabled)

        def _check_percentage(self, flag: FlagDefinition, context: dict) -> bool:
            user_id = context.get("userId") or context.get("sessionId") or "anonymous"
            digest = hashlib.md5(f"{user_id}{flag.name}".encode()).hexdigest()
            bucket = int(digest[:8], 16) % 100
            return bucket < (flag.percentage or 0)

        def _check_user_list(self, flag: FlagDefinition, context: dict) -> bool:
            user_id = context.get("userId")
            return bool(user_id and user_id in flag.users)

        def _check_gradual_rollout(self, flag: FlagDefinition, context: dict) -> bool:
            if not (flag.start_date and flag.end_date):
                return False
            now = datetime.now(timezone.utc)
            if now < flag.start_date:
                return False
            if now >= flag.end_date:
                return True
            total = (flag.end_date - flag.start_date).total_seconds()
            progress = (now - flag.start_date).total_seconds() / total
            synthetic = flag.model_copy(update={"percentage": progress * 100})
            return self._check_percentage(synthetic, context)

        def _check_user_attribute(self, flag: FlagDefinition, context: dict) -> bool:
            user_value = context.get(flag.attribute or "")
            if user_value is None:
                return False
            if flag.operator == "equals":
                return user_value == flag.value
            if flag.operator == "contains":
                return flag.value in user_value
            if flag.operator == "in":
                return user_value in flag.values
            if flag.operator == "regex" and flag.pattern:
                return re.search(flag.pattern, str(user_value)) is not None
            return False

        def get_value(self, flag_name: str, default: Any, context: dict | None = None) -> Any:
            flag = self.flags.get(flag_name)
            if flag is None or not self.is_enabled(flag_name, context):
                return default
            return flag.value if flag.value is not None else default

        def set_override(self, flag_name: str, value: bool) -> None:
            self.overrides[flag_name] = value

        def clear_overrides(self) -> None:
            self.overrides.clear()


    # Singleton instance
    _instance: FeatureFlags | None = None


    async def get_feature_flags() -> FeatureFlags:
        global _instance
        if _instance is None:
            _instance = FeatureFlags()
            await _instance.initialize()
        return _instance
    ```
  </Tab>
</Tabs>

### Feature Flag Configuration in Consul

```json theme={null}
// Stored in Consul at: config/production/features/flags
{
  "newCheckoutFlow": {
    "type": "percentage",
    "name": "newCheckoutFlow",
    "percentage": 25,
    "description": "New streamlined checkout experience"
  },
  
  "betaFeatures": {
    "type": "userList",
    "name": "betaFeatures",
    "users": ["user-123", "user-456", "user-789"],
    "description": "Beta features for selected users"
  },
  
  "darkMode": {
    "type": "boolean",
    "enabled": true,
    "description": "Dark mode UI"
  },
  
  "newRecommendationEngine": {
    "type": "gradualRollout",
    "name": "newRecommendationEngine",
    "startDate": "2024-01-01T00:00:00Z",
    "endDate": "2024-01-15T00:00:00Z",
    "description": "Gradual rollout of new ML recommendations"
  },
  
  "premiumFeatures": {
    "type": "userAttribute",
    "attribute": "subscriptionTier",
    "operator": "in",
    "values": ["premium", "enterprise"],
    "description": "Features for premium subscribers"
  },
  
  "experimentalApi": {
    "type": "percentage",
    "name": "experimentalApi",
    "percentage": 10,
    "value": {
      "apiVersion": "v2",
      "timeout": 10000
    },
    "description": "Test new API version"
  }
}
```
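The `gradualRollout` type turns two timestamps into a percentage that is then fed to the ordinary percentage check. Using the `newRecommendationEngine` dates, the arithmetic works out as follows; this is a standalone sketch mirroring the ramp logic in the feature flag code above, and `effective_percentage` is an illustrative name.

```python theme={null}
from datetime import datetime, timezone


def effective_percentage(start: datetime, end: datetime, now: datetime) -> float:
    """Linear ramp: 0% before start, 100% from end onward, proportional in between."""
    if now < start:
        return 0.0
    if now >= end:
        return 100.0
    return (now - start).total_seconds() / (end - start).total_seconds() * 100


# Dates from the newRecommendationEngine flag
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 15, tzinfo=timezone.utc)

for day in (1, 4, 8, 15):
    now = datetime(2024, 1, day, tzinfo=timezone.utc)
    print(now.date(), f"{effective_percentage(start, end, now):.1f}%")
# 2024-01-01 0.0%
# 2024-01-04 21.4%
# 2024-01-08 50.0%
# 2024-01-15 100.0%
```

Combined with stable bucketing, this means users only ever join the rollout as the ramp advances: a user whose bucket is 30 switches on once the percentage passes 30 and stays on.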

### Using Feature Flags in Routes

The pattern below is the "feature flag dependency" -- pass flag evaluation context (user ID, session, subscription tier) into the flag check at the point of use, then branch on the result. Middleware is the cleanest way to propagate flag evaluation results through a request: evaluate the flag once, attach the result to the request object, and let handlers read it without re-evaluating. Avoid scattering `isEnabled` calls deep in business logic -- you want feature branches to be visible at the request-handling level where they can be audited and traced.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // routes/checkout.js
    const express = require('express');
    const router = express.Router();
    const { getFeatureFlags } = require('../features/feature-flags');

    router.post('/checkout', async (req, res) => {
      const featureFlags = await getFeatureFlags();
      const context = {
        userId: req.user.id,
        subscriptionTier: req.user.subscriptionTier,
        country: req.user.country
      };
      
      if (featureFlags.isEnabled('newCheckoutFlow', context)) {
        // New checkout flow
        return handleNewCheckout(req, res);
      }
      
      // Legacy checkout flow
      return handleLegacyCheckout(req, res);
    });

    // Feature flag middleware
    const featureFlagMiddleware = (flagName, options = {}) => {
      return async (req, res, next) => {
        let isEnabled;
        try {
          const featureFlags = await getFeatureFlags();
          const context = {
            userId: req.user?.id,
            sessionId: req.sessionID,
            ...req.user
          };
          isEnabled = featureFlags.isEnabled(flagName, context);
        } catch (err) {
          // Express 4 does not catch rejected promises from async middleware
          return next(err);
        }
        
        req.featureFlags = req.featureFlags || {};
        req.featureFlags[flagName] = isEnabled;
        
        if (options.required && !isEnabled) {
          return res.status(404).json({
            error: 'Feature not available'
          });
        }
        
        next();
      };
    };

    // Use middleware
    router.get('/new-dashboard',
      featureFlagMiddleware('newDashboard', { required: true }),
      (req, res) => {
        res.render('new-dashboard');
      }
    );

    module.exports = router;
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # routes/checkout.py
    from typing import Any

    from fastapi import APIRouter, Depends, HTTPException, Request

    from features.feature_flags import FeatureFlags, get_feature_flags

    router = APIRouter()


    def _build_context(request: Request) -> dict[str, Any]:
        user = getattr(request.state, "user", None)
        return {
            "userId": getattr(user, "id", None) if user else None,
            "subscriptionTier": getattr(user, "subscription_tier", None) if user else None,
            "country": getattr(user, "country", None) if user else None,
            "sessionId": request.cookies.get("session_id"),
        }


    @router.post("/checkout")
    async def checkout(
        request: Request,
        flags: FeatureFlags = Depends(get_feature_flags),
    ) -> dict[str, Any]:
        context = _build_context(request)

        if flags.is_enabled("newCheckoutFlow", context):
            return await handle_new_checkout(request)
        return await handle_legacy_checkout(request)


    def require_flag(flag_name: str):
        """Dependency factory: returns a dependency that 404s if the flag is off."""

        async def _dependency(
            request: Request, flags: FeatureFlags = Depends(get_feature_flags)
        ) -> bool:
            context = _build_context(request)
            enabled = flags.is_enabled(flag_name, context)
            if not enabled:
                raise HTTPException(status_code=404, detail="Feature not available")
            # Attach to request state so handlers can read it
            request.state.feature_flags = {
                **getattr(request.state, "feature_flags", {}),
                flag_name: enabled,
            }
            return enabled

        return _dependency


    @router.get("/new-dashboard")
    async def new_dashboard(enabled: bool = Depends(require_flag("newDashboard"))) -> dict:
        return {"template": "new-dashboard"}


    async def handle_new_checkout(request: Request) -> dict[str, Any]:
        return {"flow": "new"}


    async def handle_legacy_checkout(request: Request) -> dict[str, Any]:
        return {"flow": "legacy"}
    ```
  </Tab>
</Tabs>

***

## Kubernetes ConfigMaps and Secrets

### ConfigMap for Non-Sensitive Config

ConfigMaps are Kubernetes' built-in answer to the configuration problem, and for small-to-medium clusters they are often all you need. A ConfigMap is a namespaced object that stores key-value pairs or entire config files; pods in the same namespace can consume it as environment variables or mounted files. The tradeoff vs a dedicated config server like Consul: ConfigMaps have no native hot-reload (changes require a pod restart unless you mount them as files and the app watches the filesystem), no cross-cluster replication, and no audit trail beyond Kubernetes audit logs. What you get in return is zero extra infrastructure -- ConfigMaps are just Kubernetes.

```yaml theme={null}
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: order-service-config
  labels:
    app: order-service
data:
  # Simple key-value pairs
  LOG_LEVEL: "info"
  API_TIMEOUT: "5000"
  MAX_RETRIES: "3"
  
  # JSON config file
  config.json: |
    {
      "server": {
        "port": 3000,
        "host": "0.0.0.0"
      },
      "database": {
        "pool": {
          "min": 2,
          "max": 10
        }
      },
      "features": {
        "caching": true,
        "compression": true
      }
    }
  
  # Application properties
  application.properties: |
    spring.datasource.hikari.minimum-idle=2
    spring.datasource.hikari.maximum-pool-size=10
    logging.level.root=INFO
```
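ConfigMap `data` values must be strings, which is why `API_TIMEOUT` and `MAX_RETRIES` are quoted above; an app that reads them through `envFrom` has to do its own type conversion. A minimal Python sketch (the `Settings` class and its defaults are illustrative assumptions) that parses them once and fails fast on bad values:

```python theme={null}
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    log_level: str
    api_timeout_ms: int
    max_retries: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Convert the string values injected via envFrom into typed settings.

    int() raises ValueError on something like "5s", so a bad ConfigMap value
    stops the pod at startup instead of surfacing later as a runtime bug.
    """
    return Settings(
        log_level=env.get("LOG_LEVEL", "info"),
        api_timeout_ms=int(env.get("API_TIMEOUT", "5000")),
        max_retries=int(env.get("MAX_RETRIES", "3")),
    )
```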

### Secrets for Sensitive Data

Kubernetes Secrets are the sibling of ConfigMaps for sensitive data, but the name is deceptive: by default, Secret values are only base64-encoded, not encrypted. Anyone who can read Secrets in the namespace (for example with `kubectl get secret -o yaml`) can trivially decode them, and anyone who can create pods there can mount them. For real security you need encryption at rest for Secrets in etcd (configured by pointing the API server at an encryption configuration file) plus strict RBAC. For anything sensitive enough to warrant a dedicated secrets manager -- production database credentials, API keys for paid services, signing keys -- use Vault or a cloud provider's secrets manager instead. Secrets are fine for low-stakes values where the main goal is keeping tokens out of ConfigMaps and source control.

```yaml theme={null}
# secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: order-service-secrets
type: Opaque
data:
  # Base64 encoded values
  DB_PASSWORD: cGFzc3dvcmQxMjM=
  API_KEY: c2VjcmV0LWFwaS1rZXk=
  JWT_SECRET: and0LXN1cGVyLXNlY3JldC1rZXk=
---
# For Docker registry credentials
apiVersion: v1
kind: Secret
metadata:
  name: registry-credentials
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: eyJhdXRocyI6eyJyZWdpc3RyeS5leGFtcGxlLmNvbSI6eyJ1c2VybmFtZSI6InVzZXIiLCJwYXNzd29yZCI6InBhc3MifX19
```
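To see why base64 offers no protection, decode the values from the `order-service-secrets` manifest with nothing but the standard library; no key is involved:

```python theme={null}
import base64

# Values copied from the order-service-secrets manifest
encoded = {
    "DB_PASSWORD": "cGFzc3dvcmQxMjM=",
    "API_KEY": "c2VjcmV0LWFwaS1rZXk=",
    "JWT_SECRET": "and0LXN1cGVyLXNlY3JldC1rZXk=",
}

decoded = {key: base64.b64decode(value).decode("utf-8") for key, value in encoded.items()}
print(decoded)
# {'DB_PASSWORD': 'password123', 'API_KEY': 'secret-api-key', 'JWT_SECRET': 'jwt-super-secret-key'}
```

Kubernetes also accepts a `stringData` field with plain-text values and encodes them for you. That is more convenient to write, but it changes nothing about the security picture, and neither form belongs in git.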

### Using ConfigMaps and Secrets in Deployments

```yaml theme={null}
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
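    spec:
      # Authenticate image pulls with the registry-credentials Secret defined earlier
      imagePullSecrets:
      - name: registry-credentials
      containers: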
      - name: order-service
        image: myregistry/order-service:v1
        
        # Environment variables from ConfigMap
        envFrom:
        - configMapRef:
            name: order-service-config
        
        # Individual env vars from secrets
        env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: order-service-secrets
              key: DB_PASSWORD
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: order-service-secrets
              key: API_KEY
        
        # Mount config files
        volumeMounts:
        - name: config-volume
          mountPath: /app/config
          readOnly: true
        - name: secrets-volume
          mountPath: /app/secrets
          readOnly: true
      
      volumes:
      - name: config-volume
        configMap:
          name: order-service-config
          items:
          - key: config.json
            path: config.json
      - name: secrets-volume
        secret:
          secretName: order-service-secrets
```
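Because `secrets-volume` mounts each key of `order-service-secrets` as a file under `/app/secrets`, an app can read secrets from disk instead of from env vars. A minimal sketch (the `read_secret` helper and its fallback order are assumptions, not a Kubernetes API):

```python theme={null}
import os
from pathlib import Path


def read_secret(name: str, secrets_dir: str = "/app/secrets") -> str:
    """Read a secret, preferring the mounted file over the environment variable.

    The kubelet refreshes files in a mounted Secret volume when the Secret
    changes; environment variables keep the value from container start.
    """
    path = Path(secrets_dir) / name
    if path.is_file():
        # Drop a trailing newline left over when the secret was created from a file
        return path.read_text(encoding="utf-8").rstrip("\n")
    value = os.environ.get(name)
    if value is None:
        raise KeyError(f"secret {name!r} not found in {secrets_dir} or the environment")
    return value
```

Calling it at use time (or on a timer) lets a rotated value take effect without a restart, though connections already opened with the old credential still have to be rebuilt by the application.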

### Hot Reload with ConfigMap Updates

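When a ConfigMap is mounted as a volume, the kubelet updates the mounted files after the ConfigMap changes -- typically within a minute or so, depending on its sync period and cache settings. This gives you a free hot-reload channel: have your app watch the config directory with `chokidar` (Node) or `watchdog` (Python), re-parse the file when it changes, and apply the new values. There are a few gotchas. ConfigMap values injected as environment variables do NOT get updated; env vars are set at container start and never change. Files mounted with `subPath` are not updated either. So for anything you want to hot-reload, mount the whole ConfigMap as a directory. The kubelet applies an update by atomically swapping a hidden `..data` symlink rather than rewriting each file in place, so a watcher that only listens for `change` events on `config.json` can miss it; react to any event in the directory and re-read. That swap updates every file from one ConfigMap together, but nothing coordinates separate ConfigMaps, and an app that reads two related files at different moments can still see one old and one new. Favor a single `config.json` file for related values.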
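A filesystem watcher is not the only option. As a stdlib-only alternative sketch (the `PollingConfigReloader` class is hypothetical), polling the file and comparing a content hash picks up updates however the kubelet writes them, at the cost of up to one poll interval of delay:

```python theme={null}
from __future__ import annotations

import hashlib
import json
from pathlib import Path
from typing import Any, Callable


class PollingConfigReloader:
    """Reload a mounted config file whenever its content changes.

    Hashing the content sidesteps how the kubelet swaps files on disk, so no
    filesystem-event library is needed. Call check_once() on a timer.
    """

    def __init__(self, path: str, on_change: Callable[[dict, dict], None]) -> None:
        self.path = Path(path)
        self.on_change = on_change
        self.config: dict[str, Any] = {}
        self._digest: str | None = None

    def check_once(self) -> bool:
        """Return True if a new, valid config was applied on this call."""
        try:
            raw = self.path.read_bytes()
        except FileNotFoundError:
            return False  # volume not mounted yet; keep the current config
        digest = hashlib.sha256(raw).hexdigest()
        if digest == self._digest:
            return False  # unchanged since the last successful load
        try:
            new_config = json.loads(raw)
        except ValueError:
            return False  # malformed JSON: keep serving the last good config
        old_config, self.config, self._digest = self.config, new_config, digest
        self.on_change(old_config, new_config)
        return True
```

In an asyncio service, a background task can call `check_once()` every few seconds with `asyncio.sleep` between calls; the first call performs the initial load.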

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // config/kubernetes-config.js
    const fs = require('fs');
    const path = require('path');
    const chokidar = require('chokidar');
    const EventEmitter = require('events');

    class KubernetesConfig extends EventEmitter {
      constructor(configPath = '/app/config') {
        super();
        this.configPath = configPath;
        this.config = {};
      }

      load() {
        const configFile = path.join(this.configPath, 'config.json');
        
        if (fs.existsSync(configFile)) {
          const content = fs.readFileSync(configFile, 'utf8');
          this.config = JSON.parse(content);
        }
        
        return this.config;
      }

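      watch() {
        const watcher = chokidar.watch(this.configPath, {
          persistent: true,
          ignoreInitial: true
        });
        
        // Kubernetes repoints a `..data` symlink instead of editing config.json
        // in place, so react to every event type, not just 'change'
        watcher.on('all', (event, filePath) => {
          const oldConfig = this.config;
          try {
            this.load();
          } catch (error) {
            // Half-written or invalid JSON: keep serving the last good config
            console.error(`Ignoring invalid config after ${event} on ${filePath}:`, error.message);
            return;
          }
          // One symlink swap fires several events; only notify on real changes
          if (JSON.stringify(oldConfig) === JSON.stringify(this.config)) return;
          console.log(`Config reloaded after ${event} on ${filePath}`);
          this.emit('change', { oldConfig, newConfig: this.config });
        });
        
        return watcher;
      }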

      get(key, defaultValue) {
        const keys = key.split('.');
        let value = this.config;
        
        for (const k of keys) {
          if (value && typeof value === 'object' && k in value) {
            value = value[k];
          } else {
            return defaultValue;
          }
        }
        
        return value;
      }
    }

    module.exports = KubernetesConfig;
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # config/kubernetes_config.py
    import asyncio
    import json
    import logging
    from copy import deepcopy
    from pathlib import Path
    from typing import Any, Awaitable, Callable

    from watchdog.events import FileSystemEvent, FileSystemEventHandler
    from watchdog.observers import Observer

    logger = logging.getLogger(__name__)

    ChangeListener = Callable[[dict[str, Any], dict[str, Any]], Awaitable[None] | None]


    class _ReloadHandler(FileSystemEventHandler):
        def __init__(self, config: "KubernetesConfig", loop: asyncio.AbstractEventLoop) -> None:
            self.config = config
            self.loop = loop

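        def on_any_event(self, event: FileSystemEvent) -> None:
            # Kubernetes repoints a `..data` symlink instead of editing config.json
            # in place, so react to created/moved/deleted events, not just "modified".
            # Skip "opened"/"closed" events: our own reads would re-trigger a reload.
            if event.event_type not in ("created", "modified", "moved", "deleted"):
                return
            # Bridge back to the asyncio loop from the watchdog thread
            asyncio.run_coroutine_threadsafe(self.config._handle_change(event.src_path), self.loop)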


    class KubernetesConfig:
        """Reads a mounted ConfigMap from disk and hot-reloads on change.

        Kubernetes updates the mounted files when the ConfigMap is patched
        (typically within ~60s). We watch the directory and re-parse config.json
        whenever it changes.
        """

        def __init__(self, config_path: str = "/app/config") -> None:
            self.config_path = Path(config_path)
            self.config: dict[str, Any] = {}
            self._listeners: list[ChangeListener] = []
            self._observer: Observer | None = None

        def load(self) -> dict[str, Any]:
            config_file = self.config_path / "config.json"
            if config_file.exists():
                self.config = json.loads(config_file.read_text())
            return self.config

        def on_change(self, listener: ChangeListener) -> None:
            self._listeners.append(listener)

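        async def _handle_change(self, file_path: str) -> None:
            old = deepcopy(self.config)
            try:
                self.load()
            except (OSError, json.JSONDecodeError) as exc:
                # Half-written or invalid file: keep serving the last good config
                logger.error("Ignoring invalid config after change to %s: %s", file_path, exc)
                return
            # One symlink swap fires several events; only notify on real changes
            if self.config == old:
                return
            logger.info("Config reloaded after change to %s", file_path)
            for listener in self._listeners:
                result = listener(old, self.config)
                if asyncio.iscoroutine(result):
                    await result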

        def watch(self) -> None:
            loop = asyncio.get_running_loop()
            self._observer = Observer()
            self._observer.schedule(
                _ReloadHandler(self, loop), str(self.config_path), recursive=True
            )
            self._observer.start()

        def close(self) -> None:
            if self._observer is not None:
                self._observer.stop()
                self._observer.join()
                self._observer = None

        def get(self, key: str, default: Any = None) -> Any:
            cursor: Any = self.config
            for part in key.split("."):
                if isinstance(cursor, dict) and part in cursor:
                    cursor = cursor[part]
                else:
                    return default
            return cursor
    ```
  </Tab>
</Tabs>

***

## HashiCorp Vault for Secrets

Vault is the gold standard for secrets management, and it is worth understanding why before you commit to the operational cost. The key idea is **dynamic secrets**: instead of sharing one long-lived database password across all your services, Vault generates a unique short-lived credential per service instance on demand. When the service revokes the lease, or the lease expires after a TTL configured on the Vault role, Vault automatically revokes the credential. This turns credential rotation from a quarterly security project into a continuous background process. You also get audit logs (every secret access is recorded), fine-grained policies (service X can read these paths and write nothing), and multiple auth methods (Kubernetes service account tokens, AWS IAM, LDAP).

The cost is real operational complexity. Vault has a "sealed" state it enters on restart and requires unsealing (either manually with key shares or automatically via a cloud KMS). Running Vault in HA requires a backend like Consul or Raft, plus careful backup strategy for the encryption keys. If you do not have compliance requirements (PCI, HIPAA, SOC 2) forcing the issue, start simpler.

### Vault Integration

The code below shows the standard Vault lifecycle: authenticate using the pod's Kubernetes service account token (no hardcoded credentials), fetch static secrets from the KV store, request dynamic database credentials, and schedule lease renewal before expiry. Lease renewal is the subtle part -- if you let a lease expire without renewing, Vault revokes the credential and your next database query fails. The typical pattern is to renew at 75% of the lease duration, giving a safety margin for network hiccups. Renewal cannot extend a lease past the role's `max_ttl`, though: once that ceiling is reached, the service has to request new credentials and reconnect with them.
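
To make the 75% rule and the `max_ttl` ceiling concrete, here is a small simulation. It is a simplified model, not Vault's own logic: it assumes each renewal asks for the role's full TTL and Vault grants at most the time left before `max_ttl`. The 1-hour TTL and 24-hour max TTL are example values.

```python
def renewal_plan(ttl: float, max_ttl: float, fraction: float = 0.75) -> list[float]:
    """Seconds after issue at which each renewal fires, under a simplified model.

    Assumed model: every renewal requests a fresh `ttl`, and Vault grants
    min(ttl, time left until max_ttl). The first capped grant is the signal
    to stop renewing and request brand-new credentials instead.
    """
    if ttl <= 0 or max_ttl <= 0:
        raise ValueError("ttl and max_ttl must be positive")
    renewals: list[float] = []
    now = 0.0
    expires_at = min(ttl, max_ttl)
    while True:
        now += (expires_at - now) * fraction  # renew at 75% of the current lease
        renewals.append(now)
        granted = min(ttl, max_ttl - now)
        if granted < ttl:  # capped by max_ttl: rotate instead of renewing again
            return renewals
        expires_at = now + granted


plan = renewal_plan(ttl=3600, max_ttl=86400)
print(len(plan), plan[0], plan[-1])  # 31 2700.0 83700.0
```

With these numbers the service renews every 45 minutes, and on the 31st renewal Vault can only grant 45 more minutes. That is the moment to fetch new credentials rather than wait for the old ones to be revoked.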

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // config/vault-config.js
    const fs = require('fs');
    const EventEmitter = require('events');
    const vault = require('node-vault');

    // Emits 'credentialsRenewed' with { username, password, leaseDuration }
    // whenever Vault issues a brand-new database username/password.
    class VaultConfig extends EventEmitter {
      constructor(options = {}) {
        super();
        this.client = vault({
          apiVersion: 'v1',
          endpoint: process.env.VAULT_ADDR || 'http://localhost:8200',
          token: process.env.VAULT_TOKEN
        });
        
        this.secretPath = options.secretPath || 'secret/data';
        this.serviceName = options.serviceName || process.env.SERVICE_NAME;
        this.secrets = {};
        this.leases = new Map();
        this.tokenTimer = null;
      }

      // Authenticate with Kubernetes
      async authenticateWithKubernetes() {
        const jwt = fs.readFileSync(
          '/var/run/secrets/kubernetes.io/serviceaccount/token',
          'utf8'
        );
        
        const response = await this.client.kubernetesLogin({
          role: this.serviceName,
          jwt: jwt
        });
        
        this.client.token = response.auth.client_token;
        
        // Set up token renewal
        this.scheduleTokenRenewal(response.auth.lease_duration);
      }

      async loadSecrets() {
        const environment = process.env.NODE_ENV || 'development';
        
        // Load service-specific secrets
        const servicePath = `${this.secretPath}/${environment}/${this.serviceName}`;
        
        try {
          const response = await this.client.read(servicePath);
          this.secrets = response.data.data;
          console.log(`Loaded secrets for ${this.serviceName}`);
        } catch (error) {
          if (error.response?.statusCode === 404) {
            console.warn(`No secrets found at ${servicePath}`);
            this.secrets = {};
          } else {
            throw error;
          }
        }
        
        return this.secrets;
      }

      // Get dynamic database credentials
      async getDatabaseCredentials(dbRole = 'readonly') {
        const path = `database/creds/${dbRole}`;
        
        try {
          const response = await this.client.read(path);
          
          // Schedule lease renewal before expiry
          this.scheduleLease(dbRole, response.lease_id, response.lease_duration);
          
          return {
            username: response.data.username,
            password: response.data.password,
            leaseDuration: response.lease_duration
          };
        } catch (error) {
          console.error('Failed to get database credentials:', error);
          throw error;
        }
      }

      // Renew a lease at 75% of its duration
      scheduleLease(dbRole, leaseId, leaseDuration) {
        const renewAt = leaseDuration * 0.75 * 1000;
        
        const timer = setTimeout(async () => {
          let granted = 0;
          try {
            const response = await this.client.write('sys/leases/renew', {
              lease_id: leaseId
            });
            granted = response.lease_duration;
          } catch (error) {
            console.error(`Failed to renew lease for ${dbRole}:`, error.message);
          }
          
          if (granted < leaseDuration) {
            // Renewal failed, or Vault capped it at the role's max_ttl:
            // switch to new credentials before the old ones are revoked
            this.leases.delete(leaseId);
            await this.rotateCredentials(dbRole);
            return;
          }
          
          console.log(`Renewed lease for ${dbRole}`);
          this.scheduleLease(dbRole, leaseId, granted);
        }, renewAt);
        
        this.leases.set(leaseId, timer);
      }

      // Fetch a brand-new username/password and notify listeners (e.g. the DB pool)
      async rotateCredentials(dbRole) {
        try {
          const credentials = await this.getDatabaseCredentials(dbRole);
          this.emit('credentialsRenewed', credentials);
        } catch (error) {
          console.error(`Could not fetch new credentials for ${dbRole}:`, error.message);
        }
      }

      scheduleTokenRenewal(leaseDuration) {
        const renewAt = leaseDuration * 0.75 * 1000;
        
        clearTimeout(this.tokenTimer);
        this.tokenTimer = setTimeout(async () => {
          try {
            const response = await this.client.tokenRenewSelf();
            console.log('Vault token renewed');
            // Use the TTL Vault actually granted; it can shrink near the token's max TTL
            this.scheduleTokenRenewal(response.auth.lease_duration);
          } catch (error) {
            console.error('Failed to renew Vault token, re-authenticating:', error.message);
            try {
              await this.authenticateWithKubernetes();
            } catch (authError) {
              console.error('Vault re-authentication failed:', authError.message);
            }
          }
        }, renewAt);
      }

      get(key, defaultValue) {
        // ?? keeps legitimate falsy values such as '' or 0
        return this.secrets[key] ?? defaultValue;
      }

      async close() {
        for (const timer of this.leases.values()) {
          clearTimeout(timer);
        }
        this.leases.clear();
        clearTimeout(this.tokenTimer);
        this.tokenTimer = null;
      }
    }

    module.exports = VaultConfig;
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # config/vault_config.py
    import asyncio
    import logging
    import os
    from dataclasses import dataclass, replace
    from pathlib import Path
    from typing import Any, Awaitable, Callable

    import hvac

    logger = logging.getLogger(__name__)


    @dataclass
    class DatabaseCredentials:
        username: str
        password: str
        lease_id: str
        lease_duration: int


    # Called with the new credentials whenever Vault issues a fresh username/password
    RotationListener = Callable[[DatabaseCredentials], Awaitable[None]]


    class VaultConfig:
        """Vault client with Kubernetes auth, static secrets, and dynamic DB creds."""

        def __init__(
            self,
            service_name: str | None = None,
            mount_point: str = "secret",
        ) -> None:
            self.service_name = service_name or os.environ["SERVICE_NAME"]
            # KV v2 mount point; hvac inserts the /data/ path segment itself
            self.mount_point = mount_point
            self.client = hvac.Client(
                url=os.environ.get("VAULT_ADDR", "http://localhost:8200"),
                token=os.environ.get("VAULT_TOKEN"),
            )
            self.secrets: dict[str, Any] = {}
            self._lease_tasks: dict[str, asyncio.Task] = {}
            self._token_task: asyncio.Task | None = None
            self._rotation_listeners: list[RotationListener] = []

        async def authenticate_with_kubernetes(self) -> None:
            """Trade the pod's service account JWT for a Vault token."""
            jwt_path = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")
            jwt = jwt_path.read_text().strip()

            response = await asyncio.to_thread(
                self.client.auth.kubernetes.login,
                role=self.service_name,
                jwt=jwt,
            )
            self.client.token = response["auth"]["client_token"]
            self._schedule_token_renewal(response["auth"]["lease_duration"])

        async def load_secrets(self) -> dict[str, Any]:
            environment = os.environ.get("NODE_ENV", "development")
            path = f"{environment}/{self.service_name}"

            try:
                response = await asyncio.to_thread(
                    self.client.secrets.kv.v2.read_secret_version,
                    path=path,
                    mount_point=self.mount_point,
                )
                self.secrets = response["data"]["data"]
                logger.info("Loaded secrets for %s", self.service_name)
            except hvac.exceptions.InvalidPath:
                logger.warning("No secrets found at %s", path)
                self.secrets = {}

            return self.secrets

        def on_credentials_rotated(self, listener: RotationListener) -> None:
            """Register a coroutine to run when Vault issues brand-new DB credentials."""
            self._rotation_listeners.append(listener)

        async def get_database_credentials(
            self, db_role: str = "readonly"
        ) -> DatabaseCredentials:
            """Request dynamic credentials from Vault's database secrets engine."""
            response = await asyncio.to_thread(
                self.client.secrets.database.generate_credentials, name=db_role
            )
            creds = DatabaseCredentials(
                username=response["data"]["username"],
                password=response["data"]["password"],
                lease_id=response["lease_id"],
                lease_duration=response["lease_duration"],
            )
            self._schedule_lease_renewal(db_role, creds)
            return creds

        async def _rotate_credentials(self, db_role: str) -> None:
            try:
                creds = await self.get_database_credentials(db_role)
            except Exception as exc:
                logger.error("Could not fetch new credentials for %s: %s", db_role, exc)
                return
            for listener in self._rotation_listeners:
                try:
                    await listener(creds)
                except Exception as exc:
                    logger.error("Credential rotation listener failed: %s", exc)

        def _schedule_lease_renewal(
            self, db_role: str, creds: DatabaseCredentials
        ) -> None:
            async def _renew() -> None:
                # Renew at 75% of the lease duration
                await asyncio.sleep(creds.lease_duration * 0.75)
                try:
                    response = await asyncio.to_thread(
                        self.client.sys.renew_lease, lease_id=creds.lease_id
                    )
                    granted = response["lease_duration"]
                except Exception as exc:
                    logger.error("Failed to renew lease for %s: %s", db_role, exc)
                    granted = 0  # treat as expired and fall back to fresh credentials

                if granted < creds.lease_duration:
                    # Renewal failed or Vault capped it at the role's max_ttl:
                    # switch to new credentials before the old ones are revoked
                    await self._rotate_credentials(db_role)
                    self._lease_tasks.pop(creds.lease_id, None)
                    return

                logger.info("Renewed lease for %s", db_role)
                self._schedule_lease_renewal(
                    db_role, replace(creds, lease_duration=granted)
                )

            self._lease_tasks[creds.lease_id] = asyncio.create_task(_renew())

        def _schedule_token_renewal(self, lease_duration: int) -> None:
            async def _renew() -> None:
                await asyncio.sleep(lease_duration * 0.75)
                try:
                    response = await asyncio.to_thread(self.client.auth.token.renew_self)
                    logger.info("Vault token renewed")
                    # Use the TTL Vault actually granted; it can shrink near the max TTL
                    self._schedule_token_renewal(response["auth"]["lease_duration"])
                except Exception as exc:
                    logger.error("Failed to renew Vault token, re-authenticating: %s", exc)
                    try:
                        await self.authenticate_with_kubernetes()
                    except Exception as auth_exc:
                        logger.error("Vault re-authentication failed: %s", auth_exc)

            self._token_task = asyncio.create_task(_renew())

        def get(self, key: str, default: Any = None) -> Any:
            return self.secrets.get(key, default)

        async def close(self) -> None:
            for task in self._lease_tasks.values():
                task.cancel()
            self._lease_tasks.clear()
            if self._token_task is not None:
                self._token_task.cancel()
                self._token_task = None
  </Tab>
</Tabs>

### Using Vault in Application

Tying Vault into your app during startup follows a clear order: authenticate, load static secrets (API keys, shared secrets), then request dynamic credentials for anything that supports them (most commonly database credentials). The dynamic credentials path is where Vault shines -- your service starts up, gets a fresh, unique database user whose lease TTL is set on the Vault role, uses it, and a background task renews the lease before it expires. If Vault goes down mid-operation, your service keeps running on the credentials it already holds until their lease expires, which buys Vault time to recover. That graceful degradation is why the service keeps its current credentials in memory instead of asking Vault for them on every request.
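
Renewing a lease keeps the same username and password, but once Vault issues brand-new credentials (for example after a lease reaches its `max_ttl`), the connection pool itself has to be replaced. The order matters: open the new pool, point new requests at it, and only then close the old one. Here is a minimal asyncio sketch of that ordering. `FakePool` and `PoolHolder` are hypothetical stand-ins; with a real asyncpg pool, `Pool.close()` waits for checked-out connections to be released before closing them.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakePool:
    """Hypothetical stand-in for a real connection pool."""
    user: str
    closed: bool = False

    async def close(self) -> None:
        await asyncio.sleep(0)  # a real pool would drain its connections here
        self.closed = True


class PoolHolder:
    """Request handlers read `holder.pool` each time, so a swap is one reference update."""

    def __init__(self, pool: FakePool) -> None:
        self.pool = pool

    async def rotate(self, username: str) -> None:
        new_pool = FakePool(user=username)          # 1. open the replacement first
        old_pool, self.pool = self.pool, new_pool   # 2. new requests now use it
        await old_pool.close()                      # 3. only then drain the old one


async def main() -> None:
    holder = PoolHolder(FakePool(user="v-order-aaa"))
    old = holder.pool
    await holder.rotate("v-order-bbb")
    print(holder.pool.user, old.closed)  # v-order-bbb True


asyncio.run(main())
```

Closing the old pool first, as a naive reconnect handler might, leaves a window where requests hit a pool that has already been shut down.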

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // app.js
    const express = require('express');
    const VaultConfig = require('./config/vault-config');
    const { Pool } = require('pg');

    const app = express();
    const vaultConfig = new VaultConfig({ serviceName: 'order-service' });

    let dbPool = null;

    async function initializeDatabase() {
      // Get dynamic credentials from Vault
      const credentials = await vaultConfig.getDatabaseCredentials('order-service-rw');
      
      dbPool = new Pool({
        host: process.env.DB_HOST,
        database: process.env.DB_NAME,
        user: credentials.username,
        password: credentials.password,
        max: 10
      });
      
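      // Swap in a new pool when Vault issues brand-new credentials
      vaultConfig.on('credentialsRenewed', async (newCredentials) => {
        console.log('Database credentials rotated, swapping connection pool...');
        const oldPool = dbPool;
        dbPool = new Pool({
          host: process.env.DB_HOST,
          database: process.env.DB_NAME,
          user: newCredentials.username,
          password: newCredentials.password,
          max: 10
        });
        // Drain the old pool only after new queries already go to the new one
        await oldPool.end().catch((error) => {
          console.error('Error closing old database pool:', error);
        });
      });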
    }

    async function startServer() {
      // Authenticate with Vault (Kubernetes auth)
      await vaultConfig.authenticateWithKubernetes();
      
      // Load static secrets
      await vaultConfig.loadSecrets();
      
      // Initialize database with dynamic credentials
      await initializeDatabase();
      
      const apiKey = vaultConfig.get('API_KEY');
      const jwtSecret = vaultConfig.get('JWT_SECRET');
      
      app.listen(3000, () => {
        console.log('Server started with Vault integration');
      });
    }

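    startServer().catch((error) => {
      // Exit non-zero so the orchestrator restarts the pod instead of leaving it half-initialized
      console.error('Startup failed:', error);
      process.exit(1);
    });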
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # app.py - FastAPI + Vault + asyncpg
    import logging
    import os
    from contextlib import asynccontextmanager

    import asyncpg
    from fastapi import FastAPI

    from config.vault_config import VaultConfig

    logger = logging.getLogger(__name__)
    vault = VaultConfig(service_name="order-service")

    db_pool: asyncpg.Pool | None = None


    async def initialize_database() -> None:
        global db_pool
        creds = await vault.get_database_credentials("order-service-rw")
        db_pool = await asyncpg.create_pool(
            host=os.environ["DB_HOST"],
            database=os.environ["DB_NAME"],
            user=creds.username,
            password=creds.password,
            max_size=10,
        )
        logger.info("Database pool initialized with dynamic credentials")


    @asynccontextmanager
    async def lifespan(app: FastAPI):
        # Authenticate with Vault via Kubernetes service account JWT
        await vault.authenticate_with_kubernetes()

        # Load static secrets (API keys, JWT signing keys, etc)
        await vault.load_secrets()

        # Initialize database with short-lived dynamic credentials
        await initialize_database()

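        # Fail fast: a pod missing required secrets should not start serving
        api_key = vault.get("API_KEY")
        jwt_secret = vault.get("JWT_SECRET")
        if api_key is None or jwt_secret is None:
            raise RuntimeError("API_KEY and JWT_SECRET must be present in Vault")
        logger.info("Loaded %d static secrets", len(vault.secrets))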

        yield

        if db_pool is not None:
            await db_pool.close()
        await vault.close()


    app = FastAPI(lifespan=lifespan)


    @app.get("/health")
    async def health() -> dict:
        return {"status": "healthy"}
    ```
  </Tab>
</Tabs>

***

## Environment-Specific Configuration

### Multi-Environment Setup

The pattern below (a default file, per-environment overrides, and env-var mappings) is widely used because it separates three concerns cleanly: defaults that rarely change, per-environment overrides that capture the structural differences between dev, staging, and prod, and runtime env-var injection for passwords and host-specific values. The file names follow the conventions of the `config` npm package. The gitignored `local.json` is a key detail: it lets individual developers override any value for local development without polluting shared config. When someone asks "why does it work on your machine but not mine," check their `local.json` first.

```
config/
├── default.json          # Base configuration
├── development.json      # Dev overrides
├── staging.json         # Staging overrides
├── production.json      # Prod overrides
└── custom-environment-variables.json  # Env var mappings
```

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // config/loader.js
    const fs = require('fs');
    const path = require('path');

    class ConfigLoader {
      constructor(configDir = './config') {
        this.configDir = configDir;
        this.environment = process.env.NODE_ENV || 'development';
        this.config = {};
      }

      load() {
        // Load base config
        this.config = this.loadFile('default.json');
        
        // Load environment-specific config
        const envConfig = this.loadFile(`${this.environment}.json`);
        this.config = this.deepMerge(this.config, envConfig);
        
        // Load local overrides (not committed to git)
        const localConfig = this.loadFile('local.json');
        this.config = this.deepMerge(this.config, localConfig);
        
        // Apply environment variable overrides
        this.applyEnvVars();
        
        return this.config;
      }

      loadFile(filename) {
        const filePath = path.join(this.configDir, filename);
        
        if (fs.existsSync(filePath)) {
          return JSON.parse(fs.readFileSync(filePath, 'utf8'));
        }
        
        return {};
      }

      applyEnvVars() {
        const envVarMappings = this.loadFile('custom-environment-variables.json');
        this.applyEnvVarsRecursive(this.config, envVarMappings);
      }

      applyEnvVarsRecursive(config, mappings) {
        for (const [key, value] of Object.entries(mappings)) {
          if (this.isPlainObject(value)) {
            // Replace a scalar with an object if the mapping nests deeper
            if (!this.isPlainObject(config[key])) config[key] = {};
            this.applyEnvVarsRecursive(config[key], value);
          } else {
            // Value is the env var name
            const envValue = process.env[value];
            if (envValue !== undefined) {
              config[key] = this.parseValue(envValue);
            }
          }
        }
      }

      parseValue(value) {
        // Try JSON so "true", "42" and '["a"]' become typed values.
        // Caveat: an all-digit password such as "12345" also becomes a number.
        try {
          return JSON.parse(value);
        } catch {
          return value;
        }
      }

      isPlainObject(value) {
        return value !== null && typeof value === 'object' && !Array.isArray(value);
      }

      deepMerge(target, source) {
        const result = { ...target };
        
        for (const key in source) {
          // Recurse only when both sides are plain objects; arrays and scalars replace
          if (this.isPlainObject(source[key]) && this.isPlainObject(target[key])) {
            result[key] = this.deepMerge(target[key], source[key]);
          } else {
            result[key] = source[key];
          }
        }
        
        return result;
      }
    }

    module.exports = new ConfigLoader().load();
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # config/loader.py
    import json
    import os
    from copy import deepcopy
    from pathlib import Path
    from typing import Any


    class ConfigLoader:
        """Layered config loader: defaults -> env file -> local -> env vars."""

        def __init__(self, config_dir: str | Path = "./config") -> None:
            self.config_dir = Path(config_dir)
            self.environment = os.environ.get("NODE_ENV", "development")
            self.config: dict[str, Any] = {}

        def load(self) -> dict[str, Any]:
            # 1. Base defaults
            self.config = self._load_file("default.json")

            # 2. Per-environment overrides (development/staging/production)
            env_config = self._load_file(f"{self.environment}.json")
            self.config = self._deep_merge(self.config, env_config)

            # 3. Developer-local overrides (gitignored)
            local_config = self._load_file("local.json")
            self.config = self._deep_merge(self.config, local_config)

            # 4. Env var overrides -- highest priority
            self._apply_env_vars()

            return self.config

        def _load_file(self, filename: str) -> dict[str, Any]:
            path = self.config_dir / filename
            if path.exists():
                return json.loads(path.read_text())
            return {}

        def _apply_env_vars(self) -> None:
            mappings = self._load_file("custom-environment-variables.json")
            self._apply_env_vars_recursive(self.config, mappings)

        def _apply_env_vars_recursive(self, config: dict, mappings: dict) -> None:
            for key, value in mappings.items():
                if isinstance(value, dict):
                    # Replace a scalar with a dict if the mapping nests deeper
                    if not isinstance(config.get(key), dict):
                        config[key] = {}
                    self._apply_env_vars_recursive(config[key], value)
                else:
                    # Leaf: value is the env var name to look up
                    env_value = os.environ.get(value)
                    if env_value is not None:
                        config[key] = self._parse_value(env_value)

        @staticmethod
        def _parse_value(value: str) -> Any:
            # JSON parsing turns "true", "42" and '["a"]' into typed values.
            # Caveat: an all-digit password such as "12345" also becomes an int.
            try:
                return json.loads(value)
            except ValueError:
                return value

        @staticmethod
        def _deep_merge(target: dict, source: dict) -> dict:
            result = deepcopy(target)
            for key, value in source.items():
                # Recurse only when both sides are dicts; lists and scalars replace
                if isinstance(value, dict) and isinstance(result.get(key), dict):
                    result[key] = ConfigLoader._deep_merge(result[key], value)
                else:
                    result[key] = value
            return result


    config = ConfigLoader().load()
    ```
  </Tab>
</Tabs>

***

## Interview Questions

<AccordionGroup>
  <Accordion title="Q1: How do you manage configuration across multiple microservices?">
    **Answer:**

    Use a **centralized configuration server** (Consul, etcd, Spring Cloud Config):

    1. **Hierarchy**: Global → Environment → Service-specific
    2. **Hot reload**: Watch for changes without restart
    3. **Secrets**: Separate from config (Vault, AWS Secrets Manager)
    4. **Versioning**: Track config changes in Git
    5. **Validation**: Schema validation on load

    **Best Practices:**

    * 12-Factor App: Config in environment
    * Encryption at rest and in transit
    * Audit logging for changes
    * Feature flags for gradual rollouts
  </Accordion>

  <Accordion title="Q2: How do you implement feature flags?">
    **Answer:**

    **Types of feature flags:**

    * **Boolean**: Simple on/off
    * **Percentage**: Gradual rollout (10% → 50% → 100%)
    * **User targeting**: Specific users or attributes
    * **Time-based**: Scheduled activation

    **Implementation:**

    * Hash user ID for consistent bucketing
    * Use configuration store for flag definitions
    * SDK in each service to evaluate flags

    **Best Practices:**

    * Clean up old flags (tech debt)
    * Monitor flag usage
    * Have kill switches for quick rollback
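
    Here is a minimal sketch of consistent bucketing, assuming a flag is just a key plus a rollout percentage. `bucket` and `is_enabled` are hypothetical helpers, not any vendor SDK's API:

    ```python
    import hashlib

    BUCKETS = 10_000  # 0.01% rollout granularity


    def bucket(flag_key: str, user_id: str) -> int:
        """Map a (flag, user) pair to a stable bucket in [0, BUCKETS)."""
        # Salting with the flag key decorrelates flags, so the same users
        # are not always first in line for every new feature.
        digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % BUCKETS


    def is_enabled(flag_key: str, user_id: str, rollout_percent: float) -> bool:
        """True if the user's bucket falls below the rollout threshold."""
        return bucket(flag_key, user_id) < rollout_percent / 100 * BUCKETS


    users = [f"user-{i}" for i in range(1_000)]
    at_10 = {u for u in users if is_enabled("new-checkout", u, 10)}
    at_50 = {u for u in users if is_enabled("new-checkout", u, 50)}
    print(at_10 <= at_50)  # True: raising the percentage only adds users
    ```

    Because the hash is deterministic, a user stays in or out of the rollout on every request, and in every service that uses the same hashing scheme. Widening from 10% to 50% never removes anyone who already had the feature.
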
  </Accordion>

  <Accordion title="Q3: How do you handle secrets in microservices?">
    **Answer:**

    **Never in code or config files!**

    **Solutions:**

    * **HashiCorp Vault**: Dynamic secrets, leasing, rotation
    * **AWS Secrets Manager**: AWS-native, automatic rotation
    * **Kubernetes Secrets**: Basic; values are only base64-encoded, and they are stored unencrypted in etcd unless you enable encryption at rest

    **Best Practices:**

    * Dynamic credentials (short-lived)
    * Automatic rotation
    * Principle of least privilege
    * Audit access
    * Encrypt at rest
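
    To see why base64 is not protection, here is the round trip that anyone with read access to a Secret can perform. The password is a made-up value:

    ```python
    import base64

    # What `kubectl get secret -o yaml` shows under `data:` is base64 text, not ciphertext
    stored = base64.b64encode(b"s3cr3t-db-password").decode("ascii")

    # No key is needed to reverse it
    recovered = base64.b64decode(stored).decode("utf-8")
    print(recovered)  # s3cr3t-db-password
    ```
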
  </Accordion>

  <Accordion title="Q4: How do you update configuration without downtime?">
    **Answer:**

    **Hot Reload Pattern:**

    1. Watch config source for changes
    2. Validate new config before applying
    3. Gracefully transition (connection pools, caches)
    4. Keep the last known-good config if validation fails

    **Kubernetes approach:**

    * Update ConfigMap/Secret
    * Trigger a rolling restart (required for values injected as env vars): `kubectl rollout restart`
    * Or use a sidecar to watch for changes and signal a reload

    **Application approach:**

    * Watch config file (chokidar/inotify)
    * Poll config server periodically
    * Subscribe to config change events
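
    The validate-before-apply and keep-known-good steps of the hot reload pattern can be sketched as a small holder class. The settings (`max_retries`, `timeout_ms`), their limits, and the class names are hypothetical, and a real service would more likely use a schema library than hand-written checks:

    ```python
    import json
    import logging
    from dataclasses import asdict, dataclass

    logger = logging.getLogger("config")


    @dataclass(frozen=True)
    class ServiceConfig:
        max_retries: int  # hypothetical settings, for illustration only
        timeout_ms: int

        @classmethod
        def parse(cls, raw: str) -> "ServiceConfig":
            data = json.loads(raw)
            cfg = cls(max_retries=data["max_retries"], timeout_ms=data["timeout_ms"])
            # Type and range checks: reject the whole document on any failure
            if not isinstance(cfg.max_retries, int) or not 0 <= cfg.max_retries <= 10:
                raise ValueError(f"max_retries out of range: {cfg.max_retries!r}")
            if not isinstance(cfg.timeout_ms, int) or cfg.timeout_ms <= 0:
                raise ValueError(f"timeout_ms must be a positive int: {cfg.timeout_ms!r}")
            return cfg


    class HotConfig:
        """Holds the last known-good config; a bad update never replaces it."""

        def __init__(self, initial: ServiceConfig) -> None:
            self.current = initial

        def try_update(self, raw: str) -> bool:
            try:
                candidate = ServiceConfig.parse(raw)
            except (ValueError, KeyError, TypeError) as exc:
                logger.error("Rejected config update, keeping last known-good: %s", exc)
                return False
            old, new = asdict(self.current), asdict(candidate)
            for key in old:
                if old[key] != new[key]:
                    # One structured event per changed key, for incident correlation
                    logger.info(json.dumps({"event": "config_change", "key": key,
                                            "old": old[key], "new": new[key]}))
            # Single reference swap: code that reads `current` once per request
            # sees either the old config or the new one, never a mix
            self.current = candidate
            return True


    cfg = HotConfig(ServiceConfig(max_retries=3, timeout_ms=500))
    cfg.try_update('{"max_retries": 5, "timeout_ms": 500}')       # applied, one change event
    cfg.try_update('{"max_retries": "five", "timeout_ms": 500}')  # rejected, old config kept
    print(cfg.current.max_retries)  # 5
    ```
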
  </Accordion>
</AccordionGroup>

***

## Chapter Summary

<Info>
  **Key Takeaways:**

  * Centralize configuration with tools like Consul or etcd
  * Use hierarchical config: global → environment → service
  * Implement feature flags for safe progressive rollouts
  * Separate secrets from configuration (use Vault)
  * Enable hot reload for zero-downtime config updates
  * Follow 12-Factor App principles
</Info>

**Next Chapter:** CI/CD for Microservices - Pipelines and deployment strategies.

***

## Interview Questions: Silent Config Failures

<AccordionGroup>
  <Accordion title="A config change deployed to 10% of pods caused a 2% error rate spike that took 45 minutes to detect. How do you prevent this from happening again?">
    **Strong Answer Framework:**

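    1. **Root-cause the detection delay, not just the failure.** Taking 45 minutes to detect the spike usually means alerts only look at fleet-wide error rates. A 2% error rate confined to 10% of pods is about 0.2% of global traffic; against a 99.9% SLO that is a burn rate of 2, far below the 14.4x fast-burn paging threshold the Google SRE Workbook suggests for a 30-day SLO window. Keep the SLO burn-rate alerts, but also alert per deployment cohort (canary vs. stable pods, or per config version) so a regression that is diluted globally still pages within minutes. Fast alerts are the first line of defense.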
    2. **Stage config rollouts like code rollouts.** A config change should never apply to 10% of pods instantly. Use the same canary pattern: 1 pod, 5% pods, 25%, 100%, with metrics checks at each step.
    3. **Type-check and schema-validate configs at load time.** The most common cause of silent failures is a misspelled key or wrong-typed value. `pydantic-settings`, `convict`, or JSON schema validation catches this before the service serves traffic. A pod that cannot parse its config should fail readiness, not run with undefined behavior.
    4. **Automate drift detection between environments.** Daily comparison: staging's config schema should match production's, with documented exceptions. A key that exists in prod but not staging means staging is not actually testing what prod runs.
    5. **Emit a structured log event on every config change.** "Config change: key=`max_retries`, old=3, new=5, source=consul-watch, env=production, ts=..." This is gold for incident correlation -- when something breaks, you can trace it to an exact config change.
    6. **Create a config-change-specific dashboard.** Traffic, latency, and error rate overlaid with config change events. When SRE opens the incident, they see immediately whether a recent config change correlates with the spike.
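
    The dilution arithmetic in step 1 is easy to check. Here is a minimal sketch, assuming a 99.9% SLO and that the 90% of pods still on the old config serve no errors. `FAST_BURN_PAGE` is the 1-hour paging threshold from the SRE Workbook's burn-rate recommendations:

    ```python
    FAST_BURN_PAGE = 14.4  # SRE Workbook: 2% of a 30-day error budget spent in one hour


    def burn_rate(error_rate: float, slo_target: float) -> float:
        """How many times faster than 'exactly on budget' errors are arriving."""
        error_budget = 1.0 - slo_target  # 99.9% SLO: 0.1% of requests may fail
        return error_rate / error_budget


    def fleet_error_rate(cohort_error_rate: float, cohort_share: float,
                         baseline: float = 0.0) -> float:
        """Fleet-wide error rate when only one cohort of pods is misbehaving."""
        return cohort_share * cohort_error_rate + (1.0 - cohort_share) * baseline


    slo = 0.999
    cohort = burn_rate(0.02, slo)                         # the 10% of pods on the new config
    fleet = burn_rate(fleet_error_rate(0.02, 0.10), slo)  # what a global alert sees

    print(f"cohort burn rate {cohort:.1f}, pages: {cohort >= FAST_BURN_PAGE}")
    print(f"fleet burn rate {fleet:.1f}, pages: {fleet >= FAST_BURN_PAGE}")
    ```

    The cohort view reports a burn rate of 20.0 and pages; the fleet view reports 2.0 and stays silent, which is one way a real regression can go unnoticed for 45 minutes.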

    **Real-World Example:** Cloudflare's July 2, 2019 outage (public postmortem) shows the cost of skipping staged rollouts for config. A WAF rule containing a regular expression prone to catastrophic backtracking was pushed to the entire fleet in one step, exhausting CPU on the machines serving HTTP/HTTPS traffic worldwide. Among the fixes Cloudflare described was moving WAF rule changes onto the same staged rollout process used for software releases.

    **Senior Follow-up Questions:**

    <Note>
      **Q: What if the config change was technically valid but semantically wrong -- no schema can catch it?**
      A: Then the defense shifts to canary analysis and automated rollback. The canary pod's error rate is compared against the stable pods' error rate; if the delta exceeds 0.5%, roll back. This catches semantic regressions even when the config schema is valid.
    </Note>
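
    The canary comparison in that answer reduces to a small decision function. This is a sketch, not any particular tool's API: `PodStats` is a hypothetical container for request and error counts, `max_delta=0.005` encodes the 0.5-percentage-point example, and `min_requests` is an assumed guard against deciding on too little traffic.

```python
from dataclasses import dataclass


@dataclass
class PodStats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def should_roll_back(canary: PodStats, stable: PodStats,
                     max_delta: float = 0.005, min_requests: int = 1000) -> bool:
    """True if the canary's error rate exceeds the stable baseline by more than max_delta."""
    if canary.requests < min_requests:
        # Too little traffic to judge: keep observing rather than promote or roll back.
        return False
    return canary.error_rate - stable.error_rate > max_delta
```

    In a rollout controller, this check would run at every stage (1 pod, 5%, 25%) and trigger the automated rollback whenever it returns `True`.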

    <Note>
      **Q: You can't canary every config change -- some are urgent prod fixes. How do you handle those?**
      A: Have an explicit "break glass" mode with extra scrutiny: a second reviewer, an on-call SRE watching the dashboard during the change, and automatic rollback armed if the error rate rises above baseline. The goal is not to slow down emergencies but to ensure they do not become their own incidents.
    </Note>

    <Note>
      **Q: How do you ensure config changes are reversible? What if rolling back introduces its own bug?**
      A: Config rollback is the same class of problem as code rollback: sometimes reverting a config after 2 hours means rejecting data that was written with the new config in effect. For this reason, config changes should be backward-compatible by default (add, do not remove; widen, do not narrow). Breaking config changes go through a multi-step expand-contract pattern just like schema changes.
    </Note>
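
    The "add, do not remove" rule can be sketched as a reader that accepts both the old and the new key during the expand phase. The key names `timeout_s` and `http_timeout_ms` and the 5000 ms default are hypothetical.

```python
import logging
from typing import Mapping

log = logging.getLogger("config")


def read_timeout_ms(cfg: Mapping[str, str]) -> int:
    """Expand phase: prefer the new key, keep honoring the old one (hypothetical names)."""
    if "http_timeout_ms" in cfg:
        return int(cfg["http_timeout_ms"])
    if "timeout_s" in cfg:
        # Old key still in use somewhere; warn so the contract step knows it is not done.
        log.warning("deprecated config key timeout_s in use; migrate to http_timeout_ms")
        return round(float(cfg["timeout_s"]) * 1000)
    return 5000  # default in code
```

    Once the deprecation warning has stopped appearing in every environment, the contract step removes the `timeout_s` branch and the old key.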

    **Common Wrong Answers:**

    * "Add more logging." More logs do not help if nobody reads them during the incident. The fix is alerts that trigger on the specific failure mode, not more data to sift through.
    * "Require approval for every config change." This creates approval queue bottlenecks and trains teams to batch changes, which makes diagnosis harder. Instead, make the changes safer and the detection faster.

    **Further Reading:**

    * Cloudflare's outage postmortems (indexed at blog.cloudflare.com).
    * Google SRE Workbook, Chapter 5 ("Alerting on SLOs") -- burn-rate alerting formulas.
    * LaunchDarkly's "Progressive Delivery" whitepaper on feature flag rollout patterns.
  </Accordion>

  <Accordion title="Your team keeps committing secrets to git by accident. Three times this quarter a developer pushed a .env file. How do you fix the process?">
    **Strong Answer Framework:**

    1. **Accept that the secrets are already compromised.** Once a secret is in git history, it is exposed to everyone with read access to the repo, to every existing clone and fork, and to every tool or CI system that has pulled it. Rotate every exposed secret immediately, before doing any process work.
    2. **Install a pre-commit hook.** `gitleaks` or `trufflehog` scans staged changes before each commit, so a secret is rejected on the developer's machine before it ever reaches the remote. Make the hook mandatory by committing a husky or pre-commit (pre-commit.com) configuration to the repo.
    3. **Add a second-line check in CI.** Even if someone bypasses the pre-commit hook (or works in a fresh clone where it was never installed), CI rescans every push, and a matched secret fails the build. This catches both accidents and intentional bypasses.
    4. **Scan the entire git history once.** `gitleaks detect --source .` walks the full history of the repository. Every finding triggers a rotation ticket. Assume every secret ever committed is compromised, regardless of whether it was later "removed."
    5. **Make the right path the easiest path.** If developers commit secrets because `.env` is the easiest way to inject values locally, make the secret manager easier. `direnv` + a shell wrapper that fetches from Vault is one pattern. Secrets in a tool that integrates cleanly into the dev loop get used correctly.
    6. **Educate on the specific threat model.** Developers often believe a force-push removes the secret. Show them that existing clones and forks still contain it, that GitHub's secret scanner flagged it anyway, and the log of scanner hits. Make the failure mode visceral.
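
    To see what the hooks in steps 2 and 3 actually look for, here is a deliberately tiny scanner. It is an illustration, not a replacement for `gitleaks` or `trufflehog`: the two patterns (AWS access key IDs and PEM private-key headers) are a small assumed subset of what real scanners ship, and a real hook would feed it the staged content (for example from `git diff --cached`) and exit non-zero on any finding.

```python
import re

# Illustrative patterns only; real scanners ship many tuned rules plus entropy checks.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private-key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def scan_text(path: str, text: str) -> list[tuple[str, int, str]]:
    """Return (path, line_number, rule) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((path, lineno, rule))
    return findings
```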

    **Real-World Example:** In the 2016 Uber breach, attackers gained access to a private GitHub repository used by Uber engineers, found AWS credentials stored in it, and used them to download records on 57 million riders and drivers. A private repo is not a secrets store: secrets in git are a time bomb whose timer starts at commit time.

    **Senior Follow-up Questions:**

    <Note>
      **Q: Pre-commit hooks can be disabled with `--no-verify`. How do you prevent that?**
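      A: You cannot fully prevent it on developer machines, but you can make it ineffective and visible: CI re-runs the same scan on every push, branch protection requires CI to pass before merge, server-side push protection (such as GitHub's secret scanning push protection) blocks known secret formats at the remote, and a weekly report lists force-pushes and CI secret findings for review. Make the bypass expensive socially, not just technically.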
    </Note>

    <Note>
      **Q: Some services legitimately need to write secrets to disk at runtime. How do you distinguish accidental commits from legitimate writes?**
      A: Runtime writes go to `/var/run/secrets/...` or a tmpfs volume, never to the working tree. Scanners ignore those paths. If a secret appears anywhere under the checked-out source tree, it is always wrong.
    </Note>

    <Note>
      **Q: A former employee had read access to this repo. The secrets they saw are being rotated now. Is that enough?**
      A: No. Also check what systems those secrets granted access to and whether any logs show suspicious access in the window between commit and rotation. If the secret was an AWS key, pull CloudTrail for that key's usage. Assume worst case until proven otherwise.
    </Note>

    **Common Wrong Answers:**

    * "Use `git filter-branch` to remove the secret from history." Does not help: every existing clone and fork still contains the secret, and rewriting history does not reach them. Rotate instead.
    * "Train developers better." Training alone does not reliably prevent this class of mistake; people still slip when tired or rushed. Technical controls (pre-commit hook plus CI scan) are the reliable defense.

    **Further Reading:**

    * GitHub's documentation on secret scanning and push protection.
    * OWASP Top 10 (2017), A3: "Sensitive Data Exposure" (folded into A02: "Cryptographic Failures" in the 2021 list).
    * Vault's Kubernetes auth method documentation for replacing `.env` files.
  </Accordion>

  <Accordion title="Your feature-flag system now has 340 active flags. Engineers say they're afraid to remove any because they do not know which ones are safe. How do you dig out?">
    **Strong Answer Framework:**

    1. **Instrument evaluation counts per flag.** For the next 30 days, log every flag evaluation with flag name, result, and context. At the end of the window, you know which flags are actually in use and which ones are dead code.
    2. **Categorize by state.** Flags that always evaluate to the same value (100% on or 100% off for 30 days) are either fully rolled out or fully abandoned -- either way, they are removable. Flags with dynamic rollout (targeting by user attributes or percentages) need more care.
    3. **Assign ownership retroactively.** Use `git blame` on each flag definition to find who last touched it. Send that person (or their current team) a ticket: "You own flag X. Decide: keep, remove, or transfer ownership." Flags with no owner after 30 days are removed by default.
    4. **Remove in waves, not all at once.** Start with the "always true" flags (safest). Remove the flag reference and the else branch in the code; only the on-path code remains. Validate in staging, deploy. Repeat for "always false" flags, then for dormant-but-still-dynamic flags.
    5. **Add automated guardrails against regression.** Linter check: every flag in code must have a corresponding flag definition with owner, creation date, and expiration date. CI fails if a flag is older than 90 days without documented extension.
    6. **Budget the cleanup.** Removing 300 flags can easily take six months. Allocate 10% of each team's sprint capacity to "feature flag debt" until the list is under 50, and treat it like any other debt-reduction program.
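
    Steps 1, 2, and 5 can be prototyped in a few lines. In this sketch, `evaluations` maps each flag name to a `Counter` of the results observed during the 30-day window, and `FlagDef` is a hypothetical metadata record; a flag's deadline is its explicit `expires` date (the documented extension) or, if none is set, 90 days after creation.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class FlagDef:
    name: str
    owner: Optional[str]
    created: date
    expires: Optional[date] = None  # set explicitly to document an extension


def classify(defined: set[str], evaluations: dict[str, Counter]) -> dict[str, str]:
    """Bucket flags by what the evaluation logs showed during the window."""
    states = {}
    for name in defined:
        results = evaluations.get(name)
        if not results:
            states[name] = "unused"   # no evaluations: dead code, or a rarely used kill switch
        elif len(results) == 1:
            states[name] = "static"   # always the same value: removal candidate
        else:
            states[name] = "dynamic"  # still targeting users: needs owner review
    return states


def lint(flags: list[FlagDef], today: date, max_age: timedelta = timedelta(days=90)) -> list[str]:
    """Return one message per policy violation; CI fails if the list is non-empty."""
    problems = []
    for flag in flags:
        if not flag.owner:
            problems.append(f"{flag.name}: no owner")
        deadline = flag.expires or flag.created + max_age
        if today > deadline:
            problems.append(f"{flag.name}: past its deadline of {deadline.isoformat()}")
    return problems
```

    "Unused" is not automatically "safe to delete": as the wrong answers below point out, a kill switch can go 30 days without firing.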

    **Real-World Example:** Facebook has publicly discussed its feature-gating system (Gatekeeper) at various conferences; at its scale, tracking flag ownership, staleness, and impact had to be done by automated cleanup tooling rather than by hand. Smaller organizations often hit the same wall at a few hundred flags and end up building their own version.

    **Senior Follow-up Questions:**

    <Note>
      **Q: What if removing a flag breaks something you did not know depended on it?**
      A: The instrumentation in step 1 should catch this -- you know who is calling the flag. But defensively: use a feature-flag SDK that logs "flag lookup returned default because not defined" at warn level. If you remove the flag and something starts warning in prod, you have a fast signal to restore it.
    </Note>
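
    The defensive behavior described in that answer is simple to express. The class below is a hypothetical minimal client, not a specific SDK's interface: it returns the caller's default for an unknown flag and logs a warning so the removal shows up in production logs.

```python
import logging

log = logging.getLogger("flags")


class FlagClient:
    """Hypothetical minimal client: explicit defaults plus a loud warning on unknown flags."""

    def __init__(self, definitions: dict[str, bool]):
        self._definitions = definitions

    def is_enabled(self, name: str, default: bool = False) -> bool:
        if name not in self._definitions:
            log.warning("flag lookup returned default because not defined: %s", name)
            return default
        return self._definitions[name]
```

    Passing an explicit `default` at every call site also documents which branch survives if the flag disappears.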

    <Note>
      **Q: How do you handle flags that were part of experimentation? The data team wants to keep them for historical analysis.**
      A: Separate concerns: flag values live in the feature-flag system; historical assignments live in the data warehouse as an immutable event stream. Removing the flag does not remove the historical record. Make this contract explicit with the data team.
    </Note>

    <Note>
      **Q: Engineering leaders want a metric to prevent this from happening again. What do you propose?**
      A: "Flag-age p95" -- the 95th percentile age of active flags. A healthy system has p95 under 60 days. Track this monthly as part of platform health review. When it climbs past 90, trigger a cleanup sprint.
    </Note>
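
    Computing "flag-age p95" needs only the creation date of each active flag. This sketch uses the nearest-rank percentile (an assumption; any definition works as long as it stays the same between reviews) and integer arithmetic for the rank so no float rounding creeps in.

```python
from datetime import date


def flag_age_p95(created_dates: list[date], today: date) -> int:
    """95th-percentile age in days of active flags, nearest-rank method."""
    if not created_dates:
        return 0
    ages = sorted((today - created).days for created in created_dates)
    rank = (95 * len(ages) + 99) // 100  # ceil(0.95 * n) without float rounding
    return ages[rank - 1]
```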

    **Common Wrong Answers:**

    * "Just delete all flags older than 6 months." Without instrumentation, you do not know which old flags are safely removable. Some old flags are kill switches that rarely fire -- removing them removes the escape hatch.
    * "Force engineers to remove flags before shipping new ones." Creates a queue bottleneck and encourages engineers to reuse vaguely named existing flags instead of creating (and later cleaning up) new ones. Use technical guardrails instead of social ones.

    **Further Reading:**

    * Pete Hodgson, "Feature Toggles" on martinfowler.com -- the canonical taxonomy and lifecycle model.
    * LaunchDarkly's "Effective Feature Management" e-book (chapters on tech debt).
    * John Allspaw, "On Being a Senior Engineer" (2012) -- relevant for thinking about long-term system ownership.
  </Accordion>
</AccordionGroup>

***

## Interview Deep-Dive

<AccordionGroup>
  <Accordion title="'Your company has 30 microservices and needs to rotate a database password. Currently, each service reads it from an environment variable and requires a redeploy. How do you fix this, and how do you handle the rotation without downtime?'">
    **Strong Answer:**

    The immediate problem is that secrets are baked into the deployment -- changing them requires restarting every pod. The solution is decoupling secret retrieval from deployment.

    I would introduce HashiCorp Vault (or AWS Secrets Manager) as the centralized secret store. Each service authenticates to Vault at startup using its Kubernetes service account (no hardcoded Vault credentials). Vault returns the database password, and the service caches it locally with a TTL.
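
    Caching with a TTL can be sketched without committing to a particular Vault client. Here `fetch` is a placeholder for whatever call reads the secret (an assumption, not a real library API), and the clock is injectable so the refresh behavior is easy to test.

```python
import time
from typing import Callable, Optional


class CachedSecret:
    """Holds a secret in memory and re-fetches it once ttl_s has elapsed.

    `fetch` is a placeholder for the real secret-store read (for example a Vault
    client call); it is an assumption of this sketch, not a specific API.
    """

    def __init__(self, fetch: Callable[[], str], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        self._value: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl_s:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

    A connection pool would typically call `get()` when opening a new connection, so a rotated password takes effect without a restart.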

    For zero-downtime rotation, I use Vault's dynamic database secrets. Instead of one shared password, Vault creates a unique database user/password per service instance with a 24-hour TTL. Vault automatically creates the credentials and revokes them when they expire. Rotation means creating new credentials, not changing an existing password.

    If dynamic secrets are not feasible (legacy database, compliance constraints), I use dual-credential rotation. Step one: add a second valid credential to the database, either a second database user or, where the database supports it, a secondary password (MySQL 8.0.14+ can retain the current password alongside a new one). Step two: update the secret in Vault to the new credential. Step three: services pick it up on their next TTL refresh (no restart needed). Step four: once every service has refreshed, remove the old credential from the database. At no point are zero valid credentials configured, so there is no downtime window.
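
    The ordering of the four steps is the whole trick, so it is worth encoding. This is a simulation, not a database client: `FakeDatabase` is a hypothetical stand-in for the set of credentials the database accepts, `store` stands in for the secret manager, and `wait_for_clients_to_refresh` represents waiting at least one TTL.

```python
from typing import Callable


class FakeDatabase:
    """Hypothetical stand-in for the set of credentials a database accepts."""

    def __init__(self, password: str):
        self.valid = {password}

    def add(self, password: str) -> None:
        self.valid.add(password)

    def remove(self, password: str) -> None:
        if self.valid == {password}:
            raise RuntimeError("refusing to leave zero valid credentials")
        self.valid.discard(password)


def rotate(db: FakeDatabase, store: dict[str, str], new_password: str,
           wait_for_clients_to_refresh: Callable[[], None]) -> None:
    old_password = store["db_password"]
    db.add(new_password)                 # 1. old and new are both accepted
    store["db_password"] = new_password  # 2. secret store hands out the new one
    wait_for_clients_to_refresh()        # 3. clients pick it up on their next TTL refresh
    db.remove(old_password)              # 4. retire the old credential
```

    Swapping steps 2 and 4, or skipping the wait in step 3, is how a rotation turns into an outage: clients still holding the old password get rejected.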

    The key operational practice: every secret rotation should be automated and tested monthly. If your rotation procedure requires a human to run 30 kubectl commands, it will fail during an actual security incident when you are under time pressure.

    **Follow-up: "How do you handle the case where Vault is down and a service restarts? It cannot fetch its secrets."**

    I implement a secret cache on disk (encrypted with a local key derived from the Kubernetes service account). When the service starts, it tries Vault first. If Vault is unreachable, it falls back to the cached secrets. The cache has a maximum age -- if secrets are older than 48 hours, the service refuses to start with stale secrets and raises a critical alert. This balances availability (service can start without Vault) against security (stale secrets are eventually rejected).
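
    The startup fallback can be sketched as below. Everything here is an assumption for illustration: `fetch_from_vault`, `read_cache`, and `write_cache` are injected placeholders, `ConnectionError` stands in for whatever the real client raises when the secret store is unreachable, and encrypting the cache is left out for brevity.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("secrets")

MAX_CACHE_AGE_S = 48 * 3600  # the 48-hour limit from the answer


class SecretsUnavailable(Exception):
    """Raised when the service must refuse to start; the caller should alert."""


def load_secrets_at_startup(
    fetch_from_vault: Callable[[], dict],
    read_cache: Callable[[], Optional[tuple[dict, float]]],
    write_cache: Callable[[dict, float], None],
    now: Callable[[], float] = time.time,
) -> dict:
    try:
        secrets = fetch_from_vault()
    except ConnectionError:
        cached = read_cache()
        if cached is None:
            raise SecretsUnavailable("secret store unreachable and no local cache")
        secrets, saved_at = cached
        age_s = now() - saved_at
        if age_s > MAX_CACHE_AGE_S:
            raise SecretsUnavailable(f"cached secrets are {age_s / 3600:.0f}h old; refusing to start")
        log.warning("secret store unreachable; starting from %.1fh-old cache", age_s / 3600)
        return secrets
    write_cache(secrets, now())  # refresh the fallback copy on every successful fetch
    return secrets
```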
  </Accordion>

  <Accordion title="'Explain how you would implement feature flags in a microservices environment. What are the risks of feature flags at scale?'">
    **Strong Answer:**

    Feature flags decouple deployment from release. You deploy code with the flag defaulting to off, then enable it progressively: 1% of users, then 10%, then 50%, then 100%. If anything goes wrong, you toggle the flag off -- no rollback, no redeploy.

    Implementation: a feature flag service (LaunchDarkly, Unleash, or custom-built) stores flag definitions with targeting rules. Each microservice has a lightweight SDK that evaluates flags locally using cached rules (no network call per flag check). The SDK receives rule updates via streaming (SSE or WebSocket) for near-instant propagation.
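
    Local evaluation with a percentage rollout is usually built on deterministic hashing, so a given user lands in the same bucket on every call and in every service. The sketch below assumes a simple rule shape (an allow-list plus a rollout percentage) rather than any vendor's format; hashing the flag name together with the user ID keeps different flags' rollouts independent.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FlagRule:
    """Hypothetical rule shape: explicit allow-list plus a percentage rollout."""
    rollout_percent: int = 0  # 0-100
    allow_users: set[str] = field(default_factory=set)


def bucket(flag: str, user_id: str) -> int:
    """Deterministically map (flag, user) to a bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def evaluate(rules: dict[str, FlagRule], flag: str, user_id: str) -> bool:
    rule = rules.get(flag)
    if rule is None:
        return False  # unknown flag: safe default
    if user_id in rule.allow_users:
        return True
    return bucket(flag, user_id) < rule.rollout_percent
```

    Because a user's bucket never changes, raising `rollout_percent` from 10 to 50 only adds users; everyone already in the first 10% stays enabled.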

    In a microservices environment, the critical challenge is flag consistency across services. If the Order Service evaluates a flag as "on" for user X but the Inventory Service evaluates it as "off," the order flow breaks. I enforce consistency by having all services use the same flag evaluation SDK and the same targeting rules. For flags that span services, I pass the flag evaluation result as a header (X-Feature-Checkout-V2: true) rather than having each service evaluate independently.
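
    Propagating the decision instead of re-evaluating it can be as small as the pair of helpers below. The header name comes from the answer; treating a missing header as "off" is an assumption of this sketch.

```python
from typing import Mapping

FLAG_HEADER = "X-Feature-Checkout-V2"  # header name taken from the answer


def outgoing_headers(checkout_v2_enabled: bool) -> dict[str, str]:
    """First service in the flow: evaluate the flag once and attach the result."""
    return {FLAG_HEADER: "true" if checkout_v2_enabled else "false"}


def checkout_v2_from_request(headers: Mapping[str, str]) -> bool:
    """Downstream services: trust the propagated decision instead of re-evaluating."""
    # HTTP header names are case-insensitive, so compare lowercased names.
    value = next((v for k, v in headers.items() if k.lower() == FLAG_HEADER.lower()), None)
    if value is None:
        return False  # assumption: no header means the old code path
    return value.strip().lower() == "true"
```

    Because any client can send arbitrary headers, the edge service should strip an incoming `X-Feature-Checkout-V2` from external requests before setting its own value.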

    The risks at scale are real. First, technical debt accumulation. Every flag is a branch in your code. After 12 months with 200 flags, your codebase is a maze of if/else branches. I enforce a policy: every flag has an owner and an expiration date. Flags older than 90 days without a documented reason get a removal ticket auto-created.

    Second, testing combinatorial explosion. 10 boolean flags mean 2^10 = 1024 possible combinations, and you cannot test all of them. I group flags into independent categories and test each category on its own, accepting that rare cross-flag interactions might slip through.
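
    To put numbers on the grouping idea, assume boolean flags and a hypothetical split of the 10 flags into independent groups of 3, 3, and 4. Testing each group exhaustively while holding the other groups at their defaults gives:

    $$
    2^{10} = 1024 \quad \text{versus} \quad 2^{3} + 2^{3} + 2^{4} = 8 + 8 + 16 = 32 \text{ test configurations}
    $$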

    Third, flag evaluation performance. If a hot code path checks 5 flags, and each flag evaluation involves a map lookup and rule evaluation, the cumulative overhead matters. The SDK should cache evaluated results per user context, not re-evaluate on every call.

    **Follow-up: "How do you handle a feature flag that accidentally gets flipped on for all users in production and causes an outage?"**

    This is why feature flags need the same operational rigor as deployments. I implement flag change auditing (who changed what, when), flag change approval workflows for high-risk flags (percentage rollout changes above 50% require review), and a kill switch that disables all recently changed flags with one command. The kill switch is the equivalent of a rollback -- it reverts all flag changes from the last N minutes. I also set up alerts on flag change events: if a flag changes and error rates spike within 5 minutes, the on-call gets paged with the flag change as the probable cause.
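
    The "revert everything changed in the last N minutes" kill switch needs an audit log that records each change's previous value. `FlagChange` below is a hypothetical audit record; the function restores each recently changed flag to the value it had before the window opened.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FlagChange:
    flag: str
    old_value: object
    new_value: object
    changed_at: datetime
    changed_by: str


def revert_recent(current: dict, audit_log: list[FlagChange],
                  now: datetime, window: timedelta) -> list[str]:
    """Restore every flag changed within `window` to its pre-window value."""
    cutoff = now - window
    recent = [c for c in audit_log if c.changed_at >= cutoff]
    reverted = []
    # Newest first, so a flag changed several times ends at the oldest change's old_value.
    for change in sorted(recent, key=lambda c: c.changed_at, reverse=True):
        current[change.flag] = change.old_value
        if change.flag not in reverted:
            reverted.append(change.flag)
    return reverted
```

    The revert should itself be written to the audit log, so the incident timeline shows both the bad change and the rollback.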
  </Accordion>

  <Accordion title="'How do you manage environment-specific configuration (dev, staging, production) without configuration drift across environments?'">
    **Strong Answer:**

    The principle is: configuration should be layered, not duplicated. I use a hierarchical configuration model with four layers: defaults (in code), global (shared across all services), environment-specific (overrides per environment), and service-specific (overrides per service per environment).

    In practice with Consul KV: `config/defaults/database_pool_size = 10`, `config/production/database_pool_size = 50`, `config/production/order-service/database_pool_size = 100`. Each key resolves from the most specific layer that defines it: a service-specific value overrides the environment value, which overrides the default. Most configuration therefore lives once in the defaults, and only the values that genuinely differ per environment or service are overridden.
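
    The lookup order in that example can be sketched as a function over a plain dict standing in for Consul KV (reading from Consul itself is left out). The paths follow the answer's example, and `code_defaults` represents the defaults-in-code layer.

```python
from typing import Mapping, Optional


def resolve(kv: Mapping[str, str], key: str, env: str, service: str,
            code_defaults: Mapping[str, str]) -> Optional[str]:
    """Return the most specific value for `key`; the first layer that defines it wins."""
    candidates = [
        f"config/{env}/{service}/{key}",  # service-specific, per environment
        f"config/{env}/{key}",            # environment-specific
        f"config/defaults/{key}",         # shared defaults in the KV store
    ]
    for path in candidates:
        if path in kv:
            return kv[path]
    return code_defaults.get(key)         # last resort: defaults compiled into the service
```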

    Configuration drift happens when someone manually changes a production value without updating staging. I prevent this in three ways. First, all configuration changes go through version control (GitOps for config). A PR changes a config file, gets reviewed, and an automated process applies it to the target environment. No manual Consul KV edits in production.

    Second, I run a daily drift detection job that compares the configuration across environments and reports differences that are not in the "expected overrides" list. If staging has `payment_gateway = sandbox` and production has `payment_gateway = live`, that is expected. If staging has `max_retries = 3` and production has `max_retries = 5` but the override was not documented, that is drift and gets flagged.
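
    The daily drift job boils down to a keyed diff with an allow-list. This sketch compares two flattened key/value snapshots; how those snapshots are exported from the configuration store is left out, and `expected_overrides` is the documented list of keys that are supposed to differ.

```python
from typing import Mapping, Optional


def detect_drift(
    staging: Mapping[str, str],
    production: Mapping[str, str],
    expected_overrides: set[str],
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {key: (staging_value, production_value)} for undocumented differences."""
    drift = {}
    for key in sorted(staging.keys() | production.keys()):
        if key in expected_overrides:
            continue  # documented, intentional difference (e.g. payment_gateway)
        staging_value, production_value = staging.get(key), production.get(key)
        if staging_value != production_value:
            drift[key] = (staging_value, production_value)
    return drift
```

    Keys present in only one environment are reported too (with `None` on the missing side), which covers the "exists in prod but not staging" case.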

    Third, I use infrastructure-as-code (Terraform) for environment provisioning. The same Terraform module creates all environments with parameterized differences. This ensures structural consistency even as values differ.

    **Follow-up: "What about configuration that is specific to a single developer's local environment?"**

    Local config should never leak into the shared configuration system. I use a `.env.local` file (gitignored) that overrides the defaults for local development. The application loads configuration in priority order: `.env.local` > environment variables > configuration service > defaults in code. This way, a developer can override any value locally without affecting anyone else, and the committed code always references the default or environment-specific values.
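
    Python's `collections.ChainMap` expresses that priority order directly, since lookups search its maps left to right. The `.env.local` parser here is a deliberately tiny assumption (no quoting or escaping), and the other sources are passed in as plain mappings.

```python
from collections import ChainMap
from typing import Mapping


def parse_dotenv(text: str) -> dict[str, str]:
    """Tiny KEY=VALUE parser for illustration; no quoting or escaping rules."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def build_config(env_local: Mapping[str, str], environ: Mapping[str, str],
                 config_service: Mapping[str, str],
                 code_defaults: Mapping[str, str]) -> ChainMap:
    # ChainMap searches its maps left to right, so earlier sources win.
    return ChainMap(dict(env_local), dict(environ), dict(config_service), dict(code_defaults))
```

    In a real service, the second argument would typically be `os.environ` and the third a snapshot fetched from the configuration service at startup.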
  </Accordion>
</AccordionGroup>
