Learning Kubernetes for Beginners: 02 Configuration, Storage, and Workload Types

You’ve learned about Pods, ReplicaSets, and Services. You understand that Kubernetes orchestrates containers across nodes. But here’s what happens next in most interviews: the interviewer asks, “How would you handle database storage in Kubernetes?” or “How do you manage application secrets?” and suddenly, you’re stuck.

Most engineers can explain what a Pod is. Far fewer can explain when you actually need a StatefulSet instead of a Deployment, or why you’d use a DaemonSet over a ReplicaSet. These aren’t academic distinctions—they reflect real production decisions that separate junior engineers from those ready for senior roles.

In production systems at companies like Uber or Netflix, you’ll encounter scenarios where the basic building blocks from Part 1 aren’t enough. You need databases that persist data even when Pods restart. You need configuration that changes without rebuilding images. You need to run monitoring agents on every single node. That’s what we’re covering here.

Why Configuration and Storage Matter in Real Systems

When you first start with Kubernetes, everything feels like it should be stateless. Deploy a web server? Easy—just throw it in a Deployment. But real applications have state. They need to store files. They need credentials. They need to remember things between restarts.

I’ve seen teams struggle for weeks because they treated configuration as an afterthought. They hardcoded database passwords in container images. They lost customer data because they didn’t understand volume lifecycles. They built systems that couldn’t update configuration without downtime.

Here’s the truth: Kubernetes gives you tools for every scenario, but you need to know which tool to use and why. Using a Deployment when you need a StatefulSet isn’t just a minor mistake—it’s the difference between a system that works and one that loses data.

What Interviewers Are Really Testing

When an interviewer asks about Secrets or StatefulSets, they’re not testing memorization. They’re checking if you understand:

  • State management: Can you distinguish stateless from stateful workloads?
  • Production readiness: Do you know how to handle credentials securely?
  • Trade-offs: Can you explain when complexity is justified?
  • Real-world thinking: Have you actually thought about what happens when a database Pod restarts?

A junior engineer might say, “Use a StatefulSet for databases.” A senior engineer explains, “StatefulSets provide stable network identities and ordered deployment, which matters for distributed databases like Cassandra where nodes need to know each other’s identity. But for a simple Postgres instance, you might just need a Deployment with a PersistentVolumeClaim—the extra guarantees aren’t always necessary.”

ConfigMaps and Secrets: Managing Configuration

Let me tell you what happens in most startups: someone hardcodes an API key in the code. It works. They ship it. Six months later, they need to rotate that key, and suddenly they’re rebuilding containers and redeploying everything. I’ve watched teams take down production during key rotation because they never set up proper configuration management.

ConfigMaps: Environment-Specific Settings

ConfigMaps store non-sensitive configuration data as key-value pairs. Think of them as external configuration files that you can inject into your Pods without baking them into your container images.

Here’s a real scenario: You’re building a microservice that needs different database connection strings for dev, staging, and production. Without ConfigMaps, you’d either:

  1. Build three different images (terrible idea)
  2. Pass everything as environment variables in your deployment YAML (messy and hard to manage)
  3. Mount configuration files from somewhere (but where?)

ConfigMaps solve this cleanly:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
data:
  database_host: "prod-db.example.com"
  database_port: "5432"
  cache_ttl: "300"
  log_level: "info"
  feature_flags: |
    {
      "new_ui": true,
      "experimental_api": false
    }
```

Now inject this into your Pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: app
    image: my-app:1.0
    env:
    - name: DATABASE_HOST
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: database_host
    - name: DATABASE_PORT
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: database_port
    volumeMounts:
    - name: config
      mountPath: /etc/config
  volumes:
  - name: config
    configMap:
      name: app-config
```

What’s happening here? The Pod gets DATABASE_HOST as an environment variable, but it can also read the entire ConfigMap as files in /etc/config. This gives you flexibility—use environment variables for simple values, mount as files for complex configuration like JSON or YAML.

⚠️ Common Mistake: ConfigMaps aren’t automatically updated in running Pods when you change them. If you modify a ConfigMap, you typically need to restart your Pods. Some teams use tools like Reloader to automate this, but in interviews, know that ConfigMap updates don’t trigger Pod restarts by default.
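One widely used workaround, sketched here with illustrative names: have your deploy tooling (Helm does this) write a hash of the rendered config into a Pod template annotation. Any config change then alters the template, which triggers a normal rolling update.

```yaml
# Deployment fragment (illustrative): the annotation name and hash value are
# hypothetical. Your deploy tooling recomputes the hash whenever the ConfigMap
# changes, so the Pod template changes and Kubernetes performs a rolling restart.
spec:
  template:
    metadata:
      annotations:
        checksum/config: "e3b0c44298fc1c14..."  # hash of the rendered ConfigMap
```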

Secrets: Handling Sensitive Data

Secrets work almost identically to ConfigMaps, but with a crucial difference: they’re designed for sensitive data like passwords, tokens, and SSH keys. Kubernetes stores them base64-encoded (note: not encrypted by default) and provides additional access controls.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  username: cG9zdGdyZXM=  # base64 encoded "postgres"
  password: c3VwZXJzZWNyZXQ=  # base64 encoded "supersecret"
```
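You can reproduce those encoded values yourself. The `-n` flag matters: without it, `echo` appends a newline that gets encoded too, and your credentials silently stop matching.

```shell
# Encode the values used in the Secret above (-n avoids a trailing newline).
echo -n "postgres" | base64      # cG9zdGdyZXM=
echo -n "supersecret" | base64   # c3VwZXJzZWNyZXQ=

# Decode to verify what a Secret actually contains:
echo "cG9zdGdyZXM=" | base64 --decode   # postgres
```

This is also why `kubectl create secret generic ... --from-literal=` is safer than hand-encoding: it handles the encoding for you.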

In production systems, you’d typically use external secret management systems like AWS Secrets Manager, HashiCorp Vault, or Google Secret Manager, and sync them into Kubernetes using operators. But understanding Kubernetes Secrets is fundamental.

💡 Pro Insight: At scale, teams don’t manually create Secret objects. They use tools like External Secrets Operator or sealed-secrets to sync from external sources. But in interviews, explaining the basic Secret object shows you understand the foundation.

Volumes and Persistent Storage: Where State Lives

Here’s a scenario that breaks many designs: You deploy a database as a regular Deployment. It works great. Then a node dies, Kubernetes reschedules the Pod to another node, and… all your data is gone. Why? Because container filesystems are ephemeral by default.

Understanding Volume Types

Kubernetes offers multiple volume types, but let’s focus on what matters in real systems:

```mermaid
graph TD
    A[Pod Needs Storage] --> B{Data Lifetime?}
    B -->|Pod Lifetime| C[emptyDir]
    B -->|Survives Pod Restarts| D[PersistentVolume]
    D --> E{Cloud Provider?}
    E -->|AWS| F[EBS Volume]
    E -->|GCP| G[GCE Persistent Disk]
    E -->|Azure| H[Azure Disk]
    E -->|On-Prem| I[NFS/Ceph/Local]
    C --> J[Temporary Data<br/>Caches, Scratch Space]
    F --> K[Long-term Storage<br/>Databases, Files]
    G --> K
    H --> K
    I --> K
```

emptyDir: Created when a Pod starts, deleted when it stops. Perfect for caches or temporary processing. If your Pod has multiple containers that need to share files, emptyDir is your friend.

```yaml
volumes:
- name: cache
  emptyDir: {}
```
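To make the multi-container sharing concrete, here's a minimal sketch (container names, images, and commands are illustrative): one container writes into the shared volume, a sidecar in the same Pod reads from it. Both mount the same emptyDir, so they see the same files.

```yaml
# Illustrative two-container Pod sharing an emptyDir volume.
apiVersion: v1
kind: Pod
metadata:
  name: shared-cache-demo
spec:
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "while true; do date > /cache/now; sleep 5; done"]
    volumeMounts:
    - name: cache
      mountPath: /cache
  - name: reader
    image: busybox
    command: ["sh", "-c", "sleep 3600"]  # exec in and read /cache/now
    volumeMounts:
    - name: cache
      mountPath: /cache
  volumes:
  - name: cache
    emptyDir: {}   # lives as long as the Pod, not the containers
```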

PersistentVolumes (PV) and PersistentVolumeClaims (PVC): This is where it gets interesting. In real systems, you don’t directly create PersistentVolumes—you create PersistentVolumeClaims that describe what you need, and Kubernetes provisions the actual storage.

Think of it like ordering from a restaurant: You (the Pod) place an order (PVC) for what you want. The kitchen (StorageClass) prepares it. The waiter (Kubernetes) delivers the food (PV) to your table.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: fast-ssd
```

This PVC requests 20Gi of storage from the “fast-ssd” StorageClass. On AWS, this might provision an EBS volume. On GCP, a Persistent Disk. The beauty is your application doesn’t care—it just gets a filesystem.
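For completeness, here's what a “fast-ssd” StorageClass might look like, sketched under the assumption of the AWS EBS CSI driver. The class name comes from the PVC above; the provisioner and parameters are provider-specific and would differ on GCP or Azure.

```yaml
# Hypothetical StorageClass backing the PVC above (AWS EBS CSI assumed).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3                              # EBS volume type
volumeBindingMode: WaitForFirstConsumer  # provision only once a Pod is scheduled
```

WaitForFirstConsumer is worth knowing: it delays provisioning until a Pod actually needs the volume, so the disk is created in the same availability zone as the node running the Pod.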

🎯 Interview Tip: When discussing storage, mention access modes: ReadWriteOnce (single node), ReadOnlyMany (multiple nodes for reading), ReadWriteMany (multiple nodes for writing). Most candidates forget that not all storage supports all modes. AWS EBS, for example, only supports ReadWriteOnce.

StatefulSets: When Order and Identity Matter

Here’s where most engineers get confused. When do you actually need a StatefulSet instead of a Deployment?

StatefulSets are for workloads where:

  1. Each Pod needs a stable identity (predictable names like app-0, app-1, app-2)
  2. Startup/shutdown order matters (app-0 must start before app-1)
  3. Each Pod needs its own persistent storage (app-0 and app-1 have different data)

Real-world examples where this matters:

  • Cassandra clusters: Each node needs a stable identity to participate in the ring
  • Kafka: Brokers need stable network identities for partition leadership
  • MySQL primary-replica: You need to start the primary before replicas
  • Elasticsearch: Nodes need stable identities to maintain cluster state

Here’s a StatefulSet for PostgreSQL:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:14
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 20Gi
```

The volumeClaimTemplates section is key. Unlike a Deployment where all Pods share the same PVC, a StatefulSet creates a unique PVC for each Pod. If postgres-0 restarts, it reattaches to the same PVC—your data survives.
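One detail that trips people up: the serviceName field in the StatefulSet refers to a headless Service, which you have to create yourself. It's what gives each Pod a stable DNS name like postgres-0.postgres. A minimal version:

```yaml
# Headless Service required by the StatefulSet's serviceName field.
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None   # headless: no virtual IP; DNS resolves directly to Pod IPs
  selector:
    app: postgres
  ports:
  - port: 5432
```

Without it, the stable per-Pod DNS names that make StatefulSets useful for clustered databases simply don't exist.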

```mermaid
sequenceDiagram
    participant K as Kubernetes
    participant SS as StatefulSet
    participant P0 as postgres-0
    participant PVC0 as data-postgres-0
    participant P1 as postgres-1
    participant PVC1 as data-postgres-1

    K->>SS: Create StatefulSet
    SS->>P0: Create postgres-0
    SS->>PVC0: Create PVC for postgres-0
    P0->>PVC0: Attach volume
    Note over P0: postgres-0 fully running
    SS->>P1: Create postgres-1
    SS->>PVC1: Create PVC for postgres-1
    P1->>PVC1: Attach volume
    Note over P1: postgres-1 fully running
```

Notice the order: postgres-0 must be running before postgres-1 starts. This ordered startup is automatic with StatefulSets.

⚠️ Common Mistake: Many candidates think StatefulSets automatically handle database replication or clustering. They don’t. A StatefulSet gives you stable identities and ordered deployment, but you still need to configure your database to replicate. The StatefulSet just provides the infrastructure foundation.

Ingress: External Access to Your Services

You’ve deployed your application. It’s running in Pods behind a Service. Now how do users actually reach it from the internet?

A Service gives you internal networking—Pods can talk to each other. But external access requires either:

  1. LoadBalancer Service: Creates a cloud load balancer (expensive—one per service)
  2. NodePort: Exposes a port on every node (awkward port numbers, hard to manage)
  3. Ingress: HTTP/HTTPS routing with a single entry point (this is what you want)

How Ingress Works

Think of Ingress as a smart reverse proxy sitting at the edge of your cluster. It routes incoming requests to different Services based on hostnames and paths.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /users
        pathType: Prefix
        backend:
          service:
            name: user-service
            port:
              number: 80
      - path: /orders
        pathType: Prefix
        backend:
          service:
            name: order-service
            port:
              number: 80
  - host: admin.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: admin-service
            port:
              number: 80
```

This Ingress routes:

  • api.example.com/users/* → user-service
  • api.example.com/orders/* → order-service
  • admin.example.com/* → admin-service

```mermaid
graph LR
    User[User] -->|api.example.com/users| Ingress[Ingress Controller]
    Ingress -->|Route /users| US[user-service]
    Ingress -->|Route /orders| OS[order-service]
    Ingress -->|admin.example.com| AS[admin-service]
    US --> UP1[User Pod 1]
    US --> UP2[User Pod 2]
    OS --> OP1[Order Pod 1]
    AS --> AP1[Admin Pod 1]
```

💡 Pro Insight: Ingress is just the configuration. You need an Ingress Controller (like nginx-ingress, Traefik, or cloud-specific controllers) actually running in your cluster to implement these rules. This confuses many beginners—the Ingress object is declarative config, the controller is the implementation.
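If you also need HTTPS, the same Ingress object can terminate TLS by referencing a Secret of type kubernetes.io/tls that holds the certificate and key. A sketch (the Secret name is illustrative):

```yaml
# Fragment added to the Ingress spec above for TLS termination.
spec:
  tls:
  - hosts:
    - api.example.com
    secretName: api-example-tls   # Secret of type kubernetes.io/tls
```

In practice, teams usually pair this with cert-manager to issue and renew the certificate automatically rather than creating the Secret by hand.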

DaemonSets: One Pod Per Node

Sometimes you need exactly one Pod running on every node. Not two. Not zero. One per node. This is what DaemonSets do.

Real-world use cases:

  • Log collectors: Fluentd or Filebeat collecting logs from every node
  • Monitoring agents: Prometheus node exporter on each node
  • Network plugins: CNI plugins that need to run on every node
  • Storage daemons: Ceph or GlusterFS storage agents

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
spec:
  selector:
    matchLabels:
      name: fluentd
  template:
    metadata:
      labels:
        name: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluentd:latest
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
```

The key difference from a Deployment: Kubernetes automatically schedules one Pod on each node, including new nodes that join the cluster. Add a node, get a DaemonSet Pod. Remove a node, the DaemonSet Pod goes with it.
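One practical wrinkle: tainted nodes (control-plane nodes, for example) are skipped unless the DaemonSet tolerates the taint. Log collectors and monitoring agents usually should run there too, so you'll often see a tolerations block like this sketch added to the Pod template:

```yaml
# Fragment: tolerate the control-plane taint so the DaemonSet runs on
# every node, including control-plane nodes.
spec:
  template:
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
```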

🎯 Interview Tip: Explain that DaemonSets don’t schedule by replica count or spread logic; they place exactly one Pod on each eligible node. If an interviewer asks, “How would you run monitoring on all nodes?” and you suggest a Deployment with replicas, you’ve missed the point.

Jobs and CronJobs: Batch Processing

Not everything needs to run forever. Sometimes you need to:

  • Run a database migration
  • Process a batch of files
  • Generate a daily report
  • Clean up old data

Jobs: Run Once Until Complete

A Job runs a Pod until it successfully completes (exits with code 0). If the Pod fails, the Job creates a new Pod and tries again.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: database-migration
spec:
  template:
    spec:
      containers:
      - name: migration
        image: my-app-migrations:v2
        command: ["python", "migrate.py"]
      restartPolicy: Never
  backoffLimit: 3
```

This Job runs a database migration. If it fails, Kubernetes retries up to 3 times (backoffLimit). Once it succeeds, the Job is complete and the Pod remains for you to check logs.
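Jobs can also fan out for batch work like the file-processing use case above. Two spec fields control this (the values here are illustrative):

```yaml
# Fragment: run 10 successful completions, at most 3 Pods in parallel.
spec:
  completions: 10   # total successful Pods required for the Job to finish
  parallelism: 3    # maximum Pods running at any one time
```

With completions and parallelism unset, you get the default behavior shown above: one Pod, retried until it succeeds or backoffLimit is exhausted.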

CronJobs: Scheduled Execution

CronJobs are Jobs that run on a schedule, using the same cron syntax you’d use in Linux.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-database
spec:
  schedule: "0 2 * * *"  # Every day at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: postgres:14
            command:
            - /bin/sh
            - -c
            - pg_dump -h $DB_HOST -U postgres mydb > /backup/dump.sql
            env:
            - name: DB_HOST
              value: postgres-service
          restartPolicy: OnFailure
```

This backs up a database every night at 2 AM. Kubernetes creates a new Job (which creates a new Pod) on schedule.

⚠️ Common Mistake: CronJobs don’t guarantee exactly-once execution. If the cluster is busy or a node fails, a scheduled run might be missed. For critical workflows, you need external job scheduling systems or at least monitoring to detect missed runs.
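A few CronJob spec fields help you manage overlap and missed runs; here's a sketch with illustrative values:

```yaml
# Fragment: controls for overlapping and missed runs.
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid        # skip a run if the previous Job is still active
  startingDeadlineSeconds: 3600    # count a run as missed if not started within 1h
  successfulJobsHistoryLimit: 3    # keep the last 3 successful Jobs for inspection
  failedJobsHistoryLimit: 1
```

concurrencyPolicy in particular comes up in interviews: the default, Allow, lets a slow backup overlap with the next scheduled one, which is rarely what you want.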

Putting It All Together: A Real Production Example

Let’s walk through a realistic scenario. You’re building a content management system with:

  • Web frontend (stateless)
  • API backend (stateless)
  • PostgreSQL database (stateful)
  • Redis cache (can be stateful or not)
  • Background workers for image processing
  • Daily cleanup job

Here’s how you’d architect this:

```mermaid
graph TD
    Internet[Internet] --> Ingress[Ingress Controller]
    Ingress -->|cms.example.com| Frontend[Frontend Service]
    Ingress -->|api.cms.example.com| API[API Service]
    Frontend --> FP1[Frontend Pod]
    Frontend --> FP2[Frontend Pod]
    API --> AP1[API Pod]
    API --> AP2[API Pod]
    API --> DB[(PostgreSQL<br/>StatefulSet)]
    API --> Redis[(Redis<br/>StatefulSet)]
    AP1 --> Queue[Message Queue]
    AP2 --> Queue
    Queue --> Worker1[Worker Pod]
    Queue --> Worker2[Worker Pod]
    Worker1 --> S3[S3 Storage]
    Worker2 --> S3
    CronJob[Daily Cleanup<br/>CronJob] -.->|2 AM| DB
```

Frontend and API: Deployments with 2+ replicas each. They’re stateless, so scaling and updates are simple.

PostgreSQL: StatefulSet with persistent storage. Even if the Pod restarts, data persists.

Redis: Could be a Deployment with a PVC for persistence, or a StatefulSet if you need Redis clustering.

Workers: Deployment that scales based on queue depth. These process images and upload to S3.

Cleanup Job: CronJob that runs nightly to delete old data.

Ingress: Single entry point routing cms.example.com to frontend and api.cms.example.com to the API.

This architecture uses every concept we’ve covered:

  • ConfigMaps for environment-specific settings
  • Secrets for database passwords and API keys
  • PersistentVolumeClaims for database storage
  • StatefulSet for the database
  • Deployments for stateless services
  • Ingress for external access
  • CronJob for scheduled maintenance

How to Talk About This in Interviews

When an interviewer asks, “How would you deploy a database in Kubernetes?” don’t just say “StatefulSet.” Walk through the reasoning:

“For a database, I’d use a StatefulSet rather than a Deployment because we need stable Pod identities and persistent storage that survives Pod restarts. Each Pod in a StatefulSet gets its own PersistentVolumeClaim, so if postgres-0 crashes and restarts, it reattaches to the same volume—no data loss.

I’d configure the StatefulSet with volumeClaimTemplates to automatically provision storage using a StorageClass that maps to our cloud provider’s block storage—AWS EBS or GCP Persistent Disks. I’d set the access mode to ReadWriteOnce since most databases can’t be safely accessed from multiple nodes simultaneously.

For configuration like connection credentials, I’d use Secrets rather than hardcoding them in the container image. This lets us rotate credentials without rebuilding images.

That said, for production databases, many teams actually run databases outside Kubernetes—either managed services like RDS or dedicated database servers. StatefulSets work, but they add operational complexity. You need to handle backups, replication, and failover yourself. It’s a trade-off between operational flexibility and complexity.”

Notice what this answer demonstrates:

  • Understanding of StatefulSets vs Deployments
  • Knowledge of storage concepts
  • Security awareness (Secrets)
  • Real-world pragmatism (maybe databases shouldn’t be in Kubernetes)
  • Trade-off analysis

That’s senior-level thinking.

Wrapping Up

You now understand the infrastructure beyond basic Pods and Services. You know how to:

  • Manage configuration with ConfigMaps and Secrets
  • Handle persistent data with volumes and StatefulSets
  • Route external traffic with Ingress
  • Run system-level services with DaemonSets
  • Execute batch work with Jobs and CronJobs

The real skill isn’t memorizing these objects—it’s knowing when to use each one. A Deployment for stateless apps. A StatefulSet when identity matters. A DaemonSet when you need one per node. A Job for one-time tasks. A CronJob for scheduled work.

In your next interview, when someone asks about Kubernetes, you won’t just list objects. You’ll explain the problems they solve and the trade-offs they involve. That’s the difference between reciting definitions and demonstrating understanding.

Practice explaining these concepts out loud. Draw the diagrams. Write the YAML. Build the mental model of how these pieces connect. When you can explain why Netflix uses StatefulSets for Cassandra but Deployments for their API gateways, you’re ready for the interview.
