Kubernetes Cost Optimisation

Kubernetes has become the default platform for containerised workloads. It is battle-tested, extensible, and deeply capable. It is also one of the most effective ways to accidentally double your cloud bill if you configure it without a deliberate cost strategy from day one.

Across our cloud engagements, the same five patterns appear in every over-spend situation. The good news: fixing them is mostly a configuration and process exercise, not a re-architecture one. These patterns have reduced monthly infrastructure spend by an average of 40% for the clients where we have applied them systematically.

40%: average monthly bill reduction

3–6 weeks: typical time to implement all five patterns

additional tooling cost in most cases

1. Right-sizing resource requests and limits

The most common source of Kubernetes waste we encounter is over-declared resource requests. When developers set CPU and memory requests, they tend to be conservative — sometimes extremely so. A service that uses 150m CPU in production might have requests set to 1000m because "that's what the vendor recommended" or "we didn't want it to be throttled."

The cluster scheduler places pods according to their requests, not their actual usage. Over-declared requests cause nodes to appear full when they have significant spare capacity, triggering unnecessary scale-out events. The fix is to measure actual consumption over a representative period, then set requests to the 90th percentile of observed usage. Repeated kubectl top pods snapshots will do for the measurement, but a tool like Goldilocks or the Vertical Pod Autoscaler in recommendation mode is better, because kubectl top only reports current usage.

Pro Tip

Never set requests by guessing. Run your workloads for at least two weeks, collect P90 CPU and memory usage metrics, and use those as your request values. Set limits at 150–200% of requests to allow for burst headroom.
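As a rough illustration of the tip above, the sketch below derives a CPU request from the 90th percentile of usage samples and sets the limit at 150% of it. The sample values and the function names are hypothetical, and the nearest-rank percentile is only one of several reasonable definitions; a real run would use two weeks of metrics exported from your monitoring stack.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def recommend_cpu(samples_millicores, limit_factor=1.5):
    """Request = P90 of observed usage; limit = request * limit_factor (1.5 to 2.0 per the tip)."""
    request = percentile(samples_millicores, 90)
    limit = math.ceil(request * limit_factor)
    return {"request": f"{request}m", "limit": f"{limit}m"}


# Hypothetical CPU usage samples in millicores, with one short spike to 210m.
cpu_samples = [120, 135, 150, 140, 160, 155, 145, 130, 210, 150]
print(recommend_cpu(cpu_samples))  # {'request': '160m', 'limit': '240m'}
```

Note that the single 210m spike does not drive the request; it is absorbed by the limit headroom instead, which is the point of sizing on P90 rather than on the peak.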

2. Namespace resource quotas and LimitRanges

Without namespace-level governance, individual teams can gradually inflate their resource footprint without any visibility into the cluster-wide cost implications. A ResourceQuota object defines maximum total resource consumption for a namespace. A LimitRange sets default requests and limits for containers that do not declare them explicitly — closing the gap where teams deploy without any resource declarations at all.
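To make the two objects concrete, here is a minimal sketch that builds a ResourceQuota and a LimitRange as plain dictionaries and prints them as JSON, which kubectl apply -f accepts alongside YAML. The namespace name, object names, and all quantities are placeholder assumptions, not recommendations.

```python
import json


def namespace_guardrails(namespace, cpu_requests, memory_requests,
                         default_cpu_request, default_memory_request,
                         default_cpu_limit, default_memory_limit):
    """Return a v1 List holding a ResourceQuota and a LimitRange for one namespace."""
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "compute-quota", "namespace": namespace},
        # Caps the sum of all container requests in the namespace.
        "spec": {"hard": {"requests.cpu": cpu_requests,
                          "requests.memory": memory_requests}},
    }
    limit_range = {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "container-defaults", "namespace": namespace},
        "spec": {"limits": [{
            "type": "Container",
            # Applied to containers that declare no requests or limits of their own.
            "defaultRequest": {"cpu": default_cpu_request, "memory": default_memory_request},
            "default": {"cpu": default_cpu_limit, "memory": default_memory_limit},
        }]},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [quota, limit_range]}


# "team-payments" and every quantity below are hypothetical values.
manifest = namespace_guardrails("team-payments", "10", "20Gi",
                                "100m", "128Mi", "200m", "256Mi")
print(json.dumps(manifest, indent=2))
```

Pairing the two matters: once a quota on requests.cpu or requests.memory exists, pods that declare no requests are rejected unless a LimitRange supplies defaults for them.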

We recommend a quarterly review cycle where namespace owners receive a report of their actual vs. allocated consumption and are asked to justify allocations whose utilisation falls below 60%. This alone has prompted voluntary rightsizing in every organisation where we have introduced it.

3. Horizontal and Vertical Pod Autoscaler, used together correctly

The Horizontal Pod Autoscaler (HPA) scales the number of pod replicas based on observed metrics — typically CPU utilisation or custom metrics from Prometheus. The Vertical Pod Autoscaler (VPA) adjusts the resource requests of individual pods based on historical usage. They are complementary, but using both on the same workload requires care — VPA and HPA on CPU will conflict. The recommended pattern is: HPA on custom metrics (request rate, queue depth) and VPA on CPU/memory in recommendation mode only.
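For intuition about how HPA reacts to a custom metric, the sketch below reproduces the core scaling rule from the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). It is a simplified model, not the controller's code; it ignores stabilisation windows, pod readiness, and missing metrics, and the request-rate figures are made up.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA rule: scale replicas in proportion to metric / target."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do nothing
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical metric: requests per second per pod, with a target of 100.
print(desired_replicas(4, 180, 100))  # 8: load is 1.8x target
print(desired_replicas(4, 105, 100))  # 4: within the 10% tolerance
print(desired_replicas(6, 20, 100))   # 2: scale in
```

Because the rule works on a per-pod metric that does not change when VPA resizes a pod, scaling on request rate or queue depth sidesteps the feedback loop that arises when HPA and VPA both act on CPU.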

For batch workloads or applications with variable traffic, KEDA (Kubernetes Event-Driven Autoscaling) scales on event sources such as queue length and can scale a workload down to zero replicas when there is nothing to process, so idle consumers stop costing anything between bursts.

4. Spot and preemptible instances for non-critical workloads

Spot Instances (AWS), preemptible VMs (GCP), and Spot VMs (Azure) offer discounts of 60–90% compared to on-demand pricing. The catch is that the cloud provider can reclaim them at short notice: about 2 minutes on AWS and as little as 30 seconds on GCP and Azure. The engineering investment required to tolerate spot reclamation is well worth it for the right workloads: stateless web services, batch processing, CI/CD runners, and dev/staging environments.

The key implementation detail is using a mixed node pool strategy: a baseline of on-demand nodes sized to handle minimum traffic, with spot nodes handling the majority of capacity. Tools like Karpenter (AWS) and Cluster Autoscaler with spot support make this straightforward to manage.
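A quick back-of-the-envelope calculation shows why the split between baseline and spot capacity matters. The sketch below compares a mixed pool with an all-on-demand pool of the same size; the node counts, the hourly price, and the 70% discount are illustrative assumptions, not quotes from any provider.

```python
def hourly_cost(on_demand_nodes, spot_nodes, on_demand_price, spot_discount):
    """Hourly cost of a mixed pool where spot nodes are priced at a fractional discount."""
    spot_price = on_demand_price * (1 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


def savings_vs_on_demand(on_demand_nodes, spot_nodes, on_demand_price, spot_discount):
    """Fraction saved compared with running every node on demand."""
    baseline = (on_demand_nodes + spot_nodes) * on_demand_price
    mixed = hourly_cost(on_demand_nodes, spot_nodes, on_demand_price, spot_discount)
    return 1 - mixed / baseline


# Hypothetical pool: 3 on-demand baseline nodes, 9 spot nodes,
# 0.20 per node-hour on demand, 70% spot discount.
saving = savings_vs_on_demand(3, 9, 0.20, 0.70)
print(f"{saving:.1%} cheaper than all on-demand")
```

The saving works out to the spot share of the pool multiplied by the discount (here 75% × 70%), so shrinking the on-demand baseline to what minimum traffic genuinely needs has as much effect as chasing a deeper discount.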

5. Eliminating idle resources and ghost workloads

In any cluster that has been running for more than six months, there will be workloads that were deployed for a specific purpose and never cleaned up. Staging deployments that outlived their branch. One-off debug pods. Load-testing namespaces. PersistentVolumeClaims attached to nothing. These "ghost workloads" silently consume CPU, memory, and storage every day.

We run a quarterly ghost workload audit on every cluster we manage: identify pods with zero inbound traffic for 30+ days, PVCs not mounted by any pod, and deployments in namespaces with no active development activity. In a medium-sized cluster, this audit typically recovers 8–15% of spend.
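One part of that audit, finding PVCs that no pod mounts, is easy to script. The sketch below assumes the inputs are the "items" lists from kubectl get pvc -A -o json and kubectl get pods -A -o json, already parsed into dictionaries; the sample objects and their names are invented for illustration. Treat the result as a list of candidates to review rather than something to delete automatically.

```python
def unmounted_pvcs(pvcs, pods):
    """Return sorted (namespace, name) pairs for PVCs not referenced by any pod volume."""
    mounted = set()
    for pod in pods:
        namespace = pod["metadata"]["namespace"]
        for volume in pod.get("spec", {}).get("volumes", []):
            claim = volume.get("persistentVolumeClaim")
            if claim:
                mounted.add((namespace, claim["claimName"]))
    return sorted(
        (pvc["metadata"]["namespace"], pvc["metadata"]["name"])
        for pvc in pvcs
        if (pvc["metadata"]["namespace"], pvc["metadata"]["name"]) not in mounted
    )


# Hypothetical, heavily trimmed API objects.
pvcs = [
    {"metadata": {"namespace": "shop", "name": "orders-db"}},
    {"metadata": {"namespace": "loadtest", "name": "results-cache"}},
]
pods = [
    {"metadata": {"namespace": "shop", "name": "orders-0"},
     "spec": {"volumes": [{"name": "data",
                           "persistentVolumeClaim": {"claimName": "orders-db"}}]}},
    {"metadata": {"namespace": "shop", "name": "web-1"},
     "spec": {"volumes": [{"name": "tmp", "emptyDir": {}}]}},
]
print(unmounted_pvcs(pvcs, pods))  # [('loadtest', 'results-cache')]
```

Claims are matched per namespace because a pod can only mount a PVC from its own namespace, so a same-named claim elsewhere does not count as in use.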

Tooling that makes this sustainable

The five patterns above are not a one-time fix — they require ongoing discipline. The tooling that makes them sustainable in practice:

Kubecost or OpenCost — real-time cost visibility broken down by namespace, deployment, and label
Goldilocks — VPA-based resource recommendation dashboard
Karpenter (AWS) or Cluster Autoscaler — intelligent node provisioning with spot support
Prometheus + Grafana — utilisation dashboards with alerting on low-utilisation namespaces
kubectl-cost CLI plugin — on-demand cost breakdown from the terminal

The infrastructure is already in most clusters. The missing ingredient is almost always the process: assigned ownership, regular review cadences, and making cost a first-class metric alongside reliability and performance.

If your Kubernetes bill has been growing faster than your traffic, get in touch — we offer infrastructure cost audits that typically pay for themselves within the first month.

