Kubernetes cost optimisation: five patterns that cut our clients' bills by 40%

Container orchestration platforms are powerful — and easy to over-provision. We walk through the five techniques our infrastructure team applies to every Kubernetes deployment to eliminate waste without sacrificing reliability.

Kubernetes has become the default platform for containerised workloads. It is battle-tested, extensible, and deeply capable. It is also one of the most effective ways to accidentally double your cloud bill if you configure it without a deliberate cost strategy from day one.

Across our cloud engagements, the same five patterns appear in every over-spend situation. The good news: fixing them is mostly a configuration and process exercise, not a re-architecture one. These patterns have reduced monthly infrastructure spend by an average of 40% for the clients where we have applied them systematically.

40% average monthly bill reduction
3–6 weeks typical time to implement all five patterns
$0 additional tooling cost in most cases

1. Right-sizing resource requests and limits

The most common source of Kubernetes waste we encounter is over-declared resource requests. When developers set CPU and memory requests, they tend to be conservative — sometimes extremely so. A service that uses 150m CPU in production might have requests set to 1000m because "that's what the vendor recommended" or "we didn't want it to be throttled."

The cluster scheduler uses requests to determine where pods are placed. Over-declared requests cause nodes to appear full when they have significant spare capacity, triggering unnecessary scale-out events. The fix is to measure actual consumption over a representative period, then set requests to the 90th percentile of observed usage. kubectl top pods only reports a point-in-time snapshot, so it has to be sampled repeatedly; a tool like Goldilocks or Vertical Pod Autoscaler in recommendation mode, which tracks usage over time, is the better option.

Pro Tip

Never set requests by guessing. Run your workloads for at least two weeks, collect P90 CPU and memory usage metrics, and use those as your request values. Set limits at 150–200% of requests to allow for burst headroom.
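As a rough illustration of the P90 rule, the sketch below derives a request and a limit from a list of usage samples. The sample values, the function names, and the choice of the nearest-rank percentile method are all assumptions made for the example; in practice the samples would come from your metrics store.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 gives the 1-based rank.
    rank = -(-pct * len(ordered) // 100)
    return ordered[max(rank, 1) - 1]

def recommend(samples, limit_factor=1.5):
    """Request = P90 of observed usage; limit = request * limit_factor (1.5 to 2.0)."""
    request = percentile(samples, 90)
    return {"request": request, "limit": math.ceil(request * limit_factor)}

# Hypothetical CPU usage samples in millicores, one per scrape interval.
cpu_millicores = [120, 135, 150, 140, 160, 155, 145, 310, 150, 148]

print(recommend(cpu_millicores))                    # limit at 150% of request
print(recommend(cpu_millicores, limit_factor=2.0))  # limit at 200% of request
```

Note that the single 310m spike does not drive the request; it is covered by the burst headroom in the limit instead.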

2. Namespace resource quotas and LimitRanges

Without namespace-level governance, individual teams can gradually inflate their resource footprint without any visibility into the cluster-wide cost implications. A ResourceQuota object defines maximum total resource consumption for a namespace. A LimitRange sets default requests and limits for containers that do not declare them explicitly — closing the gap where teams deploy without any resource declarations at all.
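A minimal sketch of the two objects, built as Python dictionaries and printed as JSON. The object names, the namespace, and every quantity shown are placeholder assumptions, not recommended values.

```python
import json

def resource_quota(namespace, cpu_requests, memory_requests):
    """ResourceQuota capping the total requests a namespace may hold."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "compute-quota", "namespace": namespace},  # name is hypothetical
        "spec": {"hard": {"requests.cpu": cpu_requests,
                          "requests.memory": memory_requests}},
    }

def limit_range(namespace):
    """LimitRange supplying defaults for containers that declare nothing."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "container-defaults", "namespace": namespace},  # name is hypothetical
        "spec": {"limits": [{
            "type": "Container",
            "defaultRequest": {"cpu": "100m", "memory": "128Mi"},  # applied when requests are missing
            "default": {"cpu": "200m", "memory": "256Mi"},         # applied when limits are missing
        }]},
    }

quota = resource_quota("team-payments", "8", "16Gi")  # placeholder namespace and sizes
defaults = limit_range("team-payments")

print(json.dumps(quota, indent=2))
print(json.dumps(defaults, indent=2))
```

Either object can be written to a file and applied with kubectl apply -f, which accepts JSON as well as YAML.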

We recommend a quarterly review cycle where namespace owners receive a report of their actual vs. allocated consumption and are asked to justify any allocation where utilisation falls below 60%. This alone has prompted voluntary rightsizing in every organisation where we have introduced it.
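The review report itself can be very simple. The sketch below treats the 60% figure as a floor on utilisation and flags namespaces whose measured usage falls under it; the input shape, field names, and numbers are hypothetical.

```python
def underused(namespaces, threshold=0.60):
    """Return (name, utilisation) for namespaces using less than `threshold` of their requests."""
    flagged = []
    for ns in namespaces:
        requested = ns["requested_millicores"]
        if requested == 0:
            continue  # nothing allocated, nothing to justify
        ratio = ns["used_millicores"] / requested
        if ratio < threshold:
            flagged.append((ns["name"], round(ratio, 2)))
    # Worst offenders first.
    return sorted(flagged, key=lambda item: item[1])

# Hypothetical quarterly figures: requested vs. average used CPU.
report = [
    {"name": "payments", "requested_millicores": 4000, "used_millicores": 3100},
    {"name": "search",   "requested_millicores": 8000, "used_millicores": 2400},
    {"name": "batch",    "requested_millicores": 2000, "used_millicores": 1100},
]

for name, ratio in underused(report):
    print(f"{name}: {ratio:.0%} of requested CPU in use")
```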

3. Horizontal and Vertical Pod Autoscaler, used together correctly

The Horizontal Pod Autoscaler (HPA) scales the number of pod replicas based on observed metrics, typically CPU utilisation or custom metrics from Prometheus. The Vertical Pod Autoscaler (VPA) adjusts the resource requests of individual pods based on historical usage. They are complementary, but using both on the same workload requires care: HPA measures CPU utilisation as a percentage of the pod's request, so if VPA is changing that request at the same time, the two controllers will conflict. The recommended pattern is HPA on custom metrics (request rate, queue depth) and VPA on CPU/memory in recommendation mode only.
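To make the HPA side concrete, here is a sketch of the scaling rule described in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a small tolerance of 1. The function name, the default bounds, and the example numbers are assumptions, and the real controller applies further rules (stabilisation windows, pod readiness) that are left out here.

```python
import math

def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA calculation for a single per-pod metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do nothing
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# Hypothetical custom metric: requests per second per pod, target 100.
print(desired_replicas(4, current_value=180, target_value=100))   # scale out
print(desired_replicas(4, current_value=105, target_value=100))   # within tolerance
print(desired_replicas(6, current_value=20, target_value=100))    # scale in
```

Because the metric here is request rate rather than CPU utilisation, the result does not depend on the pod's CPU request, which is why VPA recommendations can be applied without disturbing it.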

For batch workloads or applications with variable traffic, KEDA (Kubernetes Event-Driven Autoscaling) scales on the event source itself (queue length, stream lag, scheduled triggers) rather than on CPU, and can scale a workload to zero replicas when there is nothing to process.

4. Spot and preemptible instances for non-critical workloads

Spot instances (AWS), preemptible VMs (GCP), and Spot VMs (Azure) offer discounts of 60–90% compared to on-demand pricing. The catch is that they can be reclaimed by the cloud provider at short notice: 2 minutes on AWS, and around 30 seconds on GCP and Azure. The engineering investment required to tolerate spot reclamation is well worth it for the right workloads: stateless web services, batch processing, CI/CD runners, and dev/staging environments.

The key implementation detail is using a mixed node pool strategy: a baseline of on-demand nodes sized to handle minimum traffic, with spot nodes handling the majority of capacity. Tools like Karpenter (AWS) and Cluster Autoscaler with spot support make this straightforward to manage.
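The arithmetic behind a mixed pool is worth running before committing to a split. The sketch below compares an all-on-demand pool with a mixed one; the node counts, the hourly price, the 70% discount, and the 730-hour month are illustrative assumptions, not provider quotes.

```python
def blended_monthly_cost(total_nodes, on_demand_nodes, on_demand_hourly,
                         spot_discount, hours=730):
    """Return (mixed_cost, all_on_demand_cost, fractional_saving) for one month."""
    spot_nodes = total_nodes - on_demand_nodes
    spot_hourly = on_demand_hourly * (1 - spot_discount)
    mixed = (on_demand_nodes * on_demand_hourly + spot_nodes * spot_hourly) * hours
    all_on_demand = total_nodes * on_demand_hourly * hours
    return mixed, all_on_demand, 1 - mixed / all_on_demand

# Hypothetical pool: 20 nodes, 6 kept on-demand as the baseline, spot at a 70% discount.
mixed, baseline, saving = blended_monthly_cost(
    total_nodes=20, on_demand_nodes=6, on_demand_hourly=0.20, spot_discount=0.70)

print(f"all on-demand: ${baseline:,.2f}/month")
print(f"mixed pool:    ${mixed:,.2f}/month")
print(f"saving:        {saving:.0%}")
```

With these inputs the saving comes out at roughly half the node bill, and it shrinks as the on-demand baseline grows, so the baseline should be sized to minimum traffic and no larger.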

5. Eliminating idle resources and ghost workloads

In any cluster that has been running for more than six months, there will be workloads that were deployed for a specific purpose and never cleaned up. Staging deployments that outlived their branch. One-off debug pods. Load-testing namespaces. PersistentVolumeClaims attached to nothing. These "ghost workloads" silently consume CPU, memory, and storage every day.

We run a quarterly ghost workload audit on every cluster we manage: identify pods with zero inbound traffic for 30+ days, PVCs not mounted by any pod, and deployments in namespaces with no active development activity. In a medium-sized cluster, this audit typically recovers 8–15% of spend.
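The unmounted-PVC part of the audit is easy to script. The sketch below assumes the two inputs are the parsed JSON from kubectl get pvc -A -o json and kubectl get pods -A -o json; the function name and the sample objects are hypothetical.

```python
def unmounted_pvcs(pvc_list, pod_list):
    """Return sorted (namespace, name) pairs for PVCs that no pod references."""
    mounted = set()
    for pod in pod_list["items"]:
        namespace = pod["metadata"]["namespace"]
        for volume in pod["spec"].get("volumes", []):
            claim = volume.get("persistentVolumeClaim")
            if claim:
                mounted.add((namespace, claim["claimName"]))
    all_claims = {
        (pvc["metadata"]["namespace"], pvc["metadata"]["name"])
        for pvc in pvc_list["items"]
    }
    return sorted(all_claims - mounted)

# Minimal hand-written sample in the shape kubectl emits.
pvcs = {"items": [
    {"metadata": {"namespace": "staging", "name": "db-data"}},
    {"metadata": {"namespace": "staging", "name": "old-loadtest"}},
]}
pods = {"items": [
    {"metadata": {"namespace": "staging", "name": "db-0"},
     "spec": {"volumes": [{"name": "data",
                           "persistentVolumeClaim": {"claimName": "db-data"}}]}},
]}

for namespace, name in unmounted_pvcs(pvcs, pods):
    print(f"{namespace}/{name} is not mounted by any pod")
```

A claim that shows up here is only a candidate for deletion: workloads scaled to zero or CronJobs between runs can own a PVC without a pod existing at audit time, so confirm with the namespace owner first.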


Tooling that makes this sustainable

The five patterns above are not a one-time fix — they require ongoing discipline. The tooling that makes them sustainable in practice:

  • Kubecost or OpenCost — real-time cost visibility broken down by namespace, deployment, and label
  • Goldilocks — VPA-based resource recommendation dashboard
  • Karpenter (AWS) or Cluster Autoscaler — intelligent node provisioning with spot support
  • Prometheus + Grafana — utilisation dashboards with alerting on low-utilisation namespaces
  • kubectl-cost CLI plugin — on-demand cost breakdown from the terminal

The infrastructure is already in most clusters. The missing ingredient is almost always the process: assigned ownership, regular review cadences, and making cost a first-class metric alongside reliability and performance.

If your Kubernetes bill has been growing faster than your traffic, get in touch — we offer infrastructure cost audits that typically pay for themselves within the first month.
