Kubernetes has become the default platform for containerised workloads. It is battle-tested, extensible, and deeply capable. It is also one of the most effective ways to accidentally double your cloud bill if you configure it without a deliberate cost strategy from day one.
Across our cloud engagements, the same five problem areas appear in every over-spend situation. The good news: fixing them is mostly a configuration and process exercise, not a re-architecture one. Where we have applied the fixes systematically, they have reduced clients' monthly infrastructure spend by an average of 40%.
1. Right-sizing resource requests and limits
The most common source of Kubernetes waste we encounter is over-declared resource requests. When developers set CPU and memory requests, they tend to be conservative — sometimes extremely so. A service that uses 150m CPU in production might have requests set to 1000m because "that's what the vendor recommended" or "we didn't want it to be throttled."
The cluster scheduler uses requests, not actual usage, to determine where pods are placed. Over-declared requests cause nodes to appear full when they have significant spare capacity, triggering unnecessary scale-out events. The fix is to measure actual consumption over a representative period and set requests to the 90th percentile of observed usage. kubectl top pods shows only a point-in-time snapshot, so a tool that works from usage history, such as Goldilocks or the Vertical Pod Autoscaler in recommendation mode, is the better source.
Never set requests by guessing. Run your workloads for at least two weeks, collect P90 CPU and memory usage metrics, and use those as your request values. Set limits at 150–200% of requests to allow for burst headroom.
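The arithmetic behind that rule can be sketched as follows. This is a minimal illustration, not the output of any particular tool: the sample values are hypothetical, the nearest-rank percentile method is one of several reasonable definitions, and the function names are our own.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil(pct * n / 100), 1-based
    return ordered[max(rank, 1) - 1]


def recommend(samples, limit_factor=1.5):
    """Return (request, limit): request is P90 of usage, limit is request * limit_factor."""
    request = percentile(samples, 90)
    return request, math.ceil(request * limit_factor)


# Hypothetical CPU usage samples in millicores, including one short burst.
cpu_millicores = [120, 135, 150, 110, 160, 145, 155, 130, 140, 400]
request, limit = recommend(cpu_millicores)
print(f"cpu request: {request}m, cpu limit: {limit}m")
```

Note that the single 400m burst does not drag the request up. It is absorbed by the limit headroom instead, which is the point of sizing requests from P90 rather than from the peak.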
2. Namespace resource quotas and LimitRanges
Without namespace-level governance, individual teams can gradually inflate their resource footprint without any visibility into the cluster-wide cost implications. A ResourceQuota object defines maximum total resource consumption for a namespace. A LimitRange sets default requests and limits for containers that do not declare them explicitly — closing the gap where teams deploy without any resource declarations at all.
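As a rough sketch of what the two objects look like, the snippet below builds them as Python dictionaries and prints JSON that can be piped to kubectl apply -f -. The namespace, object names, and every quantity are placeholder assumptions to be replaced with values from your own measurements.

```python
import json


def resource_quota(namespace, cpu_requests, memory_requests, cpu_limits, memory_limits):
    """Cap the total requests and limits that all pods in a namespace may declare."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "team-quota", "namespace": namespace},  # hypothetical name
        "spec": {
            "hard": {
                "requests.cpu": cpu_requests,
                "requests.memory": memory_requests,
                "limits.cpu": cpu_limits,
                "limits.memory": memory_limits,
            }
        },
    }


def limit_range(namespace):
    """Give containers that declare nothing a default request and limit."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "container-defaults", "namespace": namespace},  # hypothetical name
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    "defaultRequest": {"cpu": "100m", "memory": "128Mi"},  # placeholder values
                    "default": {"cpu": "200m", "memory": "256Mi"},  # default limits, placeholders
                }
            ]
        },
    }


manifests = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        resource_quota("team-a", "10", "20Gi", "20", "40Gi"),
        limit_range("team-a"),
    ],
}
print(json.dumps(manifests, indent=2))
```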
We recommend a quarterly review cycle where namespace owners receive a report of their actual vs. allocated consumption and are asked to justify allocations running below 60% utilisation. This alone has prompted voluntary rightsizing in every organisation where we have introduced it.
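A minimal sketch of the report behind such a review, assuming it flags namespaces whose actual usage is below 60% of what they have requested. The namespace names and figures are invented, and in practice the inputs would come from a cost or metrics tool rather than a hard-coded dictionary.

```python
def utilisation_report(namespaces, threshold=0.60):
    """Return (namespace, utilisation) pairs where used / requested is below threshold.

    `namespaces` maps a namespace name to (used_cpu_cores, requested_cpu_cores).
    """
    flagged = []
    for name, (used, requested) in sorted(namespaces.items()):
        if requested <= 0:
            continue  # nothing allocated, so nothing to justify
        ratio = used / requested
        if ratio < threshold:
            flagged.append((name, round(ratio, 2)))
    return flagged


# Hypothetical quarterly averages: (cores actually used, cores requested).
quarter = {
    "checkout": (3.2, 4.0),
    "search": (1.5, 6.0),
    "reports": (0.9, 2.0),
}

for name, ratio in utilisation_report(quarter):
    print(f"{name}: {ratio:.0%} of requested CPU in use, please review")
```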
3. Horizontal and Vertical Pod Autoscaler, used together correctly
The Horizontal Pod Autoscaler (HPA) scales the number of pod replicas based on observed metrics, typically CPU utilisation or custom metrics from Prometheus. The Vertical Pod Autoscaler (VPA) adjusts the resource requests of individual pods based on historical usage. They are complementary, but using both on the same workload requires care. If both act on CPU, they conflict: HPA measures utilisation as a percentage of the pod's requests, and VPA keeps changing those requests. The recommended pattern is HPA on custom metrics (request rate, queue depth) and VPA on CPU/memory in recommendation mode only.
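One way to express that pairing is sketched below: an HPA driven by a per-pod custom metric, and a VPA whose update mode is "Off" so that it only publishes recommendations. The Deployment name and metric name are hypothetical, and the sketch assumes the VPA components and a custom metrics adapter are already installed in the cluster.

```python
import json


def hpa_on_custom_metric(name, metric_name, target_average, min_replicas=2, max_replicas=10):
    """HPA that scales a Deployment on a per-pod custom metric instead of CPU."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Pods",
                    "pods": {
                        "metric": {"name": metric_name},
                        "target": {"type": "AverageValue", "averageValue": target_average},
                    },
                }
            ],
        },
    }


def vpa_recommendation_only(name):
    """VPA that computes recommendations but never evicts or resizes pods."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "updatePolicy": {"updateMode": "Off"},
        },
    }


# "checkout" and "http_requests_per_second" are illustrative names only.
hpa = hpa_on_custom_metric("checkout", "http_requests_per_second", "100")
vpa = vpa_recommendation_only("checkout")
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [hpa, vpa]}, indent=2))
```

Because the VPA never applies its recommendations, the requests HPA sees stay stable until someone deliberately updates them.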
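For batch workloads or applications with variable traffic, KEDA (Kubernetes Event-Driven Autoscaling) scales on event sources such as queue length rather than on CPU. It can also scale a workload down to zero replicas when there is nothing to process, which the standard HPA does not do by default.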
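The idea behind event-driven scaling can be illustrated with a few lines of arithmetic. This is not KEDA's API or its exact algorithm, only a sketch of sizing a worker pool from queue depth; the function name, the target of 50 messages per replica, and the cap of 20 replicas are all assumptions.

```python
import math


def desired_replicas(queue_length, target_per_replica, max_replicas, min_replicas=0):
    """Size a worker pool from queue depth, clamped to [min_replicas, max_replicas]."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# Hypothetical queue depths for a batch worker that handles ~50 messages per replica.
for depth in (0, 120, 5000):
    print(depth, "messages ->", desired_replicas(depth, 50, 20), "replicas")
```

With an empty queue the pool drops to zero replicas, so an idle batch worker costs nothing until work arrives.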
4. Spot and preemptible instances for non-critical workloads
Spot instances (AWS), preemptible VMs (GCP), and Spot VMs (Azure) offer discounts of 60–90% compared to on-demand pricing. The catch is that the cloud provider can reclaim them at short notice: about 2 minutes on AWS, and as little as 30 seconds on GCP and Azure. The engineering investment required to tolerate spot reclamation is well worth it for the right workloads: stateless web services, batch processing, CI/CD runners, and dev/staging environments.
The key implementation detail is using a mixed node pool strategy: a baseline of on-demand nodes sized to handle minimum traffic, with spot nodes handling the majority of capacity. Tools like Karpenter (AWS) and Cluster Autoscaler with spot support make this straightforward to manage.
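To see what a mixed pool is worth, the blended cost can be worked out directly. The node counts, hourly price, and 70% discount below are made-up inputs for illustration, and the sketch ignores the extra capacity you may want to hold in reserve for reclamation events.

```python
def blended_hourly_cost(total_nodes, on_demand_nodes, on_demand_price, spot_discount):
    """Hourly cost of a pool with an on-demand baseline and the remainder on spot."""
    if not 0 <= on_demand_nodes <= total_nodes:
        raise ValueError("on_demand_nodes must be between 0 and total_nodes")
    if not 0 <= spot_discount < 1:
        raise ValueError("spot_discount must be a fraction in [0, 1)")
    spot_nodes = total_nodes - on_demand_nodes
    spot_price = on_demand_price * (1 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


# Hypothetical pool: 20 nodes at $0.40/hour on demand, 6 kept on demand as the baseline.
all_on_demand = blended_hourly_cost(20, 20, 0.40, 0.70)
mixed = blended_hourly_cost(20, 6, 0.40, 0.70)
saving = 1 - mixed / all_on_demand

print(f"all on-demand: ${all_on_demand:.2f}/h")
print(f"mixed pool:    ${mixed:.2f}/h")
print(f"saving:        {saving:.0%}")
```

Even with 30% of the pool kept on demand, these assumed numbers roughly halve the hourly node cost.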
5. Eliminating idle resources and ghost workloads
In any cluster that has been running for more than six months, there will be workloads that were deployed for a specific purpose and never cleaned up. Staging deployments that outlived their branch. One-off debug pods. Load-testing namespaces. PersistentVolumeClaims attached to nothing. These "ghost workloads" silently consume CPU, memory, and storage every day.
We run a quarterly ghost workload audit on every cluster we manage: identify pods with zero inbound traffic for 30+ days, PVCs not mounted by any pod, and deployments in namespaces with no active development activity. In a medium-sized cluster, this audit typically recovers 8–15% of spend.
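The unmounted-PVC part of that audit is easy to script. The sketch below assumes its inputs are the items arrays from kubectl get pods -A -o json and kubectl get pvc -A -o json. The sample objects are trimmed, invented stand-ins for that output, and the function name is our own.

```python
def unmounted_pvcs(pods, pvcs):
    """Return sorted (namespace, name) pairs for PVCs that no pod references."""
    mounted = set()
    for pod in pods:
        namespace = pod["metadata"]["namespace"]
        for volume in pod.get("spec", {}).get("volumes", []):
            claim = volume.get("persistentVolumeClaim")
            if claim:
                mounted.add((namespace, claim["claimName"]))
    return sorted(
        (pvc["metadata"]["namespace"], pvc["metadata"]["name"])
        for pvc in pvcs
        if (pvc["metadata"]["namespace"], pvc["metadata"]["name"]) not in mounted
    )


# Trimmed, hypothetical stand-ins for the `items` arrays in kubectl's JSON output.
pods = [
    {
        "metadata": {"namespace": "shop", "name": "db-0"},
        "spec": {"volumes": [{"name": "data", "persistentVolumeClaim": {"claimName": "db-data"}}]},
    },
    {"metadata": {"namespace": "shop", "name": "web-1"}, "spec": {"volumes": [{"name": "tmp", "emptyDir": {}}]}},
]
pvcs = [
    {"metadata": {"namespace": "shop", "name": "db-data"}},
    {"metadata": {"namespace": "shop", "name": "old-export"}},
    {"metadata": {"namespace": "loadtest", "name": "results"}},
]

for namespace, name in unmounted_pvcs(pods, pvcs):
    print(f"unmounted PVC: {namespace}/{name}")
```

Treat the result as a review list rather than a delete list, since a claim may belong to a workload that is scaled to zero on purpose.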
Tooling that makes this sustainable
The five patterns above are not a one-time fix — they require ongoing discipline. The tooling that makes them sustainable in practice:
- Kubecost or OpenCost — real-time cost visibility broken down by namespace, deployment, and label
- Goldilocks — VPA-based resource recommendation dashboard
- Karpenter (AWS) or Cluster Autoscaler — intelligent node provisioning with spot support
- Prometheus + Grafana — utilisation dashboards with alerting on low-utilisation namespaces
- kubectl-cost CLI plugin — on-demand cost breakdown from the terminal
Much of this tooling is already running in most clusters. The missing ingredient is almost always the process: assigned ownership, regular review cadences, and making cost a first-class metric alongside reliability and performance.
If your Kubernetes bill has been growing faster than your traffic, get in touch — we offer infrastructure cost audits that typically pay for themselves within the first month.