How to reduce the cost of your GKE clusters — and other GCP services…

Stephane Karagulmez
6 min read · Jul 15, 2020

In this article I am going to share cost optimization recommendations for GKE. The list is not exhaustive, but it will definitely help you spend less money!

Keep in mind that your goal should be to maximize business value while optimizing cost, making the most effective and efficient use of GCP services.

As always in my articles, the point is to give you the right pointers into the GCP documentation, not to extract every useful detail from it. The GCP documentation is full of useful information, and you should absolutely read it.

Optimize your environment

Before talking about GKE you can start by optimizing your environment.

Most of the time you have many other GCP services revolving around GKE. For many of these services, Google Cloud offers guides with cost optimization best practices.

Tons of good information in this blog post: https://cloud.google.com/blog/products/gcp/best-practices-for-optimizing-your-cloud-costs

Export your billing data to find the best optimization solution

GCP offers a dashboard in the billing section where you can find a lot of useful information.

But sometimes you are going to need more insights.

For example, what if you want to find which BigQuery queries cost you the most?

In order to do this you have to enable billing export: https://cloud.google.com/billing/docs/how-to/export-data-bigquery

Then you can use a lot of interesting Data Studio dashboards on top of the datasets it creates: https://cloud.google.com/billing/docs/how-to/visualize-data

Another way to use the exported data is to query it directly in BigQuery. Here are some examples of useful queries: https://github.com/GoogleCloudPlatform/bigquery-utils/tree/master/scripts/billing
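As a sketch of what those queries look like, here is one that surfaces your most expensive services for the current month (the table name `my-project.billing.gcp_billing_export_v1_XXXXXX` is a placeholder for your own export table):

```shell
# Find the ten most expensive services this month in the billing export.
# Replace the table name with your own billing export table.
bq query --use_legacy_sql=false '
SELECT
  service.description AS service_name,
  ROUND(SUM(cost), 2) AS total_cost
FROM `my-project.billing.gcp_billing_export_v1_XXXXXX`
WHERE usage_start_time >= TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), MONTH)
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10'
```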

Regarding GKE, GCP offers a specific Data Studio dashboard; the overall solution is called GKE usage metering: https://cloud.google.com/blog/products/containers-kubernetes/use-gke-usage-metering-to-combat-over-provisioning
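Usage metering is enabled per cluster; a minimal sketch, with cluster, zone and dataset names as placeholders:

```shell
# Enable GKE usage metering on an existing cluster, writing usage data
# to a BigQuery dataset that must already exist in the same project.
gcloud container clusters update my-cluster \
    --zone us-central1-a \
    --resource-usage-bigquery-dataset=cluster_usage_metering
```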

Set up budgets, labels and alerts

Good cost monitoring is itself a cost optimization solution, because you can only have a real impact if you know what costs you the most.

You don’t want to say: “This month was cheaper than last month”. You want to say: “I reduced the cost of this instance by 50% because it was the most expensive of my instances”.

Start by adding labels to your resources; labels provide a convenient way for developers and administrators to organize resources at scale.

https://cloud.google.com/blog/products/gcp/use-labels-to-gain-visibility-into-gcp-resource-usage-and-spending?hl=nl
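For example, labeling an instance by team and environment can be done from the CLI (instance, zone and label values below are placeholders):

```shell
# Attach labels to an existing Compute Engine instance so its cost
# can later be broken down by team and environment in the billing data.
gcloud compute instances add-labels my-instance \
    --zone us-central1-a \
    --labels=team=payments,env=prod
```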

Once your resources are labeled, you can create alerts based on budget consumption.

I recommend using the Budget API instead of doing this by hand.

https://cloud.google.com/blog/products/management-tools/monitor-cloud-costs-and-create-budgets-at-scale

If you prefer working with the interface: https://cloud.google.com/billing/docs/how-to/budgets#create-budget
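As a sketch of the CLI route (at the time of writing the command lives under `gcloud beta`; the billing account ID, name and amounts are placeholders):

```shell
# Create a budget of 1000 USD with alerts at 50% and 90% of the amount.
gcloud beta billing budgets create \
    --billing-account=XXXXXX-XXXXXX-XXXXXX \
    --display-name="gke-monthly-budget" \
    --budget-amount=1000USD \
    --threshold-rule=percent=0.5 \
    --threshold-rule=percent=0.9
```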

Committed use discounts

Committed use discounts (CUDs) are spend-based or resource-based discounts, depending on the Google Cloud service offering the discounts. When you purchase CUDs, you receive discounted prices in exchange for your commitment to use either a minimum level of resources or spend a minimum amount for a specified term of one or three years. Upon purchase, you are billed a monthly fee for the duration of the term you selected, whether or not you use the services.

CUDs offer a fast and easy way to reduce your cloud spend. To figure out how much you should commit, GCP offers a set of dashboards you can use: https://cloud.google.com/billing/docs/how-to/cud-analysis-resource-based

These dashboards will help you understand your daily CPU and RAM usage, as well as the percentage of your commitment you are actually using.

It is important to understand that you can purchase as many CUDs as you want; they stack with each other.

With this in mind, you don’t have to commit the full amount of CPU and RAM in a single CUD.

The recommended way is to crawl, walk and run: start with a small commitment and take the time to understand the dashboards. When you feel ready, you can extend the reservation with another CUD.
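A small first commitment can be created from the CLI; a sketch, with the region and sizes as placeholders you should adapt to what the dashboards show:

```shell
# Commit to 4 vCPUs and 16 GB of RAM in us-central1 for one year.
gcloud compute commitments create my-first-commitment \
    --region us-central1 \
    --plan 12-month \
    --resources vcpu=4,memory=16GB
```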

If you enabled the BigQuery billing export, you can also find queries to help you monitor your CUD usage here: https://github.com/GoogleCloudPlatform/bigquery-utils/tree/master/scripts/billing

Preemptible instances

One of the best ways to reduce your spend within GKE is to use preemptible instances.

Make sure you fully understand the limitations of these instances before migrating your production workloads:

Preemptible instances function like normal instances but have the following limitations:

  • Compute Engine might terminate preemptible instances at any time due to system events. The probability that Compute Engine will terminate a preemptible instance for a system event is generally low, but might vary from day to day and from zone to zone depending on current conditions.
  • Compute Engine always terminates preemptible instances after they run for 24 hours. Certain actions reset this 24-hour counter.
  • Preemptible instances are finite Compute Engine resources, so they might not always be available.
  • Preemptible instances can’t live migrate to a regular VM instance, or be set to automatically restart when there is a maintenance event.
  • Due to the above limitations, preemptible instances are not covered by any Service Level Agreement (and, for clarity, are excluded from the Compute Engine SLA).
  • The Google Cloud Free Tier credits for Compute Engine do not apply to preemptible instances.

Most of the time you want to use both standard and preemptible instances in your cluster. You can create two node pools and use node affinity to control how pods are scheduled across them.
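A minimal sketch of this setup (cluster, pool and machine-type names are placeholders); GKE labels preemptible nodes with `cloud.google.com/gke-preemptible=true`, which pods can target through node affinity:

```shell
# Add a preemptible node pool next to the existing standard pool.
gcloud container node-pools create preemptible-pool \
    --cluster my-cluster \
    --zone us-central1-a \
    --machine-type n1-standard-4 \
    --preemptible

# Pods that tolerate interruptions can then prefer preemptible nodes
# through node affinity in their spec, for example:
#   affinity:
#     nodeAffinity:
#       preferredDuringSchedulingIgnoredDuringExecution:
#       - weight: 100
#         preference:
#           matchExpressions:
#           - key: cloud.google.com/gke-preemptible
#             operator: In
#             values: ["true"]
```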

If you haven’t read it yet, I recommend this article to learn more about how to get ready for preemptible usage: GKE checklist for production

There is also a really good GKE add-on you should look at, called ballast.

The idea is to randomly kill preemptible VMs so that your whole node pool doesn’t shut down at once after 24 hours.

Use second-generation or AMD instances

Based on my experience, the following is something that most people tend to overlook.

GCP offers multiple sizes of instances (medium, large, etc.) but also multiple generations (E2, N1, N2, M1). You should always check whether a newer generation can help you reduce the cost of your cluster.

As an example, if you are running web serving applications on GKE, did you know that N2 is cheaper and offers better performance than N1?

Read the documentation about machine types: https://cloud.google.com/compute/docs/machine-types#recommendations_for_machine_types

Blog post about N2D: https://cloud.google.com/blog/products/compute/announcing-the-n2d-vm-family-based-on-amd
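To compare generations side by side, you can list their specs per zone; a sketch, with the zone and name filter as illustrative values:

```shell
# List the standard 4-vCPU machine types of each generation in one zone.
gcloud compute machine-types list \
    --filter="zone:us-central1-a AND name~'^(n1|n2|e2)-standard-4$'" \
    --format="table(name, guestCpus, memoryMb)"
```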

Move some workloads to Cloud Run

I would highly recommend this article: https://medium.com/@hariharananantharaman/migrating-from-gke-to-cloudrun-f713b2514d2

The idea is not to move everything out of GKE. You should ask yourself the following question: what can I move out of GKE to help the scheduler do a better job?

A lot of your workloads don’t require high scalability, nor do they require daily updates and canary releases. For such workloads, Cloud Run can be a good fit.

Cloud Run offers scaling down to zero, auto-scaling, traffic splitting for A/B testing, and integration with GCP products such as Cloud SQL; it is built on Knative, so workloads stay portable to Kubernetes.
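Deploying a containerized service there is essentially a one-liner; a sketch with placeholder service, image and region names:

```shell
# Deploy an existing container image to fully managed Cloud Run.
gcloud run deploy my-service \
    --image gcr.io/my-project/my-app \
    --platform managed \
    --region us-central1 \
    --allow-unauthenticated
```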

Node auto provisioning

Most GKE users are aware of the cluster autoscaler. When demand is high, the cluster autoscaler adds nodes to the node pool. When demand is low, it scales back down to a minimum size that you designate.

Without node auto-provisioning, GKE considers starting new nodes only from the set of user-created node pools. With node auto-provisioning, new node pools can be created and deleted automatically.

This can have a huge impact on your spend. You no longer have to size the node pools yourself; you can let GKE choose the best node-pool configuration for you.

Currently, node auto-provisioning only considers creating node pools with N1 machine types of up to 64 vCPUs.

Node auto-provisioning supports creating node pools based on preemptible virtual machine (VM) instances.
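Enabling it on an existing cluster looks roughly like this (cluster name, zone and resource limits are placeholders you should set according to your budget):

```shell
# Turn on node auto-provisioning with cluster-wide resource limits.
gcloud container clusters update my-cluster \
    --zone us-central1-a \
    --enable-autoprovisioning \
    --min-cpu 1 --max-cpu 64 \
    --min-memory 1 --max-memory 256
```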

More information in the documentation: https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-provisioning

Conclusion

This article is not exhaustive and there are as many ways to reduce cost as there are GCP users. If you have any GKE best practices to reduce cost that you want to share, feel free to comment.

Stephane Karagulmez

I write for the pleasure of it. I am passionate about cloud and Kubernetes. If you want to discuss these topics: by DM or live on Twitch