Watching the pennies while designing for High Availability

I was a little obsessed with “The queue”; being a Brit, it was hard not to be. The queue had to be paused a few times, and pleas were made for people not to join it. The pausing of the queue got me thinking about high availability (I had been thinking about this for my day job too).

But my thoughts didn’t stop there. I began thinking of the current world economic situation and how it has led to folks being more cost-conscious than ever before.

Which led to this post, where I discuss how you can optimise the cost of your cloud deployment while still achieving your SLOs.

Things fail all the time, but you can architect your application to mitigate interruptions.

I have placed the attributes that you need to consider when designing for high availability while the purse strings are tight into four categories:

  • Implementation of foundational archetypes
  • Autoscaling
  • Go smaller/cheaper
  • Implementing guardrails

And although I am writing about the cloud I am most used to, Google Cloud, the categories can be applied to the cloud of your choosing. You just need to replace the products with the appropriate substitutes from that cloud provider.

Foundational Archetypes

When designing for high availability, there are essentially two paradigms to take into consideration, and they should be used together:

  • Designing for availability, which at a low level focuses on redundancy, fault detection and automated correction to help you meet your SLOs.
  • DR planning, which focuses on recovery actions in the event of adverse conditions.

You will see that I refer to techniques in this post that will fall into one of these paradigms. They are related but orthogonal.

Define your SLOs. Use them to figure out what can be achieved with an active HA architecture and what needs to be taken into consideration for your disaster recovery plan.

Break your application into parts so you can apply the appropriate SLOs to the various parts of your application. The most stringent SLOs are not necessarily required for all of your application.
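
To make that concrete, an availability SLO translates directly into a downtime budget. Here is a minimal sketch, assuming a 30-day month; the component names and SLO values are hypothetical and only there to illustrate how different parts of an application can carry different budgets.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes (assumes a 30-day month)


def downtime_budget_minutes(slo_percent: float) -> float:
    """Return the minutes of downtime an availability SLO allows in a 30-day month."""
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in the range (0, 100]")
    return MINUTES_PER_30_DAY_MONTH * (1 - slo_percent / 100)


# Hypothetical application parts with different SLOs (illustrative values only).
component_slos = {
    "checkout-api": 99.95,
    "product-search": 99.9,
    "reporting-batch": 99.0,
}

for name, slo in component_slos.items():
    budget = downtime_budget_minutes(slo)
    print(f"{name}: {slo}% allows about {budget:.1f} minutes of downtime per month")
```

The looser the SLO, the larger the budget, which is usually where the cheaper architecture options become viable.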

Choose appropriate DR patterns (cold, warm or hot) for the relevant areas of your application that will help you meet your RTO & RPO values, your SLOs and your budgetary constraints.

If your availability requirements call for it, build out a highly distributed application across 2 or more regions. This will probably have higher costs, as you may have to double up on resources and use some premium services such as the Premium Network Service Tier.
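
A rough way to reason about that trade-off is to compare the availability you gain with the resources you duplicate. This is a back-of-the-envelope sketch, not a prediction: it assumes regional failures are independent, that the application is up as long as any one region is up, and that each region runs a full copy of the stack. The 99.9% single-region figure is a made-up input.

```python
def combined_availability(single_region_availability: float, regions: int) -> float:
    """Availability of N redundant regions, assuming independent failures."""
    if not 0 <= single_region_availability <= 1:
        raise ValueError("single_region_availability must be between 0 and 1")
    if regions < 1:
        raise ValueError("regions must be at least 1")
    # The service is down only if every region is down at the same time.
    return 1 - (1 - single_region_availability) ** regions


# Hypothetical single-region availability of 99.9%.
for n in (1, 2, 3):
    availability = combined_availability(0.999, n)
    print(f"{n} region(s): {availability:.6%} available, roughly {n}x the resources")
```

Real-world failures are rarely fully independent, so treat the result as an upper bound rather than a promise.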

A highly available distributed architecture may also add complexity, which brings an operational overhead.

Based on your availability and RTO/RPO requirements, it may be necessary to duplicate data into a separate replica of the database. This is arguably the simplest way to ensure you can recover access to your data in the event of a disaster with the smallest RTO, but if you have petabytes of data, the cost may make this technique prohibitive.

This may seem a strange one, but you can manage costs while improving resilience by reducing your operational overhead. This boils down to introducing automation techniques to configure and maintain your cloud environment, or automating responses to typical events.

Learn an Infrastructure as Code tool so you can create templates that define the configuration of your cloud infrastructure. You can use the templates to create a replica of your infrastructure if your production environment becomes unavailable.

Terraform is a popular choice.

Adopt event-driven architecture techniques. Eventarc allows you to build event-driven architectures without having to implement, customize, or maintain the underlying infrastructure, reducing the operational overhead even further.

Adopt managed services. Using managed services allows you to minimize the operational overhead of managing services yourself.

Next, you need to build upon these foundational archetypes and adopt a set of architectural techniques or patterns that fit your workloads, helping you get close to your desired RTO, RPO and SLO values without incurring more costs than you can afford.


Autoscaling

There needs to be a balance between managing costs and providing a service that can scale up to meet demand when needed and then scale back down, releasing resources during lower demand periods.

Implement autoscaling with a sensible and affordable maximum on how far you can scale up. You may also want to set a minimum that is not 0 so that your application is ready to respond to requests even during low demand periods.
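
The idea of a floor and a ceiling can be sketched in a few lines. This is not how any particular autoscaler is implemented; it is a simplified illustration, and the function name, the load figures and the per-replica target are all assumptions of mine.

```python
import math


def desired_replicas(
    current_load: float,
    target_load_per_replica: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Replicas needed for the load, clamped to an affordable min/max range."""
    if target_load_per_replica <= 0:
        raise ValueError("target_load_per_replica must be positive")
    if not 0 <= min_replicas <= max_replicas:
        raise ValueError("need 0 <= min_replicas <= max_replicas")
    needed = math.ceil(current_load / target_load_per_replica)
    # The minimum keeps capacity warm; the maximum caps the bill.
    return max(min_replicas, min(needed, max_replicas))


# Hypothetical numbers: each replica handles 100 requests/s, floor of 2, cap of 10.
for load in (0, 450, 5000):
    print(f"load={load} req/s -> {desired_replicas(load, 100, 2, 10)} replicas")
```

Note what happens at 5,000 requests per second: the cap wins, so you are trading some availability under extreme load for a predictable cost.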

Each product that can be autoscaled has a way to set the maximum and minimum value, and you can use Cloud Monitoring to keep track of your scaling levels and even alert on unexpected rapid scaling events.

I’m only going to call out services that have an autoscaler.

Compute Engine

Managed instance groups (MIGs) have autoscaling capabilities that let you automatically add or delete virtual machine (VM) instances from a MIG based on increases or decreases in load. Whatever criteria you use to trigger an autoscaling action in the policy, you have to set the maxNumReplicas parameter, which caps the number of VMs that the autoscaler will allow.
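
As an illustration, here is a small helper that builds an autoscaling policy as a plain dictionary and refuses to produce one without a sensible cap. The field names follow the Compute Engine autoscaler resource as I understand it (minNumReplicas, maxNumReplicas, coolDownPeriodSec, cpuUtilization.utilizationTarget); check them against the current API reference before relying on them. The helper itself and its values are hypothetical, and it does not call any API.

```python
def build_autoscaling_policy(
    min_replicas: int,
    max_replicas: int,
    cpu_target: float,
    cooldown_sec: int = 60,
) -> dict:
    """Build an autoscaling policy dict with an explicit cap on VM count."""
    if min_replicas < 0 or max_replicas < max(min_replicas, 1):
        raise ValueError("max_replicas must be >= 1 and >= min_replicas")
    if not 0 < cpu_target <= 1:
        raise ValueError("cpu_target must be in the range (0, 1]")
    return {
        "minNumReplicas": min_replicas,
        "maxNumReplicas": max_replicas,  # the cost ceiling
        "coolDownPeriodSec": cooldown_sec,
        "cpuUtilization": {"utilizationTarget": cpu_target},
    }


# Hypothetical values: keep 2 VMs warm, never exceed 10, aim for 60% CPU.
policy = build_autoscaling_policy(min_replicas=2, max_replicas=10, cpu_target=0.6)
print(policy)
```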

You can turn off an autoscaler to temporarily prevent it from scaling your MIG.

Google Kubernetes Engine

You can automatically resize your Standard Google Kubernetes Engine (GKE) cluster’s node pools based on the demands of your workload by using the cluster autoscaler, which allows you to specify the minimum and maximum size for each node pool in your cluster.

Data services

It’s not only compute that you can autoscale; some of Google Cloud’s managed database services also have autoscaling features:

The Autoscaler tool for Cloud Spanner is an open source companion tool that lets you scale the compute capacity (nodes or processing units) of your instances. As with the compute scaling features, you set minimum and maximum values for the scaling method you select.

Autoscaling Bigtable can help you optimize costs because Bigtable reduces the number of nodes in your cluster whenever possible. This can help you avoid over-provisioning. You can set a minimum and maximum number of nodes. When you configure autoscaling for Bigtable, you need to be aware that if a cluster has scaled up to its maximum number of nodes and the CPU utilization target is exceeded, requests might have high latency or fail. If a cluster has scaled up to its maximum number of nodes and the storage utilization limit is exceeded, write requests will fail.

Go Smaller/Cheaper

Using smaller versions of products and thinking in terms of microservices are ways to help manage costs yet still achieve your availability goals. Referring back to autoscaling, each microservice can be independently scaled, which depending on your architecture may be more efficient than scaling each instance of a monolith.


Google Cloud has GCE machine types that vary in CPU and memory, and you can configure the storage you need, so you can pick the right machine type for your workload. You can create smaller GKE clusters, or use GKE Autopilot to rightsize your clusters to the demands of your workloads.

Spot VMs are virtual machine (VM) instances that make use of excess Compute Engine (GCE) capacity. Spot VMs have significant discounts, but GCE might preemptively stop or delete Spot VMs when it needs to reclaim that capacity.

The preemptive nature of Spot VMs means that you need to plan carefully how you incorporate them into your workload. If your workloads are stateless and/or fault-tolerant and can withstand possible VM preemption, Spot VMs can reduce your GCE costs significantly. In addition to vanilla GCE, you can use Spot VMs with GKE and with GCE-based products like Dataproc.

A common practice is to use a mix of standard and Spot VMs so that if the Spot VMs are preempted, you do not lose all of your capacity.
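
A quick sketch of that trade-off: for a given mix, what does the fleet cost per hour, and how much capacity is left if every Spot VM is reclaimed at once? The hourly price and the discount below are placeholders, not real Google Cloud prices, and the function assumes all VMs are the same size.

```python
def fleet_summary(
    standard_vms: int,
    spot_vms: int,
    standard_hourly_price: float,
    spot_discount: float,
) -> tuple[float, float]:
    """Return (hourly cost, fraction of capacity left if all Spot VMs are preempted)."""
    total = standard_vms + spot_vms
    if standard_vms < 0 or spot_vms < 0 or total == 0:
        raise ValueError("need a non-empty fleet with non-negative VM counts")
    if not 0 <= spot_discount < 1:
        raise ValueError("spot_discount must be in the range [0, 1)")
    spot_hourly_price = standard_hourly_price * (1 - spot_discount)
    hourly_cost = standard_vms * standard_hourly_price + spot_vms * spot_hourly_price
    surviving_fraction = standard_vms / total
    return hourly_cost, surviving_fraction


# Placeholder inputs: 4 standard + 6 Spot VMs, $0.10/hour standard, 60% Spot discount.
cost, surviving = fleet_summary(4, 6, 0.10, 0.60)
print(f"hourly cost: ${cost:.2f}, capacity left after full preemption: {surviving:.0%}")
```

Sizing the standard portion so that the surviving fraction can still carry your minimum acceptable load is one way to tie the mix back to your SLOs.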

But why not go further and see how small you really can go? Do you really need a whole VM, or can you use Cloud Functions or Cloud Run instead?

All customers who use Cloud Run can take advantage of the monthly free tier.

If you use Cloud Functions, the first 2 million invocations per month are free.
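
To see what that free allowance means for a bill, here is a small sketch. It only covers the invocation charge (compute time and networking are billed separately), and the price per million invocations is a placeholder you would replace with the figure from the current pricing page.

```python
FREE_INVOCATIONS_PER_MONTH = 2_000_000  # the free allowance mentioned above


def billable_invocations(monthly_invocations: int) -> int:
    """Invocations that fall outside the monthly free allowance."""
    if monthly_invocations < 0:
        raise ValueError("monthly_invocations cannot be negative")
    return max(0, monthly_invocations - FREE_INVOCATIONS_PER_MONTH)


def invocation_charge(monthly_invocations: int, price_per_million: float) -> float:
    """Invocation charge only; price_per_million is a placeholder, not a quoted price."""
    return billable_invocations(monthly_invocations) / 1_000_000 * price_per_million


for invocations in (1_500_000, 5_000_000):
    charge = invocation_charge(invocations, price_per_million=0.40)
    print(f"{invocations:,} invocations -> invocation charge of about ${charge:.2f}")
```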

Data services

Cloud Spanner has granular instance sizing, which provides a way to use Cloud Spanner at a much lower cost while still designing to meet your availability goals - you get proportional resources for a proportional price.
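
Here is what "proportional" looks like as arithmetic. The sketch assumes 1,000 processing units equal one node, that sizes below one node come in steps of 100 processing units, and that larger sizes come in whole nodes; that is my understanding of the sizing rules, so verify it against the Spanner docs. The hourly node price is a placeholder.

```python
PROCESSING_UNITS_PER_NODE = 1000  # assumed: 1 node = 1,000 processing units


def is_valid_size(processing_units: int) -> bool:
    """Assumed sizing rule: steps of 100 below one node, whole nodes above."""
    if processing_units < 100:
        return False
    if processing_units < PROCESSING_UNITS_PER_NODE:
        return processing_units % 100 == 0
    return processing_units % PROCESSING_UNITS_PER_NODE == 0


def hourly_compute_cost(processing_units: int, node_hourly_price: float) -> float:
    """Compute cost scales linearly with the fraction of a node you provision."""
    if not is_valid_size(processing_units):
        raise ValueError(f"{processing_units} is not a valid instance size")
    return processing_units / PROCESSING_UNITS_PER_NODE * node_hourly_price


# Placeholder price of $1.00 per node-hour, purely to show the proportions.
for size in (100, 500, 1000, 2000):
    print(f"{size} processing units -> ${hourly_compute_cost(size, 1.00):.2f}/hour")
```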

Bigtable can be configured to optimise costs by looking at various parameters: HDD versus SSD storage, replication versus just backups, the number of nodes, and where it is located relative to the application that accesses it (to minimize network egress costs). The Bigtable pricing docs are very comprehensive, with examples showing how to calculate costs.

Firestore has a pay-per-use model with a free tier, where you only pay for the storage used and the read/write/delete requests you make. This is a great choice for small or rarely used databases.

Network tiers

For network configuration, you can choose either the Premium or the Standard Network Service Tier. Standard Tier offers a lower-cost alternative for the following use cases:

  • You have applications that are not latency or performance sensitive.
  • You’re deploying VM instances or using Cloud Storage that can all be within a single region.

Cloud Load Balancing

If you are trying to minimise the use of premium networking services, choose a load balancer type that is compatible with Standard Tier. Understand the trade-offs you are making in terms of resilience if you choose a regional load balancer. The Network Service Tiers decision tree can help you design an optimal network. You can mix and match, using Standard Tier for some resources and Premium Tier for others.

Implementing guardrails

Org Policies

As a cost-saving measure, you can enforce the use of single regions by setting an organization policy that restricts resource locations. By restricting resources to a single region you can reduce costs and still have the ability to design across zones. Maybe a single zone is good enough, though, say for test and dev environments?

Managing Quotas

Setting a low quota helps mitigate unintentional high cost from misconfigured resources. You can achieve this by capping your API requests. A better approach is to implement a quota override.

Cloud Armor

Configure Cloud Armor to block requests using rate limiting rules so they do not transit through the load balancer; this effectively reduces the amount of outbound data processed by the load balancer. Configuring rate limiting can also help to increase resilience, as well as avoid excessive costs incurred as a result of accidental or malicious spikes in traffic.
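
If rate limiting is new to you, the underlying idea can be sketched as a fixed-window counter per client. To be clear, this is not Cloud Armor's implementation or its configuration syntax; it is a simplified illustration of "allow N requests per client per interval", and the class name and thresholds are my own.

```python
class FixedWindowRateLimiter:
    """Allow at most max_requests per client within each fixed time window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        if max_requests < 1 or window_seconds <= 0:
            raise ValueError("max_requests must be >= 1 and window_seconds > 0")
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client key -> (window index, requests counted in that window)
        self._state: dict[str, tuple[int, int]] = {}

    def allow(self, client_key: str, now: float) -> bool:
        """Return True if the request is within the limit, False if it should be blocked."""
        window = int(now // self.window_seconds)
        seen_window, count = self._state.get(client_key, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so the counter resets
        if count >= self.max_requests:
            self._state[client_key] = (window, count)
            return False
        self._state[client_key] = (window, count + 1)
        return True


# Hypothetical threshold: 2 requests per client per 60 seconds.
limiter = FixedWindowRateLimiter(max_requests=2, window_seconds=60)
for second in (0, 1, 2, 61):
    print(f"t={second}s client-a allowed: {limiter.allow('client-a', now=second)}")
```

Passing the timestamp in explicitly keeps the example deterministic; in a real service you would read it from a clock.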