Kubernetes: One Cluster or Many?

February 8, 2019 Cornelia Davis

Choices, choices. One of the many strengths of Kubernetes is just how much flexibility you have when deploying and operating the orchestrator for your containerized workloads. You get to choose the number of nodes, pods, containers per cluster and a host of other parameters to fit your needs. But, as always, flexibility comes with responsibility—especially when Kubernetes will be running in production and serving internal and external business customers.

Kubernetes By The Book

If you look at the Kubernetes documentation you’ll find some advice:

“The selection of the number of Kubernetes clusters may be a relatively static choice, only revisited occasionally. By contrast, the number of nodes in a cluster and the number of pods in a service may change frequently according to load and growth.”

and

“At v1.12, Kubernetes supports clusters with up to 5000 nodes. More specifically, we support configurations that meet all of the following criteria:

  • No more than 5000 nodes

  • No more than 150000 total pods

  • No more than 300000 total containers

  • No more than 100 pods per node”

But this guidance assumes that a decision has already been made for you, that your Kubernetes capacity will be made available as a single, or very few, large Kubernetes clusters—5000 nodes is big! But what if that wasn’t predetermined for you? What if you could choose whether you wanted fewer larger Kubernetes clusters or a larger number of smaller ones? What’s the guidance then?

One Cluster or Many? Real World Experiences

Initially the above guidance—which is biased toward fewer, larger clusters—was all we had, but production user experiences have begun to influence the industry’s thinking. Let’s look at two cases in particular.

Monzo Bank is a digital bank that has grown rapidly since its founding in 2015. They provide a superior mobile experience to their users through constant innovation, rolling out new features and making changes based on user feedback. Their technology stack enables this highly responsive, agile way of working, and at its core is Kubernetes. Of the many things they have spoken about publicly—and transparency is one of their core values—a particularly interesting one came in their keynote at KubeCon 2018, where they described in great detail an outage of their production Kubernetes cluster. The outage came from an incompatibility between Kubernetes and Linkerd, another component in their stack, and the bug only surfaced because an otherwise innocuous condition happened to exist in the cluster. In the end it resulted in almost one and a half hours of Kubernetes downtime, and with that, every app running in the cluster was down.

Zalando, an online fashion platform, took a different approach to cluster configuration with their Kubernetes implementation. While both Monzo and Zalando are doing very clever things with Kubernetes, there is one striking difference in their approaches: Zalando runs close to 100 Kubernetes clusters to Monzo’s single cluster. In a video from the same 2018 KubeCon event as Monzo’s keynote, Mikkel Larsen, a software engineer at Zalando, describes their multi-cluster approach, citing team autonomy and reliability as two of its advantages. Managing clusters at this scale requires additional tooling, and that is what the bulk of their presentation covers.


Separate Your Concerns with Multiple Clusters

Operating multiple clusters helps separate workloads and tenants, enabling higher availability, providing greater levels of isolation between tenants, and customizing maintenance lifecycles to suit individual workloads.

Improved Availability

As the Monzo Bank example demonstrates, an issue with the correct functioning of a cluster likely has a global impact on every workload and tenant running in that cluster. By giving each workload its own cluster, you restrict the blast radius of cluster issues to individual workloads, while the other clusters continue to function normally.

Better Isolation

Kubernetes has the namespace abstraction to help enable multiple workloads to operate within the same Kubernetes cluster, but it’s effectively a ‘soft’ multi-tenancy model. There are multiple cluster components that are shared across all tenants within a cluster, regardless of namespace. These shared components include the master components such as the API server, controller manager, scheduler, and DNS, as well as worker components such as the Kubelet and Kube Proxy. Sharing these non-namespace-aware components between tenants necessarily exposes tenant resources to all other tenants in the cluster or on the same worker node—that’s why it’s called soft multi-tenancy. For example, tenant 1 can see the services that tenant 2 has published into the shared cluster DNS.
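To make that last point concrete, here is a minimal sketch of the shared-DNS behavior; the tenant and service names are hypothetical, and it assumes a cluster running standard cluster DNS (e.g. CoreDNS):

```yaml
# Hypothetical two-tenant layout; all names are illustrative.
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-2
---
# A service tenant 2 publishes within its own namespace.
apiVersion: v1
kind: Service
metadata:
  name: payments
  namespace: tenant-2
spec:
  selector:
    app: payments
  ports:
    - port: 80
# Because cluster DNS is shared across namespaces, a pod in any
# namespace, including tenant 1's, can resolve this service by its
# cluster-wide name:
#   payments.tenant-2.svc.cluster.local
```

Namespaces scope names and resource quotas, but they do not hide this DNS entry (or the underlying network path) from other tenants—hence “soft” multi-tenancy.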

Customized Operational Lifecycle

Putting tenant isolation concerns to one side, another consideration is whether all tenants should follow the same operational lifecycle imposed by a shared Kubernetes cluster. That lifecycle includes the shared IaaS resources (for example, whether nodes offer GPU support), the Kubernetes version, upgrade cycles, CVE patching, and outages. While you may choose to synchronize lifecycles across clusters, there will likely be occasions where you need to make exceptions. Multi-cluster makes this possible.

PKS - A Vending Machine for Kubernetes Clusters

Okay, so maybe you’re now thinking that multiple clusters are a good idea, but you’re also wondering how you’ll possibly manage dozens or hundreds of clusters with your small team. If so, I’ve got you!

Pivotal Container Service (PKS) is designed from the ground up to make operating a dynamic multi-cluster deployment a breeze. With it you can quickly and easily create and destroy clusters on demand from a set of templates. A critical OS or Kubernetes vulnerability is made public? Roll out the patch to all of your clusters through a single workflow. Have teams that are constantly calling for the latest and greatest version of Kubernetes as new versions arrive roughly every three months? Either delegate control to cluster owners or centrally manage the upgrade schedule for subsets of clusters. Just survived Black Friday and want to reclaim some resources? Apply scaling policies uniformly across your Kubernetes ecosystem.
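As a rough sketch of what that cluster vending machine looks like from an operator’s seat, the PKS CLI exposes cluster lifecycle operations as single commands. The cluster name, hostname, plan, and node count below are hypothetical, and exact commands and flags vary by PKS version:

```shell
# Illustrative PKS CLI session; names and values are placeholders.

# Create a cluster on demand from a predefined plan (template).
pks create-cluster team-a --external-hostname team-a.example.com --plan small

# List the clusters currently under management.
pks clusters

# Reclaim resources after a traffic spike by resizing the cluster.
pks resize team-a --num-nodes 3

# Tear the cluster down when it is no longer needed.
pks delete-cluster team-a
```

The point is that each cluster is cheap to create, resize, and destroy, which is what makes a many-small-clusters topology operationally practical.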

By making multi-cluster Kubernetes easy, PKS addresses the challenges of cluster availability, soft multi-tenancy and a shared operational lifecycle associated with operating a few large clusters. In the following video I dig into the details.

For an independent perspective on PKS and other Kubernetes solutions, Gartner’s Market Guide for Container Management Software outlines capabilities, use cases, and representative vendors to help you navigate this dynamic landscape. Download the guide for free.

To get started with PKS and learn more, visit https://pivotal.io/pks


More on this topic

Webinar: 6 Things You Need to Know to Safely Run Kubernetes

Webinar: Cloud-Native Operations with Kubernetes and CI/CD

About the Author

Cornelia Davis

Cornelia Davis is Vice President of Technology for Pivotal.
