The CIO's guide to Kubernetes

February 20, 2019

What is Kubernetes?

In a nutshell, Kubernetes is a system for managing application containers. Many people refer to what Kubernetes and similar technologies do using a slightly more technical term: container orchestration. Kubernetes was originally created at Google as a broadly accessible implementation (although by no means a clone) of its Borg and Omega cluster-management systems, and was released as an open source project in 2014.
 
Since then, people have launched untold numbers of Kubernetes clusters, and the project’s contributor base counts in the thousands. Google is still the leading corporate contributor, although hundreds of software vendors from around the world, from startups to giants like Microsoft, now contribute as well. The large, active community has helped Kubernetes add features at a remarkable rate, and although it’s still relatively unpolished from a user-experience perspective, many end-user companies (including large enterprises) are experimenting with Kubernetes and even running it in production.
 
Kubernetes is one of an ecosystem of related open source projects managed by the Cloud Native Computing Foundation (CNCF), a Linux Foundation organization, that’s focused on microservices and other distributed computing technologies, and that has dozens of member companies. If you can think of a relevant software company or cloud provider (or, increasingly, end-user company), the chances are it’s a CNCF member.
 
The question of why Kubernetes is so popular is best answered as two separate questions, with the first being “Why did container orchestration become a thing?” The answer to this is relatively straightforward: Because Docker containers and microservices both took off like wildfire several years ago. Docker containers took off because they were a relatively simple (and open source) way for developers to package an application and all its dependencies (OS, database, etc.) on a laptop and then deploy it to a server unchanged. Microservices took off because many people believe that type of architecture—where an application is broken down into its component parts, which are then managed individually—is an improvement over monolithic architectures in terms of manageability, scalability, deployment and other factors.
 
Naturally, containers became the default packaging for many of these microservices, which can range in number from several to several hundred depending on the application. Multiply that across an organization, even if it’s only re-architecting a small fraction of its applications, and you can see where problems might start to arise. You’re managing myriad containers and services with various properties (duration, function, permissions, etc.) and relationships to each other, and you need something to manage it all. That something is container orchestration.
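To make "something to manage it all" a little more concrete, here is a minimal Python sketch of the core idea behind orchestration: compare the state you want with the state you have, and compute the actions that close the gap. This is a toy model, not Kubernetes code. The `reconcile` function, the service names and the replica counts are all hypothetical, and a real orchestrator also handles health checks, placement, networking and much more.

```python
from collections import Counter


def reconcile(desired, running):
    """Return the actions needed to move `running` toward `desired`.

    desired: dict mapping service name -> desired number of containers
    running: list of service names, one entry per running container
    (Both structures are illustrative assumptions, not a real API.)
    """
    actual = Counter(running)
    actions = []
    # Look at every service that is either wanted or currently running.
    for service in sorted(set(desired) | set(actual)):
        diff = desired.get(service, 0) - actual.get(service, 0)
        if diff > 0:
            actions.append(("start", service, diff))
        elif diff < 0:
            actions.append(("stop", service, -diff))
    return actions


# Hypothetical example: one service is under-provisioned, one is
# over-provisioned, and one container should not be running at all.
desired_state = {"checkout": 3, "search": 2}
running_containers = ["checkout", "search", "search", "search", "legacy-batch"]

plan = reconcile(desired_state, running_containers)
for action in plan:
    print(action)
```

An orchestrator runs a comparison like this continuously, so the answer to "what should be running?" lives in a declared desired state rather than in someone's head or a runbook.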
 
An alternative, or perhaps parallel, explanation, is that some people view containers not so much as a way of packaging microservices, but instead as a more granular form of virtualization. The idea here is that rather than just partition servers into virtual machines using a hypervisor (or renting a cloud instance), organizations can take efficiency to the next level by running multiple applications inside containers on those VMs. This view of the world can support a more traditional lift-and-shift (as opposed to re-architecture) approach to adopting new technologies, but it still requires an orchestration layer to manage all those containers.
 
Which brings us to the question of “Why is Kubernetes so popular?” The answer here is that although Kubernetes wasn’t the first container orchestration platform (and arguably wasn’t the best when it was launched), the world bought the vision that Google was selling. And, in true open source fashion, network effects came into play quickly: as the community of developers and engineers experimenting with and contributing to Kubernetes grew, software vendors and cloud providers followed along.
 
Left with a choice between developing their own container orchestration platforms in relative solitude or embracing Kubernetes and its huge, fast-moving community, the IT world by and large chose the latter. As it turns out, microservices and container orchestration are very promising to a lot of potential buyers, which is why nearly every vendor selling anything to do with cloud computing, devops, data management or infrastructure now has a Kubernetes play.
 
For a more detailed description of how Kubernetes works, read this explanation by the Kubernetes project team.
 

How does it relate to other things I care about?

Kubernetes, like all other enterprise technologies, is not the be-all, end-all to problems that IT organizations face, and does not operate in a vacuum. If you’re considering deploying Kubernetes, the chances are you’ve also deployed, considered, or at least read about other next-big-thing technologies over the past several years. Because there’s merit to all of those things, it’s worth thinking about how they relate to Kubernetes and why there’s likely a place for all of them in your data center (or cloud).
 

Containers

As explained above, Kubernetes and containers are intrinsically connected. Without containers—at least for the time being—there’s no use for Kubernetes. They are the unit of packaging with which it is concerned. Even as the project evolves to address new abstractions such as serverless computing (aka functions or lambdas), Kubernetes is still spinning up containers in order to execute tasks.
 
If there’s one major shift that has taken place (aside from new features and capabilities), it’s the decoupling of Kubernetes from the Docker container runtime. While Docker is still the most popular format among the vast majority of Kubernetes users (and container users overall), Kubernetes support for the Open Container Initiative format opens it up to a broader potential range of applications. Essentially, Kubernetes is becoming the center of value in the container universe, while containers themselves become commodities.
 
However, the idea of containers predates even Docker* and extends beyond it. If the concern is simply around resource isolation and efficient resource utilization, rather than everything that comes with packaging and consuming applications as Docker containers, technologies such as Linux cgroups, Cloud Foundry, and others support the containerization of workloads in their own native ways.
 
(*“Docker,” in this case, is used as a catch-all term referring to the original open source container format and runtime. In 2017, Docker, the commercial entity, moved many of those core pieces into an open source project called Moby, but most people still just say “Docker” (and occasionally “docker” with a lowercase d to distinguish from the commercial entity).)
 

Cloud-native computing

Because it’s the flagship CNCF project and has garnered so much popularity, Kubernetes is generally considered the epicenter of the cloud-native world. However, Kubernetes isn’t technically a requirement for doing cloud-native computing, nor is it the only technology necessary to manage cloud-native applications. That being said, many organizations’ cloud-native strategies do begin with Kubernetes, and most (if not all) CNCF projects include Kubernetes compatibility as a first-order feature.
 
Cloud-native computing is typically defined more generally as the practice of building applications that are designed to take advantage of the cloud computing delivery model. In practice, this means microservices architectures; “new” service packaging such as containers or “serverless” functions; and application-lifecycle advances such as continuous testing, monitoring and delivery. Depending on user preferences, cloud-native computing can be done in the cloud or locally, using open source or proprietary tools and services.
 
However, doing cloud-native responsibly means complementing a container-orchestration platform (Kubernetes or something else) with a collection of other tools around, among other things, application management, CI/CD, security, monitoring, networking and service discovery. This is especially true when using pure open source tools that don’t package up necessary components and capabilities out of the box. The CNCF landscape chart above gives an idea of how large and complex this ecosystem can be.
 
 

Platform as a Service (PaaS)

Kubernetes, and container orchestration generally, is often positioned as a more modern alternative to PaaS—an approach to saving developers from underlying infrastructure concerns that first caught on with offerings such as Cloud Foundry, Heroku and Google App Engine. This is mostly incorrect, and based largely on the notion that more customization is necessarily better. 
 
Although PaaS and container orchestration (sometimes referred to as Containers as a Service, or CaaS) both abstract some degree of infrastructure interaction, the latter still requires developers to touch a not-insignificant amount of infrastructure, while PaaS provides a much higher level of abstraction and opinionation. Thus, with PaaS, developers and operators spend more of their time on things that provide value (e.g., application code and performance tuning) and less on what Amazon CTO Werner Vogels used to describe as “undifferentiated heavy lifting” (e.g., worrying about operating systems and server types, or managing bespoke platform integrations).
 
However, there are tradeoffs that make CaaS/Kubernetes more compelling for certain workloads. While PaaS makes it relatively simple to push code live and connect to components such as databases and messaging queues, the heavy-touch nature of Kubernetes can be preferable for services that require “lifting and shifting” of legacy applications, or other custom requirements. In the end, the question of PaaS versus CaaS often comes down to the level of control or customization required for each application, and the level of cluster management that an organization is willing to take on.
 
The diagram at the end of this section illustrates where these differences generally lie, and what an organization is responsible for when it decides to deploy an application using PaaS, managed Kubernetes or pure open source Kubernetes. 
 

Serverless

For the sake of this guide, serverless computing is defined as event-driven infrastructure or Functions as a Service (FaaS), that is, an application or task that only consumes resources when triggered by a predefined event type and is otherwise scaled down to zero, rather than simply any cloud service that doesn’t require setting up a server (e.g., Google BigQuery). In this light, the relationship between Kubernetes and serverless can be a little complicated because many people (arguably incorrectly) position it as a discussion of serverless versus containers.
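The "scaled down to zero" part of that definition can be illustrated with a small Python sketch. It is a toy scaling rule under stated assumptions: the function name `replicas_needed`, the per-replica capacity of 10 events and the cap of 5 replicas are all hypothetical, and real FaaS platforms use more sophisticated signals.

```python
import math


def replicas_needed(pending_events, events_per_replica=10, max_replicas=5):
    """Toy scaling rule: no pending events means no running replicas.

    The capacity and cap defaults are illustrative assumptions only.
    """
    if pending_events <= 0:
        return 0  # scale to zero: nothing runs, nothing is consumed
    wanted = math.ceil(pending_events / events_per_replica)
    return min(max_replicas, wanted)


for pending in (0, 1, 25, 400):
    print(pending, "pending events ->", replicas_needed(pending), "replicas")
```

The contrast with a long-running containerized service is the first branch: a conventional deployment typically keeps at least one instance running whether or not there is work to do.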
 
A more useful way to think about serverless with regard to Kubernetes is to think about it like you might with PaaS. Serverless is a higher-order abstraction (arguably the highest-order abstraction) that today is useful for certain types of tasks, whereas there are other workloads that will absolutely benefit more from the level of control that Kubernetes provides. And although serverless products and projects are maturing quickly, they’re still quite a way behind containers in terms of enterprise-class features and proven use cases.
 
All that being said, the connection between Kubernetes and serverless is actually growing stronger by the day, thanks in large part to the introduction of Knative. Knative is an open source project meant to simplify the process of launching serverless functions on a Kubernetes cluster, and is already the foundation of commercial products by Google, Pivotal and others. The idea here is that Kubernetes can serve as the platform for numerous abstractions and workload types, thereby eliminating the decision of having to choose one over the other.
 

Microservices

Hopefully, this has been made clear already, but just to clarify: Kubernetes can be a critical part of an organization’s microservices architecture, but it doesn’t have to be. Depending on an organization’s needs, PaaS, FaaS and other types of platforms can also serve the purpose of managing containers, functions or whatever other form factor a microservice might take. 
 
Furthermore, many companies also use Kubernetes as part of a “lift-and-shift” strategy. There, the goal isn’t re-architecting existing applications, but rather packaging existing applications as containers (more or less as is) and running them on a re-architected, Kubernetes-based platform. It’s an attempt to modernize operations without undergoing the legwork of modernizing applications, some of which might function perfectly well in their current state.
 
Still, the bottom line is that companies deploying microservices at any reasonable scale will require tooling to manage the myriad connections, communications and dependencies among those services. And when they’re packaged and managed as containers, Kubernetes plus its ecosystem of service mesh, monitoring and other tools is certainly a popular and fast-maturing option.
 

Big data

This might not seem like a natural connection given the original stateless nature of Kubernetes applications, but the worlds of Kubernetes and data are merging quite fast. In fact, it’s quite common to not only connect Kubernetes services to stateful data systems such as storage, databases, Spark and Kafka, but also to run those systems as services on Kubernetes. Incidentally, there’s also a common assumption that Kubernetes will kill any lingering excitement over Hadoop, because Kubernetes provides a simpler and more flexible orchestration layer for data services, as well as support for a broader range of storage platforms.
 
There are a couple of reasons for this coming-together of worlds. One is that Kubernetes is an integral part of many modernization efforts, which also include machine learning/artificial intelligence, IoT and other data-heavy components. It only makes sense that integration between these systems is a priority. The other reason is that more and more legacy applications are actually being ported to containers running on Kubernetes clusters, and these applications require stateful storage from a wide variety of sources.
 

 

What else do I need to run Kubernetes?

Kubernetes is often described as a platform, but, for better or worse, that description isn’t entirely accurate. As noted elsewhere in this guide, while Kubernetes excels at managing the relationships among containers and scheduling them onto lower-level resources, it intentionally does not tackle related functionality such as advanced networking, service discovery, monitoring, and certain security capabilities. This is where the ecosystem of CNCF projects, as well as work from other open source projects and commercial vendors to enhance the Kubernetes experience, comes into play.
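As a rough illustration of what "scheduling them onto lower-level resources" means, the Python sketch below places containers on the first node that has enough free CPU. It is a deliberately simplified first-fit strategy, not the actual Kubernetes scheduler, which weighs many more factors. The `schedule` function, node names and CPU figures are hypothetical.

```python
def schedule(containers, nodes):
    """Assign each container to the first node with enough free CPU.

    containers: list of (name, cpu_request) tuples
    nodes: dict mapping node name -> free CPU cores (left unmodified)
    Returns (placements, unschedulable). All names are illustrative.
    """
    free = dict(nodes)  # work on a copy so the caller's data is untouched
    placements = {}
    unschedulable = []
    for name, cpu in containers:
        for node, available in free.items():
            if cpu <= available:
                placements[name] = node
                free[node] = available - cpu
                break
        else:
            # No node had room for this container.
            unschedulable.append(name)
    return placements, unschedulable


# Hypothetical cluster with two nodes and four containers to place.
cluster = {"node-a": 2.0, "node-b": 4.0}
workload = [("web", 1.5), ("db", 3.0), ("cache", 1.0), ("batch", 2.0)]

placed, pending = schedule(workload, cluster)
print(placed)
print(pending)
```

Note what the sketch leaves out: networking between the placed containers, discovering where a service ended up, monitoring it and securing it. Those gaps are the ones the surrounding ecosystem exists to fill.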
 
A small list of related projects includes: Prometheus (monitoring), Envoy (service proxy), Linkerd (service mesh), Harbor (container registry) and CoreDNS (service discovery). All told, there are approximately three dozen CNCF projects in various stages of development, and even more—including the popular Istio service mesh—that live externally. While many of these projects focus on the operational aspects of Kubernetes, there also are projects such as Buildpacks (a CNCF project) and Knative (not a CNCF project) that focus on the developer experience—in these cases, simply application deployment and support for serverless functions, respectively.
 
The decision to keep the scope of Kubernetes itself quite narrow is probably a good thing. It means (a) that Kubernetes hopefully won’t grow too big and complex for its own good, and (b) that Kubernetes users can benefit from a vibrant ecosystem of modular, best-of-breed components. However, it also means that users need to (1) connect and manage all these pieces on their own, or (2) engage with a vendor(s) that packages many of these capabilities as part of a commercial Kubernetes distribution. 
 
Conventional wisdom probably points toward a commercial distribution as the best option. Besides the manual labor of getting a Kubernetes environment up and running, there are longer-term considerations such as how to keep up with its quarterly release cycle, how to keep up with CVE patches, and how to launch and manage multiple clusters (which is generally considered a best practice, for reasons including security and reliability).
 
Gartner analysts put it this way in a 2018 report titled How to Choose Your Kubernetes Deployment Model:
 
To become operationally sound for production, Kubernetes requires coupling with many third-party plug-in components, creating significant infrastructure integration work. Integrating multiple open-source components with Kubernetes in a custom manner and maintaining your own system without vendor support is very tough. This model is only viable for organizations that can interact with and contribute to open-source projects.
 
However, even if your organization has human resources who can interact with Kubernetes and other open-source communities, Gartner recommends you avoid building your own Kubernetes system with upstream unless your organization has special customization requirements that market solutions cannot fulfill. 
 
Essentially, the Kubernetes community is a beacon of fast-paced open source innovation and ecosystem-building, but the Kubernetes community is not responsible for managing its users’ applications. Organizations adopting Kubernetes need to be certain they have the time, team and resources to manage everything on their own, or they need to find a partner that’s focused on handling much of the operational heavy-lifting. Google, Netflix, Facebook and their peers are edge cases when it comes to how they do IT; their do-it-yourself approaches are fueled by necessity, enabled by highly skilled engineers, and funded by technology budgets in the billions.
 

What are the alternatives to Kubernetes?

Looking specifically at container orchestration (rather than different abstractions such as PaaS or FaaS), there are several available alternatives to Kubernetes—although all were arguably more viable options several years ago before Kubernetes development and adoption kicked into overdrive. On the open source side, there are Apache Mesos/Marathon/DCOS, Docker Swarm and Nomad, all of which are also backed by enterprise support or commercial versions. On the strictly proprietary side, there also are options like Amazon Elastic Container Service and Microsoft Azure Service Fabric, both of which, of course, are limited to running on their respective cloud platforms.
 
However, it’s worth noting that momentum in the container orchestration space lies almost entirely with Kubernetes. Even the commercial entities backing some of the alternatives now support Kubernetes, and all major cloud providers also offer their own Kubernetes services. The most future-proof guidance to organizations and decision-makers thinking about adopting a container orchestration system is probably to choose Kubernetes. For most organizations, a commercial distribution that tracks with open source releases and eliminates certain operational headaches is probably the best choice.
 

How will it affect my business?

After learning about the history of and ecosystem around Kubernetes, this really is the billion-dollar question. And like most things in IT, the answer is not cut and dried. Kubernetes itself won’t fix business models, improve sales or market fit, improve developer productivity, or slash operations cost. It is not digital transformation in a bottle. But, done right, it can help with all of those things.
 
To use a golf analogy: Kubernetes is more like a set of advanced clubs than it is a magic formula for collecting PGA Tour victory paychecks. One doesn’t get good at golf by purchasing the most-expensive clubs in the store. In fact, those clubs might actually hinder performance because they’re designed for skilled golfers looking for more feel and control. Instead, beginning golfers require clubs that help provide solid contact with the ball and straight shots, as well as lots and lots and lots of practice on lots and lots and lots of things. And they need to learn the rules. 
 
Likewise, a business doesn’t get good (especially not Google-level good) at thriving in a digital-first world simply by deploying Kubernetes. In order to do that, organizations first need to get their culture—both corporate and IT—aligned with a new way of doing things, focused around pipeline-driven automation, continuous deployment, and generally moving faster and fixing faster. Without those fundamental skills and mindsets in place, managing a Kubernetes deployment can add a lot of complexity with relatively little gain on the business side. 
 
Once a company has put the right people and processes in place, and mastered the fundamentals of cloud-native development, deployment and operations, then Kubernetes—like that advanced set of golf clubs—can absolutely empower them to take things to the next level in terms of customization, creativity and efficiency. There’s profit to be had from more dynamic application lifecycles and architectures (stemming from revenue increases as well as internal efficiencies), but it can be hard-earned.
 
Still, given its current pace of development, Kubernetes will likely continue to become easier to use, enable new capabilities and subsume more legacy workloads. Even if it’s not yet the sole technology on which most large enterprises should bet their futures, it’s certainly worth investigating and is a safe bet for the long term. 
 
