The Evolution of a Data Platform: A Journey With Greenplum and Kubernetes

June 13, 2018 Dan Baskette

Pivotal Greenplum has undergone myriad changes since its initial release more than a decade ago, but the pace of development has increased significantly in the last few years. This is largely due to the efforts of the Greenplum development team, which adopted practices like pairing and continuous integration to simultaneously maintain high-quality code and get new features into the hands of users faster than ever.

Cloudy with a Chance of Machine Learning

During this period of rapid product development, a trend began to emerge: customers began asking about running Greenplum in the public cloud. While the public cloud providers offer their own analytical databases and data warehouses, they don’t provide the enterprise-grade capabilities that our customers require and are accustomed to with Greenplum. We’re talking about capabilities like distributed data loading, possible with Greenplum because of its massively parallel processing engine, and built-in machine learning, possible thanks to Pivotal’s commitment to Apache MADlib and other open source projects that run in-database within Greenplum.

So, Pivotal started the effort to embrace the public cloud as a home for the company’s data products. We initially built AWS CloudFormation scripts to deploy Pivotal Greenplum on AWS. This work has since been replicated on Microsoft Azure, with other IaaS platforms on the horizon. Additionally, support for cloud-specific object storage was added to allow customers to offload or archive data out of the database and into long-term, less performant, but cost-effective storage.

The “State” of Cloud Native Computing

This “push-button” multi-platform deployment methodology definitely attracted some customer attention. The feedback from those customers was: “This functionality is great, and we’d love to have it within our own on-premises Greenplum deployment as well.”

Meanwhile, the state of infrastructure and platform technology continued to evolve. Specifically, Kubernetes and its first-class support of stateful workloads have gained enormous popularity. K8s provides support for these workloads via techniques such as StatefulSets, which let a unique identifier and storage connections follow a pod regardless of where it’s started. StatefulSets and related technologies have jump-started a new market for databases in K8s.
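To make the idea of an identity that “follows” a pod a little more concrete, here is a minimal Python sketch of the naming conventions Kubernetes uses for StatefulSet pods. The StatefulSet name `greenplum-segment`, the headless service name `greenplum`, and the claim template name `data` are purely illustrative assumptions, not names used by the Greenplum team, and `cluster.local` is only the default cluster domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StableIdentity:
    pod_name: str      # stays the same across reschedules
    dns_name: str      # resolvable through the governing headless service
    volume_claim: str  # the claim that is re-attached when the pod restarts


def stable_identity(statefulset: str, ordinal: int, service: str,
                    namespace: str = "default",
                    claim_template: str = "data") -> StableIdentity:
    """Derive the names Kubernetes gives to one StatefulSet replica.

    All argument values are hypothetical; only the naming pattern
    (<statefulset>-<ordinal>, <template>-<pod>) reflects Kubernetes.
    """
    pod = f"{statefulset}-{ordinal}"
    return StableIdentity(
        pod_name=pod,
        # "cluster.local" is the default cluster domain and may differ.
        dns_name=f"{pod}.{service}.{namespace}.svc.cluster.local",
        volume_claim=f"{claim_template}-{pod}",
    )


if __name__ == "__main__":
    for i in range(3):
        print(stable_identity("greenplum-segment", i, "greenplum"))
```

Because every name is derived only from the StatefulSet name and the replica’s ordinal, a replacement pod started on a different node ends up with the same hostname and the same storage claim as the one it replaces.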

Greenplum’s “push-button” mentality also aligns well with Pivotal Container Service (PKS). PKS combines the operational power of BOSH with Kubernetes to simplify how enterprises deploy, manage, and run Kubernetes clusters. PKS addresses Day 1 and Day 2 operations of Kubernetes clusters, and also includes VMware NSX-T to provide a software-defined network for easier configuration of networking with multiple tenants across multiple K8s clusters. All of this is available for self-provisioning by application teams to provide that “push-button” experience.

Additionally, some of our largest customers expressed interest in deploying the entire Pivotal Greenplum stack. What does that mean? At the time, Greenplum customers relied on a separate vendor for operating system support. Many customers told us they were interested in removing this variable from their implementations by leveraging an embedded OS like they get when using Pivotal Cloud Foundry. This could potentially be a cluster running PKS to provide a Kubernetes installation, and then Greenplum running in Kubernetes leveraging an embedded Ubuntu OS.

Hello Road, Meet Rubber.

At PostgresConf 2018 in New Jersey, Pivotal held the first Greenplum Summit. At the event, our own Goutam Tadi presented “Greenplum Kontained: Coordinating Many PostgreSQL Instances on Kubernetes: Cloud-Native Greenplum.” (You can watch Goutam’s presentation here and check out his slides here.)

It was a very early peek into the work the team had been doing in containerizing Pivotal Greenplum and deploying container-based clusters in a Kubernetes cluster. This early work deployed a Kubernetes cluster and then used a Helm chart to install a Greenplum cluster within that K8s cluster. Helm charts provide some lifecycle management hooks that allow timed call-outs, such as pre-install or post-install, to handle some configuration tasks associated with the software being installed. While these are called Lifecycle Hooks, they don’t address regular operation of the software as part of the lifecycle; instead, they focus only on the install, upgrade, or deletion of the software. This early version did not include automation of any of the day-to-day operations of the database, but these are areas that are currently being developed by the Pivotal team.

Applying Learnings to Deliver Value

Based on this early version of the software, the Greenplum on Kubernetes team grew and is now hard at work building a production-ready Kubernetes Operator for Greenplum. An operator builds on Kubernetes custom resources and custom controllers by coding domain-specific knowledge into a Kubernetes API extension. These custom controllers have access to the Kubernetes API. This domain-specific knowledge allows the operator to monitor the application and perform application-specific tasks in addition to Kubernetes tasks based on the state of the monitored application.

For example, if a node fails, a Greenplum operator could spin up a new Kubernetes node, start the Greenplum containers, join them to the database cluster, and initiate a resync, if required. This is a powerful addition to the Kubernetes ecosystem and enables automation of many of the Day 2 tasks associated with running a stateful application. This is where we start to obtain true value from running these workloads in Kubernetes: value-add functionality above and beyond that of a standard bare-metal deployment.
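The node-failure scenario above boils down to comparing the desired state with the observed state and deciding which steps to take. The Python sketch below illustrates that idea only; it is not the Greenplum operator’s code. The segment names, the status strings (`"up"`, `"down"`), and the action descriptions are hypothetical, and a real operator would act through the Kubernetes API instead of returning strings.

```python
from typing import Dict, List


def reconcile(desired_segments: List[str],
              observed: Dict[str, str]) -> List[str]:
    """Return the ordered actions needed to reach the desired state.

    `observed` maps a segment name to a hypothetical status string.
    A segment that is missing or reported "down" is treated as failed.
    """
    actions: List[str] = []
    for name in desired_segments:
        status = observed.get(name)
        if status == "up":
            continue  # nothing to do for a healthy segment
        # Domain-specific recovery steps, in the order described above.
        actions.append(f"provision replacement node for {name}")
        actions.append(f"start Greenplum containers for {name}")
        actions.append(f"join {name} to the database cluster")
        actions.append(f"initiate resync for {name}")
    return actions


if __name__ == "__main__":
    observed_state = {"segment-0": "up", "segment-1": "down"}
    for step in reconcile(["segment-0", "segment-1"], observed_state):
        print(step)
```

In a controller, a function like this would be called repeatedly, whenever the observed state changes, so that the cluster keeps converging on what was declared. For simplicity the sketch always plans a resync, whereas the scenario above only calls for one when required.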

The team has also done a lot of testing with various storage models within Kubernetes. An exciting development for stateful workloads, such as Greenplum, is the progression of local persistent volumes. Also, the recent addition of local raw persistent volumes opens up even more deployment possibilities that weren’t available with remote volumes or network-based storage. Both of these models allow for an increased performance profile at the expense of K8s pod portability.

The engineering team is early in the development cycle, and the current plan is to build and release this functionality in multiple phases. The first phase will be an operator that installs and configures clusters on demand, and the follow-on phases will address more Day 2 functionality around running the cluster, such as node failure, master failover, and scaling of the Greenplum cluster.

This is an exciting time in the history of Pivotal Greenplum. It’s exciting to see the Pivotal Greenplum team embrace open-source and cloud architectures to tackle traditional data challenges in a modern way. So, stay tuned!

About the Author

Dan is Director of Technical Marketing at Pivotal with over 20 years of experience in various pre-sales and engineering roles with Sun Microsystems, EMC Corporation, and Pivotal Software. In addition to his technical marketing duties, Dan is frequently called upon to roll up his sleeves for various "Will this work?" type projects. Dan is an avid collector of Marvel Comics gear and you can usually find him wearing a Marvel shirt. In his spare time, Dan enjoys playing tennis and hiking in the Smoky Mountains.