PostgreSQL for Kubernetes Quickstart for PKS

March 20, 2018 Jonathan Katz

[This is a guest post from Jonathan Katz of Crunchy Data] 

Crunchy Data has been a Pivotal technology partner ever since we created an open source, cloud-agnostic, Pivotal Cloud Foundry (PCF) native PostgreSQL-as-a-Service for a US Government customer.  Pivotal recently released the Pivotal Container Service (PKS) for PCF, which enables enterprises to deploy and manage their container-based solutions securely and at scale.  At Crunchy Data, we are big fans of Kubernetes, the technology at the heart of PKS, and have built tools around it that let you quickly and safely set up a PostgreSQL cluster and then scale, back up, upgrade, and destroy thousands of nodes with simple commands.

 

In February 2018, Crunchy Data released version 2.5 of Crunchy PostgreSQL for Kubernetes, which provides an open source PostgreSQL operator for managing PostgreSQL clusters.  In this release, the team provided a quickstart script that installs the operator in your PKS environment and gets you ready to manage and monitor your PostgreSQL databases in seconds.

 

This post provides background on why Crunchy Data created a tool to help you manage your containerized PostgreSQL environment on PKS, along with a guide to getting the operator set up in your PKS environment.  If you are already set up on PCF Pivotal Application Service (PAS) using the Crunchy PostgreSQL for PCF tile, you do not need to make any changes to your platform. We want to introduce how you can work with PostgreSQL on PKS and give you the choice of deploying PostgreSQL on PAS or PKS!

 

Why Run PostgreSQL With Kubernetes?

PostgreSQL is an open source relational database management system (RDBMS) that has a robust developer feature set, is highly extensible, has many options to replicate data, and has a strong developer community and software ecosystem.  Many organizations have adopted PostgreSQL due to its flexible open source license, its well-documented durability and stability, and its ability to efficiently handle enterprise workloads.

Traditionally, databases like PostgreSQL were deployed to bare metal servers for performance reasons, but with advances in cloud-native technology, there are now significant advantages to running database infrastructure in containers.

Before running PostgreSQL on Kubernetes, it’s important to keep its infrastructure dependencies in mind. Running PostgreSQL in a container is similar to running any database within a virtual machine: you still have to consider the hardware infrastructure powering your setup and ensure you have the right tools for managing your database according to your organizational requirements.  The biggest operational hurdle for a database is its underlying storage system, and this must be considered in any PostgreSQL deployment.
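For example, before provisioning any databases you may want to check which storage classes your PKS cluster exposes; a quick check with kubectl looks like this (the names returned will depend on how your environment was configured):

kubectl get storageclass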

 

With that said, there are many advantages to running PostgreSQL on Kubernetes, including:

 

  • Easily provisioning new PostgreSQL clusters

  • Scaling stateful PostgreSQL replicas up and down based on your present workload

  • Managing PostgreSQL software upgrades to thousands of instances with a single command

  • Creating customizable backup and security policies across your clusters

 

PKS simplifies the process of managing a Kubernetes platform, and in turn, makes it easy to get the PostgreSQL operator up and running to help users provision and scale their database clusters.

Quickstart: Getting PostgreSQL Up & Running on PKS

Prerequisites

 

To get started with the PostgreSQL operator, please first complete the following prerequisites:

 

  1. A CentOS 7 or RHEL 7 system for interfacing with your PKS platform

  2. Install wget, kubectl, and pks on your CentOS 7 / RHEL 7 system (a sample installation is sketched after this list)

  3. Ensure you have access to a PKS platform, as well as a cluster within the platform
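As a rough sketch of step 2, assuming a system with sudo access: wget comes from the yum repositories, kubectl can be fetched as a release binary (substitute a version matching your cluster; the URL pattern below follows the Kubernetes documentation of this era), and the pks CLI must first be downloaded from Pivotal Network (“pks-linux-amd64” below is a placeholder file name):

# Install wget from the standard repositories
sudo yum install -y wget

# Fetch a kubectl release binary and put it on the PATH
wget https://storage.googleapis.com/kubernetes-release/release/v1.9.6/bin/linux/amd64/kubectl
chmod +x kubectl && sudo mv kubectl /usr/local/bin/

# Make the pks CLI from Pivotal Network executable (placeholder file name)
chmod +x pks-linux-amd64 && sudo mv pks-linux-amd64 /usr/local/bin/pks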

 

First, ensure you are in your home directory by using the following command:

 

cd ~

 

 

From there, log into the PKS platform with valid credentials using the following command:

 

pks login -a <your-pks-api-url> -u <your-username> -p <your-password> -k

 

 

The “-k” flag skips SSL verification; if you need SSL verification, remove the flag.

 

Next, set the credentials for accessing your PKS cluster with kubectl using the following command:

pks get-credentials <clustername>

Now all the commands you run from “kubectl” will be in the context of your PKS cluster.
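You can verify which cluster kubectl is pointing at with the following command, which prints the name of the active context:

kubectl config current-context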

The “get-credentials” command generates a configuration file that allows you to log into the user interface for your Kubernetes API server.  This file is located at “~/.kube/config”, and you can save a copy of it in your home directory using the following command:

cat ~/.kube/config > kubeconfig && chmod 0600 kubeconfig

Finally, start up a proxy to your Kubernetes API server.  You can start a proxy in the background with the following command:

kubectl proxy &

After the proxy is running, you can navigate your browser to http://localhost:8001/ui/ to log into the visual interface to manage your cluster.

 

When prompted, choose to log in with a configuration file and, from the file browser, select the “kubeconfig” file that you saved previously.  You should then be redirected to a screen providing an overview of your cluster:

 

 

If you browse around the interface, you will find that helpful services such as helm and kube-dns are already deployed, and you can view the health of your cluster:

 

 

Install the PostgreSQL operator and scale your database cluster

With the release of version 2.5 of the PostgreSQL operator, Crunchy Data provided a downloadable quickstart script that installs the operator in your PKS cluster in a namespace called “demo.”  The command below downloads the quickstart script and makes it executable on your system.

 
wget --quiet https://raw.githubusercontent.com/CrunchyData/postgres-operator/master/examples/quickstart.sh -O quickstart.sh && chmod +x quickstart.sh

You can run the quickstart script with the following command:

./quickstart.sh

 

 

The script checks that Kubernetes is indeed available, sets up the environment variables needed to support the PostgreSQL operator, and then deploys the operator itself.

When prompted to create the “demo” namespace, type “yes” and hit enter. If you have already created the “demo” namespace, type “no” and hit enter.

If you chose “yes,” you will then see output from the quickstart script invoking “kubectl config view”, which prints the current state of your “~/.kube/config” file.  In the list of contexts, you should see something similar to this:

contexts:
- context:
    cluster: <clustername>
    user: <kube-user-uuid>
You will then be prompted to enter the name of your cluster.  Enter the value shown as <clustername>, and when you see the prompt for your Kube user name, enter the value shown as <kube-user-uuid> and hit enter.  This will properly set up your context and allow you to easily work with the PostgreSQL operator.

Finally, you will be asked whether you want to deploy the operator to the cluster.  Type “yes” and hit enter.

When the script finishes, it will provide you with a port-forward command that sets up an endpoint for accessing the PostgreSQL operator.  Copy and paste the command and run it in the background, ensuring you add an “&” to the end, similar to this:

kubectl port-forward postgres-operator-<unique-identifier> 8443:8443 &

To work with the PostgreSQL operator from your terminal, run the command below, which loads some environment variables that simplify interacting with the operator:

source ~/.bashrc
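As a quick sanity check, assuming the quickstart added the operator variables to your “~/.bashrc”, you can ask the pgo client for its version, which also confirms that it can reach the operator:

pgo version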

Now you are ready to work with the PostgreSQL operator!  You can see the operator API running by going to your Kubernetes API server interface and selecting the “demo” namespace:

 

Creating, Scaling, and Accessing your PostgreSQL Cluster

The PostgreSQL operator provides functionality for managing and scaling your database cluster, and here we will go through a few examples of using it.

 

First, we need to create a database cluster.  In your command line, enter the following command to create a cluster:

pgo create cluster pivotal

Depending on your setup, it will take a few seconds for the database to be provisioned.  You can check on the status of the provisioning from the Kubernetes API server user interface by navigating to the overview of the “demo” namespace:



 

 

When the database is provisioned, you will see a service named “pivotal” with a checkmark next to it, as well as a pod named “pivotal” with a unique string appended to its name.
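If you prefer the command line to the user interface, you can also ask the operator about the cluster directly:

pgo show cluster pivotal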

 

You can view the defaults that are set up in your database cluster by clicking into the pod:

 

 

By default, the PostgreSQL operator creates a database named “userdb” and grants access to a user named “testuser”. You can find out more about “testuser” by navigating to “Secrets” and clicking on “pivotal-testuser-secret”.
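If you would rather retrieve the generated password from the command line, you can decode it from the secret. This sketch assumes the password is stored under a key named “password”; you can confirm the key names with “kubectl describe secret”:

kubectl get secret pivotal-testuser-secret --namespace=demo -o 'jsonpath={.data.password}' | base64 --decode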

 

Before we scale up our cluster, let’s first see if we can connect to the database and add a table with some data.  In our current networking setup, you will need to add another port forward to access the primary database.  Find the name of your primary pod (e.g. “pivotal-75f657c9c9-7c8sh”) and set up a port forward running in the background using this command:

 

kubectl port-forward <your-primary-pod-name> 13000:5432 &
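If you are unsure of the pod name, you can list the pods in the “demo” namespace and look for the one whose name starts with “pivotal-”:

kubectl get pods --namespace=demo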

 

You can now connect to the default “userdb” in your primary database cluster using this command:

 

psql -h localhost -p 13000 -U testuser userdb

 

You should see the PostgreSQL prompt.  Use the following SQL to create a table and add some data:

 

CREATE TABLE test (
    id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    a int NOT NULL,
    created_at timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO test (a) SELECT * FROM generate_series(-1, -1000, -1);

 

 

To see the last five entries by their “created_at” timestamp, you can use the following SQL:

SELECT * FROM test ORDER BY created_at DESC LIMIT 5;

You can quit out of your PostgreSQL session using “\q”.

 

Now, let’s scale up our database.  As mentioned earlier, PostgreSQL data is stateful, and the PostgreSQL operator provides a way to safely and correctly create read-only replicas of your data.  The following command creates two read-only replicas in the “pivotal” cluster:

pgo scale pivotal --replica-count=2

After a few moments, two replicas will be provisioned and created. You can see this from the Kubernetes API server interface:

 

 

Let’s see how data is passed from the primary database to the replicas.  In a new terminal window, determine which replica pod you want to access (the pods have names like “pivotal-replica-bozk-d94cc7b96-lx8gl”) and set up a port forward:

 

kubectl port-forward <your-replica-pod-name> 13001:5432 &

Then, open up a connection to the replica database:

psql -h localhost -p 13001 -U testuser userdb

Run the SQL statement to check the last five entries in the “test” table:

SELECT * FROM test ORDER BY created_at DESC LIMIT 5;

The results should appear the same as on the primary.
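You can also confirm that you are connected to a replica rather than the primary: PostgreSQL exposes this through the pg_is_in_recovery() function, which returns true on a streaming replica:

SELECT pg_is_in_recovery();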

Let’s add some more data to the primary and see if it is replicated over.  Open up a new terminal and reconnect to the primary database:

psql -h localhost -p 13000 -U testuser userdb

Once you are logged in, run the following SQL command to bulk insert more data:

INSERT INTO test (a) SELECT * FROM generate_series(-1, -1000, -1);

 

Switch back to the terminal with the connection to your replica database.  Run the SQL statement to return the last five entries in the “test” table:

SELECT * FROM test ORDER BY created_at DESC LIMIT 5;

You should see the results have been updated with the new data inserted into the primary.
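If you are curious about the state of replication, you can also inspect it from your primary connection: the pg_stat_replication view lists each attached replica (the “replay_lsn” column shown here is available in PostgreSQL 10 and later):

SELECT client_addr, state, replay_lsn FROM pg_stat_replication;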

Next Steps

The goal of this quickstart script and guide is to demonstrate the power of utilizing the PostgreSQL operator on PKS to provision, manage, and scale your database cluster.  To create a setup that is suitable for an enterprise production environment, there is some additional work you need to do. However, the quickstart is enough to help you get started on creating your own database-as-a-service setup on top of PKS.

 

Coming to Cloud Foundry Summit this April? We will be there too! To find out more about how you can manage your PostgreSQL workloads on the Pivotal Cloud Foundry platforms, please reach out to the Crunchy Data team at https://www.crunchydata.com/contact and we will be happy to set up some time with you.

About the Author

Jonathan Katz

Jonathan S. Katz is the Director of Customer Success & Communications at Crunchy Data, a leading provider of trusted open source PostgreSQL technology, support, and training. Jonathan is also responsible for the advocacy efforts of the PostgreSQL Global Development Group, is a board member of the nonprofit United States PostgreSQL Association, and is a co-organizer of the NYC PostgreSQL User Group.
