The Four Levels of HA in Pivotal CF

May 5, 2014 Cornelia Davis

Can you name the 4 levels of HA in Cloud Foundry?

A platform as a service (PaaS) is not only about providing middleware that your application can leverage; it is about doing more on behalf of the developer and operator. A modern PaaS must keep apps up and running in the face of failures within the system. From the outset, the Pivotal CF enterprise PaaS has been built to make both the developer’s and the operator’s jobs easier, and in this post I’ll tell you a bit about how it’s done.

First off, there is no voodoo magic here: you are going to have to deploy multiple instances of your application. Next, Pivotal CF has the notion of availability zones*, and these must be sensibly defined; for example, one availability zone (AZ) is defined for one vCenter resource pool and another AZ for a second resource pool. Finally, you configure your Pivotal CF deployment so that the DEAs (the nodes on which your application instances run) are created across the availability zones. When application instances are then deployed, Pivotal CF will distribute them evenly across the availability zones. If you lose one AZ, you still have instances up and serving traffic. That’s one level.

* Availability zones will be available in Pivotal CF later this year.
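To make the even distribution concrete, here is a minimal sketch in Python. It assumes a simple round-robin policy, which is only one way to get an even spread and is not necessarily how Pivotal CF places instances; the function names and the zone names "az1" and "az2" are made up for illustration.

```python
from collections import Counter
from itertools import cycle


def place_instances(instance_count, zones):
    """Assign each instance to a zone in round-robin order (assumed policy)."""
    if not zones:
        raise ValueError("at least one availability zone is required")
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(instance_count)]


def surviving_instances(placement, failed_zone):
    """Count the instances still serving traffic after one zone is lost."""
    return sum(1 for zone in placement if zone != failed_zone)


# Four instances across two hypothetical zones.
placement = place_instances(4, ["az1", "az2"])
per_zone = Counter(placement)
```

With four instances over two zones, each zone holds two, so losing either zone still leaves two instances running. With a single instance there is nothing to spread, which is why deploying multiple instances comes first.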

Of course, if you lose application instances for any reason (a bug in the app, an AZ going down, etc.), you’ll want the system to compensate by starting new instances so that you keep the capacity you are aiming for. This is where the Elastic Runtime health manager comes in. The health manager constantly keeps tabs on the state of the system, in particular how many instances of each application are running across all of the DEAs. When it detects a discrepancy between the actual state of the app instances in the cloud and the desired state, as known by the cloud controller, it advises the cloud controller of the difference, and the cloud controller initiates the deployment of new application instances. That’s another level.
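The desired-versus-actual comparison can be sketched in a few lines of Python. This is an illustration of the idea only, not the health manager's actual code: the dictionaries, the function names, and the `start_instance` callback standing in for the cloud controller are all assumptions.

```python
def find_discrepancies(desired, actual):
    """Return {app: missing_count} for apps running fewer instances than desired."""
    missing = {}
    for app, want in desired.items():
        have = actual.get(app, 0)
        if have < want:
            missing[app] = want - have
    return missing


def reconcile(desired, actual, start_instance):
    """Request one new instance for every missing one and return what was requested.

    start_instance is a hypothetical callback playing the cloud controller's role.
    """
    started = []
    for app, count in find_discrepancies(desired, actual).items():
        for _ in range(count):
            start_instance(app)
            started.append(app)
    return started
```

For example, with a desired state of `{"web": 3, "worker": 2}` and an actual state of `{"web": 1, "worker": 2}`, `reconcile` asks for two new "web" instances and leaves "worker" alone.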

Before we talk about the next level, let’s take a brief aside. You may already know that the various components of the Pivotal CF Elastic Runtime all run on virtual machines that Ops Manager provisions: the things that host your running applications (DEAs), manage the system health (the aforementioned health manager), provide consolidated application logging (loggregator), serve as the API endpoint and brains of the operation (cloud controller), and so on. Ops Manager spins up the virtual machines with a Linux OS that includes a BOSH agent; for now, it is enough to know that the agent is there to stay in touch with the Ops Manager Director. The patterns for how it does this are all designed for web scale, using asynchronous messaging and other tricks, but that’s the topic of another post. So let’s go on.

All of these things work in concert to keep your application instances up and running: the DEAs, cloud controller, health manager and so on. You might then ask, “What happens if one of these pieces of software stops working? If the health manager isn’t there, what will compensate for app instance failures?” The answer is that there is another level that keeps an eye on the health manager (and all of the other components). The processes running on the virtual machines (e.g., the health manager) are monitored by monit, so that if a process dies it is automatically restarted. Whether or not the restart succeeds, monit tells the BOSH agent about the failure. Recall that the BOSH agent is there to communicate with Ops Manager; in this case it relays the failure information to the Ops Manager Health Monitor (not to be confused with the health manager of the Elastic Runtime discussed above), which we’ll abbreviate OMHM. The OMHM takes this alert and passes it through a list of responders that do things like send emails, page administrators and display alerts in operations dashboards. There’s a good chance that monit will already have recovered the process, but we also want there to be an opportunity for a human to respond. That’s another level.
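The flow in this level (restart the process, report the failure either way, then fan the alert out to responders) can be sketched as follows. This is a simplified Python model, not monit's or BOSH's real interface: the `Alert` type, the function names, and the callbacks are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str
    message: str


def check_process(name, is_running, restart, notify_agent):
    """Restart a dead process and report the failure whether or not the restart works.

    is_running, restart and notify_agent are hypothetical callbacks; restart
    returns True when the process came back up.
    """
    if is_running(name):
        return None
    recovered = restart(name)
    outcome = "succeeded" if recovered else "failed"
    alert = Alert(name, f"process died; restart {outcome}")
    notify_agent(alert)
    return alert


def dispatch(alert, responders):
    """Pass the alert through every responder in order (email, pager, dashboard...)."""
    for responder in responders:
        responder(alert)
```

Note that the alert is raised even when the restart succeeds. That matches the point above: automatic recovery and a chance for a human to respond are both wanted.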

Of course, the BOSH agent on a VM can only communicate back to Ops Manager if the VM is there, so let’s talk about what happens when a VM disappears. By “disappear” I mean that the BOSH agent is not functional; the VM could be there, but Ops Manager no longer knows what it is up to, so for all intents and purposes it’s “gone”. How does Ops Manager know? One of the things that a BOSH agent is responsible for is sending out heartbeat messages, and by default it does so every 60 seconds. The OMHM is constantly listening for those heartbeats, and when it finds that one is missing it will itself produce an alert and pass it through the list of responders. Just as described above, this could result in emails, pages and operations dashboard alerts, but in this case there is one more responder that kicks in: the “resurrector”. The resurrector communicates with the IaaS over which Pivotal CF is running and asks that the failed VM be replaced. Of course, it will be replaced with a VM running the appropriate part of the Elastic Runtime, i.e., a health manager, a DEA, etc. That’s right, Ops Manager will restart failed cluster components. That is the fourth level.
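Heartbeat tracking with a resurrector-style responder might look roughly like this in Python. The 60-second interval comes from the text; the two-interval timeout, the class and function names, and the `recreate_vm` callback standing in for the IaaS request are assumptions made for the sketch, not the OMHM's actual behavior.

```python
HEARTBEAT_INTERVAL = 60  # seconds; the default mentioned in the post
MISSED_AFTER = 2 * HEARTBEAT_INTERVAL  # assumed threshold, not from the post


class HeartbeatTracker:
    """Remember when each VM's agent was last heard from."""

    def __init__(self, timeout=MISSED_AFTER):
        self.timeout = timeout
        self.last_seen = {}

    def record(self, vm_id, now):
        self.last_seen[vm_id] = now

    def missing(self, now):
        """Return the VMs whose last heartbeat is older than the timeout."""
        return sorted(
            vm for vm, seen in self.last_seen.items() if now - seen > self.timeout
        )


def resurrect(tracker, now, recreate_vm):
    """Ask the IaaS (via the hypothetical recreate_vm callback) to replace silent VMs."""
    replaced = []
    for vm_id in tracker.missing(now):
        recreate_vm(vm_id)
        tracker.record(vm_id, now)  # assume the replacement counts as seen from now
        replaced.append(vm_id)
    return replaced
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the sketch deterministic and easy to test.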

Availability Zones.
Health management for app instances.
Monitored processes.
Health management for virtual machines.

Count ’em. 4. Can your platform do that?

UPDATE: Vines too short for your liking? See here for a more digestible review.

About the Author

Cornelia Davis is Vice President of Technology for Pivotal.
