How to Build a Critical Application Assessment Framework for Your Bank

June 28, 2019 Fadzi Ushewokunze

[Editor’s Note: This is the fourth in a series of posts describing how banks are modernizing with Pivotal Cloud Foundry. Browse other posts on identity management, securing backing services, and infrastructure security.]

Banks like yours have thousands of applications. How can you go about systematically modernizing them for the cloud?

Most of your time, attention, and investment should go toward modernizing your most critical applications. These are the crown jewels of your app portfolio, the custom code that makes or breaks your company’s performance. But what’s a “critical” app? We have a useful set of questions that can help you prioritize.

8 Simple Questions to Assess Criticality

If you answer “yes” to any of these questions, then your application is mission-critical. As such, it will need to have tight specifications for failure remediation, resiliency, and availability.

  1. Does this application collect, store, or process any PII, PIFI (personally identifiable financial information), or the bank’s proprietary data?

  2. Is the application bound by regulatory requirements such as SOX, Dodd-Frank, or PCI DSS, or by those of another regulatory body?

  3. Does a significant proportion of your customer base use this application to transact?

  4. Is the application accessible via the Internet?

  5. If this application or its data is unavailable or inaccessible, will the business suffer a financial impact for every hour or day of the outage?

  6. Does the application have an impact on another critical application?

  7. Do a significant number of employees use this application for daily internal operations?

  8. Is the application strategic to the business in terms of revenue and market share?
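To make the "any yes means mission-critical" rule repeatable across a large portfolio, the eight questions can be encoded as a simple checklist. This is a minimal sketch, assuming one boolean per question; the field names and the sample apps are hypothetical.

```python
from dataclasses import dataclass, fields

@dataclass
class CriticalityAnswers:
    """One hypothetical boolean field per assessment question (Q1 to Q8)."""
    handles_sensitive_data: bool = False   # Q1: PII, PIFI, or proprietary data
    regulated: bool = False                # Q2: SOX, Dodd-Frank, PCI DSS, ...
    customers_transact: bool = False       # Q3: significant share of customers
    internet_facing: bool = False          # Q4: reachable from the Internet
    outage_costs_money: bool = False       # Q5: loss per hour or per day
    affects_critical_app: bool = False     # Q6: another critical app depends on it
    daily_internal_use: bool = False       # Q7: many employees rely on it daily
    strategic: bool = False                # Q8: revenue and market share

def triggered_questions(answers: CriticalityAnswers) -> list[str]:
    """Names of every question answered 'yes', in question order."""
    return [f.name for f in fields(answers) if getattr(answers, f.name)]

def is_mission_critical(answers: CriticalityAnswers) -> bool:
    # A single "yes" is enough to classify the app as mission-critical.
    return len(triggered_questions(answers)) > 0

# Hypothetical portfolio entries for illustration.
portfolio = {
    "card-payments": CriticalityAnswers(handles_sensitive_data=True,
                                        regulated=True, internet_facing=True),
    "cafeteria-menu": CriticalityAnswers(),
}
critical_apps = sorted(n for n, a in portfolio.items() if is_mission_critical(a))
print(critical_apps)
```

Keeping the list of triggered questions, not just the yes/no verdict, records why each app landed in scope, which is useful later when deciding how strict its specifications need to be.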

How many of your apps meet one or more of these criteria? For most banks, this exercise narrows the scope from a few thousand apps to a few hundred. So now what? Next, you need to build a framework that reflects your business, technology, security, and compliance imperatives for these workloads.

In our experience, IT teams at banks use frameworks that account for these five factors: 

  • Resiliency

  • Monitoring

  • Rapid updates and patching 

  • Zero-downtime practices

  • No end-of-life components
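One way to apply these five factors consistently is to record, for each critical app, whether there is evidence that it meets each factor, and report the gaps. The sketch below assumes a simple per-app dictionary of yes/no evidence; the factor keys and the sample app are hypothetical.

```python
FACTORS = (
    "resiliency",
    "monitoring",
    "rapid_updates_and_patching",
    "zero_downtime",
    "no_end_of_life_components",
)

def framework_gaps(app: str, evidence: dict[str, bool]) -> list[str]:
    """Return the factors for which an app has no supporting evidence.

    A factor that is absent from `evidence` counts as a gap, so an empty
    assessment reports all five factors."""
    unknown = set(evidence) - set(FACTORS)
    if unknown:
        raise ValueError(f"{app}: unknown factors {sorted(unknown)}")
    return [factor for factor in FACTORS if not evidence.get(factor, False)]

gaps = framework_gaps("card-payments", {"resiliency": True, "monitoring": True,
                                        "zero_downtime": False})
print(gaps)
```

Treating a missing answer as a gap is a deliberately conservative choice: an app is only considered compliant once someone has affirmatively checked each factor.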

Simple enough. But things get complicated in a hurry if each of your most important apps meets these factors in a slightly different way. Ideally, you’d have a consistent, uniform way to address all five factors.

You do that by running your apps on a modern platform, and that’s what Pivotal Cloud Foundry (PCF) gives you out of the box. That’s why 7 of the top banks use PCF for their most critical workloads.

Let’s go through each of these five factors, and review how PCF helps you satisfy the parameters of your new critical application framework.

Resiliency

What This Means

Systems should implement disaster recovery and high availability strategies that protect against unexpected events. You need to have your apps online serving traffic when it counts.

Some of the most critical questions you face: How long can my business be without this service before we incur a substantial loss? What service level agreements (SLAs) can I promise customers? How am I building resiliency into my applications?

Resiliency is the ability of an application, a platform, or an entire data center to recover quickly and continue operating after a major disruption. Ideally, your users would never know there’s been a failure in your stack.

How PCF Helps: High Availability

Resiliency is about ensuring that the actual system state (the number of application instances, for example) matches the desired state at all times, even in the event of failures. PCF can help you achieve this. Let’s explore how.

Pivotal Application Service, the app platform in PCF, automates the recovery of failed applications, components, and processes. This self-healing removes the recovery burden from the operator, ensuring rapid recovery. In the event of failure, it will automatically:

  • Restart failed system processes

  • Recreate missing or unresponsive VMs

  • Redeploy new application containers when an application crashes or becomes unresponsive
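The self-healing described above follows a desired-state pattern: compare what should be running with what is running, then act to close the difference. The sketch below is a toy illustration of one reconciliation pass, not PAS code; the `App` and `Instance` types and the ID generator are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # hypothetical instance-ID generator

@dataclass
class Instance:
    instance_id: int
    healthy: bool = True

@dataclass
class App:
    name: str
    desired_instances: int
    instances: list[Instance] = field(default_factory=list)

def reconcile(app: App) -> dict[str, int]:
    """One pass of a desired-state loop: discard unhealthy instances, then
    start or stop instances until the actual count equals the desired count."""
    unhealthy = [i for i in app.instances if not i.healthy]
    app.instances = [i for i in app.instances if i.healthy]
    started = stopped = 0
    while len(app.instances) < app.desired_instances:
        app.instances.append(Instance(next(_next_id)))
        started += 1
    while len(app.instances) > app.desired_instances:
        app.instances.pop()
        stopped += 1
    return {"removed_unhealthy": len(unhealthy), "started": started, "stopped": stopped}

app = App("card-payments", desired_instances=3,
          instances=[Instance(next(_next_id)) for _ in range(3)])
app.instances[1].healthy = False  # simulate a crashed container
print(reconcile(app))
```

Because each pass only looks at the current gap, running it repeatedly is safe: once actual state matches desired state, a further pass does nothing.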

If you’ve deployed PAS across availability zones (AZs), which is a best practice, PAS is smart enough to shift traffic from an unresponsive AZ to a working AZ in the event of a catastrophic failure.

Of course, all of this is only possible thanks to the dynamic routing and load balancing included in PAS!

How PCF Helps: Backup and Restore

Operators have a range of approaches for ensuring they can recover apps, data and the platform itself in case of a disaster. This comes down to two steps: backing up the data, and restoring the data.

To help you achieve these two operations, the Cloud Foundry ecosystem offers BOSH Backup and Restore (BBR). BBR is tailor-made for backing up and restoring distributed systems that change constantly. This mechanism is baked into many PCF components, offering effortless backup and restore of your data and configuration.
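The two steps, backing up and restoring, are only as good as your ability to trust the artifact you restore from. The sketch below is a generic illustration of that idea and is not how BBR works internally: it serializes some state, records a SHA-256 checksum, and refuses to restore if the checksum no longer matches. The `config` contents are hypothetical.

```python
import hashlib
import json

def backup(state: dict) -> tuple[bytes, str]:
    """Serialize state deterministically; return the artifact and its SHA-256 digest."""
    artifact = json.dumps(state, sort_keys=True).encode("utf-8")
    return artifact, hashlib.sha256(artifact).hexdigest()

def restore(artifact: bytes, expected_sha256: str) -> dict:
    """Verify the artifact's checksum before restoring anything from it."""
    if hashlib.sha256(artifact).hexdigest() != expected_sha256:
        raise ValueError("backup artifact is corrupt or has been modified")
    return json.loads(artifact.decode("utf-8"))

# Hypothetical platform configuration to protect.
config = {"orgs": ["payments"], "routes": {"pay.example.com": "card-payments"}}
blob, digest = backup(config)
restored = restore(blob, digest)
print(restored == config)
```

Storing the digest separately from the artifact means a silently corrupted backup is caught at restore time rather than after it has been applied.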

How PCF Helps: Multi-Site Deployments

It’s pretty easy to come up with the business case for multi-site deployments: it’s a hedge against geographic risk, and it gives you an easy way to bring a secondary site online should your primary site fail for any reason.

PCF gives you three deployment options in this area:

  • Active-active is two fully functional PCF platforms, each deployed and primed with applications to serve traffic in case of failure. This offers the highest resiliency, but with the most complexity. Even so, it could make sense for your most critical apps, especially those involved in credit card processing. When seconds of downtime can cost you millions, this is a good choice.

  • An active-passive configuration, by contrast, features one platform deployment that acts as a “standby” site. This is the option we recommend for most enterprises and those critical workloads where 99.9% availability is acceptable.

  • A “stretched” deployment is a single PCF deployment that spans two data centers. This is only really an option in specialized cases.
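Availability targets such as 99.9% translate directly into a downtime budget, which helps when choosing between these options. The arithmetic below assumes a 365-day year (8,760 hours); at 99.9%, that allows roughly 8.76 hours of downtime per year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

def downtime_budget_hours(availability: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime permitted over a period at a given availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_hours

for target in (0.999, 0.9999, 0.99999):
    minutes = downtime_budget_hours(target) * 60
    print(f"{target:.3%}: about {minutes:,.1f} minutes of downtime per year")
```

Each extra "nine" cuts the budget by a factor of ten, which is why the jump from active-passive to active-active is usually reserved for the apps where minutes of downtime carry a real cost.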

[Diagram: the active-passive configuration]
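In an active-passive setup, something has to decide when to stop sending traffic to the primary site and promote the standby. In practice that decision usually lives in DNS or a global load balancer, and data must already be replicated to the standby. The sketch below shows only the decision logic, assuming a hypothetical policy of failing over after three consecutive failed health checks so that a single blip does not trigger a switch.

```python
from dataclasses import dataclass

@dataclass
class FailoverController:
    """Route traffic to the active site; promote the standby after
    `failure_threshold` consecutive failed health checks (hypothetical policy)."""
    active: str = "primary"
    standby: str = "secondary"
    failure_threshold: int = 3
    consecutive_failures: int = 0

    def observe(self, active_healthy: bool) -> str:
        """Record one health check on the active site; return where traffic goes."""
        if active_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                # Promote the standby and demote the failed site.
                self.active, self.standby = self.standby, self.active
                self.consecutive_failures = 0
        return self.active

ctl = FailoverController()
routing = [ctl.observe(ok) for ok in (True, False, False, True, False, False, False)]
print(routing)
```

A healthy check resets the counter, so two failures followed by a recovery do not cause a failover; only the third failure in an unbroken run does.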

Want to know more about these scenarios? There’s a whitepaper for that!

Monitoring

What This Means

You should be able to manage applications and infrastructure in real time. That means quickly pinpointing problematic behavior, and fixing it just as fast.

When you can monitor application performance, track key metrics, and identify the root cause of performance issues, life is good. Specifically, you can:

  • Respond faster to business needs

  • Improve reliability and availability

  • Reduce costs

  • Improve security

  • Improve capacity planning 

How PCF Helps: Application Performance Monitoring

PCF includes PCF Metrics, an add-on module to help you troubleshoot the health and performance of your apps. It displays useful telemetry like:

  • Container Metrics. Three graphs measuring CPU, memory, and disk usage percentages.

  • Network Metrics. Three graphs measuring requests, HTTP errors, and response times.

  • Custom Metrics. User-customizable graphs for measuring app performance, such as Spring Boot Actuator metrics.

  • App Events. A graph of update, start, stop, crash, SSH, and staging failure events.

  • Logs. A list of app logs that you can search, filter, and download.

  • Trace Explorer. A dependency graph that traces a request as it flows through your apps and their endpoints, along with the corresponding logs.

This data is helpfully presented visually on a timeline. That means it’s easy to correlate events, metrics, and logs when investigating an anomaly or service disruption.
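The value of a shared timeline is that you can start from an anomaly in a metric and ask what else happened around the same moment. The sketch below illustrates that workflow with plain Python data, not the PCF Metrics API: it flags the first response-time sample above three times the series median, then lists app events within five minutes of it. The sample values, the 3x factor, and the 5-minute window are all assumptions.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Optional

def first_anomaly(samples: list[tuple[datetime, float]],
                  factor: float = 3.0) -> Optional[datetime]:
    """Timestamp of the first sample above `factor` times the series median."""
    baseline = median(value for _, value in samples)
    for ts, value in samples:
        if value > factor * baseline:
            return ts
    return None

def events_near(when: datetime, events: list[tuple[datetime, str]],
                window: timedelta = timedelta(minutes=5)) -> list[str]:
    """Events within `window` before or after `when`, oldest first."""
    return [desc for ts, desc in sorted(events) if abs(ts - when) <= window]

# Hypothetical response times (ms) and app events around 14:00.
t0 = datetime(2019, 6, 28, 14, 0)
response_ms = [(t0 + timedelta(minutes=m), v)
               for m, v in [(-3, 120.0), (-2, 118.0), (-1, 125.0), (0, 900.0), (1, 870.0)]]
app_events = [
    (t0 - timedelta(hours=3), "app started"),
    (t0 - timedelta(minutes=2), "app updated"),
    (t0 + timedelta(minutes=1), "instance 2 crashed"),
]
spike = first_anomaly(response_ms)
if spike is not None:
    print(spike, events_near(spike, app_events))
```

Using the median as a baseline keeps the threshold from being dragged upward by the spike itself, which a mean would be.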

Of course, PCF also makes all this telemetry available to your preferred APM tool. It’s easy to wire up the platform with popular monitoring tools like AppDynamics, New Relic, Dynatrace, and Datadog, among others.

How PCF Helps: Infrastructure Monitoring

PCF has you covered when it comes to your platform and the underlying infrastructure.

Use PCF Healthwatch to monitor your deployment. This service monitors and alerts on the current health, performance, and capacity of the platform. PCF Healthwatch helps operators understand the operational state of their platform by ingesting and visualizing key performance indicators and key scaling indicators. It also alerts on metrics from core PCF components and performs continuous validation tests.
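Alerting on key performance and scaling indicators typically comes down to comparing each reading against warning and critical thresholds. The sketch below shows that classification step only; the indicator names and threshold values are hypothetical and are not PCF Healthwatch's built-in defaults.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Threshold:
    warning: float
    critical: float

# Hypothetical indicator names and thresholds, chosen for illustration only.
THRESHOLDS = {
    "diego_cell_memory_used_pct": Threshold(warning=80.0, critical=90.0),
    "router_p95_latency_ms": Threshold(warning=250.0, critical=500.0),
}

def evaluate(readings: dict[str, float]) -> dict[str, str]:
    """Classify each reading as ok, warning, critical, or unknown (higher is worse)."""
    status: dict[str, str] = {}
    for name, value in readings.items():
        t = THRESHOLDS.get(name)
        if t is None:
            status[name] = "unknown"
        elif value >= t.critical:
            status[name] = "critical"
        elif value >= t.warning:
            status[name] = "warning"
        else:
            status[name] = "ok"
    return status

print(evaluate({"diego_cell_memory_used_pct": 84.5, "router_p95_latency_ms": 120.0}))
```

Reporting unrecognized indicators as "unknown" rather than "ok" avoids silently ignoring a metric that nobody has set thresholds for yet.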

Or perhaps you’ve already settled on an open-source observability stack? Then you can plug platform metrics into it using Nozzles. Nozzles are programs that consume data from the Loggregator Firehose; they can be configured to select, buffer, and transform that data before forwarding it to other tools, where you can build custom dashboards. To use Prometheus, for example, connect your PCF data via Prometheus Exporters, then visualize performance in Grafana.
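The select, buffer, transform, and forward stages of a Nozzle can be sketched as a small pipeline. This is a simplified illustration under stated assumptions: the `Envelope` type is a hypothetical stand-in for real Firehose envelopes, and the sketch does not connect to Loggregator. The transform stage emits lines in the Prometheus text exposition format (`name{label="value"} number`).

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class Envelope:
    """Simplified, hypothetical stand-in for a Firehose envelope."""
    origin: str
    name: str
    value: float

class Nozzle:
    """Select, buffer, transform, and forward metric envelopes."""

    def __init__(self, keep: Callable[[Envelope], bool], flush_size: int,
                 forward: Callable[[list[str]], None]) -> None:
        self.keep = keep              # select: which envelopes to pass through
        self.flush_size = flush_size  # buffer: batch size before forwarding
        self.forward = forward        # forward: downstream sink for each batch
        self.buffer: list[str] = []

    @staticmethod
    def to_prometheus(e: Envelope) -> str:
        # transform: one line in Prometheus text exposition format
        metric = e.name.replace(".", "_").replace("-", "_")
        return f'{metric}{{origin="{e.origin}"}} {e.value}'

    def consume(self, stream: Iterable[Envelope]) -> None:
        for envelope in stream:
            if not self.keep(envelope):
                continue
            self.buffer.append(self.to_prometheus(envelope))
            if len(self.buffer) >= self.flush_size:
                self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.forward(self.buffer)
            self.buffer = []

batches: list[list[str]] = []
nozzle = Nozzle(keep=lambda e: e.origin == "gorouter", flush_size=2,
                forward=batches.append)
nozzle.consume([
    Envelope("gorouter", "total_requests", 1042.0),
    Envelope("rep", "CapacityRemainingMemory", 2048.0),
    Envelope("gorouter", "latency", 12.5),
    Envelope("gorouter", "bad_gateways", 3.0),
])
nozzle.flush()  # forward whatever is left in the buffer
print(batches)
```

Batching before forwarding trades a little latency for far fewer calls to the downstream tool, which matters when the Firehose is carrying a high volume of metrics.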

Rapid Updates & Patching / Zero Downtime Updates

We’ll cover these two factors together, because in Pivotal Cloud Foundry they are closely linked.

What This Means

Developers should be able to easily update systems with new features on a moment’s notice, or on a regular schedule. Operations staff should be able to apply security updates and patching during business hours. Both scenarios should be routine.

For zero-downtime updates specifically, you should be able to keep critical applications available during maintenance. This often means embracing immutable infrastructure concepts and agile practices like canary deployments and application rollbacks.
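A canary deployment sends a small share of traffic to the new version and compares its behavior with the current version before promoting it or rolling it back. The sketch below shows one possible decision rule; the 1.5x error-rate ratio, the 100-request minimum, and the 0.1% absolute floor are hypothetical policy values, not defaults of any particular CD tool.

```python
def canary_decision(baseline_errors: int, baseline_requests: int,
                    canary_errors: int, canary_requests: int,
                    max_ratio: float = 1.5, min_requests: int = 100,
                    floor: float = 0.001) -> str:
    """Return 'wait', 'promote', or 'rollback' for a canary release.

    Hypothetical policy: once the canary has served `min_requests`, roll back
    if its error rate exceeds the baseline rate times `max_ratio` (or the
    absolute `floor`, whichever is larger, so a zero-error baseline does not
    force a rollback on the first error)."""
    if canary_requests < min_requests:
        return "wait"  # not enough traffic yet to judge
    baseline_rate = baseline_errors / max(baseline_requests, 1)
    canary_rate = canary_errors / canary_requests
    allowed = max(baseline_rate * max_ratio, floor)
    return "rollback" if canary_rate > allowed else "promote"

print(canary_decision(20, 10_000, 1, 500))
print(canary_decision(20, 10_000, 10, 500))
```

Comparing against the live baseline rather than a fixed number means a canary is not blamed for errors that the current version is already producing.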

How PCF helps: Continuous Integration and Continuous Delivery

To be a software-driven enterprise, you need a frictionless path to production for your code. That means the full embrace of continuous integration (CI) and continuous delivery (CD) tools and practices.

Pivotal Cloud Foundry is ready-made for CI/CD tools, particularly Concourse and Spinnaker. We can’t sum it up any better than our own Lyle Murphy did recently:

Your CI tool takes new bits of your custom code, and constantly tests (i.e., “integrates”) them. When your team is ready to take that code to production, you enter the continuous delivery phase. The CD tool takes over, and deploys your code in a safe, secure way. Combine these two together, and you have an automated path to production.

We’ve invested in two open source projects to help organizations achieve this: Concourse and Spinnaker.

Read Lyle’s full post for Pivotal’s take on the tools and cultural practices you need to thrive.

How PCF Helps: A New Patching Paradigm

As we like to say, there are three certainties in life: death, taxes, and patching. So when those CVEs hit (and they will hit), PCF has you covered. As we wrote a while back:

Pivotal helps our customers improve their security posture with a rapid update capability. We give them confidence in their software supply chain (more on this later). The process works like this:

1. A CVE is identified. A fix is supplied, and a new OS image is created.

2. Pivotal conducts end-to-end tests with the new OS image.

3. After the image has passed these tests, it is posted as a new “stemcell” on Pivotal Network.

4. Customers are automatically notified about the availability of these updates.

5. Platform engineers download the new files, then add them to PCF Ops Manager. Mature platforms with multiple PCF foundations tend to have automation pipelines for this flow. These pipelines allow engineers to manage versions and configuration for several foundations centrally. (We recommend Concourse.)

6. Once the deployment is kicked off through Ops Manager or a Concourse pipeline, updated stemcells are automatically rolled out to the Pivotal Cloud Foundry installation.
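An automation pipeline for steps 5 and 6 needs to work out which foundations are behind the latest stemcell. The sketch below assumes dotted numeric version strings; the foundation names and version numbers are hypothetical. It compares versions numerically, because a plain string comparison would wrongly treat "315.9" as newer than "315.70".

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn a dotted version such as '315.64' into (315, 64) for comparison."""
    return tuple(int(part) for part in v.split("."))

def foundations_needing_update(deployed: dict[str, str], latest: str) -> list[str]:
    """Return foundations whose deployed stemcell is older than the latest release."""
    target = parse_version(latest)
    return sorted(name for name, v in deployed.items() if parse_version(v) < target)

# Hypothetical foundations and stemcell versions.
deployed = {"nyc-prod": "315.64", "nj-prod": "315.70", "nyc-dev": "315.9"}
print(foundations_needing_update(deployed, "315.70"))
```

The resulting list could then drive one pipeline job per foundation, which is how central version management across several foundations stays manageable.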

How is all this possible? It’s all thanks to BOSH. BOSH is a project that unifies release engineering, deployment, and lifecycle management of small and large-scale cloud software. BOSH can provision and deploy software over hundreds of VMs. It also performs monitoring, failure recovery, and software updates with zero-to-minimal downtime. BOSH is at the heart of PCF’s zero downtime patching capability.
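BOSH limits the blast radius of an update by rolling it out gradually, controlled by the `canaries` and `max_in_flight` settings in a deployment manifest's `update` block. The sketch below mimics that ordering in plain Python and is not BOSH code: it updates the canary VMs first, then the rest in batches, and halts at the first failed batch. The VM names and the `update` callback are hypothetical, and batches run sequentially here where BOSH would update a batch in parallel.

```python
from typing import Callable

def rolling_update(vms: list[str], update: Callable[[str], bool],
                   canaries: int = 1, max_in_flight: int = 2) -> list[str]:
    """Update `canaries` VMs first, then the rest in batches of `max_in_flight`.

    Stops after the first batch containing a failure, so VMs in later batches
    are never touched and keep serving the previous version.
    Returns the VMs that were updated successfully."""
    batches = [vms[:canaries]] + [vms[i:i + max_in_flight]
                                  for i in range(canaries, len(vms), max_in_flight)]
    updated: list[str] = []
    for batch in batches:
        results = [(vm, update(vm)) for vm in batch]
        updated.extend(vm for vm, ok in results if ok)
        if not all(ok for _, ok in results):
            break  # halt the rollout on any failure
    return updated

vms = [f"diego_cell/{i}" for i in range(5)]
print(rolling_update(vms, update=lambda vm: vm != "diego_cell/3"))
```

Because only a few VMs are ever out of service at once, the remaining instances keep serving traffic throughout, which is what makes zero-to-minimal downtime possible.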

How PCF Helps: Platform Updates and Upgrades

Platform Automation for Pivotal Cloud Foundry (PCF) provides essential building blocks for automating the installation and upgrades of PCF foundations and services. Platform engineers can realize the benefits of small, constant platform upgrades. This way, they significantly reduce risk, streamline upgrading, and improve stability.

This module is purpose-built to simplify how platform teams automate regular platform updates, reducing the time required to stay current. Through repeatable, reusable building blocks, you gain confidence in your upgrade protocol and can scale upgrades to support your enterprise. Best of all, you can perform these operations with no downtime! Speaking of which, let’s move on to the next topic.

No End of Life Components

What This Means

All system components and software must be on current, supported versions, and upgrades to new versions are handled pragmatically. Out-of-support operating systems, runtime frameworks, and backing services are never used. Out-of-date software or older gear can slow recovery when things don’t go as planned.
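Enforcing this rule starts with an inventory that records each component's end-of-support date. The sketch below flags anything already past that date and anything within a warning window; the component names, versions, dates, and the 180-day window are made-up placeholders, not real vendor support dates.

```python
from datetime import date

def end_of_life_report(inventory: dict[str, tuple[str, date]], today: date,
                       warn_days: int = 180) -> dict[str, list[str]]:
    """Split components into expired and expiring-soon lists.

    `inventory` maps component name -> (version, end-of-support date)."""
    expired: list[str] = []
    expiring_soon: list[str] = []
    for component, (version, eol) in sorted(inventory.items()):
        label = f"{component} {version}"
        if eol <= today:
            expired.append(label)
        elif (eol - today).days <= warn_days:
            expiring_soon.append(label)
    return {"expired": expired, "expiring_soon": expiring_soon}

# Placeholder components and dates, not real vendor support dates.
inventory = {
    "os-image": ("1.0", date(2019, 4, 30)),
    "app-runtime": ("8", date(2019, 11, 1)),
    "message-broker": ("3.7", date(2021, 1, 1)),
}
print(end_of_life_report(inventory, today=date(2019, 6, 28)))
```

The warning window gives teams time to upgrade pragmatically, on their own schedule, before a component actually falls out of support.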

How PCF Helps: Application Lifecycle

PCF Buildpacks provide framework and runtime support for applications. They manage framework dependencies when you push applications to PCF.

As a developer, when you deploy an app with the cf push command, the platform automatically associates the app with the right buildpack (e.g., Java, .NET, Node.js). You don’t have to think about dependencies, middleware, or runtime components. The platform does it for you.
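When no buildpack is specified, Cloud Foundry tries the installed buildpacks in priority order, and the first one whose detection step accepts the app is used. The sketch below mimics only that "first match wins" behavior; the detection rules are hypothetical and far simpler than the real detect scripts each buildpack ships.

```python
from typing import Callable, Optional

# Hypothetical, heavily simplified detection rules in priority order.
DETECTION_ORDER: list[tuple[str, Callable[[list[str]], bool]]] = [
    ("java_buildpack", lambda files: any(f.endswith((".jar", ".war")) for f in files)),
    ("dotnet_core_buildpack", lambda files: any(f.endswith(".csproj") for f in files)),
    ("nodejs_buildpack", lambda files: "package.json" in files),
]

def detect_buildpack(filenames: list[str]) -> Optional[str]:
    """Return the first buildpack whose rule matches the pushed files."""
    for buildpack, matches in DETECTION_ORDER:
        if matches(filenames):
            return buildpack
    return None  # no match: staging would fail

print(detect_buildpack(["package.json", "server.js"]))
```

Because order decides ties, an app that happens to match two rules always gets the higher-priority buildpack, which keeps the outcome of each push predictable.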

Buildpacks are deployed (and logged) in a consistent, repeatable way with each cf push command. This consistency makes it easy to audit and control what’s running on the platform at any given time.

Once a buildpack is approved by a security team, developers just have to make sure their actual application code meets the requisite standards before they can deploy to production.
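Since every push records which buildpack was used, auditing comes down to comparing what is running against the list the security team approved. The sketch below assumes you already have a mapping of app names to (buildpack, version) pairs; the app names and version numbers are hypothetical.

```python
def audit(running: dict[str, tuple[str, str]],
          approved: set[tuple[str, str]]) -> list[str]:
    """Return apps whose (buildpack, version) pair is not on the approved list."""
    return sorted(app for app, bp in running.items() if bp not in approved)

# Hypothetical approved buildpacks and running apps.
approved = {("java_buildpack", "4.19"), ("nodejs_buildpack", "1.6.51")}
running = {
    "card-payments": ("java_buildpack", "4.19"),
    "branch-locator": ("nodejs_buildpack", "1.6.40"),
}
print(audit(running, approved))
```

Matching on the exact version, not just the buildpack name, catches apps still running on an older, unapproved release of an otherwise approved buildpack.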

How PCF Helps: OS Lifecycle

It’s 2019; you shouldn’t spend a second managing an operating system. That’s why PCF uses the stemcell model.

A stemcell is a versioned operating system image (either Linux or Windows Server) wrapped with IaaS-specific packaging. Stemcells provide a powerful separation between the OS and the other software packages bundled in a deployment. Apart from that packaging, each stemcell is exactly the same, no matter the underlying infrastructure. This allows for rapid, reliable mobility between different infrastructure targets. Stemcells are key to the application portability delivered by PCF.

For operators, stemcells automate OS patching and updates, keeping your cloud-native platform secure and running smoothly with less effort. Stemcells help operators “repair” and “repave” large fleets of servers, thereby reducing the risk of harmful attacks.

You’re Gonna Need a Platform

Your most important apps need a good home, either in the public or the private cloud. What matters is that you have a consistent, scalable way to solve for the five factors mentioned earlier.

To do that, you’re going to need a platform. Either you build one yourself from public cloud components or open-source projects, or you buy one, like Pivotal Cloud Foundry. When you buy a platform, you gain an incredible head start on your digital transformation.

We detail the reasons why in this recent whitepaper. The TL;DR:

Companies that decide to roll their own platform quickly realize a cold truth: it’s more expensive than they thought. Their investment in platform engineers balloons higher than projected. Multiple product teams are needed. Each team demands multiple engineers and a product manager.

In working with numerous Fortune 500 companies, we’ve found that even a minimal DIY platform effort can take 2 years to build and cost $14M in payroll alone (for 60 engineers). And that’s just to get to a minimum viable product. Can your company wait two years to start your cloud-native transformation in earnest?
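The payroll figure quoted above can be broken down with simple arithmetic. This assumes, as the quote implies, that all 60 engineers work the full two years; it uses only the two numbers in the quote.

```python
payroll_usd = 14_000_000  # payroll figure quoted above
engineers = 60
years = 2

engineer_years = engineers * years             # total effort in engineer-years
cost_per_engineer_year = payroll_usd / engineer_years
print(f"{engineer_years} engineer-years, about ${cost_per_engineer_year:,.0f} each")
```

That works out to 120 engineer-years at roughly $117,000 each, before any of the ongoing costs of running the platform in production.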

Of course, once the platform is live in production, the costs continue to grow.

Adding new features to a custom platform requires a corresponding investment in engineering staff. After a while, your platform team starts to look a lot like a cloud platform software company, with one important difference:

The platform you have built doesn’t actually generate revenue for your core business. It generates expense and quickly accrues technical debt.

This expense can’t be justified, unless your core business is selling cloud platform services. The problem of creating a vibrant, scalable, and secure enterprise platform has already been solved by Pivotal with Pivotal Cloud Foundry (PCF).

Now that you’ve got a framework for evaluating your apps, it’s time to get on with the job of finding them a good home. Here’s how you can start evaluating Pivotal Cloud Foundry: 

About the Author

Fadzi Ushewokunze

Fadzi Ushewokunze is a Senior Platform Architect at Pivotal working with Financial Services companies on Wall Street. Fadzi has worked in the FinTech industry for more than 10 years, gaining experience in security, software development and digital transformation. His passion for FinTech can be traced back to the Asia Pacific region, where he spent significant time working for RSA on security transformation.
