How PCF Metrics Helps You Reduce MTTR for Spring Boot Apps (and Save Money Too)

Judy Wang

The hits just keep coming for Spring Boot.

According to RedMonk’s most recent Language Framework Popularity for Java:

Spring Boot is growing at an exponential rate and is set to become the most popular Java Framework soon.

Many Spring Boot developers run their apps on Pivotal Cloud Foundry. They troubleshoot these apps with PCF Metrics. In recent months, customers urged us to bring Spring Boot and PCF Metrics closer together.

We asked developers “if we made PCF Metrics the ideal companion to Spring Boot, what should it include, and why?”

The answers we heard are reflected in PCF Metrics 1.4, now GA!

 

PCF Metrics Now Displays Spring Boot Actuator Metrics

Spring Boot Actuator metrics are now rendered on the PCF Metrics timeline UI. This rich set of data helps developers diagnose issues quickly.

All Spring Boot Actuator metrics (e.g. System, DataSource, Cache) are now streamed to PCF Metrics. This information combines with the data already visualized in PCF Metrics to give you deeper context. You gain even more insight into how your Spring Boot apps are running in production.


PCF Metrics shows system metrics and Spring Boot Actuator metrics on the same dashboard.

Use this feature to reduce the mean time to resolution (MTTR) for your apps!

No Code Changes Required

What good is this handy integration if it’s a pain to configure? Here’s more good news: setup is effortless.

Spring Boot Actuator metrics are automatically streamed to the Cloud Foundry Firehose. From there, they are forwarded on to PCF Metrics. No code changes required. All you need is a service binding (more on this in a minute).

 

Application-Instance Level Metrics

Apps on Pivotal Cloud Foundry run as one or more application instances. It follows, then, that developers need a way to drill into metrics at the application-instance level. That’s part of PCF Metrics 1.4 too. This granularity helps you understand whether an issue is localized to a single app instance or is more pervasive.
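
PCF Metrics is where you drill into instance-level data, but the cf CLI offers a quick cross-check. As a minimal sketch (the wrapper name instance_status is hypothetical, and the exact columns depend on your CLI version), cf app prints one row per application instance:

```shell
# Hypothetical wrapper around `cf app`, which prints one row per
# application instance, including its state and resource usage.
instance_status() {
  app_name="$1"
  cf app "$app_name"
}
```

Running instance_status kaboom should make it easier to see whether one instance stands out or whether every instance looks unhealthy.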


So how easy is it for a Spring developer to use these new capabilities? Let’s take a look!

 

Analyzing a Spring Boot app with PCF Metrics

Let’s look at a sample Spring app, kaboom.

The deep integration between Spring Boot Actuator metrics and PCF Metrics is enabled by the Metrics Forwarder for PCF tile. This service lets apps emit custom metrics to Loggregator, and lets operators consume those metrics from the Loggregator Firehose.

In our case, we’re consuming these custom metrics in PCF Metrics. So the first thing we need to do is create a service instance of Metrics Forwarder. We’ll do this from the Cloud Foundry CLI:

cf create-service metrics-forwarder unlimited forwarder-service

Let's make sure the app is using Java buildpack 3.18 or higher. Or we can simply push the app with the latest Java buildpack.

cf push kaboom -b https://github.com/cloudfoundry/java-buildpack

Next, we’ll bind kaboom to the service instance.

cf bind-service kaboom forwarder-service

Finally, we restage the app.

cf restage kaboom
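
Before moving on, it can be worth confirming that the binding took effect. The following is only a sketch: the helper name check_forwarder_binding is hypothetical, and it assumes you are targeting the same org and space as the app. Both commands are read-only.

```shell
# Hypothetical helper: confirm the service instance exists and see
# which apps are bound to it.
check_forwarder_binding() {
  service_name="$1"
  # Lists every service instance in the targeted space with its bound apps.
  cf services
  # Shows details for a single service instance.
  cf service "$service_name"
}
```

With the names used above, you would call it as check_forwarder_binding forwarder-service and look for kaboom among the bound apps.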

An SRE has been receiving pages that kaboom has crashed twice in the past hour. She needs to get to the root cause, and uses PCF Metrics to investigate.

From the default system metrics, she sees that her normally performant app has some spikes in both cf.system.latency and cf.system.cpu that correlate with the crashes.


Our SRE has a hunch about what’s going on with kaboom. She decides to dig deeper by viewing the heap.used and mem.free Spring Boot Actuator metrics. Sure enough, her hunch was correct; the app appears to be maxing out on heap usage.


It looks like kaboom is doing something that uses a ton of heap and maxes out. Then the app crashes and restarts. To finish the investigation, our SRE zooms in on a single occurrence, and views the correlated logs.


Thanks to her debug logs, she sees that her app has been processing a ‘kaboom request’ of size 25000, which was too large for the app to handle. Case closed! Now she can review the code and determine the best way to fix the bug.
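
PCF Metrics correlates metrics and logs on one timeline, but a rough cross-check is also possible from the CLI. This sketch is not part of the workflow described above, and the helper name crash_triage is hypothetical:

```shell
# Hypothetical helper for a CLI cross-check of a crash investigation.
crash_triage() {
  app_name="$1"
  # Recent lifecycle events for the app; crashes are recorded here.
  cf events "$app_name"
  # Dump the recent log buffer instead of tailing live logs.
  cf logs "$app_name" --recent
}
```

Calling crash_triage kaboom prints the events first, then the recent logs, so the timestamps of crash events can be compared against the debug output by hand.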

 

Behind the Scenes

How does all of this information get to your fingertips? Here’s a look at how the integration works.

  1. When an app is bound to the Metrics Forwarder service, the app receives credentials and the URL of the Forwarder API. It uses this information to post Spring Boot Actuator metrics to the Metrics Forwarder tile.

  2. The credentials and URL are stored in the app’s VCAP_SERVICES environment variable.

  3. When you cf push or cf restage the app, the Java buildpack downloads an additional metrics exporter jar, and includes it with the application droplet.

  4. When the app is running, the metrics exporter jar reads Actuator metrics from a metrics registry every minute. It then posts the data to the Metrics Forwarder URL.

  5. From there, the Metrics Forwarder service sends this data to Loggregator. PCF Metrics then reads from the Firehose to ingest metrics data for retention and visualization.
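
To see the configuration from steps 1 and 2 for yourself, you can inspect the app’s environment. The sketch below is an assumption-laden check, not an official tool: the helper name has_forwarder_credentials is hypothetical, and it assumes the service label in VCAP_SERVICES matches the marketplace name used with cf create-service (metrics-forwarder).

```shell
# Hypothetical check: does the app's environment mention the forwarder?
# Assumes the VCAP_SERVICES label matches the marketplace service name.
has_forwarder_credentials() {
  app_name="$1"
  # `cf env` prints VCAP_SERVICES along with the app's other variables.
  cf env "$app_name" | grep -q "metrics-forwarder"
}
```

The function exits with status 0 when the label is found, so has_forwarder_credentials kaboom && echo "forwarder bound" gives a quick yes/no answer. Run cf env kaboom directly to read the full entry.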

Want more details? Check out the updated architecture docs for PCF Metrics and Metrics Forwarder.

 

Yes, We <3 Other Frameworks Too

PCF Metrics 1.4 makes life that much easier for Spring developers. But what about developers that use other frameworks? Capitalize on these new features by instrumenting your own custom app metrics.

Just include a library that manages app metrics instrumentation (such as Dropwizard for Java). You also need a second library that exports those metrics to the Metrics Forwarder. Check out the example libraries for Go.

Reduce your IaaS Consumption

What about the price savings mentioned in the title?

Customers asked us to slim down the PCF Metrics footprint. So we did! The default tile configuration now requires roughly 40% fewer VMs than before (7 VMs, down from 12). That lowers your costs in the public cloud, and reduces your chargebacks on-prem.

[Screenshots: IaaS footprint with PCF Metrics 1.3.8, then with PCF Metrics 1.4.0]

Operators can also configure the retention windows in PCF Metrics 1.4 for both metrics and logs. Keep your application telemetry for anywhere between a day and two weeks; it’s up to you. This helps keep persistent disk usage to a minimum, saving you even more on infrastructure costs.

The tile is easier to install too; no configuration is required. Of course, you can choose to customize your installation to fit the needs of your PCF foundation.

 

A Look Ahead: Metrics-Based App Alerting

Next up for the Metrics product team? Examining metrics-based app alerting. How would you use this feature with your apps on PCF? Help us design and build the future of metrics. Shoot us an email with your ideas: pcf-metrics-app-dev@pivotal.io. We look forward to hearing from you!

About the Author

Judy Wang

Judy is a Product Manager for Pivotal Cloud Foundry's application metrics and logs monitoring service, PCF Metrics.
