Spring Cloud Data Flow 2.2 Delivers Value-Adds for Ephemeral Microservices on Cloud Foundry and Kubernetes

July 30, 2019 Sabby Anandan

We are pleased to announce the general availability of Spring Cloud Data Flow 2.2!

Operational improvements for ephemeral microservices running on Cloud Foundry and Kubernetes are the primary themes of this release.

  • Observability continues to be a key improvement area, now with the option to interact with Spring Cloud Task and Spring Batch application metrics through Micrometer. In addition, the integration of Grafana widgets in Spring Cloud Data Flow’s dashboard simplifies the overall monitoring experience for Tasks.

  • For users with hundreds of thousands of ephemeral task execution records, there’s now the ability to browse, filter, and delete task executions from the historical snapshots.

  • For users who want to stop a task execution intentionally, there’s an on-demand option to hard-stop a task, or even a composed task along with its parent-child executions.

  • For expedited troubleshooting, a new platform-agnostic log-streaming RESTful endpoint was added to retrieve application logs.

Let’s dive into the details.

Metrics for Spring Batch, Spring Cloud Task, and Composed Tasks

Metrics and monitoring for short-lived ephemeral tasks remain one of the most frequently raised topics in the community and with customers.

Spring Batch added integration with Micrometer in 4.2.x. Building upon this core infrastructure, Spring Cloud Task 2.2.x adds instrumentation to track and record Task execution metrics in the Micrometer backend of your choice (e.g., Prometheus or InfluxDB).

With the foundational release bits aligned, it is now possible to create and launch Task definitions in Spring Cloud Data Flow (SCDF) and interact with the captured metrics through the Micrometer meter registries. See the screencast below from Christian Tzolov showing Task-execution metrics in SCDF with InfluxDB and Grafana in action.
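As a sketch of what this looks like in practice, a task application built on Spring Cloud Task 2.2.x can export its metrics to InfluxDB with the standard Spring Boot Micrometer properties; the URI and database name below are placeholders for your own environment:

```properties
# Enable the InfluxDB meter registry (requires the
# micrometer-registry-influx dependency on the classpath)
management.metrics.export.influx.enabled=true
management.metrics.export.influx.uri=http://localhost:8086
management.metrics.export.influx.db=myinfluxdb
```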

Cloud Native Batch Data Pipelines

Though there’s the increasing evidence for streaming-first architectures, Batch processing is still an essential requirement in the enterprise. Modernization efforts to refactor batch-jobs as cloud-native standalone ephemeral microservice applications are increasing. The following value-adds were added explicitly as part of the direct feedback from customers and the community.

Delete Task Executions

Scheduled batch jobs that run as Tasks in SCDF interact with a relational database to track and persist statistics as historical snapshots, and at times for auditing reasons. The statistics include comprehensive transactional data that tends to grow over time. Hundreds of thousands of Task transaction records is a typical footprint in the enterprise.

Though there’s the need for historical snapshots, we have feedback from multiple sources to purge the old task executions. Many of the users are manually deleting older transaction records using custom SQL queries to keep a manageable footprint.

We are solving this with a new RESTful endpoint that can selectively delete task executions by date range or in bulk. The SCDF dashboard integrates with the API so that the historical footprint can be removed interactively and gracefully.
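As an illustration, the new endpoint can be driven from any HTTP client. The sketch below builds such a request with the Python standard library; the base URL, the execution IDs, and the exact path shape are assumptions here - consult the SCDF 2.2 REST API guide for the authoritative contract.

```python
from urllib.request import Request

# Hypothetical SCDF server and task-execution IDs; adjust for your environment.
SCDF_URL = "http://localhost:9393"
execution_ids = [101, 102, 103]

# Build (but do not send) a DELETE request against the task-executions resource.
req = Request(
    f"{SCDF_URL}/tasks/executions/{','.join(str(i) for i in execution_ids)}",
    method="DELETE",
)
print(req.get_method(), req.full_url)
```

In a live environment, `urllib.request.urlopen(req)` (or any HTTP client) would submit the request, subject to whatever security setup the SCDF server is configured with.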

Stop Task Executions

Similar to deleting historical Task executions from the database, we have heard from users who want to hard-stop a running Task.

There are a variety of reasons why a hard-stop can be useful. Here are a few examples:

  • Consider a Composed Task with three batch-job steps set up to run in sequence, one after the other. If you intended to stop at the first step, you had no option to halt that step, nor any of the additional steps that would launch afterward. This leads to unintentional launches and can corrupt business data.

  • Another typical case we see in the field is when someone accidentally launches a Task (or a Composed Task) with the wrong task parameters and arguments. Once again, it wasn’t possible to hard-stop the Task execution to prevent downstream anomalies.

We are pleased to announce a new RESTful endpoint that finds a Task and its parent-child executions and stops them together on Cloud Foundry or Kubernetes. Furthermore, the SCDF dashboard provides a single-click option on the Task executions page to stop them interactively.
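The stop operation is likewise exposed over HTTP. The snippet below sketches the request an HTTP client would issue; the path and the use of POST are assumptions about the endpoint shape - the SCDF 2.2 REST API guide documents the exact contract.

```python
from urllib.request import Request

# Hypothetical SCDF server and the ID of the running task execution to stop.
SCDF_URL = "http://localhost:9393"
execution_id = 42

# Build (but do not send) the stop request; per the release notes, SCDF
# resolves any parent-child executions behind this ID and stops them together.
req = Request(f"{SCDF_URL}/tasks/executions/{execution_id}", method="POST")
print(req.get_method(), req.full_url)
```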

Pluggable Composed Task Runner

Composed Tasks have become the de facto approach to orchestrating complex, connected batch-job topologies. In response to popular demand to alter the choreography of Composed Tasks, we have simplified the extension model: users can plug in a custom Composed Task Runner (CTR) to customize particular Task-launch behavior, while defaulting to the out-of-the-box CTR that we ship for the remaining Task launches.

Streaming/Batch Applications Logs

Developers love tools to troubleshoot applications - really, anything that improves the feedback loop. While designing and deploying streaming and batch data pipelines in SCDF is easy to get started with, when the apps in a data pipeline fail to start, the logs are the first place developers dig into.

In 2.2, we are adding a new RESTful API to retrieve the logs of streaming/batch applications running locally, in Cloud Foundry, or in Kubernetes. The platform-agnostic API endpoint can bootstrap with default roles/authorizations, or it can be customized with the desired security setup. With the API endpoint accessible to the logged-in user, application logs are just a few clicks away in the dashboard.
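As a sketch, fetching logs then becomes an ordinary GET against the new endpoint. The stream name and resource path below are illustrative assumptions; see the SCDF 2.2 REST API guide for the exact path and the security requirements in effect.

```python
from urllib.request import Request

# Hypothetical SCDF server and stream definition name.
SCDF_URL = "http://localhost:9393"
stream_name = "mystream"

# Build (but do not send) the log-retrieval request; GET is the default method.
req = Request(f"{SCDF_URL}/streams/logs/{stream_name}")
# In a live environment, urllib.request.urlopen(req) would return the
# aggregated application logs for the apps in the stream.
print(req.get_method(), req.full_url)
```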

Operators can also interact directly with the API endpoint to drain application logs into the Application Performance Management (APM) tooling of their choice.

Check out the screencast from Gunnar Hillert where he walks through the task execution deletes, stops, and the streaming logs.

Kubernetes Deployments

The Spring Cloud Deployer Kubernetes implementation [see: Deployer SPI] provides the facade that bootstraps applications with the relevant configuration overrides and deploys them to Kubernetes. SCDF relies on this deployer implementation to deploy the streaming/batch applications in a data pipeline as bare pods to Kubernetes. In SCDF 2.2, the deployer implementation adds options to influence the deployment of streaming and batch data pipelines with:

  • nodeSelector: Pin the streaming/task deployments to a unique node.

  • secretKeyRef: Inject secured credentials to the deployed streaming/task pods.

  • configMap: Resolve configurations at runtime to alter the stream/task behavior.

  • securityContext: Deploy a stream/task pod with a specific user or group-id.
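For example, these options can be supplied as deployer properties when deploying a stream or launching a task. The property keys, the app name `time`, and all values below are illustrative assumptions - consult the Spring Cloud Deployer Kubernetes documentation for the exact key names and the shape that secret/configMap references expect:

```properties
# Pin the "time" app to nodes carrying a matching label (illustrative values)
deployer.time.kubernetes.node-selector=disktype:ssd

# Inject a credential from a Kubernetes Secret as an environment variable
deployer.time.kubernetes.secret-key-refs=[{envVarName: 'DB_PASSWORD', secretName: 'db-secret', dataKey: 'password'}]

# Run the pod with a specific user and group ID
deployer.time.kubernetes.pod-security-context={runAsUser: 65534, fsGroup: 65534}
```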

Shipping with Confidence

With a two-month cadence for minor releases, we needed to further improve our confidence in shipping quality software. To that end, we have invested in building out the infrastructure to run end-to-end SCDF acceptance tests of streaming and batch data pipeline deployments running in Local, Pivotal Cloud Foundry (PAS - 2.5.x), Kubernetes (GKE - 1.11.x, 1.12.x, and 1.13.x), and Pivotal Container Service (PKS - 1.11.x, 1.12.x, and 1.13.x). The goal is to continue to automate the streaming and batch topology deployments - hence the increased confidence in incremental release deliverables.

Deeper Integration with Pivotal Cloud Foundry and Kubernetes

The commercial version of SCDF for Pivotal Cloud Foundry will build upon the 2.2 GA release in the coming weeks. Likewise, the SCDF helm-chart for Kubernetes will update to 2.2 GA shortly. Stay tuned.

Join the Community!

We can’t wait for you to try out Spring Cloud Data Flow 2.2, and if you want to ask questions, or give us feedback, please reach out to us on Gitter, StackOverflow, or GitHub. We’ll also be at SpringOne Platform 2019 – the premier conference for building scalable applications. Come meet us there and join thousands of like-minded Spring and Java developers to learn, share, and have fun in Austin, TX from October 7th to 10th. Use the discount code S1P_Save200 when registering to save money on your ticket.

About the Author

Sabby Anandan

Sabby Anandan is a Product Manager on the Spring team at Pivotal. He focuses on building products that address the challenges of iterative development and operationalization of data-intensive applications at scale. Before joining Pivotal, Sabby worked in engineering and management consulting positions. He holds a Bachelor’s degree in Electrical and Electronics from the University of Madras and a Master’s in Information Technology and Management from Carnegie Mellon University.
