Audit Trails, New GUI, Kubernetes CronJob Integration, Streaming Application DSL, and More! Spring Cloud Data Flow 1.7 is GA.

October 26, 2018 Sabby Anandan

We are pleased to announce the general availability of Spring Cloud Data Flow 1.7!

Spring Cloud Data Flow (SCDF) is a toolkit for building data integration and real-time data processing pipelines to support IoT and other data-intensive applications.

You can download the Local, Cloud Foundry and Kubernetes releases from the Spring repository right now.

What’s New in SCDF 1.7?

Spring Cloud Data Flow 1.7 includes enhancements with support from other Spring projects, audit-trails for “streams”, “tasks”, and “schedules”, and a new dashboard. Further, Kubernetes users can now rely on a native `CronJob` spec integration to define and schedule batch-jobs. There’s also a brand new streaming-application DSL to define Kafka Streams applications in a data pipeline. Finally, the release improves support for file-ingest data processing workloads.

Let’s dive into each of the key features of this release.

Ecosystem / Related Projects

Spring Cloud Stream: Now with New Binders for Amazon Kinesis and Google PubSub

Developers use Spring Cloud Stream to build highly scalable event-driven microservices that communicate through shared messaging systems. In the Fishtown release, we’ve added even more messaging systems as pluggable options! Let’s take a peek at what’s available and what is coming next.

Available Binder Implementations

The RabbitMQ, Apache Kafka, and Kafka Streams binder implementations are well-known to the community. We continue to improve the developer experience when using the binders.

Artem Bilan recently announced the 1.0 GA release of the Amazon Kinesis Binder. This is yet another binder implementation to graduate with the kind involvement of the community. Thank you, community, for all you do!

The partner-implemented binders include Google PubSub by Google and Azure Event Hubs by Microsoft (in development).

Kafka Streams

We all know that Kafka Streams has become a popular component for stateful stream processing. The Fishtown release-train includes Spring adaptations of Kafka Streams’ state stores, Interactive Queries, KTable, and GlobalKTable abstractions.

Spring Cloud Function and Spring Cloud Stream

An exciting newer approach to building “business logic” as an implementation of `java.util.function` functional interfaces is in the works. Oleg Zhurakousky leads an initiative to bring Spring Cloud Function’s programming model into Spring Cloud Stream natively. It completely transforms the way a developer writes streaming applications. Developers can laser-focus on business logic. And now, they can “compose” multiple functional units in an application too. Stay tuned for a blog on this subject very soon!
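To make the idea of composing functional units concrete, here is a minimal sketch that uses only the JDK’s `java.util.function.Function`. The class and field names (`ComposedFunctions`, `TRIM`, `UPPERCASE`, `PIPELINE`) are hypothetical and chosen for illustration; the sketch does not show how Spring Cloud Function or Spring Cloud Stream bind functions to messaging destinations.

```java
import java.util.function.Function;

// Hypothetical example class; not part of Spring Cloud Function or Spring Cloud Stream.
class ComposedFunctions {

    // First unit of business logic: strip surrounding whitespace from the payload.
    static final Function<String, String> TRIM = String::trim;

    // Second unit of business logic: convert the payload to upper case.
    static final Function<String, String> UPPERCASE = String::toUpperCase;

    // Compose the two units into one; TRIM runs first, then UPPERCASE.
    static final Function<String, String> PIPELINE = TRIM.andThen(UPPERCASE);

    public static void main(String[] args) {
        // Prints "HELLO SCDF"
        System.out.println(PIPELINE.apply("  hello scdf  "));
    }
}
```

Each unit stays a small, independently testable function, and the composed pipeline is itself just another `Function`.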

Even More “Utility Applications”

Building upon Spring Cloud Function, Spring Cloud Stream, and the programming model improvements discussed in the previous section, there’s more exciting news for Spring Cloud Data Flow users. With the framework providing composability as an option, the out-of-the-box applications are now easily composable. Thanks to the customers and community members who have given us feedback in this area so far. It was a long time coming, and we look forward to announcing it in the coming weeks.

As part of the continuing development efforts, the Darwin SR2 and Dearborn SR1 releases are available. Planning for the next release-train (stream: Einstein | task: Elston) is in the works; it will bring Spring Boot 2.1 compatibility.

Spring Cloud Skipper: Deeper Support for Cloud Native Apps

Publishing applications to Cloud Foundry in their native form has become easier with a new `kind` named `CloudFoundryApplication`. It is available in the 1.1 GA release of Spring Cloud Skipper. The newer model deploys applications using Cloud Foundry’s application manifests.

To support config-server use-cases, where application properties are typically managed centrally in a GitHub repository, we added a new `--force` option. It allows a package to be redeployed in a rolling-upgrade manner. The option is also supported in the REST API, so it can be used in CI automation. Likewise, it bubbles up to Spring Cloud Data Flow, so a streaming pipeline (made of apps) running in production can be “force” upgraded in a rolling-upgrade manner.

Spring Cloud Task: Now with Distributed Tracking of Task Executions

What’s Spring Cloud Task? A framework for short-lived microservices to help you with finite data processing use-cases. Spring Cloud Task is the foundation for the batch-processing experience in Spring Cloud Data Flow. The module makes it easy to operate your batch-jobs at scale because it includes rich lifecycle management and powerful orchestration abilities.

Specifically, for file-ingest and data processing use-cases, there is often a need to track the task’s execution status. Those of you doing batch-processing in Spring Cloud Data Flow requested this common feature. You asked, we delivered! Spring Cloud Task 1.3 includes the improvement, so the custom Task applications defined in SCDF can take advantage of it along with the Spring Cloud release-train upgrades.

A New Audit Trail Answers the Question “Who Did What, and When?”

When doing an audit, it is critical to understand the historical context in which an action was performed. This is no different for a class of actions in a data pipeline. To that end, a new audit-trail feature is available in SCDF 1.7.

The dashboard makes it easy to define and operate on the data pipelines. Now, it’s also easy to track the historical audit-trails. Streams, tasks, and schedules natively integrate into this workflow with no extra dependencies. Check out the screenshot below:

New Dashboard Simplifies How You Work with SCDF

Yes, you read it right - SCDF 1.7 includes a snazzy new dashboard! The GUI is more user-friendly, more interactive, and more vibrant. As you can guess, a lot of work took place behind the scenes to make this happen (foundation-level changes, refactoring, etc.).

Here’s a short screencast from Gunnar Hillert showing the new experience.

Go ahead and try it today - see it for yourself. We are always looking for feedback, so let us know what you think.

Scheduling Batch Jobs in Kubernetes

In SCDF 1.6, we introduced the ability to schedule batch-jobs/tasks in Pivotal Cloud Foundry. In this release, we offer this feature for Kubernetes! The implementation builds upon the Kubernetes CronJob spec. Users can work with batch-jobs/tasks from SCDF’s dashboard to apply and operate on schedules defined as cron expressions.
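For readers unfamiliar with the Kubernetes CronJob resource, the sketch below renders a minimal manifest from a name, a five-field cron expression, and a container image. It is an illustration only: the `CronJobManifest` class, the `nightly-etl` name, and the `example/etl-task:latest` image are hypothetical, the `batch/v1` API version is an assumption that depends on your cluster version, and the output is not what SCDF itself generates.

```java
// Hypothetical helper for illustration; SCDF creates its own CronJob resources
// and this is not its code.
class CronJobManifest {

    // Renders a minimal Kubernetes CronJob manifest for the given resource name,
    // five-field cron expression, and container image.
    static String render(String name, String schedule, String image) {
        return String.join("\n",
            "apiVersion: batch/v1", // assumption: older clusters use a beta API version
            "kind: CronJob",
            "metadata:",
            "  name: " + name,
            "spec:",
            "  schedule: \"" + schedule + "\"",
            "  jobTemplate:",
            "    spec:",
            "      template:",
            "        spec:",
            "          restartPolicy: OnFailure",
            "          containers:",
            "            - name: " + name,
            "              image: " + image);
    }

    public static void main(String[] args) {
        // "0 2 * * *" means: every day at 02:00.
        System.out.println(render("nightly-etl", "0 2 * * *", "example/etl-task:latest"));
    }
}
```

The `schedule` field carries the cron expression, and `jobTemplate` describes the job that Kubernetes starts on each tick.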

Streaming Application DSL For Applications With Multiple Input/Output Channels

Using Kafka Streams as a processor? Then you know it’s common for the processor to subscribe to multiple Kafka topics to perform correlation and aggregation. The same can be said about the computed outcome, where downstream consumers can subscribe to multiple outputs from the processor. While it was possible to build a Spring Cloud Stream application following this pattern previously, it gets easier in this release. SCDF 1.7 adds a Streaming Application DSL. Users can now drag-and-drop applications with multiple input and output destinations in the data pipeline. Here’s a short screencast from Andy Clement of this scenario in action:

The new DSL is also useful in the context of polyglot workloads, where one or more processors in a data pipeline are not necessarily Java applications (Python is a popular choice). Such applications can publish and subscribe directly to destinations (a.k.a. Kafka topics) defined in SCDF.
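The multi-input pattern described above can be pictured with a plain Java stand-in: a processor that correlates two inputs by key and emits one output. This is a sketch under stated assumptions, not Kafka Streams or Spring Cloud Stream code; the `OrderEnricher` class, its `join` method, and the order and customer data are hypothetical, and in-memory maps stand in for topics.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: in-memory maps stand in for two input topics and one output topic.
class OrderEnricher {

    // Correlates orders (orderId -> customerId) with customers (customerId -> name)
    // and returns orderId -> customer name. Orders whose customer is unknown are skipped.
    static Map<String, String> join(Map<String, String> orders, Map<String, String> customers) {
        Map<String, String> enriched = new LinkedHashMap<>();
        orders.forEach((orderId, customerId) -> {
            String name = customers.get(customerId);
            if (name != null) {
                enriched.put(orderId, name);
            }
        });
        return enriched;
    }

    public static void main(String[] args) {
        Map<String, String> orders = new LinkedHashMap<>();
        orders.put("order-1", "customer-1");
        orders.put("order-2", "customer-9"); // no matching customer

        Map<String, String> customers = new LinkedHashMap<>();
        customers.put("customer-1", "Ada");

        // Prints {order-1=Ada}
        System.out.println(join(orders, customers));
    }
}
```

In a real pipeline, each input and output would be a named destination that the DSL wires between applications.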

Cloud Native File Ingest and Processing

As we work with customers on their modernization journey, we often encounter repeating use-cases and solve problems we’ve already seen before. File ingest and data processing in the cloud is one such frequent use-case. The 10,000-foot definition of the use-case looks like this:

“As a user, I’d like to monitor for new file events from a remote SFTP source, and on each new event, I’d like to launch a batch-job to kick-off an ETL operation.”

Although this problem has been solved many times in on-premises scenarios, doing this on cloud platforms brings new challenges:

  • Multiple SFTP hosts and directories: The need to effectively monitor multiple SFTP hosts and their directories recursively.

  • Ephemeral volume-drives: Modern cloud platforms often treat the application container and its attached volume as ephemeral. Files can be large and there’s a risk that the container will become unresponsive. Reprocessing might become necessary in such scenarios.

  • Quotas: Traffic bursts may occur when there are hundreds of new file events. Preventing saturation of available resources also becomes a much-needed feature.

These are hard problems. To address these challenges, we have made several improvements to different components in the Spring Cloud Data Flow ecosystem: a new app (tasklauncher-dataflow), improved integration with persistent volume services, and a native facility for throttling concurrent task launches.
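To illustrate the general idea behind throttling concurrent task launches, here is a minimal sketch that caps in-flight launches with a JDK `Semaphore`. It is not the SCDF or tasklauncher-dataflow implementation; the `LaunchThrottle` class and its `tryStart` and `finished` methods are hypothetical names used only for this example.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch; not the SCDF or tasklauncher-dataflow implementation.
class LaunchThrottle {

    // One permit per task that is allowed to run at the same time.
    private final Semaphore permits;

    LaunchThrottle(int maxConcurrentLaunches) {
        this.permits = new Semaphore(maxConcurrentLaunches);
    }

    // Reserves a slot and returns true if a launch is allowed right now;
    // returns false when the limit is reached so the caller can retry later.
    boolean tryStart() {
        return permits.tryAcquire();
    }

    // Frees a slot once a task execution has finished.
    void finished() {
        permits.release();
    }

    public static void main(String[] args) {
        LaunchThrottle throttle = new LaunchThrottle(2);
        System.out.println(throttle.tryStart()); // true: first slot taken
        System.out.println(throttle.tryStart()); // true: second slot taken
        System.out.println(throttle.tryStart()); // false: limit of 2 reached
        throttle.finished();                     // one task completes
        System.out.println(throttle.tryStart()); // true: slot is free again
    }
}
```

With a cap like this, a burst of file events leaves the surplus launch requests waiting for a retry instead of saturating the platform’s resources.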

Look out for a blog detailing the reference architecture coming out soon. In the meantime, if you’d like to discuss the solution, please join the Gitter channel and start a dialog.

Next Up: Deeper Integration with Pivotal Cloud Foundry and Kubernetes

Our commercial version of SCDF, Spring Cloud Data Flow for PCF, will be upgraded to the 1.7 GA release shortly. This module includes all the open-source features discussed here while automating the deployment of SCDF and its dependent services. From there, developers can use PCF’s Apps Manager module to deploy their own SCDF instances.

SCDF for PCF also includes end-to-end single sign-on and multi-tenancy features so enterprises can get started quickly, and operate at scale. The tile also brings automation and rolling-upgrades to the underlying stack versions.

We’re making good progress on the Kubernetes front too. Spring Cloud Data Flow’s Helm chart has been proposed to move out of incubation to the stable repository. In addition, we plan to build an operator to further improve the provisioning and automation experience on Kubernetes.

Join the Community!

Reach out to us on Gitter, Stack Overflow, or GitHub. Lastly, please try out Spring Cloud Data Flow 1.7, ask questions, and give us feedback.

About the Author

Sabby Anandan

Sabby Anandan is a Product Manager on the Spring Team at VMware. He focuses on building products that address the challenges faced with iterative development and operationalization of data-intensive applications at scale. Before joining VMware, Sabby worked in engineering and management consulting positions. He holds a Bachelor’s degree in Electrical and Electronics from the University of Madras and a Master’s in Information Technology and Management from Carnegie Mellon University.
