We are pleased to announce the general availability of Spring Cloud Data Flow 1.5.
What’s new in 1.5?
Spring Cloud Data Flow 1.5 includes enhancements to the ecosystem (related projects), Metrics Collector, nested-splits support in Composed Tasks, continuous deployment improvements in Skipper, as well as an enhanced user experience.
Ecosystem & Related Projects
Spring Cloud Stream
Spring Cloud Stream 2.0 recently reached general availability (GA). In this next generation, the framework has evolved significantly in many areas, including a complete rewrite of the content-type negotiation strategy, the addition of polling consumers, a new binding actuator endpoint to start, stop, pause, and resume consumers, and metrics-emitter compatibility with Micrometer. All of these features directly apply to streaming workloads in Spring Cloud Data Flow. Join the webinar on June 7th to learn more about the feature improvements.
Spring Cloud Task
Spring Cloud Task 2.0 has also reached GA, with key capabilities that assist with orchestration mechanics in Spring Cloud Data Flow. A notable improvement is the addition of a distributed locking mechanism, which provides a control plane for Task workloads that aren’t meant to be run concurrently. For the other feature improvements, check out the 2.0 release blog.
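To illustrate the idea behind the locking mechanism, here is a minimal in-process sketch in Python. It is not how Spring Cloud Task implements the feature: the real mechanism coordinates separate task executions, whereas this sketch only guards launches inside a single process. The names `SingleInstanceLauncher`, `TaskAlreadyRunning`, and `nightly-import` are hypothetical and chosen purely for illustration.

```python
import threading


class TaskAlreadyRunning(Exception):
    """Raised when a task with the same name is already executing."""


class SingleInstanceLauncher:
    """Hypothetical launcher that rejects concurrent runs of the same task name."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the set of running task names
        self._running = set()

    def launch(self, task_name, body):
        # Claim the task name, or refuse if another execution already holds it.
        with self._guard:
            if task_name in self._running:
                raise TaskAlreadyRunning(task_name)
            self._running.add(task_name)
        try:
            return body()
        finally:
            # Release the claim so that a later launch can proceed.
            with self._guard:
                self._running.discard(task_name)


launcher = SingleInstanceLauncher()


def overlapping_attempt():
    # Tries to start the same task while the first execution is still active.
    try:
        launcher.launch("nightly-import", lambda: "second run")
    except TaskAlreadyRunning:
        return "rejected"
    return "allowed"


outcome = launcher.launch("nightly-import", overlapping_attempt)
```

The second launch is refused while the first one is active, and the name becomes available again once the first execution finishes.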
Spring Cloud Skipper
For Kubernetes users, it is now also possible to define multiple Kubernetes cluster backends (e.g., team1-dev, k8s-qa). The ability to pick and choose the deployment environment for Apps/Packages is a key value-add of Skipper, and this edition brings feature parity between Cloud Foundry and Kubernetes.
Skipper’s backend uses JPA for database interaction. Given the pluggable database support in the JPA layer, we have automated database schema evolution testing against several databases with the help of Concourse. The current GA release is 1.0.4, and it is a highly recommended upgrade.
Spring Cloud Stream App Starters 2.0 release-train, “Darwin”, is approaching general availability. Each of the out-of-the-box applications will be upgraded to pick up the upstream Spring Boot, Spring Cloud Stream, and Spring Cloud changes. A gRPC-processor, along with TensorFlow object-detection-processor and image-recognition-processor applications, joins the Darwin release-train.
Spring Cloud Task App Starters 2.0 release-train, “Dearborn”, is approaching general availability as well. To provide feedback or request feature improvements for this or later release trains, please start a discussion in Gitter or StackOverflow.
Newly Improved Metrics Collector
A companion server for Spring Cloud Data Flow, Metrics Collector gets a facelift with Spring Cloud Stream 2.0 and Micrometer. This release adds support for both 1.x and 2.x based Spring Cloud Stream applications. Regardless of the version, the Metrics Collector probes individual application stats to reconstitute end-to-end metrics at the data-pipeline level.
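As a rough illustration of what rolling individual application stats up to the pipeline level can look like, the following Python sketch sums per-instance rates for each application in a stream. The record layout (`stream`, `app`, `send_rate`, `receive_rate`), the function name `aggregate_rates`, and the sample values are all hypothetical; they do not reflect the Metrics Collector's actual payload format or API.

```python
from collections import defaultdict


def aggregate_rates(instance_stats):
    """Sum per-instance rates into one entry per (stream, app) pair."""
    totals = defaultdict(
        lambda: {"instances": 0, "send_rate": 0.0, "receive_rate": 0.0}
    )
    for stat in instance_stats:
        entry = totals[(stat["stream"], stat["app"])]
        entry["instances"] += 1
        # An instance may report only one of the two rates; treat a missing one as 0.
        entry["send_rate"] += stat.get("send_rate", 0.0)
        entry["receive_rate"] += stat.get("receive_rate", 0.0)
    return dict(totals)


# Made-up sample: two instances of a source app feeding a single sink instance.
sample = [
    {"stream": "orders", "app": "http", "send_rate": 10.0},
    {"stream": "orders", "app": "http", "send_rate": 5.0},
    {"stream": "orders", "app": "log", "receive_rate": 15.0},
]
pipeline_view = aggregate_rates(sample)
```

Comparing the summed send rate of one application with the summed receive rate of the next gives a simple pipeline-level view, regardless of how many instances each application is scaled to.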
Metrics with InfluxDB and Prometheus
Metrics continue to be an essential theme for this release. Building upon Micrometer, we have experimented with end-to-end data pipeline metrics such as send rates, receive rates, error rates, and end-to-end latency, and with correlating them against message broker stats such as publish, consume, and reject rates. With the GA of Spring Cloud Data Flow and the native Micrometer support for the streaming Apps, it's now possible to monitor and understand the overall behavior of a distributed application in a data pipeline.
Check out the Prometheus and InfluxDB solution architectures to get an idea of how the respective backends can be used to build end-to-end visibility into streaming data pipelines. Support for ElasticSearch and Atlas is also coming soon.
Nested Splits and Composed Tasks
Building upon the existing linear-splits and branching capability, support for nested-splits is now also available in 1.5. Nested-splits are particularly important when a number of data processing flows and sub-flows need to be processed and managed as a coherent directed acyclic graph (DAG). For instance, imagine a scenario with three sources of data that need to be processed in parallel, two of which need to be split further to include additional processing steps. The DSL definition for this type of DAG would look like:
<<extractFTP && cleanseFiles || extractS3 && enrichFiles> && mergeFiles || extractHDFS && removeNulls>
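To make the structure of this definition easier to see, here is a minimal Python sketch that parses the subset of the DSL used above (`<` and `>` for a split, `||` between parallel branches, `&&` between sequential steps) into a nested tree. It is an illustrative toy, not Data Flow's own parser, and the function names (`parse`, `first_tasks`, `split_depth`) are hypothetical.

```python
import re

# A token is an operator, an angle bracket, or a task name.
TOKEN = re.compile(r"\s*(&&|\|\||<|>|[A-Za-z_][\w-]*)")


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    """Return a task name, ("flow", [steps]) or ("split", [branches])."""
    tokens = tokenize(text)
    node, pos = _parse_flow(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return node


def _parse_flow(tokens, pos):
    # flow := step ('&&' step)*
    node, pos = _parse_step(tokens, pos)
    steps = [node]
    while pos < len(tokens) and tokens[pos] == "&&":
        node, pos = _parse_step(tokens, pos + 1)
        steps.append(node)
    return (steps[0] if len(steps) == 1 else ("flow", steps)), pos


def _parse_step(tokens, pos):
    # step := task-name | '<' flow ('||' flow)* '>'
    if pos >= len(tokens):
        raise ValueError("unexpected end of definition")
    token = tokens[pos]
    if token == "<":
        node, pos = _parse_flow(tokens, pos + 1)
        branches = [node]
        while pos < len(tokens) and tokens[pos] == "||":
            node, pos = _parse_flow(tokens, pos + 1)
            branches.append(node)
        if pos >= len(tokens) or tokens[pos] != ">":
            raise ValueError("split is missing its closing '>'")
        return ("split", branches), pos + 1
    if token in ("&&", "||", ">"):
        raise ValueError(f"expected a task name or '<', got {token!r}")
    return token, pos + 1


def first_tasks(node):
    """Tasks with no predecessor, i.e. those that can start immediately."""
    if isinstance(node, str):
        return [node]
    kind, children = node
    if kind == "flow":
        return first_tasks(children[0])
    return [task for child in children for task in first_tasks(child)]


def split_depth(node):
    """How deeply splits are nested inside one another."""
    if isinstance(node, str):
        return 0
    kind, children = node
    inner = max(split_depth(child) for child in children)
    return inner + 1 if kind == "split" else inner


dag = parse(
    "<<extractFTP && cleanseFiles || extractS3 && enrichFiles>"
    " && mergeFiles || extractHDFS && removeNulls>"
)
```

For the definition above, the tree has an outer split with two branches. The first branch is itself a flow that begins with an inner split (the FTP and S3 flows) followed by `mergeFiles`, and the second branch is the HDFS flow. `first_tasks(dag)` lists the three extract tasks that can start in parallel, and `split_depth(dag)` reports the nesting level.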
A graphical representation of the DAG would be:
We’ve continued to enhance the Dashboard in order to make it more fluid, lightweight and interactive. Our focus areas included: bulk-operations, context-specific navigation, customized paging, stream-builder optimization, and bringing a crisper look and feel to the overall UI components. We are also committed to continually improving the quality of the UI in future releases.
Take it for a spin and let us know what you like, and what we could improve!
Join the Community!
We are a fully distributed team spanning eight time zones, so one of us is always online. Reach out to us on Gitter, StackOverflow, or GitHub. Lastly, please try it out, ask questions, and give us feedback. We welcome contributions!
About the Author
Sabby Anandan is a Product Manager on the Spring Team at Pivotal. He focuses on building products that address the challenges of iterative development and operationalization of data-intensive applications at scale. Before joining Pivotal, Sabby worked in engineering and management consulting positions. He holds a Bachelor's degree in Electrical and Electronics from the University of Madras and a Master's degree in Information Technology and Management from Carnegie Mellon University.