Spring Cloud Data Flow 1.3: Continuous Delivery, Usability Improvements, and Function Runner

February 2, 2018 Sabby Anandan

We are pleased to announce the general availability of Spring Cloud Data Flow 1.3.

Before we dive into the release highlights, we would like to share pointers to customer presentations from SpringOne Platform. We have had the privilege of hosting customers from different industries to present real-world implementations of Spring Cloud Data Flow and its ecosystem of projects. We would like to thank the teams at Charles Schwab (Latency Tracing with SCDF), CoreLogic (Batch Processing with SCDF), HCSC (Low Latency Event-Driven ETL), and Liberty Mutual (Domain Driven Design with Spring Cloud Stream) for their continued commitment to open source.

Here are the release 1.3 highlights, followed by an overview of each category.

Spring Cloud Data Flow Ecosystem Updates

Continuous Delivery with Spring Cloud Skipper

Security

Dashboard Modernization

Java DSL

Usability Improvements

New Applications

Function Runners

Core Improvements

Spring Cloud Data Flow for PCF

Helm Chart for Kubernetes

Ecosystem Updates

Spring Cloud Stream: In the recent Ditmars release train (roundup), the Apache Kafka and RabbitMQ binder implementations continued to evolve with a variety of stability improvements. A notable feature is the promotion of the Kafka Streams binder implementation to a top-level project. The following screencast walks through the Spring Cloud Stream programming model and how it simplifies the developer experience when building event-driven data aggregates, continuous queries, and stateful stream-processing applications.

Spring Cloud Task: Though the framework did not move to the next minor release, significant effort went into improving Spring Batch and Spring Cloud Task, and their integration with Spring Cloud Data Flow.

Thank you to the users switching from Spring Batch Admin to Spring Cloud Data Flow for all the questions and feature requests. We are carefully curating the requests and plan to deliver them in incremental releases.

Development of both Spring Cloud Stream 2.0 and Spring Cloud Task 2.0 is actively underway. We are aligning closely with Spring Boot's 2.0 release timeline.

Continuous Delivery with Spring Cloud Skipper

A brand new project in the Spring Cloud Data Flow ecosystem!

It is not a trivial task to continuously deliver streaming data pipelines, especially because the requirements depend on how the pipelines are deployed to cloud platforms. Skipper, a lightweight Spring Boot application, was purpose-built to fill this gap in Spring Cloud Data Flow. You can read more about Skipper on the project site and in the reference documentation.

You can build and run a streaming data pipeline made of Spring Cloud Stream apps. However, to perform a rolling upgrade of an app (or a series of them) in a production setting without interrupting data processing, users previously had to switch over to the cloud platform and use its blue-green deployment model. We wanted to simplify that. With the newly added DSL primitives for stream updates, Skipper supports granular application lifecycle operations on Local, Cloud Foundry, and Kubernetes. Here's a snippet showing how granular, application-level updates work.

dataflow:>app register --name transform --type processor --uri maven://com.eg:transformer:0.0.1

dataflow:>app register --name transform --type processor --uri maven://com.eg:transformer:0.0.2

dataflow:>stream create foo --definition "jdbc | transform | mongodb" # uses 0.0.1 (default version)

dataflow:>stream update foo --properties "version.transform=0.0.2" # updates it to 0.0.2

dataflow:>stream rollback foo # rolls it back to 0.0.1

The Shell client, Java DSL, and REST APIs can all perform these new operations. For more details, watch the Skipper and SCDF integration presentations by Mark Pollack at SpringOne 2017.

Security

Skipper is built with OAuth2 support from the ground up.

If SCDF and Skipper are used together (to gain the CD benefits), the OAuth token propagation from SCDF to Skipper is automated. Implicit single sign-on is taken into account, so the backend Skipper calls made from SCDF are automatically protected.

To reuse the existing security infrastructure that was already embedded in SCDF, we have extracted the common security modules into a new project named Spring Cloud Common Security Config. Skipper and SCDF use this library today. We anticipate reusing the library for Metrics Collector, Provenance, and similar companion applications in the future.

Dashboard Modernization

A brand new Angular 4 based Dashboard UI!

A significant improvement has been the adoption of TypeScript. In conjunction with Angular 4, TypeScript is a natural fit for Spring developers and reduces the friction between server-side and client-side code.

Though this was a foundation-level UI stack refresh, the core focus remained the overall interactivity, improved UX workflows in Flo, and the look and feel of the Dashboard. The team also paid attention to state management, cache busting, and test-harness improvements.

Java DSL

The programmatic approach to building data pipelines remains one of the most popular methods in SCDF. DataflowTemplate and DataflowOperations have helped automate the programmatic registration of apps and the definition and deployment of streams and tasks (see example). Improving upon this, we added the Java DSL, which provides a fluent, API-style development experience. The following method builds and deploys a data pipeline made of three data-centric microservice applications.

public void deploySimpleStream() {
    // streamBuilder is typically obtained via Stream.builder(dataFlowOperations)
    StreamApplication source = new StreamApplication("http").addProperty("server.port", 9900);
    StreamApplication processor = new StreamApplication("filter").addProperty("expression", "payload != null");
    StreamApplication sink = new StreamApplication("log");

    Stream simpleStream = streamBuilder.name("simpleStream")
            .source(source)
            .processor(processor)
            .sink(sink)
            .create()
            .deploy();
}

Usability Improvements

The addition of autocomplete support in the Shell brings another level of usability improvement. Pressing TAB suggests the possible "names" or "properties", depending on the context.

A developer typically iterates over the business logic, and when a logic change is ready, they want to register the application with SCDF for a dry-run test. Sound familiar? Yes, we have heard a lot of this from customers and the community. Applications resolved from a Maven repository or HTTP-served location can now always be downloaded afresh, without exceeding the disk-quota limit allocated for Maven caching.

Another popular request from the community was remote-debugging capability. We have got you covered! With the Local server, it is now possible to deploy a stream or task in SCDF so that it orchestrates and spawns the applications with remote debugging enabled. It is then straightforward to attach a debugger from an IDE, add breakpoints, and step through the runtime specifics of the business logic.
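As a rough sketch, enabling the debugger for a single app in a stream could be done through the local deployer's debug settings (the app name transform and port are hypothetical; check the local deployer reference guide for the exact property names in your release):

dataflow:>stream deploy foo --properties "deployer.transform.local.debug-port=5005,deployer.transform.local.debug-suspend=y"

Once the app starts, attach your IDE's remote debugger to port 5005 on the local machine.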

Deployment properties are typically applied when there's a requirement to influence the runtime aspects of the applications. They are relevant for use cases where you want to deploy a stream with extra memory or CPU resources, with scaling characteristics, and so on. Typically, once these deployment properties are standardized, they continue to be reused across recurring topology adjustments or platform updates. Take, for instance, when stream processing is disrupted due to a platform failure, or when streams are migrated from one environment to another (such as DEV → QA → UAT → PROD). It is cumbersome to re-supply those properties over and over. This release adds persistence support for deployment properties, which simplifies the workflows mentioned above: no manual intervention is needed, because the properties can be reused from the deployment repository.
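For illustration (the stream and app names are hypothetical), deploying a stream with extra memory and scaled instances for one of its apps might look like:

dataflow:>stream deploy foo --properties "deployer.transform.memory=2048m,deployer.transform.count=2"

With persistence support, these properties are recorded with the deployment, so a later redeploy or environment migration can pick them up from the deployment repository instead of the operator retyping them.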

New Applications

MQTT-source and MQTT-sink applications can interact with IoT devices.

The TensorFlow processor graduates to a top-level application that, when used in a stream-processing data pipeline, can help with real-time predictive model evaluation. Check out the Twitter-sentiment model-prediction sample to get an idea.

Improving upon the data-science capabilities, Python-HTTP and Python-Jython processors are now available as top-level processor applications.

The new bit.ly links, Celsius.SR1 and Clark.GA, are available for bulk importing and registering all the out-of-the-box applications in SCDF.

Function Runners

It is an excellent time in the industry to be a developer. FaaS platforms have become a mainstream topic of discussion among developers and the community.

Spring Cloud Function was introduced to bring Spring Boot goodness and its developer experience to standalone implementations of business functions. Once you have a series of business functions, you may need to orchestrate them as a series of operations. If you squint, you can see how this resembles Spring Cloud Data Flow's DSL and orchestration primitives.

Building upon this thought process, we have teased out an initial take on running simple business functions in SCDF, with the help of a general-purpose function-runner application. Like our other out-of-the-box applications, the function-runner is a Boot application, built as separate variants for the RabbitMQ and Apache Kafka binder implementations. With this, when creating a streaming pipeline, you can supply a Spring Cloud Function jar and the "function" to run in the DSL. Here's an example of the user experience (see the full sample here).

dataflow:> stream create foo --definition "http | function-runner --function.className=com.example.functions.CharCounter --function.location=file:///<PATH/TO/SPRING-CLOUD-FUNCTION>/spring-cloud-function-samples/function-sample/target/spring-cloud-function-sample-1.0.0.BUILD-SNAPSHOT.jar | log"
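For context, a business function like the CharCounter referenced above is just a plain java.util.function.Function. Here is a minimal sketch of what it might look like (the real sample lives in the spring-cloud-function-samples repository, so details may differ):

```java
import java.util.function.Function;

// Sketch of a business function in the CharCounter style: it receives a
// String payload and returns the number of characters in it. The
// function-runner wraps such a Function and wires it between the
// stream's input and output channels.
public class CharCounter implements Function<String, Integer> {
    @Override
    public Integer apply(String payload) {
        return payload == null ? 0 : payload.length();
    }
}
```

Because the function carries no messaging or platform concerns of its own, it stays trivially unit-testable outside of any stream.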

The developer's responsibility is focused at the level of the business function; the function-runner, along with SCDF, takes care of running it on platforms such as Kubernetes, Cloud Foundry, and others.

Core Improvements

While adding all the exciting new features, we also wanted to address technical debt. Several improvements were made to the core foundation, including the consolidation of redundant service layers, a cleaner separation between client and server components, and overall optimization to eliminate dependency cycles. With these improvements, we hope the community can benefit from extending the core infrastructure around the APIs, Shell client, `DataFlowTemplate`, Dashboard, and DSL.

Spring Cloud Data Flow for PCF

Spring Cloud Data Flow's Cloud Foundry tile has been in closed beta for the last few months. We have iterated on customer and field feedback, and the tile is now set to officially graduate from beta to 1.0 GA status. This release automates provisioning (including the metrics-collector, Skipper, database, and message broker) along with end-to-end OAuth/SSO integration in Cloud Foundry. There are many other value-adds, so stay tuned for a more focused discussion, documentation, and pointers to the tile page in Pivotal Network.

Helm Chart for Kubernetes

Spring Cloud Data Flow's Helm chart will be updated to the latest 1.3 GA release once kubernetes/charts#3525 is merged. With this chart, the latest release of SCDF, along with the companion components (such as the metrics-collector, Skipper, database, and message broker), can be provisioned automatically with the following Helm commands.

helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator

helm repo update

helm install --name scdf incubator/spring-cloud-data-flow --set rbac.create=true

Join the Community!

We are a fully distributed team spanning eight time zones, so one of us will always be online. Reach out to us on Gitter, Stack Overflow, or GitHub. Lastly, please try it out, ask questions, and give us feedback. We welcome contributions!

 

About the Author

Sabby Anandan

Sabby Anandan is a Product Manager on the Spring team at Pivotal. He focuses on building products that address the challenges faced in the iterative development and operationalization of data-intensive applications at scale. Before joining Pivotal, Sabby worked in engineering and management consulting positions. He holds a bachelor's degree in electrical and electronics engineering from the University of Madras and a master's degree in information technology and management from Carnegie Mellon University.
