Introducing Spring Cloud Data Flow

September 15, 2015 Sabby Anandan

Large-scale online service architectures for enterprise applications have been validated this year, through the use of composable microservices on structured platforms. The most powerful and successful products for continuous delivery of these enterprise applications are Spring Boot, with over 1.4 million downloads per month, Spring Cloud, which includes the NetflixOSS components, and Pivotal Cloud Foundry. Together, these products comprise the Pivotal Cloud Native stack.

When enterprise developers think of microservices, they think RESTful services and dynamic service discovery, precisely what Spring Boot and Spring Cloud provide. But enterprise applications don’t stop at these basics, so we have continued to define and refine the Pivotal stack to bring the benefits of our structured platform to the full range of development scenarios.

Today we announce Spring Cloud Data Flow: Cloud Native composable microservices for ingesting, transforming, storing, and analyzing data. Spring Cloud Data Flow enables developers to build both real-time and batch processing applications using the same simple-yet-powerful model as their Spring Boot RESTful microservices, and run these applications on the Pivotal Cloud Foundry structured platform.

From Spring XD to Data Flow

We originally designed Spring XD as a standalone product for easily building sophisticated, distributed data pipelines for real-time and batch processing. The 1.x architecture has proven itself to be a powerful tool for a range of applications, including traditional enterprise ETL, connected vehicle data collection, and real-time analytics. Our experience with 1.x, Spring Boot, and Pivotal Cloud Foundry revealed new ways to take a Cloud Native approach:

  • New Requirements: It is complex to create and maintain an integrated pipeline for data workflows. There are market requirements for uninterrupted scaling capabilities, canary deployments, dynamic resource allocation, and distributed tracing. The 1.x architecture did not let us build these capabilities easily.
  • Broader Scope: Big data problems are still integration problems: data must be ingested, scrubbed, enriched, transported, stored, and processed in myriad ways. While we already supported big data use cases, not every integration or batch use case requires Apache Hadoop® or Apache Spark for persistence and crunching data. Yet regardless of scale or application, there are integration requirements for every type and size of enterprise application.
  • Focus: Operational and non-functional capabilities were pioneered and battle-tested by platforms like Pivotal Cloud Foundry. Instead of investing efforts on redundant features, and accumulating technical debt, we wanted to realign and focus on value for customers rather than on the undifferentiated heavy lifting of a runtime platform.
  • Deployment and Operation: Setting up a distributed cluster of Spring XD runtime requires peripherals such as ZooKeeper, a message transport, and a database. Given these moving parts, it is particularly difficult to size the cluster, as the requirements vary depending on scale and throughput expectations. We wanted to rely on elastic scaling capabilities of the platform to simplify operational requirements instead.

Spring Cloud Data Flow is Spring XD reimagined as message-driven microservices: Spring Boot data microservices, coordinated through Spring Cloud Services including Eureka, and deployed on Pivotal Cloud Foundry. The Spring XD runtime is gone, replaced by a service provider interface (SPI) that takes advantage of the native capabilities of the target platform, such as Pivotal Cloud Foundry, Lattice, or YARN. This refactoring produces a radically simpler architecture and unifies traditional RESTful microservices with new, message-driven microservices.
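To make the SPI idea concrete, here is a minimal plain-Java sketch of one deployment contract with a separate implementation per platform. It is an illustration only, not the Spring Cloud Data Flow API: the names DeployerSpiSketch, ModuleDeployer, LocalDeployer, CloudFoundryDeployer, and forPlatform are all hypothetical, and the implementations return a description string instead of calling a real platform.

```java
// Hypothetical illustration of an SPI; none of these names come from
// Spring Cloud Data Flow, and no real platform is contacted.
final class DeployerSpiSketch {

    // The single contract that every target platform implements.
    interface ModuleDeployer {
        String deploy(String moduleName, int instances);
    }

    // Stand-in for running modules on the local machine.
    static final class LocalDeployer implements ModuleDeployer {
        @Override
        public String deploy(String moduleName, int instances) {
            return "local: started " + instances + " x " + moduleName;
        }
    }

    // Stand-in for delegating to a platform such as Cloud Foundry.
    static final class CloudFoundryDeployer implements ModuleDeployer {
        @Override
        public String deploy(String moduleName, int instances) {
            return "cloudfoundry: pushed " + instances + " x " + moduleName;
        }
    }

    // Selects an implementation by platform name; callers only see the interface.
    static ModuleDeployer forPlatform(String platform) {
        switch (platform) {
            case "local":
                return new LocalDeployer();
            case "cloudfoundry":
                return new CloudFoundryDeployer();
            default:
                throw new IllegalArgumentException("Unknown platform: " + platform);
        }
    }

    public static void main(String[] args) {
        // The calling code is identical for both platforms.
        System.out.println(forPlatform("local").deploy("http-source", 1));
        System.out.println(forPlatform("cloudfoundry").deploy("http-source", 3));
    }
}
```

The point of the sketch is that the code requesting a deployment never changes; only the implementation behind the interface differs, which is how scaling and scheduling can be left to the platform.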

A Better XD

With Spring Cloud Data Flow, developers continue to use familiar XD tools such as the Java DSL, the XD shell, the admin UI, Flo, and the REST APIs. At the same time, the redesigned architecture abstracts away the boilerplate configuration, giving developers a cleaner and more reliable way to orchestrate stream and batch data pipelines.

Since integration and batch modules are Spring Boot applications, we can port them as-is and rapidly compose them into data pipelines. Nothing needs to change at the application level; they are automatically compatible with Spring Cloud Data Flow.

Operators need not worry about setting up a distributed cluster anymore. The Spring Cloud Data Flow Admin SPI is a Spring Boot application itself. The Admin SPI is a thin shim that is Spring profile-driven and auto-configured for service discovery, channel bindings, etc. An operator’s only responsibility is to deploy the Admin itself on the targeted deployment environment, such as Pivotal Cloud Foundry, and bind to respective external services, such as Kafka, Redis, and RabbitMQ.

Spring Cloud Stream

While Spring Boot underlies Data Flow, developers can use Spring Cloud Stream to develop message-driven microservices using Spring Integration’s declarative programming model and run them locally, in the cloud, or on Spring Cloud Data Flow. Spring Cloud Stream offers a collection of patterns to quickly build message-driven microservice applications that can independently evolve in isolation. These applications can correspond to source, processor, sink, and job modules in the current Spring XD architecture, but now operate as portable and autonomous deployable units.
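The source, processor, and sink roles can be pictured with a small plain-Java analogy. This sketch deliberately uses no Spring dependencies and is not the Spring Cloud Stream API; the class name StreamModulesSketch, the method names, and the sample messages are all made up for illustration. Each role is an independent unit, and the pipeline is just their composition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Plain-Java analogy only: no Spring dependencies, no Spring Cloud Stream API.
final class StreamModulesSketch {

    // Source: emits messages into the pipeline (here, a fixed list of readings).
    static Supplier<List<String>> source() {
        return () -> List.of("temp=21", "temp=35", "temp=19");
    }

    // Processor: transforms each message without knowing who produced it.
    static Function<String, String> processor() {
        return msg -> msg.toUpperCase();
    }

    // Sink: consumes messages; here it appends them to the given list.
    static Consumer<String> sink(List<String> store) {
        return store::add;
    }

    // Wires source -> processor -> sink in-process. In a real deployment a
    // message transport, not a method call, would sit between the modules.
    static List<String> run() {
        List<String> store = new ArrayList<>();
        Function<String, String> processor = processor();
        Consumer<String> sink = sink(store);
        for (String msg : source().get()) {
            sink.accept(processor.apply(msg));
        }
        return store;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because each role depends only on the message it receives, any one of them could be replaced or scaled without touching the others, which is the property the paragraph above describes as evolving independently.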

Cloud Native Data

Spring Cloud Data Flow is the Cloud Native framework for data, dramatically increasing the scope and power of composable microservices. The Spring Boot, Spring Cloud, and Spring Cloud Data Flow projects provide the foundation for a comprehensive microservice architecture. Running on Pivotal Cloud Foundry, they represent a complete Cloud Native stack for the enterprise, enabling all developers and architects to enjoy the benefits of what’s been learned from the large-scale online service world.


Editor’s Note: Apache, Apache Hadoop, Hadoop and Apache Spark are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.


About the Author

Sabby Anandan

Sabby Anandan is a Product Manager on the Spring team at Pivotal. He focuses on building products that address the challenges of iteratively developing and operationalizing data-intensive applications at scale. Before joining Pivotal, Sabby worked in engineering and management consulting positions. He holds a Bachelor's degree in Electrical and Electronics from the University of Madras and a Master's degree in Information Technology and Management from Carnegie Mellon University.
