Spring Cloud Stream 1.0.0.RELEASE is now generally available. The release includes new capabilities and improvements to help developers and operators build and operationalize data microservices. Before discussing the new capabilities, let’s start with the basics by reviewing the current state of data in enterprise architecture and where it is heading.
Years’ worth of software components require constant overhauling. With decades-old, legacy-influenced architectures, large-scale enterprises face challenges adapting to modern development methods and practices. As Martin Fowler says, “You don’t decide to refactor, you refactor because you want to do something else, and refactoring helps you do that other thing.” Small, incremental, and continuous change is the key to digital transformation and continuous innovation.
We live in the era of microservices, Cloud Native applications, and containers. This movement is transforming the way enterprises deliver digital initiatives. As enterprises progress on their digital journey, the flexibility to adapt to change and to react quickly to data-driven business decisions matters. Typically, the transformation journey revolves around applications and operationalizing them in a distributed environment. This, however, requires a tool-chain, a framework, and best practices, so that the approach is consistent and easily repeatable across all the teams within the organization. The Spring Boot and Spring Cloud projects address precisely this requirement.
Now—what happens when there’s data present in this architecture? How does data blend into this modern development and deployment paradigm? Can data be part of this continuously deliverable software journey?
Enter Spring Cloud Stream, an event-driven microservices framework, powered underneath by the Spring portfolio of projects, that enables continuous delivery for data-centric applications. The core premise of Spring Cloud Stream is that Spring Integration meets Spring Boot, and together they evolve into a lightweight event-driven microservices framework.
This new GA release allows users to:
- Develop using a simplified programming model
- Create, unit-test, and manage data microservices in isolation
- Focus on application business logic, with messaging middleware access provided out of the box
- Build upon powerful common abstractions to customize core capabilities, including middleware binding, data partitioning, consumer grouping, and a pluggable binder API
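For example, the middleware binding, consumer-group, and partitioning behaviors are all driven by configuration. The following `application.properties` sketch illustrates the style of these settings; the destination names and values are hypothetical, and exact property names should be confirmed against the reference documentation for your version:

```properties
# Bind this app's output channel to a middleware destination (e.g., a Kafka topic)
spring.cloud.stream.bindings.output.destination=orders

# Competing consumers: instances sharing a group divide the messages among themselves
spring.cloud.stream.bindings.input.destination=orders
spring.cloud.stream.bindings.input.group=order-audit

# Producer-side data partitioning, keyed by a payload attribute
spring.cloud.stream.bindings.output.partitionKeyExpression=payload.customerId
spring.cloud.stream.bindings.output.partitionCount=4
```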
Event-driven Stream Processing
Typically, a streaming data pipeline includes consuming events from external systems, data processing, and polyglot persistence. These phases are commonly referred to as Source, Processor, and Sink in Spring Cloud Stream terminology.
- Source: produces events that downstream applications may consume
- Processor: consumes data, processes it, and emits the result to downstream applications
- Sink: consumes from a Source or Processor and writes the data to the desired persistence layer
When Spring Cloud Stream’s Source, Processor, and Sink applications are composed together, they form an event-driven streaming pipeline. This composition is made possible by a pluggable messaging middleware abstraction, with a choice of binders including RabbitMQ, Apache Kafka, GemFire/Apache Geode (incubating), and Redis. With the framework hiding the boilerplate and infrastructure concerns, developers can focus on the core business premise and develop standalone data-centric applications.
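As a minimal sketch of the Processor contract (the class name and transformation logic here are illustrative, not from the release), a transformer microservice might look like:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.messaging.handler.annotation.SendTo;

// Illustrative Processor: consumes from the bound input channel,
// transforms the payload, and emits the result to the bound output channel.
@SpringBootApplication
@EnableBinding(Processor.class)
public class UppercaseProcessorApplication {

    public static void main(String[] args) {
        SpringApplication.run(UppercaseProcessorApplication.class, args);
    }

    @StreamListener(Processor.INPUT)
    @SendTo(Processor.OUTPUT)
    public String transform(String payload) {
        return payload.toUpperCase();
    }
}
```

The binder takes care of creating and wiring the actual middleware destinations behind `Processor.INPUT` and `Processor.OUTPUT`; the handler itself stays a plain Java method.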
At a high-level, a developer journey would be:
- Select either the Stream RabbitMQ or Stream Kafka variant from Spring Initializr
- Generate Spring Boot project
- Open the project in an IDE
- Add the desired Spring Cloud Stream interface:
`@EnableBinding(Source.class)`, `@EnableBinding(Processor.class)`, `@EnableRxJavaProcessor`, or `@EnableBinding(Sink.class)` to the Spring Boot application
- Write tests and incrementally evolve the business logic in compliance with the above contracts
- Build the project to produce an uber-jar: a self-runnable, autonomous data microservice application with an embedded application container
- Run and scale the standalone application on a structured platform such as Cloud Foundry, using the `cf push` and `cf scale` commands, respectively
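Putting the annotation and business-logic steps together for a Sink variant, a sketch might look like the following (the class name and output are illustrative; a real sink would write to a persistence layer rather than stdout):

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Sink;

// Illustrative Sink: consumes from the bound input channel and hands
// the payload to the downstream logic (here, simply standard output).
@SpringBootApplication
@EnableBinding(Sink.class)
public class LoggingSinkApplication {

    public static void main(String[] args) {
        SpringApplication.run(LoggingSinkApplication.class, args);
    }

    @StreamListener(Sink.INPUT)
    public void receive(String payload) {
        System.out.println(format(payload));
    }

    // Business logic kept as a plain method so it can be unit-tested in isolation
    static String format(String payload) {
        return "Received: " + payload;
    }
}
```

Once built (for example, with `mvn package`), the resulting uber-jar can be pushed and scaled on Cloud Foundry with `cf push` and `cf scale -i <instances>`; the application and jar names in such commands are deployment-specific.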
A Success Story: Monolith To Data Microservices
With Spring XD’s Cloud Native redesign, now named Spring Cloud Data Flow, underway, Spring Cloud Stream is the first microservices project to break out of Spring XD’s monolithic, cluster-based orchestration runtime. As Spring Cloud Stream reaches GA, this Cloud Native redesign addresses several development and operational challenges around data pipelines.
First, Spring Cloud Stream’s event-driven model, along with its publish-subscribe semantics, facilitates seamless broadcasting of data from a source to all of its subscribers, reducing the overall complexity of data pipelines. Developers can focus on the core business logic and forget about infrastructure specifics such as binder discovery, channel creation, and payload type-conversion. This also helps with practicing extreme programming methods; in particular, test-driven development of the data-centric business logic is now feasible. The business logic can be incrementally developed, evaluated, and continuously refactored, all without disrupting other data pipeline components or interrupting the data processing layers.
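As an illustration of that test-driven workflow, the business logic of a handler can be exercised as a plain Java method, with no binder, broker, or Spring context in place. The processor class and method below are hypothetical stand-ins, not artifacts of the release:

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class UppercaseProcessorTest {

    // Stand-in for a @StreamListener handler; in the real application this
    // method would be bound to the Processor input and output channels.
    static class UppercaseProcessor {
        String transform(String payload) {
            return payload.toUpperCase();
        }
    }

    @Test
    public void transformUppercasesThePayload() {
        assertEquals("SPRING", new UppercaseProcessor().transform("spring"));
    }
}
```

Because the middleware concerns live entirely in the framework, tests like this stay fast and deterministic, and the logic can be refactored freely without touching the rest of the pipeline.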
Second, given the loose coupling and cleaner separation of concerns, both data streaming and event-driven use cases benefit substantially from the microservices architecture, because any application in the data pipeline can be independently operated, managed, scaled, or monitored.
Lastly, running data-driven applications natively on a structured platform such as Cloud Foundry brings several value-adds, including blue-green deployments, security, metrics, logging, high availability, fault tolerance, and multi-tenancy, among many others.
Spring Cloud Data Flow’s GA release is next in line, building upon Spring Cloud Stream to support composition and orchestration of stream processing pipelines using the DSL, the Shell, RESTful APIs, or Flo, the visual drag-and-drop GUI. It will also allow bi-directional interaction between streaming and batch data pipelines through the Spring Cloud Task interface.
Given the high-throughput stream processing and event-driven use-case expectations, Apache Kafka plays a prominent role in Spring Cloud Stream’s binder story. Adapting to Apache Kafka’s 0.9 APIs, Spring Cloud Stream’s Kafka binder will be redesigned to take advantage of Apache Kafka’s core improvements in partitioning, dynamic scaling, auto-rebalancing, and security.
- Visit the Spring Cloud Stream project site
- Check out the Spring Cloud Stream samples
- Review the list of general-purpose “utility” Spring Cloud Stream Application Starters
- Find out how Spring Cloud Data Flow uses Spring Cloud Stream Application Starters from these samples
About the Author
Sabby Anandan is a Product Manager on the Spring team at Pivotal. He focuses on building products that address the challenges of iterative development and operationalization of data-intensive applications at scale. Before joining Pivotal, Sabby worked in engineering and management consulting positions. He holds a Bachelor’s degree in Electrical and Electronics Engineering from the University of Madras and a Master’s degree in Information Technology and Management from Carnegie Mellon University.