What do Charles Schwab, HCSC, and CoreLogic all have in common? They’re building cloud-native data pipelines with Spring Cloud Data Flow and Pivotal Cloud Foundry. Developers are unleashing the power of enterprise data by connecting data sources in modern ways.
With today’s release of Spring Cloud Data Flow for PCF, that gets even easier. The two products are now tightly integrated. Let’s explore why this is such a powerful combination.
Customers told us that the traditional methods for integrating data between enterprise systems have painful shortcomings. These legacy data integrations tend to be hard to maintain. They don’t work in real time, nor do they support continuous delivery. And connections are often authored with proprietary tooling, adding even more complexity.
Pivotal’s customers wanted a better way to build data pipelines. The recent release of Spring Cloud Data Flow 1.3 delivers just that.
Spring Cloud Data Flow (SCDF) offers a complete toolkit for building data integration and real-time data processing pipelines. Pipelines consist of Spring Boot apps, built using the Spring Cloud Stream or Spring Cloud Task microservice frameworks. Consequently, SCDF is ideal for a range of data processing use cases, from import/export to event streaming and predictive analytics.
Your peers in the enterprise are using SCDF to solve a range of data-centric use cases. Check out these videos for a closer look:
- Charles Schwab describes how they use Spring Cloud Data Flow to identify latency via tracing data.
- HCSC deconstructs an ETL monolith into a low-latency event-driven processing pipeline.
- CoreLogic presents on their Batch Processing implementation that enables real-time processing of events.
Real customers use Spring Cloud Data Flow to drive real solutions! Now, let’s look at what comes out-of-the-box for PCF customers.
Spring Cloud Data Flow for PCF: What’s Included
What can Pivotal Cloud Foundry customers expect when they install the Spring Cloud Data Flow for PCF tile? Here’s a look at the highlights:
Simple service instance setup. The tile uses the standard Cloud Foundry Service Broker model to create Data Flow service instances. In turn, these instances deploy a Data Flow server, a Data Flow Metrics-collector, and Spring Cloud Skipper for granular lifecycle management for the applications in a streaming data pipeline. It’s all pre-configured, and it “just works.”
Integration with the Cloud Foundry UAA security model. As with other services on PCF, UAA handles authentication and authorization to the Data Flow server via the Shell and the Dashboard.
Support for Pivotal’s popular managed data services. By default, SCDF for PCF can deploy these familiar services to support Data Flow service instance operations:
Support for custom data services. Easily configure Data Flow instances to work with custom data services. Build pipelines to integrate with services from public cloud providers (e.g. Azure SQL Database or GCP’s BigQuery) and those managed by your enterprise IT staff (on-premises SAP or Oracle).
Cloud Foundry CLI plugins. Boost developer productivity! Use the plugins to:
- Automatically download and attach to a Data Flow service instance, via the appropriate Data Flow shell binary. No manual steps!
- View aggregated Data Flow server, Data Flow metrics, and Skipper logs to troubleshoot runtime issues.
Dozens of “starters” to get you up and running quickly. There are many pre-built apps, ready to connect to your data. Building custom pipelines is a snap!
With all of this goodness inside, what is the best way to get started? Glad you asked! Let’s dive into a tutorial, and demonstrate how some of the basics can take you a long way.
Getting Started: Creating Your First Data Pipeline
This tutorial will help platform engineers get SCDF for PCF installed via Pivotal Operations Manager and make the `p-dataflow` service available to developers, who can then deploy streams and tasks. To get started, platform engineers can download the Spring Cloud Data Flow for PCF tile from the Pivotal Network and install it. Once installed, the `p-dataflow` service will be available in the PCF marketplace:
Developers may now create a new `p-dataflow` service instance. Configure it to use the default data services by running the following command:
The service instance will be created asynchronously. Once the service instance is created successfully, you can deploy streams or launch tasks using the Data Flow server’s Dashboard or the Shell. We’ll focus on using the command line in this tutorial.
Developers may use two Cloud Foundry CLI plugins. Let’s install the first plugin, to ease Data Flow service instance interactions. This is the Spring Cloud Data Flow for PCF plugin; it will download, install and attach a Data Flow shell to the service instance. This plugin is installed with the following command:
To attach the Data Flow shell, run the following command:
Now that the Data Flow shell is attached successfully, we can install the second plugin: the Service Instance Logging plugin. Use it to troubleshoot data pipeline issues. This plugin streams the logs of the Data Flow service instance’s backing application, along with the logs of its companion applications, Skipper and the Metrics-collector. Install it with the following command:
Once the plugin is installed, you can run the following command to look at the most recent logs:
Or watch the logs streaming in real-time without the
Now that we have a service instance successfully created, and the Cloud Foundry CLI plugins available in our environment, it is time to create our first data pipeline. We can take advantage of the Spring Cloud Stream application starters for SCDF by importing them into our Data Flow service instance. For this example, we’ll use a service instance created with the default data services. Let’s run the `app import` command in the SCDF shell:
If you want to see all of the applications that are available to use for developing data pipelines after the import, run the `app list` command in the SCDF shell.
The next step is to create and deploy a stream. This simple example will take in POST data via HTTP and then split the data into words, which are then logged as output. In the SCDF shell, create the example stream definition using the following command:
Once the stream is defined we can deploy it using the following command:
After the stream is deployed successfully, you will see the following applications in the space where you created the Data Flow service instance:
And here is how it looks on the Data Flow service instance dashboard:
At last, we can test our stream. Open a terminal to watch the `words` log output:
In a separate terminal, use the Data Flow shell to send an HTTP POST request to the `http` source application URL, with a phrase that will be parsed in the stream:
In the `words` log terminal, you should see the following output:
We’ve done it! We have created a stream that will take in text from an HTTP endpoint, parse it into its individual words, and log the parsed words as output. I'm sure you can imagine a set of enterprise scenarios such as taking database record change events and updating downstream systems based on those changes.
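To make the stream's behavior concrete, here is a minimal local sketch in plain Python of the three stages described above: a source that receives posted text, a step that splits it into words, and a sink that logs each word. This is an illustration only, not how SCDF runs the stream. In a real deployment each stage is a separate Spring Boot app, and the apps exchange messages through messaging middleware rather than function calls. The function names (`http_source`, `splitter`, `log_sink`, `run_words_stream`) and the whitespace-based splitting are assumptions made for this sketch.

```python
from typing import Iterable, Iterator, List


def http_source(bodies: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for the http source: emit each posted body as a message."""
    for body in bodies:
        yield body


def splitter(messages: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for the splitting step: emit one message per word."""
    for message in messages:
        # Assumption: words are separated by whitespace.
        for word in message.split():
            yield word


def log_sink(messages: Iterable[str]) -> List[str]:
    """Hypothetical stand-in for the log sink: print each message and collect it."""
    logged = []
    for message in messages:
        print(message)
        logged.append(message)
    return logged


def run_words_stream(bodies: Iterable[str]) -> List[str]:
    """Chain the three stages in order: source, then splitter, then sink."""
    return log_sink(splitter(http_source(bodies)))


if __name__ == "__main__":
    run_words_stream(["This is a test"])
```

Each stage only knows about the messages it receives and emits, which is what lets SCDF swap a source, processor, or sink for another app without touching the rest of the pipeline.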
Let’s Get Building!
With Spring Cloud Data Flow for PCF, you have everything you need to break down ossified data silos once and for all. Join your peers, and build data pipelines to power new, data-driven applications. Run them on a modern cloud platform, and unlock new value for your enterprise!
Ready to integrate your data sources in new and better ways? Download the SCDF for PCF tile. Read the docs here. Then, check out the Spring Cloud Data Flow sample applications site to try out more scenarios.
About the Author
Chris Sterling is Principal Product Manager for Spring Cloud Services at Pivotal (https://www.pivotal.io). Chris published the book Managing Software Debt: Building for Inevitable Change with Addison-Wesley in 2010 to provide a framework for teams and organizations to assess and manage debt in their software systems. Chris has successfully supported organizational transformation across multiple verticals with organizations of 10 up to 800 people. In 2009, Chris co-founded Agile Advantage, a company focused on solving portfolio management problems to leverage the value that Agile teams can deliver, which led to a successful acquisition by Rally Software. Chris brings his diverse experience and deep passion for technology when presenting on topics such as Continuous Delivery, Cloud Native architecture, DevOps, Lean and Agile.