For Digital Transformers, It's About Fast-Moving Data. Here Are Three Ways to Speed Up.

July 18, 2019 Richard Seroter

I just finished reading the book "AI Superpowers" by Dr. Kai-Fu Lee. Both inspiring and eye-opening, the book explained the rise and impact of Artificial Intelligence. Not surprisingly, CIOs around the world say that AI is a top priority for their organizations. And, lest we forget, AI is powered by data. Lots of it. 

Whether you're building recommendation engines, automating business activities, or just trying to have more timely information for decision making, it's all about processing data faster. It's not optional at this point; your customers are starting to demand that you effectively (and quickly) use data to improve their experience with you.

What You Have, and What You Want

Are you working at the typical enterprise? If so, your systems move data around in batches, use message brokers for a subset of business transactions, and analyze data after it hits the data warehouse.

Survey after survey shows tech people embracing event streaming. Why? The Royal Bank of Canada wanted more real-time data processing among business systems. Anheuser-Busch InBev tracked pallets of beer using RFID and performed data stream analytics to manage inventory and forecast trends. Zillow needed to ingest volumes of diverse data and run it through machine learning models to give customers near real-time home value estimates.

These are good examples of the value of event streaming, but deployment of this pattern in the enterprise is slow going. According to Gartner, more than 80% of participants in the 2018 Magic Quadrant for Data Integration Tools reference survey said they "make significant use of bulk/batch" and are only slowly growing their usage of more real-time mechanisms. Meanwhile, according to respondents to Gartner's annual data integration tools market survey, "47% of organizations reported that they need streaming data to build a digital business platform, yet only 12% of those organizations reported that they currently integrate streaming data for their data and analytics requirements." We have a long way to go.

What does the new world look like? Pivotal SVP of Strategy James Watters had a viral tweet at the recent Kafka Summit in New York.

The gist? The modern company is frequently shipping software that processes data in real-time. Faster data processing means:

  • Real-time integration with partner companies

  • Data ingestion from a wider set of on- and off-premises sources

  • Up-to-date information flowing into business systems

  • Turning information into knowledge, faster

What To Do Next

You don't get to "faster data" by simply doing what you're already doing more quickly. It requires changes. Here are three changes you should make right now.

1) Evolve your architecture and extend it to the cloud

The standard enterprise data architecture isn't architected for data movement and event-based triggers. Rather, it optimizes for centralized data at rest, and pull-based data retrieval.

Today, you have data sourced from more places. Your architecture has to accept data ingress from edge locations, mobile devices, SaaS systems, infrastructure telemetry, and social media. This often means introducing iPaaS (integration platform-as-a-service), cloud storage, and API gateways to your architecture.

iPaaS products (think Dell Boomi or Azure Logic Apps) make it possible to build data processing pipelines out of cloud endpoints. Instead of building custom integrations and data transformations for each cloud or on-premises system, you configure them in the iPaaS. These tools cater to real-time processing and data movement and unlock access to off-premises systems.

Your data storage architecture also needs a refresh. As you collect and store more data for real-time and batch analysis, you'll need elastic storage. Ensure that your modern architecture takes advantage of cloud object storage and databases. Maybe you just use cloud storage for ingress caching or temporary analytics, or maybe Amazon S3 becomes your new data lake. Regardless, cloud storage and databases will play an increasingly important part in your strategy.

Finally, upgrade your microservices machinery, particularly API gateways. Appliances and monolithic instances aren't going to serve you well in a fast-moving, data-rich architecture. As more data comes into systems from outside the network, you'll want lightweight API gateways that scale to handle spikes, offer response caching, and are data-stream friendly. Consider Spring Cloud, which offers developer-friendly and configuration-driven software that caters to continuous delivery. In all cases, this is about evolving your data architecture for scale and speed.
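To make "response caching" concrete, here is a minimal sketch of the idea in plain Python. It is not how Spring Cloud or any particular gateway implements caching; the `ResponseCache` class, its `get_or_fetch` method, and the `ttl_seconds` parameter are hypothetical names chosen for illustration.

```python
import time


class ResponseCache:
    """Hypothetical time-to-live cache a gateway could put in front of a backend."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the behavior is easy to test
        self._entries = {}       # request key -> (expiry time, cached response)

    def get_or_fetch(self, key, fetch):
        """Return a fresh cached response, or call fetch() and cache its result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: the backend is not called
        response = fetch()       # missing or expired: go to the backend
        self._entries[key] = (now + self._ttl, response)
        return response
```

During a traffic spike, repeated requests for the same key within the time-to-live are answered from memory, so the backend sees one call instead of many.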

2) Adopt a (stateful) streaming mindset

To be sure, moving wholesale from batch to streaming is a big change. It requires new technology and, more importantly, a new mode of thinking. Here's the shift you have to make:

 
 

Traditional Batch Processing vs. Modern Stream Processing

What data represents

  • Traditional batch processing: You process things that “are”, such as orders, employee records, and today’s weather.

  • Modern stream processing: You process things that “happen”, such as new orders, an employee promotion, and the latest temperature reading.

Data sets

  • Traditional batch processing: Data is bounded and finite in size. You work with static files.

  • Modern stream processing: Data is unbounded, and conceptually infinite. You work with unending streams of events.

Data processing

  • Traditional batch processing: Scheduled or irregular data dumps get processed in bulk, and results are available once the batch is completed.

  • Modern stream processing: Data is processed as it changes, with constant results available to interested parties.

Storage

  • Traditional batch processing: Store, then process data. Use databases or persistent storage for calculations over the data.

  • Modern stream processing: Process, then optionally store data. Logs and stateful computing options allow for in-memory calculations.

Time considerations

  • Traditional batch processing: When something occurred (“event time”) doesn’t correspond to when data was observed in your system (“processing time”). Batch often works against processing time, which is easier to implement.

  • Modern stream processing: Event time is closer to the processing time. Calculations on the data are often done in time-bounded “windows.” If event time is captured, it’s easier to handle out-of-order events through practices like watermarking.

Role of middleware and clients

  • Traditional batch processing: Business logic and data transformation happen in the ETL or messaging middleware.

  • Modern stream processing: Raw events are stored and made available to clients for later transformation. Clients may be responsible for backpressure handling.
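The "time considerations" row is easier to picture with a small example. What follows is a minimal sketch in plain Python, not the API of any stream processor: it sums values in one-minute windows keyed on event time and uses a simple watermark to decide when a window is complete. The names `process_stream`, `WINDOW_SECONDS`, and `ALLOWED_LATENESS`, along with the sample events, are assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 60      # assumed size of each tumbling window
ALLOWED_LATENESS = 30    # assumed: watermark trails the newest event time by 30s


def window_start(event_time):
    """Return the start of the tumbling window that contains event_time."""
    return event_time - (event_time % WINDOW_SECONDS)


def process_stream(events):
    """Yield (window_start, total) once the watermark passes a window's end.

    `events` is an iterable of (event_time_seconds, value) pairs that may
    arrive out of order.
    """
    open_windows = defaultdict(float)   # window start -> running total
    max_event_time = None

    for event_time, value in events:
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        watermark = max_event_time - ALLOWED_LATENESS

        start = window_start(event_time)
        if start + WINDOW_SECONDS <= watermark:
            continue                     # window already closed: drop the late event
        open_windows[start] += value

        # Emit every window whose end is at or before the watermark.
        closed = sorted(w for w in open_windows if w + WINDOW_SECONDS <= watermark)
        for w in closed:
            yield w, open_windows.pop(w)

    # A real stream never ends; this flush exists only so the demo terminates.
    for w in sorted(open_windows):
        yield w, open_windows[w]


# Events as (event time, value). Note that 50 arrives after 65, and 20 after 100.
events = [(0, 1.0), (10, 2.0), (65, 5.0), (50, 3.0), (100, 1.0), (20, 9.0), (130, 4.0)]
results = list(process_stream(events))
```

With these sample events, `results` is `[(0, 6.0), (60, 6.0), (120, 4.0)]`. The event at time 50 shows up after the one at 65 but still counts toward the first window, because the watermark has not yet passed that window's end. The event at time 20 shows up after the first window has closed, so it is dropped. Real stream processors offer more options for late data; dropping it is simply the easiest policy to show.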

 

3) Cater to developers, not integration experts

Early in my career, I built up expertise with integration middleware. It required specialized training and experience to use this powerful but complex software. Customers of this type of software grew accustomed to the costs (e.g., bottlenecks in delivery and expensive specialists) that accompanied the capability to stitch systems together. Those days are thankfully disappearing.

Now? Connecting systems is the job of most people in technology, but instead of complex software operated by specialists, we’re using developer-friendly software and hosted platforms to quickly assemble our data-driven systems. You get faster delivery of data integration solutions thanks to a gentler learning curve.

Your data transformation empowers developers when you:

  • Offer on-demand access to data-processing infrastructure. Whether you’re deploying RabbitMQ clusters to handle business transactions, an Apache Kafka cluster to cache the event stream, or spinning up Amazon Kinesis for stream analysis, your developers get access to tech they need, when they need it. Use platforms that make it straightforward to create, and manage, this supporting infrastructure.

  • Introduce frameworks tailored to event and data processing. Working directly with message brokers and event-stream processors isn’t easy for everyone. We’re fans of Spring Cloud Stream as a way to talk to messaging systems. Developers don’t need to worry about knowing the specific APIs or configuration details for a given system. They just need to write great Spring code.

  • Consider new protocols for processing real-time data streams. HTTP wasn’t designed for many of the ways we use it! That’s why protocols like gRPC and RSocket should intrigue you. In the case of RSocket, it’s a purpose-built protocol for reactive stream processing. This means native support for stream-based interaction models, connection hopping, and flow control.
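The flow control mentioned in the last point above is the least familiar idea in that list, so here is a rough sketch of it in plain Python. This is not the RSocket API. It only illustrates the demand-based ("request n") style of flow control that reactive stream protocols build on, in which the consumer tells the producer how many items it is ready for. The `Publisher` and `BatchSubscriber` classes and their methods are hypothetical.

```python
from collections import deque


class Publisher:
    """Emits items only in response to demand signalled by the subscriber."""

    def __init__(self, items):
        self._pending = deque(items)
        self._subscriber = None

    def subscribe(self, subscriber):
        self._subscriber = subscriber
        subscriber.on_subscribe(self)

    def request(self, n):
        """Deliver at most n items, then wait for the next request."""
        for _ in range(n):
            if not self._pending:
                self._subscriber.on_complete()
                return
            self._subscriber.on_next(self._pending.popleft())


class BatchSubscriber:
    """A consumer that never accepts more than batch_size items at a time."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.received = []
        self.largest_batch = 0
        self.done = False
        self._publisher = None
        self._current_batch = 0

    def on_subscribe(self, publisher):
        self._publisher = publisher

    def on_next(self, item):
        self._current_batch += 1
        self.largest_batch = max(self.largest_batch, self._current_batch)
        self.received.append(item)

    def on_complete(self):
        self.done = True

    def run(self):
        # Ask for one batch, handle it, then ask for the next.
        while not self.done:
            self._current_batch = 0
            self._publisher.request(self.batch_size)


subscriber = BatchSubscriber(batch_size=3)
Publisher(range(7)).subscribe(subscriber)
subscriber.run()
```

After the run, `subscriber.received` holds all seven items in order and `subscriber.largest_batch` is 3: the publisher never pushed more than the subscriber asked for. That is the property that keeps a fast producer from overwhelming a slow consumer.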

To get better business outcomes through software, you’ll have to figure out how to get better with data. In most cases, that means processing more of it, faster. This requires evolving your architecture, adopting a streaming mindset, and improving your developers’ experience. We can help!


About the Author

Richard Seroter

Richard Seroter is the VP of Product Marketing at Pivotal, a 12-time Microsoft MVP for cloud, an instructor for developer-centric training company Pluralsight, the lead InfoQ.com editor for cloud computing, and author of multiple books on application integration strategies. As VP of Product Marketing at Pivotal, Richard heads up product, partner, customer, and technical marketing and helps customers see how to transform the way they build software. Richard maintains a regularly updated blog (seroter.wordpress.com) on topics of architecture and solution design and can be found on Twitter as @rseroter.
