Moving from Monoliths to Microservices? Don’t Forget to Transform the Data Layer. Here’s How.

May 29, 2019 Gregory Green

Transforming large monolithic applications into microservices is often a worthwhile effort. Particularly when your scenario meets these 6 factors.

Why are engineering teams willing to invest in this effort? After a few decades in production, monolithic applications tend to show their age. They start to show these attributes:

 

  • Too difficult to enhance

  • Changes introduce long testing cycles

  • Different teams with colliding or conflicting requirements, changes and scheduling demands

  • Difficult to scale, becoming a performance bottleneck

 

There is a proven playbook to deal with these applications. But what about the data? Sure, enterprise teams often make an effort to decompose monolithic applications into microservices. However, data management often remains the same.

 

Thankfully, this is changing. Many teams now realize that the goal is for both the application and data to be handled as a single microservice.

 

Challenges with Large Data Monoliths

Why does the data layer need to be transformed anyway? Three reasons stand out:

 

  • Data monoliths are brittle and change-resistant, a mismatch when paired with agile application development. Schema changes can have ripple effects.

  • They are expensive. When you consider all the database instances needed (dev, test/QA, prod, performance testing, training sandbox), the total infrastructure costs add up. Procuring resources adds wait time.

  • They scale poorly, and are often a drag on performance. Why? Because they support a large number of applications with their own schemas. The problem is compounded by the frequency of ETL jobs. Cloud-native patterns demand horizontal scalability, and that’s not how these systems were designed.



What’s the ideal data store? You can probably guess. You need on-demand provisioning via self-service, and you need horizontal scaling!

 

Cloud Native Applications Need Cloud Native Data

So you need to transform your data monoliths. There’s good news: the same cloud-native principles that have evolved in support of applications can also be applied to data.

See “What Is ‘Cloud-Native’ Data and Why Does It Matter?” by Richard Seroter.

Think of cloud-native data as data that’s scoped to match the domain of a single microservice. It’s an approach to refactoring the application data into smaller domains.

This reduces the scope of the monolith. During the initial phases of the transition plan, this data subset will be a replica of the monolith. The monolith can be retired once all the smaller (micro) data have been implemented.

The clients and downstream systems can use either the shared monolithic database or its cloud-native data for particular domains, with decreasing reliance on the monolith over time.

Let’s refer to the cloud-native data that is tied to microservices as “Micro Data”. The migration from monolith to Micro Data can be developed in phases. For example, in the picture below the monolithic database is migrated into cloud-native data across three phases. A common practice is to convert the application logic first and come back to the data “later”. However, the full benefits of microservices are not realized until the data is also addressed.

Here, each phase introduces new microservices that progressively replace the monolithic architecture, for the application logic as well as the relevant data.

The number of phases is a function of complexity. It's up to the business stakeholders to determine the number of phases needed to completely and safely migrate the data to a modernized cloud-native data implementation.

The corresponding database schema tables are implemented as databases that are owned by (and bound to) each microservice as isolated resources. You can see how this addresses the challenges with large data monoliths mentioned earlier. This approach also yields the following benefits:

  • Releasing new versions of the microservice is easier because we’ve eliminated database dependencies across services.

  • Each microservice’s data can be scaled independently, which is more predictable and cost-effective than scaling an entire monolith.

  • Any lapse in the availability of one microservice’s data should not impact other microservices.

 

Implementation—How it Works

The most flexible architecture will look something like this:

With this model, clients and downstream systems can use the monolithic database or its MicroData interchangeably.

The following table provides an overview of the architecture modules. This should help developers implement this design.

  • Data Access Object (DAO)—this layer exposes data access objects to be injected into and consumed by clients. The DAO layer also wires interfaces to the object instances that back them. This is typically implemented using dependency injection or Inversion of Control (IoC) frameworks; for example, the Spring Framework/Spring Data may be used for Java implementations (see the sketch after this list).

  • Data Domain Objects (DDO)—DDOs are data structures that represent business entities. The objects hold attribute values along with any relationships needed to implement business features. They are the transport mechanism for read/write operations between the application, the database/data store, and/or remote clients. They may be shared across internal components or external dependent downstream systems. Note that it is very important to version the Data Domain Objects that are shared across components/systems. Versioning these objects reduces change management risks: changes can be introduced into newer versions of the DDO if older clients cannot be updated.

  • Cache—The cache can be implemented with an in-memory database (IMDB). The cache plays a broad and critical role in this architecture. It provides fast performance for both read and write access to data. It forms the basis of the elasticity and scalability of the data layer. The cache maintains high availability to the data, even when the underlying persistent store is not available. Asynchronous updates and event-driven architectures, supported by the caching layer, protect the autonomy between teams and allow for high-velocity software development. You can read more about the role of caching in microservices architectures in this white paper. The cache allows data domain objects to be stored in an intermediate storage area that is solely owned by the Micro Data. This can be implemented with Apache Geode or Pivotal Cloud Cache. Note that many IMDBs can be configured to store data in persistent storage, which allows data to be read from storage after a cache restart. Persistent storage can also be used when not all of the data fits in memory. If the data domain objects can be referenced by a key, then a NoSQL IMDB can be used. If a query is needed to access the data, then a SQL-based IMDB can be used.

  • Write (Behind/Through) - The Micro Data acts as a wrapper around existing information that is cached. The Write (Behind/Through) module is normally a feature of the cache. It saves the data in the original database server schema when a Micro Data client performs a write operation on cached information. This allows existing components or downstream systems to continue to read directly from the database until the migration is complete. The write-behind can be disabled when the migration is completed. See Apache Geode Cache Writer.

  • Loader - The loader module keeps the Micro Data-owned cache data in sync with its wrapped database sources. The loader is needed until all of the underlying legacy database sources/schemas are decommissioned. The loader should also support an initial load of data into the Micro Data-owned persistent data store. Changes in the external database source can be propagated to the cache using cache eviction or database triggers.

  • Persistence - Persistent storage is often needed in conjunction with the cache. This persistent data store can survive restarts and potential application failures. The cache can be loaded from the persistent storage using lazy or eager loading techniques. The persistent storage can also be backed up for disaster recovery.
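To make the DAO and DDO modules concrete, here is a minimal sketch using Spring Data for Apache Geode. It is one possible implementation under assumed names (Client, ClientRepository, a region named "Client"); the article does not prescribe these classes.

// --- Client.java: an illustrative Data Domain Object (DDO) ---
package example.clientmgmt;

import org.springframework.data.annotation.Id;
import org.springframework.data.gemfire.mapping.annotation.Region;

@Region("Client")                 // stored in a Geode/GemFire region named "Client"
public class Client {

    @Id
    private String clientId;      // the key used for cache lookups
    private String name;
    private String accountNumber;
    private int version = 1;      // versioning the shared DDO reduces change-management risk

    // getters and setters omitted for brevity
}

// --- ClientRepository.java: an illustrative DAO interface ---
package example.clientmgmt;

import org.springframework.data.gemfire.repository.GemfireRepository;

public interface ClientRepository extends GemfireRepository<Client, String> {
    // key-based read/write operations (findById, save, ...) are inherited;
    // the IoC container generates and injects the implementation
}

A client never touches the database directly; it is handed a ClientRepository by the IoC container and works only with Client objects.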

That’s a look at a modern model to carry forward. Let’s now go back to the status quo, and take a deeper look at the monolith we’re looking to modernize.

 

Let’s Look at How an Example Monolith Can Be Transformed

The following is an example of a monolithic application/database. The application server represents a large monolithic system such as an Enterprise Application Server (EAS), Business Process Manager (BPM) or Enterprise Service Bus (ESB).

 

 

What is the problem with this architecture? These systems are one large program with many components that are hard to decouple. Over time, these systems become the “one stop shop” for all processing needs of a particular business unit. They also become complex, hard to change, and hard to manage.

Over time, the data in the monolith takes on the same characteristics. Very often one component (such as the “Customer” component) needs data-related changes that will break other components (ex: “Accounting” and “Origination”). What makes this particular architecture even worse is that administrators have granted external systems (like CRM or a marketing automation add-on) read/write access to the underlying database. These interactions may not be fully understood or managed at all!

Database admins can be challenged with continuously monitoring processes and reacting to frequent issues.  New functionality cannot be introduced because there is simply no more bandwidth. The system needs larger and more powerful hardware (which can take a long time to provision). The larger databases may require a significant amount of resources such as memory, processors, disk space and networking bandwidth.

 

Start the Monolith to Microservices Migration

The example architecture diagram below shows the initial migration phase.

In this case the customer, account, and relationship components are merged into a Micro Data named Client Mgmt (Management). These components use the knowledge and customer database schemas. It is important to note that the application is the system of record for the data domains related to the customer, account, and relationship components. If the application were not the system of record for the data domains, then an Anti-Corruption Layer pattern could be used. (Note that the Anti-Corruption Layer pattern encapsulates potential Data Domain Object changes into a more stable version used by the Micro Data and its clients. Only the Anti-Corruption Layer objects are impacted when changes are made externally.)
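As a hypothetical illustration of the Anti-Corruption Layer pattern, the translator below absorbs the monolith’s legacy column names (CUST_ID, FIRST_NM, and so on, all assumed) so that only this class changes when the external model changes; the Client type is the illustrative DDO sketched earlier.

import java.util.Map;

// Hypothetical Anti-Corruption Layer: translates a raw row from the external
// system of record into the Micro Data's stable Client domain object.
public class CustomerAntiCorruptionLayer {

    public Client toClient(Map<String, Object> legacyRow) {
        Client client = new Client();
        client.setClientId((String) legacyRow.get("CUST_ID"));       // legacy key column (assumed)
        client.setName(legacyRow.get("FIRST_NM") + " " + legacyRow.get("LAST_NM"));
        client.setAccountNumber((String) legacyRow.get("ACC_ID"));
        return client;
    }
}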

 

Now, let’s review how we think about “read” and “write” operations during this phase.

Read Operation

The following describes the activities for reading data domain object information that exists in the monolithic database but not yet in the Micro Data. This diagram illustrates how the data is retrieved from the monolith using an on-demand load pattern.

 

1. A client initiates a read operation to a DAO component. The client may pass criteria details that identify the Data Domain Object to be retrieved.

2. The DAO uses the Cache to get the data domain object by a key based on the input criteria.

3. The Cache returns the value associated with the key provided. In the event of a cache miss (the requested key-value doesn’t exist in the cache), the cache can use the configured loader to load the data domain object. Note that the loader will not be called again for the same key once the data domain object is loaded in the cache.

4. The loader acts as a data access layer to select the raw records from the database of the legacy monolith application.

   Ex: select * from customer, account where customer_id = :key and customer.acc_id = account.acc_id

5. The loader converts the result set from the select into a Data Domain Object. This is usually done using an Object Relational Mapping (ORM) framework like JPA.

6. The Cache stores the data domain object retrieved from the loader in the persistence storage of the microservice. This persistence layer can become the system of record once the monolith application is decommissioned. The persistence is also used to refresh the Cache’s data domain entries during restarts.

7. The data domain object is returned to the calling client.
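A hedged sketch of steps 3 through 6, assuming Apache Geode’s CacheLoader plug-in point: on a cache miss Geode invokes load(), which queries the monolith schema and converts the row into the Client DDO. The JDBC URL, column names, and Client type are illustrative assumptions; a production loader would use a pooled DataSource or an ORM such as JPA.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import org.apache.geode.cache.CacheLoader;
import org.apache.geode.cache.CacheLoaderException;
import org.apache.geode.cache.LoaderHelper;

// Illustrative on-demand loader: called by the cache only on a miss for a key.
public class ClientCacheLoader implements CacheLoader<String, Client> {

    private static final String SQL =
        "select * from customer, account " +
        "where customer_id = ? and customer.acc_id = account.acc_id";

    @Override
    public Client load(LoaderHelper<String, Client> helper) throws CacheLoaderException {
        String key = helper.getKey();
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://legacy-host/monolith");
             PreparedStatement stmt = conn.prepareStatement(SQL)) {
            stmt.setString(1, key);
            try (ResultSet rs = stmt.executeQuery()) {
                if (!rs.next()) {
                    return null;                               // no matching row in the monolith
                }
                Client client = new Client();                  // convert the row into the DDO
                client.setClientId(rs.getString("customer_id"));
                client.setName(rs.getString("customer_name")); // column names are assumed
                client.setAccountNumber(rs.getString("acc_id"));
                return client;                                 // the cache stores and returns it
            }
        } catch (Exception e) {
            throw new CacheLoaderException("Unable to load client " + key, e);
        }
    }

    @Override
    public void close() {
        // no long-lived resources to release in this sketch
    }
}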

 

Write Operation

The following describes the sequence of steps for writing data domain objects in the Micro Data that must also be saved in the monolithic application data store. Writing back to the monolithic application data store is typically needed if other systems still depend on the monolith. This Write (Behind/Through) can be disabled once those third-party dependencies are migrated to the Micro Data and the monolith is fully decommissioned.

1. The client uses the DAO write feature, providing a data domain object.

2. The DAO uses the Cache to put the data domain object by its key.

3. The Cache stores the data domain object in the persistence layer.

4. The Cache uses a configured Write (Behind/Through) strategy to write the data domain object to the external monolith database.

5. The Write (Behind/Through) object saves the data domain data in the monolith data store, such as a relational database management system.

   Ex: insert into customer(...) values(?, ..., ?)

   Note that a failure to write to the database must also prevent the cache entry from being written, in order to maintain consistency. If there is a problem writing to the persistence store, the update to the cache is rolled back. This eliminates consistency issues.

6. The cache is updated once the write completes successfully.
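A comparable sketch of the write path, assuming Apache Geode’s CacheWriter plug-in (the synchronous write-through variant; a write-behind would typically use Geode’s asynchronous event queues instead). The SQL, connection details, and Client accessors are illustrative assumptions. Because the writer runs before the cache entry is committed, throwing an exception here aborts the put and keeps the cache and the monolith consistent, as described above.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import org.apache.geode.cache.CacheWriterException;
import org.apache.geode.cache.EntryEvent;
import org.apache.geode.cache.util.CacheWriterAdapter;

// Illustrative write-through: runs before Geode commits the cache entry.
public class ClientCacheWriter extends CacheWriterAdapter<String, Client> {

    private static final String SQL =
        "insert into customer (customer_id, customer_name, acc_id) values (?, ?, ?)";

    @Override
    public void beforeCreate(EntryEvent<String, Client> event) throws CacheWriterException {
        writeToMonolith(event.getNewValue());
    }

    @Override
    public void beforeUpdate(EntryEvent<String, Client> event) throws CacheWriterException {
        writeToMonolith(event.getNewValue());   // a real writer would issue an update/merge here
    }

    private void writeToMonolith(Client client) {
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://legacy-host/monolith");
             PreparedStatement stmt = conn.prepareStatement(SQL)) {
            stmt.setString(1, client.getClientId());
            stmt.setString(2, client.getName());
            stmt.setString(3, client.getAccountNumber());
            stmt.executeUpdate();
        } catch (Exception e) {
            // Failing here prevents the cache entry from being written, preserving consistency.
            throw new CacheWriterException("Write to the monolith database failed", e);
        }
    }
}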

 

What Does an Example Data Architecture For Cloud-Native Microservices Look Like?

The following is an example of a technical architecture that can be used to implement a cloud-native Micro(service) Data architecture for migrating monolithic databases.

 

 

Technology

Overview

Spring Boot

Spring Boot is a cloud-native framework based on Spring that simplifies the deployment of complex applications and their needed runtime dependencies. REST web services can be enabled with an embedded web container through text-based configuration and Java annotations. Spring Boot also provides runtime metrics, health checks, and externalized configuration for operational support of production applications. Note that Pivotal Cloud Foundry (PCF) is a multi-cloud platform for Spring Boot applications.

Spring Data

Spring Data provides a data access layer for data stores. It provides wrapper implementations for many data technologies like JDBC, JPA, Apache Geode, and GemFire. Spring Data can aid in the development of the DAO, loader, and Write (Behind/Through) modules in Spring Boot-based applications.

JPA Hibernate

JPA Hibernate is a database access framework for the Java language. Its support for mapping relational database tables to data domain objects makes it a good choice for loader and write-back implementations.

The mapped object, based on the Object Relational Mapping (ORM) configuration, can be saved directly to the Cache by the loader if the cache is object-oriented, like Apache Geode and Pivotal Cloud Cache. The same objects’ data can be saved to the RDBMS through the object-to-relational mapping rules managed by the ORM. Spring Data makes introducing JPA Hibernate into Spring Boot applications easy through its Spring Data JPA implementation.

Apache Geode

Apache Geode is an In-Memory Data Grid (IMDG). Pivotal GemFire is powered by Apache Geode. The Apache Geode client application programming interface (API) can be used as the Cache within Spring Boot applications. Apache Geode supports plugin modules that add loader and write-back strategy implementations to synchronize data externally with the monolith until it is fully decommissioned.

PCC/GemFire

Pivotal Cloud Cache (PCC) is a cloud-native, 12-factor backing service implementation of Pivotal GemFire that works on PCF. GemFire is Pivotal’s commercial implementation of Apache Geode. See Spring Boot for Apache Geode & Pivotal GemFire.

PostgreSQL

PostgreSQL (also known as Postgres) is a popular relational database management system (RDBMS). It can be used as the persistent storage system of record for cached data, used by the loader and write-back modules and backed by Spring Data JPA. See the article Say Hello to Pivotal Postgres for more details.
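To tie the table together, here is a minimal Spring Boot wiring sketch, assuming Spring Data for Apache Geode/GemFire annotations and the illustrative Client and ClientRepository types from earlier. It is one plausible configuration, not the article’s prescribed setup; the loader, writer, and JPA-backed Postgres persistence would be registered alongside it.

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.gemfire.config.annotation.ClientCacheApplication;
import org.springframework.data.gemfire.config.annotation.EnableEntityDefinedRegions;
import org.springframework.data.gemfire.repository.config.EnableGemfireRepositories;

// Illustrative Spring Boot application: a Geode/PCC client cache whose regions
// are derived from the @Region-annotated DDOs, with repositories injected as DAOs.
@SpringBootApplication
@ClientCacheApplication
@EnableEntityDefinedRegions(basePackageClasses = Client.class)
@EnableGemfireRepositories(basePackageClasses = ClientRepository.class)
public class ClientMgmtApplication {

    public static void main(String[] args) {
        SpringApplication.run(ClientMgmtApplication.class, args);
    }
}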

 

Learning More

We’ve shown you how monolithic legacy databases and apps can be transformed into smaller, more manageable microservices.

We have seen companies move to these types of architectures. The decision to build this type of architecture generally increases the overall agility in how the application’s data is managed.

Visit us at Pivotal.io to learn more about using technologies like Spring Boot, Spring Data and GemFire (powered by Apache Geode).


About the Author

Gregory Green

Gregory Green is a Data Solution Architect at Pivotal. He has more than 20 years of diverse software engineering experience in industries such as pharmaceutical, financial, and telecommunications. Gregory specializes in Apache Geode, GemFire, SCDF, Spring, Pivotal Cloud Cache, and Pivotal Cloud Foundry-based solutions. He holds Bachelor’s and Master’s degrees in Computer Science.
