Run Spring Apps 60x Faster with Spring Data + Pivotal GemFire

June 27, 2013 Stacey Schneider

Recently, I was passed a case study by one of our Pivotal GemFire customers. The case claimed they built an app that allows them to manage “risk in real time and run calculations 60x faster than before.”

Naturally, this kind of improvement caught my attention, so I decided to dig in and find out why. The short answer is that they upgraded from a traditional database approach to one that was centered on Pivotal GemFire. Of course, the case offers a solid proof-point about how traditional databases will soon be broken and how companies are getting results with big, fast data platforms.

Since many development teams might be interested in this type of 60x performance improvement, this post takes a closer look at how Spring Data GemFire and Pivotal GemFire can be used to speed up just about any Java or Spring project while also improving scale, availability, and geographic redundancy.

How GemFire and Spring Data GemFire Came Together

Even though we referenced a financial services solution above, it’s not just financial risk or trading that needs speed. Search, click analysis, online gaming, and travel applications like Southwest.com also need speed, high performance, and real-time insight. In virtually every industry and for every stage of an application’s lifecycle, we keep hearing developers talk about the need to go faster and scale linearly. The need goes beyond new or recently built apps too: many mainframe applications need a performance boost, and giving them one can help save tens of millions of dollars annually.

Pivotal GemFire was born out of these high performance requirements—ones like out-running a mainframe. The product first shipped in 2003 as the in-memory data grid (IMDG) category pioneer. It has solved the most demanding data problems in the world for hundreds of customers across industries like financial services, government, healthcare, retail, pharmaceuticals, travel, and more.

The Spring Data Project exists to improve development on the entire new world of data store technologies like Hadoop, Redis, MongoDB, Neo4j, HBase, Solr, Elasticsearch, Couchbase, and FuzzyDB. The idea follows the principle of the Spring platform in general: to make developers more productive and simplify the overhead of management. The Spring Data GemFire project provides Spring developers an add-on component that expands the Spring Framework coverage to in-memory data grids like Pivotal GemFire:

  • Developers gain productivity in building fast, scalable apps
  • Projects have easier, big data on-ramp for Spring or Java-based development
  • Applications scale out through HTTP session state, L2 caching for Hibernate, and Memcached adapters

How Does Spring Data GemFire Help Developers?

Spring Data already makes developers’ lives easier by supporting many of the NoSQL and grid data stores in a neutral way, with a common API and programming model. In addition, Spring annotations make it possible to inject a data grid without writing any code. With Spring Data GemFire 1.2.1, Spring Data support continues moving into the NoSQL arena, adding to Spring’s historical data access support for transactions, exceptions, JDBC templates, ORM, object-to-XML mapping, etc. The core component is a new GemFire repository, and it makes Spring development on a cloud-scale data grid much easier.

The repository helps developers by mediating between the business domain of an application and Pivotal GemFire. It offers CDI integration with the Java Persistence API (JPA) and basic CRUD methods. There are also basic functions like pagination and sorting, among others. With the repository, you can do puts and gets on distributed key/value stores much as you would with a java.util.Map. We can inject the repository into service classes without writing code, and we can use a SQL-like query language within Java without being constrained by a schema. One of the most powerful capabilities is the ability to derive query logic automatically from method names and parameters within your domain objects. Here are a few examples of how this automated, dynamic approach can help your app access GemFire data. You don’t have to write these; you can just declare them in an interface:

[Image: Spring Data GemFire automated query methods]
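
To give a feel for what "deriving query logic from a method name" means, here is a minimal, standalone sketch. It is not Spring Data's actual parser: the class name `DerivedQuerySketch`, the region name `People`, and the tiny keyword set (`findBy`, `And`, `GreaterThan`, `LessThan`) are all assumptions made for illustration. It simply splits a finder name into properties and prints an OQL-style query string with positional parameters.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified illustration of deriving a query from a finder method name.
 * NOT Spring Data's parser: it only understands "findBy", "And",
 * and the "GreaterThan" / "LessThan" suffixes (hypothetical subset).
 */
final class DerivedQuerySketch {

    /** Turns e.g. "findByLastnameAndAgeGreaterThan" into an OQL-style query string. */
    static String toQuery(String methodName, String region) {
        if (!methodName.startsWith("findBy")) {
            throw new IllegalArgumentException("Unsupported method: " + methodName);
        }
        // Naive split: a property whose name contains "And" would be mis-parsed.
        String[] parts = methodName.substring("findBy".length()).split("And");
        List<String> predicates = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            String part = parts[i];
            String operator = "=";
            if (part.endsWith("GreaterThan")) {
                operator = ">";
                part = part.substring(0, part.length() - "GreaterThan".length());
            } else if (part.endsWith("LessThan")) {
                operator = "<";
                part = part.substring(0, part.length() - "LessThan".length());
            }
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Missing property in: " + methodName);
            }
            // Property names start lower-case: "Lastname" -> "lastname".
            String property = Character.toLowerCase(part.charAt(0)) + part.substring(1);
            // Each predicate binds to the method argument in the same position.
            predicates.add("x." + property + " " + operator + " $" + (i + 1));
        }
        return "SELECT * FROM /" + region + " x WHERE " + String.join(" AND ", predicates);
    }

    public static void main(String[] args) {
        // prints: SELECT * FROM /People x WHERE x.lastname = $1
        System.out.println(toQuery("findByLastname", "People"));
        // prints: SELECT * FROM /People x WHERE x.lastname = $1 AND x.age > $2
        System.out.println(toQuery("findByLastnameAndAgeGreaterThan", "People"));
    }
}
```

In a real Spring Data project you would not write any of this. You would declare the finder methods in a repository interface and let the framework generate the query.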

How Do Developers Commonly Deploy GemFire?

There are three common topologies for developing applications with GemFire. These can serve a wide variety of needs when it comes to Spring application development.

  1. Application cache. With session management caching, you get fine-grained control over how you replicate session data across servers, partitions, WANs, and more. Similarly, GemFire can improve the performance of Hibernate by reducing traffic to the database, partitioning for scale, and ensuring consistency without distributed locks. These capabilities help with large or long-lasting sessions, bursting into the cloud when new app servers are provisioned automatically, or when multiple applications need to share session information. The cache approach can also be used to read from and write to a database or be embedded within Java applications, and there is an adapter for Memcached clients, making GemFire a member of a Memcached network.
  2. Primary data store. Developers can work in Java or in .NET languages such as C# to access data. Meanwhile, GemFire supports durability and recovery like disk-based architectures. It even allows a full relational database model to be placed on top of it through SQLFire, its sister product, where developers can use SQL via JDBC or ADO.NET. Of course, transaction support can be configured for full ACID compliance or an eventually consistent architecture. There is an extensible plug-in model for the security framework. As an object store, there are simple, intuitive, native APIs for storing XML or JSON, which allow you to access, query, or join data with full transaction support via mobile or web clients.
  3. WAN replication. This allows the GemFire data grid to span data centers in an active/active replication model. The WAN capabilities handle network performance, reliability, data inconsistency, and conflict resolution issues, many of which are configurable or customizable. For example, you can have a local cluster operate in an ACID mode while there is eventual consistency across the WAN. There are many fine-grained details about how high throughput and low latency are supported across the WAN.

5 Reasons Why GemFire Is 60x Faster for Application Development

1. In-Database Parallel Computations
For just about any Java program you’ve written to run on a single node, adding simple Spring annotations to your business methods transforms them—they can now run in parallel on the GemFire data grid, much like a MapReduce job or a stored procedure. GemFire supports the principle of moving the application behavior closer to the data for greater performance, and this model can execute any arbitrary functions in parallel while streaming aggregated results back to the client.

2. Tiered Data Management via JVM Cache
Many NoSQL and distributed memory management products allow you to manage state in multiple servers. GemFire allows you to embed a live “edge” cache within the application JVM to manage the hottest data sets like session or reference data. The server looks for changes inside the caches and automatically merges “deltas” to keep state refreshed.

3. Single Hops via Client Driver Meta Data
Multiple process spaces pool memory to give applications a view of all the information across hardware nodes, and edge clients are constantly made aware of partition meta data. So, clients can access data without intermediary hops—data is always accessed in a single hop. For example, accessing data on a 100-node cluster is still done in a single hop, reducing both serialization and network costs. As well, applications query via built-in, redundant data locators when data is grouped, partitioned, made redundant, or colocated for additional performance.
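
The single-hop idea can be sketched with a toy model: the client keeps a copy of the partition metadata (which bucket lives on which server) and computes the owner of a key locally, so a request goes straight to the right node. Everything here is an assumption for illustration, including the class name `SingleHopSketch`, the round-robin bucket assignment, and the use of `hashCode()` for bucketing. It does not reflect GemFire's internal hashing or metadata format.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of single-hop access: the client caches partition metadata
 * (bucket -> server) and resolves the owning server without asking any
 * intermediary. Names and hashing scheme are illustrative only.
 */
final class SingleHopSketch {
    private final int bucketCount;
    private final Map<Integer, String> bucketToServer = new HashMap<>();

    SingleHopSketch(int bucketCount, String... servers) {
        if (bucketCount <= 0 || servers.length == 0) {
            throw new IllegalArgumentException("Need at least one bucket and one server");
        }
        this.bucketCount = bucketCount;
        // Assign buckets round-robin; a real grid would push this metadata to clients.
        for (int bucket = 0; bucket < bucketCount; bucket++) {
            bucketToServer.put(bucket, servers[bucket % servers.length]);
        }
    }

    /** Maps a key to a bucket; floorMod keeps the result non-negative. */
    int bucketFor(Object key) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    /** The server the client contacts directly for this key. */
    String serverFor(Object key) {
        return bucketToServer.get(bucketFor(key));
    }

    public static void main(String[] args) {
        SingleHopSketch client = new SingleHopSketch(4, "server-a", "server-b");
        System.out.println("key 5 -> " + client.serverFor(5));
        System.out.println("key 6 -> " + client.serverFor(6));
    }
}
```

The lookup cost is the same whether the metadata describes 2 servers or 100, which is the point of the 100-node example above.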

4. Distributed Sub-Queries
GemFire breaks queries up into distributed sub-queries in a map-reduce fashion, similar to Hadoop MapReduce but in one step. Queries run on separate nodes, and the results are aggregated. GemFire can also limit execution to only certain nodes and data instead of the entire architecture, which can be much more efficient than a traditional Hadoop MapReduce model.
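
As a rough sketch of the scatter-gather pattern described here, the following standalone example models each node's data as a plain list, runs the same filter on every partition in parallel, and merges the partial results. The class name `ScatterGatherSketch` and the in-process lists are assumptions for illustration. A real grid would ship the sub-query to remote members rather than use a parallel stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/**
 * Toy scatter-gather query: one sub-query per partition, executed in
 * parallel, with the partial results merged into a single answer.
 * In-process stand-in for a distributed grid (illustrative only).
 */
final class ScatterGatherSketch {

    /** Runs the same filter on every partition and concatenates the matches. */
    static <T> List<T> query(List<List<T>> partitions, Predicate<T> where) {
        return partitions.parallelStream()
                // "Scatter": each partition evaluates the predicate on its own data.
                .map(partition -> partition.stream().filter(where).collect(Collectors.toList()))
                // "Gather": partial results are merged in partition order.
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 8, 3),
                Arrays.asList(10, 2),
                Arrays.asList(7));
        // prints: [8, 10, 7]
        System.out.println(query(partitions, value -> value > 5));
    }
}
```

Limiting execution to certain nodes corresponds to passing only a subset of the partitions to `query`.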

5. Colocation for Faster Joins
Joins can get expensive. Yet, it’s often reference data that has to be connected within a query. For example, a data mart might include a large fact table and join in smaller reference tables. With GemFire, OLTP joins can be sped up by colocating or replicating reference data on each partition. With this approach, the minimum number of nodes is involved in any given query, and the overhead of cross-server communication is avoided. This also allows queries to run independently and in parallel.
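
Here is a small sketch of why colocation helps, assuming a hypothetical customers/orders data set. Both are routed by customer ID, so a customer and all of that customer's orders land on the same partition and the join never leaves it. The class `ColocatedJoinSketch`, its method names, and the modulo routing rule are illustrative assumptions, not GemFire APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of colocated data: customers and their orders are routed by the
 * same key (customerId), so a join touches exactly one partition.
 */
final class ColocatedJoinSketch {

    /** One partition holding customers and the orders colocated with them. */
    static final class Partition {
        final Map<Integer, String> customers = new HashMap<>(); // customerId -> name
        final List<int[]> orders = new ArrayList<>();            // {orderId, customerId}
    }

    private final Partition[] partitions;

    ColocatedJoinSketch(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        partitions = new Partition[partitionCount];
        for (int i = 0; i < partitionCount; i++) {
            partitions[i] = new Partition();
        }
    }

    /** Both customers and orders use this routing rule, which is what colocates them. */
    int partitionIndex(int customerId) {
        return Math.floorMod(customerId, partitions.length);
    }

    void putCustomer(int customerId, String name) {
        partitions[partitionIndex(customerId)].customers.put(customerId, name);
    }

    void putOrder(int orderId, int customerId) {
        // Routed by customerId, not orderId, so the order sits next to its customer.
        partitions[partitionIndex(customerId)].orders.add(new int[] {orderId, customerId});
    }

    /** Joins orders to their customer using a single partition's local data. */
    List<String> ordersWithCustomer(int customerId) {
        Partition local = partitions[partitionIndex(customerId)];
        List<String> rows = new ArrayList<>();
        for (int[] order : local.orders) {
            if (order[1] == customerId) {
                rows.add(order[0] + ":" + local.customers.get(customerId));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        ColocatedJoinSketch grid = new ColocatedJoinSketch(3);
        grid.putCustomer(7, "Ada");
        grid.putOrder(100, 7);
        grid.putOrder(101, 7);
        // prints: [100:Ada, 101:Ada]
        System.out.println(grid.ordersWithCustomer(7));
    }
}
```

Had orders been routed by order ID instead, the join would need to fetch the customer from another partition for most rows, which is the cross-server overhead the paragraph above describes.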

Learn More

Together, Spring and GemFire offer a powerful combination for any Spring application, in terms of both developer productivity and runtime speed.
