Pivotal GemFire: The Scale-Out, In-Memory Distributed Data Grid for Mission-Critical Applications

December 2, 2016

Scaling Out Strategic Data-Driven Applications

For today’s data-driven companies, executing on business strategy means developing and deploying custom data-intensive applications at scale. These applications help companies optimize business processes, create new revenue opportunities and enhance competitive differentiation.

To succeed, these applications must deliver performance, scale, global reach and always-on availability. Developers often have difficulty meeting these service level expectations by scaling out traditional RDBMSs, and they cannot meet data consistency requirements with most in-memory data grids and caching technologies.

Application developers and IT architects now face ever-higher service level requirements. They must handle many terabytes of operational data used in thousands to hundreds of thousands of concurrent transactions, at a global scale previously seen only in the most extreme applications.

Pivotal GemFire is an in-memory distributed data grid for high-scale custom applications. GemFire provides in-memory access to all operational data, spread across hundreds of nodes in a “shared nothing” architecture. This enables GemFire to provide low-latency data access to applications at massive scale, with many concurrent transactions involving terabytes of operational data. Designed to maintain consistency of concurrent operations across its distributed data nodes, Pivotal GemFire can support ACID transactions for massively scaled applications such as stock trading, financial payments and ticket sales, in proven customer deployments of more than 10 million user transactions a day.

Originally developed to serve data for mission-critical applications in the financial industry, GemFire offers built-in failover and resilient, self-healing clusters that allow developers to meet the most stringent service level requirements for data accessibility.

Scale-Out Performance

In-Memory Storage: GemFire stores all required data in random access memory across distributed nodes to provide low latency access to data, eliminating the performance penalty associated with reading data from disk. GemFire allows storage of data in off-heap memory, i.e. memory space that is not part of the JVM heap. This removes much of the processing overhead associated with the general purpose Java garbage collector. Off-heap memory resides in the same process space as the JVM itself, but the Java garbage collector isn’t aware of it. Instead, GemFire provides its own proprietary garbage collector for managing off-heap memory. The net effect is to significantly increase the amount of data that can be managed in-memory and made available to mission critical, data-intensive applications.

Elastic, Linear Scalability: GemFire provides linear scalability that allows administrators and developers to predictably increase capacity, in both operations per second and data storage, simply by adding nodes to a cluster. Data distribution and system resource usage are automatically adjusted as nodes are added or removed, making it easy to scale up or down quickly to meet expected or unexpected spikes in demand.
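The automatic adjustment described above can be pictured with a small Python sketch. It is an illustration under stated assumptions, not GemFire's rebalancing algorithm: data is assumed to be grouped into numbered buckets, and the `rebalance` function and node names are hypothetical. The point is only that adding a node should move a minority of buckets rather than reshuffle everything.

```python
def rebalance(assignment, nodes):
    """Return a balanced bucket -> node mapping, moving few buckets.

    `assignment` maps bucket ids to node names; `nodes` is the current
    cluster membership. Both are hypothetical stand-ins for grid metadata.
    """
    owned = {node: [] for node in nodes}
    unowned = []
    for bucket, node in sorted(assignment.items()):
        (owned[node] if node in owned else unowned).append(bucket)

    # Buckets whose owner left the cluster go to the least-loaded node.
    for bucket in unowned:
        min(owned.values(), key=len).append(bucket)

    # Shift buckets from the fullest node to the emptiest node until
    # no two nodes differ by more than one bucket.
    while True:
        fullest = max(owned.values(), key=len)
        emptiest = min(owned.values(), key=len)
        if len(fullest) - len(emptiest) <= 1:
            break
        emptiest.append(fullest.pop())

    return {b: n for n, buckets in owned.items() for b in buckets}


# Twelve buckets on two nodes, then a third node joins.
before = {b: ["node-a", "node-b"][b % 2] for b in range(12)}
after = rebalance(before, ["node-a", "node-b", "node-c"])
moved = [b for b in before if before[b] != after[b]]
```

In this example each node ends up with four buckets, and only the buckets handed to the new node change owner; the other eight stay where they were.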

Optimized Data Distribution Across Nodes: GemFire automatically distributes data across nodes to minimize latency and balance the use of system resources. Administrators and developers can also configure partitioning and replication of data to further improve application response time. GemFire directs processing operations to the specific nodes where the relevant data resides in order to reduce latency and network traffic.
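As a rough illustration of directing an operation to the node that holds the data, the Python sketch below maps a key to a bucket with a stable hash and a bucket to an owning node. Everything in it is an assumption made for the example: the bucket count, the node names, the use of CRC32, and the helpers `bucket_for`, `owner_of` and `route` are hypothetical and do not describe GemFire's internal scheme.

```python
import zlib

NUM_BUCKETS = 8                          # hypothetical bucket count
NODES = ["node-a", "node-b", "node-c"]   # hypothetical node names


def bucket_for(key):
    """Map a key to a bucket using a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


def owner_of(bucket, nodes):
    """Assign buckets to nodes round-robin, for illustration only."""
    return nodes[bucket % len(nodes)]


def route(key, nodes=NODES):
    """Return the node that should process an operation on this key."""
    return owner_of(bucket_for(key), nodes)


target = route("order-42")  # every caller computes the same target node
```

Because the mapping is deterministic, any client or server can compute the target node for a key without a lookup round trip, which is the property that keeps network traffic down.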

Consistent Data Grid Operations Across Globally Distributed Applications

Performance-Optimized Persistence: To ensure durability of data in the event of node failure, GemFire writes to disk a log of all creates, updates, and deletes of data managed by a node. When the node comes back online, this log is read to reconstruct the last consistent state of the in-memory data grid on that node.
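The log-and-replay idea can be sketched in a few lines of Python. This is a simplified model, not GemFire's on-disk format: a Python list stands in for the log file, records are JSON lines, and the function names `append_op` and `replay` are hypothetical.

```python
import json


def append_op(log, op, key, value=None):
    """Append one create/update/delete record to the log as a JSON line."""
    log.append(json.dumps({"op": op, "key": key, "value": value}))


def replay(log):
    """Rebuild in-memory state by applying the logged records in order."""
    state = {}
    for line in log:
        record = json.loads(line)
        if record["op"] in ("create", "update"):
            state[record["key"]] = record["value"]
        elif record["op"] == "delete":
            state.pop(record["key"], None)
    return state


log = []  # stands in for an append-only file on disk
append_op(log, "create", "order-1", {"qty": 1})
append_op(log, "update", "order-1", {"qty": 2})
append_op(log, "create", "order-2", {"qty": 5})
append_op(log, "delete", "order-2")

recovered = replay(log)  # state a restarted node would reconstruct
```

Appending records is a sequential write, which is generally cheaper than updating data in place; the cost is paid at recovery time, when the records are replayed in order.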

Configurable Consistency: GemFire is capable of providing ACID consistency across distributed nodes to support high-capacity transactional applications. Administrators and developers can also configure more relaxed consistency models for higher performance, such as allowing the entire grid to cache and operate on data, or turn consistency off for the highest-performance caching.

Distributed Queries and Regional Functions: Pivotal GemFire supports the Object Query Language (OQL) for authoring queries. Queries are sent to the appropriate nodes that serve relevant partitions of data. Query results are then merged and sent back to the client application. Developers can define indexes on key values to improve performance. Similarly, when functions that operate on classes of data are invoked, processing will be routed to appropriate nodes responsible for serving partitions of targeted data.
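The send-then-merge flow described above is often called scatter-gather, and a minimal Python sketch of it follows. The sketch makes several assumptions: each dictionary stands in for the partition served by one node, threads stand in for network calls, and the names `query_partition` and `scatter_gather` are hypothetical. The predicate expresses roughly what an OQL query such as SELECT * FROM /trades t WHERE t.symbol = 'ABC' AND t.qty >= 40 would ask for, where the region name /trades is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each dict is the data held by one node.
PARTITIONS = [
    {"t1": {"symbol": "ABC", "qty": 100}, "t2": {"symbol": "XYZ", "qty": 5}},
    {"t3": {"symbol": "ABC", "qty": 40}},
    {"t4": {"symbol": "XYZ", "qty": 75}, "t5": {"symbol": "ABC", "qty": 10}},
]


def query_partition(partition, predicate):
    """Evaluate the predicate locally against one node's share of the data."""
    return [(key, value) for key, value in partition.items() if predicate(value)]


def scatter_gather(partitions, predicate):
    """Query every partition in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(
            pool.map(lambda p: query_partition(p, predicate), partitions)
        )
    merged = [row for partial in partials for row in partial]
    return sorted(merged, key=lambda row: row[0])


results = scatter_gather(
    PARTITIONS, lambda t: t["symbol"] == "ABC" and t["qty"] >= 40
)
matching_keys = [key for key, _ in results]
```

Each partition filters its own data, so only matching rows cross the network before the merge step.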

High Availability, Resilience and Global Scale

Cluster Resilience and Fail Over: GemFire provides continuous uptime with built-in high availability and disaster recovery. Multiple failure detection models detect and react to failures quickly, ensuring that the cluster is always available and that the data set is always complete.

Resilient Self-Healing: GemFire self-healing automation allows nodes that fail to quickly rejoin a cluster once they become operational again, with fast startup, reconnect, and incremental updates of changed data, all handled without administrator intervention.

Rolling Upgrade: When it is time to deploy a new version of an application or of the GemFire software, system administrators can take advantage of redundancy zones to automatically update all the nodes in one portion of a cluster at the same time. The remaining redundant nodes stay operational and continue serving the application in a highly available manner. This means no planned maintenance downtime is required for version upgrades or patching.

Cluster-to-Cluster WAN Connectivity: GemFire allows multiple clusters to be connected via WAN gateways. This allows application data access to span the globe, and allows companies to meet local data requirements, such as country-specific privacy regulations. WAN-connected clusters also enable multi-site failover, ensuring ongoing availability and built-in disaster recovery in the case of catastrophic failure.

Powerful Developer Features

Data Types, Languages and API Support: GemFire allows developers to manage data from user-defined classes as well as JSON documents. Native language clients and support are provided for the Java, C++, and C# programming languages. Applications written in other programming languages can access the same features via a REST API. Other supported APIs include Java HashMap, Memcached, and Spring Data GemFire.

Flexible Schemas and Versioning: GemFire schema serialization, called PDX, allows data types to be dynamically modified, such as when a new application or a new version of an application is deployed against the same data nodes. The system automatically bridges between application versions, allowing the different versions to work with the same data, since schema type information is dynamically discovered while processing queries from any version of the application. Data types in PDX are language independent, allowing applications written in any language to access the same data.
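One way to picture how two application versions can share the same stored data is to treat each record as a set of named fields and let each version read only the fields it knows. The Python sketch below illustrates that idea and nothing more: it does not reproduce the PDX wire format or API, and the `read_as` helper, field sets, and sample records are hypothetical.

```python
def read_as(record, known_fields):
    """Project a stored record onto the fields one application version knows.

    Fields absent from the stored record fall back to that version's
    default; fields the version does not know are ignored, not rejected.
    """
    return {name: record.get(name, default)
            for name, default in known_fields.items()}


# Hypothetical field sets (name -> default) for two application versions.
V1_FIELDS = {"id": None, "name": ""}
V2_FIELDS = {"id": None, "name": "", "email": ""}

written_by_v1 = {"id": 7, "name": "Ada"}
written_by_v2 = {"id": 8, "name": "Lin", "email": "lin@example.com"}

old_app_view = read_as(written_by_v2, V1_FIELDS)  # newer data, older reader
new_app_view = read_as(written_by_v1, V2_FIELDS)  # older data, newer reader
```

Here the older reader sees only `id` and `name`, while the newer reader sees an empty `email` for the record written before that field existed. Neither version needs the stored data to be migrated first.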

Out-of-the-Box Caching and Powerful Application Features: Developers can add GemFire caching to their applications running on Pivotal tc Server with little or no modification to their application code. With tc Server, user sessions are cached, even across web servers and data centers. Hibernate L2 caching through Spring is also supported in tc Server. Developers need only annotate their code to invoke this Spring framework capability.

GemFire provides advanced application features for developers who want to leverage its distributed caching and data grid capabilities. As with many data grid platforms, developers can embed and generate queries, in GemFire’s case using OQL. OQL can also be used to set up “continuous queries” that return a streaming result set, updated whenever new entries meet the query criteria. GemFire also provides a sophisticated event handling mechanism with a publish and subscribe approach and durable asynchronous queues suitable for mission-critical application requirements.
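The behavior of a continuous query can be sketched as a store that remembers registered predicates and notifies a callback whenever a newly written entry matches. The Python below is a toy, single-process model written for illustration; the class `ContinuousQueryStore` and its methods are hypothetical and are not GemFire's continuous query API.

```python
class ContinuousQueryStore:
    """A toy key-value store that pushes matching new entries to subscribers."""

    def __init__(self):
        self._data = {}
        self._queries = []  # list of (predicate, callback) pairs

    def register(self, predicate, callback):
        """Subscribe a callback to entries that satisfy the predicate."""
        self._queries.append((predicate, callback))

    def put(self, key, value):
        """Store the entry, then notify every query whose predicate matches."""
        self._data[key] = value
        for predicate, callback in self._queries:
            if predicate(value):
                callback(key, value)


store = ContinuousQueryStore()
alerts = []

# Subscribe once; matching entries arrive as they are written.
store.register(lambda v: v["price"] > 100, lambda k, v: alerts.append(k))

store.put("trade-1", {"price": 50})    # does not match, no notification
store.put("trade-2", {"price": 150})   # matches, callback fires
store.put("trade-3", {"price": 101})   # matches, callback fires
```

The subscriber never polls: after the three writes, `alerts` holds only the keys of the two matching trades, in the order they were written.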

Easy Administration of Distributed Nodes

Automated Tuning & Simplified Cluster Configuration: GemFire is built to automate administrative tasks as much as possible. This includes automating tuning of system resources between nodes in a cluster by intelligently managing the placement of data while reducing network round trips. Data gets replicated only to those nodes that need the data, and requests for access are routed intelligently using the most direct path available. This data placement and resource allocation is adjusted automatically if nodes are added to, or removed from, the cluster. Furthermore, node configuration is handled centrally with automatic redundancy for high-availability. New nodes can get their configuration from the centralized configuration manager upon startup to quickly join a cluster with no additional system administration tasks.

Comprehensive Monitoring & Administration Tools: GemFire provides a comprehensive set of online and offline tools for monitoring and administering clusters. The online dashboard allows drill down into cluster and node status, and querying of stored data. The offline analytics tool allows diagnosis of system bottlenecks through analysis of historical statistics logging. A command line tool allows administrators to take action on clusters and nodes such as starting, stopping and configuring settings.

Integrated Security: GemFire’s client/server connection for access to data handles security via server authentication and authorization. Administrative access to data via GemFire’s command line interface, the Pulse monitoring tool, or any third-party JMX-based or REST-based administrative tool is also authenticated and authorized by role. Operations such as viewing region information and performing data browser operations can be secured individually, allowing granular control over who is permitted to perform them.

Flexible Deployment Options: GemFire runs in Java Virtual Machines in 32-bit and 64-bit mode on Windows, Linux, and Solaris operating systems. Client nodes running in C++, C#, .NET, and Java are supported. Other popular web-scale programming languages such as Ruby, Node.js, Scala, and Python can access GemFire capabilities via the REST API. GemFire grids can be set up with active/active, multi-site, bi-directional WAN replication to enable disaster recovery, business continuity, and geographical proximity for the lowest possible latency worldwide.

Real-Time Transactions Meet Real-Time Analytics

The GemFire-Greenplum Connector enables mirroring of data between GemFire and the Pivotal Greenplum data warehouse. The connector supports bi-directional data flow so that changes to data in one system can be propagated to the other, ensuring that both systems remain synchronized. The connector brings together the unique attributes of each system.

Enterprises often have operational and transactional processes that need to run at high speed (supported by GemFire) and analytical processes that need to run on data that is large scale (supported by Greenplum). Single systems that try to handle both run into too many trade-offs.

A common use of the connector is to conduct analysis and build predictive models on big data in Greenplum, the output of which is used for scoring data that is streaming into GemFire (fast data). GemFire’s low-latency and event-based features ensure that business events with a limited window of opportunity for influencing an outcome are addressed rapidly. The analysis from Greenplum ensures that the response delivered is ‘smart’, i.e. based on the output of analytics.
