Big Data & Brews Video Explains How Pivotal's Hadoop Distribution Is Different

April 15, 2014 Paul M. Davis

Pivotal HD differs from other Hadoop distributions in several important ways. It offers the flexibility and scalability of Hadoop while adding a robust, integrated suite of services available through the Pivotal Big Data Suite. In a video interview for the Big Data & Brews series, Pivotal’s Chief Scientist Milind Bhandarkar shares a beer with Datameer’s CEO Stefan Groschupf and walks through the features that differentiate Pivotal’s Hadoop distribution from the rest.

Bhandarkar diagrams Pivotal’s Big Data offering from the ground up, beginning with the bottom layer, which at its core is native Apache HDFS. In addition to core Hadoop functions such as MapReduce, Pivotal’s stack integrates HAWQ, a fast, 100% ANSI-compliant SQL engine ported from Pivotal Greenplum Database, which operates atop HDFS.

This approach presents a number of advantages, including speed, the ability to create expressive queries in SQL, and security and monitoring tools.

Another advantage, as Groschupf states in the interview, is that migration from a traditional MPP database to a Hadoop-based architecture “could be absolutely pain free.” Aside from the removal of append-only tables due to HDFS limitations, “the execution engine essentially remains identical,” Bhandarkar says. Because HAWQ is based on Greenplum, it delivers not only the compliance and performance you’d expect but also support for MADlib, the popular open source library of best-practice analytic algorithms used across a variety of industries, giving users a head start on mining their data.

Pivotal’s suite also includes Pivotal GemFire XD, an in-memory data store with a SQL query interface. GemFire XD is optimized for rapid data ingestion and analytics, combining OLTP and OLAP while using Hadoop as the common storage layer. “These are the two components that are sort of a special in our Pivotal Hadoop distribution which are not available from elsewhere,” Bhandarkar says. “For scan based workloads, when you are scanning large amounts of data, we already have a product which is optimized for that.”

At the top of the stack sits Spring XD, an application development layer that benefits greatly from GemFire XD’s speed and direct writes to HDFS. This enables the development of sophisticated data-aware apps that interact with Big Data stores in real time. It also enables a virtuous cycle of integrated data ingestion, extraction, and analysis, as Bhandarkar explains. “The data,” he says, “when it gets retired from GemFire XD actually lands in HDFS, so that it can be ingested back into HAWQ as well.” As Groschupf notes, this functionality “makes [for] a really strong enterprise application.”
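
The cycle Bhandarkar describes (fast ingestion into memory, retirement to shared storage, then scan-style analysis over what has been retired) can be illustrated with a small toy model. The sketch below is plain Python and does not use any Pivotal API: the class names `HotStore` and `ArchiveStore`, the capacity-based retirement rule, and the sample sensor readings are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Record = Tuple[str, float]  # (sensor id, reading); hypothetical schema


@dataclass
class ArchiveStore:
    """Stand-in for the shared storage layer: records can only be appended."""
    rows: List[Record] = field(default_factory=list)

    def append(self, batch: List[Record]) -> None:
        self.rows.extend(batch)

    def scan_average(self) -> Dict[str, float]:
        """Scan-style analytics over everything that has been archived."""
        grouped: Dict[str, List[float]] = {}
        for key, value in self.rows:
            grouped.setdefault(key, []).append(value)
        return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


@dataclass
class HotStore:
    """Stand-in for the in-memory tier: fast writes, bounded capacity."""
    archive: ArchiveStore
    capacity: int = 3  # assumed retirement rule: keep only the newest rows
    rows: List[Record] = field(default_factory=list)

    def ingest(self, record: Record) -> None:
        self.rows.append(record)
        # Retire the oldest records to the archive once capacity is exceeded.
        overflow = len(self.rows) - self.capacity
        if overflow > 0:
            self.archive.append(self.rows[:overflow])
            self.rows = self.rows[overflow:]


def run_demo():
    archive = ArchiveStore()
    hot = HotStore(archive=archive, capacity=2)
    for record in [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 6.0)]:
        hot.ingest(record)
    return hot.rows, archive.rows, archive.scan_average()


if __name__ == "__main__":
    hot_rows, archived_rows, averages = run_demo()
    print("still in memory:", hot_rows)
    print("retired to archive:", archived_rows)
    print("averages over archive:", averages)
```

In this toy version the newest records stay in the in-memory tier for low-latency access, while older records accumulate in the append-only archive where a full scan can aggregate them. A real deployment would differ in almost every detail; the point is only the direction of the data flow.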

Looking ahead, Bhandarkar says, Pivotal’s offering will integrate packages and features such as GraphLab, Open MPI, and Apache Spark. Yet two of the most compelling differentiators, HAWQ and Pivotal GemFire XD, both running atop native Hadoop, are available now, he tells Groschupf.

Also important to prospective customers is Pivotal’s support pedigree, which it has retained from EMC. “You can call them at 2:00 a.m.,” Groschupf says, “[and] they pick up.” Bhandarkar confirms that Pivotal offers always-on support, which is essential for data-driven enterprises operating on a global scale. Groschupf jokes in reply, “All right, I want to have that phone number.”

Watch the Big Data & Brews interview with Datameer’s Stefan Groschupf and Pivotal’s Milind Bhandarkar:
