Hadoop’s impact on the emergent Big Data industry cannot be overstated, and Pivotal’s Chief Scientist Milind Bhandarkar has played a key role in its development since its early days. In conversations with Datameer’s CEO Stefan Groschupf for the Big Data & Brews video series, Bhandarkar talks about the early days of Hadoop’s development, his professional experience, and how his unique long-view perspective has influenced Pivotal’s vision.
During the video interviews, Bhandarkar also speaks to his role within Pivotal and his involvement in the early days of Hadoop development while working at Yahoo!. In terms of both technology and talent, heritage is a key differentiator for Pivotal’s Hadoop distribution: Pivotal HD boasts robust custom components developed by a team with combined decades of Hadoop experience, backed by a leading analytical data management portfolio.
The two major components are HAWQ, an extremely fast SQL engine running on top of the Hadoop File System, and Pivotal GemFire XD, an in-memory SQL processing engine which enables persistent storage on top of HDFS. These components represent the evolution of Greenplum Database and VMware vFabric SQLFire respectively, bringing those technologies’ speed, reliability, and maturity into Pivotal’s Hadoop stack.
As Bhandarkar explains in the video, HAWQ iterates upon a decade of work invested in building the lightning-fast Greenplum Database, bringing that product’s advantages — speed, reliability, and the common, powerful, and expressive language of SQL — to data stored natively in HDFS.
Concurrently, the Pivotal GemFire XD component enables real-time SQL analysis of in-memory data. This allows for high-speed data ingestion and processing for a tiered system geared toward data prioritization and availability — a hot layer of in-memory data, and warm and cold layers of data that resides in HDFS but can be easily queried through HAWQ, MapReduce, or GemFire XD.
As Bhandarkar states, Pivotal’s Hadoop stack enables rapid ingestion and analysis of new data in-memory, while iteratively moving the data to HDFS clusters where it can be easily accessed and quickly processed. “Within a few minutes that data gets on to HDFS,” he says, “which then becomes queryable not only by GemFire itself, but also by MapReduce, by HAWQ, or whatever technologies you have which [are] actually querying the same data.”
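The tiered pattern Bhandarkar describes can be sketched in miniature. The class below is a purely illustrative toy, not Pivotal’s actual API: an in-memory dictionary stands in for the hot GemFire XD layer, and a second dictionary stands in for HDFS-resident data queryable through HAWQ or MapReduce. All names here are hypothetical.

```python
# Toy sketch of hot/cold tiering: fresh records are served from an
# in-memory "hot" store (the role GemFire XD plays), and older records
# are flushed to a bulk "cold" store (the role HDFS plays).
# Illustrative only -- none of these names are Pivotal APIs.

class TieredStore:
    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = {}   # in-memory layer: newest records
        self.cold = {}  # stand-in for HDFS-resident data

    def ingest(self, key, value):
        """Ingest into the hot layer; spill the oldest record when full."""
        self.hot[key] = value
        if len(self.hot) > self.hot_capacity:
            # Analogous to GemFire XD persisting older data to HDFS,
            # where it remains queryable by other engines.
            oldest = next(iter(self.hot))
            self.cold[oldest] = self.hot.pop(oldest)

    def query(self, key):
        """Check the hot layer first, then fall back to the cold layer."""
        if key in self.hot:
            return ("hot", self.hot[key])
        if key in self.cold:
            return ("cold", self.cold[key])
        return (None, None)

store = TieredStore(hot_capacity=2)
for i in range(4):
    store.ingest(f"event-{i}", i * 10)

print(store.query("event-3"))  # ('hot', 30)  -- still in memory
print(store.query("event-0"))  # ('cold', 0)  -- flushed to the bulk tier
```

The key property the sketch captures is the one Bhandarkar emphasizes: data that ages out of memory does not become unreachable, it simply moves to a cheaper tier that the same query path can still reach.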
This points to the Business Data Lake model embraced by Pivotal and partner Capgemini in a recent announcement. The Data Lake metaphor speaks to Hadoop’s ability to store seemingly unlimited amounts of data inexpensively; finding insights in that mass of data is the role of components running atop HDFS, such as HAWQ and GemFire XD.
Bhandarkar’s experience affords him a long-view perspective, demonstrated both by his career and his choice of beer for the Big Data & Brews segment. In the interview, he tells Groschupf that he would break his career into two distinct eras, the first being from 1991 to 2005, when he worked on high-performance computing projects for the government of India, building the country’s first indigenous supercomputer. Following this achievement, he moved to the United States to earn a PhD in Parallel Computing at the University of Illinois at Urbana–Champaign.
After receiving his PhD, Bhandarkar decided that the academic path was not for him. Soon after, he was tapped by Yahoo! to work on a project that would, in his words, “revamp the entire search content engine.” The team began work on a project named Juggernaut, research and development inspired by Google’s seminal MapReduce and GFS papers.
In the interview, Bhandarkar recounts working on a small team in those early days of what would become Hadoop. The team was rich with search-infrastructure veterans, such as Eric Baldeschweiler (head of Yahoo! search engine content, formerly at Inktomi), Sameer Paranjpye (who built the first version of Dreadnaught, a precursor to Juggernaut), and former NASA Ames researcher Owen O’Malley.
“Hadoop is rocket science,” Groschupf jokes during the video interview. “Yes, it is rocket science,” Bhandarkar replies in kind, recounting his earlier work at the Center for Simulation of Advanced Rockets at the University of Illinois.
In short order, the team realized that what they were working on was a tool for data science, one that came to be known as Hadoop during the five and a half years Bhandarkar spent at Yahoo!. He was there on the ground floor, contributing his first batch of code with the 0.1.1 release: the serialization system known as Hadoop Record I/O. This legacy continues with Pivotal’s commitment to contributing to these, and many more, open source software projects.
A decade later, Bhandarkar serves as Pivotal’s Chief Scientist, where his role is to shape the technical strategy for the company’s Big Data technologies. During his Big Data & Brews talks, he cites the integration of Apache Spark as an example of a project he has been watching for two years, investigating how its innovations can be applied to real use cases undertaken by customers using Pivotal’s unified platform-as-a-service.
Watch the entire Big Data & Brews video interview with Pivotal’s Milind Bhandarkar:
Big Data & Brews: Milind Bhandarkar on his Experience Leading up to Pivotal
Big Data & Brews: Milind Bhandarkar of Pivotal Talks About the Beginnings of Hadoop
About the Author: Paul M. Davis