The Way to Hadoop Native SQL

September 29, 2015 Gavin Sherry

Today, Pivotal open sourced HAWQ and MADlib, contributing them to the Apache Software Foundation (ASF) and the global Apache Hadoop® community. Both projects are now in official incubating status with the ASF. You can find the HAWQ project page here, and MADlib here.

What this means for Hadoop, its users, and the larger community is that a full-featured, SQL standards-compliant, battle-tested and proven interactive SQL engine, purpose-built for demanding analytical workloads and business transformation, is now available in open source.

Data is increasingly one of the most valuable assets a company can possess. The companies that we, as technologists, admire and consider the gold standard in working with data (Uber, Netflix, etc.) have built data infrastructure for themselves that integrates directly with their applications, their users, and their business. With the contribution of HAWQ and MADlib, we hope that we’ve brought some of the most important missing building blocks into the open so that all can benefit.

It also means that these same businesses can gain an enormous head start on solving real business problems by leveraging the work of hundreds of commercial, academic, and open source developers that formed the basis for the MADlib framework. With an open source library of over 30 principal in-database, scalable algorithms, industries of all types can advance their machine learning agendas.

A Decade In The Making

Both HAWQ and MADlib represent more than a decade of R&D that has taken place at Pivotal and, before that, Greenplum. HAWQ incorporates the SQL processor and relational query engine of the Greenplum Database. What started as Greenplum on Hadoop has evolved significantly to a system recast in terms of Hadoop.

Similarly, MADlib grew out of a collaboration between researchers at UC Berkeley, University of Wisconsin, University of Florida and engineers and computer scientists at Pivotal (formerly EMC/Greenplum). Designed for in-database analytics, MADlib relies on the massively parallel processing power of Greenplum Database and HAWQ.

Both technologies were developed with, and proven in, some of the most demanding environments across a wide variety of industries. We feel that the combination of the two enriches the Hadoop ecosystem and can help accelerate adoption and success of the platform.

Bigger Than Pivotal

So why move them from Pivotal’s engineering to the open governance of the Apache Software Foundation? The rationale lies in the fact that over the last decade, the database industry has seen radical transformation. To give but a few examples:

Open source is now totally mainstream and factors into enterprise buying patterns at all levels
The rise and rise of mobile and Internet-of-Things workloads has surpassed every expectation of scale
The migration of systems and database research away from veteran vendors toward startups and Internet companies has led to completely new approaches, including Hadoop
The demonstration by those startups and Internet companies of the power of data, when combined with rapid, continuous delivery of applications, to drive engagement with customers

These industry-level shifts have led to the inevitability of Hadoop as the fundamental substrate of new-generation data warehousing. As the technology finds solid ground in more and more enterprises, demand is accelerating, and the need for these tools is bigger than Pivotal or Pivotal’s customers.

The Journey to Hadoop Native SQL

We feel that contributing HAWQ and MADlib to the ASF, making them bigger than Pivotal, and continuing to integrate them deeply into the Hadoop ecosystem together form a first big step toward building not only a Hadoop Native SQL engine, but ultimately an entire Hadoop Native, data center-class, high-performance analytic database infrastructure.

We still have work to do to realize such a vision. Today is but the first big step. We hope you’ll join us on this journey.

Editor’s Note: Apache, Apache Hadoop, and Hadoop are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.



