VLDB 2015—Pivotal’s Chief Scientist Recaps Keynotes and Papers

September 23, 2015 Jignesh Patel

It was a memorable 41st Annual International Conference on Very Large Data Bases (VLDB). VLDB is the oldest database conference, tied with the other venerable database conference, SIGMOD, as both date to 1975.

Yes—the field of databases, as an independent community, started long ago!

Back then, we had a Turing Award winner from the community every decade. In case you are counting, it was Charles Bachman in 1973 (for IDS), Ted Codd in 1981 (for the relational model), and Jim Gray in 1998 (for transaction processing). The Turing committee must have missed a beat in the last decade, but they more than made up for it by awarding Mike Stonebraker the Turing Award this year. In fact, the key highlight of VLDB this year was Mike delivering his Turing talk, which you can find here.

For any computer scientist, and especially those interested in how to take ideas from research to practice, Mike’s Turing talk is highly recommended. It is a beautifully woven story of an impressive bike ride across the United States and Mike’s (numerous) adventures with startups. If you listen closely, you will also find a reference to Pivotal Greenplum Database (GPDB). Mike invented Postgres, and both GPDB and HAWQ are based on it.

Mike’s talk really stole the thunder at VLDB, but there were other impressive talks by folks from Pivotal as well. Foremost amongst these was a talk delivered by Amr El-Helw on dealing with common table expressions (CTEs). The talk is based on the paper “Optimization of Common Table Expressions in MPP Database Systems”, which you can find here; Venkatesh Raghavan, Mohamed A. Soliman, George Caragea, Zhongxian Gu, and Michalis Petropoulos are co-conspirators with Amr on this work.

At their heart, CTEs provide a way to name a SQL expression and use it by reference in subsequent SQL expressions. CTEs show up often in enterprise-grade SQL, where complex queries are the norm, and they are an important abstraction for tools that work on top of the database engine. Dealing with CTEs is challenging: the optimal plan for each use of a CTE depends on its actual context, and simply inlining the CTE into the original query often leads to suboptimal query plans. There are other challenges too. When dealing with CTEs, you have to make sure that the data flow, from the execution of the CTE to the rest of the query, does not result in the actors (i.e., processes) deadlocking the system. If this all sounds complicated, it is! The beauty of this paper is that it proposes an elegant set of mechanisms to address CTEs. All of this is built into Orca, which is part of both Pivotal Greenplum and HAWQ, and both are in the process of being open-sourced. As the paper shows, the approach yields roughly a 2X improvement on the TPC-DS benchmark. Quite impressive!
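To make the idea concrete, here is a minimal sketch (illustrative data and query, not from the paper) of a CTE defined once and referenced twice, using Python's built-in sqlite3. An optimizer must decide, per reference, whether to inline the CTE or compute it once and share the result; getting that decision wrong is exactly the kind of suboptimality the paper addresses.

```python
# Sketch: one CTE ('region_totals'), two references to it in the same query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('east', 100), ('east', 200), ('west', 50);
""")

# 'region_totals' is named once in the WITH clause and used twice below:
# once as the FROM table and once inside the scalar subquery.
rows = conn.execute("""
    WITH region_totals AS (
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region
    )
    SELECT rt.region, rt.total
    FROM region_totals rt
    WHERE rt.total > (SELECT AVG(total) FROM region_totals)
""").fetchall()

print(rows)  # → [('east', 300)]
```

In an MPP setting the stakes are higher than in this toy: sharing the CTE result means wiring one producer to multiple consumers across processes, which is where the deadlock-avoidance concerns mentioned above come in.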

There were also a host of other papers and keynotes associated with folks from Pivotal at VLDB, including two keynotes that I delivered. The first, at the In-Memory Data Management and Analytics Workshop, made an initial proposal for how hardware and software could work together to make data analytics systems more efficient and green (save the planet!). The second, at the TPCTC conference, proposed a dramatic rethink of how we create benchmarks. If you are interested in that talk, please come to the meetup.

Amongst the other papers at VLDB, we had two papers on topics related to making machine learning work better with data platforms. The first paper discussed how R makes poor use of modern hardware and points to some things that we could do to fix these issues. My collaborators, Prof. Somesh Jha and two students at Wisconsin, are working on fixing the problems that we identified in this paper. So, stay tuned. The second paper discussed how to take a simple type of relational learning, called Inductive Logic Programming, and map it to relational algebra. With it, we can run this type of learning method in a relational database engine (which really is a relational algebraic expression evaluator) and allow this class of algorithms to scale to data sizes that have historically been considered prohibitive. If you are interested in this topic, then come to the meetup.

