We recently announced the availability of Pivotal Query Optimizer on Greenplum Database, and in a blog post we discussed the technical details behind the resulting big leap in query performance.
Before we explore the new capabilities and how they can be applied, let’s step back and look at how Analytical Databases are being used today in contrast to the traditional Enterprise Data Warehouses (EDWs).
The figure below shows major dimensions along which enterprises evaluate various data warehousing options. These dimensions (besides “Cost of Ownership” and “Deployment Options”) translate to the capabilities needed for analytical use cases.
The first two columns show how traditional EDWs and the Greenplum Database address the various dimensions of data warehousing capabilities.
Traditional EDWs are primarily used as systems of record and the source of truth for the stored data. They are often packaged as vertically integrated appliances built to handle interactive BI-type queries, and as such provide a limited number of deployment options.
Most high-end EDWs can handle mixed workloads from large teams of analysts. On the other hand, they are very difficult to scale to big data (petabyte) volumes, both from a hardware footprint and a total cost perspective. They also don't support advanced analytics use cases and presume fixed data models into which applications have to fit, changing only after an IT ticket is filed and the EDW vendor is asked to update the model.
Analytical Databases are used traditionally to off-load analytical work from the EDWs and provide new capabilities addressing new big data use cases.
Greenplum Database has better cost of ownership than traditional EDWs and delivers a broad set of machine learning capabilities, more flexible data models and multiple deployment options (appliance, commodity hardware, virtualized IaaS, etc.).
Also, given its enterprise-grade capabilities, Greenplum Database has been used as the system of record in massive enterprise deployments.
Enter Pivotal Query Optimizer
Pivotal Query Optimizer (PQO) is the industry’s first cost-based query optimizer for big data workloads. PQO was first built for Pivotal HAWQ, where it delivered unparalleled query performance. With the new release, PQO continues to improve performance in HAWQ and now delivers the same performance gains in Greenplum Database 4.3.5.
Let’s talk about the dimensions highlighted in the figure where PQO makes a significant difference for Greenplum Database.
PQO can scale interactive and batch mode analytics to large data sets in the petabytes without degrading query performance and throughput, a task that is prohibitively expensive for traditional EDWs and existing alternatives.
PQO is also capable of handling a wide range of complex queries with concurrent and mixed workloads. This enables large teams to work in parallel on multiple analytics use cases with advanced analytics and diverse workloads.
Also, because Greenplum Database is leveraged in so many critical enterprise environments, PQO has been carefully designed with advanced configuration and tuning options. The optimizer can be applied at several levels of granularity, including the database, session, and query levels, which streamlines the upgrade process with minimal downtime and performance impact on production systems.
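As a sketch of what this granularity looks like in practice, the snippet below toggles the optimizer via Greenplum’s `optimizer` server configuration parameter at the database, session, and per-query scopes. The database name `analytics` and the sample table `sales` are hypothetical placeholders, and the exact parameter behavior may vary by Greenplum version:

```sql
-- Database level: enable PQO for all new sessions connecting to
-- a given database ("analytics" is a hypothetical name).
ALTER DATABASE analytics SET optimizer = on;

-- Session level: enable PQO for the current session only.
SET optimizer = on;

-- Query level: toggle PQO around a single statement within a session.
SET optimizer = on;
SELECT count(*) FROM sales;  -- hypothetical table; planned with PQO
SET optimizer = off;         -- fall back to the legacy planner
```

This is what makes a gradual rollout possible: an administrator can validate PQO on a handful of sessions or queries before flipping it on for an entire production database.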
Coming back to the above figure, with the inclusion of PQO, Greenplum Database is able to address all the major dimensions along which enterprises evaluate data warehousing options. It delivers interactive and batch-mode processing and advanced analytics at big data scale; supports concurrent and diverse workloads, flexible data models, and multiple deployment options; and provides these capabilities at a different cost curve than where most EDWs lie today.
This makes Greenplum Database the most complete Analytical Database in the industry. It enables companies to focus on solving the business problems at hand and helps them transform into data-driven enterprises.
Learn More:
- Roundup of Pivotal’s big news for big data:
  - Big Data Suite 2.0 release, open sourcing key technologies: Blog Article and Press Release
  - Open Data Platform: Announcement | Blog Article | Website
  - Project Geode, the open source distribution of Pivotal GemFire submitted to the Apache Software Foundation: Blog Article and Press Release
  - Pivotal HAWQ certified on Hortonworks Data Platform: Blog Article and Press Release
  - First cost-based query optimizer for Pivotal Greenplum Database: Blog Article and Press Release
- Find out more about Pivotal Big Data Suite:
  - Product | Downloads | Documentation | Blogs
- Check out Pivotal’s data science services
- Read about big data and data science from Pivotal in other blog articles
About the Author