Introducing Pivotal Greenplum 5.3

December 13, 2017 Cesar Rojas

Agile software development is at the very core of Pivotal and it is a key driver of innovation for Pivotal Greenplum. After our recent announcement of version 5, today we’re excited to announce the Pivotal Greenplum 5.3 release.

With this release landing so close to the holiday season, we believe it is a nice gift from our engineering team to the Greenplum community. Let's review what makes Pivotal Greenplum 5.3 such a great version:

Greenplum Containerization

Greenplum 5.3 is a foundational release that delivers early containerization features as we move toward future integration with Pivotal Container Service (PKS).

A fully containerized Greenplum will be unique in the analytical database world, as many traditional data analytics platforms are monolithic and difficult to abstract. A containerized Greenplum will scale to more users and more workloads while reducing noisy-neighbor effects. It will also give the database administrator (DBA) ultimate control to manage the system and balance different user query requests.

Greenplum 5.3 delivers foundational components that enhance resource isolation and elasticity by allowing Query Interfaces (e.g. ANSI-compliant SQL, Python, and R) to be containerized within the platform.

Query Containerization

Query containerization is powered by the new Greenplum 5.3 Resource Groups feature.
This new capability further enhances the stability and manageability of Greenplum while allowing for richer resource isolation for multi-tenant and mixed workloads.
It provides OS-level grouping of CPU and memory resources, along with limits on concurrent transactions, so that each group is guaranteed a predetermined amount.
Resource group CPU management is built on top of Linux Control Groups (cgroups), which provide well-isolated CPU resources with automatic bursting to all groups.
Memory allocation for each resource group is pre-configured at both the group and query level.
Resource groups implement transaction-based concurrency management. This lets the DBA manage the level of concurrency and creates an orderly queue for queries waiting to enter the system.
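
To make the concurrency behavior concrete, here is a minimal Python sketch of a resource group as a set of slots plus a first-in, first-out waiting queue. It is a toy model, not Greenplum's implementation. The group name `rg_reports` and the limits are hypothetical, and the attribute names in the generated statement (`CPU_RATE_LIMIT`, `MEMORY_LIMIT`, `CONCURRENCY`) are as I recall them from the Greenplum 5 documentation, so check the reference for your version.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    """Toy model of a resource group (illustration only)."""
    name: str
    cpu_rate_limit: int   # percent of CPU reserved for the group
    memory_limit: int     # percent of memory reserved for the group
    concurrency: int      # maximum number of concurrently running transactions
    running: set = field(default_factory=set)
    waiting: deque = field(default_factory=deque)

    def ddl(self) -> str:
        # Attribute names are an assumption based on the Greenplum 5 docs.
        return (
            f"CREATE RESOURCE GROUP {self.name} WITH ("
            f"CPU_RATE_LIMIT={self.cpu_rate_limit}, "
            f"MEMORY_LIMIT={self.memory_limit}, "
            f"CONCURRENCY={self.concurrency});"
        )

    def begin(self, txn: str) -> bool:
        """Admit a transaction if a slot is free, otherwise queue it."""
        if len(self.running) < self.concurrency:
            self.running.add(txn)
            return True
        self.waiting.append(txn)
        return False

    def end(self, txn: str) -> None:
        """Free a slot and admit the longest-waiting transaction, if any."""
        self.running.discard(txn)
        if self.waiting and len(self.running) < self.concurrency:
            self.running.add(self.waiting.popleft())


# Hypothetical group: two slots, three transactions arriving in order.
group = ResourceGroup("rg_reports", cpu_rate_limit=20, memory_limit=25, concurrency=2)
admitted = [group.begin(t) for t in ("t1", "t2", "t3")]
group.end("t1")  # t3 leaves the queue and starts running
```

The third transaction waits until one of the first two finishes, which is the orderly queue described above.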

Support for Trusted Languages (R/Python) Containerization on Greenplum

Trusted language containerization is powered by the new Greenplum 5 PL/Container (preview feature).
This is an implementation of a trusted language execution engine capable of bringing up Docker containers to isolate executors from the host OS, which allows sandboxing.
PL/Container runs Python and R code inside a Docker container. The server-side code running inside Greenplum communicates with the container using an RPC protocol.
Containers come pre-configured with Pivotal Greenplum for data science workloads and can also be customized or built from scratch for other end-user workloads. Multiple containers can be deployed to accommodate development teams with different requirements.
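
As a rough illustration of what a PL/Container function looks like, the Python sketch below assembles the SQL definition as a string. The `# container:` comment line and the `plcontainer` language name follow the PL/Container documentation as I recall it. The helper `plcontainer_function`, the function name `pylog100`, and the container name `plc_python_shared` are assumptions for illustration, and the container name must match one configured on your cluster.

```python
import textwrap


def plcontainer_function(name: str, returns: str, container: str, body: str) -> str:
    """Build a CREATE FUNCTION statement for a PL/Container function.

    Hypothetical helper: it only formats text and does not talk to a database.
    """
    body = textwrap.dedent(body).strip()
    return (
        f"CREATE FUNCTION {name}() RETURNS {returns} AS $$\n"
        f"# container: {container}\n"   # selects the container that runs the body
        f"{body}\n"
        f"$$ LANGUAGE plcontainer;"
    )


sql = plcontainer_function(
    "pylog100",
    "double precision",
    "plc_python_shared",  # assumed container name
    """
    import math
    return math.log10(100)
    """,
)
print(sql)
```

The Python body between the `$$` markers is what runs inside the Docker container, while the function itself is called from SQL like any other.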

Greenplum Data Ecosystem Extensibility

Greenplum 5.3 significantly improves the existing level of integration with the Apache Hadoop and Apache Spark frameworks.

Improved integration with the Hadoop ecosystem

Apache Hadoop is a popular distributed processing framework that has been primarily deployed as large data repositories (or “data lakes”). Enterprises are looking for hybrid approaches that combine the best elements of the data lake with the query performance of an MPP engine, like Pivotal Greenplum, for advanced analytics. For those use cases, Pivotal Greenplum 5.3 offers the Platform eXtension Framework (PXF), a REST API abstraction layer that allows Pivotal Greenplum to query Hadoop data in a highly parallel way.
The new PXF integrates functionality from Pivotal HDB (a feature known as “Pivotal Extension Framework”) to provide feature parity with Pivotal HDB and data integration to a broader Hadoop ecosystem.
With PXF, Pivotal Greenplum users can run federated queries that span data stored within the platform and data held in external Hadoop sources. This symbiotic relationship combines the cost and storage advantages of the data lake with the performance of the Pivotal Greenplum MPP query engine.
PXF includes built-in plugins for accessing data inside HDFS files, Hive tables, and HBase tables. PXF is designed to be extended: users can create custom extensions to access other parallel data stores, processing engines, or file and storage formats.
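
PXF is typically used through external tables whose location is a `pxf://` URI. The Python sketch below builds such a URI and a matching table definition as plain strings. The URI layout (`pxf://<path>?PROFILE=<name>`) and the profile name `HdfsTextSimple` are as I recall them from the PXF documentation and may differ by version. The helpers, table name, columns, and HDFS path are hypothetical.

```python
from urllib.parse import urlencode


def pxf_location(path: str, profile: str, **options: str) -> str:
    """Build a pxf:// location string (layout assumed, see lead-in)."""
    query = urlencode({"PROFILE": profile, **options})
    return f"pxf://{path}?{query}"


def external_table_ddl(table: str, columns: list, location: str) -> str:
    """Format a CREATE EXTERNAL TABLE statement for comma-delimited text."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        f"LOCATION ('{location}')\n"
        f"FORMAT 'TEXT' (DELIMITER ',');"
    )


# Hypothetical HDFS file and table definition.
location = pxf_location("data/sales/2017.csv", "HdfsTextSimple")
ddl = external_table_ddl(
    "sales_hdfs",
    [("region", "text"), ("amount", "numeric")],
    location,
)
print(ddl)
```

Once a table like this exists, it can be joined with regular Greenplum tables in a single query, which is what makes the federated access described above possible.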

Pivotal Greenplum and Apache Spark integration

Apache Spark is an extremely popular and fast in-memory engine for big data processing. It provides built-in modules for streaming, SQL, machine learning, and graph processing. Spark users, such as data scientists and data engineers, want to run fast in-memory analytics, exploratory analytics, and ETL processing while using data that is persisted on Pivotal Greenplum. Users will be able to use the Spark JDBC driver to load and unload data from Greenplum.
The Pivotal Greenplum Spark Connector provides high speed, parallelized data transfer between the Greenplum Database and Apache Spark clusters.
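
For the plain JDBC path, a sketch of the reader options might look like the following. It relies on Spark's generic JDBC data source and the PostgreSQL JDBC driver, which works because Greenplum speaks the PostgreSQL protocol. It does not show the dedicated Greenplum Spark Connector, whose own options are not covered here. The host, database, table, and credentials are placeholders.

```python
def greenplum_jdbc_options(host: str, database: str, table: str,
                           user: str, password: str, port: int = 5432) -> dict:
    """Return options for Spark's generic JDBC data source.

    The option keys are standard Spark JDBC options; the values are placeholders.
    """
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "driver": "org.postgresql.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


opts = greenplum_jdbc_options("gpmaster.example.com", "analytics",
                              "public.sales", "gpadmin", "changeme")

# With an active PySpark session named `spark`, the options would be used as:
#   df = spark.read.format("jdbc").options(**opts).load()
```

A single JDBC connection goes through the Greenplum master, so this path suits smaller transfers. The parallel transfer described above is the job of the connector.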

Greenplum Open Source Improvements

Greenplum 5.3 builds on the open source support by adding Greenplum Database open source binaries for the Ubuntu Linux operating system.

Greenplum Database Open Source Binaries on Ubuntu

Prior to Greenplum Database 5.3, distribution was only available via source code from GitHub; this all changes with the 5.3 pre-packaged binaries.
A binary open source option will provide the Greenplum community with an easier, faster, and more consistent installation.
We expect this will significantly increase the mindshare and adoption of Greenplum (both open source and commercial).
Ubuntu users can leverage native apt-get commands to install Greenplum with ease from the Personal Package Archive that contains the compiled releases.

Other Capabilities

Finally, Pivotal Greenplum 5.3 adds a number of new capabilities, including a new backup & restore utility, a case-insensitive text module for text searches, and new enterprise support for SUSE (SLES) 12.

New Version of Greenplum Backup & Restore (preview feature)

The new Greenplum Backup & Restore provides higher performance, reduced lock contention for online backups, progress monitoring & reporting, and additional configurability options.
The new Greenplum Backup & Restore utility is included in the Greenplum 5.3 release. Based on extensive feedback from Greenplum customers, we have implemented many of their suggestions specific to performance and usability for a brand new backup and restore experience.
Improved Performance
- Multiple concurrent backups resulting in 50% faster run times.
- 6x performance increase in metadata backups.
- Improved compression efficiency, decreasing run times by up to 3x.
User Experience
- Decreased catalog locking, resulting in less contention with ETL processes.
- Improved levels of monitoring and logging.
- Additional levels of object filtering for selective backup & restores.
- Multiple output file formats to aid in migrations from previous versions of Greenplum.

Case Insensitive Text (citext) Module

This new feature, backported from PostgreSQL, enables case-insensitive text searches. For example, a comparison against ‘cesar rojas’ matches ‘Cesar Rojas’, ‘CESAR ROJAS’, ‘cesar rojas’, and so on.
This is an important feature for customers migrating from databases like Teradata into Pivotal Greenplum, and it is a key element of our Greenplum text processing strategy.
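
The comparison rule is easy to picture outside the database. The PostgreSQL documentation describes citext as essentially calling `lower` on both values when comparing them, and the Python sketch below mimics that behavior. The function name `citext_equal` and the sample names are illustrative only.

```python
def citext_equal(a: str, b: str) -> bool:
    """Compare two strings the way citext does: lowercase both, then compare."""
    return a.lower() == b.lower()


names = ["Cesar Rojas", "CESAR ROJAS", "cesar rojas", "Cesar Roja"]
matches = [n for n in names if citext_equal(n, "cesar rojas")]
print(matches)
```

The first three names match and the last one does not, since only letter case is ignored.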

SUSE Linux Enterprise Server (SLES) 12 Support

Pivotal now provides official support for Pivotal Greenplum on SLES 12. With this addition, Pivotal Greenplum offers full support for the enterprise distributions of Red Hat and SUSE.

For More Information

Watch the Pivotal Greenplum 5.3 interview
Watch the Pivotal Greenplum 5.3 presentation
Download and install the Pivotal Greenplum 5.3 commercial binaries
Download and install the Greenplum Database 5.3 open source binaries (Ubuntu)
Read the Pivotal Greenplum Spark Connector blog post
Read the Greenplum Backup and Restore blog post

About the Author

Cesar Rojas serves as the Head of Product Marketing for Pivotal Greenplum, responsible for setting the messaging and go-to-market strategy for Greenplum. Prior to joining Pivotal, Mr. Rojas was Director of Product Marketing for the Teradata Portfolio for Hadoop and Teradata Aster offerings. Mr. Rojas is an advanced analytics and data management veteran with 15 years of experience working for the largest data analytics vendors as well as successful data startups. Mr. Rojas has an MBA with an emphasis in eBusiness from Notre Dame de Namur University, as well as a bachelor's in Computer Engineering.

