Pivotal Greenplum Database 4.3.3 Adds Delta Compression

October 15, 2014 Scott Kahler

The latest release of Pivotal Greenplum Database, version 4.3.3, adds a number of notable updates, including Delta Compression. This update adds another way to compress data in a column to save space. Internal tests and customer data have demonstrated well over 100x compression on 10 GB worth of TIME values from a dataset.

Pivotal Greenplum DB is a Massively Parallel Processing (MPP) database, which means it spreads data over multiple nodes to harness the compute and IO power of a cluster to process petabyte-scale data sets. It also embraces polymorphic storage: the ability to store data in multiple formats within one logical table. A table can have row-oriented partitions side by side with column-oriented partitions. In addition, various compression algorithms can be applied at the table and column level.

For example, the latest three months of data in the table can be row-oriented, the next three months columnar and uncompressed, and the following three or more months columnar with compression. As far as the end user is concerned, all data is queried the same way, with no changes required for data at different stages of its lifecycle.

Delta Compression adds a new approach to compression. In addition to standard lzo and zlib column compression, Pivotal Greenplum DB has been able to perform Run Length Encoding (RLE) compression for a while now. To understand this, imagine a table of dinner orders. One of its columns holds the dish ordered, and when you change from row-based to columnar storage, the data stored for that column looks like this: Fish, Fish, Fish, Fish, Fish, Fish. RLE compression stores that same data as Fish(6).
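To make the idea concrete, here is a minimal Python sketch of run length encoding applied to the dinner-order column. It only illustrates the concept; the function names are made up for this example and it is not how Greenplum lays out data on disk.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of consecutive equal values into (value, run_length)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


orders = ["Fish"] * 6
encoded = rle_encode(orders)
print(encoded)  # [('Fish', 6)]

# Round trip: decoding gives back exactly what we started with.
assert rle_decode(encoded) == orders
```

Note that RLE only helps when equal values sit next to each other, which is why it pairs naturally with columnar storage.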

For data sets with a large number of repeating values, this can save a large amount of space. Delta Compression extends the idea to data types such as integers and times, storing each value as its offset from the previous one. For example, the dates 2014-01-02, 2014-01-03, 2014-01-04 would be stored as 2014-01-02, +1, +1 with Delta Compression.
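The date example can be sketched in a few lines of Python. This is an illustration of delta encoding in general, assuming offsets are measured in whole days; the helper names are hypothetical and the real storage format in Greenplum will differ.

```python
from datetime import date, timedelta


def delta_encode(days):
    """Return the first date plus each later date's offset (in days) from its predecessor."""
    if not days:
        return None, []
    offsets = [(cur - prev).days for prev, cur in zip(days, days[1:])]
    return days[0], offsets


def delta_decode(first, offsets):
    """Rebuild the original dates by accumulating the offsets."""
    if first is None:
        return []
    out = [first]
    for offset in offsets:
        out.append(out[-1] + timedelta(days=offset))
    return out


dates = [date(2014, 1, 2), date(2014, 1, 3), date(2014, 1, 4)]
first, offsets = delta_encode(dates)
print(first, offsets)  # 2014-01-02 [1, 1]

assert delta_decode(first, offsets) == dates
```

The offsets are small and highly repetitive, which is exactly the kind of data that later compression stages handle well.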

If we take this and combine it with RLE on the following data set:

2014-01-02, 2014-01-02, 2014-01-02, 2014-01-03, 2014-01-03, 2014-01-03, 2014-01-04, 2014-01-04, 2014-01-04

We end up with:

2014-01-02 (3), +1 (3), +1 (3)
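One way to reproduce this result in Python is to collapse the runs first and then express each run's value as an offset from the previous run. The order of the two steps here is an assumption chosen to match the illustration above, and the function name is hypothetical.

```python
from datetime import date
from itertools import groupby


def rle_then_delta(days):
    """Collapse runs of equal dates, then store each run's date as a day offset
    from the previous run. The first run keeps its full date."""
    runs = [(day, sum(1 for _ in group)) for day, group in groupby(days)]
    encoded = []
    previous = None
    for day, count in runs:
        if previous is None:
            encoded.append((day.isoformat(), count))
        else:
            encoded.append((f"{(day - previous).days:+d}", count))
        previous = day
    return encoded


data = (
    [date(2014, 1, 2)] * 3
    + [date(2014, 1, 3)] * 3
    + [date(2014, 1, 4)] * 3
)
print(rle_then_delta(data))
# [('2014-01-02', 3), ('+1', 3), ('+1', 3)]
```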

After applying both Delta Compression and RLE, we compress the entire block with zlib.
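The following Python sketch gives a feel for why encoding before zlib helps. It packs a made-up sequence column as 64-bit integers, then compares zlib on the raw values against zlib on the deltas. The data, the packing format, and the helper names are all assumptions for illustration, and the sizes it prints say nothing about the ratios Greenplum achieves.

```python
import struct
import zlib


def pack_int64(values):
    """Serialize integers as little-endian 64-bit values."""
    return struct.pack(f"<{len(values)}q", *values)


def to_deltas(values):
    """Keep the first value, then store each value's offset from its predecessor."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


# Hypothetical sequence column: 100,000 consecutive integers.
sequence = list(range(100_000))

raw_bytes = pack_int64(sequence)
raw_zlib = zlib.compress(raw_bytes)
delta_zlib = zlib.compress(pack_int64(to_deltas(sequence)))

print("raw bytes:        ", len(raw_bytes))
print("zlib only:        ", len(raw_zlib))
print("delta, then zlib: ", len(delta_zlib))
```

Because every delta in a sequence like this is the same small number, the general-purpose compressor sees one long repeated pattern instead of 100,000 distinct values.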

From customer data and testing, we are seeing well over 100x compression on 10 GB worth of TIME values from a customer's dataset. Even more impressive, we have seen up to 5000x compression on a similar 10 GB sequence column.

In addition, Pivotal Greenplum Database 4.3.3 adds the following features:

NetBackup integration
PL/R update to 3.1
Fuzzy String Match module

Learn More:

Product Overview, Features, and Technology
Product Documentation
Other blog articles on Pivotal Greenplum Database
