Pivotal Greenplum 6 on Amazon Web Services: Optimized for the cloud

November 26, 2019 Jon Roberts

Recently, we wrote about how Pivotal and Amazon Web Services have worked together to make deployment and ongoing operations of Pivotal Greenplum easy and painless. That’s necessary, but not sufficient. In particular, we’ve invested in making Greenplum truly cloud-native. That means embracing the benefits of cloud computing with solutions that are easy to use, easy to scale, and cost-effective, yet still highly performant.

Greenplum on the public cloud now rivals the performance of dedicated, bare metal installations, but without any of the up-front capital expenditures. It is fast, cross-IaaS, hybrid cloud, open source, and highly capable (with very appealing economics)!

We’ve optimized Greenplum 6 on public cloud to embrace cloud-native patterns. Our work with AWS is a terrific example.

Greenplum is optimized for cloud-native

Greenplum on AWS features these architectural attributes:

  • Automated deployments with AWS CloudFormation.

  • Self-healing: If any virtual machine instance fails, a replacement is automatically created, followed by an end-to-end recovery process.

  • Snapshot backups: automated EBS disk snapshots provide extremely fast database backups.

  • Grow storage independently from compute, completely online.

  • On-demand disaster recovery: copy EBS disk snapshots to other regions and only provision a DR cluster when needed.

Greenplum is optimized for cloud-managed

What about the operational experience? Greenplum on AWS fits the public cloud operating model quite well:

  • Pause / resume: pay for the cluster only when you need it.

  • Automated maintenance: automatic execution of common database administration maintenance commands.

  • Automated optional installs: a single command installs a variety of optional components.

  • Automated upgrades: when new versions are available, the administrator is alerted and can choose to upgrade with a single command.

Optimizing performance is a never-ending task and the latest release of Greenplum on AWS takes another step forward. Let’s take a deeper look into how we help you balance performance and your compute spend.

Greenplum is optimized for storage

Throughput, not IOPS, is what matters for Greenplum disk performance. Throughput is measured in MB/s and can be observed with the Greenplum gpcheckperf utility. AWS publishes performance and cost metrics, too. Here’s the summary:

Disk Type    Perf (MB/s)    Cost per GB per Month*
GP2          262            $0.10
IO1          1048           $0.125 + $0.065 per provisioned IOPS
ST1          524            $0.045
SC1          262            $0.025

*Pricing for us-east-1.

For ST1 and SC1 disks, one must also consider the base versus burst throughput. As the volume size increases, so does the base throughput.

GP2 and IO1

We evaluated these options as cost-prohibitive relative to ST1 and SC1 storage. Further, these aren’t the optimal choice for high-throughput applications such as Greenplum.

ST1 and SC1

The minimum size for SC1 disks to reach the highest burst speed possible is 3.125TB; for ST1, the minimum size is 2TB.
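To make the relationship between volume size and burst throughput concrete, here is a minimal sketch. It uses only the maximum burst figures and minimum sizes quoted above, and it assumes that burst throughput grows linearly with volume size until it hits the cap. That linear model, and the names `VOLUME_TYPES` and `burst_throughput_mb_s`, are illustrative assumptions rather than anything taken from AWS tooling.

```python
# Figures from the table and text above. Linear scaling below the cap is an
# assumption made for this sketch, not a statement of AWS's exact formula.
VOLUME_TYPES = {
    # type: (maximum burst in MB/s, smallest size in TB that reaches it)
    "st1": (524.0, 2.0),
    "sc1": (262.0, 3.125),
}


def burst_throughput_mb_s(volume_type: str, size_tb: float) -> float:
    """Estimate burst throughput for one volume, capped at the type's maximum."""
    max_burst, size_for_max = VOLUME_TYPES[volume_type]
    per_tb = max_burst / size_for_max  # MB/s gained per TB until the cap is hit
    return min(max_burst, per_tb * size_tb)


if __name__ == "__main__":
    samples = [("sc1", 1.0), ("sc1", 3.125), ("sc1", 4.0),
               ("st1", 1.0), ("st1", 2.0), ("st1", 8.0)]
    for vtype, size_tb in samples:
        rate = burst_throughput_mb_s(vtype, size_tb)
        print(f"{vtype} {size_tb:>5} TB -> {rate:6.1f} MB/s")
```

Under this model, making a volume larger than the minimum size adds capacity but no further burst throughput, which is why splitting data across several volumes can help.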

If you need 8TB of storage, you could use two 4TB SC1 disks and reach the same burst throughput as a single 8TB ST1 disk (2 × 262 MB/s = 524 MB/s). At $0.025 versus $0.045 per GB per month, the SC1 configuration costs about 44% less for the same burst performance!
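The arithmetic behind the 44% figure can be checked with a short sketch. Prices and burst limits come from the storage table above; the decimal conversion of 1 TB = 1000 GB and the helper names are assumptions made for illustration.

```python
GB_PER_TB = 1000  # decimal units assumed for this sketch

# Prices (USD per GB per month, us-east-1) and burst limits (MB/s) from the table above.
PRICE_PER_GB_MONTH = {"st1": 0.045, "sc1": 0.025}
MAX_BURST_MB_S = {"st1": 524, "sc1": 262}


def monthly_cost(volume_type: str, size_tb: float, count: int = 1) -> float:
    """Monthly storage cost in USD for `count` volumes of `size_tb` each."""
    return PRICE_PER_GB_MONTH[volume_type] * size_tb * GB_PER_TB * count


def combined_burst(volume_type: str, count: int = 1) -> int:
    """Combined burst limit, assuming every volume is large enough to hit its cap."""
    return MAX_BURST_MB_S[volume_type] * count


st1_cost = monthly_cost("st1", 8)            # one 8TB ST1 volume
sc1_cost = monthly_cost("sc1", 4, count=2)   # two 4TB SC1 volumes
savings = 1 - sc1_cost / st1_cost

print(f"ST1: ${st1_cost:.2f}/month at {combined_burst('st1')} MB/s burst")
print(f"SC1: ${sc1_cost:.2f}/month at {combined_burst('sc1', 2)} MB/s burst")
print(f"Savings with SC1: {savings:.0%}")
```

Because the saving depends only on the ratio of the two per-GB prices, it stays at roughly 44% for any total capacity, as long as each SC1 volume is at least 3.125TB.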

Greenplum is optimized for compute

We reviewed many of the instance types available in AWS and determined that the R5 Series is the best combination of CPU, memory, and EBS disk performance for Greenplum.

R5 Series: A closer look

In June of 2018, Amazon introduced the R5 series. These instances are a revision of the R4 series, but faster and more economical. Like other instance types, there are associated speed limits on EBS disk performance, shown in the table below.

Instance Type    vCPUs    Memory (GiB)    Disk (MB/s)    Cost (per hour)
r5.xlarge        4        32              437.5          $0.252
r5.2xlarge       8        64              437.5          $0.504
r5.4xlarge       16       128             437.5          $1.008
r5.8xlarge       32       256             625            $2.016
r5.12xlarge      48       384             875            $3.024
r5.16xlarge      64       512             1250           $4.032
r5.24xlarge      96       768             1750           $6.048

Key features of the R5 series:

  • Disk throughput values for r5.xlarge, r5.2xlarge, and r5.4xlarge are identical.

  • The price for r5.8xlarge is twice that of r5.4xlarge, but the disk throughput is only about 1.4x higher (625 MB/s versus 437.5 MB/s).
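One way to see the pattern in these numbers is to divide each instance type's disk throughput limit by its hourly cost. The sketch below does that with the figures from the R5 table; the metric itself and the helper name `throughput_per_dollar` are illustrative choices, not something AWS or Greenplum publishes.

```python
# (disk throughput limit in MB/s, cost per hour in USD), from the R5 table above.
R5 = {
    "r5.xlarge": (437.5, 0.252),
    "r5.2xlarge": (437.5, 0.504),
    "r5.4xlarge": (437.5, 1.008),
    "r5.8xlarge": (625.0, 2.016),
    "r5.12xlarge": (875.0, 3.024),
    "r5.16xlarge": (1250.0, 4.032),
    "r5.24xlarge": (1750.0, 6.048),
}


def throughput_per_dollar(instance_type: str) -> float:
    """Disk throughput limit (MB/s) per USD of hourly cost."""
    disk_mb_s, cost_per_hour = R5[instance_type]
    return disk_mb_s / cost_per_hour


for name in R5:
    print(f"{name:<12} {throughput_per_dollar(name):8.1f} MB/s per $/hour")
```

Taken alone, this metric favors the smallest types. It ignores the CPU and memory that each node also needs, so treat it as one input to the sizing decision rather than the whole answer.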

This chart shows how different R5 instance types stack up. Note that the r5.2xlarge has 100% more CPU and memory than the r5.xlarge, at 100% higher cost. However, there is a 0% difference in disk throughput, because both types have the same limit.

Put another way: the r5.12xlarge is 3x the cost of the r5.4xlarge, with 3x the CPU and 3x the memory. But the r5.12xlarge offers only 2x the disk throughput of the r5.4xlarge.

Instance Type    vCPUs    Memory (GiB)    Disk (MB/s)    Cost (per hour)
r5.4xlarge       16       128             437.5          $1.008
r5.12xlarge      48       384             875            $3.024

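Since we are I/O bound with Greenplum, it is better to use 3 r5.4xlarge nodes instead of a single r5.12xlarge node. The three smaller nodes cost the same per hour and provide the same total CPU and memory, but their combined disk throughput limit is 1,312.5 MB/s instead of 875 MB/s. Using SC1 storage, this results in higher throughput at the same cost.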
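The comparison between three r5.4xlarge nodes and one r5.12xlarge node can be sketched as follows. The figures come from the table above; the `InstanceType` class and `fleet_totals` helper are hypothetical names, and the sketch assumes that per-node disk throughput limits simply add up across nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory: int           # as listed in the table above
    disk_mb_s: float      # EBS throughput limit per instance
    cost_per_hour: float  # USD


R5_4XLARGE = InstanceType("r5.4xlarge", 16, 128, 437.5, 1.008)
R5_12XLARGE = InstanceType("r5.12xlarge", 48, 384, 875.0, 3.024)


def fleet_totals(instance: InstanceType, count: int) -> dict:
    """Sum resources over `count` identical nodes (assumes limits add linearly)."""
    return {
        "vcpus": instance.vcpus * count,
        "memory": instance.memory * count,
        "disk_mb_s": instance.disk_mb_s * count,
        "cost_per_hour": instance.cost_per_hour * count,
    }


small_nodes = fleet_totals(R5_4XLARGE, 3)
large_node = fleet_totals(R5_12XLARGE, 1)

for label, totals in [("3 x r5.4xlarge", small_nodes), ("1 x r5.12xlarge", large_node)]:
    print(f"{label:<16} {totals['vcpus']} vCPUs, {totals['memory']} memory, "
          f"{totals['disk_mb_s']} MB/s, ${totals['cost_per_hour']:.3f}/hour")
```

CPU, memory, and cost match between the two options, so the only difference under these assumptions is the combined disk throughput limit.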

Putting it all together

We determined that using multiple SC1 disks can achieve the same throughput at a lower cost than a single ST1 disk. Next, we realized that, by using a smaller instance type, we get greater total throughput from the cluster for a given amount of money.

The current configuration for Greenplum on AWS implements these findings, and uses the r5.4xlarge instance type for the segment instances with 3 SC1 disks each. Even with a relatively small disk size, the maximum throughput of the instance type can be reached, all while saving you money. Ultimately, we realize a 44% reduction in data storage cost from previous marketplace offerings!
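As a rough illustration of why a relatively small disk size can be enough, the sketch below estimates how large each of the 3 SC1 disks must be for their combined burst throughput to reach the r5.4xlarge limit of 437.5 MB/s. It assumes burst throughput scales linearly with volume size up to the 262 MB/s cap at 3.125TB, and it considers burst rates only, not sustained base throughput. The function name is hypothetical.

```python
INSTANCE_LIMIT_MB_S = 437.5   # r5.4xlarge disk throughput limit, from the R5 table
DISKS_PER_SEGMENT_HOST = 3    # SC1 disks per segment instance, as described above
SC1_MAX_BURST_MB_S = 262.0    # from the storage table
SC1_SIZE_FOR_MAX_TB = 3.125   # smallest SC1 size that reaches the maximum burst


def min_disk_size_tb(instance_limit: float, disks: int) -> float:
    """Smallest per-disk size (TB) whose combined burst reaches the instance limit.

    Assumes burst throughput grows linearly with size up to the SC1 cap.
    """
    needed_per_disk = instance_limit / disks
    if needed_per_disk > SC1_MAX_BURST_MB_S:
        raise ValueError("instance limit cannot be reached with this many disks")
    burst_per_tb = SC1_MAX_BURST_MB_S / SC1_SIZE_FOR_MAX_TB
    return needed_per_disk / burst_per_tb


size_tb = min_disk_size_tb(INSTANCE_LIMIT_MB_S, DISKS_PER_SEGMENT_HOST)
print(f"Each SC1 disk needs roughly {size_tb:.2f} TB to saturate the instance limit")
```

Under these assumptions, each disk can be well below 3.125TB and the three disks together still reach the instance limit, which is the point the configuration relies on.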

The Master node in the cluster typically doesn't need a great deal of resources, so a small and inexpensive r5.xlarge can be used. The r5.2xlarge and r5.4xlarge instance types are also available for the Master instance, but the smaller instance type should suffice. The storage is a single SC1 disk that can be sized quite small, saving money yet still delivering incredible performance.

Take the next step: Deploy Greenplum on AWS

Deploying Greenplum on AWS has never been easier. Be sure to check out the Pivotal Greenplum Bring Your Own License (BYOL) and Pivotal Greenplum (Hourly) products in the AWS Marketplace. The BYOL product even has a free 90-day evaluation period!

Single-instance options are also available and are ideal for development and test use cases.

Greenplum on AWS: Bring Your Own License

Greenplum on AWS: Hourly

About the Author

Jon Roberts

Jon Roberts is a Principal Engineer leading the development of deploying Greenplum and Postgres in the AWS, Microsoft Azure, and Google Cloud Platform Marketplaces. Prior to leading this work, he held Platform Engineering and Sales Engineering roles at Pivotal dating back to 2010. Prior to joining Pivotal, he was a Greenplum customer for three years. He holds a Bachelor of Science in Business Administration from the University of Louisville. Go Cards!
