Greenplum Building Blocks: Redefining the Modern Data Analytics Platform

June 14, 2018 Ward Maddux

When it comes to advanced analytics on large-scale data, how do you improve performance and reduce cost simultaneously? The answer is (still) surprisingly simple.

Combine the economics of open-source software and commodity hardware with the performance and productivity of a robust, analytics-at-scale data platform.

Cost and performance are obviously two of the most fundamental decision drivers for people designing or upgrading a data platform strategy. It seems just as obvious that these two factors must be improved together if you are to derive any value from a database modernization effort. Decreasing cost while also decreasing performance or productivity will not move the needle for your business.

In the not-too-distant past, practitioners of large-scale data solutions accepted exorbitant prices to ensure certain performance characteristics, such as query and load speed, reliability, and support for a variety of data formats and programming languages. Choosing Teradata made sense, since it was once the gold standard for analytics at scale. Today, the features that were once critical only to the most demanding businesses are table stakes for modern analytics platforms. Businesses are no longer willing to be locked into high-cost proprietary products and services.

On the flip side, Hadoop represented the promise of cost savings by coupling open-source software and new data processing features with commodity hardware. In retrospect, the cost of redundant hardware and the number of specialized skill sets required to achieve a base level of performance outweighed the value of the flexible programming model. Your business likes to save money, but not at the expense of performance and productivity.

Proven Analytics-at-Scale

Pivotal Greenplum is an open-source, massively parallel data analytics platform that runs exceptionally well in the cloud and in your data center. It enables analytics at petabyte scale using your language of choice (SQL, Python, Spark, R, Java, etc.) and your choice of data format or source (JSON, XML, S3, Parquet, Avro, etc.). Pivotal Greenplum is based on open-source PostgreSQL and is proven in some of the most demanding businesses in the world.

Today, Pivotal Greenplum is an industry-leading analytical data platform and is innovating faster than ever.

Pivotal Greenplum is the only industry-proven, open-source, massively parallel analytical database on the market, which leaves one question.

How can you effectively capitalize on commodity hardware and preserve or enhance performance and productivity?

Greenplum Building Blocks (GBB) is a new strategy for on-premises deployments of Pivotal Greenplum. Extensive performance testing and optimization of Pivotal Greenplum on modern, off-the-shelf Dell equipment produced a straightforward set of hardware and configuration parameters that delivers the best cost/performance Pivotal Greenplum has achieved. GBB is a validated reference architecture that is delivered as a turnkey system from Dell Technologies.

Designed from the top down for optimal cost/performance

  • From the servers to the switches to the storage arrays, GBB leverages enterprise-class, commodity components. No specialized or proprietary hardware. Costs are kept low because Pivotal Greenplum software handles all of the performance optimization and advanced analytics features while the infrastructure is fundamentally selected and configured to avoid constraints.
  • Modern, high-core-count CPUs provide a tunable ratio between CPU cores and data volume that allows you to strike the right balance between concurrency and throughput. Adjust the segment-to-core ratio as your workloads evolve.
  • GBB is standardized on Dell PowerEdge servers to ensure compatibility of components and reliable, effective support and maintenance. The overall configuration reduces complexity and promotes flexibility.
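
To make the segment-to-core ratio mentioned above concrete, here is a minimal Python sketch of the arithmetic involved. The function names, the 32-core host, the 8-host cluster, and the 100 TB data volume are all hypothetical values chosen for illustration; they are not GBB specifications or a Greenplum API.

```python
# Hypothetical sizing arithmetic; none of these numbers come from the GBB spec.

def segments_per_host(cores_per_host: int, cores_per_segment: int) -> int:
    """Return how many primary segments fit on one host for a given core budget."""
    if cores_per_host <= 0 or cores_per_segment <= 0:
        raise ValueError("core counts must be positive")
    # Always run at least one segment, even on a very small host.
    return max(1, cores_per_host // cores_per_segment)


def data_per_segment_tb(total_data_tb: float, hosts: int, segs_per_host: int) -> float:
    """Return the share of data each segment holds, assuming an even distribution."""
    if hosts <= 0 or segs_per_host <= 0:
        raise ValueError("hosts and segments per host must be positive")
    return total_data_tb / (hosts * segs_per_host)


if __name__ == "__main__":
    # Assumed cluster: 8 hosts with 32 cores each, holding 100 TB of data.
    for cores_per_segment in (8, 4, 2):
        segs = segments_per_host(32, cores_per_segment)
        share = data_per_segment_tb(100.0, 8, segs)
        print(f"{cores_per_segment} cores/segment: {segs} segments/host, "
              f"{share:.2f} TB per segment")
```

As a rough rule of thumb (an assumption here, not a measured GBB result), giving each segment more cores leaves headroom for concurrent queries, while running more segments per host raises the parallelism available to a single query.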

Architected for mixed advanced analytic workloads

  • Expand your storage capacity independently of compute
  • Proven, scalable and expandable Interconnect to separate competing workloads
  • Multi-tenant ready
  • Able to run Pivotal Greenplum, Hadoop, and Pivotal GemFire all on the same cluster

Deploy Anywhere

  • On-premises
  • Fully managed by Virtustream
  • Hybrid cloud
  • Cloud integration through direct connect - operate within your AWS VPC

Pivotal Greenplum has always been known as a “software only” and “infrastructure-agnostic” product. The other Massively Parallel Processing (MPP) data platforms on the market rely on specialized hardware, packaged as an appliance, to optimize query performance. Those MPP products have not seen comparable performance when removed from their specialized hardware.

Complemented by the latest technology from Dell, Pivotal Greenplum stands alone in terms of cost/performance. Many of the value-boosting factors are surprisingly simple yet very effective.

  • High core density per CPU - fewer servers, less overhead
  • A high volume of drives increases throughput (via expansion storage arrays)
  • Incredibly cheap storage on separate arrays
  • More expansion cards per server
  • Bigger nodes mean more affordable network switches
  • NVMe for temp space saves expensive SSDs for catalog and indexes
  • Separate high-IOPS activity (NVMe and SSDs) from sequential scans (disk)

Accelerating Economics

Lower cost and better performance mean your business is better equipped to derive value from your data assets. Owners of proprietary appliances are seeing cost reductions of more than 50%. Budget is freed up to capture more data and pursue more value-driven projects. More projects mean more revenue. The economics get better and better.

For more information

For more information about Greenplum Building Blocks, please contact your Dell Technologies or Pivotal account executive.


About the Author

Ward Maddux

Ward is Product Manager for the Data Innovation Lab at Pivotal. The team explores creative ways to leverage the experience and capabilities of Pivotal Data to dramatically improve customer and developer experience. Prior to Pivotal, Ward enjoyed working with passionate teams at Boeing, Informix, and SunGard Higher Education. He lives in Golden, Colorado with his family and likes to be outside doing just about anything.
