Fine Tuning Pivotal HAWQ and EMC Isilon HDFS

September 14, 2015 Jemish Patel

To help our customers and partners, Pivotal’s Platform Engineering team has been testing all the common Pivotal HAWQ functionality used by customers with EMC Isilon as the HDFS layer. This post is not a performance benchmark, but provides a description of the performance testing environment, an overview of the tests, and an in-depth review of key findings across Isilon and Pivotal HAWQ tuning parameters and configurations. Coverage includes access patterns, block size, thread count, balancing traffic, data distribution, data format, table partitions, concurrency, and external tables.

Environment and Test Overview

Much more information is provided below, but, at a high level, compute was provided by 16 commodity servers with archetypical CPU, memory, network, and disk, and storage was an 8-node EMC Isilon scale-out NAS cluster serving as the HDFS layer. The tests used the 99 queries of the industry-standard TPC-DS benchmark.

This is a relatively small big data platform, but it provides a scalable, shared data storage infrastructure. As well, it has multi-protocol access to shared data and enterprise features like backup, replication, snapshots, file system audits, and data-at-rest encryption. There are also some truly unique capabilities, like native integration with HDFS, that give it advantages over direct attached storage (DAS). Some might think this is counterintuitive to Apache Hadoop’s™ original premise, but disk locality isn’t always the answer.

NOTE: Testing was performed by Pivotal only. The testing performed and results collected were not audited by or submitted to TPC.

Isilon Specific Configuration

You can read more about Isilon in this technical overview whitepaper or in their best practices for Hadoop data storage.

Access Patterns

EMC Isilon has 3 different access patterns that customers can apply at the file pool, directory, or file level: streaming, concurrency, and random. A file pool policy is a way to assign filtering rules and operations, and policies can be both system- and user-defined. Once you configure a file pool policy to filter certain types of files, you can then set protection and I/O optimization settings for those file types in the policy.

We ran multiple iterations of the TPC-DS query set and saw that the best results were produced when we used streaming as the access pattern on EMC Isilon with HAWQ. This makes sense given what each pattern optimizes for: concurrency targets mixed workloads with many simultaneous clients, random targets unpredictable access with adjusted striping and a disabled prefetch cache, and streaming targets high-speed sequential access to individual files, such as a very fast read from a single client, which is the closest match to how HAWQ segments scan table data.
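
If you want to apply the streaming pattern from the OneFS CLI rather than through a file pool policy, a minimal sketch follows. The path /ifs/hadoop is a placeholder for your HDFS data root, and option behavior can vary by OneFS version, so verify against your CLI guide:

# Set the streaming access pattern recursively on the HDFS data root (path is a placeholder)
isi set -R -a streaming /ifs/hadoop

# Verify the setting on the directory
isi get /ifs/hadoop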

Block Size

The block size for HAWQ, for EMC Isilon’s HDFS (the isi_hdfs_d daemon), and for HDFS on the Pivotal HD cluster needs to be configured to the same value. For HAWQ, this is a manual change in a configuration file. For Pivotal HD, the Apache Ambari admin UI can be used to make the change. For EMC Isilon, the change can only be applied via the CLI, so you need access and the correct privileges as well.

All three of these block size values have to match in order to get the best performance out of Pivotal HD, HAWQ, and EMC Isilon. We saw the best query runtimes when we set these block sizes to 128MB or 512MB versus 64MB, 256MB, or even larger sizes.
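
As a sketch, assuming a 128MB (134217728 byte) block size and OneFS 7.2-era CLI syntax, the three changes look roughly like this. The HAWQ hdfs-client.xml property name is our best understanding rather than a documented certainty, so double-check it against your HAWQ version’s documentation:

# On Isilon (CLI only, with the correct privileges)
isi hdfs settings modify --default-block-size=128MB

# On Pivotal HD, set via the Apache Ambari admin UI (hdfs-site.xml):
#   dfs.blocksize = 134217728

# In HAWQ's hdfs-client.xml configuration file on each host:
#   <property>
#     <name>dfs.default.blocksize</name>
#     <value>134217728</value>
#   </property>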

Thread Count

Isilon’s isi_hdfs_d daemon has a configurable thread count, which controls the pool of worker threads available to service requested tasks. It is a per-node value, and, once HDFS is enabled, the isi_hdfs_d daemon runs on every Isilon node in the cluster.

We tested thread counts at various settings (auto, 64, 128, 256), and auto produced the best runtimes in our environment. In our case, we had 12 HAWQ segment servers with 10 HAWQ segment instances each, for a total of 120 workers that could request I/O from EMC Isilon. We chose 10 segment instances because lower counts did not give us adequate performance, and queries were taking too long to complete. Make sure to choose a segment instance count that your compute environment has the resources to support.
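
For reference, the thread count is set from the OneFS CLI; --server-threads is our recollection of the 7.2-era option name and should be confirmed against your version’s isi hdfs settings documentation:

# Let OneFS size the HDFS worker thread pool automatically
isi hdfs settings modify --server-threads=auto

# Or pin it to a fixed per-node count, e.g. 256
isi hdfs settings modify --server-threads=256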

To rule out any thread contention or limits, we also ran tests that pushed connections to more than 4,700 per Isilon node. That is a total of 37,600 connections for the 8-node Isilon cluster, and the cluster ran without any issues.

Balancing Traffic

In order to get balanced throughput out of the Isilon cluster, you need to carve up the Isilon cluster into two IP pools: one for DataNode traffic and the other for NameNode traffic. For the DataNode IP pool, 3 IP addresses per network interface in each Isilon node need to be configured, as shown in the EMC Isilon OneFS Management UI below:

[Screenshot: DataNode IP pool configuration in the EMC Isilon OneFS Management UI]

We recommend round robin for this pool’s connection policy. The IP allocation method needs to be dynamic: in case of Isilon node or network interface failures, another Isilon node can service requests to the failed DataNode IP address in the pool, which provides redundancy and load balancing at the DataNode level. You can choose a SmartConnect zone name that exists in DNS and makes sense in your environment. Using the isi hdfs racks command, you can create a rack that will now use this new pool for its DataNodes, as shown below:

isi hdfs racks create --client-ip-ranges=<client-ip-range> --ip-pools=pool1 --rack=/rack0

In the above command, replace <client-ip-range> with the actual IP range of your compute environment.

A separate IP pool for NameNode traffic also needs to be created, covering each 10Gb network interface per Isilon node in the cluster, as shown below:

[Screenshot: NameNode IP pool configuration in the EMC Isilon OneFS Management UI]

The connection policy for the NameNode pool also needs to be round robin so that each request goes to a different NameNode IP address. The IP allocation for the NameNode pool should be static, as SmartConnect itself provides the redundancy needed in case of node or network interface failures. Again, choose a SmartConnect zone name that exists in DNS and makes sense in your environment.
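
If you prefer to create both pools from the CLI instead of the OneFS Management UI, the commands take roughly the following shape. The pool names, IP ranges, interface list, and SmartConnect zone names are placeholders, and the exact flag names are assumptions that differ across OneFS releases, so treat this strictly as a sketch to check against your CLI guide:

# DataNode pool: dynamic allocation, 3 IPs per node interface (all values are placeholders)
isi networks create pool --name=subnet0:pool1 --ranges=10.1.1.10-10.1.1.33 --ifaces=1-8:10gige-1 --sc-subnet=subnet0 --zone=hdfs-dn.example.com --dynamic

# NameNode pool: static allocation (the default), one IP per node interface
isi networks create pool --name=subnet0:pool2 --ranges=10.1.1.40-10.1.1.47 --ifaces=1-8:10gige-1 --sc-subnet=subnet0 --zone=hdfs-nn.example.com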

NOTE: This SmartConnect zone name will be the URI you use when configuring HDFS to point to Isilon for HDFS data.
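
As an illustration, assuming a hypothetical NameNode SmartConnect zone of hdfs-nn.example.com, the Hadoop client configuration on the compute side would point at Isilon like this:

<!-- core-site.xml on the Pivotal HD / HAWQ side -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hdfs-nn.example.com:8020</value>
</property>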

HAWQ Specific Configuration

Data Distribution

You can choose to distribute data either randomly or via a key when you create tables in HAWQ. If a distribution key is not specified, HAWQ uses the first column in the table definition as the key. Refer to the HAWQ documentation for the exact syntax of the ‘DISTRIBUTED BY’ clause used in table definitions to specify a distribution key. We tested both distribution policies under load and saw the best results with distribution by key. This configuration resulted in optimum, colocated joins, since data for a particular key is grouped on the same HAWQ segment node when running queries; random distribution does not provide this type of efficient data locality.
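
For example, a minimal sketch with hypothetical table and column names:

-- Explicit key: rows with the same sale_id land on the same segment, enabling colocated joins
CREATE TABLE sales (sale_id bigint, sale_date date, amount numeric)
DISTRIBUTED BY (sale_id);

-- Random: even spread across segments, but no join colocation
CREATE TABLE sales_random (sale_id bigint, sale_date date, amount numeric)
DISTRIBUTED RANDOMLY;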

Data Format

HAWQ supports multiple storage formats for table data. We compared append-only tables against Parquet tables and found that Parquet tables with snappy compression produced the best results when running HAWQ with EMC Isilon as its HDFS storage layer. Additionally, we verified that the truncate functionality works as expected in HAWQ.
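
A sketch of the kind of table definition that performed best for us, with hypothetical names; the WITH clause follows HAWQ’s documented options for Parquet storage:

-- Parquet table with snappy compression
CREATE TABLE sales_parquet (sale_id bigint, sale_date date, amount numeric)
WITH (APPENDONLY=true, ORIENTATION=parquet, COMPRESSTYPE=snappy)
DISTRIBUTED BY (sale_id);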

Table Partitions

Partitioning data in HAWQ has significant advantages. With it, queries can scan only the relevant partitions for rows or columns, instead of incurring an expensive scan of the entire table when looking for specific values. Partitioning improves runtimes when EMC Isilon is the HDFS storage layer. We compared partitioning large tables annually, quarterly, monthly, weekly, and daily in our test scenarios. We found that quarterly partitions (roughly every 90 days) or monthly partitions (roughly every 30 days) isolated table scans to specific partitions, reduced the amount of data scanned, and sped up requested results.
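
Monthly range partitioning on a date column looks roughly like this (hypothetical names, one child partition per month of 2014):

CREATE TABLE sales_fact (sale_id bigint, sale_date date, amount numeric)
WITH (APPENDONLY=true, ORIENTATION=parquet, COMPRESSTYPE=snappy)
DISTRIBUTED BY (sale_id)
PARTITION BY RANGE (sale_date)
(
  -- One partition per month; queries filtered on sale_date scan only matching partitions
  START (date '2014-01-01') INCLUSIVE
  END (date '2015-01-01') EXCLUSIVE
  EVERY (INTERVAL '1 month')
);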

Concurrency

We ran the entire 99-query TPC-DS set sequentially and compared it against runs where 5, 10, 15, and 20 users ran it concurrently. We saw fairly linear growth in the runtimes, as we would expect. The big difference was that we were able to distribute the load evenly across the available resources and run the queries without running out of resources in the test environment. You can also leverage resource queues in HAWQ to fine-tune which specific user or group gets more resources than others and provide efficient resource scheduling, as shown below.
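
A minimal resource queue sketch, assuming a hypothetical reporting role; HAWQ inherits this syntax from Greenplum Database, so check the HAWQ documentation for the options your version supports:

-- Cap how many statements from this queue's members run at once
CREATE RESOURCE QUEUE reporting_queue WITH (ACTIVE_STATEMENTS=10);

-- Attach a role to the queue
ALTER ROLE reporting_user RESOURCE QUEUE reporting_queue;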

External Tables

HAWQ includes functionality called the Pivotal Extension Framework (PXF). It allows HAWQ to very quickly query data that resides in HDFS outside of HAWQ’s native tables. This is the fastest way to load data or do some exploration on large datasets. We successfully tested PXF functionality in HAWQ as well.
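
A sketch of a PXF readable external table over delimited text already sitting in HDFS; the host, port, path, and profile name reflect HAWQ 1.3-era PXF conventions as we understand them, so treat all of them as placeholders:

-- Query CSV files in HDFS in place, without loading them into HAWQ first
CREATE EXTERNAL TABLE ext_sales (sale_id bigint, sale_date date, amount numeric)
LOCATION ('pxf://namenode.example.com:51200/data/sales?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (DELIMITER ',');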

Environment

In our lab, compute was provided by 16 commodity servers, each with 16 CPU cores, 10 x 900GB SAS drives for local scratch/spill space, and 64GB of memory, running HAWQ version 1.3.0.3 on Pivotal HD 3.0. HAWQ was configured to use spill space on each of its local drives and to run 10 HAWQ segment instances per node.

Networking was configured as a 10Gb storage network where both the compute and Isilon clusters connected to the same switch. The compute rack has 2 x 10Gb internal switches, configured as an MLAG using 4 ports (2 on each switch) connected to the storage network. This configuration allows the Isilon cluster to be a shared resource on the 10Gb network, so multiple compute clusters can leverage it for the same data and access it via HDFS or any other Isilon-supported protocol.

Storage was provided by 8 X410 nodes in the Isilon cluster, each with 2 x 800GB SSDs and 34 x 1TB SATA drives. The software version on the Isilon cluster was OneFS v7.2.0.3. We configured the SSD strategy so that it would read/write both metadata and data on the SSDs.

Test Overview

Our test used the industry-standard TPC-DS benchmarking suite to run 99 queries that represent real-life decision support systems in retail organizations. These span real-world use cases of deep reporting, advanced data mining, and interactive queries. We generated 3TB of sample data, loaded it into HAWQ via external tables, and used the execution times of all 99 TPC-DS queries as our measure of performance.

Then, we successfully ran multiple iterations of all queries and measured runtimes for various configurations and scenarios.

Conclusion

In summary, these are the key parameters to look at when you want to get optimum performance from HAWQ with EMC Isilon as its storage layer. Of course, the work is not completely done yet, and we still have more testing to do.
