Today, every business must execute like a fast-moving, innovative software company to survive. Innovative software development begins with data, including advanced analytics and machine learning on that data. For today’s new generation of killer applications, that data is in Hadoop®. Apache™ Hadoop® has drastically changed the economics and capabilities for capturing and storing today’s unrelenting data growth. But to go beyond just storing data and truly unlock the power of Hadoop for business insights and predictive analytics, you need SQL. The advanced Hadoop Native SQL capabilities provided by Pivotal HDB help companies unleash the power of Hadoop and drive significant business change.
Pivotal HDB – Hadoop Native SQL Analytics
Evolved from over a decade’s worth of R&D on Pivotal Greenplum™ (Pivotal’s flagship analytical data warehouse) and open source PostgreSQL, Pivotal HDB is an advanced SQL query engine that delivers exceptional analytics performance through massively parallel processing (MPP). Pivotal HDB also features robust ANSI SQL compliance and integrated MADlib machine learning.
Pivotal HDB operates natively in Hadoop, which simplifies overall system management of cluster resources. Pivotal HDB’s robust SQL compliance, powerful cost-based query optimization capabilities and support for advanced analytics enable companies to implement high-performance analytics solutions, on data sets ranging from small to enormous, much more rapidly than with traditional Hadoop SQL tools. In addition, Pivotal is committed to delivering strong operationalization capabilities such as out-of-the-box support for the Hortonworks Hadoop distribution, complete configuration and management using Apache Ambari, and support for a multitude of deployment models. As a result, companies can focus on solving business problems and shortening innovation cycles.
When advanced data analytics on Hadoop is integral to business solutions, application development or corporate digital transformation, consider these Pivotal HDB advantages:
- Superior performance compared to current open source SQL on Hadoop analytic tools
- Massive MPP scalability to petabytes
- Near real-time latency
- Fast performance for complex and advanced data analytics
- ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP extensions
- Business-critical class analytics on Hadoop
- No compatibility risks to SQL developers or SQL BI tools and applications
- Support for query roll-ups, dynamic partitions and joins
- Advanced machine learning for big data
- Local, direct in-database operation
- Open source, Postgres-based
- Highly scalable MPP processing
- On premises or in the cloud
- Open access to HAWQ analytics data via Apache Parquet
- Additional flexibility via HAWQ PXF services to HBase or Avro data files
Built On A Strong Foundation
Pivotal HDB is built on a strong foundation of efficient data movement, concurrency control, advanced analytics and robustness to deliver petabyte-scale, enterprise-grade Hadoop Native SQL operation. It executes directly inside the Apache Hadoop cluster and processes HDFS data in place on the cluster.
Dynamic pipelining: Pivotal HDB employs dynamic pipelining to minimize the data transport overhead in performing SQL joins on Hadoop, making HDFS-based data suitable for interactive queries. This enables fast off-loading of Enterprise Data Warehouse (EDW) workloads at a significantly lower cost to the enterprise.
Concurrent queries: Pivotal HDB uses prioritized resource queues to sustain significantly more queries per second on mixed workloads than other Hadoop SQL engines. This enables Pivotal HDB to be used as a general-purpose analytical engine across large teams of analysts and developers.
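To make the idea of a prioritized queue concrete, here is a minimal sketch in Python. It is not Pivotal HDB's resource manager; the class name `PriorityQueryQueue`, the `max_active` limit and the numeric priority scale are all assumptions chosen for illustration. It only shows how a fixed number of execution slots might be handed to waiting queries in priority order.

```python
import heapq
from itertools import count


class PriorityQueryQueue:
    """Toy admission queue: at most `max_active` queries run at once."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = []        # names of queries currently running
        self._waiting = []      # heap of (priority, sequence, name)
        self._seq = count()     # tie-breaker keeps FIFO order per priority

    def submit(self, name, priority):
        # Lower number means higher priority (an assumption of this sketch).
        heapq.heappush(self._waiting, (priority, next(self._seq), name))

    def admit(self):
        # Fill free slots with the highest-priority waiting queries.
        while self._waiting and len(self.active) < self.max_active:
            _, _, name = heapq.heappop(self._waiting)
            self.active.append(name)
        return list(self.active)

    def finish(self, name):
        # A completed query frees its slot for the next admit() call.
        self.active.remove(name)


if __name__ == "__main__":
    queue = PriorityQueryQueue(max_active=2)
    queue.submit("adhoc_report", priority=5)
    queue.submit("dashboard", priority=1)
    queue.submit("etl_batch", priority=9)
    print(queue.admit())
```

In this sketch a low-priority batch job waits while interactive queries take the available slots, which is the mixed-workload behavior the paragraph above describes.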
Analytics and machine learning: Pivotal HDB has the most advanced machine learning support among Hadoop SQL engines in the industry. Pivotal HDB provides these capabilities through MADlib, an open source library for scalable in-database analytics extending the SQL capabilities on Hadoop through user-defined functions. This enables normal analytics workloads to embed advanced machine learning to implement powerful, large-scale analytical use cases.
Enterprise-grade robustness: Pivotal HDB is the first Hadoop Native SQL engine to support transactions. Transactions allow users to isolate concurrent activity on Hadoop and to roll back modifications when a fault occurs. Pivotal HDB’s fault tolerance and high-availability features tolerate disk-level and node-level failures, ensuring business continuity and enabling business-critical analytics to be offloaded to Pivotal HDB.
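The rollback behavior described above can be illustrated with a short, self-contained sketch. It uses SQLite through Python's standard `sqlite3` module purely as a stand-in, because it runs anywhere without a cluster; it is not Pivotal HDB, and the `events` table and the helper names are hypothetical. The point is only the transactional pattern: either every statement in the batch takes effect, or none does.

```python
import sqlite3


def make_connection():
    """Create an in-memory database with a hypothetical `events` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.commit()
    return conn


def run_batch(conn, fail):
    """Insert two rows in one transaction; return the resulting row count.

    When `fail` is true, a fault is simulated between the two inserts and
    the whole transaction is rolled back.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO events (id, payload) VALUES (1, 'first')")
            if fail:
                raise RuntimeError("simulated fault mid-transaction")
            conn.execute("INSERT INTO events (id, payload) VALUES (2, 'second')")
    except RuntimeError:
        pass  # the partial insert has already been undone
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


if __name__ == "__main__":
    conn = make_connection()
    print(run_batch(conn, fail=True))
    print(run_batch(conn, fail=False))
```

After the simulated fault the table is still empty, so the same batch can be retried safely.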
Robust and Reliable SQL Coverage
Tap into a large ecosystem of data analysis and data visualization tools such as SAS, Tableau and more. Analytic applications written for Pivotal HDB are easily portable to other SQL-compliant data engines and vice versa. Leverage existing SQL skills to ramp up and execute quickly in Hadoop. Pivotal HDB quickly completes all 99 queries of the Transaction Processing Performance Council’s TPC-DS benchmark suite.
Powerful Cost-based Query Optimizer
The Pivotal HDB query optimizer has been proven to execute the most demanding queries, including those involving more than fifty joins, making it the industry’s best data discovery and query engine for native Hadoop analytics.
Cost-based query optimization: Pivotal HDB calculates and produces execution plans that optimally use the Hadoop cluster’s resources, irrespective of the complexity of the query or the size of the data. This enables enterprises to use Pivotal HDB for offloading traditional EDW workloads at more economical Hadoop costs.
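As a rough illustration of what "cost-based" means, the sketch below enumerates join orders for three tables and picks the one with the smallest estimated intermediate results. This is a toy model, not the Pivotal HDB optimizer: the table names, row counts, selectivities and the cost formula (sum of estimated intermediate result sizes for a left-deep plan) are all assumptions made for the example.

```python
from itertools import permutations


def plan_cost(order, cardinality, selectivity):
    """Estimate cost as the sum of intermediate result sizes.

    `order` is a left-deep join order; `selectivity` maps a pair of table
    names (as a frozenset) to the fraction of row pairs that match. Pairs
    without an entry are treated as having no join predicate (1.0).
    """
    size = cardinality[order[0]]
    cost = 0.0
    joined = [order[0]]
    for table in order[1:]:
        sel = 1.0
        for prev in joined:
            sel *= selectivity.get(frozenset((prev, table)), 1.0)
        size = size * cardinality[table] * sel
        cost += size
        joined.append(table)
    return cost


def best_plan(cardinality, selectivity):
    """Return the cheapest join order and its estimated cost."""
    best = min(
        permutations(cardinality),
        key=lambda order: plan_cost(order, cardinality, selectivity),
    )
    return best, plan_cost(best, cardinality, selectivity)


# Hypothetical statistics for three tables.
CARDINALITY = {"orders": 1_000_000, "customers": 10_000, "regions": 10}
SELECTIVITY = {
    frozenset(("orders", "customers")): 1 / 10_000,
    frozenset(("customers", "regions")): 1 / 10,
}

if __name__ == "__main__":
    print(best_plan(CARDINALITY, SELECTIVITY))
```

With these made-up statistics, joining the two small tables first and the large `orders` table last is cheapest, while starting with `orders` and `regions` (which share no predicate) is roughly ten times more expensive. A real optimizer considers far more plan shapes and physical operators than this exhaustive toy search.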
Robust query plan optimization: The query optimizer applies all possible optimizations at the same time and considers a large set of plan alternatives in parallel. This enables robust optimization across a wide range of queries and mixed workloads and delivers ad-hoc query optimization without relying on pre-computed projections.
Complex data management: Pivotal HDB implements partitioned tables and partition pruning to provide rapid query execution. This enables Pivotal HDB to deliver truly interactive queries on big data workloads.
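Partition pruning can be sketched in a few lines. The example below is a simplified model, not Pivotal HDB's implementation: the `Partition` class, the monthly range scheme and the one-row-per-partition sample data are assumptions for illustration. It shows the core idea, which is that a range predicate lets the engine skip every partition whose bounds cannot contain matching rows.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Partition:
    start: date                      # inclusive lower bound
    end: date                        # exclusive upper bound
    rows: list = field(default_factory=list)


def monthly_partitions(year):
    """Build twelve range partitions, each holding one sample row."""
    partitions = []
    for month in range(1, 13):
        start = date(year, month, 1)
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        partitions.append(Partition(start, end, [{"day": start, "amount": month}]))
    return partitions


def prune(partitions, lo, hi):
    """Keep only partitions whose [start, end) range overlaps [lo, hi)."""
    return [p for p in partitions if p.start < hi and lo < p.end]


def query(partitions, lo, hi):
    """Return matching rows and the number of partitions actually scanned."""
    scanned = prune(partitions, lo, hi)
    hits = [row for p in scanned for row in p.rows if lo <= row["day"] < hi]
    return hits, len(scanned)


if __name__ == "__main__":
    parts = monthly_partitions(2016)
    print(query(parts, date(2016, 3, 1), date(2016, 4, 1)))
```

A query restricted to March touches one of the twelve partitions instead of all of them, which is where the speed-up comes from.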
Unparalleled Accessibility to External Data
Pivotal HDB provides multiple accessibility features such as data federation, massively parallel loading and unloading, support for standard northbound ODBC and JDBC interfaces for analytics, and more.
Pivotal eXtension Framework (PXF): Pivotal HDB provides data federation using Pivotal eXtension Framework (PXF). PXF can federate data across other analytical data warehouses (ADWs), EDWs, HDFS, HBase and Hive instances while leveraging the parallelism inherent in native Hadoop SQL. PXF can also extend the data federation framework to new types of data sources. Enterprises can transition to big data technologies without the burden of large data migration projects or the inefficiencies of large data movement during query execution.
Parallel loading and unloading: Pivotal HDB supports fast parallel load/unload of massive data volumes at scale (without the bottleneck of a master node) and makes the data immediately available for analysis. A diverse set of data sources and sinks is supported, including file systems, ETL products and Hadoop. This minimizes the time and costs of data wrangling and blending and shortens the time to insights.
Multitude of Big Data Compliant Storage Options
The storage framework in Pivotal HDB is designed to minimize data movement and ETL data processing, provide multiple storage options to handle diverse analytical workloads and support standard file formats that make Pivotal HDB native to the Hadoop ecosystem.
Polymorphic storage: Pivotal HDB supports a native row-based format as well as the columnar storage format, Parquet. Columnar stores are good for analyzing specific data attributes in large volumes of historical data, whereas row stores are good for exploring all the data attributes from the recent past. The choice of storage format, combined with partitioning, enables granular control and efficient information lifecycle management.
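The trade-off between row and columnar layouts can be shown with a small in-memory sketch. This is not Parquet or Pivotal HDB's native format; the sample data and the "fields read" counter are assumptions used only to make the access pattern visible. An aggregate over one attribute reads every field in a row layout but only a single column in a columnar layout.

```python
# Row layout: each record keeps all of its attributes together.
ROWS = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 20.0},
    {"id": 3, "region": "east", "amount": 5.0},
]


def to_columnar(rows):
    """Pivot row records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_from_rows(rows, column):
    """Sum one attribute from a row layout; whole rows are read."""
    fields_read = sum(len(row) for row in rows)
    return sum(row[column] for row in rows), fields_read


def total_from_columns(columns, column):
    """Sum one attribute from a columnar layout; one column is read."""
    values = columns[column]
    return sum(values), len(values)


if __name__ == "__main__":
    print(total_from_rows(ROWS, "amount"))
    print(total_from_columns(to_columnar(ROWS), "amount"))
```

Both calls return the same total, but the row layout touches nine fields and the columnar layout touches three. Fetching one complete record shows the opposite pattern: a single lookup in `ROWS` versus one read per column.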
Diverse file formats: Pivotal HDB supports Avro, Parquet and native HDFS file formats in Hadoop. This minimizes the need for ETL during data ingest and enables schema-on-read type processing. Reduced need for ETL and data movement simplifies the overall effort to analyze data and contributes to lower cost of ownership of the analytics solution.
In extracting the maximum business value from Pivotal HDB, the operationalization capabilities are just as important as the core feature set.
Expanded Hadoop Distribution Support
Along with Pivotal’s Hadoop distribution, Pivotal HD, Pivotal HDB integrates out-of-the-box with Hortonworks HDP. In future releases, Pivotal HDB will integrate with the ODPi core.
Native Hadoop Management Using Apache Ambari
Pivotal HDB plugs in with the Apache Ambari installation, management and configuration framework. This provides a Hadoop-native mechanism for installation and deployment of Pivotal HDB and for monitoring cluster resources across Pivotal HDB and the rest of the Hadoop ecosystem.
Packaging and Deployment
Pivotal HDB is included in Pivotal Big Data Suite as part of the subscription-based flex-license. Pivotal HDB supports a multitude of deployment models including commodity hardware, virtualized IaaS, EMC Data Computing Appliance (DCA) and EMC Elastic Cloud Storage (ECS) with an option of using EMC Isilon as the HDFS filesystem.
Pivotal HDB is the world’s most advanced SQL-on-Hadoop engine and addresses all the business requirements for enterprises to rapidly build and deploy enterprise-grade SQL-based analytic applications and off-load EDW workloads into Hadoop. Pivotal HDB is also an integral part of the native Hadoop ecosystem with out-of-the-box support for Apache Ambari for management and native Hadoop file formats. In summary, Pivotal HDB is a key enabler in transforming companies into data-driven enterprises.