Pivotal HDB + Hortonworks HDP

December 2, 2016

Overview

Increasingly, data for a new generation of applications lives in Apache™ Hadoop®, because Hadoop has drastically changed the economics and capabilities of capturing and storing today’s unrelenting data growth. Pivotal HDB combines the familiarity of a full ANSI SQL interface, the performance of a massively parallel processing (MPP) engine, and the power of in-database analytics for advanced analytics, data science, and machine learning at scale. Running on Hortonworks® HDP®, the leading open source data platform, HDB helps enterprises harness the strength of Hadoop for analytics and business transformation.

Hadoop-Native SQL on HDP for Speed and Simplicity

Certified on the Hortonworks Data Platform (HDP), Pivotal HDB is the leading analytic database for data science and machine learning workloads in Hadoop. With its blazingly fast MPP querying architecture, Pivotal HDB brings Hadoop-native SQL to data scientists and analysts who need to develop near real-time insights. HDB executes standard SQL queries directly on all the data in the HDP cluster, removing the need to sample data or move it to another platform for advanced analytics. This helps analysts and data scientists ask more questions of their data more often for better modeling and deeper, actionable insights.

Advanced Analytics with Full SQL

Great companies are turning to their data to fundamentally transform their businesses through operational improvements, better customer experiences, and new business models. To do this, they need to efficiently distill huge volumes of data down to the critical insights needed for operational effectiveness and customer delight.

To deliver faster time to insight, Pivotal HDB executes SQL inside the HDP cluster. HDB can query all data in HDP, regardless of format. Via the Pivotal eXtension Framework (PXF), HDB can quickly and efficiently federate queries across Hive, HBase, and HDFS. It also works with file formats such as Parquet, ORC, and Avro.

To increase productivity, analysts and data scientists can access Apache MADlib (incubating), a library of highly parallel machine learning algorithms, from within HDB via familiar SQL syntax. By running analytics within the database, analysts can leverage the entire data set where it resides in the HDP cluster, rather than relying on sampling. This helps analysts create better models more quickly.

For ease of management, HDB integrates with HDP ecosystem components, such as Apache Ambari for system management, to provide a completely open, integrated, enterprise-class data platform.

Benefits of HDB

Higher Productivity from Existing SQL Skills and Tools

Pivotal HDB provides a familiar, completely ANSI SQL-compliant environment. Analysts can continue to work with their favorite tools that automatically generate SQL code and be assured that their queries will work unmodified in Hadoop. Pivotal HDB supports a broad range of SQL features, including scalar and aggregate functions, window functions, rollups and cubes, and correlated subqueries, that make analysts more productive. Via ODBC and JDBC drivers, analysts can also leverage a large ecosystem of their favorite data analysis and visualization tools. Pivotal HDB also supports procedural language extensions for popular languages, including PL/Python, PL/R, PL/Java, PL/pgSQL, and PL/Perl.
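As a rough illustration of the kind of standard SQL referred to above, the following sketch runs a window-function query that computes a running total per group. It is not taken from the HDB documentation: it uses Python's built-in sqlite3 module (with SQLite 3.25 or later) purely as a runnable stand-in for an HDB connection, and the sales table and its columns are hypothetical. Against HDB, the same SELECT statement would presumably be submitted through an ODBC or JDBC client.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for HDB so the sketch is runnable anywhere.
# The "sales" table and its columns are hypothetical, made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", 1, 100.0),
        ("east", 2, 150.0),
        ("west", 1, 80.0),
        ("west", 2, 120.0),
    ],
)

# Standard SQL window function: running total of amount within each region.
query = """
SELECT region,
       month,
       amount,
       SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
FROM sales
ORDER BY region, month
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the query uses only standard window syntax, a SQL-generating tool could emit it without any engine-specific changes, which is the portability the paragraph above describes.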

High Performance for Faster Querying

For interactive queries on large datasets, Pivotal HDB provides strong performance for a broad range of query types, validated by results from all 99 query types included in the TPC-DS benchmark. Features including dynamic pipelining, a cost-based query optimizer, and high concurrency deliver high performance at linear scale.

Standards-Based for Lower Risk

Enterprises adopting Hadoop confront an ecosystem of 30-40 projects with different levels of maturity and support. Pivotal HDB is interoperable with ODPi, which provides an open source framework and a common deployment model that brings consistency to Hadoop implementations. By using ODPi-compliant solutions, organizations can invest in Hadoop with confidence that their applications will behave consistently with less risk.

Hadoop-Native Tools for Ease of Management

Pivotal HDB plugs into the Apache Ambari installation, management, and configuration framework. This provides a Hadoop-native mechanism for installation, deployment, and monitoring of cluster resources. YARN integration allows HDB to share resources with other modules in the cluster.

Flexible Deployment

Pivotal HDB supports several deployment models, both on premises and in the cloud, including commodity hardware, virtualized IaaS, and EMC Elastic Cloud Storage (ECS), with an option of using EMC Isilon as the HDFS filesystem.
