Querying External Data Sources with Hadoop

July 17, 2014

In large enterprises, big data commonly comes in different formats and sizes and is stored across different systems. Many of these systems hold gold mines of information that can be put to use for strategic insights, but linking existing storage systems with HDFS can be very challenging.

Pivotal helps you leverage your existing data infrastructure investments alongside HDFS and begin shifting your legacy enterprise data warehouse, analytical data marts and data silos into a modern, centrally governed business data lake, where all data types are stored and accessible for on-demand analytics. Pivotal HD can connect data across multiple systems without having to move or copy it to and from HDFS for analysis.

This is possible through the Pivotal Extension Framework (PXF). PXF is an external table interface in HAWQ (a fast, scalable, production-grade, 100% SQL-compliant query engine on HDFS) that lets you read and query data stored both inside and outside the Hadoop ecosystem (HDFS, Hive, HBase, etc.) while supporting a wide range of data formats such as text, Avro, RCFile, and many more. PXF also delivers a fast, extensible framework by exposing parallel APIs to connect HAWQ with additional data sources such as GemFire XD, JSON, Accumulo and Cassandra.

Watch this technical preview to learn how to add extensibility to Hadoop:

- Eliminate the need to copy data from the underlying storage system to HDFS
- Leverage rich, deep, and fast analytics on Hadoop data files of various kinds
- Run statistical and analytical functions from HAWQ on HBase or Hive data
- Run complex analytical queries that join in-database dimensional data with fact data stored in HBase
- Easily write your own custom PXF connectors to external data sources
- Cut down time and operational costs
