GoGaRuCo '09 – Hypertable and Rails: DB Scaling Solutions with HyperRecord – Josh Tyler & Rusty Burchfield

April 19, 2009 Pivotal Labs

Intro

Hypertable and Rails: DB Scaling Solutions with HyperRecord

Links:
Hypertable
HyperRecord

Rusty is from Zvents, a local search engine

Presentation

Showing an example of hourly data for the last month for a single event

An older benchmark sustained over 1M row inserts per second

Hypertable is an open-source implementation of Google’s BigTable.

Hypertable is a Column-Oriented DBMS

Data Model
5-part key:
Row Key
Column Family
Column Qualifier
Timestamp
Revision

One index per table (on the row key)
Only stores strings
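
To make the data model concrete, here is a minimal plain-Ruby sketch of cells addressed by the five key parts, with every value held as a string. The names `CellKey`, `Cell`, and `row_to_hash` are made up for illustration and are not part of Hypertable or HyperRecord.

```ruby
# Illustrative only: CellKey, Cell and row_to_hash are hypothetical names,
# not Hypertable or HyperRecord API.
CellKey = Struct.new(:row, :column_family, :column_qualifier, :timestamp, :revision)
Cell    = Struct.new(:key, :value)

cells = [
  Cell.new(CellKey.new("page_2", "meta", "title",  1_240_100_000, 1), "Ruby"),
  Cell.new(CellKey.new("page_1", "meta", "title",  1_240_100_000, 1), "Rails"),
  Cell.new(CellKey.new("page_1", "meta", "author", 1_240_100_500, 2), "josh")
]

# The only index is on the row key, so lookups start from the row.
by_row = cells.group_by { |cell| cell.key.row }

# Collect one row into a { "family:qualifier" => value } hash. Values are
# strings, so numbers would need converting on the way in and out.
def row_to_hash(row_cells)
  row_cells.each_with_object({}) do |cell, hash|
    hash["#{cell.key.column_family}:#{cell.key.column_qualifier}"] = cell.value
  end
end

page_1 = row_to_hash(by_row.fetch("page_1"))
```

Here `page_1` ends up holding the title and author cells for that row, keyed by family and qualifier.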

Architecture
Master server – tracks range servers and where data is stored (spare master is also usually run, as it’s a single point of failure)
Range servers – table data is split into ranges of row keys, and each range is served by a range server
Hyperspace – Handles locking and master recovery
HDFS – Stores redundant copies of data
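
As a rough illustration of how a row key maps to a range server, the sketch below keeps a small table of range boundaries and looks up which server holds a key. The boundaries, server names, and the `server_for` helper are all assumptions made for the example, and it says nothing about how Hypertable actually stores or finds this mapping.

```ruby
# Illustrative only: hypothetical range boundaries and server names.
# Each range covers row keys up to and including :end_row; a nil :end_row
# marks the last, open-ended range.
RANGES = [
  { end_row: "g", server: "rs1" },
  { end_row: "p", server: "rs2" },
  { end_row: nil, server: "rs3" }
].freeze

# Return the name of the server whose range contains row_key.
def server_for(row_key)
  range = RANGES.find { |r| r[:end_row].nil? || row_key <= r[:end_row] }
  range[:server]
end

servers = %w[apple hotel zebra].map { |key| server_for(key) }
```

Because rows are partitioned by key order, rows with nearby keys tend to land on the same server.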

ThriftBroker – An RPC interface to Hypertable, built on Thrift, so clients can be written in many languages

HyperRecord

HyperRecord is a subclass of ActiveRecord for Hypertable
Supported by the Hypertable project

Example
Loading data into a simple pages app
Loading the first 10,000 articles of Wikipedia
150MB of data bulk-loaded from a file in 14 seconds (roughly 10.7MB per second)
Loads all the data, then browses it through a Rails scaffold

Design considerations
Denormalization – you can’t do joins, so you have to put your data in an appropriate format for querying. You can use MapReduce to interact with the data.
Column families/qualifiers – The column qualifier is part of the key, so you can store data in the key part of the key-value pair
Revisions – deletes are represented as inserted delete cells
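
The point about revisions can be sketched in plain Ruby as an append-only log in which a delete is just another inserted cell. This is a simplified model under assumed names (`Entry`, `current_view`), not Hypertable's storage format or API.

```ruby
# Illustrative only: a tiny append-only log where a delete is an inserted
# marker cell. Entry and current_view are hypothetical names.
Entry = Struct.new(:row, :column, :revision, :value, :deleted)

log = []
log << Entry.new("event_42", "description", 1, "Jazz night", false)
log << Entry.new("event_42", "description", 2, "Jazz night, 8pm", false)
log << Entry.new("event_42", "description", 3, nil, true) # delete marker
log << Entry.new("event_7",  "description", 4, "Farmers market", false)

# Current view: for each (row, column) the highest revision wins, and a
# winning delete marker hides the cell.
def current_view(log)
  latest = log.group_by { |e| [e.row, e.column] }
              .map { |key, entries| [key, entries.max_by(&:revision)] }
  latest.reject { |_key, entry| entry.deleted }
        .map { |key, entry| [key, entry.value] }
        .to_h
end

view = current_view(log)
```

The deleted description no longer appears in `view`, but all four entries remain in `log`, so the history of what changed is still there to inspect.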

Questions

Q: How do you break down the data by hour in the example?
A: It is broken down by hour and aggregated in Ruby

Q: It looks like the keys in that list were strings, not timestamps, did you have to take the timestamp and convert it to a string yourself?
A: Pretty much
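
The two answers above can be sketched together: convert each timestamp to a string key yourself, then aggregate per hour in Ruby. The key layout (event id followed by a zero-padded UTC hour) and the `hour_key` helper are assumptions for illustration, not the format used in the talk.

```ruby
# Illustrative only: the key layout "<event id>_<YYYYMMDDHH>" is an assumption.
def hour_key(event_id, time)
  # Zero-padded, most-significant-first fields sort chronologically as strings.
  "#{event_id}_#{time.getutc.strftime('%Y%m%d%H')}"
end

page_views = [
  ["event_42", Time.utc(2009, 4, 18, 9, 5)],
  ["event_42", Time.utc(2009, 4, 18, 9, 40)],
  ["event_42", Time.utc(2009, 4, 18, 10, 15)],
  ["event_42", Time.utc(2009, 4, 19, 9, 0)]
]

# Count views per hour bucket.
hourly = Hash.new(0)
page_views.each { |event_id, time| hourly[hour_key(event_id, time)] += 1 }

# String order matches time order, so a scan over sorted keys walks the hours.
sorted_keys = hourly.keys.sort
```

Since only strings are stored, the counts would also need `to_s` on the way in and `to_i` on the way out.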

Q: Did the Wikipedia articles contain any of the sub-data like images, links, etc.?
A: No, just a SQL dump as a demo of querying the database through a Rails scaffold

Q: Does Hypertable’s select support SQL limits, order, etc.?
A: HQL supports a lot of things you’d expect from SQL, but it’s still somewhat limited.

Q: What do you do with it?
A: We store all of our log data and process it using Cascading to gather hourly data for all our pages. We then put it in Hypertable so we can query it quickly to generate reports.

Rusty:
Cascading is Java code
You can easily construct complicated MapReduce jobs using it

Josh:
Some other uses of Hypertable at Zvents
Changelog
We deal with a lot of user-created content; things change often and we don’t always know what changed
We log everything that ever happens to our data so that we can track it. From uploaded images to deleted links to edited descriptions, we can see what changed, when, and how.

Zvents and Baidu are the primary sponsors of the Hypertable project. Hypertable and HyperRecord are both on GitHub.

Hypertable development started 2 years ago as a forward-looking solution to analytics problems.

The search problem for Zvents is many-dimensional (time, location, description, user data, and user behavior), and Hypertable is a way to inform a lot of that data.

Q: What kinds of problems are well suited to Hypertable?
A: We’re trying to move our entire site over. A canonical example for this kind of database is a crawl database.
A2: Anything where you have mountains and mountains of data and want to query over it.

Example of Crawl Database stored in Hypertable.
