The History of Hadoop: From Small Starts to Big Data

March 14, 2013 Paul M. Davis


Named after a toy elephant belonging to developer Doug Cutting’s son, Hadoop has proven over the past decade to be the little platform that could. From its humble beginnings as an open source search engine project created by Cutting and Mike Cafarella, Hadoop has evolved into a robust platform for Big Data storage and analysis. It manages the deluge of user data for giant social services like Facebook and Twitter, supports sophisticated medical and scientific research, and increasingly addresses the storage and predictive analytics demands of the enterprise. How did an open source project started by a moonlighting developer and a University of Washington grad student become ubiquitous in so many data-driven settings? In its new four-part series, GigaOm documents Hadoop’s history, its growth, and the promising future of the platform.

Though Hadoop’s roots wind back to Nutch, the 2002 project started by Cutting and Cafarella, three key factors kickstarted the Hadoop we know today. First, the project was heavily influenced by Google’s foundational Google File System and MapReduce papers. Second, Cutting joined Yahoo! in 2006, where Nutch’s storage and data processing capabilities were split out into a discrete package named Hadoop; the platform became crucial to the operations of the company’s data science team, if not its search engine. Third, Hadoop was embraced within the open source community and by developers at companies such as Google and Facebook, accelerating its update cycle and lending the platform additional credibility and battle-tested stability.

The platform has flourished since then, igniting a slew of startups and enjoying considerable investment and development resources. Many from the Yahoo! data science team that developed Hadoop in its early days have ended up at Greenplum. As companies offer turnkey solutions such as Pivotal HD, the enterprise is increasingly adopting the platform for its affordability, stability, and extensibility, with IDC predicting the Hadoop software market will be worth $813 million in 2016. In its series, GigaOm paints a picture of the robust Hadoop ecosystem, looks towards its future, and reflects on the critical moments in its evolution.
