The History of Hadoop: From Small Starts to Big Data

March 14, 2013 Paul M. Davis

Named after a toy elephant belonging to developer Doug Cutting’s son, Hadoop has proven over the past decade to be the little platform that could. From its humble beginnings as an open source search engine project created by Cutting and Mike Cafarella, Hadoop has evolved into a robust platform for Big Data storage and analysis. It manages the deluge of user data for giant social services like Facebook and Twitter, supports sophisticated medical and scientific research, and increasingly addresses the storage and predictive analytics demands of the enterprise. How did an open source project started by a moonlighting developer and a University of Washington grad student become ubiquitous in so many data-driven settings? In its new four-part series, GigaOm documents Hadoop’s history, its growth, and the promising future of the platform.

Though Hadoop’s roots trace back to Nutch, the 2002 project started by Cutting and Cafarella, three key factors kickstarted the Hadoop we know today. First, Cutting was heavily influenced by Google’s foundational Google File System and MapReduce papers. Second, he joined Yahoo! in 2006, and the company isolated Nutch’s storage and data processing capabilities within a discrete package named Hadoop, which became crucial to the operations of its data science team, if not its search engine. Third, Hadoop was embraced within the open source community and by developers at companies such as Google and Facebook, accelerating its update cycle and lending the platform additional credibility and battle-tested stability.

The platform has flourished since then, igniting a slew of startups and attracting considerable investment and development resources. Many members of the Yahoo! data science team that developed Hadoop in its early days have ended up at Greenplum. As companies offer turnkey solutions such as Pivotal HD, enterprises are increasingly adopting the platform for its affordability, stability, and extensibility, with IDC predicting the Hadoop software market will be worth $813 million in 2016. In its series, GigaOm paints a picture of the robust Hadoop ecosystem, looks towards its future, and reflects on the critical moments in its evolution.
