O’Reilly Media spreads the knowledge of innovators through its books, online services, magazine and conferences. An active participant in the technology community, O’Reilly is a chronicler and catalyst of leading-edge development – homing in on the technology trends that really matter and spurring their adoption. Publisher of the iconic “animal books” for software developers, organizer of the summit meeting that gave the open source software movement its name and prime instigator of the DIY revolution through Make magazine, O’Reilly continues to concoct new ways to connect people with the information they need.
Analyzing Millions of Job Postings
Targeting the latest hot topic is critical to the success of any publishing company, but staying on top of trends is difficult, especially in the rapidly evolving technology industry. When O’Reilly decided to refine its decision making regarding which topics to cover in future books and conferences, the company turned to Pivotal.
O’Reilly needed a powerful, scalable data warehouse to collect and analyze millions of IT job posts to glean insights into key trends in the technology industry. The company uses this data to come up with topics for its books and conferences, to offer consulting and custom research and for its own internal business development. O’Reilly’s goal was to tap into one of the largest sources of technology trend data available: Simply Hired. The site serves as an aggregator of millions of available jobs in the technology industry, painting a picture of the technologies garnering the most attention, investment and focus at any given moment.
O’Reilly had been collecting IT job data from Simply Hired for several years, to see which categories of jobs were increasing in number, which were diminishing and which were emerging for the first time. O’Reilly had data on about 30 million jobs in an open source database and manually analyzed query results. But the database continued to grow and the analysis process was slow and cumbersome – taking an average of 10 hours to run a single query. O’Reilly needed a faster, more robust data warehouse solution to capture and analyze the growing job data.
Pivotal Greenplum Delivers High Performance and Scalability
O’Reilly consolidated hundreds of millions of job postings from Simply Hired into Pivotal Greenplum for trend analysis. Pivotal Greenplum is designed to manage very large amounts of data quickly, and its massively parallel architecture is well suited to loading and querying petabytes of data, which made it a strong fit for O’Reilly’s needs.
After implementing Pivotal Greenplum, O’Reilly immediately saw faster queries: the average time to run a single query fell from 10 hours to six minutes, a 100x gain in performance.
Improved Data Analysis
Pivotal Greenplum’s massively parallel structure allowed O’Reilly to run more complex queries with ease – for example, searching for all jobs that contain the word “Java”, but filtering out those related to coffee shops or travel agencies. With this expansion in data processing power, O’Reilly has greatly increased the depth of its insight into technology trends.
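The “Java” example can be illustrated with a minimal sketch. This is not O’Reilly’s actual query, which is not shown in this case study and would run inside Pivotal Greenplum rather than in application code. The sample postings, the field names and the exclusion keywords below are all assumptions made for illustration.

```python
import re

# Hypothetical sample postings; the real data set and its schema are not
# described in the case study.
POSTINGS = [
    {"title": "Senior Java Developer",
     "description": "Build backend services in Java and Spring."},
    {"title": "Barista",
     "description": "Java Coffee Shop seeks a friendly barista."},
    {"title": "Travel Consultant",
     "description": "Travel agency selling tours to Java and Bali."},
    {"title": "Data Engineer",
     "description": "Python and SQL pipelines."},
]

# Match "Java" as a whole word, so "JavaScript" does not count.
INCLUDE = re.compile(r"\bjava\b", re.IGNORECASE)

# Assumed exclusion terms for coffee shops and travel agencies.
EXCLUDE = re.compile(r"\b(coffee|barista|espresso|travel agency|tours?)\b",
                     re.IGNORECASE)


def is_java_tech_job(posting):
    """Return True if the posting mentions Java and no excluded term."""
    text = f"{posting['title']} {posting['description']}"
    return bool(INCLUDE.search(text)) and not EXCLUDE.search(text)


def filter_java_jobs(postings):
    """Keep only postings that look like Java technology jobs."""
    return [p for p in postings if is_java_tech_job(p)]


if __name__ == "__main__":
    for job in filter_java_jobs(POSTINGS):
        print(job["title"])
```

A keyword exclusion list like this is deliberately simple. At the scale described here, the same include-and-exclude logic would more plausibly be expressed as a query inside the data warehouse, so that the filtering runs in parallel across the stored postings.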
Scalability for Business Growth
O’Reilly originally implemented Pivotal Greenplum to quickly and efficiently query data from 30 million job posts. Today, the system handles 170 million job posts and continues to grow. With Pivotal Greenplum, O’Reilly is able to analyze what has become a massive data store with alacrity, successfully performing large-scale analysis against huge data sets.
Now, O’Reilly believes the sky is the limit. Its Pivotal Greenplum architecture is on track to analyze 500 million job postings or more, allowing O’Reilly to draw richer conclusions about the technology industry and to identify timely topics for books and conferences. Pivotal has helped O’Reilly retain its reputation as a trendspotter and better serve its role of educating the IT community on the latest developments in technology.