Greenplum and Kaggle Partner to Connect OpenChorus Users with the Data Science Elite

October 23, 2012 Paul M. Davis

The high demand for data scientists is a significant challenge for increasingly predictive enterprises. It is a sophisticated vocation, requiring a range of specialized skills. Kaggle, the online platform for data science competitions, is unique in that it has attracted a community of thousands of skilled practitioners. Moreover, its competitive structure encourages quick iteration and learning, and allows leading data scientists and emerging talents to rise to the top of the rankings. With the announcement of the availability of Greenplum’s OpenChorus, businesses can soon tap into the knowledge, talent, and cognitive surplus of Kaggle’s community. Datastream spoke with Anthony Goldbloom, founder and CEO at Kaggle, about what users of OpenChorus can expect from the integration, and how the competition platform attracts the best and the brightest in this burgeoning field.

Datastream: What can Kaggle and Greenplum users expect from the partnership between the two companies?

Anthony Goldbloom: We have a huge community of really talented data scientists (over 55,000). Partnering with Greenplum allows us to open up a huge new market to them. At the moment they earn some money through Kaggle relationships and by winning competitions. OpenChorus allows them to leverage Greenplum’s customers, who in turn can leverage the market of data scientists.

Can you explain what the user experience will be like for companies looking for data scientists?

Anthony Goldbloom: As an OpenChorus user, as well as being able to collaborate with other people in your company, you will have a “want help” button that brings up a list of Kaggle data scientists that you can reach out to. Kaggle’s data scientists are high-end and able to tackle difficult problems, precisely the kind of problems that you expect Greenplum customers to be dealing with: unstructured text data, graph data, and data sets with missing values, for example. These are problems Kaggle’s data scientists are well-equipped to address.

What’s the advantage for OpenChorus users of tapping into Kaggle’s community of practitioners?

Anthony Goldbloom: One problem with hiring data scientists is that it’s really difficult to get a sense of how good they are. Data science done well requires a range of skills, and it’s really hard to tell from a CV how good a data scientist is. Anyone who sees that a Kaggle data scientist has risen to the top of the rankings knows that they’re getting someone really elite.

How has Kaggle attracted so many eager and talented data scientists?

Anthony Goldbloom: There’s so much demand because data scientists add a lot of value. Data scientists are doing high-leverage work: one data scientist building one algorithm can determine how an entire bank provides loans, for example. Because it’s such high-leverage work, the quality of a data scientist can mean hundreds of millions of dollars in ROI.

Data science is inherently creative work. Kaggle gives data scientists an opportunity to work on a range of different problems, get better through competition, and learn from what the winners did. In a way, Kaggle filters for the most curious data scientists, exactly the sort that are likely to be the highest-performing.

I think OpenChorus is a great opportunity for our community, and hope it leads to more work and recognition for their skills.

