This week’s announcement that Greenplum is open-sourcing its collaborative data science platform Chorus and partnering with Kaggle to connect OpenChorus users with the data scientist elite has generated lots of press. Announced at this week’s O’Reilly Strata conference in New York City, OpenChorus and the Kaggle partnership will enable customers, partners, developers, and data scientists to collaboratively realize the predictive potential of Big Data. Here’s a roundup of some of the responses in the media:
Scott Yara, a co-founder of Greenplum and now its senior vice president of products, said his company already has a dedicated staff of 25 data scientists but has more work than it can handle. “We’ll never fill the gap,” he said. “Even the biggest companies in the world are just starting out on this.”
Greenplum’s and Kaggle’s effort may be one way that the market copes with a perceived shortage of data scientists. Last year, McKinsey Global Institute said that the United States needs perhaps 190,000 skilled data analysts and 1.5 million more data-literate managers to cope with all the information companies are collecting.
Other methods include offering marketplaces of algorithms and better user interfaces to automate some of the statistical process. Both of these efforts are under way, both at established companies and at start-ups.
“We’ve had good adoption of Chorus and companies’ internal data workers are using it to do data science so they now have the tools, but honestly they don’t have all the people they need,” said Josh Klahr, VP of product management for Greenplum. “Now you can search the Kaggle community based on rank, expertise, location and invite them to work on your challenge using Greenplum Chorus.”
Kaggle ranks its participants much the way the USTA ranks tennis players. And that community is growing fast – when Kaggle started fundraising in August, there were 11,000 members; now there are close to 60,000, said Anthony Goldbloom, CEO of Kaggle, who added that this is the first such vendor partnership Kaggle has done.
Big data is one area where building a broad ecosystem of data providers is incredibly important. Putting good data scientists together with great data sets is incredibly important, said Ben Woo, managing director of research firm Neuralytics. “Big data is awfully short on the kinds of people who’ve done this work before and, frankly, people who give a damn. This sort of matchmaking is valuable.”
The Age of Aquarius has dawned on the World of Big Data, at least if you’re in EMC Greenplum’s orbit.
The big data division of storage vendor EMC today announced that Greenplum Chorus, its platform for the widespread development of social-enabled, purpose-built Big Data applications and solutions, is now open source (OpenChorus). It has also joined forces with Kaggle to help alleviate the shortage of data scientists. And it has partnered with social data provider Gnip, interactive data visualization company Tableau, and predictive analytics provider Alpine Data Labs, among others, to ease the use of these tools in conjunction with Chorus.
This may very well be the dawning of a new day in the Age of Big Data. It will begin with downloads of Chorus, with finding and visualizing your data, resourcing your project, analyzing and modeling, sharing insights and collaboration, and contributing back to the community.
OpenChorus allows customers, partners, developers, and data scientists to collaboratively realize the predictive potential of Big Data.
About the Author
Paul M. Davis