Demonstrating the Future of Data Science at the Strata Conference

November 7, 2012 Mike Maxey

A wise man once said only a fool would attempt a live demonstration (anyone remember Bill Gates and Windows 98?). Apparently I am that fool. Last month at the Strata Conference, my cohort Matt Neglay and I presented a talk, “Demonstrating the Future of Data Science.” As we demonstrated to the standing-room-only crowd, data science is changing. The concept of a single all-powerful Data Scientist ruling the enterprise does not match reality. Instead, data science requires a team of people who work together and seek help where possible.

Ask a data scientist about the greatest challenges to doing a good job, and you will likely get a variety of answers, including:

  • Getting access to the data
  • Understanding naming conventions around the data
  • Inheriting a project with little background or help
  • Sharing predictions with non-technical people
  • Lack of documented history for projects
  • People coming or going mid-project

The list goes on. These are daunting challenges, but if you look closely at them, most can be solved through the simple concept of collaboration. Does this scenario sound familiar? One member of a team knows where the data is located, what it is called, and what has happened before with that data. Meanwhile, other team members have a grasp of the project history. Chorus is the platform that connects these team members and their respective skills and knowledge, and it can transform how an enterprise makes data-driven decisions.

This is precisely what we demonstrated in the talk by demoing Greenplum Chorus. It’s a platform that accelerates time to insight, not through a faster database, but instead by empowering the entire team with the ability to view, discuss, and collaborate upon the latest data and discoveries.

Still, it takes more than just a platform to be successful: you need data, visualizations and perhaps even some outside brainpower. These considerations were key to our demonstration. We demonstrated our GNIP connector that allows users to easily bring Twitter data into Hadoop, joined that social data with customer information from the database, visualized the tweets using Tableau v8, and even invited Kaggle’s community of data scientists to offer their skills and expertise to enterprises in need of top-tier practitioners.
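To make the join step concrete, here is a minimal sketch in Python of matching tweets to customer records. This is not the code from the demo, and it does not use Chorus, GNIP, or Hadoop APIs. The field names (`handle`, `customer_id`, `segment`), the sample records, and the `join_tweets_to_customers` helper are all hypothetical, chosen only to illustrate the idea of joining social data with customer information on a shared key.

```python
# Hypothetical sample data: in the demo, tweets came from Twitter via the
# GNIP connector and customer rows came from the database. The fields
# below are assumptions for illustration only.
tweets = [
    {"handle": "@ana", "text": "Loving the new release"},
    {"handle": "@RAJ", "text": "Support was slow today"},
    {"handle": "@zed", "text": "Unrelated chatter"},
]
customers = [
    {"customer_id": 1, "handle": "@ana", "segment": "enterprise"},
    {"customer_id": 2, "handle": "@raj", "segment": "startup"},
]


def join_tweets_to_customers(tweets, customers):
    """Inner-join tweets to customers on a case-insensitive handle."""
    # Index customers by normalized handle for constant-time lookups.
    by_handle = {c["handle"].lower(): c for c in customers}
    joined = []
    for tweet in tweets:
        customer = by_handle.get(tweet["handle"].lower())
        if customer is not None:
            # Keep the customer fields and attach the tweet text.
            joined.append({**customer, "text": tweet["text"]})
    return joined


joined = join_tweets_to_customers(tweets, customers)
for row in joined:
    print(row["customer_id"], row["segment"], row["text"])
```

Tweets from handles with no matching customer are dropped, which is the behavior of an inner join. A team that also wanted to see unmatched tweets would keep those rows and leave the customer fields empty.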

If that seems like a lot for a 12-minute demo, it wasn’t. This is the power of Chorus: it brings data, tools, people, and expertise together to simplify and accelerate the data science process, regardless of team members’ roles or physical locations. Best of all, Chorus is now open source. I’d like to encourage you to download the binaries or the source code and create your own Chorus demo, or even better, put it into production within your organization. I’d be willing to bet you will start doing things faster and be on the road to building your own future of data science.

Watch the Slideshare from the “Demonstrating the Future of Data Science” presentation.
