This Month In Data Science

August 29, 2014 Paul M. Davis

The month of August saw a plethora of data science-related news and announcements, including a Wired feature on Kaggle founder Jeremy Howard’s quest to improve health care, General Electric’s success in bringing analytics-derived insights to its products for the railroad, airline, hospital, and utility industries, a provocative interview with OkCupid co-founder Christian Rudder, and much more. Here’s our roundup of the major data science news of the month, both from Pivotal and the wider industry.

The Data Scientist on a Quest to Turn Computers Into Doctors

Wired profiles Kaggle founder Jeremy Howard and his efforts to improve health care through data science practices such as developing deep learning algorithms.

What Cars Did for Today’s World, Data May Do for Tomorrow’s

The New York Times profiles Pivotal investor General Electric’s major push towards Internet of Things-connected machines and devices to fuel its data lake. GE is bringing analytics-derived insights to its products for the railroad, airline, hospital, and utility industries, yielding successes such as the ability to detect “possible [airline] defects 2,000 times as fast as it could before.” In related news, the Washington Post reported on how GE is using sensor-enabled devices and data-driven insights to revolutionize its manufacturing processes.

For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights

For all the promise of data science, the major hurdle for many practitioners remains the messiness of available data, the New York Times reports. “Data wrangling” accounts for 50 to 80 percent of professionals’ time according to studies cited in the article, reducing the effectiveness of data scientists’ efforts.

SAS Visual Statistics for Modernized Analytics

Inside Big Data looks into SAS Visual Statistics, which is compatible with a number of platforms, including Pivotal’s database appliance. The post explains how the software can speed up model building and help provide more accurate insights by presenting analytical results visually.

Apache Hadoop® gets real

Computerworld details how a number of companies are benefiting from Apache Hadoop®’s ever-increasing capability and functionality, and its improved ease of use. The article focuses on an interview with Inovalon’s CTO, who discusses in detail how the company is leveraging Apache Hadoop®.

DataKind’s do-good data-science projects arrive in 5 more cities

VentureBeat reports on the increased scope of Pivotal for Good partner DataKind’s efforts, with the non-profit expanding its data scientist-and-nonprofit matchmaking service to five new cities around the world: Bangalore, Dublin, Singapore, Washington DC, and San Francisco.

OkCupid’s Co-Founder on the Myth of the Data Scientist ‘Unicorn’

The Wall Street Journal interviews OkCupid co-founder Christian Rudder, who controversially defended Facebook’s user sentiment experiments on the company’s blog in July. Rudder aims to dispel a number of persistent myths and misconceptions about the discipline. He explains OkCupid’s rationale behind its own user experience experiments and states that while data science requires a number of specialized skills, practitioners need not be ‘special geniuses’ to be effective data scientists.

This Month in Pivotal Data Science

Data Science How To: Massively Parallel, In-Database Image Processing: Part 2

This post continues part 1, in which image processing expert and Pivotal Senior Data Scientist Ailey Crow gives a short introduction to how data science is applied toward better, faster image processing. This approach can have a huge effect on a number of industries ranging from neurobiology and cancer detection to cognitive vision and control robotics.

Field Report: Hack Midwest Highlights How Developers Are Innovating on the Internet of Things and Big Data in Real-time

Over a 24-hour period on July 19th and 20th, Pivotal community engineer Scott Kahler served as a judge at the Hack Midwest event in Kansas City. Kahler highlights the energy of the event and shares some of the innovations produced, including a developer-oriented dating site, collaborative wish lists for Amazon, and a drone app that can sweep crowds and report demographics. With everything developed in just one day, the event shows how agile developers have become at building valuable apps that fuse the Internet of Things and big data in real time.

Editor’s Note: Apache, Apache Hadoop, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
