February was a big month for data science, demonstrating the breadth of problems the discipline addresses, as well as its increasingly important role in diverse settings including the enterprise, academic research, government, and more. Here are our picks for the top data science news of the month, from both Pivotal and the entire field.
Top Data Science News in February 2014
For all the talk about building a data-driven enterprise, a majority of businesses still place a premium on the gut instincts of the people at the top. In this ZDNet article, Toby Wolpe examines why some executives and businesses resist data science, and how long-held yet false beliefs can become institutionalized and fail a company over time.
For businesses that are more eager to embrace a data-driven approach, veteran data scientist Carla Gentry shares, in this InformationWeek article, some key lessons she learned from making mistakes. She advises companies to differentiate data science from big data, bridge the communication gap between practitioners and decision makers, familiarize themselves with the available technologies, and enforce clean data standards.
The art of data visualization is growing broader in scope and more expressive. Bloomberg offers a roundup of some of the most beautiful and impactful weather- and environment-related visualizations to emerge so far this year. They include a representation of wind and water flows in recent years, the relationship between U.S. fracking wells and water resources, and NASA’s comprehensive visualization of six decades of climate change.
Social welfare reform is a hotly debated topic in countries across the globe, but New Zealand has taken an analytics-led approach to optimizing the effectiveness of its welfare programs since 2007. Paula Bennett, the country’s minister of social development, explained at a recent conference how the country has used analytics to quantify the lifetime cost of welfare and increase government transparency, and how it has applied predictive risk modeling to identify potential “at risk” individuals and provide proactive and preventative support.
Data literacy is becoming increasingly important not only for experts, but also for professionals from a wide swath of disciplines: executives, marketers, journalists, non-profit workers, and many more. For the data-curious who never went further than Statistics 101, Google is offering a free MOOC, “Making Sense of Data,” which will introduce participants to the basics of data analytics, familiarize them with tools such as Fusion Tables, and show them how to find patterns and relationships.
More advanced students might learn much from Jason Brownlee’s extensive library of machine learning tutorials and resources. In a recent post, Brownlee offers a primer on the test options to consider when choosing machine learning algorithms, including training and testing on the same dataset, split tests, cross validation, and statistical significance.
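To make the first three of those test options concrete, here is a minimal sketch in plain Python. It is not taken from Brownlee’s post: the synthetic dataset, the one-nearest-neighbor classifier, and every function name are assumptions chosen purely for illustration, and statistical significance testing is left out.

```python
import random

# Hypothetical illustration only: synthetic data and a 1-nearest-neighbor
# classifier, chosen to contrast three ways of estimating accuracy.

def make_data(n, seed=0):
    """Generate (x, label) pairs; label is 1 when x > 5, with 20% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        label = 1 if x > 5 else 0
        if rng.random() < 0.2:
            label = 1 - label  # flip the label to simulate noisy data
        data.append((x, label))
    return data

def predict(train, x):
    """Return the label of the training point closest to x (1-NN)."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def accuracy(train, test):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(1 for x, label in test if predict(train, x) == label)
    return correct / len(test)

def split_test(data, train_fraction=0.67):
    """Train on the first part of the data, test on the held-out remainder."""
    cut = int(len(data) * train_fraction)
    return accuracy(data[:cut], data[cut:])

def cross_validation(data, k=5):
    """Average accuracy over k folds, each fold held out exactly once."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(accuracy(train, folds[i]))
    return sum(scores) / k

data = make_data(200)
same_data_score = accuracy(data, data)  # train and test on the same dataset
split_score = split_test(data)          # single train/test split
cv_score = cross_validation(data)       # 5-fold cross validation

print(f"same dataset: {same_data_score:.2f}")
print(f"split test:   {split_score:.2f}")
print(f"5-fold CV:    {cv_score:.2f}")
```

With this classifier, scoring on the training data is perfect by construction, because each point is its own nearest neighbor. That is why the held-out estimates from the split test and cross validation are usually the more trustworthy guides when comparing algorithms.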
This Month in Pivotal Data Science
Oil spills from mining accidents can cost tens of billions per incident. Such economic and environmental disasters could be avoided through smart systems that automate the detection of anomalies and trigger reactions that could prevent future spills. The Pivotal Data Science team revealed the progress they’ve made toward building a digital brain to control drilling rigs for the oil and gas industry, and showed how it could be applied to other industries as well.
The connected car is a poster child for the “Internet of Things” and ripe for innovation from applying data science to big data. Pivotal Data Labs has been doing extensive work in the sector and shares some of its recent learnings from auto manufacturers that can also be applied to other industries.
Strata is all about the future of big data and data science, the very problems we formed Pivotal to solve, which is why Pivotal was an Elite sponsor for Strata 2013. Since the last Strata conference, we have been very busy spinning out, innovating, and solving big data and data science challenges around the globe. For a taste of what we’ve been up to, check out the full post.
Upcoming Data Science Events
March 4, 2014, San Francisco, CA
With the explosion of big data, the need for fast and inexpensive analytics solutions has become a key basis of competition in many industries. Extracting the value of big data with analytics can be complex, and requires advanced skills. During this talk at Pivotal Labs’ San Francisco office, Senior Developer Rahul Iyer will review numerous open source solutions, including MADlib, PivotalR, and PyMadlib.
March 8, 2014, Austin, TX
This SXSW Interactive panel will examine the political, social, and economic biases that can skew the collection and analysis of data, and the ethical use of data visualization as a communication tool. The panel features Jake Porway, founder of DataKind, which has partnered with Pivotal’s Data Science Labs team for the Pivotal for Good program.
March 18, 2014, San Francisco, CA
Pivotal’s Robert Geiger details the considerations and challenges around preserving security and privacy within the data lake during this talk at the Pivotal Labs San Francisco office. Geiger will review the issues and some of the technologies being developed within the community and by vendors to secure and manage the data in the emerging data lake.
March 19–20, 2014, New York, NY
The world’s biggest and most innovative companies are using data to make better products, build bigger profits and even change the world. Join 900+ big data practitioners, technologists and executives as they examine how big data can drive business success. From grand new uses to the nuts and bolts of capturing, storing, analyzing and serving it, get the bottom line on big data now.