Data science was a hot topic in July, with much debate over Facebook’s experiments with users’ emotions based on their behaviors and likes. Despite the controversy, data scientists enjoyed increasing prominence in sales, healthcare, and sports. Here’s our roundup of the month’s top data science news, from both Pivotal and the wider industry.
Michael Howard, chief executive of analytics consultancy C9, explains the increasing importance of data science in sales for VentureBeat. He says that data science can improve clarity in sales communications, reduce the complexity of forecasting, and rein in inconsistency and subjective over-analysis within sales teams.
The healthcare industry’s extensive efforts to collect and store patients’ records have yet to yield significant improvements in care, argues Acupera’s Chief Technology Officer Imran Qureshi in Wired. Qureshi uses this as a case study in the limitations of data warehousing that does not yield actionable insight from these records. He explains how building data analysis workflows into patient care could improve the entire cycle of treatment by combining medical records, sensor data, and social and behavioral assessments to better inform health care practitioners’ decisions.
Data journalism has received plenty of attention this year with the launch of data-driven news sites such as FiveThirtyEight and Vox, along with some disappointment in the rigor and depth that these sites are delivering. Derrick Harris at GigaOm argues that much current data-driven journalism merely charts numbers from recent reports, and that the breadth of sources and techniques that dedicated data scientists bring to the table could add insight to data journalism efforts.
A recent study by Forbes and Glassdoor.com declared data scientist to be the best job for work-life balance, based on salary and employee feedback posted on the site. Reasons cited include the high demand for data scientists, which is leading companies to adopt more flexible schedules and workflows to attract top talent.
In light of Lionel Messi’s star performance during the World Cup, Benjamin Morris at FiveThirtyEight takes a deep dive into the Argentinian soccer player’s record, looking at the scoring stats behind the vaunted “Messi magic.”
The revelation that Facebook had experimented with its news feed to affect users’ emotions and behavior ignited a firestorm of debate over how technology companies use personal data, and whether the company and its researchers stepped over the line. According to former Facebook data scientist Andrew Ledvina, Facebook’s data team has conducted similar experiments with little oversight. Some academic researchers pushed back against the public outrage, among them University of Michigan’s Clifford Lampe, who stated, “Facebook deserves a lot of credit for pushing as much research into the public domain as they do.” Late in the month, OkCupid’s blog OkTrends supported Facebook by declaring that data scientists frequently experiment with user behavior, and that this is to be expected. Nevertheless, Facebook responded to the uproar with contrition and stated that it is overhauling its internal review process.
VentureBeat reports on Data for Good, a fork of data science news site DataTau focused on data-for-social-good projects and initiatives. The site aims to document and model these projects so that others can replicate them in their own regions and communities.
This Month in Pivotal Data Science
Better, faster image processing can have a huge effect on industries ranging from neurobiology and cancer detection to cognitive vision and control robotics. In part 1, image processing expert and Pivotal Senior Data Scientist Ailey Crow gives a short introduction to how this science is applied and then demonstrates six steps of the process.
On July 18th, 28 high school-aged girls arrived at Pivotal’s San Francisco offices for an immersive introduction to Pivotal’s agile software development and data science practices. As part of the Girls Who Code initiative, participants were given an opportunity to try out Pivotal’s pair programming approach while receiving guidance and mentorship from a number of Pivotal’s expert women developers and data scientists.
The members of the Pivotal Data Labs team are often asked what tools and platforms they use to analyze large datasets and build cutting-edge predictive models. In this post, Ian Huston considers the importance of choosing the right platform and focuses on exploratory data science. The team always wants to use the right tool for the job, which means understanding the data processing needed, the performance requirements, and the budgetary limitations.
In this article, Pivotal engineer and predictive analytics expert Hai Qian explains how someone new to R can get started performing statistical analysis on data stores in Greenplum Database, Pivotal HD, and PostgreSQL in just 20 minutes using PivotalR. After some background on R’s popularity, the article dives into important topics such as installation, data loading, and data manipulation for PivotalR.
About the Author: Paul M. Davis