This Month in Data Science: June 2015

July 1, 2015 Paul M. Davis

Data science’s critical role within a wide range of industries and areas of research was evident during the month of June, as Spotify announced a new data analytics team, free-to-play games received increased scrutiny, and much more. Here’s our roundup of the biggest data science news of the month, both from Pivotal and beyond.

Why Spotify Just Bought a Data Science Startup

With the announcement of Apple Music, the competition among streaming music services heated up this month. Spotify, the current champ in the streaming music space, is doubling down on data science to improve its recommendation engine by acquiring consulting firm Seed Scientific. The goal for the company’s new data analytics team is to analyze audio attributes, metadata, and user behavior to improve its recommendations and better target advertising.

Gaming Your Brain

Gamification’s time as the buzzword du jour may have passed, but its underlying principles are informing user interface design across many industries. Ironically, it may have proven most disruptive to the video game industry, where console makers and developers have lost mindshare and gamers to the free-to-play mobile explosion. In this in-depth article for ESPN, Simon Parkin looks at how mobile gaming companies are using psychology and data science to draw gamers into an ongoing feedback loop of monetization.

What’s Really Warming the World?

In an interactive infographic, Eric Roston and Blacki Migliozzi for Bloomberg Business visualize what factors have contributed to global warming, from 1880 to 2014. Using data from NASA’s Goddard Institute for Space Studies, the visualization takes into account the degree to which both natural and industrial factors contribute to global warming.

So You Want To Be a Data Scientist: A Guide for College Grads

The difficulty of finding skilled data scientists who hold the diverse skillsets required is well-established at this point. This unique mixture of mathematical, coding, and analytical prowess can be imposing to students looking to break into the field. In this article, Datanami speaks to a number of professional data scientists to determine what technical and professional skills would-be practitioners need to develop, as well as which educational paths will pay off for students.

EMC Uses Data Science To Unlock The Secrets Behind The Morecambe Missile

To understand what makes legendary British motorcycle racer John McGuinness so fast, EMC embedded sensors into his suit and bike in 2015 to collect data from a Spanish test circuit. The data was released to the data science community through two competitions, one focused on data analysis and the other on data visualization, to determine what contributes to McGuinness’s unprecedented speed. Data analytics winner Stefan Jol found how performance in one area of the track impacted McGuinness’s entire race, while visualization winner Charlotte Wickham developed a live visualization of relative performance among racers.

Big Universe, Big Data, Astronomical Opportunity

Astronomical data may be the original source of big data, with research revealing a mind-boggling number of celestial bodies. It’s not surprising that the final frontier produced some of the earliest applications for big data technologies and data science. But as Maya Dillon at the Guardian details, even with new technologies and techniques, there remain many data challenges for astronomers as they contend with the vastness of space, including finding effective approaches to visualization, developing efficient algorithms, and introducing machine learning methodologies.

This Month in Pivotal Data Science

Making or Saving Money With Big Data

Based on a listener suggestion, podcast host Simon Elisha discusses examples of the data science work Pivotal Labs performs for customers. Sticking to common, universally understandable examples, this podcast covers two use cases in retail and how each either makes money or saves money.

Pivotal Big Data Suite Sets Purdue University Students Up For Success

Purdue University has become a leader in using data and data science to raise student success rates, flag issues, and improve teacher effectiveness. With the help of Pivotal Big Data Suite, data mining techniques, and predictive analytics, the University can give students and teachers an early warning system for situations where students might have challenges.

Benchmarking Stream Performance With Spring XD 1.2 and Apache Kafka

One of the goals for the Spring XD 1.2 release was to obtain baseline performance metrics on a typical cluster of machines and then optimize stream performance where necessary. Spring XD is a unified, distributed, and extensible system for data ingestion, real-time analytics, batch processing, and data export. Our testing drove several optimizations to increase streaming performance. The benchmarks found that a single-threaded Spring XD stream can handle over 2 million 100-byte events a second, using Apache Kafka as a transport.
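
To put that figure in perspective, here is a rough back-of-the-envelope sketch of the payload bandwidth it implies. It uses only the two numbers quoted above and ignores Kafka protocol and network overhead; the function and variable names are hypothetical and are not part of Spring XD or Kafka.

```python
def payload_throughput_bytes(events_per_second: int, event_size_bytes: int) -> int:
    """Return raw payload bytes per second for a stream of fixed-size events."""
    return events_per_second * event_size_bytes


# Figures quoted in the benchmark summary. "Over 2 million" makes this a lower bound.
EVENTS_PER_SECOND = 2_000_000
EVENT_SIZE_BYTES = 100

bytes_per_second = payload_throughput_bytes(EVENTS_PER_SECOND, EVENT_SIZE_BYTES)
megabytes_per_second = bytes_per_second / 1_000_000  # decimal megabytes

print(f"{megabytes_per_second:.0f} MB/s of payload")
```

In other words, the quoted rate works out to at least roughly 200 MB of event payload per second through a single stream.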

New Spring XD Releases And Beta Release Of Flo For Spring XD

The Spring XD engineering team has some big announcements regarding Spring XD 1.2 and 1.1.3 along with Flo for Spring XD. Focusing on developer experience and productivity, the new features cover Flo, performance optimization, new sources/processors/sinks/batches, runtime refactoring to act as native apps in Pivotal Cloud Foundry, Apache Ambari installed clusters, resiliency improvements, registry HA support, and improved integration with Pivotal HAWQ, Pivotal GemFire, Pivotal Greenplum Database, Pivotal HD, and Sqoop.

Jignesh Patel on Pivotal’s Acquisition of Big Data Project Quickstep

Pivotal announced an exciting acquisition of big data query technology from the University of Wisconsin-Madison. As part of the acquisition, Professor Jignesh Patel will be joining Pivotal, and he starts his tenure here by sharing why this is such a great move for Pivotal customers, the Quickstep technologies, and himself.

3 Biggest Questions Companies Have Before Starting To Tackle Apache Hadoop

After attending the Pivotal Big Data Roadshow in Atlanta, Pivotal’s Stacey Schneider confirms that it is still early days for most companies on their journey to become data-savvy technology companies. Many attendees use the roadshow as a free orientation for starting this journey, and they have many questions. Summarizing the 3 most popular questions from the event, she answers: How can I convince my org to start on big data now? Do I really have to run it? Is a Data Lake really all one big thing?

6 Free Technical Classes From Pivotal Education

Pivotal Education makes it easy to fully realize the capabilities of our technologies by offering a series of free training courses. Designed for developers, system architects, and data practitioners, these online courses engage students through a sandbox environment and interactive labs. The introductory courses enable technologists at any point of engagement with Pivotal technologies, whether during evaluation or after deployment, to become better versed, more efficient, and more effective in their efforts. The current classes provide hands-on experience with Pivotal technologies such as Pivotal HD, Pivotal Cloud Foundry, Pivotal Greenplum Database, HAWQ, Redis, and GemFire.

Pivotal Data Events in July

Pivotal Big Data Roadshow: Melbourne

Jul 7, 2015—Melbourne, Australia

Pivotal Big Data Roadshow: Sydney

Jul 8, 2015—Sydney, Australia

Join data technology experts from Pivotal to get the latest perspective on how big data analytics and applications are transforming organizations across industries.

Data Science Summit 2015

Jul 20, 2015—San Francisco, CA

Pivotal is proud to be a Gold Sponsor of the Data Science Summit 2015. The Summit brings together researchers and data scientists from academia as well as industry to discuss state-of-the-art data science, applied machine learning, and predictive applications.
