This Month In Data Science: November 2015

November 30, 2015, by Paul M. Davis

With the Presidential race heating up, the growing importance of data science within the candidates’ campaigns received attention in November. In other news, Stanford’s School of Engineering held the first Women in Data Science conference, the National Science Foundation invested in cross-disciplinary data science partnerships, and the discipline’s ability to track and explain global consumer trends came into focus. Here’s our roundup of the biggest data science news of the month, both from Pivotal and beyond.

How Campaign Data Scientists Figure Out The Formula To Sway Your Vote For President

In a Quora thread, Democratic Campaign Data Manager Luke Riley demystified how the 2016 Presidential hopefuls are using data science. Though he cites micro-targeting as an important tool that data science adds to the process, he also emphasized the value of his field experience gathering data door-to-door on four campaigns before he joined the Obama 2012 team. During that time he learned “the depth of information available and the limitations behind data collection in that environment.”

Who’s Who In The Booming World Of Data Science

While “data scientist” is a blanket term for many kinds of practitioners, a recent infographic from DataCamp breaks down the key roles in the field, from data architects to business analysts, and who performs them. The infographic also lists the primary skills required for each role and the major companies hiring for it, and compares the national average salaries across roles.

Needed: More Women In Data Science

Stanford’s School of Engineering held the first Women in Data Science conference on November 2nd. During the all-female gathering, practitioners shared their research, discussed why diversity matters in shaping both the questions posed by data research and the answers it produces, and examined the challenges facing prospective women data scientists.

Data Science At Petabyte Scale Is Helping Explain Global Trends

Wired speaks to James Crawford, founder and CEO of Orbital Insight, which uses advanced image processing and data science techniques to track sales trends on a global scale. The company draws on a wide range of data sources, including satellite and drone imagery, to gain insight into consumer activity and to track mining, manufacturing, and shipping activity.

Nate Silver Predicts 2016 Presidential Race At Salesforce World Tour

Speaking at the Salesforce World Tour on November 18th, star statistician Nate Silver offered some preliminary predictions about the front-runners in the 2016 Presidential Election. Despite the surge of Bernie Sanders, Silver stated that Hillary Clinton remains the firm front-runner in the Democratic race. On the Republican side, Silver hedged his bets, noting that few candidates have firm endorsements at this point and that some factors are unpredictable: “You’ve never had a Trump or a Carson be a major candidate before,” he stated, referring to the current front-runners in the polls.

Establishing A Brain Trust For Data Science

The National Science Foundation announced awards totaling $5 million to establish “Big Data Regional Innovation Hubs,” which bring together academic researchers and leading corporations to drive innovation and share insights. The Hubs will prioritize a number of major topics that researchers and scientists are focused on, including healthcare, natural resource management, agriculture, smart cities, precision medicine, energy and manufacturing, and finance.

This Month In Pivotal Data Science

The New Flo for Spring XD

Flo for Spring XD is a tool that combines a graphical canvas for designing pipelines with direct access to the Spring XD DSL. This first production-ready release adds batch workflows and addresses the most prominent challenges raised by Spring XD users during the beta. Ultimately, Flo makes integration easier, improves the speed and quality of development, and addresses organizational needs. This post provides background on Flo, explains the challenges it addresses, reviews the Flo solution and its features, and then looks at the journey ahead.

Data, Why Did It Have To Be Data?

In this episode of the Pivotal podcast, host Coté once again chats with Andrew Clay Shafer about the many challenges of transforming into a Cloud Native enterprise. They cover the changing focus we’re seeing among Pivotal customers: moving up the stack from infrastructure to the application layer. They then discuss the difficulties of handling the data layer, and wrap up with some change management tactics for getting the “rank and file” inspired and bought into the Cloud Native lifestyle.

Try Out Pivotal Greenplum With A Sandbox Virtual Machine

Pivotal Greenplum became the first open source massively parallel data warehouse in late October. Its open source form, now known as Greenplum Database, can be cloned from the GitHub repository and built by anyone, but another segment of the community just wants to try out the product’s functionality without going through that process. For that group, we now offer the Pivotal Greenplum Sandbox Virtual Machine, which combines the open source Greenplum Database, the commercially available Pivotal Greenplum Command Center management tool, Apache MADlib (incubating), PostGIS, PL/R, PL/Perl, and PL/Java into an easy-to-use virtual machine that runs in either VirtualBox or VMware Fusion.

How WellCare Accelerated Big Data Delivery To Improve Analytics

In the healthcare industry, big data management is becoming an increasingly high priority. This webinar is presented by executives from Pivotal, Attunity, and WellCare, a joint customer. It shares a number of industry data points covering reporting at scale, using real-time data, where Apache Hadoop™ and massively parallel SQL-on-Hadoop systems fit, and more. WellCare also shares the story of its journey to cut mission-critical query times from 30 days to seven.

Now Open: Pivotal Big Data Center Of Excellence In Denver

Pivotal is expanding our partner support. With data and analytics becoming a key differentiator for successful businesses, enterprises and start-ups alike are increasingly building scale-out big data platforms. Pivotal is expanding our presence in Denver to increase the number of hardware platforms we are certified on, helping to reduce risk and shorten time to value for our customers. Read more about this new capability and how to participate.
