With the Presidential race heating up, the growing importance of data science within the candidates’ campaigns received attention in November. In other news, Stanford’s School of Engineering held the first Women in Data Science conference, the National Science Foundation invested in cross-disciplinary data science partnerships, and the discipline’s ability to track and explain global consumer trends came into focus. Here’s our roundup of the biggest data science news of the month, both from Pivotal and beyond.
In a Quora thread, Democratic Campaign Data Manager Luke Riley demystified how the 2016 Presidential hopefuls are using data science. Though he cited micro-targeting as an important tool that data science adds to the process, he also emphasized the importance of his field experience gathering data door-to-door for four previous campaigns before joining the Obama 2012 team. During that time he learned “the depth of information available and the limitations behind data collection in that environment.”
While “data scientist” is a blanket term for many types of practitioners, a recent infographic from DataCamp breaks down who performs which key roles in the field, from data architects to business analysts. The infographic also lists the primary skills required for each role, names major companies hiring those specific practitioners, and compares the national average salaries of the different roles.
Stanford’s School of Engineering held the first Women in Data Science conference on November 2nd. During the all-female gathering, women practitioners shared their research, discussed the importance of diversity in understanding the questions and answers posed by data research, and examined the challenges facing prospective women data scientists.
Wired speaks to James Crawford, founder and CEO of Orbital Insight, which uses advanced image processing and data science techniques to track sales trends on a global scale. The company draws on a wide range of data sources, including satellite and drone imagery, to gain insight into consumer activity as well as to track mining, manufacturing, and shipping activities.
Speaking at the Salesforce World Tour on November 18th, star statistician Nate Silver offered some preliminary predictions about who will be the front-runners in the 2016 Presidential Election. Despite the insurgence of Bernie Sanders, Silver stated that Hillary Clinton remains the firm front-runner in the Democratic race. On the Republican side, Silver hedged his bets, noting that there are few firm endorsements of candidates at this point, and that some factors are unpredictable: “You’ve never had a Trump or a Carson be a major candidate before,” he stated, referring to the current front-runners in polls.
The National Science Foundation announced awards totaling $5 million and the establishment of “Big Data Regional Innovation Hubs,” which bring together academic researchers and leading corporations to drive innovation and share insights. The Hubs will prioritize a number of major topics that researchers and scientists are focused on, including healthcare, management of natural resources, agriculture, smart cities, precision medicine, energy and manufacturing, and finance.
This Month In Pivotal Data Science
Flo for Spring XD is an incredibly powerful tool with a graphical canvas and DSL access. This first production-ready release adds batch workflows while addressing the most prominent challenges presented by Spring XD users since the beta process. Ultimately, Flo makes integration easier, improves the speed and quality of development, and addresses organizational needs. This post provides a background on Flo, explains the challenges it addresses, reviews the Flo solution and features, then talks about the journey ahead.
In this episode of the Pivotal podcast, host Coté once again chats with Andrew Clay Shafer about the sundry challenges of transforming to a Cloud Native enterprise. They cover the changing focus we’re seeing among Pivotal customers: moving up the stack from infrastructure to the application layers. Then they discuss the difficulties of handling the data layer, and wrap up with some change management tactics for getting the “rank and file” inspired and bought into the Cloud Native lifestyle.
Pivotal Greenplum became the first open source massively parallel data warehouse in late October. Now that the product is available in open source form as Greenplum Database, anyone can clone the GitHub repo and build it, but another segment of the community just wants to try out its functionality without going through that process. For that group, we now have the Pivotal Greenplum Sandbox Virtual Machine, which combines the open source Greenplum Database, the commercially available Pivotal Greenplum Command Center management tool, Apache MADlib (incubating), PostGIS, PL/R, PL/Perl, and PL/Java into an easy-to-use virtual machine that runs in either VirtualBox or VMware Fusion.
In the healthcare industry, big data management is becoming an increasingly high priority. This webinar is presented by executives from Pivotal, Attunity, and WellCare, a joint customer. They share a number of industry data points, covering reporting at scale, using real-time data, where Apache Hadoop™ and massively parallel SQL on Hadoop systems fit, and more. WellCare also shares the story of its journey to improve mission-critical query times from 30 days to seven days.
Pivotal is expanding our partner support. With data and analytics becoming a key differentiator for successful businesses, enterprises and start-ups alike are increasingly building scale-out big data platforms. Pivotal is expanding our presence in Denver to increase the number of hardware platforms we are certified with, helping to reduce risk and shorten the time to value for our customers. Read more about this new capability and how to participate.
About the Author: Paul M. Davis