This Month In Data Science

March 31, 2014 Paul M. Davis

With GigaOm’s Structure Data Conference, the Pivotal HD 2.0 announcement, and big cloud platform announcements from Google and Amazon (below), March was an eventful month for data science and the platforms on which it is practiced. Here are our top picks for the data science news items of the month, from both Pivotal and the wider field.

Top Data Science News in March 2014

Gearing Up for Cloudapalooza: Google and Microsoft Face-off Against Amazon

The competition among cloud platforms intensified this month, with Google going head-to-head with Amazon Web Services (AWS) by announcing significant price cuts to Google Cloud Platform services. Amazon responded with price cuts and announcements of its own, while Microsoft waited in the wings with updates to its Windows Azure cloud services.

Better NCAA Brackets Through Data Science

While there are many approaches to choosing your NCAA basketball brackets, ranging from the sentimental to the superstitious, Kaggle’s “March Machine Learning Mania” competition applies some scientific rigor to the process. Over 250 teams have entered so far, aiming to develop an algorithmic model that predicts the results of the past five tournaments and then to test that model in real time by predicting the results of the 2014 tournament.

Open Data Could Add $3 Trillion A Year In Total Value Worldwide

A recent McKinsey Global Institute report estimates that open data could add over $3 trillion in total value annually to the education, transportation, consumer products, electricity, oil and gas, health care and consumer finance sectors worldwide.

White House Launches Website to Visualize Climate Change

The White House rolled out an ambitious new web app that aims to communicate climate change through visualizations of the available data, and projections of how climate change will affect users’ own lives. The project is part of the White House’s Climate Data Initiative, which brings together open government data, and private and philanthropic organizations, to analyze and communicate the latest climate change research.

How Statisticians Could Help Find That Missing Plane

The mystery of what happened to Malaysia Airlines Flight 370 continues to confound. In a post at Nate Silver’s recently relaunched FiveThirtyEight, Carl Bialik explains how statisticians can add insight to the search for answers by utilizing Bayesian techniques to calculate the probability of causes for the missing flight.
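As a rough illustration of the Bayesian reasoning described above, the sketch below updates the probabilities of competing hypotheses after a piece of evidence arrives. The hypothesis labels, priors, and likelihoods are placeholder numbers invented for this example; they are not taken from Bialik’s post or from any real investigation.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule: P(H | E) is proportional to P(H) * P(E | H)."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence is impossible under every hypothesis")
    return {h: weight / total for h, weight in unnormalized.items()}


# Illustrative numbers only (assumptions, not real estimates).
priors = {"hypothesis_a": 0.5, "hypothesis_b": 0.3, "hypothesis_c": 0.2}
# How likely the observed evidence would be under each hypothesis.
likelihoods = {"hypothesis_a": 0.1, "hypothesis_b": 0.4, "hypothesis_c": 0.5}

if __name__ == "__main__":
    for name, prob in posterior(priors, likelihoods).items():
        print(f"{name}: {prob:.3f}")
```

A hypothesis that started out as the favorite can fall behind once the evidence is weighed, which is the kind of insight the statistical approach is meant to provide.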

UC Berkeley Dean: Data Science Classes Aren’t Just for Engineers

While data science is a highly specialized field, requiring knowledge in statistical analysis, engineering, and machine learning techniques, data literacy is becoming increasingly important for everyone. During a talk at Gigaom Structure Data, UC Berkeley Dean AnnaLee Saxenian emphasized the importance of increased data literacy among professionals in a wide range of disciplines and fields, and declared that as a result, data science classes are increasingly important to a well-rounded curriculum.

Google Flu Trends: The Limits of Big Data

Google Flu Trends, one of the most visible and popular applications of data science in recent years, became a case study this month in the intensifying debate over the potential limitations of Big Data. An article published in Science magazine detailed some of Google Flu Trends’ most notorious failures, such as an overestimation of flu cases in the United States in 2012-13, and concluded that Google is guilty of “big data hubris.” A number of data scientists responded that Google Flu Trends is far from a representative case. Among them was its co-inventor Matt Mohebbi, who explained to the New York Times that the tool was designed and intended “as a ‘complementary signal’ rather than a stand-alone forecasting tool.”

This Month in Pivotal Data Science

Pivotal HD 2.0 to Help Enterprises To Get More Out of Hadoop With a Business Data Lake

Pivotal HD 2.0 will help companies get more than ever out of their Hadoop investments by building in complementary in-memory data processing with GemFire XD and by providing additional analytical firepower through improved tools and added libraries of pre-populated analytics. This Hadoop distribution accelerates time-to-insight for enterprises of all sizes.

Paul Maritz at Structure: Hadoop is Just One Ingredient of a ‘Profound Shift’ in Software

Pivotal’s CEO Paul Maritz sat down with GigaOM’s Om Malik at the Structure Data Conference in New York, opening the session with a new trend he is seeing in the market today. In the interview, he calls out the Big Data technology Hadoop as a catalyst for the market, but says the bigger trend in software is to build on some of the tenets of Hadoop, taking lots of cheap machines and cheap storage, to reinvent how businesses build applications.

My First Three Months at Pivotal, and the Road Ahead to ApacheCON 2014

Pivotal’s Apache Hadoop leader, Roman Shaposhnik, shares what he has been up to for the first three months at Pivotal. In this post, he writes about how he is aligning Pivotal’s distribution, Pivotal HD, with the Apache Hadoop ecosystem projects and Apache Bigtop.

Time Series Analysis #2: Recognizing Patterns within a Time Series

The SQL Window Function construct can be used as a basis for many sorts of ordered calculations within SQL. This post elaborates on how this query capability can be used for a specific type of problem that frequently shows up in time series analysis, which is the recognition of simple patterns of movement within a series.
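To make the idea concrete, here is a minimal sketch that uses the `LAG` window function to label each point in a series as an up, down, or flat movement relative to the previous point. It is not the query from the original post: the `series` table, its columns, and the sample values are hypothetical, and it runs against SQLite through Python’s standard `sqlite3` module (window functions require SQLite 3.25 or later) rather than a Pivotal database.

```python
import sqlite3

# Hypothetical sample series of (time step, value); not from the original post.
readings = [(1, 10.0), (2, 10.5), (3, 11.2), (4, 11.0), (5, 10.4), (6, 10.9)]

# LAG(value) OVER (ORDER BY t) returns the previous row's value in time order.
QUERY = """
SELECT t,
       value,
       CASE
           WHEN LAG(value) OVER (ORDER BY t) IS NULL THEN 'start'
           WHEN value > LAG(value) OVER (ORDER BY t) THEN 'up'
           WHEN value < LAG(value) OVER (ORDER BY t) THEN 'down'
           ELSE 'flat'
       END AS movement
FROM series
ORDER BY t
"""


def label_movements(rows):
    """Load rows into an in-memory table and tag each step's direction."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE series (t INTEGER PRIMARY KEY, value REAL)")
        conn.executemany("INSERT INTO series VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in label_movements(readings):
        print(row)
```

Once each row carries a movement label, simple patterns such as a run of consecutive rises followed by a fall can be found by applying further window functions over the labeled rows.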

Upcoming Data Science Events

Data Science for the 99%

Tue, April 1; Webinar

In this webinar, Pivotal Data Labs members Woo J. Jung, Sarah Aerni, and Srivatsan Ramanujam will discuss some of the open source tools in their arsenal. They will introduce and provide details on the variety of tools – such as MADlib, PL/R, PL/Python, PivotalR, PyMADlib and a host of others – they have utilized and extended for customer engagements.

ApacheCon North America

April 7–9; Westin Denver Downtown, Denver, CO

ApacheCon brings together the open source community to learn about and collaborate on the technologies and projects driving the future of open source, big data and cloud computing.

SF: Accessing External Hadoop Data Sources Using Pivotal Xtension Framework (PXF)

Tuesday, April 8, 5:30 to 8:30 pm; Pivotal Labs, San Francisco, CA

Pivotal’s Sameer Tiwari provides insight into Pivotal Xtension Framework, an external table interface that gives SQL access to data stored within the Hadoop ecosystem. It enables loading and querying of data stored in HDFS, HBase and Hive, and supports a wide range of data formats, including Text, Avro, Hive, SequenceFile and RCFile, as well as HBase.

Cloud Foundry Summit

June 9–11; Hilton Union Square, San Francisco, CA

The premier event for developers and cloud operators using the industry’s leading Open Source Platform-as-a-Service: Cloud Foundry. Join core contributors to the project and real-world users for three days to discuss deep technical topics, the engineering roadmap, the community ecosystem and operational best practices.

GigaOm Structure 2014

June 18–19; Mission Bay Conference Center, San Francisco, CA

Meet the innovators and thinkers who are building infrastructure to run the applications of the next decade.
