How Can Data Science Serve the Public Good?

July 9, 2012 Paul M. Davis

Photo by Paul M. Davis from the fourth Big Data for the Public Good seminar at Code for America.

Over the course of four seminars, the Big Data for the Public Good series gave leading data science thinkers, innovators, and practitioners a rare opportunity to explore how the field can serve the public interest. Presented by Code for America and sponsored by Greenplum, the series hosted Michal Migurski and Eric Rodenbeck of Stamen Design; Jake Porway of DataKind, formerly of The New York Times; wiki inventor and Nike Code for a Better World Fellow Ward Cunningham; and Jeremy Howard, President and Chief Scientist at Kaggle. Though the speakers approached the topic from different perspectives, several themes recurred across the talks.

Data Science is Storytelling

As big data proliferates, it demands new ways of communicating the insights it reveals. Interactive maps, data visualizations, and infographics are tools for clarifying complexity, and they place data scientists in the role of storyteller. Referencing Hans Rosling, Stamen’s Rodenbeck emphasized that “narrative is critical” for providing context and for communicating effectively what a particular dataset demonstrates to governments, social organizations, and citizens. Stamen learned this lesson when a city growth visualization tool the studio built for Trulia drew backlash from residents, who felt the application depicted their community as if it were a missile target in a video game.

Cunningham emphasized the “storytelling aspect” of data science while discussing the Smallest Federated Wiki, a platform he developed at Nike that allows companies and the public to share data sets and collaborate on analyzing them. The platform’s federated approach, Cunningham explained, allows other users to assess the quality and accuracy of analyses. “There are a lot of different stories to tell about any particular piece of data,” he said, pointing to one advantage of federation. Through analysis, narratives emerge. “As we find our way through the data,” he explained, “we can say, ‘here’s a visualization that can tell this story and there’s a visualization that can tell that story.’”

Data Empowers Citizens

Stamen’s Migurski emphasized the value of establishing a dialogue between citizens and government institutions based upon data. Data empowers citizens to advocate for the needs of their communities and reveals which needs are going unaddressed. Drawing on his experience building the crime-mapping application Oakland Crimespotting, Migurski identified four best practices for working with government data, stating that tools “must demonstrate the impact by linking to truths shared within the communities served,” be stable and reliable, refer to an official version that can be verified and supported, and remain contextually relevant.

Porway spoke of the wealth of public data that goes untapped, lamenting undirected data dumps by governments and organizations that are “like giving crude oil to people.” “Open data is not useable data,” he warned, advocating for an ongoing dialogue among government agencies, social organizations, and data scientists. “By bridging these communities, you’re starting to make that data useable,” he said, which in turn makes it more likely that the data can serve citizens.

Bridging the Data Science Gap

There is an abundance of public data, but a lack of skilled practitioners to make sense of it. This presents an opportunity for data scientists to use their skills to serve the public interest. Porway noted that in many social organizations, “data and skills are often silo’d from one another.” This creates a risk that the wealth of information these organizations produce will become irretrievable data exhaust.

“On the one hand, we have a group of people who are really good at looking at data, really good at analyzing things, but don’t have a lot of social outputs for it,” Porway said. “On the other hand, we have social organizations that are surrounded by data and are trying to do really good things for the world but don’t have anybody to look at it.”

Porway sees a network of “transformative communities” emerging to address this issue, within which government officials, representatives of social organizations, data scientists, researchers, and journalists “are coming together for a common goal and sharing across those boundaries to do more.”

One way to connect data scientists with institutions and organizations that lack skilled practitioners is the competition model established by Kaggle. Howard explained that Kaggle harnesses practitioners’ competitive impulse and their desire “to hack at interesting problems and interesting code.”

He noted that in “cause organizations where they don’t have people working on this stuff, they often don’t see the forest for the trees,” and so remain unaware of the value of the data available to them. Howard cited the EMC Data Science Global Hackathon for Air Quality Prediction, a weekend-long competition that gave participants access to EPA Air Quality Index data for Chicago, as an example of how the Kaggle model can serve the public interest.

Illustrating the transformative potential of data science in service of the social good, Howard noted that the hackathon worked with “a data set which is local in scope,” which “you can use at a local level, yet you can also take the results and apply them really powerfully throughout the world.”

As grand as that may sound, we are only on the cusp of what can be done with big data. As the thinkers and practitioners who spoke at the Big Data for the Public Good seminar series demonstrated, there exists a community of data scientists who are as passionate about serving the public interest as they are about datasets. The potential of such collaborations is nothing less than transformative.
