From Sea to Trees, Pivotal Data Science Looks at Climate Change in Acadia National Park: Day 2 Field Report

November 19, 2014 Srivatsan Ramanujam

Yesterday was the orientation day of our expedition to Acadia National Park. Today was all about data collection and citizen science. In this field report, I’ll describe our data collection activities, the challenges they involve, and the ways in which we see a climate data lake addressing some of those challenges.

I also got to interact more closely with one of our scientists, who is working with Acadia National Park on his research. There is significant potential for a data lake and big data tools to help these scientists focus more on research and less on data wrangling. In one of our orientation sessions yesterday, our principal investigator for Acadia National Park, Dr. Abe Miller-Rushing, pointed out that 50% of their time is spent cleaning up data, valuable time that is not going toward core science such as studying phenology. While data wrangling is part and parcel of our jobs as data scientists, phenologists and ecologists should not have to spend their time on it. Phenologists like Dr. Miller-Rushing obtain data from multiple sources, many of which consist of manually recorded observations. Today, as citizen scientists, we participated in two such manual data collection exercises.

Collecting Data on Bird Migration: Manual vs. Machine-based Image Processing

The first of our exercises was counting migrating birds. Right after breakfast at 8 am, we hiked a mile along the Acadia coast from the Schoodic Research Institute to Schoodic Point. Schoodic Point is on the southern tip of Winter Harbor, and its location makes it an ideal spot for observing bird migrations along the Atlantic. Our task was to work in teams of two, one observer and one note taker, to count sightings of three migratory bird species: the Common Loon, the Northern Gannet, and the Common Eider. We were to count only birds migrating from north to south and to disregard birds that were simply milling around looking for food. Each one of us had a pair of binoculars and a telescope to spot birds up to a couple of miles away from the coast.

We soon realized the challenges in this approach. My ability to tell one bird from another, especially between species that look very similar, was little better than guesswork. Furthermore, freezing temperatures and 20 mph winds made it quite uncomfortable to stay focused with our eyes on the horizon. When we compared the initial results recorded by our five teams, we found considerable variability in the number of bird sightings. We seemed to do better when we worked as a single unit; more pairs of eyes certainly helped.

Acadia and Schoodic Institute scientists and ecologists, like our field team leader this morning, Seth Benz, routinely record such observations and make them available for research by uploading the data to websites such as ebird.org. While observations from experts like Seth are likely to be far more accurate than those of citizen scientists, we see many ways in which technology, particularly a climate data lake, could help scientists and ecologists like Seth and his team record these bird migration sightings.

Using Technology to Automate Data Capture and Improve Quality

For example, if a network of stationary cameras took high-resolution images along the horizon every few seconds each day, these images could be ingested regularly into a data lake for analysis. Using image processing and object recognition techniques, we could isolate the regions of each image that represent birds and run a content-based information retrieval (CBIR) engine to match the detected objects against a database of images of migratory birds observed in the region. The timestamped images of detected birds could then be presented to researchers like Seth in a smartphone app, allowing them to override any misclassifications by the CBIR engine. Such a system, powered by a data lake infrastructure, could greatly reduce observer error and the time scientists spend on problems that machines can solve through automation. Furthermore, if the data lake is the central repository of such bird sightings, the data will immediately be available to other researchers.

Note: At Pivotal Data Labs, we’ve built prototypes of this kind of system as a proof of concept of our big data technology and data science applications. You can find more information in earlier blog posts by Pivotal data scientists Gautam and Ailey, such as Content Based Information Retrieval on Apache Hadoop® and Massively Parallel In-Database Image Processing.
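To make the detect-and-match idea above a little more concrete, here is a minimal Python sketch. It is not the Pivotal Data Labs prototype. It assumes a tiny grayscale frame given as a list of lists, finds dark blobs with a simple flood fill, and matches each blob to the nearest entry in a reference table using a toy feature vector (area and bounding-box aspect ratio). The function names, the reference vectors, the threshold, and the timestamp are all made up for illustration; a real system would work on camera images and use far richer features.

```python
from collections import deque
from math import dist


def find_blobs(image, threshold):
    """Group pixels darker than `threshold` into 4-connected blobs."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] >= threshold:
                continue
            # Breadth-first flood fill from this unvisited dark pixel.
            seen[r][c] = True
            queue, blob = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] < threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def describe(blob):
    """Toy feature vector: (pixel area, bounding-box width / height)."""
    ys = [y for y, _ in blob]
    xs = [x for _, x in blob]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return (float(len(blob)), width / height)


def classify(features, reference):
    """Return the label of the nearest reference vector (Euclidean distance)."""
    return min(reference, key=lambda label: dist(features, reference[label]))


def detect(image, timestamp, reference, threshold=5):
    """Produce one record per detected blob, ready for expert review."""
    records = []
    for blob in find_blobs(image, threshold):
        features = describe(blob)
        records.append({
            "timestamp": timestamp,
            "features": features,
            "label": classify(features, reference),
            "reviewed": False,  # an expert could later confirm or override
        })
    return records


# Made-up reference vectors; a real system would derive them from labeled images.
REFERENCE = {"silhouette_A": (5.0, 3.5), "silhouette_B": (2.0, 0.6)}

# Tiny made-up frame: 9 is bright sky, 0 is a dark silhouette.
FRAME = [
    [9, 9, 9, 9, 9, 9, 9, 9],
    [9, 0, 0, 0, 0, 9, 9, 9],
    [9, 9, 9, 9, 9, 9, 0, 9],
    [9, 9, 9, 9, 9, 9, 0, 9],
    [9, 9, 9, 9, 9, 9, 9, 9],
]

# Placeholder timestamp, not a real observation.
detections = detect(FRAME, "2014-11-19T08:30:00", REFERENCE)
for d in detections:
    print(d["timestamp"], d["label"], d["features"])
```

The `reviewed` flag stands in for the override step: the app would show each record to a researcher, who confirms or corrects the label before it is treated as a sighting.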

Collecting Data on Tide Pools

Our second data collection exercise measured the effect of climate change on intertidal ecology. It was led by Hannah Webber, field team leader and education projects manager for Schoodic Institute. Right after lunch we hiked about a mile to Diagon Alley, a tide pool ecosystem of barnacles, blue mussels, and a variety of seaweeds. Again, our task was to work in pairs, this time recording the presence or absence of several species. We used the point-intercept method with a quadrat to record our observations at four different marked spots on three different tide pool regions. As with the previous exercise, I could see how technology could assist and simplify this data collection task. A smartphone app could take a picture of the quadrat, and an image processing program could then automatically fill out a matrix of hits and misses for the different species of organisms. This timestamped data, along with the GPS coordinates of the tide pool, could be uploaded to the data lake and made available to researchers worldwide.
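As a rough illustration of what such a record might look like, the Python sketch below models one quadrat observation as a grid of intercept points, each holding the set of species touching that point. From this it derives a per-species hit-and-miss matrix and a simple cover fraction, and bundles them with a timestamp and coordinates. The class name, field names, 2x2 grid size, site label, coordinates, and timestamp are all assumptions made for the example, not the protocol we used in the field.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class QuadratRecord:
    site: str
    latitude: float
    longitude: float
    observed_at: datetime
    hits: list  # hits[i][j] is the set of species at intercept point (i, j)

    def species(self):
        """All species seen anywhere in the quadrat, sorted by name."""
        return sorted(set().union(*(p for row in self.hits for p in row)))

    def presence(self, name):
        """Hit-and-miss matrix for one species (True = hit)."""
        return [[name in point for point in row] for row in self.hits]

    def cover(self, name):
        """Fraction of intercept points at which the species was hit."""
        points = [p for row in self.hits for p in row]
        return sum(name in p for p in points) / len(points)

    def to_row(self):
        """Flatten into a dictionary suitable for upload to a shared store."""
        return {
            "site": self.site,
            "latitude": self.latitude,
            "longitude": self.longitude,
            "observed_at": self.observed_at.isoformat(),
            "cover": {name: self.cover(name) for name in self.species()},
        }


# Placeholder values throughout: not real coordinates or observations.
record = QuadratRecord(
    site="example-pool-1",
    latitude=0.0,
    longitude=0.0,
    observed_at=datetime(2014, 11, 19, 13, 0, tzinfo=timezone.utc),
    hits=[
        [{"barnacle"}, {"barnacle", "seaweed"}],
        [set(), {"blue mussel"}],
    ],
)

print(record.presence("barnacle"))
print(record.to_row())
```

Storing the raw per-point sets rather than only the summary keeps the original observation recoverable, so a researcher could recompute cover or check a suspicious entry later.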


Wrapping Up Day 2

As our second day of the expedition comes to an end, I can see more ways in which big data and a climate-focused data lake could have an impact on climate change research. Technology can aid and automate much of this research, allowing phenologists and ecologists to spend more time on what they do best: finding scientific meaning in data rather than carrying out tedious collection tasks. Data science on a data lake could take over the heavy lifting and let researchers reach results and insights more quickly. Tomorrow, we’ll head out to Mount Desert Island for our field trip and return to the Schoodic Research Institute in the afternoon to continue brainstorming on the climate data lake. I will also blog my thoughts on how data lakes and big data tools, such as those used by Pivotal Data Labs, could assist Earthwatch scientist Dr. Richard Feldman in analyzing spatiotemporal changes in duck abundances.


Editor’s Note: Apache, Apache Hadoop, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
