Data Science Labs: Predictive Models to Improve Vaccine Quality and Production

June 20, 2013 Sarah Aerni

Photo by Horia Varian via Wikimedia Commons.

Written by Sarah Aerni, Hulya Farinas, and Noah Zimmerman of Pivotal’s Data Science Labs.

The age of “blockbuster drugs” is coming to an end, as personalized medicine becomes a reality. There is an industry-wide need to keep the cost of manufacturing down in order to remain profitable, while reengineering processes to enable the delivery of drugs to patients on different continents. Data science will be a major driver of innovation in these and other areas of the pharmaceutical industry. This was demonstrated during a project the Data Science Labs team executed with a major pharmaceuticals company. In this engagement, we worked with the customer to learn how to predict the potency of vaccines and gain insights into the manufacturing process in order to fine-tune vaccine production.

The Data Science Labs team often engages with companies whose skilled in-house practitioners already extract business value from vast amounts of data. In these situations, our role is to help these businesses move beyond reacting to this information and anticipate new value opportunities.

Our team worked with roughly 13 million rows of data from many of the company’s source systems that collect data during the manufacturing pipeline. These data are obtained from both manual and automatic data collection processes. Our goal was to leverage the full dataset to create a predictive model that could help the company reduce hours and resources wasted on manufacturing products that did not meet their stringent standards for FDA-approved vaccines. In addition, the model helped the company better understand how various steps of the manufacturing process impacted vaccine quality, with the potential to further optimize their pipeline and reduce engineer workload.

As is the case with many Data Science Labs engagements, the data required significant staging and cleansing before model development began. In this case, the manually collected data suffered from various data entry errors, and the fields were frequently incomplete. We demonstrated to the customer that statistical methods could be used to deal with these challenges. We developed automated approaches to identify data entry errors using methods adapted from the field of image processing. An automated iterative method then cleansed the data for subsequent use in modeling.
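The post does not say which image-processing methods were adapted, so the following is only an illustrative sketch, not the approach used in the engagement. It assumes a median filter (a common image-denoising technique) applied to a one-dimensional series of readings: points far from their local median are flagged as likely entry errors and replaced, and the pass repeats until nothing new is flagged. The function names, the window size, the threshold, and the sample readings are all hypothetical.

```python
from statistics import median

def rolling_median(values, window=5):
    """Median of each value's neighbourhood (the window shrinks at the edges)."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

def flag_entry_errors(values, window=5, threshold=5.0, max_iter=10):
    """Iteratively flag and replace points far from their local median.

    Returns (cleaned values, sorted indices of flagged points).
    """
    cleaned = list(values)
    flagged = set()
    for _ in range(max_iter):
        smooth = rolling_median(cleaned, window)
        residuals = [abs(v - s) for v, s in zip(cleaned, smooth)]
        # Robust scale estimate: the median of the absolute residuals.
        scale = median(residuals) or 1e-9
        new = {i for i, r in enumerate(residuals) if r / scale > threshold}
        new -= flagged
        if not new:
            break  # converged: no additional suspicious points
        for i in new:
            cleaned[i] = smooth[i]  # replace with the local median
        flagged |= new
    return cleaned, sorted(flagged)

# Hypothetical readings with two likely typos: 71.0 and 0.72 instead of ~7.1.
readings = [7.1, 7.0, 7.2, 71.0, 7.1, 6.9, 7.0, 7.3, 0.72, 7.1, 7.2]
cleaned, flagged = flag_entry_errors(readings)
print(flagged)  # [3, 8]
```

In practice, flagged values would more likely be routed to a domain expert for review than overwritten automatically, and the threshold would need tuning per measurement type.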

While the customer had attempted such analyses in the past, it had taken over six months and significant engineering resources to complete a similar task. Although the company’s engineers were already quite advanced at developing models in R, they worked primarily outside of the database, making it difficult to leverage all the available data sources. As a result, they used only a subset of the available sources at a highly aggregated level.

Our team used methodologies including sparse partial least squares, random forests, and principal component regression to build a predictive model incorporating over 100 features engineered from the source data. We used cross-validation to evaluate the model fit and analyzed the features in the models to interpret which steps in the process may be most predictive of product quality.
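To make the cross-validation step concrete, here is a minimal sketch under stated assumptions. It is not the model from the engagement: a one-variable least-squares fit stands in for the sparse partial least squares, random forest, and principal component regression models named above, and the data are synthetic. The names `fit_line`, `kfold_rmse`, `feature_a`, `feature_b`, and `potency` are hypothetical. The idea shown is that held-out error, averaged over k folds, indicates how well a model built on a given feature generalizes.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def kfold_rmse(xs, ys, k=5, seed=0):
    """Average root-mean-square error on the held-out fold, over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        scores.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return sum(scores) / k

# Synthetic data: "potency" depends on feature_a but not on feature_b.
rng = random.Random(42)
feature_a = [rng.uniform(0, 10) for _ in range(100)]
feature_b = [rng.uniform(0, 10) for _ in range(100)]
potency = [2.0 * a + rng.gauss(0, 1) for a in feature_a]

rmse_a = kfold_rmse(feature_a, potency)
rmse_b = kfold_rmse(feature_b, potency)
print(f"feature_a RMSE: {rmse_a:.2f}, feature_b RMSE: {rmse_b:.2f}")
```

The informative feature should produce a much lower held-out error than the uninformative one. The same comparison, applied to richer models and many features, is one way to judge which process steps carry predictive signal.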

We often work closely with the data owners and domain experts in order to produce meaningful and actionable results and models. In this lab, we focused on interpretability and chose the final model that would enable identification of the set of tunable steps in the manufacturing process. The predictive models we developed will enable the company to perform experiments on its manufacturing pipeline to improve vaccine quality and consistency. The models also identified potential efficiency improvements in the manufacturing engineers’ workload by reducing the number of uninformative measurements collected during the pipeline.

As a result, we were able to help the company do things they were unable to do before. By using a data-driven approach, we determined a number of key factors necessary to manufacture better products. In addition, the company was already undergoing some process reengineering to generate new products. Our models helped them identify which key decisions during the manufacturing process played the strongest role in creating truly different products. Finally, we gave them opportunities to build predictive models that avoid the loss of products that do not meet their quality standards and reduce the workload on their engineers, and we demonstrated how statistical tools could identify data entry errors and allow the company to take corrective steps early.

We see data science reaching deeply into many sectors, and its impact on pharmaceuticals will play a role in shaping the future of the healthcare industry. Drugs are already being produced that target specific sub-populations of patients. Pharmaceutical companies have access to immense amounts of data that can be leveraged through analysis for repurposing old drugs, identifying potential companion diagnostics, targeting treatment to specific populations, and supporting remote patient monitoring and disease management. We are on the cusp of having truly personalized medicine, and data science will play a key role in this shift in healthcare.

