Churn Prediction in Retail Finance and Asset Management (Part 1)

September 22, 2014 Mariann Micsinai

Joint work performed by Niels Kasch and Mariann Micsinai of Pivotal’s Data Science Labs.

Financial firms collect large volumes of data from all realms of our daily lives. These data assets are used to build predictive models for many purposes, such as understanding and predicting customer behavior. Insights from these models can be applied to areas such as customer acquisition and retention.

In this blog series, we will explain the important factors that enable banks in the retail finance and asset management industries to build, operationalize, and derive actionable insight from such models. First we need to consider how the data scientist navigates a financial institution’s multiple definitions of “customer” and “churn” to construct the correct population for analysis. In the next blog post, we will continue examining churn prediction by looking at predictive and explanatory modeling tools and resulting customer applications.

What constitutes churn? Defining the dependent variable

When building a predictive model, data scientists require a precise definition of the dependent variable, i.e., what should be predicted or explained. Tight collaboration with business experts (portfolio managers, IT warehouse data owners, and subject matter experts) is required to derive this definition. For retail finance customer retention models, business clarification is often needed to determine whether scenarios such as the following constitute churn:

A customer closes her account, then opens another account with better conditions at the same institution. The net outward flow of assets is zero.
A customer transfers 90% of her assets to another institution, but does not close her account. The net outward flow of assets is 90%.
A company decides to change its 401(k) plan administrator. In this case, all 401(k) employee accounts are transferred to the competitor. The net outflow of assets is 100%.

It is important that when drawing conclusions from the model, business stakeholders are aware of the assumptions that affect the dependent variable.
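
One way to keep these assumptions visible is to encode the churn definition as explicit, adjustable parameters. The following is a minimal sketch and not the definition used in our engagements: the `CustomerSnapshot` fields, the 90% outflow threshold, and the `employer_initiated` flag are all hypothetical choices that would have to be agreed upon with business experts.

```python
from dataclasses import dataclass


@dataclass
class CustomerSnapshot:
    """Hypothetical per-customer summary over one observation window."""
    customer_id: str
    assets_start: float  # total assets held at the institution at window start
    assets_end: float    # total assets held at the institution at window end
    employer_initiated: bool = False  # e.g. a 401(k) plan administrator change


def net_outflow_fraction(snapshot: CustomerSnapshot) -> float:
    """Fraction of the starting assets that left the institution (0.0 to 1.0)."""
    if snapshot.assets_start <= 0:
        return 0.0
    outflow = snapshot.assets_start - snapshot.assets_end
    return max(0.0, outflow / snapshot.assets_start)


def is_churn(snapshot: CustomerSnapshot,
             outflow_threshold: float = 0.9,
             count_employer_initiated: bool = False) -> bool:
    """Label churn from net asset outflow; both parameters are business assumptions."""
    if snapshot.employer_initiated and not count_employer_initiated:
        return False
    return net_outflow_fraction(snapshot) >= outflow_threshold


# The three scenarios from the text, with made-up asset values.
scenarios = {
    "reopened account, same institution": CustomerSnapshot("c1", 100_000.0, 100_000.0),
    "moved 90% of assets, account open": CustomerSnapshot("c2", 100_000.0, 10_000.0),
    "401(k) administrator change": CustomerSnapshot(
        "c3", 50_000.0, 0.0, employer_initiated=True
    ),
}
labels = {name: is_churn(snapshot) for name, snapshot in scenarios.items()}
for name, label in labels.items():
    print(f"{name}: churn={label}")
```

Under these particular assumptions, only the second scenario is labeled as churn. Changing `outflow_threshold` or `count_employer_initiated` changes the labels, which is exactly why stakeholders need to know which values were chosen.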

Which customers are part of the analysis? Defining the population

To arrive at the population of interest for a model, we employ the Data Waterfall approach. This approach narrows down the population from all conceptual customers to those customers that are of interest to the business, and for which it is feasible to develop a model.

Our approach asks questions such as:

What is the total universe of customers?
Who is in the population of interest?
How are they defined in the data?
Who will the model be applied to?

The purpose of these questions is to understand whether the available data assets represent the entire customer base or only a subset of it. In practice, financial regulatory requirements often prevent two internal business units from sharing customer data. This can lead to an incomplete picture of the customer base: understanding who is missing from the population, and why, will inform the applicability of any developed model.
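
The narrowing-down can be sketched as an ordered list of named filters, with the surviving customer count recorded after each step so that it is clear who dropped out and why. This is only an illustration: the record fields (`segment`, `shareable`, `active`) and the filter steps are hypothetical, and real steps would come out of the discovery questions above.

```python
def data_waterfall(customers, steps):
    """Apply filters in order; return the survivors and a count after each step."""
    report = [("all customers", len(customers))]
    remaining = list(customers)
    for name, keep in steps:
        remaining = [c for c in remaining if keep(c)]
        report.append((name, len(remaining)))
    return remaining, report


# Made-up customer records for illustration only.
customers = [
    {"id": 1, "segment": "retail", "shareable": True, "active": True},
    {"id": 2, "segment": "retail", "shareable": False, "active": True},
    {"id": 3, "segment": "institutional", "shareable": True, "active": True},
    {"id": 4, "segment": "retail", "shareable": True, "active": False},
    {"id": 5, "segment": "retail", "shareable": True, "active": True},
]

# Hypothetical steps: business interest first, then data feasibility.
steps = [
    ("retail segment", lambda c: c["segment"] == "retail"),
    ("data can be shared", lambda c: c["shareable"]),
    ("active in window", lambda c: c["active"]),
]

population, report = data_waterfall(customers, steps)
for name, count in report:
    print(f"{name:>20}: {count}")
```

Keeping the per-step counts alongside the final population makes it easier to explain to stakeholders which customers the model does not cover.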

Another factor, relating to time series data, is the temporal overlap of different data assets. Transactional data for all customers may be available for the past 10 years, but web browsing behavior only for the last two years. In this case, it may be more effective to narrow down the population to all customers who were active within the past two years. With the population restricted in this way, web-browsing data can be incorporated reliably without having to maintain two pseudo-models, one that includes web data and one that does not.
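
As a rough sketch of this step, the common time window can be computed as the latest start and earliest end across all data assets, and the population then restricted to customers with activity in that window. The coverage dates and the `last_activity` field below are invented for illustration.

```python
from datetime import date


def common_window(coverage):
    """Return (start, end) covered by every data asset, or None if none exists."""
    start = max(first for first, _ in coverage.values())
    end = min(last for _, last in coverage.values())
    if start > end:
        return None
    return start, end


# Hypothetical coverage: 10 years of transactions, 2 years of web data.
coverage = {
    "transactions": (date(2004, 9, 1), date(2014, 9, 1)),
    "web_browsing": (date(2012, 9, 1), date(2014, 9, 1)),
}

# Hypothetical customers with the date of their most recent activity.
customers = [
    {"id": "c1", "last_activity": date(2014, 6, 15)},
    {"id": "c2", "last_activity": date(2010, 3, 2)},
    {"id": "c3", "last_activity": date(2012, 9, 1)},
]

window = common_window(coverage)
in_scope = []
if window is not None:
    start, end = window
    in_scope = [c["id"] for c in customers if start <= c["last_activity"] <= end]

print(window)
print(in_scope)
```

Customers whose last activity predates the web data (here `c2`) fall outside the window and would need to be handled by a separate decision, not silently modeled with missing features.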

The definition of what constitutes a customer, i.e., the level of analysis, is far from trivial. In the case of churn modeling for bank accounts, the most obvious level of analysis is a bank account. Complicating the definition is the fact that an individual, multiple individuals, corporations, and other entities can own a bank account. Defining the population means making a decision on the granularity level of the analysis. Should the population be defined on an account level or an individual level? This decision often has an impact on defining the dependent variable. Is removing one of many users from a bank account considered churn?
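
To illustrate how the granularity choice changes the dependent variable, the sketch below labels the same made-up data at the account level and at the individual level. The labeling rules are assumptions for the example only: an account churns when it is closed, and an individual churns when no account they co-own remains open.

```python
from collections import defaultdict

# Hypothetical (account_id, owner_id) pairs; account A1 is jointly owned.
ownership = [
    ("A1", "alice"),
    ("A1", "bob"),
    ("A2", "alice"),
    ("A3", "carol"),
]
closed_accounts = {"A1", "A3"}


def account_level_labels(ownership, closed):
    """One label per account: churned if the account was closed."""
    accounts = {account for account, _ in ownership}
    return {account: account in closed for account in sorted(accounts)}


def individual_level_labels(ownership, closed):
    """One label per individual: churned if none of their accounts is still open."""
    open_count = defaultdict(int)
    owners = set()
    for account, owner in ownership:
        owners.add(owner)
        if account not in closed:
            open_count[owner] += 1
    return {owner: open_count[owner] == 0 for owner in sorted(owners)}


by_account = account_level_labels(ownership, closed_accounts)
by_individual = individual_level_labels(ownership, closed_accounts)
print(by_account)
print(by_individual)
```

In this example, closing the joint account A1 counts as churn at the account level, yet only one of its two owners churns at the individual level, because the other still holds A2.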

These types of questions are often only answered through close cooperation with subject matter experts and business users. For this very reason, our data science engagements involve discovery meetings, followed by multiple feedback sessions, with all stakeholders. The feedback from these sessions helps to define business rules, adjust or correct definitions, and often leads to the discovery of additional data sources that can augment the model. With the population properly defined, one can bring to bear multiple data assets and analytic tools to construct models for application and operationalization.

Check out the next blog post in the finance series, where we will look at the approaches and algorithms Pivotal data scientists have used to equip our customers with operational models and actionable insights.

About the Author

Mariann Micsinai is a member of the Data Science team at Pivotal’s New York City location. She holds a Ph.D. in Computational Biology from NYU/Yale and pursued Master’s degrees in Computational Biology, Mathematics, Economics, International Studies and Linguistics. In the bioinformatics field, Mariann focused on developing novel computational methods in human cancer genetics and on analyzing and integrating next-generation sequencing experimental data (ChIP-Seq, RNA-Seq, Exome-Seq, 4C-Seq etc.). Prior to her experience in computational biology, she worked for Lehman Brothers’ Emerging Market Trading desk in a market risk management role. In parallel, she taught Econometrics and Mathematics for Economists at Barnard College, Columbia University. At Pivotal, Mariann is involved in solving big data problems in finance and health care analytics.


