Introduction
In this technical guide, we identify and estimate the causal effect of specific treatments (e.g., customer engagement levels or credit limit adjustments) on the likelihood of churn, using a dataset of bank client features.
Libraries
EconML: For estimating heterogeneous treatment effects.
DoWhy: For causal inference and estimating treatment effects.
Matplotlib and Seaborn: For data visualization.
Workflow
Dataset Overview & Preprocessing
Identify Causal Variables
Causal Discovery
Estimate Average Treatment Effects with DoWhy
Model Evaluation
Policy Choice & Conclusion
Dataset Overview
The dataset contains various features of bank clients, including demographic information, account characteristics, and activity metrics. Here's a brief overview of some key columns:
CLIENTNUM: Unique identifier for the client.
Attrition_Flag: Indicates whether the customer is an existing customer or has churned.
Customer_Age, Gender, Dependent_count: Demographic information.
Education_Level, Marital_Status, Income_Category: More demographics.
Card_Category: Type of card (e.g., Blue, Gold, Platinum).
Months_on_book: Duration of the account relationship with the bank.
Total_Relationship_Count, Months_Inactive_12_mon, Contacts_Count_12_mon: Account activity metrics.
Credit_Limit, Total_Revolving_Bal, Avg_Open_To_Buy: Credit account details.
Total_Amt_Chng_Q4_Q1, Total_Trans_Amt, Total_Trans_Ct, Total_Ct_Chng_Q4_Q1, Avg_Utilization_Ratio: Financial behaviors and utilization.
Naive_Bayes_Classifier_...: Columns related to a Naive Bayes classifier output.
Data Preprocessing Steps
Encoding Categorical Variables: Attrition_Flag, Gender, Education_Level, Marital_Status, Income_Category, and Card_Category are categorical and need to be encoded as numerical values for most types of analysis, especially when they are used as confounders or treatment variables in the causal analysis.
Normalizing/Standardizing Numerical Variables: In some instances, numerical variables might need scaling, particularly if they're used in machine learning models that are sensitive to the scale of input features.
Feature Selection: Identifying and selecting relevant features for the analysis, focusing on those that are likely to have a causal relationship with the outcome (bank churn). This includes deciding which variables to use as treatment, outcome, confounders, and effect modifiers.
Removing Unnecessary Columns: The last two columns appear to be related to a Naive Bayes Classifier and might not be necessary for our causal analysis. Unless they have a specific use case, they could be removed to simplify the dataset.
Converting Attrition_Flag to a Binary Variable: For causal analysis, it is useful to encode the outcome variable (Attrition_Flag) as a binary indicator (1 for churned customers, 0 for existing customers).
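The preprocessing steps above can be sketched as follows. This is a minimal illustration rather than the exact pipeline used for the analysis: the small DataFrame is synthetic, the helper name preprocess and the two Naive_Bayes_Classifier_example_* column names are hypothetical, and the label "Attrited Customer" for churned clients is an assumption about how Attrition_Flag is coded.

```python
import pandas as pd

# Tiny synthetic stand-in for the bank dataset (all values are made up).
raw = pd.DataFrame({
    "CLIENTNUM": [1, 2, 3, 4],
    "Attrition_Flag": ["Existing Customer", "Attrited Customer",
                       "Existing Customer", "Attrited Customer"],
    "Gender": ["M", "F", "F", "M"],
    "Card_Category": ["Blue", "Gold", "Blue", "Platinum"],
    "Credit_Limit": [12000.0, 8000.0, 3500.0, 3300.0],
    "Total_Relationship_Count": [5, 6, 4, 3],
    # Hypothetical names standing in for the classifier-output columns.
    "Naive_Bayes_Classifier_example_1": [0.1, 0.9, 0.2, 0.8],
    "Naive_Bayes_Classifier_example_2": [0.9, 0.1, 0.8, 0.2],
})


def preprocess(df, categorical_cols, scale_cols):
    """Hypothetical helper: drop, binarize, encode, and standardize."""
    out = df.copy()

    # Remove the classifier-output columns.
    nb_cols = [c for c in out.columns if c.startswith("Naive_Bayes_Classifier_")]
    out = out.drop(columns=nb_cols)

    # Binary outcome: 1 = churned, 0 = existing.
    # Assumption: churned clients are labeled "Attrited Customer".
    out["Attrition_Flag"] = (out["Attrition_Flag"] == "Attrited Customer").astype(int)

    # Integer codes for categorical columns (categories are sorted alphabetically).
    for col in categorical_cols:
        out[col] = out[col].astype("category").cat.codes

    # Standardize numeric columns to zero mean and unit variance.
    for col in scale_cols:
        out[col] = (out[col] - out[col].mean()) / out[col].std(ddof=0)

    return out


clean = preprocess(raw, categorical_cols=["Gender", "Card_Category"],
                   scale_cols=["Credit_Limit"])
print(clean.head())
```

Integer codes impose an arbitrary order on nominal categories such as Marital_Status; one-hot encoding with pd.get_dummies is a common alternative when that ordering would be misleading.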
Identify Causal Variables
To estimate causal effects on bank churn, we first need to assign each variable a role in the analysis:
Outcome (Y): Attrition_Flag. This is our primary outcome variable, indicating whether a customer has churned (1) or is still an existing customer (0).
Treatment (T): A variable or intervention we hypothesize might influence churn, such as Total_Relationship_Count (as a proxy for engagement level), Contacts_Count_12_mon (frequency of contact with the bank), or even Credit_Limit adjustments.
Confounders (W): Demographic and account characteristics like Customer_Age, Education_Level, Income_Category, Months_on_book, and Avg_Utilization_Ratio that might affect both the treatment and the outcome.
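Effect Modifiers (X): Variables across whose values the treatment effect itself differs, i.e. sources of heterogeneity in the effect. They matter when estimating conditional effects (CATE) for customer segments rather than a single average effect.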
Causal Discovery
To further validate and refine our selection of causal variables, we work through the following steps:
Formulating a specific causal question or hypothesis based on the treatment and outcome.
Constructing a causal model, potentially using DoWhy to specify and visualize the causal graph.
Estimating the causal effect of the treatment on the outcome, controlling for confounders.
Formulating a causal question or hypothesis
We start by formulating our causal question and representing it in a causal graph. For simplicity, let's focus on the hypothesis that Total_Relationship_Count (as a measure of customer engagement) has a causal impact on customer churn (Attrition_Flag).
Construct a causal model and visualize the causal graph with the DoWhy package
This causal graph shows the assumed relationships between various customer attributes and the outcome of interest, which in this case is customer churn (denoted by Attrition_Flag). Let's interpret the graph:
Nodes: Each circle represents a variable in the analysis. The nodes are Customer_Age, Education_Level, Marital_Status, Income_Category, Months_on_book, Total_Relationship_Count, and Attrition_Flag.
Edges: The arrows (or edges) indicate the assumed direction of causality between variables. For example, an arrow from Total_Relationship_Count to Attrition_Flag suggests that the number of products or services a customer has with the bank (Total Relationship Count) is presumed to have a causal influence on whether they churn or not (Attrition_Flag).
Central Nodes: The Total_Relationship_Count and Attrition_Flag are more centrally located and connected, indicating they are the primary focus of the causal analysis. Total_Relationship_Count is the treatment or intervention variable, and Attrition_Flag is the outcome variable.
Confounders:
The variables Customer_Age, Education_Level, Marital_Status, Income_Category, and Months_on_book have arrows pointing both to Total_Relationship_Count and Attrition_Flag. This positioning indicates that these are confounders. They are presumed to influence both the treatment (how many bank products a customer uses) and the outcome (whether the customer churns).
The graph assumes that there are no unobserved confounders; that is, all relevant factors that might influence both the treatment and the outcome are captured in the model. This assumption is crucial for making valid causal inferences.
Thinking about it in business strategy terms, the graph suggests that to understand and potentially reduce customer churn, one should consider not just the direct effect of product engagement but also how customer demographics and relationship tenure with the bank might influence churn. These insights could lead to more targeted interventions that account for the broader context of each customer's relationship with the bank.
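The graph described above can be written down in code. The sketch below builds it as a GML string using only the node names listed in this section; the helper names build_gml and build_causal_model are hypothetical, and the DoWhy call is wrapped in a function so that the graph-building part runs even where DoWhy is not installed. It assumes a preprocessed DataFrame whose columns match the node names.

```python
TREATMENT = "Total_Relationship_Count"
OUTCOME = "Attrition_Flag"
CONFOUNDERS = [
    "Customer_Age",
    "Education_Level",
    "Marital_Status",
    "Income_Category",
    "Months_on_book",
]


def build_gml(treatment, outcome, confounders):
    """Hypothetical helper: return (gml_string, edge_list) for the assumed graph."""
    nodes = [treatment, outcome, *confounders]

    # Treatment -> outcome, plus each confounder -> treatment and -> outcome.
    edges = [(treatment, outcome)]
    for w in confounders:
        edges.append((w, treatment))
        edges.append((w, outcome))

    node_txt = " ".join(f'node[id "{n}" label "{n}"]' for n in nodes)
    edge_txt = " ".join(f'edge[source "{s}" target "{t}"]' for s, t in edges)
    return f"graph[directed 1 {node_txt} {edge_txt}]", edges


GML_GRAPH, EDGES = build_gml(TREATMENT, OUTCOME, CONFOUNDERS)
print(GML_GRAPH)


def build_causal_model(df):
    """Wrap the graph in a DoWhy CausalModel (requires the dowhy package)."""
    from dowhy import CausalModel  # imported lazily so the code above runs without it

    return CausalModel(data=df, treatment=TREATMENT, outcome=OUTCOME, graph=GML_GRAPH)
```

With DoWhy installed, calling build_causal_model(df).view_model() should render a graph like the one interpreted above. Writing the edges out explicitly also makes the no-unobserved-confounders assumption visible: any variable missing from CONFOUNDERS is assumed not to drive both treatment and outcome.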
Estimating Average Treatment Effect
The Average Treatment Effect (ATE) measures the average impact of a treatment on an outcome across the whole population. Take a bank's customer retention program as an example: the ATE compares the retention rate customers would have with the program against the rate they would have without it, which gives the bank a single number for the program's effectiveness in keeping customers from leaving. In our analysis the treatment, Total_Relationship_Count, is a count rather than a yes/no program, so the estimate is read as the average effect of one additional product.
ATE=E[Y(1)−Y(0)]
Where:
E - denotes the expected value across the population.
Y(1) - represents the outcome (e.g., staying with the bank) for a customer if they receive the treatment (participation in the retention program).
Y(0) - represents the outcome for a customer if they do not receive the treatment.
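To make the definition concrete, here is a small worked example with eight hypothetical customers for whom we pretend to know both potential outcomes (in real data only one of them is ever observed). The numbers are invented for illustration; the outcome is churn, so 1 means the customer left.

```python
from statistics import mean

# Each tuple: (treated, y0, y1)
#   treated = 1 if the customer actually received the treatment
#   y0 = churn outcome without treatment, y1 = churn outcome with treatment
customers = [
    # Long-tenure customers: never churn, and are mostly treated.
    (1, 0, 0), (1, 0, 0), (1, 0, 0), (0, 0, 0),
    # Short-tenure customers: at risk of churning, and are mostly untreated.
    (1, 1, 0), (0, 1, 0), (0, 1, 1), (0, 1, 1),
]

# True ATE = E[Y(1) - Y(0)], averaged over everyone.
ate = mean(y1 - y0 for _, y0, y1 in customers)

# What we would actually observe: y1 for treated customers, y0 for the rest.
observed = [(t, y1 if t else y0) for t, y0, y1 in customers]

# Naive comparison of observed group means, ignoring tenure.
naive = (mean(y for t, y in observed if t)
         - mean(y for t, y in observed if not t))

print(f"true ATE: {ate}, naive difference: {naive}")
```

Here the true ATE is -0.25, while the naive comparison gives -0.75. The gap arises because treated customers are mostly long-tenure clients who would not have churned anyway, which is exactly the kind of confounding the causal graph is meant to account for.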
Let's break this result down:
1. Estimand Type: EstimandType.NONPARAMETRIC_ATE
What It Means: The type of estimand here is a nonparametric Average Treatment Effect (ATE), which indicates that the causal effect being estimated is the average effect of the treatment (in this case, Total Relationship Count with the bank) on the outcome (Attrition_Flag, i.e., customer churn) across the entire population, without assuming a specific parametric form of the relationship.
2. Estimand Expression
Expression: The mathematical expression represents the derivative of the expected value of the outcome (Attrition_Flag) with respect to the treatment (Total_Relationship_Count), conditioned on various confounders (Customer_Age, Marital_Status, Education_Level, Income_Category, Months_on_book).
What It Means: This expression seeks to isolate the effect of changing the total relationship count on the likelihood of a customer churning, while holding constant other factors that could influence both the treatment and the outcome. It's a formal way of asking, "If we increase a customer's product holdings, how does their risk of leaving the bank change, assuming everything else about them remains the same?"
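In symbols, the expression described here can be written as below. This is a notational sketch of the description rather than a copy of the DoWhy printout; T, Y, and W are shorthand introduced here for Total_Relationship_Count, Attrition_Flag, and the listed confounders.

```latex
\frac{\partial}{\partial t}\,
\mathbb{E}\left[\, Y \mid T = t,\; W \,\right],
\qquad
W = (\text{Customer\_Age},\ \text{Marital\_Status},\ \text{Education\_Level},\ \text{Income\_Category},\ \text{Months\_on\_book})
```

Averaging this derivative over the distribution of W gives the population-level effect of one additional product.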
3. Estimand Assumption: Unconfoundedness
Assumption Detail: This assumption is crucial for causal inference. It states that if any unobserved factors (U) influence both the treatment (T) and the outcome (Y), then conditioning on them adds nothing beyond the observed confounders (W): P(Y | T, W, U) = P(Y | T, W).
What It Means: Essentially, this means that the model assumes all relevant factors that could affect both a customer's number of products and their decision to churn are accounted for in the analysis. This is a strong assumption, but it's necessary for attributing any observed effect to the treatment itself rather than to external confounding factors.
4. Realized Estimand
Model: The realized estimand describes the practical approach taken to estimate the causal effect, using a linear model that includes the treatment and confounders.
5. Estimate: Mean Value and P-Value
Mean Value: The mean value of -0.0353 indicates the estimated average change in the probability of churn for each additional product a customer holds with the bank. A negative value here means that more products are associated with lower churn risk.
P-Value: The extremely small p-value (8.35e-52) indicates that the estimate is statistically significant. In simpler terms, if there were no relationship between the number of products and churn, an estimate this far from zero would almost never arise from random variation alone. Significance addresses sampling noise only; reading the estimate causally still rests on the unconfoundedness assumption in step 3.
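The realized estimand in step 4 and the estimate in step 5 can be reproduced in miniature. The sketch below fits a linear model of the outcome on the treatment and a confounder with plain NumPy, on synthetic data in which the true effect is known by construction. Everything about the data is an assumption made for illustration (a single confounder called tenure, a continuous stand-in for the product count, a true effect of -0.04), and the helper names are hypothetical. The last function shows how the same step would be requested from DoWhy, given a CausalModel built elsewhere.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Synthetic data, invented for illustration (not the bank dataset).
tenure = rng.normal(size=n)                   # confounder W
products = 3 + tenure + rng.normal(size=n)    # treatment T, partly driven by W
TRUE_EFFECT = -0.04                           # change in churn probability per extra product
churn_prob = np.clip(0.5 + TRUE_EFFECT * products - 0.05 * tenure, 0, 1)
churn = (rng.random(n) < churn_prob).astype(float)  # binary outcome Y


def ols_coefficients(features, y):
    """Hypothetical helper: least-squares fit of y on an intercept plus features."""
    X = np.column_stack([np.ones(len(y)), *features])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Coefficient on the treatment without and with the confounder in the model.
naive_effect = ols_coefficients([products], churn)[1]
adjusted_effect = ols_coefficients([products, tenure], churn)[1]

print(f"naive: {naive_effect:.4f}, adjusted: {adjusted_effect:.4f}, true: {TRUE_EFFECT}")


def estimate_with_dowhy(model):
    """Same step via DoWhy, assuming `model` is a dowhy.CausalModel built elsewhere."""
    estimand = model.identify_effect(proceed_when_unidentifiable=True)
    return model.estimate_effect(
        estimand,
        method_name="backdoor.linear_regression",
        test_significance=True,
    )
```

In this setup the adjusted coefficient should land close to the true effect, while the naive one is pulled further from zero because tenure both raises the product count and lowers churn. The treatment coefficient from such a regression is what the mean value in step 5 reports for the bank data.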
Business Interpretation
ATE Interpretation: The estimated ATE of about -0.035 means that, on average, adding one more product to a customer's portfolio is associated with a decrease of roughly 3.5 percentage points in the probability of churn. This effect is statistically significant and suggests that, overall, increasing product engagement has a protective effect against churn.
Policy Strategies
Based on the negative ATE, a policy of cross-selling or bundling products tailored to customer needs and preferences could be effective. Specific recommendations might include:
Segment-Specific Engagement: Use Conditional Average Treatment Effect (CATE) insights to identify which customer segments benefit most from additional products and target these groups with personalized offers.
Customer Education: For segments where product engagement does not naturally reduce churn, focus on education and support to help customers derive more value from their products.
Feedback Loops: Implement mechanisms to gather feedback on customer satisfaction with their product portfolio, allowing for continuous refinement of cross-selling strategies.
In the next part, we will explore:
Doubly robust learning: how it works, how to fit a linear doubly robust learner using EconML, and how to interpret the results.
How to use Conditional Average Treatment Effect (CATE) insights to develop personalized offers that increase segment-specific engagement.