Customer retention analysis through predictive modeling
IDENTIFYING THE CHARACTERISTICS WHICH ARE MOST RESPONSIBLE FOR THE RETENTION OF CUSTOMERS
A comprehensive study was performed on data provided by a company that organizes school trips. The primary objective of the study was to identify the characteristics most responsible for customer retention. The secondary objective was to identify the model that best predicts the customers’ response.
Several modeling techniques were applied to the school trip data: linear regression, logistic regression, ridge regression, LASSO regression, CART, random forest, neural network and boosting. The relationship between the response variable and the different parameters was established from the findings of these models.
The findings of the study suggest that satisfaction and loyalty play an important role in extending a relationship with a customer. The logistic regression model performed best in predicting the customers’ response. In addition, linear models outperformed nonlinear models on this data. Our study showed that complex models do not always provide the best predictions.
The study examines the impact of different parameters on retention using various statistical modeling and machine learning methodologies. A comparison of the statistical modeling methods (linear, logistic, ridge and LASSO regression) shows that logistic regression provides the best performance. Analyses using the machine learning methods (CART, random forest, boosting and neural network) show that boosting predicts the response best among the machine learning methods.
In addition, comparing all methods together revealed that logistic regression, one of the simplest and oldest classification techniques, provides the greatest predictive power. This highlights that sophisticated, advanced algorithms do not always work better. Different predictive modelling techniques suit different kinds of data, and sound conclusions cannot be drawn without scrutinizing the data carefully. Logistic, ridge and LASSO regression are linear models, whereas random forest, CART, boosting and neural network are nonlinear models, and the overall results show that the linear models' performance is nearly equal to or better than that of the nonlinear models. However, the relative performance of the models will vary with the objective of the company, that is, whether it wants to focus on the top 20% or the top 50% of customers.
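The dependence on the targeting depth can be illustrated with a small sketch. It assumes a "capture rate" metric (the share of all retained customers that fall within the top fraction of customers ranked by predicted score), which is one common way to evaluate such models but is not necessarily the measure used in the study. The function name, the two score lists and the retention labels below are hypothetical, made-up values for illustration only; they are not the study's data or results.

```python
def capture_rate(scores, retained, top_fraction):
    """Share of all retained customers found in the top `top_fraction`
    of customers when ranked by predicted score (highest first)."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    if len(scores) != len(retained):
        raise ValueError("scores and retained must have the same length")
    total_retained = sum(retained)
    if total_retained == 0:
        raise ValueError("at least one customer must be retained")

    # Rank customers from highest to lowest predicted score.
    ranked = sorted(zip(scores, retained), key=lambda pair: pair[0], reverse=True)

    # Number of customers in the targeted group (at least one).
    cutoff = max(1, round(len(ranked) * top_fraction))

    captured = sum(label for _, label in ranked[:cutoff])
    return captured / total_retained


# Hypothetical data: 1 = customer retained, 0 = customer not retained.
retained = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

# Hypothetical predicted scores from two illustrative models.
model_a = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
model_b = [0.85, 0.80, 0.90, 0.75, 0.30, 0.70, 0.20, 0.10, 0.40, 0.05]

a_top20 = capture_rate(model_a, retained, 0.20)
b_top20 = capture_rate(model_b, retained, 0.20)
a_top50 = capture_rate(model_a, retained, 0.50)
b_top50 = capture_rate(model_b, retained, 0.50)
```

With these made-up scores, the first model captures more retained customers in the top 20% (0.4 versus 0.2), while the second captures more in the top 50% (0.8 versus 0.6), so which model is preferable depends on how deep into the ranked list the company intends to target.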