Big Data in the Automotive Industry


After Sales Market Trends


Repair work and maintenance contribute the major part of an automotive company's annual profits, and the potential remains enormous: vehicle owners leave over $55 billion per year of maintenance unperformed or underperformed.

In the US as in the EU, the economic crisis had a negative impact on the car after-sales market. That market was already at a transition point between a promising potential for further development and a difficult reality, both because customers find it hard to perceive after-sales service as a product and because new cars are built to last longer and require maintenance less frequently.

As a result, the average age of vehicles on the road has grown 14% since 2008 (IHS Automotive research, “Cars On American Roads Is Older Than Ever,” December 2013), with 86.4% of vehicles out of warranty as of Q2 2011 (Experian Automotive’s Vehicles In Operation (VIO) database, 2011).

Research shows that these out-of-warranty owners are more likely to explore repair shop options than to return directly to the dealership. In general, those still visiting dealerships do so less frequently for maintenance.

The average vehicle lifespan continues to increase. Currently at a record 11.3 years, vehicle age is expected to grow to 11.5 years by 2018 (IHS). A very large percentage of these older vehicles will be out of warranty, challenging dealerships to compete to retain their business.

A 2008 study by the Car Care Council estimates that 80% of vehicles need service and parts.

It is estimated that one in ten drivers continues to drive while ignoring the dashboard “check engine” light, which is a major focal point for auto repair shops.

Unperformed maintenance represents a significant untapped market for auto franchises.


Authorized Repairers vs Independents

Auto owners tend to leave dealerships for service as cars age and go off-warranty. According to a study by DME automotive, customers seeking basic services (rather than major repairs) tend to defect over time.

These lost customers are estimated to cost dealers a large share of their revenue on older cars: dealers lose an average of 60-78% of revenue on three- to six-year-old cars and 82-92% of revenue on cars more than seven years old.

In Europe specifically, authorized repairers dominate the market for new cars and independents the market for older ones (BCG 2012).

While in the current economic climate customers are more likely to defer expensive repairs on their cars, they are more likely to repair them eventually than to replace them. In this context the Life Time Cycle (LTC) of each customer is critical, and churn especially, as it affects the length of the service period and hence future profit generation.



The Challenge




A world-leading automotive company faced a crucial challenge for its after-sales service business, and especially for its authorized repairers: how to increase loyalty, reduce churn, and better serve and market to its customers.

Objective: a review through data analysis. Before the economic crisis, we defined churn as the point at which a customer ceased visiting authorized repair shops for the annual service. For the majority of customers, churn was tied to the warranty (6 years in our case).

That definition was more or less correct, but because of the economic crisis it must be reconsidered: a significant number of customers stop maintenance service well before the sixth year, many alternate between authorized and independent repairers, and the majority prolong the period between two maintenance services (Figure 1).



The Solution





Business Intelligence is a vital component in strategic planning for companies that are aware of worldwide competition, ever-shorter production cycles and increasing customer requirements.

DIRECTING's mission is to design knowledge architecture plans as part of business engineering and to create Business Intelligence applications that provide decision makers with knowledge in real time, diffused to all management levels, thereby increasing teamwork, efficiency and profitability.


This has been accomplished through the initial concept of DATACTIF®, a Business Intelligence platform able to generate concept applications tailor-made for each enterprise, while bringing to each case 20 years of overall experience in learning processes, accumulating knowledge and finding solutions to problems in the industrial, financial and retail sectors.

DATACTIF® uses machine learning methods and algorithms such as neural networks, Kohonen SOM, fuzzy systems, genetic algorithms, Support Vector Machines, etc., and contains visualization methods that allow both a global view and an analytical view of the information.

Despite the sophistication of the methodology and the complexity of the algorithms, the DATACTIF® user interface requires no knowledge of statistics or computer science.





Align Knowledge Strategy to Business Objectives




The strategic knowledge objective is to increase maintenance visits to authorized dealers. Why only maintenance? Because for serious repairs or malfunctions (accident, mechanical problem, electrical fault, etc.) customers already prefer authorized repairers in 55-60% of cases and, on the other hand, such causes for service are unpredictable.

Target audience: predict, by the end of the current year, which customers will not come for maintenance next year. Why do we consider the year the most appropriate time period? Because a year is the period decision makers need to create, implement and adjust efficient business and marketing strategies.

Knowledge strategy and target audience:

1. Annual churn prediction for in-warranty customers (car age <4 years)

2. Annual churn prediction for customers near the end of the warranty (car age >4 and <7)

3. Evolution analysis of Customer Life Time Cycle associated with churn.

Data used: five years of historical data on maintenance service visits (2009 to 2013). The input variables are: frequency of visits, recency of last visit, total visits, distance between the last visit and the previous one, total kilometers and age of the car.




These variables, together with the status (churn or not churn) of each customer, form the attributes of an example in the training data set. The churn / not-churn status was the value to predict for the customer database.

We used DATACTIF's supervised learning (SVM) for churn prediction and unsupervised learning (neural networks and SOM) for clustering and rule extraction, using the same data set.
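As an illustration, the six input variables and the churn status could be derived from a raw visit log roughly as follows. This is a minimal sketch, not DATACTIF® code: the record layout (`Visit`), the function name `build_example` and the exact definitions (frequency as visits per year of car age, recency in days before the end of the observation period, "distance" between the last two visits measured in days) are assumptions made for the example, since the source names the variables but does not define them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Visit:
    day: date          # date of the maintenance visit
    odometer_km: int   # odometer reading at that visit


def build_example(visits, first_registration, period_end, came_next_year):
    """Turn one customer's visit history into a training example.

    All variable definitions below are illustrative assumptions.
    """
    visits = sorted(visits, key=lambda v: v.day)
    total_visits = len(visits)
    age_years = (period_end - first_registration).days / 365.25
    last = visits[-1]
    # Gap between the last visit and the one before it (0 if only one visit).
    gap_days = (last.day - visits[-2].day).days if total_visits > 1 else 0
    return {
        "frequency": total_visits / max(age_years, 1.0),  # visits per year
        "recency_days": (period_end - last.day).days,
        "total_visits": total_visits,
        "gap_last_two_days": gap_days,
        "total_km": last.odometer_km,
        "car_age_years": round(age_years, 1),
        "churn": not came_next_year,  # the status to predict
    }


# Hypothetical customer with three visits in the observation period.
history = [
    Visit(date(2011, 3, 10), 15000),
    Visit(date(2012, 4, 2), 31000),
    Visit(date(2013, 5, 20), 47000),
]
example = build_example(
    history,
    first_registration=date(2010, 3, 1),
    period_end=date(2013, 12, 31),
    came_next_year=False,
)
print(example)
```

Each such dictionary corresponds to one row of the training set: six inputs plus the status to predict.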




Churn Prediction Model



DATACTIF® uses both supervised and unsupervised learning methods in order to solve prediction problems.

Supervised Learning

For supervised learning there are two modules: a fuzzy system application and an SVM one. In our case we used support vector machines (SVMs) with the polynomial kernel, which represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables, allowing non-linear models to be learned.
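To make the kernel concrete: a polynomial kernel of degree d is commonly written K(x, y) = (gamma * <x, y> + c)^d. The sketch below, in plain Python, evaluates it and checks, for degree 2 with gamma = 1 and c = 1 on two-dimensional inputs, that it equals an ordinary dot product in an explicit feature space of polynomial terms. The degree, gamma and c used in the study are not stated in the source; the values and the two sample vectors here are assumptions for illustration only.

```python
import math


def poly_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel K(x, y) = (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def phi_degree2(x):
    """Explicit feature map matching degree=2, gamma=1, coef0=1 for 2-D input."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]


# Two hypothetical, already-scaled samples (say, frequency and recency).
a = [0.5, -1.0]
b = [2.0, 0.25]

k_implicit = poly_kernel(a, b)
k_explicit = sum(p * q for p, q in zip(phi_degree2(a), phi_degree2(b)))
print(k_implicit, k_explicit)  # the two values agree up to rounding
```

The kernel gives the SVM access to products and squares of the original variables without ever building the larger feature vector.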

We verified our prediction model against real data from maintenance visits that occurred between 1/1/2014 and 31/7/2014.

Annual churn prediction for:

cars <4 years: prediction accuracy = 67.0%

cars >4 and <7 years: average accuracy = 76.8%

And more specifically for:

5-year-old cars: prediction accuracy = 79.5%

6-year-old cars: prediction accuracy = 74.3%
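Assuming the usual definition, prediction accuracy is the share of customers whose predicted status matches the status observed in the verification period. A minimal sketch with made-up statuses (the lists below are hypothetical and unrelated to the figures reported above):

```python
def accuracy(predicted, observed):
    """Fraction of customers whose predicted status equals the observed one."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(predicted)


# Hypothetical statuses: True = churn, False = not churn.
predicted = [True, False, True, True, False, False, True, False]
observed = [True, False, False, True, False, True, True, False]

print(f"accuracy = {accuracy(predicted, observed):.1%}")
```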





Unsupervised Learning

To find hidden information in the data and, at the same time, gain a macroscopic view of the relationship between customers and maintenance, we decided to use neural networks and a self-organizing map. We used the same data and input variables as in the SVM prediction model.

A self-organizing map (SOM) is a type of artificial neural network that is trained using unsupervised learning to produce a two-dimensional, discretized representation of the input space of the training samples, called a map.
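The following is a minimal, pure-Python sketch of how a map of this kind can be trained. It is not the DATACTIF® implementation: the grid size, the learning-rate and neighbourhood schedules, the Gaussian neighbourhood function and the toy two-dimensional data are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position (row, col) of the unit whose weights are closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(samples, rows, cols, epochs=30, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM; returns weights[row][col] as lists."""
    rng = random.Random(seed)
    dim = len(samples[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)              # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 0.5  # neighbourhood radius shrinks
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                    w = weights[r][c]
                    for i in range(dim):
                        # Pull the unit toward the sample, scaled by proximity
                        # to the best matching unit on the grid.
                        w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Toy data: two well-separated groups of hypothetical, scaled customers.
data_rng = random.Random(1)
group_a = [[0.1 + data_rng.uniform(-0.05, 0.05),
            0.1 + data_rng.uniform(-0.05, 0.05)] for _ in range(20)]
group_b = [[0.9 + data_rng.uniform(-0.05, 0.05),
            0.9 + data_rng.uniform(-0.05, 0.05)] for _ in range(20)]

som = train_som(group_a + group_b, rows=4, cols=4)
print(best_matching_unit(som, [0.1, 0.1]), best_matching_unit(som, [0.9, 0.9]))
```

After training, each customer is assigned to the grid unit returned by `best_matching_unit`, which is what makes the map usable for clustering.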

To visualize the result we used the U-matrix (unified distance matrix) technique, which shows the distance between adjacent units in the SOM. It represents the map as a regular grid of neurons, as illustrated in Figure 3.
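A simplified version of this computation can be sketched as follows. It assigns to each unit the mean Euclidean distance to its directly adjacent grid neighbours; the full U-matrix also stores the individual unit-to-unit distances in intermediate cells. The weight values are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def u_matrix(weights):
    """Per-unit mean Euclidean distance to the 4-connected grid neighbours."""
    rows, cols = len(weights), len(weights[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result


# Hypothetical 2x3 map of 2-D weight vectors: the two left columns are close
# together and the right column is far away, so high values appear between them.
weights = [
    [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]],
    [[0.0, 0.1], [0.1, 0.1], [1.0, 0.9]],
]
um = u_matrix(weights)
for row in um:
    print(["%.2f" % v for v in row])
```

Low values mark units that sit inside a cluster; high values mark the boundaries between clusters.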


To interpret the map, and in particular the characteristics of each cluster, we used the component planes (component levels), which show the distribution of values across the map for one variable at a time (Figure 3).
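A per-variable view of this kind is simply one coordinate of every unit's weight vector laid out on the grid. A minimal sketch, with a hypothetical 2x2 map and hypothetical variable names:

```python
def component_plane(weights, index):
    """Grid holding the index-th weight component of every SOM unit."""
    return [[unit[index] for unit in row] for row in weights]


# Hypothetical 2x2 map whose units hold (frequency, recency) weights.
weights = [
    [[0.9, 0.1], [0.7, 0.3]],
    [[0.4, 0.6], [0.1, 0.9]],
]
VARIABLES = ["frequency", "recency"]

for i, name in enumerate(VARIABLES):
    print(name, component_plane(weights, i))
```

Reading the grids side by side shows which variables are high or low in each region of the map.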



Hyper Clusters, LTC and Churn



Based on the feature values extracted for each cluster and on an analysis of the similarity between clusters, we defined 4 Hyper Clusters (Figure 4).

Why Hyper Clusters? Apart from the fact that the Hyper Clusters emerged clearly by themselves (Figure 4), we opted for them because an enterprise needs customer groups that are as large as possible in order to design cost-efficient strategies (stock logistics, price discounts, promotional campaigns, etc.).


Hyper Cluster A: clusters 4, 5, 10, 15

Hyper Cluster B: clusters 1, 2, 6

Hyper Cluster C: clusters 16, 17, 21, 22

Hyper Cluster D: clusters 19, 20, 23, 24, 25
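In code, this grouping amounts to a lookup table from cluster number to Hyper Cluster. The sketch below uses exactly the assignments listed above; the function names and the sample list of customer cluster numbers are hypothetical, and cluster numbers that do not appear in the list are reported as `None`.

```python
from collections import Counter

HYPER_CLUSTERS = {
    "A": [4, 5, 10, 15],
    "B": [1, 2, 6],
    "C": [16, 17, 21, 22],
    "D": [19, 20, 23, 24, 25],
}

# Invert the table: cluster number -> Hyper Cluster letter.
CLUSTER_TO_HYPER = {
    cluster: name
    for name, clusters in HYPER_CLUSTERS.items()
    for cluster in clusters
}


def hyper_cluster_of(cluster_id):
    """Hyper Cluster letter for a cluster, or None if it belongs to none."""
    return CLUSTER_TO_HYPER.get(cluster_id)


def hyper_cluster_shares(cluster_ids):
    """Share of customers per Hyper Cluster, given each customer's cluster."""
    counts = Counter(hyper_cluster_of(c) for c in cluster_ids)
    return {name: n / len(cluster_ids) for name, n in counts.items()}


# Hypothetical cluster numbers for eight customers.
customers = [4, 19, 20, 16, 3, 25, 1, 23]
print(hyper_cluster_shares(customers))
```

Shares computed this way per car age, model or year are the kind of aggregate a Hyper Cluster comparison relies on.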





Regarding the relation between predicted churn and real churn during 2014, we observe a total correlation (Figure 5), which leads us to examine in detail the Hyper Clusters and real maintenance visits.

We examine in detail the relation between non-churning customers and the Hyper Clusters, because this reveals tendencies in future behavior.




Based on real visits of cars that received maintenance service (non-churning customers) between 1/2/2014 and 31/7/2014, we observe that Hyper Cluster D is the most important for cars under 6 years old, Hyper Cluster C for cars between 6 and 10 years old, and Hyper Cluster B for cars over 10 years old (Figure 6).




Considering the history of past years, we observe a gradual movement from Hyper Cluster A to Hyper Clusters C and D between 2009 and 2014. This movement allows us to predict that in 2015 Hyper Cluster D will be the only dominant one.

Hyper Clusters thus give us a macroscopic view of LTC evolution and of its relation to churn (Figure 7).





Hyper Clusters and Warranty

For cars in warranty, Hyper Cluster D dominates regardless of model (Figure 8), with the single exception of Model_8.

For cars out of warranty, Hyper Cluster D dominates for Model_6, as it is the newest of all. For the rest: for Models 8, 1, 5, 4 and 10 the dominant Hyper Clusters are B and C; for Models 2, 7 and 9, Hyper Cluster C; and for Model 3, Hyper Clusters B and D (Figure 9).

Created at: 06/12/2007 - 10:06