The Kolmogorov–Smirnov Statistic, Explained: Measuring Model Power in Credit Risk Modeling


These days, people are taking out more loans than ever. For anyone who wants to build their own house, home loans are available, and if you own a property, you can get a property loan. There are also agriculture loans, education loans, business loans, gold loans, and many more.

In addition to these, for buying items like televisions, refrigerators, furniture and mobile phones, we also have EMI options.

But does everyone get their loan application approved?

Banks don’t give loans to every person who applies; there is a process they follow to approve loans.

We know that machine learning and data science are now applied across industries, and banks also make use of them.

When a customer applies for a loan, banks need to know the likelihood that the customer will repay on time.

For this, banks use predictive models, mainly based on logistic regression or other machine learning methods.

We already know that by applying these methods, each applicant is assigned a probability.

This is a classification problem: we need to classify applicants as defaulters or non-defaulters.

Defaulters: Customers who fail to repay their loan (miss payments or stop paying altogether).

Non-defaulters: Customers who repay their loans on time.

We already discussed accuracy and ROC-AUC to evaluate the classification models.

In this article, we are going to discuss the Kolmogorov–Smirnov Statistic (KS Statistic), which is used to evaluate classification models, especially in the banking sector.

To understand the KS Statistic, we will use the German Credit Dataset.

This dataset contains information about 1000 loan applicants, described by 20 features such as account status, loan duration, credit amount, employment, housing, and personal status.

The target variable indicates whether the applicant is a non-defaulter (represented by 1) or a defaulter (represented by 2).

You can find more information about the dataset here.

Now we need to build a classification model to classify the applicants. Since it is a binary classification problem, we will apply logistic regression to this dataset.

Code:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load dataset
file_path = "C:/german.data"
data = pd.read_csv(file_path, sep=" ", header=None)

# Rename columns
columns = [f"col_{i}" for i in range(1, 21)] + ["target"]
data.columns = columns

# Features and target
X = pd.get_dummies(data.drop(columns=["target"]), drop_first=True)
y = data["target"]   # keep as 1 and 2

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Train logistic regression
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Predicted probabilities
y_pred_proba = model.predict_proba(X_test)

# Results DataFrame
results = pd.DataFrame({
    "Actual": y_test.values,
    "Pred_Prob_Class2": y_pred_proba[:, 1]
})

print(results.head())

We already know that when we apply logistic regression, we get predicted probabilities.

[Image by Author: first rows of the results DataFrame printed above]

Now, to understand how the KS Statistic is calculated, let’s consider a sample of 10 points from this output.

[Image by Author: sample of 10 applicants with actual labels (1 = non-defaulter, 2 = defaulter) and predicted default probabilities]

Here the highest predicted probability is 0.92, which means there is a 92% chance that this applicant will default.

Now let’s proceed with KS Statistic calculation.

First, we will sort the applicants by their predicted probabilities in descending order, so that higher-risk applicants appear at the top.

Pred_Prob   Actual
0.92        2
0.63        2
0.51        2
0.39        1
0.29        2
0.20        1
0.13        1
0.10        1
0.05        1
0.01        1

(1 = non-defaulter, 2 = defaulter; Image by Author)

We already know that ‘1’ represents non-defaulters and ‘2’ represents defaulters.

In the next step, we calculate the cumulative count of non-defaulters and defaulters at each row.

[Image by Author: table with cumulative counts of defaulters and non-defaulters at each row]

In the next step, we convert the cumulative counts of defaulters and non-defaulters into cumulative rates.

We divide the cumulative defaulters by the total number of defaulters, and the cumulative non-defaulters by the total number of non-defaulters.

[Image by Author: table with cumulative defaulter and non-defaulter rates at each row]

Next, we calculate the absolute difference between the cumulative defaulter rate and cumulative non-defaulter rate.

Pred_Prob   Actual   Cum_Def_Rate   Cum_NonDef_Rate   |Difference|
0.92        2        0.25           0.00              0.25
0.63        2        0.50           0.00              0.50
0.51        2        0.75           0.00              0.75
0.39        1        0.75           0.17              0.58
0.29        2        1.00           0.17              0.83
0.20        1        1.00           0.33              0.67
0.13        1        1.00           0.50              0.50
0.10        1        1.00           0.67              0.33
0.05        1        1.00           0.83              0.17
0.01        1        1.00           1.00              0.00

(Image by Author)

The maximum difference between the cumulative defaulter rate and the cumulative non-defaulter rate is 0.83, which is the KS Statistic for this sample.

The KS Statistic of 0.83 occurs at a predicted probability of 0.29.

This means that at this point in the ranking, the gap between the share of defaulters captured and the share of non-defaulters captured is at its largest: 83 percentage points.
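To make the hand calculation concrete, here is a minimal pandas sketch that reproduces it on the 10-point sample above:

import pandas as pd

# The 10-point sample from above, already sorted by predicted probability
sample = pd.DataFrame({
    "prob":   [0.92, 0.63, 0.51, 0.39, 0.29, 0.20, 0.13, 0.10, 0.05, 0.01],
    "actual": [2, 2, 2, 1, 2, 1, 1, 1, 1, 1],   # 2 = defaulter, 1 = non-defaulter
})

# Cumulative rate within each group
cum_def_rate = (sample["actual"] == 2).cumsum() / (sample["actual"] == 2).sum()
cum_nondef_rate = (sample["actual"] == 1).cumsum() / (sample["actual"] == 1).sum()

# KS = maximum absolute gap between the two cumulative rates
diff = (cum_def_rate - cum_nondef_rate).abs()
print(f"KS = {diff.max():.2f} at probability {sample.loc[diff.idxmax(), 'prob']}")
# KS = 0.83 at probability 0.29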


Here, we can observe that:

Cumulative Defaulter Rate = True Positive Rate (how many actual defaulters we have captured so far).

Cumulative Non-Defaulter Rate = False Positive Rate (how many non-defaulters are incorrectly captured as defaulters).

But as we haven’t fixed any threshold here, how can we get True Positive and False Positive rates?

Let’s see why the cumulative rates are equal to the TPR and FPR.

First, we consider every probability as a threshold and calculate TPR and FPR.

\[
\begin{aligned}
\textbf{At threshold 0.92:} &\\
TP &= 1,\quad FN = 3,\quad FP = 0,\quad TN = 6\\
TPR &= \tfrac{1}{4} = 0.25\\
FPR &= \tfrac{0}{6} = 0\\
\Rightarrow (\mathrm{FPR},\,\mathrm{TPR}) &= (0,\,0.25)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.63:} &\\
TP &= 2,\quad FN = 2,\quad FP = 0,\quad TN = 6\\
TPR &= \tfrac{2}{4} = 0.50\\
FPR &= \tfrac{0}{6} = 0\\
\Rightarrow (\mathrm{FPR},\,\mathrm{TPR}) &= (0,\,0.50)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.51:} &\\
TP &= 3,\quad FN = 1,\quad FP = 0,\quad TN = 6\\
TPR &= \tfrac{3}{4} = 0.75\\
FPR &= \tfrac{0}{6} = 0\\
\Rightarrow (\mathrm{FPR},\,\mathrm{TPR}) &= (0,\,0.75)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.39:} &\\
TP &= 3,\quad FN = 1,\quad FP = 1,\quad TN = 5\\
TPR &= \tfrac{3}{4} = 0.75\\
FPR &= \tfrac{1}{6} \approx 0.17\\
\Rightarrow (\mathrm{FPR},\,\mathrm{TPR}) &= (0.17,\,0.75)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.29:} &\\
TP &= 4,\quad FN = 0,\quad FP = 1,\quad TN = 5\\
TPR &= \tfrac{4}{4} = 1.00\\
FPR &= \tfrac{1}{6} \approx 0.17\\
\Rightarrow (\mathrm{FPR},\,\mathrm{TPR}) &= (0.17,\,1.00)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.20:} &\\
TP &= 4,\quad FN = 0,\quad FP = 2,\quad TN = 4\\
TPR &= \tfrac{4}{4} = 1.00\\
FPR &= \tfrac{2}{6} \approx 0.33\\
\Rightarrow (\mathrm{FPR},\,\mathrm{TPR}) &= (0.33,\,1.00)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.13:} &\\
TP &= 4,\quad FN = 0,\quad FP = 3,\quad TN = 3\\
TPR &= \tfrac{4}{4} = 1.00\\
FPR &= \tfrac{3}{6} = 0.50\\
\Rightarrow (\mathrm{FPR},\,\mathrm{TPR}) &= (0.50,\,1.00)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.10:} &\\
TP &= 4,\quad FN = 0,\quad FP = 4,\quad TN = 2\\
TPR &= \tfrac{4}{4} = 1.00\\
FPR &= \tfrac{4}{6} \approx 0.67\\
\Rightarrow (\mathrm{FPR},\,\mathrm{TPR}) &= (0.67,\,1.00)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.05:} &\\
TP &= 4,\quad FN = 0,\quad FP = 5,\quad TN = 1\\
TPR &= \tfrac{4}{4} = 1.00\\
FPR &= \tfrac{5}{6} \approx 0.83\\
\Rightarrow (\mathrm{FPR},\,\mathrm{TPR}) &= (0.83,\,1.00)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.01:} &\\
TP &= 4,\quad FN = 0,\quad FP = 6,\quad TN = 0\\
TPR &= \tfrac{4}{4} = 1.00\\
FPR &= \tfrac{6}{6} = 1.00\\
\Rightarrow (\mathrm{FPR},\,\mathrm{TPR}) &= (1.00,\,1.00)
\end{aligned}
\]
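These (FPR, TPR) pairs can be verified with a short loop that sweeps each probability in the sample as a threshold, reusing the sample DataFrame from the earlier sketch:

# Sweep each predicted probability as a threshold and report (FPR, TPR)
for t in sample["prob"]:
    flagged = sample["prob"] >= t                      # predicted defaulters
    tp = (flagged & (sample["actual"] == 2)).sum()     # true positives
    fp = (flagged & (sample["actual"] == 1)).sum()     # false positives
    tpr = tp / (sample["actual"] == 2).sum()
    fpr = fp / (sample["actual"] == 1).sum()
    print(f"threshold {t:.2f}: (FPR, TPR) = ({fpr:.2f}, {tpr:.2f})")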


From the above calculations, we can see that the cumulative defaulter rate corresponds to the True Positive Rate (TPR), and the cumulative non-defaulter rate corresponds to the False Positive Rate (FPR).

When calculating the cumulative defaulter rate and cumulative non-defaulter rate, each row acts as a threshold, and the rates are calculated up to that row.

Here we can observe that KS Statistic = max |TPR - FPR|, taken over all thresholds.
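Because of this equivalence, the KS Statistic can be read straight off the output of scikit-learn's roc_curve. A minimal sketch, reusing y_test and y_pred_proba from the earlier block:

import numpy as np
from sklearn.metrics import roc_curve

# pos_label=2 tells roc_curve that class 2 (defaulter) is the positive class
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba[:, 1], pos_label=2)

# KS is the maximum vertical gap between the TPR and FPR curves
ks = np.abs(tpr - fpr).max()
print(f"KS Statistic (via roc_curve) = {ks:.3f}")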


Now let’s calculate the KS Statistic for the full dataset.

Code:

import matplotlib.pyplot as plt

# Score the full dataset and pair predicted probabilities with actual labels
y_pred_proba_full = model.predict_proba(X)[:, 1]   # P(class 2 = defaulter)
results = pd.DataFrame({
    "Actual": y.values,
    "Pred_Prob_Class2": y_pred_proba_full
})

# Mark defaulters (2) and non-defaulters (1)
results["is_defaulter"] = (results["Actual"] == 2).astype(int)
results["is_nondefaulter"] = 1 - results["is_defaulter"]

# Sort by predicted probability
results = results.sort_values("Pred_Prob_Class2", ascending=False).reset_index(drop=True)

# Totals
total_defaulters = results["is_defaulter"].sum()
total_nondefaulters = results["is_nondefaulter"].sum()

# Cumulative counts and rates
results["cum_defaulters"] = results["is_defaulter"].cumsum()
results["cum_nondefaulters"] = results["is_nondefaulter"].cumsum()
results["cum_def_rate"] = results["cum_defaulters"] / total_defaulters
results["cum_nondef_rate"] = results["cum_nondefaulters"] / total_nondefaulters

# KS statistic
results["KS"] = (results["cum_def_rate"] - results["cum_nondef_rate"]).abs()
ks_value = results["KS"].max()
ks_index = results["KS"].idxmax()

print(f"KS Statistic = {ks_value:.3f} at probability {results.loc[ks_index, 'Pred_Prob_Class2']:.4f}")

# Plot KS curve
plt.figure(figsize=(8,6))
plt.plot(results.index, results["cum_def_rate"], label="Cumulative Defaulter Rate (TPR)", color="red")
plt.plot(results.index, results["cum_nondef_rate"], label="Cumulative Non-Defaulter Rate (FPR)", color="blue")

# Highlight KS point
plt.vlines(x=ks_index,
           ymin=results.loc[ks_index, "cum_nondef_rate"],
           ymax=results.loc[ks_index, "cum_def_rate"],
           colors="green", linestyles="--", label=f"KS = {ks_value:.3f}")

plt.xlabel("Applicants (sorted by predicted probability)")
plt.ylabel("Cumulative Rate")
plt.title("Kolmogorov–Smirnov (KS) Curve")
plt.legend(loc="lower right")
plt.grid(True)
plt.show()

Plot:

[Image by Author: KS curve showing the cumulative defaulter rate (TPR) and cumulative non-defaulter rate (FPR) against applicants sorted by predicted probability, with the maximum gap marked]

The maximum gap is 0.530, at a probability of 0.2928.


Now that we understand how to calculate the KS Statistic, let’s discuss its significance.

Here we built a classification model and evaluated it using the KS Statistic, but we also have other classification metrics like accuracy, ROC-AUC, etc.

We already know that accuracy is specific to a single threshold and changes as the threshold changes.

ROC-AUC gives us a single number that reflects the overall ranking ability of the model.

But why is the KS Statistic used in banks?


The KS statistic gives a single number, which represents the maximum gap between the cumulative distributions of defaulters and non-defaulters.
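This is exactly the statistic that a two-sample Kolmogorov–Smirnov test computes between the score distributions of the two groups. A sketch using scipy's ks_2samp, reusing the results DataFrame from the full-dataset code; the statistic should match the KS value computed above (up to ties):

from scipy.stats import ks_2samp

# Split the predicted probabilities by actual outcome
def_scores = results.loc[results["Actual"] == 2, "Pred_Prob_Class2"]
nondef_scores = results.loc[results["Actual"] == 1, "Pred_Prob_Class2"]

# The two-sample KS statistic is the maximum gap between the two empirical CDFs
ks_stat, p_value = ks_2samp(def_scores, nondef_scores)
print(f"Two-sample KS = {ks_stat:.3f} (p-value = {p_value:.2e})")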

Let’s go back to our sample data.

We got a KS Statistic of 0.83 at a probability of 0.29.

We already discussed that each row acts as a threshold.

So, what happened at 0.29?

A threshold of 0.29 means that applicants with predicted probabilities greater than or equal to 0.29 are flagged as defaulters.

At 0.29, the top 5 rows are flagged as defaulters. Among those five, four are actual defaulters and one is a non-defaulter incorrectly predicted as a defaulter.

Here True Positives = 4 and False Positives = 1.

The remaining 5 rows will be predicted as non-defaulters.

At this point, the model has captured all four defaulters, with one non-defaulter incorrectly flagged as a defaulter.

Here TPR is maxed out at 1 and FPR is 0.17.

So, KS Statistic = 1 - 0.17 = 0.83.

If we go further and calculate for the other probabilities as we did earlier, we can observe that the TPR does not change while the FPR increases, which means more non-defaulters are flagged as defaulters.

This reduces the gap between two groups.

Here we can say that at 0.29, the model denied all defaulters and 17% of non-defaulters (according to the sample data), and approved the remaining 83% of non-defaulters.


Do banks decide the threshold based on the KS Statistic?

While the KS Statistic shows the maximum gap between the two groups, banks do not decide the threshold based on this statistic.

The KS Statistic is used to validate the model's strength, while the actual threshold is decided by considering risk, profitability, and regulatory guidelines.

KS is usually quoted in percentage points, so our full-dataset value of 0.530 corresponds to a KS of 53.

If the KS is below 20, the model is considered weak.
If it is between 20 and 40, it is considered acceptable.
If the KS is in the range of 50-70, it is considered a good model.
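As a quick illustration, here is a hypothetical helper (ks_band is our own illustrative function, not a library API) that encodes these rule-of-thumb bands; note that the 40-50 range falls between the bands quoted above:

def ks_band(ks: float) -> str:
    """Map a KS value on the 0-1 scale to the rule-of-thumb bands above."""
    ks_pct = ks * 100
    if ks_pct < 20:
        return "weak"
    elif ks_pct <= 40:
        return "acceptable"
    elif 50 <= ks_pct <= 70:
        return "good"
    return "outside the quoted bands"

print(ks_band(0.530))   # the full-dataset KS from above -> "good"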


Dataset

The dataset used in this blog is the German Credit dataset, which is publicly available on the UCI Machine Learning Repository. It is provided under the Creative Commons Attribution 4.0 International (CC BY 4.0) License. This means it can be freely used and shared with proper attribution.


I hope this blog post has given you a basic understanding of the Kolmogorov–Smirnov statistic. If you enjoyed reading, consider sharing it with your network, and feel free to share your thoughts.

If you haven’t read my blog on ROC–AUC yet, you can check it out here.

Thanks for reading!
