To use the MRMU Validation Suite in a project, import its functions:
```python
from validation_suite import (
    compare_dataframes,
    vif_check,
    psi_check,
    csi_check,
    backtesting_report,
    ValidationResult,
)
```
`compare_dataframes` compares two DataFrames (e.g., model owner output vs. a validator dry run) and returns a `ValidationResult`:
```python
from validation_suite import compare_dataframes

result_strict = compare_dataframes(
    df_reference=df_model_owner,
    df_challenger=df_dry_run,
    key_cols=["obligor_id"],
    numeric_tol=1e-6,
    label_reference="Model Owner v2.3",
    label_challenger="Validator Dry Run",
)

print(f"Status   : {result_strict.status}")
print(f"Warnings : {result_strict.warnings if result_strict.warnings else 'None'}")
print()

result_strict.summary_df.sort_values("max_abs_diff", ascending=False)
```
results in:
| column | max_abs_diff | mean_abs_diff | rows_exceeding_tol |
|---|---|---|---|
| pd_score | 0.007647 | 0.000059 | 8.000000 |
| index | 0.000000 | 0.000000 | 0.000000 |
| lgd | 0.000000 | 0.000000 | 0.000000 |
| ead | 0.000000 | 0.000000 | 0.000000 |
Columns with `rows_exceeding_tol > 0` require additional analysis. A small numerical difference (e.g., `max_abs_diff < 0.005`) may be acceptable if it can be explained by rounding differences between platforms.
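The per-column statistics in the table above can be illustrated with a small sketch. This is not the library's implementation, and the frame contents are made up; it only shows how `max_abs_diff`, `mean_abs_diff`, and `rows_exceeding_tol` relate to a key-aligned comparison:

```python
import pandas as pd

# Illustrative sketch (not compare_dataframes' implementation): align two
# frames on the key column, then compute absolute-difference statistics.
ref = pd.DataFrame({"obligor_id": [1, 2, 3], "pd_score": [0.10, 0.20, 0.30]})
chal = pd.DataFrame({"obligor_id": [1, 2, 3], "pd_score": [0.10, 0.21, 0.30]})

merged = ref.merge(chal, on="obligor_id", suffixes=("_ref", "_chal"))
diff = (merged["pd_score_ref"] - merged["pd_score_chal"]).abs()

numeric_tol = 1e-6
stats = {
    "max_abs_diff": diff.max(),
    "mean_abs_diff": diff.mean(),
    "rows_exceeding_tol": int((diff > numeric_tol).sum()),
}
```

Here only the second row differs (by 0.01), so one row exceeds the tolerance.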
`vif_check` computes the Variance Inflation Factor (VIF) for each feature. The standard SR 11-7 threshold is 10; values above 10 indicate problematic multicollinearity.
```python
import numpy as np
import pandas as pd

from validation_suite import vif_check

rng2 = np.random.default_rng(7)
n_obs = 1000
df_features_clean = pd.DataFrame(
    {
        "ltv": rng2.uniform(0.30, 0.95, n_obs),
        "dti": rng2.uniform(0.10, 0.60, n_obs),
        "credit_age_yrs": rng2.uniform(1, 30, n_obs),
        "utilization_rate": rng2.uniform(0, 1, n_obs),
        "num_delinquencies": rng2.poisson(lam=0.4, size=n_obs).astype(float),
    }
)

FEATURE_COLS = list(df_features_clean.columns)
result_vif_clean = vif_check(
    df_features_clean, feature_cols=FEATURE_COLS, threshold=10.0
)
```
Adding a feature that is nearly collinear with an existing one (here, `dti_annualized`, derived from `dti`) produces output like:
| # | Feature | VIF | Flag |
|---|---|---|---|
| 1 | dti | 20543658.327543 | HIGH |
| 5 | dti_annualized | 20543522.726991 | HIGH |
| 0 | ltv | 6.611564 | MODERATE |
| 2 | credit_age_yrs | 3.856893 | OK |
| 3 | utilization_rate | 3.500687 | OK |
| 4 | num_delinquencies | 1.398969 | OK |
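The flags above follow from the definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing feature j on the remaining features. A numpy-only sketch of that definition (illustrative, not `vif_check`'s implementation; the data here is made up):

```python
import numpy as np

def vif_from_scratch(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from an OLS regression of
    column j on all other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        vifs.append(1.0 / max(1.0 - r2, 1e-12))  # guard near-perfect fits
    return vifs

rng = np.random.default_rng(0)
base = rng.uniform(size=(500, 3))  # independent features -> VIFs near 1
# Append a near-duplicate of column 0: its VIF (and column 0's) explodes.
dup = np.column_stack([base, base[:, 0] * 12 + rng.normal(0, 1e-4, 500)])
```

With independent columns every VIF sits near 1; once the near-duplicate is added, the original column and its copy both blow past the threshold of 10, exactly as `dti` and `dti_annualized` do in the table.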
Population Stability Index (PSI) — detects distributional shift between a reference (development/prior) sample and a monitoring (current) sample.
SR 11-7 context: Monitors whether the input population the model is applied to has drifted from the population it was developed on. Elevated PSI (> 0.25) is a trigger for model recalibration review.
Returns a `ValidationResult`.
```python
from validation_suite import psi_check

PSI_COLUMNS = ["pd_score", "ltv", "dti"]
result_psi = psi_check(
    expected=df_dev,
    actual=df_monitor,
    columns=PSI_COLUMNS,
    bins=10,  # deciles
)

print(f"Overall status: {result_psi.status}")
print()
if result_psi.warnings:
    print("Warnings:")
    for w in result_psi.warnings:
        print(f"  ⚠ {w}")
print()

result_psi.summary_df.sort_values("psi_value", ascending=False)
```
results in:

Overall status: FAIL

Warnings:
  ⚠ pd_score: 7.3992

PSI Summary Table:
| # | Variable | PSI Value | Flag | # Bins Shifted | N Expected | N Actual |
|---|---|---|---|---|---|---|
| 0 | pd_score | 7.399166 | UNSTABLE | 10 | 5000 | 5000 |
| 2 | dti | 0.032043 | STABLE | 1 | 5000 | 5000 |
| 1 | ltv | 0.002046 | STABLE | 0 | 5000 | 5000 |
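The PSI values above come from the standard formula PSI = Σᵢ (aᵢ − eᵢ) · ln(aᵢ / eᵢ), summed over bins whose edges are quantiles of the expected sample. A numpy sketch of that formula (illustrative, not `psi_check`'s implementation; the data is synthetic):

```python
import numpy as np

def psi_single(expected, actual, bins=10):
    """PSI = sum((a_i - e_i) * ln(a_i / e_i)) over `bins` quantile bins.

    e_i / a_i are the proportions of the expected / actual samples falling
    in bin i; bin edges come from the expected (development) sample.
    """
    inner_edges = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]

    def proportions(x):
        idx = np.searchsorted(inner_edges, x, side="right")  # bin index 0..bins-1
        return np.bincount(idx, minlength=bins) / len(x)

    e = np.clip(proportions(expected), 1e-6, None)  # floor to avoid log(0)
    a = np.clip(proportions(actual), 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(42)
dev = rng.normal(0.0, 1.0, 5000)
psi_stable = psi_single(dev, rng.normal(0.0, 1.0, 5000))   # same population
psi_shifted = psi_single(dev, rng.normal(0.5, 1.0, 5000))  # 0.5-sd mean shift
```

With a fresh draw from the same population the PSI is near zero, while the mean-shifted sample lands well into warning territory.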
Characteristic Stability Index (CSI) — measures distributional shift of the model score within each segment of a categorical variable.
SR 11-7 context: Where PSI measures overall population shift, CSI pinpoints which segment is driving instability. Commonly applied to risk grades, product lines, or geographic segments in PD/LGD/EAD models.
Takes `score_col` and `segment_col` arguments and returns a `ValidationResult`.
```python
from validation_suite import csi_check

result_csi = csi_check(
    expected=df_dev_seg,
    actual=df_monitor_seg,
    score_col=SCORE_COL,
    segment_col=SEGMENT_COL,
    bins=10,
)

print(f"Overall status: {result_csi.status}")
print()
if result_csi.warnings:
    print("Warnings:")
    for w in result_csi.warnings:
        print(f"  ⚠ {w}")
print()

result_csi.summary_df.sort_values("csi_value", ascending=False)
```
results in:

Overall status: FAIL

Warnings:
  ⚠ BBB (pd_score): 0.8413
  ⚠ CCC (pd_score): 5.1397

CSI Summary Table:
| # | Segment | CSI Value | Flag | N Expected | N Actual | Segment % Expected | Segment % Actual |
|---|---|---|---|---|---|---|---|
| 2 | CCC | 5.139687 | UNSTABLE | 800 | 800 | 0.333333 | 0.333333 |
| 1 | BBB | 0.841326 | UNSTABLE | 800 | 800 | 0.333333 | 0.333333 |
| 0 | AAA | 0.015184 | STABLE | 800 | 800 | 0.333333 | 0.333333 |
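CSI is simply PSI applied to the score distribution within each segment, which is why a portfolio whose segment mix is unchanged (equal counts and shares above) can still flag individual grades. A self-contained sketch of that idea (illustrative; the segment labels, column names, and data below are made up, not taken from `csi_check`):

```python
import numpy as np
import pandas as pd

def psi(expected, actual, bins=10):
    # PSI of one numeric sample vs a reference, binned on reference quantiles.
    inner = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]
    e = np.bincount(np.searchsorted(inner, expected, side="right"), minlength=bins) / len(expected)
    a = np.bincount(np.searchsorted(inner, actual, side="right"), minlength=bins) / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

def csi_by_segment(df_exp, df_act, score_col, segment_col, bins=10):
    # CSI = PSI of `score_col`, computed separately inside each segment.
    return {
        seg: psi(
            df_exp.loc[df_exp[segment_col] == seg, score_col].to_numpy(),
            df_act.loc[df_act[segment_col] == seg, score_col].to_numpy(),
            bins,
        )
        for seg in df_exp[segment_col].unique()
    }

rng = np.random.default_rng(1)
dev = pd.DataFrame({
    "grade": ["AAA"] * 800 + ["CCC"] * 800,
    "pd_score": np.concatenate([rng.beta(1, 20, 800), rng.beta(2, 5, 800)]),
})
mon = pd.DataFrame({
    "grade": ["AAA"] * 800 + ["CCC"] * 800,
    "pd_score": np.concatenate([rng.beta(1, 20, 800), rng.beta(5, 5, 800)]),  # CCC drifted
})
csi = csi_by_segment(dev, mon, "pd_score", "grade")
```

Only the drifted segment's CSI is elevated; the stable segment stays near zero even though both segments keep the same share of the population.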
`backtesting_report` produces a comprehensive backtesting report for binary classification models (PD, fraud, prepayment), aligned with SR 11-7 performance requirements.
Discrimination: AUC-ROC, Gini, KS statistic.
Calibration: Brier score, Hosmer-Lemeshow test, binomial test.
Returns a `ValidationResult`.
```python
from validation_suite import backtesting_report

result_bt_good = backtesting_report(
    y_true=y_true,
    y_score=y_score_good,
    model_name="PD-RETAIL-V2.3",
    n_hl_groups=10,
    n_cal_buckets=10,
    alpha=0.05,
)

print(f"Status        : {result_bt_good.status}")
print(f"Traffic light : {result_bt_good.details['traffic_light']}")
print()
if result_bt_good.warnings:
    for w in result_bt_good.warnings:
        print(f"  ⚠ {w}")
else:
    print("  No warnings.")
print()

cols = [
    "model_name",
    "n_observations",
    "n_defaults",
    "auc_roc",
    "gini",
    "ks_statistic",
    "brier_score",
    "hl_p_value",
    "binomial_p_value",
    "traffic_light",
    "status",
]
result_bt_good.summary_df[cols]
```
results in:
Status : PASS
Traffic light: GREEN
| model_name | n_observations | n_defaults | auc_roc | gini | ks_statistic | brier_score | hl_p_value | binomial_p_value | traffic_light | status |
|---|---|---|---|---|---|---|---|---|---|---|
| PD-RETAIL-V2.3 | 3000 | 280 | 0.693480 | 0.386959 | 0.306828 | 0.081629 | 0.587496 | 0.317816 | GREEN | PASS |
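The discrimination and calibration figures in the table follow standard definitions: Gini = 2·AUC − 1, KS is the maximum gap between the score CDFs of defaulters and non-defaulters, and Brier is the mean squared error of the predicted probabilities. A numpy-only sketch on synthetic data (illustrative, not `backtesting_report`'s implementation; the AUC formula assumes no tied scores across classes):

```python
import numpy as np

def discrimination_metrics(y_true, y_score):
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]

    # AUC via the Mann-Whitney rank formulation (assumes untied scores).
    all_scores = np.concatenate([pos, neg])
    ranks = np.empty(len(all_scores))
    ranks[np.argsort(all_scores)] = np.arange(1, len(all_scores) + 1)
    auc = (ranks[: len(pos)].sum() - len(pos) * (len(pos) + 1) / 2) / (len(pos) * len(neg))

    # KS: max vertical gap between the two empirical score CDFs.
    grid = np.sort(all_scores)
    cdf_pos = np.searchsorted(np.sort(pos), grid, side="right") / len(pos)
    cdf_neg = np.searchsorted(np.sort(neg), grid, side="right") / len(neg)
    ks = float(np.abs(cdf_pos - cdf_neg).max())

    # Brier: mean squared error of the predicted probabilities.
    brier = float(np.mean((y_score - y_true) ** 2))
    return {"auc_roc": auc, "gini": 2 * auc - 1, "ks_statistic": ks, "brier_score": brier}

rng = np.random.default_rng(3)
y_true = rng.binomial(1, 0.1, 3000)
# Synthetic scores: defaulters receive systematically higher scores.
y_score = np.clip(0.08 + 0.10 * y_true + rng.normal(0, 0.04, 3000), 0.001, 0.999)
metrics = discrimination_metrics(y_true, y_score)
```

The Gini identity in particular is a useful sanity check when reviewing a report: a Gini that is not exactly 2·AUC − 1 points to an inconsistency in the metrics pipeline.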