SteppeDNA — Multi-Gene HR Variant Classifier

Per-Gene ROC-AUC Performance

Gene	ROC-AUC
BRCA2	0.994
RAD51D	0.824
RAD51C	0.785
BRCA1	0.747
PALB2	0.605

Overall AUC (0.985) is weighted by gene representation. BRCA2 comprises 52% of the test set. Evaluated on a held-out 20% stratified test split.

Non-BRCA2 temporal AUCs (0.51–0.61) indicate limited generalization over time for data-scarce genes.
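A minimal sketch of why a pooled AUC can sit well above most per-gene AUCs when one gene dominates the test set. It uses a rank-based (Mann-Whitney) AUC in plain Python; the function names and toy records are illustrative assumptions, not SteppeDNA's evaluation code or data.

```python
from collections import defaultdict

def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def per_gene_auc(records):
    """records: iterable of (gene, label, score) tuples."""
    by_gene = defaultdict(list)
    for gene, label, score in records:
        by_gene[gene].append((label, score))
    return {gene: roc_auc(*zip(*rows)) for gene, rows in by_gene.items()}

# Toy data (made up): BRCA2 is over-represented and perfectly separated.
records = [
    ("BRCA2", 1, 0.95), ("BRCA2", 1, 0.90), ("BRCA2", 1, 0.85),
    ("BRCA2", 0, 0.10), ("BRCA2", 0, 0.05), ("BRCA2", 0, 0.20),
    ("PALB2", 1, 0.40), ("PALB2", 1, 0.70),
    ("PALB2", 0, 0.60), ("PALB2", 0, 0.30),
]
gene_aucs = per_gene_auc(records)
pooled_auc = roc_auc([r[1] for r in records], [r[2] for r in records])
print(gene_aucs, pooled_auc)
```

In this toy set the pooled value lands much closer to the BRCA2 result than to the PALB2 result, which is the effect the note about gene representation describes.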

Manual Entry

cDNA Position (1 - 10257)

Reference AA

Alternate AA

Mutation (optional)
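The manual-entry fields can be checked before scoring. The sketch below is one possible validation, assuming one-letter amino acid codes and coding-sequence numbering that starts at the A of the ATG start codon; `validate_entry` and its return format are hypothetical, and only the 1 to 10257 range comes from the form label.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard one-letter codes
MAX_CDNA_POS = 10257  # upper bound shown in the form label

def validate_entry(cdna_pos, ref_aa, alt_aa):
    """Validate a manual missense entry and derive the codon number (hypothetical helper)."""
    if not 1 <= cdna_pos <= MAX_CDNA_POS:
        raise ValueError(f"cDNA position must be in 1..{MAX_CDNA_POS}")
    ref_aa, alt_aa = ref_aa.upper(), alt_aa.upper()
    for aa in (ref_aa, alt_aa):
        if aa not in AMINO_ACIDS:
            raise ValueError(f"unknown amino acid code: {aa!r}")
    if ref_aa == alt_aa:
        raise ValueError("reference and alternate amino acids are identical")
    # Assumption: position 1 is the first base of codon 1.
    codon = (cdna_pos - 1) // 3 + 1
    return {"cdna_pos": cdna_pos, "codon": codon, "ref_aa": ref_aa, "alt_aa": alt_aa}

entry = validate_entry(181, "c", "g")
print(entry)
```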

Upload VCF

Drop your VCF file here

or click to browse · .vcf format
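For readers unfamiliar with the upload format, here is a minimal sketch of reading the fixed VCF columns (CHROM, POS, ID, REF, ALT) from tab-separated lines. `parse_vcf_lines` is a hypothetical helper, the sample records use made-up coordinates, and a real pipeline would also need annotation to map genomic positions to cDNA and amino acid changes.

```python
def parse_vcf_lines(lines):
    """Return one dict per ALT allele, skipping meta (##) and header (#) lines."""
    variants = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError(f"malformed VCF record: {line!r}")
        chrom, pos, _id, ref, alts = fields[:5]
        for alt in alts.split(","):  # multi-allelic sites list ALTs comma-separated
            variants.append({
                "chrom": chrom,
                "pos": int(pos),
                "ref": ref,
                "alt": alt,
                "is_snv": len(ref) == 1 and alt in {"A", "C", "G", "T"},
            })
    return variants

# Made-up example records, for illustration only.
sample = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "13\t32340301\t.\tT\tC,G\t.\tPASS\t.\n",
    "17\t43045712\t.\tCA\tC\t.\tPASS\t.\n",
]
variants = parse_vcf_lines(sample)
print(variants)
```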

Research Use Only — not for clinical diagnosis | Germline variants only

Pathogenic

Pathogenicity Probability 98%

Prediction based on 120 engineered features drawn from sources including EVE, MAVE, PhyloP, ESM-2, SpliceAI, AlphaFold 3D structure, and gnomAD frequencies.

Per-Gene SOTA Comparison

Gene	SteppeDNA	Best SOTA	Winner
BRCA2	0.994	0.949 (BayesDel)	SteppeDNA
RAD51D	0.824	0.461 (REVEL)	SteppeDNA
RAD51C	0.785	0.703 (CADD)	SteppeDNA
BRCA1	0.747	0.646 (BayesDel)	SteppeDNA
PALB2	0.605	0.732 (REVEL)	REVEL

* All tools were evaluated on SteppeDNA’s held-out test set, which gives SteppeDNA a methodological advantage. REVEL, BayesDel, and CADD scores were available for only 72–73% of variants. Independent benchmark AUC: 0.750–0.801 (no training overlap).

Per-gene SOTA AUCs evaluated on SteppeDNA v5.4 test set. Scores sourced from dbNSFP; tools that could not score a variant were excluded from that gene's comparison.

AlphaMissense Label Leakage Discovery

AlphaMissense (Cheng et al. 2023) was partially trained on ClinVar labels, creating indirect label circularity. Ablation testing revealed that removing AlphaMissense features improved AUC for 3 of 5 genes. AlphaMissense was removed in v5.4.

AUC change after removing AlphaMissense (without, v5.4, vs. with AlphaMissense)

Gene	ΔAUC
BRCA1	+0.020
BRCA2	−0.000
PALB2	+0.015
RAD51C	+0.018
RAD51D	−0.007

3 of 5 genes improved after removing AlphaMissense, which is consistent with indirect ClinVar label leakage. Overall AUC: 0.9846 → 0.9853 (no loss).

Ablation study conducted with XGBoost-only on the pre-v5.4 model configuration. Per-gene AUC values differ from the final v5.4 XGBoost+MLP ensemble.

Population Equity Analysis

Model Performance by Ancestry

Proxy assignment by highest gnomAD sub-population AF — not self-reported ancestry. 84% of variants had no assignable population. Small sample sizes limit interpretation.
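The proxy rule can be sketched as follows: a variant is assigned to the gnomAD sub-population with the highest allele frequency and stays unassigned when no sub-population has a non-zero frequency. The function name, the dict layout, and the tie-breaking behavior (first key wins) are assumptions for illustration.

```python
def assign_proxy_population(sub_afs):
    """sub_afs: dict of population code -> allele frequency (None if missing)."""
    observed = {pop: af for pop, af in sub_afs.items() if af is not None and af > 0}
    if not observed:
        return None  # no assignable population
    # max() returns the first key on ties; a real pipeline may handle ties differently.
    return max(observed, key=observed.get)

assigned = assign_proxy_population({"AFR": 0.0, "NFE": 1e-4, "EAS": 3e-4, "AMR": None})
unassigned = assign_proxy_population({"AFR": 0.0, "NFE": None, "EAS": 0.0, "AMR": None})
print(assigned, unassigned)
```

Because most rare variants are absent from every sub-population, a rule like this leaves many variants unassigned, which matches the large unassigned share reported above.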

Population (proxy)	n	Performance
EAS	85	0.995
AFR	144	0.991
NFE	282	0.989
AMR	100	0.974

VUS Resolution Capacity by Gene

Gene	ROC-AUC
BRCA2	0.994
RAD51D	0.824
RAD51C	0.785
BRCA1	0.747
PALB2	0.605

Legend: High confidence · Moderate · Low (interpret with caution)

How SteppeDNA Addresses Population Bias

90.8% frequency-independent architecture — 109 of 120 prediction features use universal biological signals (protein structure, evolutionary conservation, protein language models) that work equally for any population
Population-calibrated ACMG rules — PM2 evidence is automatically withheld for Kazakh/Central Asian patients (gnomAD has ~0% Central Asian representation; “absent from controls” means “never checked”). BA1/BS1 benign thresholds are relaxed 2× to prevent premature benign calls from sparse data
Founder mutation integration (PS4) — 7 known Kazakh founder mutations from published cancer genetics studies are recognized as formal ACMG strong pathogenic evidence, not just annotations
Honest uncertainty — confidence intervals are automatically widened (×1.5) for underrepresented populations, preventing overconfident misclassification
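The frequency-related rules in this list can be sketched in a few lines. Everything numeric below except the 2× and ×1.5 factors is an assumption: the 5% BA1 cutoff is the general ACMG/AMP stand-alone value, the BS1 cutoff and the population codes are hypothetical, and "relaxed 2×" is read here as requiring twice the allele frequency before benign evidence is applied.

```python
BA1_AF = 0.05  # general ACMG/AMP stand-alone benign cutoff (5%)
BS1_AF = 0.01  # hypothetical; real BS1 cutoffs are gene- and disease-specific
UNDERREPRESENTED = {"KAZ", "CAS"}  # hypothetical population codes

def acmg_frequency_evidence(af, population):
    """Return frequency-based ACMG codes for one variant (illustrative only)."""
    under = population in UNDERREPRESENTED
    scale = 2.0 if under else 1.0  # assumed reading of "relaxed 2x"
    if af is not None and af >= BA1_AF * scale:
        return ["BA1"]
    if af is not None and af >= BS1_AF * scale:
        return ["BS1"]
    if (af is None or af == 0.0) and not under:
        return ["PM2"]  # withheld when the population is underrepresented
    return []

def widen_interval(p, lo, hi, population, factor=1.5):
    """Stretch a confidence interval around p for underrepresented populations."""
    if population not in UNDERREPRESENTED:
        return lo, hi
    return max(0.0, p - (p - lo) * factor), min(1.0, p + (hi - p) * factor)

print(acmg_frequency_evidence(None, "NFE"), acmg_frequency_evidence(None, "KAZ"))
print(widen_interval(0.9, 0.8, 0.95, "KAZ"))
```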

PM2 Evidence Disparity by Population

Population	Variants meeting PM2	Difference vs NFE
EAS	3,753	+245
AMR	3,713	+205
AFR	3,684	+176
NFE	3,508	—

PM2 (absent from population databases) is triggered more often for non-European populations due to lower representation in gnomAD. This can inflate pathogenicity evidence for underrepresented groups.

Kazakh Founder Mutations — Tool Comparison

Variant	Gene	SteppeDNA	REVEL	CADD	Freq (KZ)
p.Cys61Gly	BRCA1	0.991	0.948	24.5	0.004
p.Met1Thr	BRCA1	0.991	—	25.0	0.006
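5382insC	BRCA1	N/A (frameshift)	N/A (frameshift)	not retrieved	0.035
c.5278-2del	BRCA1	N/A (splice site)	N/A (splice site)	not retrieved	0.008
c.4035del	BRCA1	N/A (frameshift)	N/A (frameshift)	not retrieved	0.005
c.9409dup	BRCA2	N/A (frameshift)	N/A (frameshift)	not retrieved	0.012
c.9253del	BRCA2	N/A (frameshift)	N/A (frameshift)	not retrieved	0.008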

CADD scores are on PHRED scale (not directly comparable to probability scores). REVEL and BayesDel evaluate missense variants only; SteppeDNA requires amino acid input and cannot score frameshifts or splice variants. CADD can score all variant types but scores for indel/splice founders were not retrieved.

VCF Batch Results

#	Type	HGVS	cDNA	AA Change	Mutation	Prediction	Probability

Universal Base Models

An ensemble of a multi-gene deep neural network and XGBoost, trained on 19,000+ ClinVar and gnomAD HR missense variants, with isotonic probability calibration. Covers 5 genes (BRCA1, BRCA2, PALB2, RAD51C, RAD51D).

120 Features

BLOSUM62 substitution scores, ESM-2 protein language model embeddings, PhyloP conservation, MAVE functional assays, EVE evolutionary scores, SpliceAI, AlphaFold 3D structures, and gnomAD population frequencies.

SHAP Explanations

Every prediction shows which features pushed it toward pathogenic or benign, using SHAP values extracted from the XGBoost model.
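SHAP values are Shapley values applied to model features. The toy below computes exact Shapley values by enumerating feature subsets against a single baseline, which shows how each feature's push toward a higher or lower score is defined. It is not the algorithm used for tree models in practice (tree-specific methods are far faster), and the linear `toy_model` is a made-up stand-in.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature."""
    n = len(x)

    def value(subset):
        # Features outside the subset are replaced by their baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(z):
    # Hypothetical scoring function, not a SteppeDNA model.
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]

phi = shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

Positive entries push the score up and negative entries push it down, and the entries sum to the difference between the prediction and the baseline output.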

Model Card & Limitations

Intended Use

Research-grade variant classification for missense mutations in 5 Homologous Recombination DNA repair genes (BRCA1, BRCA2, PALB2, RAD51C, RAD51D). Intended as a decision-support tool, not a standalone diagnostic.

Training Data

19,223 variants: 18,738 from ClinVar + 485 gnomAD proxy-benign. Per gene: BRCA2 (10,085) | BRCA1 (5,432) | PALB2 (2,621) | RAD51C (675) | RAD51D (410). 60/20/20 split with gene × label stratification.
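A minimal sketch of a 60/20/20 split stratified on gene × label, in plain Python. It assumes the middle 20% serves as a calibration or validation set (the source does not name it), and `stratified_split` with its row format is a hypothetical helper rather than the project's actual code.

```python
import random
from collections import defaultdict

def stratified_split(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """rows: list of dicts with 'gene' and 'label'. Returns (train, calib, test)."""
    strata = defaultdict(list)
    for row in rows:
        strata[(row["gene"], row["label"])].append(row)
    rng = random.Random(seed)
    train, calib, test = [], [], []
    for key in sorted(strata):  # fixed order keeps the split reproducible
        group = strata[key]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_calib = round(len(group) * fractions[1])
        train += group[:n_train]
        calib += group[n_train:n_train + n_calib]
        test += group[n_train + n_calib:]  # remainder goes to the test set
    return train, calib, test

# Toy counts (made up) with one large and one small gene.
rows = []
for gene, label, n in [("BRCA2", 1, 100), ("BRCA2", 0, 50), ("PALB2", 1, 10), ("PALB2", 0, 10)]:
    rows += [{"gene": gene, "label": label} for _ in range(n)]
train, calib, test = stratified_split(rows)
print(len(train), len(calib), len(test))
```

Splitting inside each gene × label stratum keeps small genes such as RAD51D represented in every partition, at the cost of very small per-stratum test counts.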

Architecture

XGBoost + Multi-Layer Perceptron blended ensemble with gene-adaptive weights and isotonic calibration trained on a held-out calibration set. 120 engineered features from 8 data sources.
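The blending and calibration steps can be sketched as a per-gene weighted average followed by an isotonic (non-decreasing) mapping fitted with the pool-adjacent-violators algorithm. The weights in `GENE_WEIGHTS` are hypothetical, the helper names are illustrative, and this plain-Python fit does not pool tied scores the way a library implementation would.

```python
from bisect import bisect_left

GENE_WEIGHTS = {"BRCA2": 0.7, "PALB2": 0.4}  # hypothetical XGBoost weights per gene
DEFAULT_WEIGHT = 0.5

def blend(gene, p_xgb, p_mlp):
    """Gene-adaptive weighted average of the two model outputs."""
    w = GENE_WEIGHTS.get(gene, DEFAULT_WEIGHT)
    return w * p_xgb + (1 - w) * p_mlp

def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit; returns (upper score bounds, calibrated values)."""
    blocks = []  # each block: [sum of labels, count, highest score in block]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # Merge backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]

def calibrate(score, uppers, values):
    """Step-function lookup: first block whose upper bound is >= score."""
    i = bisect_left(uppers, score)
    return values[min(i, len(values) - 1)]

# Fit on a tiny made-up calibration set, then calibrate a blended score.
uppers, values = fit_isotonic([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
raw = blend("BRCA2", 0.9, 0.5)
print(raw, calibrate(raw, uppers, values))
```

Fitting the calibrator on data the base models never saw, as the description states, avoids learning a mapping from overconfident training-set scores.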

Performance

ROC-AUC: 0.985 · MCC: 0.928 · Balanced Accuracy: 96.5%. 10-fold CV: 0.9858 ± 0.0021. Outperforms REVEL (0.725), BayesDel (0.721), CADD (0.539) on same test set.*

* Evaluated on SteppeDNA test set. General-purpose tools not trained on same distribution.
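For reference, MCC and balanced accuracy are both derived from the binary confusion matrix. The sketch below uses the standard definitions on made-up labels; it assumes predictions have already been thresholded, and the source does not state which threshold SteppeDNA uses.

```python
import math

def confusion(labels, preds):
    """Return (tp, fp, tn, fn) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp, fp, tn, fn

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

# Made-up labels and thresholded predictions.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
preds = [1, 1, 0, 0, 0, 0, 1, 1]
counts = confusion(labels, preds)
print(counts, mcc(*counts), balanced_accuracy(*counts))
```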

Known Limitations

Non-BRCA2 genes have smaller training sets and lower per-gene AUC (PALB2: 0.605, RAD51C: 0.785)
Copy number variants (CNVs) are not classified. Splice-site, in-frame indel, and synonymous variants use rule-based classification (not the ML model). Truncating variants (nonsense, frameshift) use Tier 1 protocol
Model may underperform on novel variants with no population frequency data
Probabilities are clipped to [1%, 99%] — the model never claims absolute certainty
VCF batch analysis flags possible compound heterozygosity but does not perform phasing; a segregation study is required to determine cis/trans configuration
Training data predominantly from populations of European descent; however, 90.8% of features are population-independent. Population-calibrated ACMG rules and widened CIs mitigate frequency bias for Central Asian variants
ClinVar classifications evolve over time; training labels may not reflect the latest expert consensus
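Two of the limitations above, routing by variant type and probability clipping, can be sketched as follows. The type names and route labels are hypothetical identifiers chosen for illustration; only the grouping of variant types and the [1%, 99%] bounds come from the list.

```python
ML_TYPES = {"missense"}
RULE_BASED_TYPES = {"splice_site", "inframe_indel", "synonymous"}
TIER1_TYPES = {"nonsense", "frameshift"}

def route_variant(variant_type):
    """Pick the classification path for a variant type (labels are illustrative)."""
    if variant_type in ML_TYPES:
        return "ml_model"
    if variant_type in RULE_BASED_TYPES:
        return "rule_based"
    if variant_type in TIER1_TYPES:
        return "tier1_protocol"
    return "unsupported"  # for example, copy number variants

def clip_probability(p, lo=0.01, hi=0.99):
    """Keep reported probabilities inside [1%, 99%]."""
    return min(hi, max(lo, p))

print(route_variant("missense"), route_variant("frameshift"), route_variant("cnv"))
print(clip_probability(0.9999), clip_probability(0.0))
```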

Ethical Considerations

Not validated for clinical diagnostic use — all predictions require expert review
Training data is primarily from populations of European descent; population-calibrated ACMG rules and widened confidence intervals are applied for underrepresented ancestries
ACMG evidence codes are computational approximations, not expert classifications
Should not be used as the sole basis for clinical decisions about patient care
