Prediction based on AlphaMissense, MAVE, PhyloP, ESM-2, SpliceAI, AlphaFold 3D structure, and 103 engineered features.
SteppeDNA
Multi-Gene HR Variant Classifier • 103 Features from 5 Databases • ESM-2 Protein Language Model Embeddings • SHAP Explanations
The overall AUC (0.978) is weighted by gene representation; BRCA2 makes up 52% of the test set. Evaluated on a held-out, stratified 20% test split.
Non-BRCA2 temporal AUCs (0.51–0.61) indicate limited generalization over time for data-scarce genes.
| Gene | SteppeDNA AUC | Best competing tool AUC (tool) | Winner |
|---|---|---|---|
| BRCA2 | 0.983 | 0.949 (BayesDel) | SteppeDNA |
| RAD51D | 0.804 | 0.461 (REVEL) | SteppeDNA |
| RAD51C | 0.743 | 0.703 (CADD) | SteppeDNA |
| BRCA1 | 0.706 | 0.646 (BayesDel) | SteppeDNA |
| PALB2 | 0.641 | 0.732 (REVEL) | REVEL |

All scores were evaluated on SteppeDNA's own test set, which gives SteppeDNA a methodological advantage. Independent benchmark AUC: 0.719–0.793. REVEL, BayesDel, and CADD are general-purpose tools that were not trained on this distribution.
| # | Type | HGVS | cDNA | AA Change | Mutation | Prediction | Probability |
|---|---|---|---|---|---|---|---|
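Each row of the results table corresponds to one variant read from a VCF file. A minimal sketch of that first step is shown below, assuming a plain, uncompressed, tab-separated VCF whose first five columns are CHROM, POS, ID, REF, and ALT as in the VCF specification. The names `VcfRecord`, `read_vcf`, and `is_snv` are hypothetical and not SteppeDNA's code; mapping each record to HGVS, cDNA, and amino-acid notation needs a transcript-aware annotation step that is not shown.

```python
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class VcfRecord:
    chrom: str
    pos: int  # 1-based position from the POS column
    ref: str
    alt: str  # a single ALT allele (multi-allelic sites are split)


def read_vcf(handle: TextIO) -> Iterator[VcfRecord]:
    """Yield one record per ALT allele from an uncompressed, tab-separated VCF."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip ## meta-information lines and the #CHROM header
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        for alt in alts.split(","):  # ALT alleles are comma-separated
            yield VcfRecord(chrom, int(pos), ref, alt)


def is_snv(rec: VcfRecord) -> bool:
    """In this sketch, only single-base substitutions are kept as missense candidates."""
    return len(rec.ref) == 1 and len(rec.alt) == 1 and rec.alt in "ACGT"


# Toy input with hypothetical coordinates: one site with two ALT alleles, one deletion.
toy_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr13\t100\t.\tA\tG,T\t.\tPASS\t.\n"
    "chr13\t200\t.\tAC\tA\t.\tPASS\t.\n"
)
records = list(read_vcf(io.StringIO(toy_vcf)))
snvs = [r for r in records if is_snv(r)]
```

Splitting multi-allelic sites up front keeps one prediction per allele, which matches the one-row-per-variant layout of the table.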
Multi-gene ensemble of XGBoost and a deep neural network (multi-layer perceptron), trained on 19,000+ ClinVar and gnomAD missense variants in HR genes and calibrated with isotonic regression. A single model covers all 5 genes.
BLOSUM62 substitution scores, ESM-2 protein language model embeddings, PhyloP conservation, MAVE functional assays, AlphaMissense, SpliceAI, AlphaFold 3D structures, and gnomAD population frequencies.
Every prediction shows which features pushed it toward pathogenic or benign, using SHAP values extracted from the XGBoost model.
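The sketch below shows how per-feature SHAP values might be turned into such an explanation. It assumes the XGBoost component uses a logistic objective, so SHAP values are in log-odds units and add up, together with a base value, to the raw margin. One way to obtain them is xgboost's `Booster.predict(..., pred_contribs=True)`, which returns one column per feature plus a final bias column. The function `explain` and all feature names are hypothetical, and the probability it computes is the uncalibrated XGBoost output, not the blended, calibrated probability the tool reports.

```python
import math


def explain(base_value, contributions, top_k=3):
    """Summarise one prediction from per-feature SHAP values given in log-odds units.

    Returns the implied (uncalibrated) probability plus the top features pushing
    toward pathogenic (positive SHAP) and toward benign (negative SHAP).
    """
    # SHAP additivity: base value + sum of contributions = raw margin (log-odds)
    margin = base_value + sum(contributions.values())
    prob = 1.0 / (1.0 + math.exp(-margin))  # logistic link
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    toward_pathogenic = [(name, v) for name, v in ranked if v > 0][:top_k]
    toward_benign = [(name, v) for name, v in ranked if v < 0][:top_k]
    return prob, toward_pathogenic, toward_benign


# Hypothetical feature names and values, for illustration only.
contribs = {"alphamissense_score": 1.2, "phylop": 0.5, "gnomad_af": -0.8, "blosum62": -0.1}
prob, patho, benign = explain(-0.3, contribs)
```

Here the margin is -0.3 + 1.2 + 0.5 - 0.8 - 0.1 = 0.5, so the sign of each SHAP value directly indicates which way that feature moved the score.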
Research-grade variant classification for missense mutations in 5 Homologous Recombination DNA repair genes (BRCA1, BRCA2, PALB2, RAD51C, RAD51D). Intended as a decision-support tool, not a standalone diagnostic.
19,223 variants: 18,738 from ClinVar (reviewed/expert-panel classifications) + 485 gnomAD proxy-benign (AC ≥ 2 in ~1.6M alleles). 60/20/20 split with gene × label stratification. RANDOM_STATE=42.
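A 60/20/20 split with gene × label stratification means that each (gene, label) group is divided in the same proportions, so rare genes and classes show up in every partition. The pure-Python sketch below assumes the three parts are used as training, calibration, and test sets. That fits the held-out calibration and 20% test sets mentioned elsewhere, but the mapping is not stated explicitly. `stratified_split` is a hypothetical helper. `RANDOM_STATE=42` presumably seeds a library splitter, so this sketch will not reproduce the project's actual partition.

```python
import random
from collections import defaultdict


def stratified_split(records, key, fractions=(0.6, 0.2, 0.2), seed=42):
    """Split records into three parts, preserving each stratum's share in every part.

    `key` maps a record to its stratum, here a (gene, label) pair.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    splits = ([], [], [])
    for stratum_key in sorted(strata):  # deterministic iteration order
        group = strata[stratum_key]
        rng.shuffle(group)
        n = len(group)
        n_first = round(n * fractions[0])
        n_second = round(n * fractions[1])
        splits[0].extend(group[:n_first])
        splits[1].extend(group[n_first:n_first + n_second])
        splits[2].extend(group[n_first + n_second:])  # remainder absorbs rounding
    return splits


# Toy data: two genes x two labels x 50 variants each (hypothetical).
toy = [(gene, label, i) for gene in ("BRCA1", "BRCA2") for label in (0, 1) for i in range(50)]
train, cal, test = stratified_split(toy, key=lambda r: (r[0], r[1]))
```

Each 50-variant stratum contributes 30/10/10 records, so the 60/20/20 ratio holds within every gene and label, not just overall.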
XGBoost (60%) + Multi-Layer Perceptron (40%) blended ensemble with isotonic calibration trained on a held-out calibration set. 103 engineered features from 5 databases.
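The blend-then-calibrate pipeline can be sketched as follows. This assumes the 60/40 blend is a weighted average of the two models' predicted probabilities, although the text does not say whether probabilities or logits are averaged. It also assumes the isotonic map is fitted on blended scores from the calibration set. Isotonic regression is implemented here from scratch with the pool-adjacent-violators algorithm. A common library route is scikit-learn's `IsotonicRegression`, which interpolates linearly between fitted points rather than using the steps shown here. `blend`, `fit_isotonic`, and `calibrate` are hypothetical names.

```python
from bisect import bisect_left


def blend(p_xgb, p_mlp, w_xgb=0.6):
    """Weighted average of the two models' probabilities (60% XGBoost, 40% MLP)."""
    return [w_xgb * a + (1.0 - w_xgb) * b for a, b in zip(p_xgb, p_mlp)]


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit of a non-decreasing map from score to P(pathogenic).

    Returns (upper_edges, values): block i covers scores up to upper_edges[i].
    """
    # Group identical scores so that ties always share one fitted value.
    agg = {}
    for s, y in zip(scores, labels):
        total, count = agg.get(s, (0.0, 0))
        agg[s] = (total + y, count + 1)
    blocks = []  # each block: [label_sum, count, upper_edge]
    for s in sorted(agg):
        total, count = agg[s]
        blocks.append([total, count, s])
        # Merge backwards while the previous block's mean exceeds the newest one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def calibrate(score, upper_edges, values):
    """Step-function lookup; scores beyond the last edge get the last value."""
    i = min(bisect_left(upper_edges, score), len(values) - 1)
    return values[i]


# Toy calibration set (hypothetical numbers).
p_xgb = [0.1, 0.4, 0.35, 0.8, 0.9]
p_mlp = [0.2, 0.3, 0.5, 0.7, 0.95]
labels = [0, 1, 0, 1, 1]
edges, values = fit_isotonic(blend(p_xgb, p_mlp), labels)
```

The blended scores 0.36 (label 1) and 0.41 (label 0) are out of order, so the algorithm pools them into one block with value 0.5. Fitting this map on a held-out set, rather than on the training data, keeps the calibration from simply reproducing the models' overconfidence on data they have already seen.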
ROC-AUC: 0.978 · MCC: 0.881 · Balanced accuracy: 94.1%. 10-fold CV AUC: 0.9797 ± 0.0031. Outperforms REVEL (0.725), BayesDel (0.721), and CADD (0.539) on the same test set.*
* Evaluated on the SteppeDNA test set. The general-purpose tools were not trained on the same distribution.
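For reference, the three reported metrics can be computed as in the sketch below. ROC-AUC is computed as the Mann-Whitney probability that a randomly chosen pathogenic variant scores higher than a randomly chosen benign one. MCC and balanced accuracy come from the confusion matrix. The 0.5 decision threshold used to produce hard labels is an assumption, since the threshold behind the reported MCC and balanced accuracy is not stated. The function names are hypothetical, and the pairwise AUC loop is O(n²), so it suits illustration rather than a 19,000-variant test set.

```python
import math


def roc_auc(labels, scores):
    """ROC-AUC as P(positive outranks negative), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion(labels, preds):
    """Return (tp, tn, fp, fn) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(labels, preds):
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    tp, tn, fp, fn = confusion(labels, preds)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def balanced_accuracy(labels, preds):
    """Mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion(labels, preds)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Toy data (hypothetical); hard labels use an assumed 0.5 threshold.
labels = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
preds = [1 if s >= 0.5 else 0 for s in scores]
auc, m, bal = roc_auc(labels, scores), mcc(labels, preds), balanced_accuracy(labels, preds)
```

Balanced accuracy and MCC both account for class imbalance, which matters here because the label mix differs across genes.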
No regulatory approval has been obtained or is currently in progress. The roadmap below reflects aspirational milestones only.
This tool is currently for research use only and has not been submitted for regulatory approval. Clinical deployment will require certification from the relevant national health authority.