Eight evaluated settings span seven biomedical modalities plus one fusion run.
Reported accuracies range from 0.610 (clinical text, 5-class classification against a 0.20 random baseline) to 0.993 (PathMNIST, 9-class). The Validation column on every row records the protocol that produced the number; a sketch of the first row's protocol follows the table.
| Modality | Dataset | Model | Accuracy | Validation |
|---|---|---|---|---|
| Clinical tabular | Synthea readmission (n=6,625) | XGBoost + Optuna | 0.777 | 5-fold CV, 0.774 ± 0.008 |
| Omics | Visium breast cancer (n=3,798 spots) | VAE | 0.928 | Validation split, 14-class Leiden clustering |
| Knowledge graphs | Hetionet ego-graphs (n=1,913) | GAT | 0.815 | Validation split, center-node readout, 8-class |
| Histopathology | PathMNIST (n=107,180) | DenseNet-121 | 0.993 | Validation split, 9-class |
| Clinical text | MTSamples (n≈3,500) | ClinicalBERT | 0.610 | 5-class specialty, random baseline 0.20 |
| Single-cell imaging | BloodMNIST (n=17,092) | DenseNet-121 | 0.969 | Test split, 8-class |
| Drug discovery | ChEMBL bioactivity (n=4,685) | GCN | 0.975 | Validation split, molecular graph classification |
| Multi-modal fusion | Visium (omics + vision + spatial) | Attention fusion | 0.930 | Validation split; vs. 0.928 omics-only on the same dataset |
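As a concrete illustration of the first row's protocol, here is a minimal sketch of Optuna-tuned XGBoost scored by 5-fold cross-validation. The synthetic stand-in dataset, search space, and trial budget are illustrative assumptions, not the pipeline's actual code.

```python
# Sketch of the row-1 protocol: Optuna tunes XGBoost hyperparameters, and
# the reported number is the mean of a 5-fold cross-validation. The dataset
# below is a synthetic stand-in for the Synthea readmission table.
import optuna
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Hypothetical shapes; the real table has its own features and labels.
X, y = make_classification(n_samples=6625, n_features=20, random_state=0)

def objective(trial: optuna.Trial) -> float:
    params = {
        "max_depth": trial.suggest_int("max_depth", 3, 8),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 100, 500),
    }
    model = xgb.XGBClassifier(**params, eval_metric="logloss")
    # 5-fold CV accuracy is the tuning objective, matching the table's protocol.
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(f"best 5-fold CV accuracy: {study.best_value:.3f}")
```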
Per-modality accuracy, descending
[Figure: the eight accuracies from the table, ordered descending; bar widths are linear over [0, 1].]
Benchmark accuracy, not clinical performance
Every row reports accuracy on a held-out split or cross-validation protocol of a public or synthetic dataset. These are infrastructure and methodology measurements that show the framework can train, evaluate, and report across heterogeneous biomedical modalities under one execution surface. They are not estimates of clinical utility. Clinical evaluation, where applicable, lives on the Validation page.
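For concreteness, a minimal sketch of what one row's result record could look like under a single execution surface. The field names and schema are assumptions, not the framework's actual interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BenchmarkResult:
    """One table row: a benchmark accuracy plus the protocol behind it.

    Field names are illustrative; the framework's real schema may differ.
    """
    modality: str
    dataset: str
    model: str
    accuracy: float
    validation: str  # held-out split or CV protocol that produced the number

row = BenchmarkResult(
    modality="Clinical text",
    dataset="MTSamples (n~3,500)",
    model="ClinicalBERT",
    accuracy=0.610,
    validation="5-class specialty, random baseline 0.20",
)
assert 0.0 <= row.accuracy <= 1.0  # benchmark accuracy, not clinical utility
```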
Pipeline that produced these numbers
Every row in the table above is the output of the same staged DAG. Stage transitions persist artifacts; a feedback edge from reporting drives next-run selection.
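A minimal sketch of that topology, assuming the four stage names used below and an explicit edge list; the representation is illustrative, not the framework's actual DAG definition.

```python
# Illustrative DAG topology for the staged pipeline. Stage names come from
# the text; the edge-list representation is an assumption.
STAGES = ["ingestion", "training", "evaluation", "reporting"]
EDGES = [
    ("ingestion", "training"),
    ("training", "evaluation"),
    ("evaluation", "reporting"),
    # Feedback edge: drives next-run selection; not a pipeline stage itself.
    ("reporting", "next_run_selection"),
]
```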
Each stage persists state
Ingestion writes a dataset state, training writes a model, evaluation writes metrics, and reporting writes the bundle that produced the rows above. Stage transitions cross artifact boundaries rather than sharing in-process state.
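A hedged sketch of stage transitions crossing artifact boundaries: each stage's only input is the upstream artifact on disk. The JSON files, directory layout, and `run_stage` helper are assumptions for illustration.

```python
# Each stage reads the previous stage's persisted artifact and writes its
# own; no in-process state crosses a stage boundary.
import json
from pathlib import Path
from typing import Callable

ARTIFACTS = Path("artifacts")
ARTIFACTS.mkdir(exist_ok=True)

def run_stage(name: str, fn: Callable[[dict], dict], upstream: str | None) -> None:
    # The only input is the upstream artifact on disk (empty for ingestion).
    payload = json.loads((ARTIFACTS / f"{upstream}.json").read_text()) if upstream else {}
    out = fn(payload)
    # Persist this stage's state so the next transition crosses an artifact boundary.
    (ARTIFACTS / f"{name}.json").write_text(json.dumps(out))

run_stage("ingestion", lambda _: {"dataset": "demo", "n": 6625}, upstream=None)
run_stage("training", lambda d: {**d, "model": "xgboost"}, upstream="ingestion")
run_stage("evaluation", lambda d: {**d, "accuracy": 0.777}, upstream="training")
run_stage("reporting", lambda d: {"bundle": d}, upstream="evaluation")
```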
Reporting drives next-run selection
The feedback path links the reporting bundle back to next-run selection through declared interfaces. It influences which experiments run next; it does not bypass the staged sequence or the artifact record.
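A minimal sketch of such a feedback path, assuming a JSON reporting bundle with `results` entries and a deliberately simple selection policy; none of these names are the framework's actual interface.

```python
import json
from pathlib import Path

def select_next_run(bundle_path: Path) -> str:
    """Pick the next experiment from the persisted reporting bundle.

    Selection reads the artifact on disk, never in-process state, so the
    staged sequence and the artifact record are both preserved.
    """
    bundle = json.loads(bundle_path.read_text())
    # Illustrative policy: revisit the modality with the lowest accuracy.
    weakest = min(bundle["results"], key=lambda row: row["accuracy"])
    return weakest["modality"]

# Example bundle matching two of the table's rows.
bundle = {"results": [
    {"modality": "clinical_text", "accuracy": 0.610},
    {"modality": "histopathology", "accuracy": 0.993},
]}
path = Path("reporting_bundle.json")
path.write_text(json.dumps(bundle))
print(select_next_run(path))  # -> clinical_text
```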
Delivery state of the program
Phase markers for the build that produced the table above. Status is reported as complete or partial; nothing here implies clinical or production readiness.
Complete means the phase has shipped artifacts that the table above depends on. Partial means in-progress work toward declared deliverables; deployment, compliance posture, and platform scaling are not finished and are not claimed as production-ready.