Controls and configuration that bound the reported benchmark surface.
Experiments run on a fixed compute environment with versioned configuration, fixed seeds, deterministic CUDA where supported, DAG-enforced execution order, and tracked artifacts. These controls improve reproducibility within the documented scope; they are not presented as universal guarantees.
Compute and software environment
Single-node configuration used for the reported experiments. Static-analysis posture is reported separately under reproducibility controls.
| Resource | Specification |
|---|---|
| Instance | AWS EC2 g5.xlarge |
| GPU | NVIDIA A10G, 24 GB VRAM |
| CUDA toolkit | 12.1 (PyTorch build) |
| Python | 3.11.14 |
| PyTorch | 2.5.1+cu121 |
| Orchestration | Snakemake 9.16.3 (34 rules) |
| Tracking | MLflow 3.10.0 |
| Versioning | DVC (configured; partial integration with the primary pipeline) |
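A table like the one above can be captured programmatically at run time so that each run record carries the versions it actually executed with. The following is a minimal sketch using only the standard library; the function name `capture_environment` and the default package list are illustrative assumptions, not the pipeline's own code.

```python
import platform
from importlib import metadata


def capture_environment(packages=("torch", "snakemake", "mlflow", "dvc")):
    """Return interpreter and package versions for inclusion in a run record.

    Packages that are not installed are recorded as None instead of raising,
    so the snapshot can be taken in any environment.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }
```

Serializing the returned dictionary (for example with `json.dumps`) next to the run's artifacts makes environment drift between runs visible without relying on a manually maintained table.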
Dataset coverage
Eight modality families. The table summarizes the integrated adapters by family; see the note below for the full adapter accounting.
| Modality | Dataset | Source | Samples | Task |
|---|---|---|---|---|
| Clinical tabular | Synthea readmission | Synthea (synthetic EHR) | 6,625 | Binary classification |
| Omics | Visium breast cancer | 10x Genomics | 3,798 | 14-class tissue region |
| Knowledge graphs | Hetionet | Himmelstein et al. | 1,913 ego-graphs | 8-class node classification |
| Histopathology | PathMNIST | MedMNIST | 107,180 | 9-class classification |
| Clinical text | MTSamples | MTSamples.com | ~3,500 | 5-class specialty |
| Single-cell | BloodMNIST | MedMNIST | 17,092 | 8-class classification |
| Drug discovery | ChEMBL bioactivity | ChEMBL database | 4,685 | Binary bioactivity |
| Classical ML | Iris, Wine, Breast Cancer, Digits | UCI / scikit-learn | 150–1,797 | Multi-class classification |
Adapter accounting: 46 adapters are declared in the data registry across these 8 families. 37 are integrated (one or more adapters per row above); 9 are specified but not yet integrated and are excluded from the reported runs.
Training configuration
Per-modality model choice and key hyperparameters. Full configurations live in versioned YAML referenced by each run.
| Modality | Model | Key hyperparameters | Training details |
|---|---|---|---|
| Clinical tabular | XGBoost + Optuna | n_estimators=100–300, max_depth=6, lr=0.1 | Optuna HPO (30 trials), 5-fold CV |
| Omics | VAE | latent_dim=128, hidden=[512,256], epochs=80 | Visium HVG, Leiden labels, 80/20 train/validation split |
| Knowledge graphs | GAT | layers=2, hidden=64, heads=4, epochs=200 | Center-node readout, cosine LR |
| Histopathology | DenseNet-121 | ImageNet pretrained, epochs=20, lr=1e-4 | 28×28, PathMNIST augmentation |
| Clinical text | ClinicalBERT | epochs=15, lr=2e-5, batch=16 | Mean-pooled embeddings, 5-class |
| Single-cell | DenseNet-121 | ImageNet pretrained, epochs=20, lr=1e-4 | 28×28, BloodMNIST |
| Drug discovery | GCN | layers=3, hidden=128, epochs=100 | Class-weighted loss, molecular featurization |
| Multi-modal fusion | Attention fusion | heads=4, hidden=256 | PCA embeddings from training set only |
Each run is defined by a YAML committed to version control; seed=1337 applies across numpy, torch, and sklearn unless a modality-specific seed is documented.
Reproducibility controls and boundaries
Each control is paired with its current limit. Determinism is treated as a bounded engineering property within this scope.
Fixed global seed
seed=1337 for numpy, torch, and sklearn random operations.
Limit: stochastic library calls outside these three sources are not centrally seeded.
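The seeding control can be sketched as a single helper called once at the start of a run. This is an illustration under stated assumptions rather than the pipeline's implementation: the name `seed_everything` is hypothetical, and the sketch also seeds Python's built-in `random` module, which goes beyond the three sources listed above. scikit-learn has no global seed of its own; estimators draw from NumPy's global generator unless an explicit `random_state` is passed.

```python
import random


def seed_everything(seed=1337):
    """Seed the random sources available in this environment.

    Returns the names of the sources that were seeded, so the run record
    can state what was actually covered.
    """
    seeded = ["random"]
    random.seed(seed)  # Python stdlib RNG (an addition beyond the documented three)
    try:
        import numpy as np
        np.random.seed(seed)  # also covers sklearn calls that use the global RNG
        seeded.append("numpy")
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU and CUDA generators
        seeded.append("torch")
    except ImportError:
        pass
    return seeded
```

Passing `random_state=seed` explicitly to scikit-learn estimators and splitters is generally more robust than relying on the global NumPy state, since it does not depend on call order.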
Deterministic algorithms
torch.use_deterministic_algorithms(True) is enabled where supported by the operator set in use.
Limit: a small number of GPU kernels remain non-deterministic and are documented per modality where they apply.
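A hedged sketch of how this control is typically switched on follows. The helper name `enable_determinism` and the choice to expose `warn_only` are assumptions for illustration. With `warn_only=True`, PyTorch warns instead of raising when an operator has no deterministic implementation, which is one way to surface the residual non-deterministic kernels noted in the limit.

```python
import os


def enable_determinism(warn_only=False):
    """Request deterministic kernels; return True if torch was configured."""
    # cuBLAS requires a fixed workspace size for deterministic results on CUDA.
    # This must be set before the first CUDA call in the process.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    try:
        import torch
    except ImportError:
        return False  # nothing to configure in a torch-free environment
    torch.use_deterministic_algorithms(True, warn_only=warn_only)
    # Disable cuDNN autotuning, which can select different kernels per run.
    torch.backends.cudnn.benchmark = False
    return True
```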
YAML-defined runs
Every run is defined by a YAML committed to version control; the run record references the exact configuration hash.
Limit: external datasets are pinned by URL and version; not every adapter also pins by content hash.
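Both the configuration hash and a content-hash pin for a downloaded dataset reduce to hashing file bytes. The sketch below assumes SHA-256 and uses hypothetical helper names (`sha256_of_file`, `verify_pinned`); the hash algorithm used by the actual run records is not stated here.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path, expected_sha256):
    """Return True when the file content matches the pinned digest."""
    return sha256_of_file(path) == expected_sha256.lower()
```

Hashing the raw bytes of the YAML means that whitespace or comment changes alter the hash. That is a deliberate trade-off in this sketch: it avoids a YAML parser dependency and treats the committed file, not its parsed content, as the unit of identity.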
Snakemake DAG
Stage dependencies are declared in the Snakefile, so execution order follows the DAG and no stage relies on an undeclared dependency.
Limit: parallel rule execution can interleave logs; per-rule artifacts remain ordered.
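The ordering guarantee of a DAG can be illustrated outside Snakemake with the standard library's topological sorter. The stage names below are hypothetical and do not correspond to the pipeline's 34 rules; Snakemake itself infers the graph from each rule's declared inputs and outputs rather than from an explicit mapping like this one.

```python
from graphlib import TopologicalSorter

# Hypothetical stages; each key maps to the set of stages it depends on.
STAGES = {
    "ingest": set(),
    "preprocess": {"ingest"},
    "train": {"preprocess"},
    "evaluate": {"train"},
    "bundle": {"train", "evaluate"},
}


def execution_order(stages):
    """Return one valid execution order for the declared dependencies.

    graphlib raises CycleError if the declarations contain a cycle.
    """
    return list(TopologicalSorter(stages).static_order())
```

Any order returned places every stage after all of its dependencies. Stages with no path between them may run in parallel, which is why logs can interleave while per-rule artifacts stay ordered.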
Hashed bundles
DVC checksum workflows are configured for the staged artifact bundles produced by each run.
Limit: full end-to-end DVC enforcement across the primary pipeline is in progress.
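Independently of DVC, the idea of a hashed bundle can be sketched as a manifest that maps each artifact path to a digest. The function name `build_manifest` and the use of SHA-256 are assumptions for illustration; this is not DVC's internal format.

```python
import hashlib
from pathlib import Path


def build_manifest(bundle_dir):
    """Map each file's relative POSIX path to its SHA-256 hex digest."""
    root = Path(bundle_dir)
    manifest = {}
    # Sorting keeps the manifest's insertion order stable across platforms.
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest
```

The sketch reads each file fully into memory, which is acceptable for small artifacts; large checkpoints would call for chunked reads.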
Static analysis
Pyright reports 0 errors and Ruff 0.15.4 reports 0 violations on the pipeline source at the time of the reported runs.
Limit: static checks bound code quality, not numerical correctness; correctness is bounded by the validation surface.
From configuration to evidence bundle
A run is fully described by its configuration hash and the artifact hashes it produces. Re-execution from the same configuration in the same environment is expected to reproduce the bundle within the determinism limits noted above.
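Checking whether a re-execution reproduced a bundle amounts to comparing two sets of artifact hashes. The sketch below assumes each bundle is summarized as a `{path: digest}` mapping; the function names are hypothetical.

```python
def diff_manifests(reference, candidate):
    """Compare two {path: digest} mappings and group the differences."""
    shared = set(reference) & set(candidate)
    return {
        "missing": sorted(set(reference) - set(candidate)),
        "unexpected": sorted(set(candidate) - set(reference)),
        "changed": sorted(p for p in shared if reference[p] != candidate[p]),
    }


def is_reproduced(reference, candidate):
    """Return True when both manifests list the same paths and digests."""
    return not any(diff_manifests(reference, candidate).values())
```

A byte-exact comparison is stricter than reproduction within the determinism limits: artifacts affected by non-deterministic GPU kernels may differ in their hashes while agreeing numerically, so those would need a tolerance-based check on the values rather than on the digests.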