Experimental methodology

Compute environment, dataset coverage, per-modality training configuration, and the reproducibility controls that bound the reported results.

Methods

Controls and configuration that bound the reported benchmark surface.

Experiments run on a fixed compute environment with versioned configuration, fixed seeds, deterministic CUDA where supported, DAG-enforced execution order, and tracked artifacts. These controls improve reproducibility within the documented scope; they are not presented as universal guarantees.

Compute and software environment

Single-node configuration used for the reported experiments. Static-analysis posture is reported separately under reproducibility controls.

| Resource | Specification |
| --- | --- |
| Instance | AWS EC2 g5.xlarge |
| GPU | NVIDIA A10G, 23 GB VRAM |
| CUDA toolkit | 12.1 (PyTorch build) |
| Python | 3.11.14 |
| PyTorch | 2.5.1+cu121 |
| Orchestration | Snakemake 9.16.3 (34 rules) |
| Tracking | MLflow 3.10.0 |
| Versioning | DVC (configured; partial integration with the primary pipeline) |

Dataset coverage

Eight modality families. The table summarizes the integrated adapters by family; see the note below for the full adapter accounting.

| Modality | Dataset | Source | Samples | Task |
| --- | --- | --- | --- | --- |
| Clinical tabular | Synthea readmission | Synthea (synthetic EHR) | 6,625 | Binary classification |
| Omics | Visium breast cancer | 10x Genomics | 3,798 | 14-class tissue region |
| Knowledge graphs | Hetionet | Himmelstein et al. | 1,913 ego-graphs | 8-class node classification |
| Histopathology | PathMNIST | MedMNIST | 107,180 | 9-class classification |
| Clinical text | MTSamples | MTSamples.com | ~3,500 | 5-class specialty |
| Single-cell | BloodMNIST | MedMNIST | 17,092 | 8-class classification |
| Drug discovery | ChEMBL bioactivity | ChEMBL database | 4,685 | Binary bioactivity |
| Classical ML | Iris, Wine, Breast Cancer, Digits | UCI / scikit-learn | 150–1,797 | Multi-class classification |

Adapter accounting: 46 adapters are declared in the data registry across these 8 families. 37 are integrated (one or more adapters per row above); 9 are specified but not yet integrated and are excluded from the reported runs.

Training configuration

Per-modality model choice and key hyperparameters. Full configurations live in versioned YAML referenced by each run.

| Modality | Model | Key hyperparameters | Training details |
| --- | --- | --- | --- |
| Clinical tabular | XGBoost + Optuna | n_estimators=100–300, max_depth=6, lr=0.1 | Optuna HPO (30 trials), 5-fold CV |
| Omics | VAE | latent_dim=128, hidden=[512,256], epochs=80 | Visium HVG, Leiden labels, 80/20 val |
| Knowledge graphs | GAT | layers=2, hidden=64, heads=4, epochs=200 | Center-node readout, cosine LR |
| Histopathology | DenseNet-121 | ImageNet pretrained, epochs=20, lr=1e-4 | 28×28, PathMNIST augmentation |
| Clinical text | ClinicalBERT | epochs=15, lr=2e-5, batch=16 | Mean-pooled embeddings, 5-class |
| Single-cell | DenseNet-121 | ImageNet pretrained, epochs=20, lr=1e-4 | 28×28, BloodMNIST |
| Drug discovery | GCN | layers=3, hidden=128, epochs=100 | Class-weighted loss, molecular featurization |
| Multi-modal fusion | Attention fusion | heads=4, hidden=256 | PCA embeddings from training set only |

Each run is defined by a YAML committed to version control; seed=1337 applies across numpy, torch, and sklearn unless a modality-specific seed is documented.

Reproducibility controls and boundaries

Each control is paired with its current limit. Determinism is treated as a bounded engineering property within this scope.

Seeds

Fixed global seed

seed=1337 for numpy, torch, and sklearn random operations.

Limit: stochastic library calls outside these three sources are not centrally seeded.
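The seeding policy above can be sketched as a small helper. This is a minimal illustration of the documented policy, not the pipeline's actual code; `set_global_seed` is a hypothetical name, and torch is imported defensively so the sketch also runs in CPU-only or torch-free environments.

```python
import random

import numpy as np

GLOBAL_SEED = 1337  # the fixed seed documented for all reported runs


def set_global_seed(seed: int = GLOBAL_SEED) -> None:
    """Seed the three sources named in the text: stdlib, numpy, torch.

    sklearn draws from numpy's global RNG by default, so seeding numpy
    covers sklearn estimators that are not given an explicit random_state.
    """
    random.seed(seed)     # python stdlib RNG
    np.random.seed(seed)  # numpy global RNG (and, transitively, sklearn defaults)
    try:
        import torch

        torch.manual_seed(seed)           # CPU RNG
        torch.cuda.manual_seed_all(seed)  # all visible GPUs, if any
    except ImportError:
        pass  # torch not installed; stdlib/numpy seeding still applies
```

As the limit above notes, libraries that keep their own RNG state outside these three sources would need to be seeded separately.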

CUDA determinism

Deterministic algorithms

torch.use_deterministic_algorithms(True) is enabled where supported by the operator set in use.

Limit: a small number of GPU kernels remain non-deterministic and are documented per modality where they apply.
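A minimal sketch of the determinism setup described above, assuming the standard PyTorch mechanism. The cuBLAS workspace variable must be set before the first CUDA call, which is why it is placed ahead of the torch import; the torch import itself is guarded so the snippet degrades gracefully off-GPU.

```python
import os

# Required by PyTorch for deterministic cuBLAS kernels on CUDA >= 10.2;
# must be set before CUDA is initialised.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

try:
    import torch

    # Raise an error rather than silently fall back when an op in the
    # model's operator set has no deterministic implementation.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False  # autotuning picks kernels non-reproducibly
except ImportError:
    pass  # CPU-only / torch-free environment
```

With `use_deterministic_algorithms(True)`, the non-deterministic kernels mentioned in the limit above surface as runtime errors, which is how they can be documented per modality.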

Versioned configuration

YAML-defined runs

Every run is defined by a YAML committed to version control; the run record references the exact configuration hash.

Limit: external dataset hosts are pinned by URL/version, not by content hash for every adapter.
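The configuration hash referenced by each run record can be illustrated as follows. This is a sketch of the idea, not the pipeline's actual scheme: it canonicalises the parsed configuration (sorted keys, fixed separators) before hashing, so two files that differ only in key order or whitespace hash identically. JSON is used here to keep the example dependency-free; the real runs are defined in YAML.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Return a short stable hash of a parsed run configuration."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical run configuration mirroring the knowledge-graph row above.
run_cfg = {"seed": 1337, "model": "GAT", "layers": 2, "hidden": 64, "heads": 4}
print(config_hash(run_cfg))
```

Hashing the parsed structure rather than the raw file is a deliberate choice: it makes the hash insensitive to formatting-only edits while still changing on any semantic change.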

Execution order

Snakemake DAG

Stage dependencies are declared in the Snakefile. The DAG enforces ordering with no implicit stage coupling.

Limit: parallel rule execution can interleave logs; per-rule artifacts remain ordered.
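The DAG-enforced ordering can be pictured with a minimal Snakefile fragment. Rule names, paths, and scripts here are hypothetical, not the pipeline's actual 34 rules; the point is only that `train` declares `preprocess`'s output as its input, so Snakemake derives the ordering from the file dependencies rather than from any implicit stage coupling.

```
rule preprocess:
    input: "data/raw/{modality}.parquet"
    output: "data/processed/{modality}.parquet"
    shell: "python scripts/preprocess.py {input} {output}"

rule train:
    input: "data/processed/{modality}.parquet"
    output: "artifacts/{modality}/model.bin"
    shell: "python scripts/train.py {input} {output}"
```

Because ordering lives in the declared inputs and outputs, adding a stage cannot silently reorder existing ones, which is the property the control above relies on.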

Artifact integrity

Hashed bundles

DVC checksum workflows are configured for the staged artifact bundles produced by each run.

Limit: full end-to-end DVC enforcement across the primary pipeline is in progress.

Code-quality posture

Static analysis

Pyright reports 0 errors and Ruff 0.15.4 reports 0 violations on the pipeline source at the time of the reported runs.

Limit: static checks bound code quality, not numerical correctness; correctness is bounded by the validation surface.

Per-run lifecycle

From configuration to evidence bundle

Figure: SNPTX per-run reproducibility lifecycle (seed, DAG, tracking, hashing). A run begins from a YAML configuration with a fixed seed, executes through the Snakemake DAG with deterministic CUDA where supported, is tracked by MLflow, produces DVC-hashed artifacts, and emits a referenced evidence bundle: config.yaml (seed=1337, pinned versions) → snakemake run (DAG order, deterministic CUDA) → mlflow + dvc (tracked params, hashed artifacts) → evidence bundle (metrics + manifest + config hash).

A run is fully described by its configuration hash and the artifact hashes it produces. Re-execution from the same configuration on the same environment is expected to reproduce the bundle within the determinism limits noted above.
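The "run is fully described by its hashes" claim can be made concrete with a sketch of the evidence-bundle manifest. Field names and the helper itself are illustrative assumptions, not the pipeline's actual schema: the record simply joins the configuration hash with the hashes of the artifacts that configuration produced.

```python
import hashlib
import json


def evidence_manifest(config: dict, artifact_hashes: dict) -> dict:
    """Assemble a per-run record: config hash plus produced-artifact hashes.

    Two runs agree iff their manifests are equal, which is the
    re-execution check described above (within the determinism limits).
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return {
        "config_hash": hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16],
        "seed": config.get("seed"),
        "artifacts": dict(sorted(artifact_hashes.items())),
    }
```

A re-executed run on the same environment is then verified by comparing its manifest against the stored one; a mismatch localises the drift to either the configuration or a specific artifact.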