Attention-based fusion shows a small measured gain over the strongest unimodal baseline.
This page documents the narrow fusion result currently supported by the SNPTX academic record. The validated configuration combines an omics representation, vision PCA features, and spatial features through a shared projection and a four-head attention block before classification on a 14-class tissue-region task. The observed lift is positive but small, and the claim remains bounded to that reported configuration.
Three inputs, shared projection, cross-modal attention, classifier
The mechanism shown here is narrower than the full SNPTX framework surface. It covers only the validated fusion path used for the reported result: PCA fitted on the training split for the image-derived features, then a shared projection, then attention-based fusion.
This narrower view identifies the exact path behind the validated result instead of collapsing the wider framework into a single generic multimodal claim.
Vision features enter as PCA-derived inputs, with the PCA fitted on the training split. That keeps the feature transformation inside the reported evaluation boundary.
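The train-split handling can be illustrated with a minimal sketch. It assumes NumPy, synthetic features, and hypothetical names (`fit_pca`, `apply_pca`), sizes, and split indices; it is not the SNPTX implementation. The point is only that the mean and principal axes are estimated from training rows and then reused unchanged on held-out rows.

```python
import numpy as np

def fit_pca(train, n_components):
    """Fit PCA on training rows only; return the mean and principal axes."""
    mean = train.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(x, mean, components):
    """Project any split using statistics learned from the training split."""
    return (x - mean) @ components.T

rng = np.random.default_rng(0)
vision = rng.normal(size=(200, 64))   # hypothetical raw image-derived features
train_idx, test_idx = np.arange(0, 160), np.arange(160, 200)   # hypothetical split

mean, components = fit_pca(vision[train_idx], n_components=16)
vision_train = apply_pca(vision[train_idx], mean, components)
vision_test = apply_pca(vision[test_idx], mean, components)   # no refit on held-out rows
```

Fitting on all 200 rows instead would let held-out samples shape the transformation, which is the kind of leakage the train-split handling is meant to avoid.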
SNPTX supports broader modality families at the framework level, but those broader surfaces are documented on the architecture page rather than counted as validated fusion evidence here.
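As a rough illustration of the validated path (three inputs, shared projection, four-head attention, 14-class classifier), the forward pass might look like the sketch below. Only the head count and class count come from this page. The embedding width, the input sizes, the reading of "shared projection" as per-modality linear maps into one common width, the mean pooling, and the random weights are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
D, HEADS, CLASSES = 32, 4, 14   # D is hypothetical; heads and classes follow the text
IN_DIMS = {"omics": 32, "vision_pca": 16, "spatial": 8}   # hypothetical input sizes

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Randomly initialised weights stand in for trained parameters.
proj = {name: rng.normal(scale=0.1, size=(d, D)) for name, d in IN_DIMS.items()}
w_q, w_k, w_v, w_o = (rng.normal(scale=0.1, size=(D, D)) for _ in range(4))
w_cls = rng.normal(scale=0.1, size=(D, CLASSES))

def fuse(inputs):
    """inputs: dict name -> (batch, dim) array. Returns (batch, CLASSES) probabilities."""
    # 1. Project each modality to width D and stack as three tokens per sample.
    tokens = np.stack([inputs[n] @ proj[n] for n in IN_DIMS], axis=1)   # (B, 3, D)
    b, t, _ = tokens.shape

    def split(x):   # (B, T, D) -> (B, HEADS, T, D // HEADS)
        return x.reshape(b, t, HEADS, D // HEADS).transpose(0, 2, 1, 3)

    # 2. Scaled dot-product attention across the modality tokens, per head.
    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(D // HEADS)          # (B, H, T, T)
    attended = softmax(scores) @ v                                      # (B, H, T, D/H)
    merged = attended.transpose(0, 2, 1, 3).reshape(b, t, D) @ w_o      # (B, T, D)

    # 3. Mean-pool the three tokens (an assumption) and classify.
    return softmax(merged.mean(axis=1) @ w_cls)

batch = {n: rng.normal(size=(5, d)) for n, d in IN_DIMS.items()}
probs = fuse(batch)   # one row of 14 class probabilities per sample
```

Treating each modality as a single token keeps the attention map small (3 × 3 per head), so each head can weight omics, vision, and spatial signals against one another before the classifier sees them.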
Performance against unimodal baselines
The fusion result is only informative if the strongest unimodal comparator stays visible. The table therefore reports the fused model alongside the single-input alternatives it is meant to improve upon.
| Configuration | Accuracy | Interpretation |
|---|---|---|
| Omics only (VAE) | 92.8% | Strongest unimodal baseline in the reported comparison. |
| Vision PCA only | 87.4% | Morphological signal contributes useful information, but not enough to match omics alone. |
| Spatial features only | 72.1% | Weakest single-input result in the validated set. |
| Fusion (omics + vision + spatial) | 93.0% | Small positive lift of 0.2 percentage points over the best unimodal model. |
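The lift quoted in the last row can be reproduced directly from the table. The short sketch below uses only the four reported accuracies; the dictionary keys and variable names are illustrative, and the relative error reduction is a derived reading of the same gap rather than a figure reported in the source.

```python
# Accuracies (percent) copied from the table above.
accuracy = {
    "omics_only": 92.8,
    "vision_pca_only": 87.4,
    "spatial_only": 72.1,
    "fusion": 93.0,
}

# Pick the strongest single-input configuration as the comparator.
unimodal = {k: v for k, v in accuracy.items() if k != "fusion"}
best_name = max(unimodal, key=unimodal.get)

# Absolute lift in percentage points, rounded to the table's precision.
lift_pp = round(accuracy["fusion"] - unimodal[best_name], 1)

# Share of the comparator's remaining error removed by fusion (derived, not reported).
error_reduction = round(lift_pp / (100 - unimodal[best_name]), 3)

print(best_name, lift_pp, error_reduction)
```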
What the numbers justify
On the reported tissue-region task, fusion outperforms the best unimodal baseline by a narrow margin. That supports a complementary-signal claim, but not a broad statement that fusion is categorically superior across tasks or cohorts.
Holdout accuracy of 93.0% and 5-fold cross-validation accuracy of 92.8% ± 0.7 percentage points are directionally consistent with the leakage-controlled reading presented elsewhere on the site. They support stability within the reported experiment, but they do not independently establish robustness under new cohort, site, or modality shifts.
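A quick consistency check makes the "directionally consistent" reading concrete. The sketch assumes the ± 0.7 is a fold-to-fold spread in percentage points and that both figures describe the fusion model; the source does not state either explicitly, and the variable names are illustrative.

```python
cv_mean, cv_spread = 92.8, 0.7   # 5-fold CV figures from the text; spread assumed to be in points
holdout = 93.0                   # reported holdout accuracy
lift_pp = 0.2                    # lift over the best unimodal model, as reported in the table

# Band implied by the cross-validation mean and spread.
low, high = cv_mean - cv_spread, cv_mean + cv_spread

holdout_in_band = low <= holdout <= high     # does holdout sit inside the CV band?
lift_within_spread = lift_pp <= cv_spread    # is the lift smaller than the spread?

print(holdout_in_band, lift_within_spread)
```

Under those assumptions the holdout figure falls inside the cross-validation band and the lift is smaller than the reported spread, which fits the framing of the gain as modest rather than decisive.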
The defensible inference is limited: image-derived and spatial covariates add some discriminative value beyond omics alone in this reported configuration. The magnitude of that gain remains modest.
Broader multi-modal work remains prospective
The roadmap is retained only to mark the boundary between the validated fusion result and the wider research agenda. These items indicate where the framework may expand, not what the current page treats as demonstrated.
| Workstream | Examples | Status |
|---|---|---|
| Cross-modal alignment | InfoNCE, CLIP-style contrastive alignment, Deep CCA | Planned research work |
| Uncertainty and calibration | Evidential models, conformal prediction, richer confidence reporting | Planned hardening |
| Richer fusion operators | Tensor fusion, mixture-of-experts, equivariant graph integration | Not part of the validated result |
| Domain-adaptive text and reconstruction objectives | Biomedical NLP augmentation, masked autoencoder-style objectives | Prospective extension surface |
What is claimed here
One attention-based fusion configuration using three inputs shows a small measured improvement over the strongest unimodal baseline on the reported tissue-region task.
What this page does not claim
This page does not claim general multimodal superiority, clinical utility, production readiness, or transfer robustness across new cohorts, sites, or modality mixes.
How this fits the current build
The wider SNPTX framework still includes broader modality and extension surfaces. This page stays consistent with that build by treating only the narrower validated fusion path as evidence, while leaving the larger framework story to architecture and methodology.
For the broader execution structure, see Architecture. For evaluation controls and boundary-setting around reported metrics, see Validation. For run semantics, reproducibility posture, and leakage discipline, see Methodology.