Fusion methodology

Evidence for one validated fusion configuration in the current academic build: omics, vision PCA, and spatial inputs combined through attention, with a small measured lift over the strongest unimodal baseline.

Validated fusion

Validated three-input fusion for tissue-region classification

Attention-based fusion shows a small measured gain over the strongest unimodal baseline.

This page documents the narrow fusion result currently supported by the SNPTX academic record. The validated configuration combines an omics representation, vision PCA features, and spatial features through a shared projection and a four-head attention block before classification on a 14-class tissue-region task. The observed lift is positive but small, and the claim remains bounded to that reported configuration.

Task

14-class

Tissue-region classification in the reported validation setting.

Validated inputs

Omics, vision PCA, and spatial features enter the fused model.

Best unimodal

92.8%

Omics VAE is the strongest single-modality comparator.

Fusion

93.0%

A modest lift of 0.2 percentage points over the best unimodal result.

Validated pipeline

Three inputs, shared projection, cross-modal attention, classifier

The mechanism shown here is narrower than the full SNPTX framework surface. It explains only the validated fusion path used for the reported result: PCA for the image-derived features is fitted on the training split, and all three inputs pass through a shared projection before attention-based fusion.
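The validated path described above can be sketched in NumPy. This is a minimal, untrained illustration under stated assumptions, not the SNPTX implementation. The class name `AttentionFusionSketch`, the input widths (32, 16, 2), the model width of 64, the reading of "shared projection" as one linear map per input into a common width, the mean pooling over the three tokens, and the random weights are all hypothetical. Only the three inputs, the four attention heads, and the 14 output classes come from the reported configuration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


class AttentionFusionSketch:
    """Untrained forward pass: project, attend across modalities, classify."""

    def __init__(self, input_dims, d_model=64, num_heads=4, num_classes=14, seed=0):
        assert d_model % num_heads == 0, "d_model must divide evenly across heads"
        rng = np.random.default_rng(seed)
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # Assumption: one linear map per input into a common d_model-wide space.
        self.proj = [rng.normal(0, dim ** -0.5, size=(dim, d_model)) for dim in input_dims]
        # Query, key, value, and output weights for the attention block.
        self.w_q, self.w_k, self.w_v, self.w_o = (
            rng.normal(0, d_model ** -0.5, size=(d_model, d_model)) for _ in range(4)
        )
        self.w_cls = rng.normal(0, d_model ** -0.5, size=(d_model, num_classes))
        self.b_cls = np.zeros(num_classes)

    def _split_heads(self, x):
        # (batch, tokens, d_model) -> (batch, heads, tokens, d_head)
        batch, tokens, _ = x.shape
        return x.reshape(batch, tokens, self.num_heads, self.d_head).transpose(0, 2, 1, 3)

    def forward(self, omics, vision_pca, spatial):
        inputs = (omics, vision_pca, spatial)
        # One token per modality: (batch, 3, d_model).
        tokens = np.stack([x @ w for x, w in zip(inputs, self.proj)], axis=1)
        q = self._split_heads(tokens @ self.w_q)
        k = self._split_heads(tokens @ self.w_k)
        v = self._split_heads(tokens @ self.w_v)
        # Scaled dot-product attention over the three modality tokens.
        scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(self.d_head)
        weights = softmax(scores, axis=-1)  # (batch, heads, 3, 3)
        attended = weights @ v              # (batch, heads, 3, d_head)
        batch, heads, n_tokens, d_head = attended.shape
        merged = attended.transpose(0, 2, 1, 3).reshape(batch, n_tokens, heads * d_head)
        fused = (merged @ self.w_o).mean(axis=1)  # assumption: mean-pool the tokens
        probs = softmax(fused @ self.w_cls + self.b_cls, axis=-1)
        return probs, weights


# Placeholder inputs with hypothetical feature widths.
rng = np.random.default_rng(1)
model = AttentionFusionSketch(input_dims=(32, 16, 2))
probs, attn = model.forward(
    rng.normal(size=(5, 32)),  # omics representation
    rng.normal(size=(5, 16)),  # vision PCA features
    rng.normal(size=(5, 2)),   # spatial features
)
print(probs.shape, attn.shape)
```

Treating each modality as a single token keeps the attention matrix at 3 x 3 per head, which makes the cross-modal weights easy to inspect. A trained model would learn the weights that this sketch draws at random.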

Why this diagram matters

It identifies the exact path behind the validated result instead of collapsing the wider framework into a single generic multimodal claim.

Key technical boundary

Vision features enter as PCA-derived inputs, with the PCA fitted on the training split. That keeps the feature transformation inside the reported evaluation boundary, so held-out samples do not influence the components.
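The train-split handling can be illustrated with a short NumPy sketch. The helper names `fit_pca` and `apply_pca`, the array sizes, the 80/20 split, and the component count are assumptions chosen for illustration. The data is random placeholder input, not SNPTX features.

```python
import numpy as np


def fit_pca(train_features, n_components):
    """Estimate the PCA mean and principal axes from training rows only."""
    mean = train_features.mean(axis=0)
    centered = train_features - mean
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(features, mean, components):
    """Project any split using statistics that came from the training split."""
    return (features - mean) @ components.T


rng = np.random.default_rng(0)
image_features = rng.normal(size=(100, 20))  # placeholder image-derived features
train_idx, test_idx = np.arange(80), np.arange(80, 100)

# Fit on the training rows, then reuse the same mean and axes for held-out rows.
mean, components = fit_pca(image_features[train_idx], n_components=5)
train_pca = apply_pca(image_features[train_idx], mean, components)
test_pca = apply_pca(image_features[test_idx], mean, components)
print(train_pca.shape, test_pca.shape)
```

The point of the split is that `fit_pca` never sees the held-out rows. Fitting on the full matrix before splitting would let evaluation samples shape the components.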

Relation to the wider build

SNPTX supports broader modality families at the framework level, but those broader surfaces are documented on the architecture page rather than counted as validated fusion evidence here.

Comparative evidence

Performance against unimodal baselines

The fusion result is only informative if the strongest unimodal comparator stays visible. The table therefore reports the fused model alongside the single-input alternatives it is meant to improve upon.

Configuration	Accuracy	Interpretation
Omics only (VAE)	92.8%	Strongest unimodal baseline in the reported comparison.
Vision PCA only	87.4%	Morphological signal contributes useful information, but not enough to match omics alone.
Spatial features only	72.1%	Weakest single-input result in the validated set.
Fusion (omics + vision + spatial)	93.0%	Small positive lift of 0.2 percentage points over the best unimodal model.
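
The lift in the table can be re-derived from the reported accuracies. The sketch below uses only the four figures above. The variable names are illustrative, and the relative error reduction is an additional derived view of the same numbers rather than a separately reported metric.

```python
# Reported accuracies from the comparison table, in percent.
unimodal = {
    "Omics only (VAE)": 92.8,
    "Vision PCA only": 87.4,
    "Spatial features only": 72.1,
}
fusion_accuracy = 93.0

# Strongest single-input comparator.
best_name, best_accuracy = max(unimodal.items(), key=lambda item: item[1])

# Absolute lift in percentage points, rounded to the table's precision.
lift_pp = round(fusion_accuracy - best_accuracy, 1)

# The same gap expressed as a share of the best unimodal error rate.
relative_error_reduction = (fusion_accuracy - best_accuracy) / (100.0 - best_accuracy)

print(f"Best unimodal: {best_name} at {best_accuracy}%")
print(f"Lift: {lift_pp} percentage points")
print(f"Relative error reduction: {relative_error_reduction:.1%}")
```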

Interpretation

What the numbers justify

On the reported tissue-region task, fusion outperforms the best unimodal baseline by a narrow margin. That supports a complementary-signal claim, but not a broad statement that fusion is categorically superior across tasks or cohorts.

Evaluation posture

Holdout accuracy of 93.0% and 5-fold cross-validation accuracy of 92.8% ± 0.7 are directionally consistent with the leakage-controlled reading presented elsewhere on the site. They support stability within the reported experiment, but they do not independently establish robustness under new cohort, site, or modality shifts.
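
One way to read these figures together is to compare the holdout number and the lift against the cross-validation spread. The sketch below assumes that the ± 0.7 is a fold-to-fold standard deviation in percentage points, which this page does not state explicitly. The variable names are illustrative.

```python
# Reported figures, in percent.
holdout_fusion = 93.0
best_unimodal = 92.8
cv_mean = 92.8
cv_spread = 0.7  # assumption: standard deviation across the 5 folds

# Does the holdout result sit within one spread of the CV mean?
holdout_within_spread = abs(holdout_fusion - cv_mean) <= cv_spread

# How large is the fusion lift relative to that spread?
lift_pp = holdout_fusion - best_unimodal
lift_in_spread_units = lift_pp / cv_spread

print(f"Holdout within one spread of CV mean: {holdout_within_spread}")
print(f"Lift as a fraction of the spread: {lift_in_spread_units:.2f}")
```

Under that assumption the lift is well under one spread unit, which fits the page's reading of the gain as positive but modest.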

What is inferred

The defensible inference is limited: image-derived and spatial covariates add some discriminative value beyond omics alone in this reported configuration. The magnitude of that gain remains modest.

Future hardening

Broader multi-modal work remains prospective

The roadmap is retained only to mark the boundary between the validated fusion result and the wider research agenda. These items indicate where the framework may expand, not what the current page treats as demonstrated.

Workstream	Examples	Status
Cross-modal alignment	InfoNCE, CLIP-style contrastive alignment, Deep CCA	Planned research work
Uncertainty and calibration	Evidential models, conformal prediction, richer confidence reporting	Planned hardening
Richer fusion operators	Tensor fusion, mixture-of-experts, equivariant graph integration	Not part of the validated result
Domain-adaptive text and reconstruction objectives	Biomedical NLP augmentation, masked autoencoder-style objectives	Prospective extension surface

Validated scope

What is claimed here

One attention-based fusion configuration using three inputs shows a small measured improvement over the strongest unimodal baseline on the reported tissue-region task.

Not claimed

What this page does not claim

This page does not claim general multimodal superiority, clinical utility, production readiness, or transfer robustness across new cohorts, sites, or modality mixes.

Framework consistency

How this fits the current build

The wider SNPTX framework still includes broader modality and extension surfaces. This page stays consistent with that build by treating only the narrower validated fusion path as evidence, while leaving the larger framework story to the Architecture and Methodology pages.

Evidence path

For the broader execution structure see Architecture. For evaluation controls and boundary-setting around reported metrics see Validation. For run semantics, reproducibility posture, and leakage discipline see Methodology.