Fusion methodology

Evidence for one validated fusion configuration in the current academic build: omics, vision PCA, and spatial inputs combined through attention with a small measured lift over the strongest unimodal baseline.

Validated fusion
Validated three-input fusion for tissue-region classification

Attention-based fusion shows a small measured gain over the strongest unimodal baseline.

This page documents the narrow fusion result currently supported by the SNPTX academic record. The validated configuration combines an omics representation, vision PCA features, and spatial features through a shared projection and a four-head attention block before classification on a 14-class tissue-region task. The observed lift is positive but small, and the claim remains bounded to that reported configuration.

Task: 14-class. Tissue-region classification in the reported validation setting.
Validated inputs: 3. Omics, vision PCA, and spatial features enter the fused model.
Best unimodal: 92.8%. Omics VAE is the strongest single-modality comparator.
Fusion: 93.0%. A modest lift of 0.2 percentage points over the best unimodal result.

Validated pipeline

Three inputs, shared projection, cross-modal attention, classifier

The mechanism shown here is narrower than the full SNPTX framework surface. It covers only the validated fusion path behind the reported result: PCA for the image-derived features is fitted on the training split, and all three inputs pass through a shared projection before attention-based fusion.

Figure: Validated SNPTX fusion architecture. A three-input fusion pipeline in which the omics embedding, vision PCA features, and spatial features are projected into a shared 256-dimensional representation, fused by four-head attention, and passed to a classifier.
Inputs (from validated config): omics embedding (VAE-derived representation); vision PCA features (PCA fitted on the training split); spatial features (region-level covariates).
Alignment: shared projection into a 256-dim aligned space.
Fusion: attention fusion with 4 heads, hidden size 256, cross-modal weighting.
Output: classifier, 93.0% validation.
Flow: per-input signals, shared dimension, cross-modal weights, predicted class.
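The forward pass in the diagram can be sketched in a few lines of NumPy. This is an illustrative sketch, not the SNPTX implementation: only the three inputs, the 256-dimensional shared space, the four heads, and the 14 classes come from this page. The input widths, the random weights, the one-token-per-modality layout, and the mean pooling step are assumptions made for the example.

```python
import numpy as np

D_MODEL, N_HEADS, N_CLASSES = 256, 4, 14   # values stated on this page
INPUT_DIMS = {"omics": 64, "vision_pca": 32, "spatial": 8}  # hypothetical widths

rng = np.random.default_rng(0)

def init(shape):
    """Small random weights; a trained model would learn these."""
    return rng.normal(scale=0.02, size=shape)

# One linear projection per modality into the shared 256-dim space.
proj = {name: init((dim, D_MODEL)) for name, dim in INPUT_DIMS.items()}
w_q, w_k, w_v, w_o = (init((D_MODEL, D_MODEL)) for _ in range(4))
w_cls = init((D_MODEL, N_CLASSES))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse(inputs):
    """inputs: dict of (batch, dim) arrays -> (class probabilities, attention weights)."""
    # Stack the projected modalities as tokens: (batch, 3, 256).
    tokens = np.stack([inputs[n] @ proj[n] for n in INPUT_DIMS], axis=1)
    b, t, _ = tokens.shape
    d_head = D_MODEL // N_HEADS

    def split(x):  # (b, t, 256) -> (b, heads, t, d_head)
        return x.reshape(b, t, N_HEADS, d_head).transpose(0, 2, 1, 3)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    # Scaled dot-product attention across the three modality tokens.
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_head))  # (b, heads, t, t)
    mixed = (attn @ v).transpose(0, 2, 1, 3).reshape(b, t, D_MODEL) @ w_o
    pooled = mixed.mean(axis=1)   # assumption: mean-pool over the modality tokens
    return softmax(pooled @ w_cls), attn

batch = {n: rng.normal(size=(5, d)) for n, d in INPUT_DIMS.items()}
probs, attn = fuse(batch)
print(probs.shape, attn.shape)  # (5, 14) (5, 4, 3, 3)
```

Each row of the returned attention tensor shows how one modality token weights the three modalities within a head, which is one way to read the "cross-modal weights" stage of the diagram.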
Why this diagram matters

It identifies the exact path behind the validated result instead of collapsing the wider framework into a single generic multimodal claim.

Key technical boundary

Vision features enter as PCA-derived inputs fitted on the training split. That keeps the feature transformation inside the reported evaluation boundary.
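As an illustration of that boundary, the sketch below fits PCA on training rows only and then applies the same mean and axes to held-out rows. The feature matrix, the 80/20 split, and the choice of 16 components are hypothetical; this page does not state the actual feature width, split, or component count.

```python
import numpy as np

def fit_pca(train, n_components):
    """Learn the centering mean and principal axes from training rows only."""
    mean = train.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(x, mean, components):
    """Project any split using statistics learned from the training split."""
    return (x - mean) @ components.T

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))             # hypothetical image-derived features
train, holdout = features[:160], features[160:]   # hypothetical 80/20 split

mean, components = fit_pca(train, n_components=16)  # 16 is illustrative
train_pcs = apply_pca(train, mean, components)
holdout_pcs = apply_pca(holdout, mean, components)
print(train_pcs.shape, holdout_pcs.shape)  # (160, 16) (40, 16)
```

The held-out rows never influence the mean or the axes, so the transformation carries no information from the evaluation data into the features the model sees.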

Relation to the wider build

SNPTX supports broader modality families at the framework level, but those broader surfaces are documented on the architecture page rather than counted as validated fusion evidence here.

Comparative evidence

Performance against unimodal baselines

The fusion result is only informative if the strongest unimodal comparator stays visible. The table therefore reports the fused model alongside the single-input alternatives it is meant to improve upon.

Configuration | Accuracy | Interpretation
Omics only (VAE) | 92.8% | Strongest unimodal baseline in the reported comparison.
Vision PCA only | 87.4% | Morphological signal contributes useful information, but not enough to match omics alone.
Spatial features only | 72.1% | Weakest single-input result in the validated set.
Fusion (omics + vision + spatial) | 93.0% | Small positive lift of 0.2 percentage points over the best unimodal model.
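The lift figures follow directly from the accuracies reported above. A short check, using only those four numbers (the variable names are illustrative):

```python
# Accuracies (percent) as reported in the comparison above.
unimodal = {
    "Omics only (VAE)": 92.8,
    "Vision PCA only": 87.4,
    "Spatial features only": 72.1,
}
fusion = 93.0

# Strongest single-input comparator and the lift over each baseline,
# expressed in percentage points.
best_name = max(unimodal, key=unimodal.get)
lift_pp = {name: round(fusion - acc, 1) for name, acc in unimodal.items()}

print(best_name, lift_pp[best_name])  # Omics only (VAE) 0.2
```

The gap to the weaker baselines is much larger (5.6 and 20.9 percentage points), which is why the comparison is anchored on the strongest unimodal model rather than on the average.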
Interpretation

What the numbers justify

On the reported tissue-region task, fusion outperforms the best unimodal baseline by a narrow margin. That supports a complementary-signal claim, but not a broad statement that fusion is categorically superior across tasks or cohorts.

Evaluation posture

Holdout accuracy of 93.0% and 5-fold cross-validation accuracy of 92.8% ± 0.7 percentage points are consistent with the leakage-controlled reading presented elsewhere on the site. They support stability within the reported experiment, but they do not independently establish robustness under new cohort, site, or modality shifts.
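A quick consistency check on these figures, assuming the ± 0.7 is a symmetric spread in percentage points around the cross-validation mean (the page does not say whether it is a standard deviation or another interval):

```python
# Reported figures, in percent.
holdout, cv_mean, cv_spread = 93.0, 92.8, 0.7
best_unimodal = 92.8

# Assumption: the spread is symmetric around the CV mean.
low, high = round(cv_mean - cv_spread, 1), round(cv_mean + cv_spread, 1)
lift = round(holdout - best_unimodal, 1)

holdout_in_band = low <= holdout <= high
lift_below_spread = lift < cv_spread
print((low, high), holdout_in_band, lift_below_spread)  # (92.1, 93.5) True True
```

Under that assumption the holdout figure sits inside the cross-validation band, and the 0.2-point lift is smaller than the fold-to-fold spread. Both observations fit the bounded reading above; neither says anything about behaviour on new cohorts or sites.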

What is inferred

The defensible inference is limited: image-derived features and spatial covariates add some discriminative value beyond omics alone in this reported configuration. The magnitude of that gain remains modest.

Future hardening

Broader multi-modal work remains prospective

The roadmap is retained only to mark the boundary between the validated fusion result and the wider research agenda. These items indicate where the framework may expand, not what the current page treats as demonstrated.

Workstream | Examples | Status
Cross-modal alignment | InfoNCE, CLIP-style contrastive alignment, Deep CCA | Planned research work
Uncertainty and calibration | Evidential models, conformal prediction, richer confidence reporting | Planned hardening
Richer fusion operators | Tensor fusion, mixture-of-experts, equivariant graph integration | Not part of the validated result
Domain-adaptive text and reconstruction objectives | Biomedical NLP augmentation, masked autoencoder-style objectives | Prospective extension surface
Validated scope

What is claimed here

One attention-based fusion configuration using three inputs shows a small measured improvement over the strongest unimodal baseline on the reported tissue-region task.

Not claimed

What this page does not claim

This page does not claim general multimodal superiority, clinical utility, production readiness, or transfer robustness across new cohorts, sites, or modality mixes.

Framework consistency

How this fits the current build

The wider SNPTX framework still includes broader modality and extension surfaces. This page stays consistent with that build by treating only the narrower validated fusion path as evidence, while leaving the larger framework story to the Architecture and Methodology pages.

Evidence path

For the broader execution structure see Architecture. For evaluation controls and boundary-setting around reported metrics see Validation. For run semantics, reproducibility posture, and leakage discipline see Methodology.