Evidence boundaries and near-term priorities

This page bounds the rest of the document. Demonstrated results live on Results and Evidence; this page records where those results stop.


What is implemented, what is partial.

The first table is the page in compressed form: each dimension states what the current system actually does and where the platform claim stops. Subsequent sections expand the partial column into specific limitations, declared out-of-scope items, and the technical work that addresses them.

Determinism

Implemented: seed controls and deterministic algorithm flags configured across pipelines.
Partial / roadmap boundary: no universal cross-hardware byte-identity claim; determinism is asserted within a fixed environment.

Versioning

Implemented: MLflow tracking and report-level provenance available for every recorded run.
Partial / roadmap boundary: DVC integration with the primary pipeline is partial; not all artifacts are content-addressed end-to-end.

Feedback

Implemented: autonomous experimentation loop operational with surrogate, acquisition, and stopping rules.
Partial / roadmap boundary: framework-wide feedback artifact contract is pending; today the loop is wired per-campaign rather than as a shared interface.

Deployment

Implemented: pilot-ready research workflow with evaluation package, audit trail, and customer-hosted serving path.
Partial / roadmap boundary: no claim of clinical or commercial production deployment; no IRB-governed prospective use to date.
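The determinism row can be made concrete with a small sketch: pin every RNG the pipeline touches to one seed, which makes runs repeatable within a fixed environment but promises nothing across hardware or BLAS builds. The function names here are illustrative, not the platform's API; the torch-specific flags are shown only as comments so the sketch stays NumPy-only.

```python
import os
import random

import numpy as np


def fix_seeds(seed: int = 42) -> None:
    """Pin every RNG the pipeline touches to one seed.

    Within a fixed environment this makes runs repeatable; it does NOT
    guarantee byte-identity across different hardware or library builds,
    which is exactly the boundary stated above.
    """
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    # A torch-based pipeline would typically also set, e.g.:
    #   torch.manual_seed(seed)
    #   torch.use_deterministic_algorithms(True)
    #   torch.backends.cudnn.benchmark = False


def run_once(seed: int) -> np.ndarray:
    """Stand-in for one pipeline run: seed, then draw some 'results'."""
    fix_seeds(seed)
    return np.random.normal(size=8)
```

Two runs with the same seed in the same environment produce identical draws; changing the seed changes them.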
Interpretation

The implemented column reflects behavior present in the current pipelines; the partial column states where that behavior stops. Items in the partial column are not failures — they are declared boundaries that prevent over-reading the platform claim.
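The feedback row describes a per-campaign loop shaped as surrogate, acquisition, stopping rule. A minimal illustration of that shape, and nothing more: this is not the platform's code, the surrogate here is a plain quadratic fit, the acquisition is greedy minimization over unevaluated candidates, and the stopping rule is a no-improvement patience counter.

```python
import numpy as np


def campaign_loop(objective, candidates, n_init=3, patience=3, tol=1e-3, seed=0):
    """Per-campaign loop: surrogate -> acquisition -> stopping rule."""
    rng = np.random.default_rng(seed)
    xs = list(rng.choice(candidates, size=n_init, replace=False))
    ys = [objective(x) for x in xs]
    stale = 0
    while stale < patience:
        # Surrogate: quadratic fit to everything observed so far.
        coeffs = np.polyfit(xs, ys, deg=2)
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break
        # Acquisition: greedily pick the unevaluated candidate the
        # surrogate predicts to be best (no exploration term here).
        x_next = min(remaining, key=lambda c: np.polyval(coeffs, c))
        y_next = objective(x_next)
        # Stopping rule: count rounds without meaningful improvement.
        stale = stale + 1 if y_next > min(ys) - tol else 0
        xs.append(x_next)
        ys.append(y_next)
    best = int(np.argmin(ys))
    return xs[best], ys[best]
```

On a toy objective the loop converges to the minimizer and then halts once `patience` consecutive proposals fail to improve.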

Known limitations

Each limitation is tagged by the constraint that produces it: Data-bound (training-data availability), Scope-bound (validation scope), or Unvalidated (behavior at larger scale).

Clinical text (Data-bound)

61.0% accuracy

Limitation: ClinicalBERT fine-tuned for 5-class specialty classification on ~3,500 training samples; accuracy plateaus at 61% as a function of dataset size. Implication: further gains require larger labeled clinical corpora rather than architectural change.

Fusion lift (Scope-bound)

+0.2 pp over best unimodal

Limitation: three-modality fusion reaches 93.0% versus 92.8% for the best unimodal model (omics). Implication: cross-modal benefit is modest in this regime because omics alone captures most of the available signal; higher-dimensional morphological features are the most likely route to a larger lift.

Fusion arity (Scope-bound)

Limitation: three-modality fusion (omics + imaging + text) is empirically validated; four-or-more-modality combinations are architecturally supported but untested. Implication: attention-based fusion scales in principle, but empirical validation is pending.

Dataset scale (Unvalidated)

Limitation: the largest evaluated dataset is PathMNIST (107K samples). Implication: scaling behavior at biobank scale (UK Biobank, All of Us) is not validated; throughput and statistical behavior at >10^6 samples are inferred, not measured.

Single institution (Scope-bound)

Limitation: all experiments were executed in one EC2 environment. Implication: reproducibility is asserted within-environment; cross-institution and cross-hardware reproducibility require an external pilot to verify.

Intelligence layer (Unvalidated)

5/15 modules trained

Limitation: five intelligence-layer modules are trained and reported; ten are specified but not trained. Implication: the remaining ten are grounded in literature interfaces but lack empirical validation in this codebase; treat them as design surface, not as evidence.
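On fusion arity: the reason four-or-more-modality combinations are "architecturally supported but untested" is that attention-based fusion treats each modality embedding as one more row in the attention, so adding a modality changes the data, not the architecture. A toy sketch under assumed names (a single learned score vector `w`; the platform's actual fusion head may differ):

```python
import numpy as np


def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def attention_fusion(embeddings, w):
    """Fuse N modality embeddings (each shape [d]) with attention weights.

    Arity is not hard-coded: a fourth modality is simply one more row in
    `embeddings`. The document validates this empirically only at N = 3.
    """
    E = np.stack(embeddings)   # [N, d], one row per modality
    scores = E @ w             # [N], one relevance score per modality
    alpha = softmax(scores)    # attention weights, sum to 1
    return alpha @ E, alpha    # fused embedding [d], weights [N]
```

The same call works unchanged for three or four modalities, which is the sense in which the architecture "supports" higher arity while the evidence does not yet.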

Explicitly out of scope

Claims this document does not make. They are listed so the rest of the site can be read without ambiguity about what SNPTX is for today.

Clinical use

No diagnostic or treatment claim

No model on this site is presented as diagnostic, prognostic, or treatment-guiding. Outputs are research artifacts; clinical use would require regulatory pathway, prospective evidence, and clinical workflow integration that are not in scope here.

Production deployment

No commercial production claim

The deployment surface is pilot-scoped: customer-hosted, audit-trailed, and bounded to a defined evaluation package. SNPTX does not claim continuous production operation, on-call SLAs, or revenue-bearing service.

Cross-site reproducibility

No multi-institution guarantee

Reproducibility guarantees are validated within a single EC2 environment. Cross-institution byte-identity is not asserted; cross-institution behavioral reproducibility requires a pilot run to verify.

Scale

No biobank-scale benchmark

No experiment in this document is run at >10^6 samples. Statements about behavior at biobank scale are extrapolations from the validated regime, not measurements.

Autonomy

No unsupervised decisioning

The autonomous experimentation loop selects the next experiment under declared stopping rules and operator review. It does not autonomously act on clinical, financial, or external decisions.

Causal inference

No causal claim from current modules

Reported metrics are predictive. Treatment-effect estimation and counterfactual reasoning are listed as a near-term priority and are not present in the current evidence.

Near-term technical priorities

Each priority extends from a documented gap above. Listed as work to be done, not as work that has been done.

Imaging

Whole-slide imaging

Multiple-instance learning with attention pooling over tile embeddings; integration with domain foundation models (UNI, CONCH) through the extension boundary.
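The attention-pooling step named above can be sketched in a few lines. This is the standard attention-based MIL pooling shape (in the style of Ilse et al.), not the platform's implementation; `V` and `w` stand for the learned attention parameters, and the tile embeddings would come from an upstream encoder such as the UNI/CONCH integrations the roadmap mentions.

```python
import numpy as np


def mil_attention_pool(tiles, V, w):
    """Attention-based MIL pooling over tile embeddings.

    tiles: [n_tiles, d] embeddings of tiles cut from one whole slide.
    Returns a single slide-level embedding plus per-tile attention
    weights, which double as a crude saliency map over the slide.
    """
    scores = np.tanh(tiles @ V) @ w            # [n_tiles] attention logits
    scores = scores - scores.max()             # stabilize the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ tiles, alpha                # slide embedding [d], weights
```

A slide with 50 tiles of 16-dim embeddings pools down to one 16-dim vector, with the weights summing to 1 over tiles.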

Foundation models

Domain-pretrained adapter integration

Adapter-based fine-tuning of GatorTron, BioGPT, and ESM-2 attached through the mediated extension runner rather than embedded in the core DAG.
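The adapter shape referred to here is the usual bottleneck: down-project, nonlinearity, up-project, residual add, with only the two small projections trained and the backbone frozen. A minimal sketch, assuming a plain ReLU bottleneck (the actual adapter variant chosen for GatorTron/BioGPT/ESM-2 is not specified in this document):

```python
import numpy as np


def adapter(h, W_down, W_up):
    """Bottleneck adapter: down-project, ReLU, up-project, residual.

    h:      [d]    hidden state from a frozen backbone layer
    W_down: [d, r] trained down-projection, r << d
    W_up:   [r, d] trained up-projection

    Only W_down and W_up are updated during fine-tuning; the residual
    connection keeps the frozen backbone's features intact.
    """
    z = np.maximum(h @ W_down, 0.0)   # [d] -> [r] bottleneck
    return h + z @ W_up               # residual add back to [d]
```

With `W_up` initialized to zeros the adapter is an identity at the start of training, which is the usual reason this layout is safe to attach to a pretrained model.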

Graph scale

Large-scale graph reasoning

Scaling the GCN pipeline to knowledge graphs with >10^6 nodes; characterizing message-passing throughput and memory at biobank scale before claiming behavior there.
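One standard route to bounding message-passing memory at that scale is neighbor sampling (GraphSAGE-style); the document does not commit to this method, so treat the sketch as an illustration of why per-node fanout caps cost at O(n * fanout * d) rather than O(edges * d).

```python
import numpy as np


def sampled_gcn_layer(x, neighbors, W, fanout, rng):
    """One graph-convolution layer with neighbor sampling.

    x:         [n, d] node features
    neighbors: dict node -> list of neighbor ids
    W:         [d, d_out] layer weights
    fanout:    max neighbors aggregated per node, which caps the
               per-layer cost independent of total edge count
    """
    n, _ = x.shape
    out = np.zeros((n, W.shape[1]))
    for v in range(n):
        nbrs = neighbors[v]
        if len(nbrs) > fanout:
            # Sample a fixed-size neighborhood instead of using all edges.
            nbrs = rng.choice(nbrs, size=fanout, replace=False)
        msgs = x[list(nbrs) + [v]].mean(axis=0)   # aggregate messages + self
        out[v] = np.maximum(msgs @ W, 0.0)        # transform + ReLU
    return out
```

On a real >10^6-node graph the Python loop would of course be replaced by batched sparse ops; the point of the sketch is only the fanout cap.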

Transfer

Multi-task and continual learning

Shared representations across related tasks via the embedding registry; continual updates across campaigns without retraining from scratch.
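The multi-task half of this item amounts to hard parameter sharing: one trunk producing the shared representation, one small head per task. A sketch with assumed names (the embedding registry and continual-update machinery are not modeled here):

```python
import numpy as np


def multitask_forward(x, W_shared, heads):
    """Hard parameter sharing: one shared trunk, one linear head per task.

    x:        [batch, d_in] inputs
    W_shared: [d_in, d_h]   trunk weights producing the shared embedding
    heads:    dict task name -> [d_h, d_task] head weights
    """
    h = np.maximum(x @ W_shared, 0.0)                  # shared representation
    return {task: h @ W for task, W in heads.items()}  # per-task outputs
```

Adding a related task means registering one more head against the same trunk, which is the mechanism by which representations transfer across tasks.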

Prospective

Cross-institutional pilot

External deployment with artifact-hash comparison across sites and an IRB-governed cohort pilot to convert reproducibility from within-environment to cross-institution.
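The artifact-hash comparison named here is mechanically simple and can be done with the standard library alone: each site hashes every artifact under its run directory into a manifest, and the manifests are diffed. A sketch, assuming plain SHA-256 over file bytes (the platform's actual manifest format is not specified):

```python
import hashlib
from pathlib import Path


def manifest(run_dir: str) -> dict:
    """SHA-256 every artifact under a run directory, keyed by relative path."""
    root = Path(run_dir)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_sites(site_a: dict, site_b: dict) -> list:
    """Artifacts whose hashes disagree, or that exist at only one site."""
    keys = set(site_a) | set(site_b)
    return sorted(k for k in keys if site_a.get(k) != site_b.get(k))
```

An empty diff is the byte-identity result the pilot would aim to demonstrate; any listed path localizes where cross-site reproducibility breaks down.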

Causal

Treatment-effect estimation

Extending the intelligence layer with treatment-effect estimators and counterfactual reasoning so causal questions sit on the same execution spine as predictive ones.
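To make the gap between predictive and causal concrete: one common estimator family for this extension is the T-learner, which fits a separate outcome model per treatment arm and reads the effect off the gap. The document does not commit to a specific estimator, so this is purely illustrative, with linear outcome models for brevity.

```python
import numpy as np


def t_learner_ate(X, t, y):
    """T-learner sketch: one linear outcome model per arm.

    X: [n] covariate, t: [n] binary treatment indicator, y: [n] outcome.
    Fits E[y | x, t=1] and E[y | x, t=0] separately, then estimates the
    per-sample effect as the difference of the two predictions.
    """
    Xb = np.column_stack([np.ones(len(X)), X])   # add intercept column
    beta1, *_ = np.linalg.lstsq(Xb[t == 1], y[t == 1], rcond=None)
    beta0, *_ = np.linalg.lstsq(Xb[t == 0], y[t == 0], rcond=None)
    cate = Xb @ beta1 - Xb @ beta0               # per-sample treatment effect
    return cate.mean(), cate                     # ATE, CATE estimates
```

On synthetic data with a known additive effect the estimator recovers it exactly, which is the kind of check a causal module would need before sitting on the same execution spine as the predictive ones.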

Continuation logic

The priorities above are extensions of present gaps rather than aspirational research themes. Each one names the gap it addresses; progress against any item should be reflected back into the boundary table on this page rather than added quietly elsewhere.