Add per-label integration for sub mode #294
Reuse the existing integration workflow after splitting base AnnData labels so sub-mode runs can create per-label embeddings in the combined output.
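To illustrate the split step, here is a minimal pure-Python sketch that groups observation indices by label so that each group could be integrated separately. The function name `split_by_label` and the plain-list input are hypothetical stand-ins; the pipeline itself splits AnnData objects inside Nextflow and does not use this code.

```python
from collections import defaultdict


def split_by_label(labels):
    """Group observation indices by label value, keeping first-seen order."""
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    return dict(groups)


# Hypothetical label column for five cells (illustrative values only).
subsets = split_by_label(["T", "B", "T", "NK", "B"])
```

Each key would correspond to one per-label integration run, and the index lists say which observations belong to it.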
Update INTEGRATE subworkflow snapshots for the integration meta field and add pipeline test snapshots including versions.yml for nf-core lint.
Keep meta.id as the sample subset while meta.integration carries the method, so publish prefixes and scib filtering stay correct for per-label runs.
Avoid coupling published filenames to meta.id when only the subset label should distinguish per-label runs.
…elpers. Extract subset expansion into a dedicated subworkflow so CLUSTER can run graph, UMAP, Leiden, and entropy as a linear pipeline without UMAP id workarounds or duplicated plan matching in the parent workflow.
Fall back to meta.id in publish prefixes so isolated module nf-tests keep stable output names after the per-label integration meta changes.
Use meta.id as fallback in ADATA_MERGEEMBEDDINGS publishDir so module nf-tests do not write under a null integration key.
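The fallback described in the two commits above can be sketched in Python as follows. The real logic lives in Nextflow configuration, so the function name `publish_prefix` and the dict-shaped `meta` are assumptions made for illustration only.

```python
def publish_prefix(meta):
    """Use the integration key when present, otherwise fall back to the id.

    Assumption: `meta` is a dict with a required "id" and an optional
    "integration" entry, mirroring the meta map described in this PR.
    """
    return meta.get("integration") or meta["id"]
```

With this shape, an isolated module test that only sets `id` still resolves to a non-null prefix.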
ADATA_MERGEEMBEDDINGS looked up base obsm keys from meta.id, which is still "merged" in extension mode after the integration meta refactor, causing KeyError X_merged.
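The failure mode can be reproduced with a small Python sketch. The `obsm` contents, the method name used as a value, and the helper `obsm_key` are hypothetical, and deriving the key from `meta.integration` is an assumption about the fix, not something this PR text states.

```python
def obsm_key(meta, use_integration=True):
    """Build an obsm key of the form X_<name> (hypothetical helper)."""
    name = meta.get("integration") if use_integration else None
    return f"X_{name or meta['id']}"


# Illustrative values: the embedding is stored under the method name,
# while meta.id is still "merged" in extension mode.
obsm = {"X_scimilarity": [[0.1, 0.2]]}
meta = {"id": "merged", "integration": "scimilarity"}

try:
    obsm[obsm_key(meta, use_integration=False)]  # looks up "X_merged"
    failure = None
except KeyError as err:
    failure = err.args[0]
```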
Version capture:
Per-label integration sets ext.prefix to integration-subset, so the embedding pickle must match X_${prefix}.pkl rather than meta.id alone.
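A minimal Python sketch of the naming rule, assuming the prefix joins `meta.integration` and the subset held in `meta.id` with a hyphen. The function name and the example values are hypothetical; the actual prefix is set through `ext.prefix` in the Nextflow config.

```python
def embedding_pickle_name(meta):
    """Return the expected pickle name X_<prefix>.pkl for a meta map.

    Assumption: per-label runs carry an "integration" entry and use
    "<integration>-<id>" as prefix; other runs use the id alone.
    """
    integration = meta.get("integration")
    prefix = f"{integration}-{meta['id']}" if integration else meta["id"]
    return f"X_{prefix}.pkl"
```

Matching on the full prefix keeps per-label outputs of the same method from colliding on one filename.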
Version capture now uses topic: versions at pipeline level; remove the legacy ch_versions channel and drop versions from subworkflow outputs.
Drop the forwarded versions emit and add a per-label scimilarity stub regression test for integrate_per_label runs.
Summary
--integrate_per_label for base-adata-only sub mode to split by base_label_col and reuse the existing INTEGRATE subworkflow per label.
nft-anndata.
Test plan
nftu subworkflows/local/sub_integrate/tests/main.nf.test
nft subworkflows/local/sub_integrate/tests/main.nf.test
nft tests/main_pipeline_sub_integrate_per_label.nf.test