Add per-label integration for sub mode by nictru · Pull Request #294 · nf-core/scdownstream

nictru · 2026-06-07T10:37:15Z

Summary

Add `--integrate_per_label` for sub mode when only a base AnnData is provided: the base object is split by `base_label_col`, and the existing INTEGRATE subworkflow is reused once per label.
Preserve per-label metadata through integration and clustering so outputs can be merged back into a single finalized AnnData.
Add focused nf-test coverage for the new subworkflow and an end-to-end pipeline case using nft-anndata.
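A minimal sketch of the split, integrate, merge-back idea, written in Python with only the standard library. It is not the pipeline's implementation. Cells are modelled as `(cell_id, label)` pairs, and `integrate_subset` is a hypothetical stand-in for the INTEGRATE subworkflow. The `X_<method>-<label>` key format is an assumption modelled on the `ext.prefix` convention used in this PR.

```python
from collections import defaultdict


def split_by_label(cells):
    """Group cell ids by their value in the label column (base_label_col)."""
    groups = defaultdict(list)
    for cell_id, label in cells:
        groups[label].append(cell_id)
    return dict(groups)


def integrate_subset(cell_ids):
    """Hypothetical stand-in for INTEGRATE: a toy 1-D embedding per cell."""
    return {cell_id: [float(i)] for i, cell_id in enumerate(cell_ids)}


def integrate_per_label(cells, method="scvi"):
    """Integrate each label separately and merge the results into one mapping.

    Keys carry both the method and the label (assumed format), so results
    from different subsets never overwrite each other when merged back.
    """
    merged = {}
    for label, cell_ids in split_by_label(cells).items():
        merged[f"X_{method}-{label}"] = integrate_subset(cell_ids)
    return merged


if __name__ == "__main__":
    cells = [("c1", "T"), ("c2", "B"), ("c3", "T")]
    print(integrate_per_label(cells))
```

A column split assigns each cell to exactly one label, so the per-label results cover disjoint sets of cells. Keeping both the method and the label in the key is what lets them be combined into one object without collisions.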

Test plan

nftu subworkflows/local/sub_integrate/tests/main.nf.test
nft subworkflows/local/sub_integrate/tests/main.nf.test
nft tests/main_pipeline_sub_integrate_per_label.nf.test

github-actions · 2026-06-07T10:38:47Z

`nf-core pipelines lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit f073f21

✅ 294 tests passed
❔ 1 test fixed
❗ 15 tests had warnings

Details

❗ Test warnings:

files_exist - File not found: conf/igenomes.config
files_exist - File not found: conf/igenomes_ignored.config
readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
pipeline_todos - TODO string in nextflow.config: Optionally, you can add a pipeline-specific nf-core config at https://github.com/nf-core/configs
pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
pipeline_todos - TODO string in CONTRIBUTING.md: Add any pipeline specific contribution guidelines here, such as coding styles, procedures, checklists etc.
pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here
pipeline_todos - TODO string in base.config: Check the defaults for all processes
pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required

❔ Tests fixed:

rocrate_readme_sync - Mismatch fixed: RO-Crate description updated from README.md.

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/nf-test.yml
files_exist - File found: .github/actions/get-shards/action.yml
files_exist - File found: .github/actions/nf-test/action.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-scdownstream_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/CONTRIBUTING.md
files_exist - File found: docs/images/nf-core-scdownstream_logo_light.png
files_exist - File found: docs/images/nf-core-scdownstream_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: nf-test.config
files_exist - File found: tests/default.nf.test
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: modules.json
files_exist - File found: ro-crate-metadata.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-scdownstream_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowScdownstream.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Found nf-schema plugin
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config variable (correctly) not found: params.max_cpus
nextflow_config - Config variable (correctly) not found: params.max_memory
nextflow_config - Config variable (correctly) not found: params.max_time
nextflow_config - Config variable (correctly) not found: params.validationFailUnrecognisedParams
nextflow_config - Config variable (correctly) not found: params.validationLenientMode
nextflow_config - Config variable (correctly) not found: params.validationSchemaIgnoreParams
nextflow_config - Config variable (correctly) not found: params.validationShowHiddenParams
nextflow_config - Config variable (correctly) not found: validation.failUnrecognisedParams
nextflow_config - Config variable (correctly) not found: validation.failUnrecognisedHeaders
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 0.0.1dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.duplicate_var_resolution= sum
nextflow_config - Config default value correct: params.species= human
nextflow_config - Config default value correct: params.cell_cycle_scoring= true
nextflow_config - Config default value correct: params.ambient_correction= decontx
nextflow_config - Config default value correct: params.ambient_corrected_integration= false
nextflow_config - Config default value correct: params.doublet_detection= scrublet
nextflow_config - Config default value correct: params.doublet_detection_threshold= 1
nextflow_config - Config default value correct: params.cellbender_epochs= 150
nextflow_config - Config default value correct: params.integration_methods= scvi
nextflow_config - Config default value correct: params.integration_hvgs= 0
nextflow_config - Config default value correct: params.scimilarity_model= https://zenodo.org/records/10685499/files/model_v1.1.tar.gz
nextflow_config - Config default value correct: params.base_label_col= label
nextflow_config - Config default value correct: params.base_condition_col= condition
nextflow_config - Config default value correct: params.integrate_per_label= false
nextflow_config - Config default value correct: params.clustering_resolutions= 0.5,1.0
nextflow_config - Config default value correct: params.cluster_global= true
nextflow_config - Config default value correct: params.memory_scale= 1
nextflow_config - Config default value correct: params.scib= false
nextflow_config - Config default value correct: params.rankgenesgroups_method= wilcoxon
nextflow_config - Config default value correct: params.scvi_n_latent= 30
nextflow_config - Config default value correct: params.scvi_n_hidden= 128
nextflow_config - Config default value correct: params.scvi_n_layers= 2
nextflow_config - Config default value correct: params.scvi_dispersion= gene
nextflow_config - Config default value correct: params.scvi_gene_likelihood= zinb
nextflow_config - Config default value correct: params.pseudobulk_groupby_labels= batch
nextflow_config - Config default value correct: params.pseudobulk_min_num_cells= 5
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nictru/test-datasets/97addfb0946c0e51dbb70ee1391142d12e70f085/
nf_test_content - 'tests/main_pipeline_build.nf.test' contains outdir parameter
nf_test_content - 'tests/main_pipeline_build.nf.test' snapshots a 'versions.yml' file
nf_test_content - 'tests/main_pipeline_build.nf.test' snapshots a 'versions.yml' file
nf_test_content - 'tests/main_pipeline_qc.nf.test' contains outdir parameter
nf_test_content - 'tests/main_pipeline_qc.nf.test' snapshots a 'versions.yml' file
nf_test_content - 'tests/main_pipeline_qc.nf.test' snapshots a 'versions.yml' file
nf_test_content - 'tests/main_pipeline_build_minimal.nf.test' contains outdir parameter
nf_test_content - 'tests/main_pipeline_build_minimal.nf.test' snapshots a 'versions.yml' file
nf_test_content - 'tests/main_pipeline_sub.nf.test' contains outdir parameter
nf_test_content - 'tests/main_pipeline_sub.nf.test' snapshots a 'versions.yml' file
nf_test_content - 'tests/main_pipeline_sub.nf.test' snapshots a 'versions.yml' file
nf_test_content - 'tests/default.nf.test' contains outdir parameter
nf_test_content - 'tests/default.nf.test' snapshots a 'versions.yml' file
nf_test_content - 'tests/default.nf.test' snapshots a 'versions.yml' file
nf_test_content - 'tests/main_pipeline_extend.nf.test' contains outdir parameter
nf_test_content - 'tests/main_pipeline_extend.nf.test' snapshots a 'versions.yml' file
nf_test_content - 'tests/main_pipeline_extend.nf.test' snapshots a 'versions.yml' file
nf_test_content - 'tests/main_pipeline_reference_mapping.nf.test' contains outdir parameter
nf_test_content - 'tests/main_pipeline_reference_mapping.nf.test' snapshots a 'versions.yml' file
nf_test_content - 'tests/main_pipeline_reference_mapping.nf.test' snapshots a 'versions.yml' file
nf_test_content - 'tests/main_pipeline_sub_integrate_per_label.nf.test' contains outdir parameter
nf_test_content - 'tests/main_pipeline_sub_integrate_per_label.nf.test' snapshots a 'versions.yml' file
nf_test_content - 'tests/nextflow.config' contains modules_testdata_base_path
nf_test_content - 'tests/nextflow.config' contains pipelines_testdata_base_path
nf_test_content - 'nf-test.config' sets a testsDir
nf_test_content - 'nf-test.config' sets a workDir
nf_test_content - 'nf-test.config' sets a configFile
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-scdownstream_logo_light.png matches the template
files_unchanged - docs/images/nf-core-scdownstream_logo_light.png matches the template
files_unchanged - docs/images/nf-core-scdownstream_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_nf_test - '.github/workflows/nf-test.yml' is triggered on expected events
actions_nf_test - '.github/workflows/nf-test.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 25.10.4, Config: 25.10.4
readme - README nf-core template version badge found.
pipeline_if_empty_null - No ifEmpty(null) strings found
plugin_includes - No wrong validation plugin imports have been found
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (0 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: nf-test.yml
actions_schema_validation - Workflow validation passed: template-version-comment.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: fix_linting.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains a matching 'report_comment'.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
local_component_structure - local subworkflows directory structure is correct 'subworkflows/local/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
modules_config - conf/modules.config found and not ignored.
modules_config - GET_ found in conf/modules.config and Nextflow scripts.
modules_config - COLLECT_SIZES found in conf/modules.config and Nextflow scripts.
modules_config - QC_RAW found in conf/modules.config and Nextflow scripts.
modules_config - QC_FILTERED found in conf/modules.config and Nextflow scripts.
modules_config - CELDA_DECONTX found in conf/modules.config and Nextflow scripts.
modules_config - SOUPX found in conf/modules.config and Nextflow scripts.
modules_config - SCANPY_FILTER found in conf/modules.config and Nextflow scripts.
modules_config - SCVITOOLS_SOLO found in conf/modules.config and Nextflow scripts.
modules_config - SCANPY_SCRUBLET found in conf/modules.config and Nextflow scripts.
modules_config - DOUBLETDETECTION found in conf/modules.config and Nextflow scripts.
modules_config - SCDBLFINDER found in conf/modules.config and Nextflow scripts.
modules_config - DOUBLETREMOVAL found in conf/modules.config and Nextflow scripts.
modules_config - SCANPY_CELLCYCLE found in conf/modules.config and Nextflow scripts.
modules_config - FINALIZE_QC_ANNDATAS found in conf/modules.config and Nextflow scripts.
modules_config - QC_REPORT found in conf/modules.config and Nextflow scripts.
modules_config - MYGENE found in conf/modules.config and Nextflow scripts.
modules_config - SET_INDEX found in conf/modules.config and Nextflow scripts.
modules_config - HUGOUNIFIER_GET found in conf/modules.config and Nextflow scripts.
modules_config - HUGOUNIFIER_APPLY found in conf/modules.config and Nextflow scripts.
modules_config - ADATA_UNIFY found in conf/modules.config and Nextflow scripts.
modules_config - ADATA_MERGE found in conf/modules.config and Nextflow scripts.
modules_config - ADATA_UPSETGENES found in conf/modules.config and Nextflow scripts.
modules_config - SCANPY_HVGS found in conf/modules.config and Nextflow scripts.
modules_config - SCVITOOLS_SCVI found in conf/modules.config and Nextflow scripts.
modules_config - SCVITOOLS_SCANVI found in conf/modules.config and Nextflow scripts.
modules_config - SYMPHONY_HARMONYINTEGRATE found in conf/modules.config and Nextflow scripts.
modules_config - SYMPHONY_MAPEMBEDDING found in conf/modules.config and Nextflow scripts.
modules_config - SCANPY_BBKNN found in conf/modules.config and Nextflow scripts.
modules_config - SEURAT_INTEGRATION found in conf/modules.config and Nextflow scripts.
modules_config - SCANPY_COMBAT found in conf/modules.config and Nextflow scripts.
modules_config - SCANPY_PCA found in conf/modules.config and Nextflow scripts.
modules_config - SCARCHES_EXPIMAP found in conf/modules.config and Nextflow scripts.
modules_config - SCIMILARITY_EMBED found in conf/modules.config and Nextflow scripts.
modules_config - SCIMILARITY_ANNOTATE found in conf/modules.config and Nextflow scripts.
modules_config - SCIBMETRICS_BENCHMARK found in conf/modules.config and Nextflow scripts.
modules_config - ADATA_MERGEEMBEDDINGS found in conf/modules.config and Nextflow scripts.
modules_config - CELLDEX_FETCHREFERENCE found in conf/modules.config and Nextflow scripts.
modules_config - CELLTYPES_SINGLER found in conf/modules.config and Nextflow scripts.
modules_config - CELLTYPES_CELLTYPIST found in conf/modules.config and Nextflow scripts.
modules_config - CYTETYPE found in conf/modules.config and Nextflow scripts.
modules_config - NEIGHBORS found in conf/modules.config and Nextflow scripts.
modules_config - LEIDEN found in conf/modules.config and Nextflow scripts.
modules_config - ENTROPY found in conf/modules.config and Nextflow scripts.
modules_config - UMAP found in conf/modules.config and Nextflow scripts.
modules_config - SCANPY_PAGA found in conf/modules.config and Nextflow scripts.
modules_config - LIANA_RANKAGGREGATE found in conf/modules.config and Nextflow scripts.
modules_config - SCANPY_RANKGENESGROUPS found in conf/modules.config and Nextflow scripts.
modules_config - MULTIQC found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 4.0.2
rocrate_readme_sync - RO-Crate description matches the README.md.
container_configs - conf/containers_conda_lock_files_amd64.config is up to date
container_configs - conf/containers_conda_lock_files_arm64.config is up to date
container_configs - conf/containers_docker_amd64.config is up to date
container_configs - conf/containers_docker_arm64.config is up to date
container_configs - conf/containers_singularity_https_amd64.config is up to date
container_configs - conf/containers_singularity_https_arm64.config is up to date
container_configs - conf/containers_singularity_oras_amd64.config is up to date
container_configs - conf/containers_singularity_oras_arm64.config is up to date

Run details

nf-core/tools version 4.0.2
Run at 2026-06-22 07:27:18

…elpers. Extract subset expansion into a dedicated subworkflow so CLUSTER can run graph, UMAP, Leiden, and entropy as a linear pipeline without UMAP id workarounds or duplicated plan matching in the parent workflow.

nictru · 2026-06-21T19:37:33Z

Version capture: `INTEGRATE` / `SUB_INTEGRATE` still mix `.out.versions`

While testing per-label sub integration (integrate_per_label=true, e.g. splitting on coarse_annotation), the pipeline fails before SUB_INTEGRATE completes:

ERROR ~ No such variable: Exception evaluating property 'versions' for nextflow.script.ChannelOut,
Reason: groovy.lang.MissingPropertyException: No such property: versions for class: groovyx.gpars.dataflow.DataflowBroadcast

 -- Check script 'subworkflows/local/integrate/main.nf' at line: 238

Root cause

Module outputs now publish versions via topic: versions (collected in workflows/scdownstream.nf with channel.topic("versions")), but INTEGRATE still uses the old pattern:

`ch_versions = channel.empty()` plus `ch_versions.mix(<module>.out.versions)` for each integration method
`ch_versions.mix(SCIMILARITY.out.versions)` at line 238: the SCIMILARITY subworkflow does not emit `versions`, so this fails whenever scimilarity is listed in integration_methods

The same failure likely occurs anywhere a subworkflow mixes `.out.versions` from a child workflow that no longer exposes that emit.

Suggested fix

Align INTEGRATE (and SUB_INTEGRATE) with other subworkflows (CLUSTER, COMBINE, PER_GROUP, etc.):

Remove all ch_versions mixing in subworkflows/local/integrate/main.nf
Drop the versions emit from INTEGRATE
Drop versions = INTEGRATE.out.versions from subworkflows/local/sub_integrate/main.nf
Update subworkflows/local/integrate/tests/main.nf.test snapshots: remove workflow.out.versions from the assertions (versions are covered at pipeline level via the topic)
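The difference between the two version-capture patterns can be sketched in Python. This is a toy model, not Nextflow's implementation. `Topics` is a hypothetical stand-in for `channel.topic(...)`, module outputs are `SimpleNamespace` objects, and the version strings are placeholders. Reading `.versions` from an output that has no such field fails in the same way as the `MissingPropertyException` above. Topic-based collection, by contrast, needs no per-subworkflow bookkeeping.

```python
from collections import defaultdict
from types import SimpleNamespace


class Topics:
    """Toy stand-in for topic channels: producers publish, the pipeline collects."""

    def __init__(self):
        self._items = defaultdict(list)

    def publish(self, topic, item):
        self._items[topic].append(item)

    def collect(self, topic):
        return list(self._items[topic])


def run_module(topics, name, version="0.0.0"):
    """A module publishes (name, version) to the 'versions' topic as a side effect."""
    topics.publish("versions", (name, version))
    # The returned output deliberately has no `versions` field.
    return SimpleNamespace(h5ad=f"{name}.h5ad")


def integrate(topics, module_names):
    """New-style subworkflow: no ch_versions mixing and no versions emit."""
    return [run_module(topics, name) for name in module_names]


def legacy_mix_versions(outputs):
    """Old pattern: read `.versions` from every child output."""
    return [out.versions for out in outputs]


if __name__ == "__main__":
    topics = Topics()
    outputs = integrate(topics, ["SCVITOOLS_SCVI", "SCIMILARITY_EMBED"])
    print(topics.collect("versions"))
    try:
        legacy_mix_versions(outputs)
    except AttributeError as err:
        print("legacy pattern failed:", err)
```

In this model, dropping the `versions` emit from a subworkflow is harmless, because nothing downstream reads it. Only the top-level `collect("versions")` call matters.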

Repro (publication repo)

04_scdownstream/04_sub/nextflow.config
  integrate_per_label = true
  integration_methods = 'scvi,scimilarity'
  base_label_col = 'coarse_annotation'

Run sub mode with a base merged.h5ad that has the label column in obs.

TODO: address on this branch; the INTEGRATE nf-test snapshots will need nftu after the fix.

nictru · 2026-06-22T06:18:10Z

`SCIMILARITY_EMBED`: obsm pickle filename mismatch in per-label integration

When running integrate_per_label=true with scimilarity in integration_methods, SCIMILARITY_EMBED can fail with:

Missing output file(s) `X_scimilarity-Erythrocyte.pkl` expected by process
`NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:SUB_INTEGRATE:INTEGRATE:SCIMILARITY:SCIMILARITY_EMBED (Erythrocyte)`

Root cause

conf/modules.config sets a subset-aware prefix for scimilarity modules:

ext.prefix = { (meta.integration ?: meta.id) + (meta.subset ? "-${meta.subset}" : '') }

So for the Erythrocyte subset, the prefix is scimilarity-Erythrocyte.

modules/local/scimilarity/embed/main.nf declares output: X_${prefix}.pkl → X_scimilarity-Erythrocyte.pkl
modules/local/scimilarity/embed/templates/embed.py was writing: X_${meta.id}.pkl → X_Erythrocyte.pkl

meta.id is only the subset name from SUB_INTEGRATE; it does not include the integration method. This bug is latent on global runs (where prefix often equals meta.id) but breaks per-label sub-integration.

Other integration modules (scVI, PCA, symphony, etc.) already use ${prefix} for obsm pickles.
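A Python port of the `ext.prefix` closure makes the mismatch concrete. The meta map below is an assumption based on the description above: `id` and `subset` are both set to the label, and `integration` is set to the method. Groovy's `?:` and truthiness are approximated with Python's `or` and a truth test on `subset`.

```python
def ext_prefix(meta):
    """Port of: (meta.integration ?: meta.id) + (meta.subset ? "-${meta.subset}" : '')"""
    base = meta.get("integration") or meta["id"]
    subset = meta.get("subset")
    return base + (f"-{subset}" if subset else "")


def pickle_names(meta):
    """Return (declared output, file written before the fix, file written after)."""
    prefix = ext_prefix(meta)
    declared = f"X_{prefix}.pkl"    # main.nf declares X_${prefix}.pkl
    before = f"X_{meta['id']}.pkl"  # old embed.py wrote X_${meta.id}.pkl
    after = f"X_{prefix}.pkl"       # fixed embed.py writes X_${prefix}.pkl
    return declared, before, after


if __name__ == "__main__":
    per_label = {"id": "Erythrocyte", "subset": "Erythrocyte", "integration": "scimilarity"}
    print(pickle_names(per_label))
    # Without integration or subset, the prefix falls back to meta.id and all names agree.
    print(pickle_names({"id": "merged"}))
```

The fallback case shows why the bug could stay hidden. When the prefix reduces to `meta.id`, the old and new filenames coincide.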

Fix

In modules/local/scimilarity/embed/templates/embed.py:

-df.to_pickle("X_${meta.id}.pkl")
+df.to_pickle("X_${prefix}.pkl")

Repro

04_scdownstream/04_sub run config: integrate_per_label=true, integration_methods=scvi,scimilarity, split on coarse_annotation.

Add per-label integration for sub mode

ba38533

Reuse the existing integration workflow after splitting base AnnData labels so sub-mode runs can create per-label embeddings in the combined output.

Fix nf-test snapshots for per-label integration

a726e27

Update INTEGRATE subworkflow snapshots for the integration meta field and add pipeline test snapshots including versions.yml for nf-core lint.

nictru marked this pull request as ready for review June 16, 2026 13:42

nictru added 11 commits June 17, 2026 20:09

Fix per-label integration wiring

988aabd

Fix per-label integration follow-ups

051d33d

Simplify base integration parameters

0d10247

Use integration meta for per-label output naming and fix CI snapshots.

03010bb

Keep meta.id as the sample subset while meta.integration carries the method, so publish prefixes and scib filtering stay correct for per-label runs.

Use meta.subset for per-label integration output prefixes.

1127299

Avoid coupling published filenames to meta.id when only the subset label should distinguish per-label runs.

Fix integration module prefixes when meta.integration is unset.

d0d9344

Fall back to meta.id in publish prefixes so isolated module nf-tests keep stable output names after the per-label integration meta changes.

Fix merge-embeddings publish path when meta.integration is unset.

7390b34

Use meta.id as fallback in ADATA_MERGEEMBEDDINGS publishDir so module nf-tests do not write under a null integration key.

Fix extension embedding merge to use meta.integration.

f7e1e24

ADATA_MERGEEMBEDDINGS looked up base obsm keys from meta.id, which is still "merged" in extension mode after the integration meta refactor, causing KeyError X_merged.

Avoid duplicate analysis-plan matching in CLUSTER Leiden prep.

d2b7380

Pass integration metadata via module inputs instead of meta maps.

828668e

nictru added 3 commits June 22, 2026 09:23

Fix SCIMILARITY_EMBED obsm output to use task prefix.

d71a4c8

Per-label integration sets ext.prefix to integration-subset, so the embedding pickle must match X_${prefix}.pkl rather than meta.id alone.

Stop mixing module versions in INTEGRATE subworkflow.

17b5200

Version capture now uses topic: versions at pipeline level; remove the legacy ch_versions channel and drop versions from subworkflow outputs.

Align SUB_INTEGRATE with topic-based version capture.

f073f21

Drop the forwarded versions emit and add a per-label scimilarity stub regression test for integrate_per_label runs.
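The KeyError fixed in f7e1e24 can be reproduced with a minimal Python sketch. It assumes the base AnnData stores its embeddings in `obsm` under `X_<method>` keys, and it models `obsm` as a plain dict. The function names are hypothetical, and this is not the module's actual code.

```python
def base_embedding_legacy(obsm, meta):
    """Pre-fix lookup: keyed by meta.id, which is 'merged' in extension mode."""
    return obsm[f"X_{meta['id']}"]


def base_embedding(obsm, meta):
    """Fixed lookup: keyed by the integration method carried in meta.integration."""
    return obsm[f"X_{meta['integration']}"]


if __name__ == "__main__":
    obsm = {"X_scvi": [[0.1, 0.2]], "X_scimilarity": [[0.3, 0.4]]}
    meta = {"id": "merged", "integration": "scvi"}
    print(base_embedding(obsm, meta))
    try:
        base_embedding_legacy(obsm, meta)
    except KeyError as err:
        print("KeyError:", err)
```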

nictru wants to merge 16 commits into dev from per-group-integration
