Spatial Decomposition

Estimation of cell type proportions per spot in 2D space from spatial transcriptomic data coupled with corresponding single-cell data

Repository: openproblems-bio/task_spatial_decomposition

Description

Spatial decomposition (also often referred to as spatial deconvolution) is applicable to spatial transcriptomics data where the transcription profile of each capture location (spot, voxel, bead, etc.) does not share a bijective relationship with the cells in the tissue, i.e., multiple cells may contribute to the same capture location. The task of spatial decomposition is to estimate the composition of cell types/states present at each capture location. The cell type/state estimates are presented as proportion values, representing the proportion of the cells at each capture location that belong to a given cell type.

We distinguish between reference-based decomposition and de novo decomposition: the former leverages external data (e.g., scRNA-seq or scNuc-seq) to guide the inference process, while the latter works only with the spatial data. We require that all datasets have an associated reference single-cell dataset, but methods are free to ignore this information.

Due to the lack of real datasets with the necessary ground truth, this task makes use of a simulated dataset generated by creating cell aggregates by sampling from a Dirichlet distribution. The ground-truth dataset consists of the spatial expression matrix, XY coordinates of the spots, true cell-type proportions for each spot, and the reference single-cell data (from which the cell aggregates were simulated).
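The simulation idea described in this section can be sketched as follows. This is a minimal illustration and not the task's actual simulation code: the function names (`sample_dirichlet`, `simulate_spot`), the toy reference, the concentration parameters and the fixed number of cells per spot are all assumptions made for the example. It uses only the Python standard library, drawing a Dirichlet sample by normalising independent Gamma draws.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_spot(cells_by_type, alpha, n_cells, rng):
    """Simulate one capture location as an aggregate of single cells.

    cells_by_type maps a cell type label to a list of count vectors
    (one per reference cell). Returns (aggregated counts, true proportions),
    with proportions ordered by sorted cell type label.
    """
    types = sorted(cells_by_type)
    weights = sample_dirichlet(alpha, rng)
    # Pick a cell type for each of the n_cells cells contributing to the spot.
    chosen = rng.choices(types, weights=weights, k=n_cells)
    n_genes = len(cells_by_type[types[0]][0])
    counts = [0] * n_genes
    for cell_type in chosen:
        # Sample one reference cell of that type and add its counts.
        cell = rng.choice(cells_by_type[cell_type])
        counts = [c + x for c, x in zip(counts, cell)]
    # Realised fraction of cells per type in this spot.
    proportions = [chosen.count(t) / n_cells for t in types]
    return counts, proportions


# Toy reference with two hypothetical cell types and three genes.
rng = random.Random(0)
reference = {
    "alpha_cell": [[5, 0, 1], [4, 1, 0]],
    "beta_cell": [[0, 6, 2], [1, 7, 0]],
}
counts, proportions = simulate_spot(reference, alpha=[1.0, 1.0], n_cells=10, rng=rng)
```

In this sketch the true proportions are the realised fraction of sampled cells per type, which matches the definition of the proportion values given in the description; recording the Dirichlet weights themselves would be an alternative design.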

Authors & contributors

name	roles
Giovanni Palla	author, maintainer
Scott Gigante	author
Sai Nirmayi Yasa	contributor

API

flowchart TB
  file_common_dataset("<a href='https://github.com/openproblems-bio/task_spatial_decomposition#file-format-common-dataset'>Common Dataset</a>")
  comp_process_dataset[/"<a href='https://github.com/openproblems-bio/task_spatial_decomposition#component-type-data-processor'>Data processor</a>"/]
  file_single_cell("<a href='https://github.com/openproblems-bio/task_spatial_decomposition#file-format-single-cell-data'>Single cell data</a>")
  file_solution("<a href='https://github.com/openproblems-bio/task_spatial_decomposition#file-format-solution'>Solution</a>")
  file_spatial_masked("<a href='https://github.com/openproblems-bio/task_spatial_decomposition#file-format-spatial-masked'>Spatial masked</a>")
  comp_control_method[/"<a href='https://github.com/openproblems-bio/task_spatial_decomposition#component-type-control-method'>Control method</a>"/]
  comp_method[/"<a href='https://github.com/openproblems-bio/task_spatial_decomposition#component-type-method'>Method</a>"/]
  comp_metric[/"<a href='https://github.com/openproblems-bio/task_spatial_decomposition#component-type-metric'>Metric</a>"/]
  file_output("<a href='https://github.com/openproblems-bio/task_spatial_decomposition#file-format-output'>Output</a>")
  file_score("<a href='https://github.com/openproblems-bio/task_spatial_decomposition#file-format-score'>Score</a>")
  file_simulated_dataset("<a href='https://github.com/openproblems-bio/task_spatial_decomposition#file-format-common-dataset'>Common Dataset</a>")
  file_common_dataset---comp_process_dataset
  comp_process_dataset-->file_single_cell
  comp_process_dataset-->file_solution
  comp_process_dataset-->file_spatial_masked
  file_single_cell---comp_control_method
  file_single_cell---comp_method
  file_solution---comp_control_method
  file_solution---comp_metric
  file_spatial_masked---comp_control_method
  file_spatial_masked---comp_method
  comp_control_method-->file_output
  comp_method-->file_output
  comp_metric-->file_score
  file_output---comp_metric

File format: Common Dataset

A subset of the common dataset.

Example file: resources_test/common/cxg_mouse_pancreas_atlas/dataset.h5ad

Format:

AnnData object
 obs: 'cell_type', 'batch'
 var: 'hvg', 'hvg_score'
 obsm: 'X_pca'
 layers: 'counts'
 uns: 'cell_type_names', 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism'

Data structure:

Slot	Type	Description
`obs["cell_type"]`	`string`	Cell type label IDs.
`obs["batch"]`	`string`	A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc.
`var["hvg"]`	`boolean`	Whether or not the feature is considered to be a ‘highly variable gene’.
`var["hvg_score"]`	`double`	A ranking of the features by hvg.
`obsm["X_pca"]`	`double`	(Optional) The resulting PCA embedding.
`layers["counts"]`	`integer`	Raw counts.
`uns["cell_type_names"]`	`string`	(Optional) Cell type names corresponding to values in `cell_type`.
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.
`uns["dataset_name"]`	`string`	Nicely formatted name.
`uns["dataset_url"]`	`string`	(Optional) Link to the original source of the dataset.
`uns["dataset_reference"]`	`string`	(Optional) Bibtex reference of the paper in which the dataset was published.
`uns["dataset_summary"]`	`string`	Short description of the dataset.
`uns["dataset_description"]`	`string`	Long description of the dataset.
`uns["dataset_organism"]`	`string`	(Optional) The organism of the sample in the dataset.

Component type: Data processor

A spatial decomposition dataset processor.

Arguments:

Name	Type	Description
`--input`	`file`	A subset of the common dataset.
`--output_single_cell`	`file`	(Output) The single-cell data file used as reference for the spatial data.
`--output_spatial_masked`	`file`	(Output) The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot.
`--output_solution`	`file`	(Output) The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location.

File format: Single cell data

The single-cell data file used as reference for the spatial data.

Example file: resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/single_cell_ref.h5ad

Format:

AnnData object
 obs: 'cell_type', 'batch'
 layers: 'counts'
 uns: 'cell_type_names', 'dataset_id'

Data structure:

Slot	Type	Description
`obs["cell_type"]`	`string`	Cell type label IDs.
`obs["batch"]`	`string`	(Optional) A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc.
`layers["counts"]`	`integer`	Raw counts.
`uns["cell_type_names"]`	`string`	Cell type names corresponding to values in `cell_type`.
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.

File format: Solution

The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location.

Example file: resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/solution.h5ad

Format:

AnnData object
 obsm: 'spatial', 'proportions_true'
 layers: 'counts'
 uns: 'cell_type_names', 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'

Data structure:

Slot	Type	Description
`obsm["spatial"]`	`double`	XY coordinates for each spot.
`obsm["proportions_true"]`	`double`	True cell type proportions for each spot.
`layers["counts"]`	`integer`	Raw counts.
`uns["cell_type_names"]`	`string`	Cell type names corresponding to columns of `proportions_true`.
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.
`uns["dataset_name"]`	`string`	Nicely formatted name.
`uns["dataset_url"]`	`string`	(Optional) Link to the original source of the dataset.
`uns["dataset_reference"]`	`string`	(Optional) Bibtex reference of the paper in which the dataset was published.
`uns["dataset_summary"]`	`string`	Short description of the dataset.
`uns["dataset_description"]`	`string`	Long description of the dataset.
`uns["dataset_organism"]`	`string`	(Optional) The organism of the sample in the dataset.
`uns["normalization_id"]`	`string`	Which normalization was used.

File format: Spatial masked

The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot.

Example file: resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/spatial_masked.h5ad

Format:

AnnData object
 obsm: 'spatial'
 layers: 'counts'
 uns: 'cell_type_names', 'dataset_id'

Data structure:

Slot	Type	Description
`obsm["spatial"]`	`double`	XY coordinates for each spot.
`layers["counts"]`	`integer`	Raw counts.
`uns["cell_type_names"]`	`string`	Cell type names corresponding to columns of `proportions_pred` in output.
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.
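The relationship between the Solution and Spatial masked files can be sketched as a slot filter: the masked file keeps only the slots listed for it, so `proportions_true` and the descriptive dataset metadata are dropped. The sketch below uses nested dictionaries as stand-ins for AnnData slots; the `mask_solution` helper and the toy values are hypothetical and are not taken from the task's data processor.

```python
# Slots of the Spatial masked format, as listed in its format description.
MASKED_SLOTS = {
    "obsm": ["spatial"],
    "layers": ["counts"],
    "uns": ["cell_type_names", "dataset_id"],
}


def mask_solution(solution):
    """Return a copy of `solution` restricted to the Spatial masked slots."""
    return {
        group: {key: solution[group][key] for key in keys}
        for group, keys in MASKED_SLOTS.items()
    }


# Toy solution with two spots, two genes and two cell types.
solution = {
    "obsm": {
        "spatial": [[0.0, 0.0], [1.0, 0.0]],
        "proportions_true": [[0.25, 0.75], [1.0, 0.0]],
    },
    "layers": {"counts": [[3, 1], [0, 4]]},
    "uns": {
        "cell_type_names": ["type_a", "type_b"],
        "dataset_id": "toy_dataset",
        "dataset_name": "Toy dataset",
        "normalization_id": "none",
    },
}
masked = mask_solution(solution)
```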

Component type: Control method

Quality control methods for verifying the pipeline.

Arguments:

Name	Type	Description
`--input_single_cell`	`file`	The single-cell data file used as reference for the spatial data.
`--input_spatial_masked`	`file`	The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot.
`--input_solution`	`file`	The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location.
`--output`	`file`	(Output) Spatial data with estimated proportions.
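Because a control method also receives the solution, it can produce outputs whose quality is known in advance. The sketch below shows two hypothetical controls, one that copies the true proportions and one that draws random proportions per spot. The function names, the `method_id` values and the dictionary stand-ins for AnnData slots are assumptions for illustration, not the task's actual components.

```python
import random


def _build_output(source, proportions, method_id):
    """Assemble the slots of the Output format from a spatial file."""
    return {
        "obsm": {"spatial": source["obsm"]["spatial"], "proportions_pred": proportions},
        "layers": {"counts": source["layers"]["counts"]},
        "uns": {
            "cell_type_names": source["uns"]["cell_type_names"],
            "dataset_id": source["uns"]["dataset_id"],
            "method_id": method_id,
        },
    }


def true_proportions_control(solution):
    """Hypothetical upper-bound control: predict exactly the ground truth."""
    truth = [list(row) for row in solution["obsm"]["proportions_true"]]
    return _build_output(solution, truth, "true_proportions")


def random_proportions_control(spatial_masked, seed=0):
    """Hypothetical lower-bound control: flat Dirichlet draw for each spot."""
    rng = random.Random(seed)
    n_types = len(spatial_masked["uns"]["cell_type_names"])
    proportions = []
    for _ in spatial_masked["obsm"]["spatial"]:
        # Normalised Gamma(1, 1) draws give a Dirichlet(1, ..., 1) sample.
        draws = [rng.gammavariate(1.0, 1.0) for _ in range(n_types)]
        total = sum(draws)
        proportions.append([d / total for d in draws])
    return _build_output(spatial_masked, proportions, "random_proportions")
```

A metric evaluated on these two outputs gives reference points against which real methods can be compared.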

Component type: Method

A spatial decomposition method.

Arguments:

Name	Type	Description
`--input_single_cell`	`file`	The single-cell data file used as reference for the spatial data.
`--input_spatial_masked`	`file`	The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot.
`--output`	`file`	(Output) Spatial data with estimated proportions.

Component type: Metric

A spatial decomposition metric.

Arguments:

Name	Type	Description
`--input_method`	`file`	Spatial data with estimated proportions.
`--input_solution`	`file`	The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location.
`--output`	`file`	(Output) Metric score file.
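A metric component compares `proportions_pred` from a method output with `proportions_true` from the solution and writes the result in the Score format. The sketch below uses root-mean-square error purely as an illustrative choice; the metric itself, the `rmse` and `score` names and the dictionary stand-ins for AnnData slots are assumptions, not the metrics shipped with the task.

```python
import math


def rmse(true_rows, pred_rows):
    """Root-mean-square error over all spot-by-cell-type entries."""
    squared = [
        (t - p) ** 2
        for true_row, pred_row in zip(true_rows, pred_rows)
        for t, p in zip(true_row, pred_row)
    ]
    return math.sqrt(sum(squared) / len(squared))


def score(solution, output):
    """Build the slots of the Score format for one method output."""
    if solution["uns"]["dataset_id"] != output["uns"]["dataset_id"]:
        raise ValueError("solution and output come from different datasets")
    value = rmse(
        solution["obsm"]["proportions_true"],
        output["obsm"]["proportions_pred"],
    )
    return {
        "uns": {
            "dataset_id": output["uns"]["dataset_id"],
            "method_id": output["uns"]["method_id"],
            # metric_ids and metric_values must have the same length.
            "metric_ids": ["rmse"],
            "metric_values": [value],
        }
    }
```

With this illustrative metric, a prediction identical to the ground truth scores 0.0, and larger values indicate worse agreement.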

File format: Output

Spatial data with estimated proportions.

Example file: resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/output.h5ad

Format:

AnnData object
 obsm: 'spatial', 'proportions_pred'
 layers: 'counts'
 uns: 'cell_type_names', 'dataset_id', 'method_id'

Data structure:

Slot	Type	Description
`obsm["spatial"]`	`double`	XY coordinates for each spot.
`obsm["proportions_pred"]`	`double`	Estimated cell type proportions for each spot.
`layers["counts"]`	`integer`	Raw counts.
`uns["cell_type_names"]`	`string`	Cell type names corresponding to columns of `proportions_pred`.
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.
`uns["method_id"]`	`string`	A unique identifier for the method.
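Before passing an output to a metric, it can be useful to check that `proportions_pred` is consistent with the other slots. The following sketch is a hypothetical helper (`validate_output` is not part of the task) operating on nested dictionaries as stand-ins for AnnData slots. It assumes, beyond what the format table states, that each row should be non-negative and sum to 1.

```python
def validate_output(output, tol=1e-6):
    """Return a list of human-readable problems found in an output."""
    names = output["uns"]["cell_type_names"]
    spots = output["obsm"]["spatial"]
    proportions = output["obsm"]["proportions_pred"]
    errors = []
    # One row of proportions is expected per capture location.
    if len(proportions) != len(spots):
        errors.append(f"expected {len(spots)} rows, found {len(proportions)}")
    for i, row in enumerate(proportions):
        # One column is expected per entry of cell_type_names.
        if len(row) != len(names):
            errors.append(f"spot {i}: expected {len(names)} columns, found {len(row)}")
        if any(p < 0 for p in row):
            errors.append(f"spot {i}: negative proportion")
        # Assumption: proportions for a spot sum to 1 within a tolerance.
        if abs(sum(row) - 1.0) > tol:
            errors.append(f"spot {i}: proportions sum to {sum(row):.6f}")
    return errors
```

An empty list means the output passed all checks in this sketch.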

File format: Score

Metric score file.

Example file: resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/score.h5ad

Format:

AnnData object
 uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values'

Data structure:

Slot	Type	Description
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.
`uns["method_id"]`	`string`	A unique identifier for the method.
`uns["metric_ids"]`	`string`	One or more unique metric identifiers.
`uns["metric_values"]`	`double`	The metric values obtained for the given prediction. Must be of the same length as `metric_ids`.

File format: Common Dataset

A subset of the common dataset.

Example file: resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/simulated_dataset.h5ad

Format:

AnnData object
 obs: 'cell_type', 'batch'
 var: 'hvg', 'hvg_score'
 obsm: 'X_pca', 'spatial', 'proportions_true'
 layers: 'counts'
 uns: 'cell_type_names', 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism'

Data structure:

Slot	Type	Description
`obs["cell_type"]`	`string`	Cell type label IDs.
`obs["batch"]`	`string`	A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc.
`var["hvg"]`	`boolean`	Whether or not the feature is considered to be a ‘highly variable gene’.
`var["hvg_score"]`	`double`	A ranking of the features by hvg.
`obsm["X_pca"]`	`double`	The resulting PCA embedding.
`obsm["spatial"]`	`double`	(Optional) XY coordinates for each spot.
`obsm["proportions_true"]`	`double`	(Optional) True cell type proportions for each spot.
`layers["counts"]`	`integer`	Raw counts.
`uns["cell_type_names"]`	`string`	(Optional) Cell type names corresponding to values in `cell_type`.
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.
`uns["dataset_name"]`	`string`	Nicely formatted name.
`uns["dataset_url"]`	`string`	(Optional) Link to the original source of the dataset.
`uns["dataset_reference"]`	`string`	(Optional) Bibtex reference of the paper in which the dataset was published.
`uns["dataset_summary"]`	`string`	Short description of the dataset.
`uns["dataset_description"]`	`string`	Long description of the dataset.
`uns["dataset_organism"]`	`string`	(Optional) The organism of the sample in the dataset.
