Update getting_started & sce_file_contents with celltyping instructions #193

Merged
merged 23 commits into from
Nov 15, 2023
23 commits
55f6b74
Fix typo in clustering section and initiate the celltype section in g…
sjspielman Nov 8, 2023
ed5bfff
rearrange so that information about accessing data is only provided i…
sjspielman Nov 8, 2023
0b5792d
Fix some typos and add rellink to metadata table
sjspielman Nov 8, 2023
32c68b5
add ontology column for singler and predictions matrix for cellassign
sjspielman Nov 8, 2023
ad234e9
Add more about reference information, but left previous sentence to d…
sjspielman Nov 9, 2023
4437490
fix spelling and some are -> is
sjspielman Nov 9, 2023
ec32dc7
remove some periods
sjspielman Nov 9, 2023
7c629de
Takes care of #192
sjspielman Nov 9, 2023
245cc5b
dont commit the spell check errors file...
sjspielman Nov 9, 2023
e84f4e3
Update docs/getting_started.md
sjspielman Nov 10, 2023
44f8284
Refer to table for metadata and remove those code examples. Some addi…
sjspielman Nov 10, 2023
7bfdfeb
indicate that submitter is less common and add sentence about QC report
sjspielman Nov 10, 2023
a746b2a
bulletify some of that text
sjspielman Nov 10, 2023
ef0f767
some cell type resources
sjspielman Nov 10, 2023
572f219
Submitter-excluded title case, and condense singler ontology labels w…
sjspielman Nov 10, 2023
23c8b40
also update bullet text per review
sjspielman Nov 10, 2023
b60f013
Apply suggestions from code review
sjspielman Nov 13, 2023
886523a
Merge branch 'development' into sjspielman/165-celltype-getting-started
sjspielman Nov 13, 2023
20349ed
condense sentences to 1 about referring to metadata for more info
sjspielman Nov 13, 2023
b652817
Update table with correct names, phrasing, hopefully...
sjspielman Nov 15, 2023
f10d7cb
Apply suggestions from code review
sjspielman Nov 15, 2023
8bcd24a
one more spot to indicate that submitter may not always be present
sjspielman Nov 15, 2023
b0a7e85
panglaodb link and wording
sjspielman Nov 15, 2023
3 changes: 3 additions & 0 deletions components/dictionary.txt
@@ -5,6 +5,7 @@ alevin
Alevin
APA
AnnData
Aran
barcode
barcodes
basepairs
@@ -31,6 +32,7 @@ Ensembl
et
fastp
FASTQ
Franzén
GC
Genomics
github
@@ -82,6 +84,7 @@ Stegle
Stoeckius
subdiagnosis
submitter
submitters
Tian
transcriptome
transcriptomics
125 changes: 124 additions & 1 deletion docs/getting_started.md
@@ -182,7 +182,7 @@ processed_adata.uns["highly_variable_genes"]
### Clustering

Cluster assignments obtained from [Graph-based clustering](http://bioconductor.org/books/3.16/OSCA.basic/clustering.html#clustering-graph) are also available in the processed objects.
Here clustering was performed using the Louvain algorithm with 20 nearest neighbors and Jaccard weighting.
By default, clustering is performed using the Louvain algorithm with 20 nearest neighbors and Jaccard weighting.

To access the cluster assignments in the `SingleCellExperiment` object, use the following command:

@@ -203,6 +203,129 @@ See these resources for more information on clustering:
- [Quantifying clustering behavior in Orchestrating Single Cell Analysis](https://bioconductor.org/books/release/OSCA.advanced/clustering-redux.html#quantifying-clustering-behavior)


### Cell type annotation
Member

Generally, I think we want to add a little bit more context to this section. I would describe the references used in this first section. I would also mention the supplemental cell type report and that it's a good resource for evaluating the provided cell type annotations.

Member Author

I added 2 links for cell annotation, but we could add more! Perhaps from this list we have going here? https://github.com/AlexsLemonade/training-specific-template/blob/main/additional-resources/single-cell-resources.md#cell-type-annotation

Member Author

@allyhawkins this comment may have fallen by the wayside, any other links here?

Member

I think what you added is good!


Processed objects may contain cell type annotations and associated metadata from one or more of the following sources.

- Submitter-provided annotations (note that these are only present for a subset of libraries).
- Annotations determined by [`SingleR`](https://bioconductor.org/packages/release/bioc/html/SingleR.html), an automated reference-based method ([Aran _et al._ 2019](https://doi.org/10.1038/s41590-018-0276-y)).
- Annotations determined by [`CellAssign`](https://github.com/Irrationone/cellassign), an automated marker-gene based method ([Zhang _et al._ 2019](https://doi.org/10.1038/s41592-019-0529-1)).

If at least one method was used for cell type annotation, a supplemental cell type report will be provided with the download.
This report evaluates cell type annotation results as follows:

- It provides diagnostic plots to assess the quality of cell type annotations.
- If multiple annotations are present, the report compares different annotations to one another.
Strong agreement between different annotation methods is a qualitative indicator of robustness.

To determine which methods were used for cell type annotations, use the following command on the processed `SingleCellExperiment` object:

```r
# show vector of cell type annotation methods that were used
# values will be one or more of: `submitter`, `singler`, `cellassign`
metadata(processed_sce)$celltype_methods
```

Or, on the processed `AnnData` object:

```python
# show list of cell type annotation methods that were used
# values will be one or more of: `submitter`, `singler`, `cellassign`
processed_adata.uns["celltype_methods"]
```
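
As a minimal sketch, the methods reported in `celltype_methods` can be translated into the cell-level column names used throughout this section. The helper `celltype_columns()` and the `CELLTYPE_COLUMNS` mapping are illustrative names rather than part of any package, and a plain dictionary stands in for `processed_adata.uns`:

```python
# Hypothetical mapping from each reported method to its cell-level column name
CELLTYPE_COLUMNS = {
    "submitter": "submitter_celltype_annotation",
    "singler": "singler_celltype_annotation",
    "cellassign": "cellassign_celltype_annotation",
}


def celltype_columns(uns):
    """Return the annotation column names for the methods listed in `uns`."""
    methods = uns.get("celltype_methods", [])
    # Assumption: a single method might be stored as a bare string
    if isinstance(methods, str):
        methods = [methods]
    return [CELLTYPE_COLUMNS[m] for m in methods if m in CELLTYPE_COLUMNS]


# Stand-in for `processed_adata.uns`; pass the real mapping when available
example_uns = {"celltype_methods": ["singler", "cellassign"]}
print(celltype_columns(example_uns))
```

Checking this list before indexing into `obs` avoids a `KeyError` for libraries where a given method was not run.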

Below we provide instructions on how to access annotations from each cell type annotation method used, if present.

Note that additional information about `SingleR` and `CellAssign` annotation, including their respective reference source and versions, is also available from the processed `SingleCellExperiment` object's metadata and from the processed `AnnData` object's `uns` slot, as described in the {ref}`experiment metadata table<sce_file_contents:SingleCellExperiment experiment metadata>`.

#### Submitter-provided annotations

To access submitter-provided annotations, if available, in the `SingleCellExperiment`, use the following command:

```r
# submitter-provided annotations for each cell
processed_sce$submitter_celltype_annotation
```

To access submitter-provided annotations in the `AnnData` object, use the following command:

```python
# submitter-provided annotations for each cell
processed_adata.obs["submitter_celltype_annotation"]
```

Cells that submitters did not annotate are labeled with `Submitter-excluded`.
Note that submitter-provided annotations are also present in unfiltered and filtered objects and can be accessed using the same approach shown here for processed objects.
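
The following sketch shows one way to tally submitter-provided labels while setting aside cells marked `Submitter-excluded`. The function name `summarize_submitter_labels()` and the example labels are hypothetical; with a real object, the `labels` argument would be `processed_adata.obs["submitter_celltype_annotation"]`:

```python
from collections import Counter


def summarize_submitter_labels(labels, excluded_label="Submitter-excluded"):
    """Count cells per label and report how many the submitter left unannotated."""
    counts = Counter(labels)
    # Separate the excluded cells from the annotated ones
    n_excluded = counts.pop(excluded_label, 0)
    return counts, n_excluded


# Illustrative labels only; real values depend on the submitter
example_labels = ["T cell", "Submitter-excluded", "B cell", "T cell"]
counts, n_excluded = summarize_submitter_labels(example_labels)
print(counts.most_common(), n_excluded)
```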


#### `SingleR` annotations

`SingleR` annotation uses a reference dataset from the [`celldex` package](https://bioconductor.org/packages/release/data/experiment/html/celldex.html) [[Aran _et al._ (2019)](https://doi.org/10.1038/s41590-018-0276-y)].


To access automated `SingleR` annotations as cell type names and/or ontology terms in the processed `SingleCellExperiment` object, use the following command(s):

```r
# SingleR annotations as cell type names
processed_sce$singler_celltype_annotation

# Or, SingleR annotations as cell type ontology terms
processed_sce$singler_celltype_ontology
```

To access automated `SingleR` annotations as cell type names and/or ontology terms in the processed `AnnData` object, use the following command(s):

```python
# SingleR annotations as cell type names
processed_adata.obs["singler_celltype_annotation"]

# Or, SingleR annotations as cell type ontology terms
processed_adata.obs["singler_celltype_ontology"]
```

Cells that `SingleR` could not confidently annotate are labeled with `NA`.
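
Because the names and ontology terms are stored as parallel columns, they can be paired into a lookup table. The sketch below is illustrative only: `is_missing()` and `ontology_lookup()` are hypothetical helpers, and it assumes unannotated cells may appear as `None`, `NaN`, or the string `"NA"` once loaded in Python, which should be checked against the actual object:

```python
import math


def is_missing(value):
    """True for None, NaN, or the literal string "NA" (assumed encodings)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value == "NA"


def ontology_lookup(names, ontology_ids):
    """Map each ontology term to its cell type name, skipping unannotated cells."""
    lookup = {}
    for name, term in zip(names, ontology_ids):
        if is_missing(name) or is_missing(term):
            continue
        lookup[term] = name
    return lookup


# Illustrative values; pass the two `obs` columns from a real object instead
example_names = ["T cell", "NA", "B cell", "T cell"]
example_terms = ["CL:0000084", "NA", "CL:0000236", "CL:0000084"]
print(ontology_lookup(example_names, example_terms))
```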

You can also access the full object returned by `SingleR` from the `SingleCellExperiment`'s metadata with the following command (note that this information is not provided in the `AnnData` object):

```r
# SingleR full result
metadata(processed_sce)$singler_results
```

#### `CellAssign` annotations

`CellAssign` annotation uses a reference set of marker genes from the [`PanglaoDB` database](https://panglaodb.se/) [[Franzén _et al._ (2019)](https://doi.org/10.1093/database/baz046)], as compiled by the Data Lab for a given tissue group.

To access automated `CellAssign` annotations in the `SingleCellExperiment`, use the following command:

```r
# CellAssign annotations for each cell
processed_sce$cellassign_celltype_annotation
```

To access automated `CellAssign` annotations in the `AnnData` object, use the following command:

```python
# CellAssign annotations for each cell
processed_adata.obs["cellassign_celltype_annotation"]
```

Cells that `CellAssign` could not confidently annotate are labeled with `"other"`.
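
When both `SingleR` and `CellAssign` annotations are present, a quick tally of label pairs gives a rough, informal view of how the two methods agree. This is a sketch under stated assumptions, not the method used in the supplemental report: the function names and example labels are hypothetical, and because the two methods draw on different references, their label names are not expected to match exactly.

```python
from collections import Counter


def annotation_pairs(singler_labels, cellassign_labels):
    """Count how often each (SingleR, CellAssign) label pair occurs across cells."""
    return Counter(zip(singler_labels, cellassign_labels))


def fraction_other(cellassign_labels, other_label="other"):
    """Fraction of cells that CellAssign labeled as `other`."""
    labels = list(cellassign_labels)
    if not labels:
        return 0.0
    return sum(label == other_label for label in labels) / len(labels)


# Illustrative labels; pass the corresponding `obs` columns from a real object
example_singler = ["T cell", "T cell", "B cell", "NA"]
example_cellassign = ["T cells", "other", "B cells", "other"]

for pair, n_cells in annotation_pairs(example_singler, example_cellassign).most_common():
    print(pair, n_cells)
print(fraction_other(example_cellassign))
```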

You can also access the full predictions matrix returned by `CellAssign` from the `SingleCellExperiment`'s metadata with the following command:

```r
# CellAssign full predictions matrix
metadata(processed_sce)$cellassign_predictions
```

#### Additional cell type resources

See these resources for more information on automated cell type annotation:

- [Assigning cell types with `SingleR`](https://bioconductor.org/books/release/SingleRBook/)
- [Cell type annotation chapter in Orchestrating Single Cell Analysis](https://bioconductor.org/books/3.17/OSCA.basic/cell-type-annotation.html)

## What if I want to use Seurat?

The files available for download that contain [`SingleCellExperiment` objects](http://bioconductor.org/books/3.13/OSCA.intro/the-singlecellexperiment-class.html) can also be converted into Seurat objects.
35 changes: 20 additions & 15 deletions docs/sce_file_contents.md
@@ -137,10 +137,15 @@ metadata(sce) # experiment metadata
| `cluster_nn` | The nearest neighbor parameter value used for the graph-based clustering. Only present for `processed` objects |
| `celltype_methods` | If cell type annotation was performed, a vector of the methods used for annotation. May include `"submitter"`, `"singler"` and/or `"cellassign"`. If submitter cell-type annotations are available, this metadata item will be present in all objects. Otherwise, this item will only be in `processed` objects |
| `singler_results` | If cell typing with `SingleR` was performed, the full result object returned by `SingleR` annotation. Only present for `processed` objects |
| `singler_reference` | If cell typing with `SingleR` was performed, the name of the [`celldex`](http://bioconductor.org/packages/release/data/experiment/html/celldex.html) reference dataset used for annotation. Only present for `processed` objects |
| `singler_reference` | If cell typing with `SingleR` was performed, the name of the reference dataset used for annotation. Only present for `processed` objects |
| `singler_reference_label` | If cell typing with `SingleR` was performed, the name of the label in the reference dataset used for annotation. Only present for `processed` objects |
| `singler_reference_source` | If cell typing with `SingleR` was performed, the source of the reference dataset (default is [`celldex`](http://bioconductor.org/packages/release/data/experiment/html/celldex.html)). Only present for `processed` objects |
| `singler_reference_version` | If cell typing with `SingleR` was performed, the version of `celldex` used to create the reference dataset source, with periods replaced by dashes (`-`). Only present for `processed` objects |
| `cellassign_predictions` | If cell typing with `CellAssign` was performed, the full matrix of predictions across cells and cell types. Only present for `processed` objects |
| `cellassign_reference` | If cell typing with `CellAssign` was performed, the name of the organ/tissue type for which marker genes were obtained from `PanglaoDB`. Only present for `processed` objects |
| `cellassign_reference` | If cell typing with `CellAssign` was performed, the name of the organ/tissue grouping for which marker genes were obtained. Only present for `processed` objects |
| `cellassign_reference_source` | If cell typing with `CellAssign` was performed, the source of the reference dataset (default is [`PanglaoDB`](https://panglaodb.se/)). Only present for `processed` objects |
| `cellassign_reference_version` | If cell typing with `CellAssign` was performed, the version of the reference dataset source. For references obtained from `PanglaoDB`, the version scheme is a date in ISO8601 format. Only present for `processed` objects |
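
The two reference version fields in the table above use different encodings, which the following sketch decodes. The helper names and the example values are hypothetical, and the date parsing assumes the `YYYY-MM-DD` form of ISO 8601:

```python
from datetime import date


def parse_singler_version(raw):
    """Turn a dash-separated version such as "1-10-1" back into "1.10.1"."""
    return raw.replace("-", ".")


def parse_date_version(raw):
    """Parse a date-style version string; assumes the YYYY-MM-DD form."""
    return date.fromisoformat(raw)


# Illustrative stand-in for the object's metadata; values are not real output
example_metadata = {
    "singler_reference_version": "1-10-1",
    "cellassign_reference_version": "2020-03-27",
}

print(parse_singler_version(example_metadata["singler_reference_version"]))
print(parse_date_version(example_metadata["cellassign_reference_version"]))
```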


### SingleCellExperiment sample metadata

@@ -162,12 +167,12 @@ The following columns are included in the sample metadata data frame for all lib
| `tissue_location` | Where in the body the tumor sample was located |
| `disease_timing` | At what stage of disease the sample was obtained, either diagnosis or recurrence |
| `organism` | The organism the sample was obtained from (e.g., `Homo_sapiens`) |
| `development_stage_ontology_term_id` | [`HsapDv` ontology](http://obofoundry.org/ontology/hsapdv.html) term indicating developmental stage. If unavailable, `unknown` is used. |
| `sex_ontology_term_id` | [`PATO`](http://obofoundry.org/ontology/pato.html) term referring to the sex of the sample. If unavailable, `unknown` is used. |
| `organism_ontology_id` | [NCBI taxonomy](https://www.ncbi.nlm.nih.gov/taxonomy) term for organism, e.g. [`NCBITaxon:9606`](http://purl.obolibrary.org/obo/NCBITaxon_9606). |
| `self_reported_ethnicity_ontology_term_id` | For _Homo sapiens_, a [`Hancestro` term](http://obofoundry.org/ontology/hancestro.html). `multiethnic` indicates more than one ethnicity is reported. `unknown` indicates unavailable ethnicity and `NA` is used for all other organisms. |
| `disease_ontology_term_id` | [`MONDO`](http://obofoundry.org/ontology/mondo.html) term indicating disease type. [`PATO:0000461`](http://purl.obolibrary.org/obo/PATO_0000461) indicates normal or healthy tissue. If unavailable, `NA` is used. |
| `tissue_ontology_term_id`| [`UBERON`](http://obofoundry.org/ontology/uberon.html) term indicating tissue of origin. If unavailable, `NA` is used. |
| `development_stage_ontology_term_id` | [`HsapDv` ontology](http://obofoundry.org/ontology/hsapdv.html) term indicating developmental stage. If unavailable, `unknown` is used |
| `sex_ontology_term_id` | [`PATO`](http://obofoundry.org/ontology/pato.html) term referring to the sex of the sample. If unavailable, `unknown` is used |
| `organism_ontology_id` | [NCBI taxonomy](https://www.ncbi.nlm.nih.gov/taxonomy) term for organism, e.g. [`NCBITaxon:9606`](http://purl.obolibrary.org/obo/NCBITaxon_9606) |
| `self_reported_ethnicity_ontology_term_id` | For _Homo sapiens_, a [`Hancestro` term](http://obofoundry.org/ontology/hancestro.html). `multiethnic` indicates more than one ethnicity is reported. `unknown` indicates unavailable ethnicity and `NA` is used for all other organisms |
| `disease_ontology_term_id` | [`MONDO`](http://obofoundry.org/ontology/mondo.html) term indicating disease type. [`PATO:0000461`](http://purl.obolibrary.org/obo/PATO_0000461) indicates normal or healthy tissue. If unavailable, `NA` is used |
| `tissue_ontology_term_id`| [`UBERON`](http://obofoundry.org/ontology/uberon.html) term indicating tissue of origin. If unavailable, `NA` is used |

For some libraries, the sample metadata may also include additional metadata specific to the disease type and experimental design of the project.
Examples of this include treatment or outcome.
@@ -368,13 +373,13 @@ The `AnnData` object also includes the following additional cell-level metadata
| `tissue_location` | Where in the body the tumor sample was located |
| `disease_timing` | At what stage of disease the sample was obtained, either diagnosis or recurrence |
| `organism` | The organism the sample was obtained from (e.g., `Homo_sapiens`) |
| `development_stage_ontology_term_id` | [`HsapDv` ontology](http://obofoundry.org/ontology/hsapdv.html) term indicating developmental stage. If unavailable, `unknown` is used. |
| `sex_ontology_term_id` | [`PATO`](http://obofoundry.org/ontology/pato.html) term referring to the sex of the sample. If unavailable, `unknown` is used. |
| `organism_ontology_id` | [NCBI taxonomy](https://www.ncbi.nlm.nih.gov/taxonomy) term for organism, e.g. [`NCBITaxon:9606`](http://purl.obolibrary.org/obo/NCBITaxon_9606). |
| `self_reported_ethnicity_ontology_term_id` | For _Homo sapiens_, a [`HANCESTRO` term](http://obofoundry.org/ontology/hancestro.html). `multiethnic` indicates more than one ethnicity is reported. `unknown` indicates unavailable ethnicity, and `NA` is used for all other organisms. |
| `disease_ontology_term_id` | [`Mondo`](http://obofoundry.org/ontology/mondo.html) term indicating disease type. [`PATO:0000461`](http://purl.obolibrary.org/obo/PATO_0000461) indicates normal or healthy tissue. If unavailable, `NA` is used. |
| `tissue_ontology_term_id`| [`Uberon`](http://obofoundry.org/ontology/uberon.html) term indicating tissue of origin. If unavailable, `NA` is used. |
| `is_primary_data` | Set to `FALSE` for all libraries to reflect that all libraries were obtained from external investigators. Required by `CELLxGENE`. |
| `development_stage_ontology_term_id` | [`HsapDv` ontology](http://obofoundry.org/ontology/hsapdv.html) term indicating developmental stage. If unavailable, `unknown` is used |
| `sex_ontology_term_id` | [`PATO`](http://obofoundry.org/ontology/pato.html) term referring to the sex of the sample. If unavailable, `unknown` is used |
| `organism_ontology_id` | [NCBI taxonomy](https://www.ncbi.nlm.nih.gov/taxonomy) term for organism, e.g. [`NCBITaxon:9606`](http://purl.obolibrary.org/obo/NCBITaxon_9606) |
| `self_reported_ethnicity_ontology_term_id` | For _Homo sapiens_, a [`HANCESTRO` term](http://obofoundry.org/ontology/hancestro.html). `multiethnic` indicates more than one ethnicity is reported. `unknown` indicates unavailable ethnicity, and `NA` is used for all other organisms |
| `disease_ontology_term_id` | [`Mondo`](http://obofoundry.org/ontology/mondo.html) term indicating disease type. [`PATO:0000461`](http://purl.obolibrary.org/obo/PATO_0000461) indicates normal or healthy tissue. If unavailable, `NA` is used |
| `tissue_ontology_term_id`| [`Uberon`](http://obofoundry.org/ontology/uberon.html) term indicating tissue of origin. If unavailable, `NA` is used |
| `is_primary_data` | Set to `FALSE` for all libraries to reflect that all libraries were obtained from external investigators. Required by `CELLxGENE` |


### AnnData gene information and metrics