update documentation
allyhawkins committed Jan 3, 2025
1 parent c0f7dcc commit 272095c
Showing 3 changed files with 41 additions and 3 deletions.
22 changes: 20 additions & 2 deletions analyses/cell-type-consensus/README.md
@@ -5,11 +5,29 @@ Specifically, the cell type annotations obtained from both `SingleR` and `CellAs

## Description

TBD
The goal of this module is to create a reference that can be used to define an ontology-aware consensus cell type label for all cells across all ScPCA samples.
This module performs a series of steps to accomplish that goal:

1. Each cell type annotation present in the `PanglaoDB` reference file was assigned an ontology term identifier, when possible.
2. We looked at all possible combinations of cell type labels between the `PanglaoDB` reference (used with `CellAssign`) and the `BlueprintEncodeData` reference (used with `SingleR`).
We then explored a set of rules for defining consensus cell types in [`exploratory-notebooks/01-reference-exploration.Rmd`](./exploratory-notebooks/01-reference-exploration.Rmd).
3. We created a [reference table](./references/consensus-cell-type-reference.tsv) containing all combinations for which we were able to identify a consensus cell type label.
The consensus cell type label corresponds to the [latest common ancestor (LCA)](https://rdrr.io/bioc/ontoProc/man/findCommonAncestors.html) between the `PanglaoDB` and `BlueprintEncodeData` terms.
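
As a rough illustration of what an LCA is, the sketch below finds the LCA(s) of two terms in a small, hand-written hierarchy. The `PARENTS` dictionary is a simplified toy example, not the real Cell Ontology, and the helper functions are hypothetical; the module itself relies on `ontoProc`, whose behavior may differ in details (for example, whether a term counts as its own ancestor, which this sketch assumes).

```python
# Toy hierarchy: each term maps to its direct parents.
# Simplified for illustration only; NOT the real Cell Ontology.
PARENTS = {
    "cell": set(),
    "leukocyte": {"cell"},
    "lymphocyte": {"leukocyte"},
    "myeloid cell": {"leukocyte"},
    "T cell": {"lymphocyte"},
    "B cell": {"lymphocyte"},
    "monocyte": {"myeloid cell"},
}


def ancestors(term, parents):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def latest_common_ancestors(term_a, term_b, parents):
    """Return the common ancestors that are not above another common ancestor."""
    common = ancestors(term_a, parents) & ancestors(term_b, parents)
    return {
        term
        for term in common
        if not any(
            term in ancestors(other, parents) - {other} for other in common
        )
    }


print(latest_common_ancestors("T cell", "B cell", PARENTS))
print(latest_common_ancestors("T cell", "monocyte", PARENTS))
```

Because an ontology is a directed acyclic graph rather than a tree, a real pair of terms can have more than one LCA, which is why the function returns a set.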

When creating the consensus cell type labels, we implemented the following rules:

- If the terms share more than 1 LCA, no consensus label is set.
The only exception is if one of the LCA terms corresponds to `hematopoietic precursor cell`.
In that case, all other LCA terms are removed and `hematopoietic precursor cell` is used as the consensus label.
- If the LCA has more than 170 descendants, no consensus label is set, with some exceptions:
  - When the LCA is `neuron`, `neuron` is used as the consensus label.
  - When the LCA is `epithelial cell` and the annotation from `BlueprintEncodeData` is `Epithelial cells`, then `epithelial cell` is used as the consensus label.
- If the LCA is `bone cell`, `lining cell`, `blood cell`, `progenitor cell`, or `supporting cell`, no consensus label is defined.
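
The rules above can be sketched as a single function. This is a minimal illustration, not the code in this module: the function name, the input format (a dictionary mapping each LCA label to its number of descendants), and the order in which the rules are checked are all assumptions. In particular, the sketch assumes `hematopoietic precursor cell` is returned directly without applying the descendant threshold.

```python
MAX_DESCENDANTS = 170

# LCA labels for which no consensus label is defined.
EXCLUDED_LCAS = {
    "bone cell",
    "lining cell",
    "blood cell",
    "progenitor cell",
    "supporting cell",
}


def consensus_label(lcas, blueprint_annotation):
    """Pick a consensus label, or None if no consensus is set.

    lcas: dict mapping each LCA label to its descendant count (assumed format).
    blueprint_annotation: the original BlueprintEncodeData annotation.
    """
    labels = set(lcas)
    if not labels:
        return None

    # More than one LCA: no consensus, unless one is the hematopoietic precursor.
    if len(labels) > 1:
        if "hematopoietic precursor cell" in labels:
            return "hematopoietic precursor cell"
        return None

    (label,) = labels
    if label in EXCLUDED_LCAS:
        return None

    # Broad terms are rejected, apart from the two listed exceptions.
    if lcas[label] > MAX_DESCENDANTS:
        if label == "neuron":
            return "neuron"
        if label == "epithelial cell" and blueprint_annotation == "Epithelial cells":
            return "epithelial cell"
        return None

    return label
```

The descendant counts passed to such a function would come from the ontology itself; any counts used when trying this sketch out are placeholders.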


## Usage

TBD
See the [`scripts/README.md`](./scripts/README.md) for instructions on running the scripts in this module.

## Input files

16 changes: 16 additions & 0 deletions analyses/cell-type-consensus/references/README.md
@@ -30,3 +30,19 @@ There were no terms that encompassed both other than `progenitor cell`.
Monocytes differentiate into mononuclear osteoclasts which are then activated and become multinucleated osteoclasts.
Because monocytes are the "precursor" to the differentiated osteoclast, we chose to use this term.
- `NA` was used for `Undefined placental cells` and `Transient cells` as no clear cell type from the cell ontology was identified.

2. `consensus-cell-type-reference.tsv`: This file contains a table with all cell type combinations between the `PanglaoDB` reference and `BlueprintEncodeData` reference for which a consensus cell type is identified.

The table includes the following columns:

| Column | Description |
| --- | --- |
| `panglao_ontology` | Cell type ontology term for `PanglaoDB` cell type |
| `panglao_annotation` | Original name for the cell type as set by `PanglaoDB` |
| `blueprint_ontology` | Cell type ontology term for `BlueprintEncodeData` cell type |
| `blueprint_annotation_main` | Original name for the cell type as set by `BlueprintEncodeData` (main term) |
| `blueprint_annotation_fine` | Original name for the cell type as set by `BlueprintEncodeData` (fine term) |
| `consensus_ontology` | Cell type ontology term for consensus cell type |
| `consensus_annotation` | Human-readable name for the consensus cell type |

This file was generated by running [`scripts/02-prepare-consensus-reference.R`](../scripts/02-prepare-consensus-reference.R).
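
As a hedged sketch of how this table might be consumed, the example below loads a tab-separated table with the columns listed above and looks up the consensus label for a pair of ontology terms. The sample row uses made-up placeholder values rather than real entries from the file, the helper names are hypothetical, and keying the lookup on the two ontology columns is an assumption.

```python
import csv
import io

COLUMNS = [
    "panglao_ontology",
    "panglao_annotation",
    "blueprint_ontology",
    "blueprint_annotation_main",
    "blueprint_annotation_fine",
    "consensus_ontology",
    "consensus_annotation",
]

# Placeholder row for illustration only; NOT taken from the real reference file.
ROWS = [
    [
        "CL:EXAMPLE1",
        "Example A cells",
        "CL:EXAMPLE2",
        "Example A",
        "Example A fine",
        "CL:EXAMPLE3",
        "example consensus cell",
    ],
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in [COLUMNS, *ROWS]) + "\n"


def load_reference(handle):
    """Map (panglao_ontology, blueprint_ontology) to consensus_annotation."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        (row["panglao_ontology"], row["blueprint_ontology"]): row["consensus_annotation"]
        for row in reader
    }


def lookup_consensus(reference, panglao_ontology, blueprint_ontology, default="Unknown"):
    """Return the consensus label, or the default when the pair is absent."""
    return reference.get((panglao_ontology, blueprint_ontology), default)


reference = load_reference(io.StringIO(SAMPLE_TSV))
print(lookup_consensus(reference, "CL:EXAMPLE1", "CL:EXAMPLE2"))
print(lookup_consensus(reference, "CL:EXAMPLE1", "CL:MISSING"))
```

To try this against the real file, one could pass an open file handle for `consensus-cell-type-reference.tsv` to `load_reference` in place of the in-memory sample.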
6 changes: 5 additions & 1 deletion analyses/cell-type-consensus/scripts/README.md
@@ -6,6 +6,10 @@ This folder contains all scripts used for generating consensus cell types.
This reference file was originally obtained from `PanglaoDB` and contains a table with all marker genes for all cell types that were used to build the references used when running `CellAssign`.
The file will be stored in `references/PanglaoDB_markers_2020-03-27.tsv`.

2. `01-prepare-cell-type-ontologies.sh`: This script is used to assign [cell type ontologies](https://www.ebi.ac.uk/ols4/ontologies/cl) to cell types in the `PanglaoDB` reference file.
2. `01-prepare-cell-type-ontologies.R`: This script is used to assign [cell type ontologies](https://www.ebi.ac.uk/ols4/ontologies/cl) to cell types in the `PanglaoDB` reference file.
Any cell types whose human-readable label matches the value in the `cell type` column of the reference file (downloaded using the `00-download-panglao-ref.sh` script) are programmatically assigned.
Ontology terms and labels along with the `cell type` label from the reference file are saved to a new file, `references/panglao-cell-type-ontologies.tsv`.

3. `02-prepare-consensus-reference.R`: This script is used to create a table with all consensus cell types.
The output table will contain one row for each combination of cell types in `PanglaoDB` and `BlueprintEncodeData` from `celldex` where a consensus cell type was identified.
If a combination is not included in the reference file, no consensus cell type is assigned and the label can be set to "Unknown".
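
The programmatic assignment described for `01-prepare-cell-type-ontologies.R` can be pictured with the short sketch below. It is an illustration only, not a port of the R script: the function name and inputs are hypothetical, the identifiers are placeholders, and it assumes that "matches" means an exact, case-sensitive string comparison. Cell types without a match are left as `None` here so they can be handled separately.

```python
def assign_ontology_ids(cell_types, ontology_labels):
    """Assign an ontology ID to each cell type whose name exactly matches a label.

    cell_types: names from the `cell type` column of the reference file.
    ontology_labels: dict mapping human-readable ontology labels to term IDs.
    Unmatched cell types map to None.
    """
    return {
        cell_type: ontology_labels.get(cell_type)
        for cell_type in cell_types
    }


# Placeholder labels and IDs for illustration only.
example_labels = {"example cell": "CL:EXAMPLE1"}
assigned = assign_ontology_ids(["example cell", "unmatched cells"], example_labels)
print(assigned)
```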
