From 3676e8d73e89f0f63b18c3d42232cd780344d452 Mon Sep 17 00:00:00 2001
From: Ally Hawkins
Date: Wed, 18 Dec 2024 09:08:50 -0600
Subject: [PATCH] add note about keratinocytes
---
 .../01-reference-exploration.Rmd | 5 ++--
 .../01-reference-exploration.html | 27 +++++++++++--------
 2 files changed, 19 insertions(+), 13 deletions(-)
diff --git a/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.Rmd b/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.Rmd
index 6701c5077..9a06b85cb 100644
--- a/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.Rmd
+++ b/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.Rmd
@@ -451,7 +451,7 @@ print_df |>
 ```
 The PanglaoDB cell types seem to be more specific than the ones present in Blueprint Encode, similar to the observation with neurons.
-We should keep epithelial cell.
+We should keep epithelial cell in the cases where the Blueprint Encode annotation is `Epithelial cells` but not when it is `Keratinocytes`.
 ### Removing anything with more than 1 LCA
@@ -605,7 +605,8 @@ I would use the following criteria to come up with my whitelist:
 - Pairs should not have more than 1 LCA, with the exception of the matches that have the label hematopoietic precursor cell.
 - The LCA should have equal to or less than 170 total descendants.
-- We whould include the term for `neuron` and `epithelial cell` even though they do not pass the threshold for number of descendants.
+- We should include the term for `neuron` and `epithelial cell` even though they do not pass the threshold for number of descendants.
+However, `epithelial cell` should only be included if the Blueprint Encode name is `Epithelial cells` and _not_ `Keratinocytes`.
 - Terms that are too broad should be removed.
 This includes: `lining cell`, `blood cell`, `progenitor cell`, `bone cell`, and `supporting cell`
diff --git a/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.html b/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.html
index cd0998b3e..e2cae57ad 100644
--- a/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.html
+++ b/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.html
@@ -11,7 +11,7 @@
-
+
 Summary of cell type ontologies in reference files
@@ -441,7 +441,7 @@

Summary of cell type ontologies in reference files

Ally Hawkins

-2024-12-17
+2024-12-18

@@ -512,7 +512,7 @@

Setup

panglao_annotation = "human_readable_value" )
## Rows: 178 Columns: 3
-## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────
 ## Delimiter: "\t"
 ## chr (3): ontology_id, human_readable_value, panglao_cell_type
 ## 
@@ -693,8 +693,8 @@ 

Latest common ancestor (LCA) between PanglaoDB and Blueprint
  dplyr::mutate(lca = dplyr::if_else(blueprint_ontology == panglao_ontology, blueprint_ontology, lca)) |>
  # join in information for each of the lca terms including name, number of ancestors and descendants
  dplyr::left_join(cl_df, by = c("lca" = "cl_ontology"))

-## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 7967 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
-## 20, ...].
+## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 7967 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
+## 18, 19, 20, ...].

Distribution of ancestors and descendants

ggplot(lca_df, aes(x = total_ancestors)) +
@@ -1733,7 +1733,7 @@ 

Myeloid leukocyte

then I would argue we shouldn’t keep myeloid leukocyte. Noting that after discussion we have decided to keep this one since T and B cells are much easier to differentiate based on gene expression alone than
-cells that are party of the myeloid lineage.

+cells that are part of the myeloid lineage.

Progenitor cell

@@ -3347,7 +3347,9 @@

Epithelial cell

The PanglaoDB cell types seem to be more specific than the ones present in Blueprint Encode, similar to the observation with neurons. We
-should keep epithelial cell.

+should keep epithelial cell in the cases where the Blueprint Encode
+annotation is Epithelial cells but not when it is
+Keratinocytes.

@@ -3492,8 +3494,8 @@

Removing anything with more than 1 LCA

# which cell types are now missing from the list to keep
setdiff(celltypes_to_keep, updated_celltypes)
-## [1] "blood cell"                   "hematopoietic precursor cell" "lining cell"                  "perivascular cell"           
-## [5] "supporting cell"
+## [1] "blood cell"                   "hematopoietic precursor cell" "lining cell"                 
+## [4] "perivascular cell"            "supporting cell"

It looks like I am losing a few terms I already said were not specific and then a few other terms, like “hematopoietic precursor cell” and “perivascular cell”. I’ll look at both of those to confirm we would
@@ -3909,9 +3911,12 @@

Conclusions

matches that have the label hematopoietic precursor cell.
  • The LCA should have equal to or less than 170 total descendants.
-  • We whould include the term for neuron and
+  • We should include the term for neuron and
 epithelial cell even though they do not pass the threshold
-for number of descendants.
+for number of descendants. However, epithelial cell should
+only be included if the Blueprint Encode name is
+Epithelial cells and not
+Keratinocytes.
  • Terms that are too broad should be removed. This includes: lining cell, blood cell, progenitor cell, bone cell, and