From 0e89309664a8ff5da10f59ae2b2d61cd430cc7bb Mon Sep 17 00:00:00 2001 From: Ally Hawkins Date: Tue, 17 Dec 2024 13:22:53 -0600 Subject: [PATCH] keep myeloid and epithelial --- .../01-reference-exploration.Rmd | 26 +- .../01-reference-exploration.html | 1191 +++++++++++++++-- 2 files changed, 1125 insertions(+), 92 deletions(-) diff --git a/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.Rmd b/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.Rmd index 7c9eff4d2..472661dd1 100644 --- a/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.Rmd +++ b/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.Rmd @@ -84,7 +84,7 @@ cl_df <- data.frame( # list all ancestors and descendants calculate total ancestors = list(ontologyIndex::get_ancestors(cl_ont, cl_ontology)), total_ancestors = length(ancestors), - descendants = list(ontologyIndex::get_descendants(cl_ont, cl_ontology)), + descendants = list(ontologyIndex::get_descendants(cl_ont, cl_ontology, exclude_roots = TRUE)), total_descendants = length(descendants) ) ``` @@ -187,7 +187,7 @@ Ultimately, I would like to see if we can use that cutoff to decide if we should ```{r} # first set up the graph from cl ont -parent_terms <- cl$parents +parent_terms <- cl_ont$parents cl_graph <- igraph::make_graph(rbind(unlist(parent_terms), rep(names(parent_terms), lengths(parent_terms)))) ``` @@ -384,6 +384,7 @@ print_df |> ``` I'm torn on this one, because I do think it's helpful to know if something is of the myeloid lineage, but if we aren't keeping lymphocyte then I would argue we shouldn't keep myeloid leukocyte. +Noting that after discussion we have decided to keep this one since T and B cells are much easier to differentiate based on gene expression alone than cells that are part of the myeloid lineage. 
#### Progenitor cell @@ -395,6 +396,7 @@ print_df |> ``` Same with `progenitor cell`, I do think it could be helpful to know that something may be a progenitor cell, but when you have a cell with the label for HSC and the label for cells like monocytes or osteoblasts, then maybe we are talking about a tumor cell instead. +After discussion, we are going to remove progenitor cells. Along those same lines, I think the below terms, `lining cell` and `supporting cell`, are too broad even though they have few descendants. @@ -426,13 +428,13 @@ lca_df |> unique() ``` -The only term in this list that I would be concerned about losing is "neuron". +The only terms in this list that I would be concerned about losing are "neuron" and epithelial cells. Let's look at those combinations. #### Neuron ```{r} -# blood cell +# neuron print_df |> dplyr::filter(cl_annotation == "neuron") ``` @@ -440,6 +442,17 @@ print_df |> It looks like there are a lot of types of neurons in the PanglaoDB reference and only "neuron" as a term in Blueprint. Even though neuron has ~ 500 descendants, I think we should keep these labels. +#### Epithelial cell + +```{r} +# epithelial cell +print_df |> + dplyr::filter(cl_annotation == "epithelial cell") +``` + +The PanglaoDB cell types seem to be more specific than the ones present in Blueprint Encode, similar to the observation with neurons. +We should keep epithelial cell. + ### Removing anything with more than 1 LCA One thing I noticed when looking at the labels that have less than the cutoff is that most of them are from scenarios where we have multiple LCAs. @@ -592,8 +605,9 @@ I would use the following criteria to come up with my whitelist: - Pairs should not have more than 1 LCA, with the exception of the matches that have the label hematopoietic precursor cell. - The LCA should have equal to or less than 170 total descendants. -- We whould include the term for `neuron` even though it has 500 descendants. 
-- Terms that are too broad (like `supporting cell`, `blood cell`, `bone cell`, `lining cell`) should be removed. +- We should include the term for `neuron` and `epithelial cell` even though they do not pass the threshold for number of descendants. +- Terms that are too broad should be removed. +This includes: `lining cell`, `blood cell`, `progenitor cell`, `bone cell`, and `supporting cell`. Alternatively, rather than eliminate terms that are too broad we could look at the similarity index for individual matches and decide on a case by case basis if those should be allowed. Although I still think having a term that is too broad, even if it's a good match, is not super informative. diff --git a/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.html b/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.html index 474052d82..cd0998b3e 100644 --- a/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.html +++ b/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.html @@ -11,7 +11,7 @@ - + Summary of cell type ontologies in reference files @@ -441,7 +441,7 @@

Summary of cell type ontologies in reference files

Ally Hawkins

-

2024-12-12

+

2024-12-17

@@ -512,7 +512,7 @@

Setup

panglao_annotation = "human_readable_value" )
## Rows: 178 Columns: 3
-## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────
+## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 ## Delimiter: "\t"
 ## chr (3): ontology_id, human_readable_value, panglao_cell_type
 ## 
@@ -545,7 +545,7 @@ 

Full cell ontology

# list all ancestors and descendants calculate total ancestors = list(ontologyIndex::get_ancestors(cl_ont, cl_ontology)), total_ancestors = length(ancestors), - descendants = list(ontologyIndex::get_descendants(cl_ont, cl_ontology)), + descendants = list(ontologyIndex::get_descendants(cl_ont, cl_ontology, exclude_roots = TRUE)), total_descendants = length(descendants) )
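The only code change in this chunk is `exclude_roots = TRUE`. My reading of the ontologyIndex documentation is that `get_descendants()` defaults to `exclude_roots = FALSE`, in which case the query term is returned as one of its own descendants; that would explain why every `total_descendants` value in this re-render is exactly one lower than before. The base-R sketch below imitates that behavior on a made-up ontology so the off-by-one is easy to see. The term names and the `descendants_of()` helper are hypothetical and are not part of ontologyIndex.

```r
# Toy ontology (hypothetical terms): each element lists the *parents* of a
# term, mirroring the shape of an ontology_index `$parents` field.
parents <- list(
  root    = character(0),
  blood   = "root",
  myeloid = "blood",
  lymph   = "blood",
  mono    = "myeloid",
  tcell   = "lymph"
)

# Invert the parent list into a child list so we can walk downward.
children <- split(rep(names(parents), lengths(parents)), unlist(parents))

# Hypothetical helper: breadth-first walk collecting every descendant of
# `term`. With exclude_root = FALSE the starting term is counted as well.
descendants_of <- function(term, exclude_root = FALSE) {
  found <- term
  frontier <- term
  while (length(frontier) > 0) {
    kids <- unlist(children[frontier], use.names = FALSE)
    frontier <- setdiff(kids, found)   # only terms not seen yet
    found <- c(found, frontier)
  }
  if (exclude_root) setdiff(found, term) else found
}

print(length(descendants_of("blood")))                      # 5, includes "blood"
print(length(descendants_of("blood", exclude_root = TRUE))) # 4
```

Excluding the root means a leaf term reports zero descendants rather than one, which keeps the cutoff discussion below in terms of strictly more specific cell types.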

The vertical lines in the below plot indicate the value for cell @@ -597,7 +597,7 @@

Full cell ontology

x = "Number of descendants", y = "Density" )

It looks like most cell types have very few descendants, so let’s zoom into the area below 500 to get a better look.

ggplot(cl_df, aes(x = total_descendants)) +
@@ -619,7 +619,7 @@ 

Full cell ontology

## Warning: Removed 14 rows containing non-finite outside the scale range (`stat_density()`).
## Warning: Removed 3 rows containing missing values or values outside the scale range (`geom_vline()`).
## Warning: Removed 3 rows containing missing values or values outside the scale range (`geom_text()`).

Here we see a much larger range of values and that cell types become more general as the number of descendants goes up. However, this distribution alone is probably not helpful in determining a cutoff. The @@ -634,7 +634,7 @@

Latest common ancestor (LCA) between PanglaoDB and Blueprint for assigning cell types with CellAssign) and the BlueprintEncodeData reference from celldex (used for assigning cell types with SingleR). The LCA -refers to the latest term in the cell ontology heirarchy that is common +refers to the latest term in the cell ontology hierarchy that is common between two terms. I will use the ontoProc::findCommonAncestors() function to get the LCA for each combination.

Note that it is possible to have more than one LCA for a set of @@ -644,7 +644,7 @@

Latest common ancestor (LCA) between PanglaoDB and Blueprint I would like to see if we can use that cutoff to decide if we should keep the LCA term as the consensus label or use “Unknown”.

# first set up the graph from cl ont
-parent_terms <- cl$parents
+parent_terms <- cl_ont$parents
 cl_graph <- igraph::make_graph(rbind(unlist(parent_terms), rep(names(parent_terms), lengths(parent_terms))))
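The `rbind()` call in this chunk builds the edge list that `igraph::make_graph()` expects: a vector in which each consecutive pair is one edge (from, to), with character entries treated as vertex names. Row 1 of the matrix holds every parent and row 2 holds the child repeated once per parent, so flattening the matrix column by column yields parent, child, parent, child, and so on. The sketch below reproduces only that reshaping step in base R with a hypothetical `parent_terms` list; it does not call igraph.

```r
# Hypothetical parent list shaped like `cl_ont$parents`: names are child
# terms, values are the parent term(s) of each child.
parent_terms <- list(
  "CL:A" = character(0),      # root term, no parents
  "CL:B" = "CL:A",
  "CL:C" = c("CL:A", "CL:B")  # a term with two parents
)

# Row 1: every parent; row 2: the matching child, repeated once per parent.
edge_matrix <- rbind(
  unlist(parent_terms, use.names = FALSE),
  rep(names(parent_terms), lengths(parent_terms))
)

# R stores matrices column by column, so each column becomes one
# (parent, child) pair in the flattened vector.
edge_vector <- as.vector(edge_matrix)
print(edge_vector)
```

Passing a vector like `edge_vector` to `igraph::make_graph()` gives a directed graph (its default) with edges pointing from parent to child.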
# get a data frame with all combinations of panglao and blueprint terms
 # one row for each combination 
@@ -661,7 +661,7 @@ 

Latest common ancestor (LCA) between PanglaoDB and Blueprint dplyr::rowwise() |> dplyr::mutate( # least common shared ancestor - lca = list(rownames(ontoProc::findCommonAncestors(blueprint_ontology, panglao_ontology, g = g))) + lca = list(rownames(ontoProc::findCommonAncestors(blueprint_ontology, panglao_ontology, g = cl_graph))) )

## Warning in dplyr::left_join(dplyr::left_join(dplyr::rename(expand.grid(panglao_df$panglao_ontology, : Detected an unexpected many-to-many relationship between `x` and `y`.
 ## ℹ Row 49 of `x` matches multiple rows in `y`.
@@ -693,8 +693,8 @@ 

Latest common ancestor (LCA) between PanglaoDB and Blueprint dplyr::mutate(lca = dplyr::if_else(blueprint_ontology == panglao_ontology, blueprint_ontology, lca)) |> # join in information for each of the lca terms including name, number of ancestors and descendants dplyr::left_join(cl_df, by = c("lca" = "cl_ontology"))

-
## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 7967 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
-## 18, 19, 20, ...].
+
## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 7967 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
+## 20, ...].

Distribution of ancestors and descendants

ggplot(lca_df, aes(x = total_ancestors)) +
@@ -728,7 +728,7 @@ 

Distribution of ancestors and descendants

x = "Total number of descendants", y = "Density" )

Let’s zoom into the area below 1000, since we already know we would want to exclude anything above that based on this plot.

ggplot(lca_df, aes(x = total_descendants)) +
@@ -750,7 +750,7 @@ 

Distribution of ancestors and descendants

## Warning: Removed 6856 rows containing non-finite outside the scale range (`stat_density()`).
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_vline()`).
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_text()`).

We can use the vertical lines for cells of interest to help us define a potential cutoff based on the granularity we would like to see in our consensus label. We want to be able to label things like T cell, but we @@ -815,7 +815,7 @@

Defining a cutoff for number of descendants

x = "cell type", y = "Total descendants" )

There are a few terms that I think might be more broad than we want like blood cell, bone cell, supporting cell, and lining cell. I’m on the @@ -1730,7 +1730,10 @@

Myeloid leukocyte

I’m torn on this one, because I do think it’s helpful to know if something is of the myeloid lineage, but if we aren’t keeping lymphocyte -then I would argue we shouldn’t keep myeloid leukocyte.

+then I would argue we shouldn’t keep myeloid leukocyte. Noting that +after discussion we have decided to keep this one since T and B cells +are much easier to differentiate based on gene expression alone than +cells that are part of the myeloid lineage.

Progenitor cell

@@ -1920,7 +1923,7 @@

Progenitor cell

helpful to know that something may be a progenitor cell, but when you have a cell with the label for HSC and the label for cells like monocytes or osteoblasts, then maybe we are talking about a tumor cell -instead.

+instead. After discussion, we are going to remove progenitor cells.

Along those same lines, I think the below terms, lining cell and supporting cell, are too broad even though they have few descendants.

@@ -2090,11 +2093,11 @@

Discarded cell types

## [13] "secretory cell" "connective tissue cell" "electrically responsive cell" ## [16] "contractile cell" "epithelial cell" "neuron" ## [19] "neural cell"

-

The only term in this list that I would be concerned about losing is -“neuron”. Let’s look at those combinations.

+

The only terms in this list that I would be concerned about losing +are “neuron” and epithelial cells. Let’s look at those combinations.

Neuron

-
# blood cell
+
# neuron
 print_df |> 
   dplyr::filter(cl_annotation == "neuron")
@@ -2329,6 +2332,1023 @@

Neuron

reference and only “neuron” as a term in Blueprint. Even though neuron has ~ 500 descendants, I think we should keep these labels.

+
+

Epithelial cell

+
# epithelial cell
+print_df |> 
+  dplyr::filter(cl_annotation == "epithelial cell")
+
| blueprint_ontology | blueprint_annotation_main | blueprint_annotation_fine | panglao_ontology | panglao_annotation | total_lca | lca | cl_annotation |
|---|---|---|---|---|---|---|---|
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000622 | acinar cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:1000488 | cholangiocyte | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000166 | chromaffin cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000584 | enterocyte | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000164 | enteroendocrine cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000065 | ependymal cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000066 | epithelial cell | 0 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000160 | goblet cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000501 | granulosa cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000182 | hepatocyte | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0005006 | ionocyte | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000312 | keratinocyte | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000077 | mesothelial cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000185 | myoepithelial cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000165 | neuroendocrine cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002167 | olfactory epithelial cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000510 | paneth cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000162 | parietal cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002481 | peritubular myoid cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000652 | pinealocyte | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000653 | podocyte | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000209 | taste receptor cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000731 | urothelial cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002368 | respiratory epithelial cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002370 | respiratory goblet cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000171 | pancreatic A cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000169 | type B pancreatic cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000706 | choroid plexus epithelial cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000158 | club cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002250 | intestinal crypt stem cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000173 | pancreatic D cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002305 | epithelial cell of distal tubule | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002079 | pancreatic ductal cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000504 | enterochromaffin-like cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0005019 | pancreatic epsilon cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002258 | thyroid follicular cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002179 | foveolar cell of stomach | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000696 | PP cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000155 | peptic cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002292 | type I cell of carotid body | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0005010 | renal intercalated cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:1000909 | kidney loop of Henle epithelial cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002326 | luminal epithelial cell of mammary gland | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002327 | mammary gland epithelial cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000242 | Merkel cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000682 | M cell of gut | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002199 | oxyphil cell of parathyroid gland | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000446 | chief cell of parathyroid gland | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0005009 | renal principal cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002306 | epithelial cell of proximal tubule | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002062 | pulmonary alveolar type 1 cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002063 | pulmonary alveolar type 2 cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:1001596 | salivary gland glandular cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002140 | acinar cell of sebaceous gland | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000216 | Sertoli cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002562 | hair germinal matrix cell | 1 | CL:0000066 | epithelial cell |
| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002204 | brush cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000622 | acinar cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:1000488 | cholangiocyte | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000584 | enterocyte | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000164 | enteroendocrine cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000066 | epithelial cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000160 | goblet cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000501 | granulosa cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000182 | hepatocyte | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0005006 | ionocyte | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000185 | myoepithelial cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000510 | paneth cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000162 | parietal cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000653 | podocyte | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000209 | taste receptor cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000731 | urothelial cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002368 | respiratory epithelial cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002370 | respiratory goblet cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000171 | pancreatic A cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000169 | type B pancreatic cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000158 | club cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002250 | intestinal crypt stem cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000173 | pancreatic D cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002305 | epithelial cell of distal tubule | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002079 | pancreatic ductal cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000504 | enterochromaffin-like cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0005019 | pancreatic epsilon cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002258 | thyroid follicular cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002179 | foveolar cell of stomach | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000696 | PP cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000155 | peptic cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0005010 | renal intercalated cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:1000909 | kidney loop of Henle epithelial cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002326 | luminal epithelial cell of mammary gland | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002327 | mammary gland epithelial cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000682 | M cell of gut | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002199 | oxyphil cell of parathyroid gland | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000446 | chief cell of parathyroid gland | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0005009 | renal principal cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002306 | epithelial cell of proximal tubule | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:1001596 | salivary gland glandular cell | 1 | CL:0000066 | epithelial cell |
| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002204 | brush cell | 1 | CL:0000066 | epithelial cell |
+
+

The PanglaoDB cell types seem to be more specific than the ones +present in Blueprint Encode, similar to the observation with neurons. We +should keep epithelial cell.

+

Removing anything with more than 1 LCA

@@ -2354,99 +3374,99 @@

Removing anything with more than 1 LCA

| cell type | total descendants before (`-`) | total descendants after (`+`) |
|---|---|---|
| bone cell | 39 | 38 |
| blood cell | 42 | 41 |
| perivascular cell | 42 | 41 |
| stromal cell | 54 | 53 |
| supporting cell | 62 | 61 |
| hematopoietic precursor cell | 106 | 105 |
| lining cell | 121 | 120 |
| myeloid leukocyte | 166 | 165 |
| progenitor cell | 166 | 165 |
| mononuclear phagocyte | 170 | 169 |
| phagocyte (sensu Vertebrata) | 176 | 175 |
| contractile cell | 178 | 177 |
| defensive cell | 200 | 199 |
| professional antigen presenting cell | 213 | 212 |
| connective tissue cell | 224 | 223 |
| myeloid cell | 248 | 247 |
| stuff accumulating cell | 267 | 266 |
| precursor cell | 272 | 271 |
| secretory cell | 458 | 457 |
| mononuclear cell | 504 | 503 |
| leukocyte | 541 | 540 |
| electrically responsive cell | 674 | 673 |
| hematopoietic cell | 685 | 684 |
| eukaryotic cell | 2646 | 2645 |

@@ -2472,8 +3492,8 @@

Removing anything with more than 1 LCA

# which cell types are now missing from the list to keep setdiff(celltypes_to_keep, updated_celltypes)
-
## [1] "blood cell"                   "hematopoietic precursor cell" "lining cell"                 
-## [4] "perivascular cell"            "supporting cell"
+
## [1] "blood cell"                   "hematopoietic precursor cell" "lining cell"                  "perivascular cell"           
+## [5] "supporting cell"

It looks like I am losing a few terms I already said were not specific and then a few other terms, like “hematopoietic precursor cell” and “perivascular cell”. I’ll look at both of those to confirm we would @@ -2889,16 +3909,18 @@

Conclusions

matches that have the label hematopoietic precursor cell.
  • The LCA should have equal to or less than 170 total descendants.
  • -
  • We whould include the term for neuron even though it -has 500 descendants.
  • -
  • Terms that are too broad (like supporting cell, -blood cell, bone cell, -lining cell) should be removed.
  • +
  • We should include the term for neuron and +epithelial cell even though they do not pass the threshold +for number of descendants.
  • +
  • Terms that are too broad should be removed. This includes: +lining cell, blood cell, +progenitor cell, bone cell, and +supporting cell.
  • Alternatively, rather than eliminate terms that are too broad we could look at the similarity index for individual matches and decide on a case by case basis if those should be allowed. Although I still think -having a term that is too braod, even if it’s a good match, is not super +having a term that is too broad, even if it’s a good match, is not super informative.
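As a rough illustration, the whitelist criteria can be combined into a single logical filter. This is a hypothetical base-R sketch: the toy data frame and its column names (`lca_name`, `total_lca`, `total_descendants`) are assumptions loosely modeled on `lca_df`, with one row per LCA term. The descendant counts for bone cell, myeloid leukocyte, progenitor cell, hematopoietic precursor cell and leukocyte are the updated values from the table in the "Removing anything with more than 1 LCA" section. Neuron uses the approximate value of 500 quoted in the text, epithelial cell is left as `NA` because its count is not given here, and the `total_lca` values are made up.

```r
# Toy LCA table (hypothetical layout). total_lca values are made up; the
# neuron count is approximate and the epithelial cell count is unknown (NA).
lca_df <- data.frame(
  lca_name = c("bone cell", "myeloid leukocyte", "progenitor cell",
               "hematopoietic precursor cell", "neuron",
               "epithelial cell", "leukocyte"),
  total_lca = c(1, 1, 1, 2, 1, 1, 1),
  total_descendants = c(38, 165, 165, 105, 500, NA, 540),
  stringsAsFactors = FALSE
)

max_descendants <- 170
always_keep <- c("neuron", "epithelial cell")
too_broad <- c("lining cell", "blood cell", "progenitor cell",
               "bone cell", "supporting cell")

keep <- with(lca_df,
  # at most one LCA, except for hematopoietic precursor cell
  (total_lca <= 1 | lca_name == "hematopoietic precursor cell") &
  # within the descendant cutoff, or explicitly kept despite it
  # (NA | TRUE is TRUE in R, so the unknown epithelial count is harmless)
  (total_descendants <= max_descendants | lca_name %in% always_keep) &
  # not on the too-broad list
  !(lca_name %in% too_broad)
)

whitelist <- lca_df$lca_name[keep]
print(whitelist)
```

With these toy inputs the whitelist is myeloid leukocyte, hematopoietic precursor cell, neuron and epithelial cell. Keeping the too-broad list separate from the cutoff makes it explicit that progenitor cell is dropped by the discussion decision, not by the 170-descendant threshold.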

    @@ -2906,7 +3928,7 @@

    Session info

    sessionInfo()
    ## R version 4.4.2 (2024-10-31)
     ## Platform: aarch64-apple-darwin20
    -## Running under: macOS Sonoma 14.4
    +## Running under: macOS Sequoia 15.2
     ## 
     ## Matrix products: default
     ## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
    @@ -2925,40 +3947,37 @@ 

    Session info

    ## [1] ggplot2_3.5.1 ## ## loaded via a namespace (and not attached): -## [1] RColorBrewer_1.1-3 jsonlite_1.8.9 magrittr_2.0.3 gypsum_1.2.0 -## [5] farver_2.1.2 rmarkdown_2.29 zlibbioc_1.52.0 vctrs_0.6.5 -## [9] memoise_2.0.1 DelayedMatrixStats_1.28.0 htmltools_0.5.8.1 S4Arrays_1.6.0 -## [13] polynom_1.4-1 AnnotationHub_3.14.0 curl_6.0.1 Rhdf5lib_1.28.0 -## [17] SparseArray_1.6.0 rhdf5_2.50.0 sass_0.4.9 alabaster.base_1.6.1 -## [21] bslib_0.8.0 htmlwidgets_1.6.4 httr2_1.0.7 cachem_1.1.0 -## [25] igraph_2.1.1 mime_0.12 lifecycle_1.0.4 pkgconfig_2.0.3 -## [29] Matrix_1.7-1 R6_2.5.1 fastmap_1.2.0 GenomeInfoDbData_1.2.13 -## [33] MatrixGenerics_1.18.0 shiny_1.9.1 digest_0.6.37 colorspace_2.1-1 -## [37] AnnotationDbi_1.68.0 S4Vectors_0.44.0 rprojroot_2.0.4 ExperimentHub_2.14.0 -## [41] GenomicRanges_1.58.0 RSQLite_2.3.9 filelock_1.0.3 labeling_0.4.3 -## [45] fansi_1.0.6 httr_1.4.7 polyclip_1.10-7 abind_1.4-8 -## [49] compiler_4.4.2 bit64_4.5.2 withr_3.0.2 DBI_1.2.3 -## [53] ontologySimilarity_2.7 HDF5Array_1.34.0 ggforce_0.4.2 alabaster.ranges_1.6.0 -## [57] alabaster.schemas_1.6.0 MASS_7.3-61 quantreg_5.99.1 rappdirs_0.3.3 -## [61] DelayedArray_0.32.0 ggpp_0.5.8-1 tools_4.4.2 httpuv_1.6.15 -## [65] glue_1.8.0 rhdf5filters_1.18.0 promises_1.3.2 grid_4.4.2 -## [69] generics_0.1.3 gtable_0.3.6 tzdb_0.4.0 tidyr_1.3.1 -## [73] hms_1.1.3 utf8_1.2.4 XVector_0.46.0 BiocGenerics_0.52.0 -## [77] BiocVersion_3.20.0 pillar_1.9.0 stringr_1.5.1 vroom_1.6.5 -## [81] later_1.4.1 splines_4.4.2 dplyr_1.1.4 tweenr_2.0.3 -## [85] BiocFileCache_2.14.0 lattice_0.22-6 survival_3.7-0 renv_1.0.11 -## [89] bit_4.5.0.1 SparseM_1.84-2 tidyselect_1.2.1 Biostrings_2.74.0 -## [93] knitr_1.49 ggpmisc_0.6.1 IRanges_2.40.0 ontologyPlot_1.7 -## [97] SummarizedExperiment_1.36.0 stats4_4.4.2 xfun_0.49 Biobase_2.66.0 -## [101] matrixStats_1.4.1 DT_0.33 stringi_1.8.4 UCSC.utils_1.2.0 -## [105] paintmap_1.0 yaml_2.3.10 evaluate_1.0.1 tibble_3.2.1 -## [109] Rgraphviz_2.50.0 alabaster.matrix_1.6.1 
BiocManager_1.30.25 graph_1.84.0 -## [113] cli_3.6.3 ontologyIndex_2.12 xtable_1.8-4 reticulate_1.40.0 -## [117] jquerylib_0.1.4 munsell_0.5.1 Rcpp_1.0.13-1 GenomeInfoDb_1.42.1 -## [121] dbplyr_2.5.0 ontoProc_2.0.0 png_0.1-8 parallel_4.4.2 -## [125] MatrixModels_0.5-3 readr_2.1.5 blob_1.2.4 splus2R_1.3-5 -## [129] sparseMatrixStats_1.18.0 alabaster.se_1.6.0 scales_1.3.0 purrr_1.0.2 -## [133] crayon_1.5.3 rlang_1.1.4 KEGGREST_1.46.0 celldex_1.16.0
    +## [1] celldex_1.16.0 DBI_1.2.3 httr2_1.0.7 rlang_1.1.4 +## [5] magrittr_2.0.3 matrixStats_1.4.1 gypsum_1.2.0 compiler_4.4.2 +## [9] RSQLite_2.3.9 DelayedMatrixStats_1.28.0 png_0.1-8 vctrs_0.6.5 +## [13] pkgconfig_2.0.3 crayon_1.5.3 fastmap_1.2.0 dbplyr_2.5.0 +## [17] XVector_0.46.0 labeling_0.4.3 utf8_1.2.4 promises_1.3.2 +## [21] rmarkdown_2.29 tzdb_0.4.0 graph_1.84.0 UCSC.utils_1.2.0 +## [25] purrr_1.0.2 bit_4.5.0.1 xfun_0.49 zlibbioc_1.52.0 +## [29] cachem_1.1.0 splus2R_1.3-5 GenomeInfoDb_1.42.1 jsonlite_1.8.9 +## [33] blob_1.2.4 later_1.4.1 rhdf5filters_1.18.0 DelayedArray_0.32.0 +## [37] Rhdf5lib_1.28.0 parallel_4.4.2 R6_2.5.1 bslib_0.8.0 +## [41] reticulate_1.40.0 jquerylib_0.1.4 GenomicRanges_1.58.0 Rcpp_1.0.13-1 +## [45] SummarizedExperiment_1.36.0 knitr_1.49 readr_2.1.5 IRanges_2.40.0 +## [49] httpuv_1.6.15 Matrix_1.7-1 igraph_2.1.1 tidyselect_1.2.1 +## [53] abind_1.4-8 yaml_2.3.10 curl_6.0.1 ontologySimilarity_2.7 +## [57] lattice_0.22-6 tibble_3.2.1 shiny_1.9.1 Biobase_2.66.0 +## [61] withr_3.0.2 KEGGREST_1.46.0 evaluate_1.0.1 ontologyIndex_2.12 +## [65] BiocFileCache_2.14.0 alabaster.schemas_1.6.0 ExperimentHub_2.14.0 Biostrings_2.74.0 +## [69] pillar_1.9.0 BiocManager_1.30.25 filelock_1.0.3 MatrixGenerics_1.18.0 +## [73] DT_0.33 renv_1.0.11 stats4_4.4.2 generics_0.1.3 +## [77] vroom_1.6.5 rprojroot_2.0.4 BiocVersion_3.20.0 S4Vectors_0.44.0 +## [81] hms_1.1.3 sparseMatrixStats_1.18.0 munsell_0.5.1 scales_1.3.0 +## [85] alabaster.base_1.6.1 xtable_1.8-4 glue_1.8.0 alabaster.ranges_1.6.0 +## [89] alabaster.matrix_1.6.1 tools_4.4.2 ontologyPlot_1.7 AnnotationHub_3.14.0 +## [93] ontoProc_2.0.0 rhdf5_2.50.0 grid_4.4.2 tidyr_1.3.1 +## [97] AnnotationDbi_1.68.0 colorspace_2.1-1 GenomeInfoDbData_1.2.13 HDF5Array_1.34.0 +## [101] cli_3.6.3 rappdirs_0.3.3 fansi_1.0.6 S4Arrays_1.6.0 +## [105] dplyr_1.1.4 Rgraphviz_2.50.0 gtable_0.3.6 alabaster.se_1.6.0 +## [109] sass_0.4.9 digest_0.6.37 BiocGenerics_0.52.0 paintmap_1.0 +## [113] SparseArray_1.6.0 
htmlwidgets_1.6.4 farver_2.1.2 memoise_2.0.1 +## [117] htmltools_0.5.8.1 lifecycle_1.0.4 httr_1.4.7 mime_0.12 +## [121] bit64_4.5.2