diff --git a/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.Rmd b/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.Rmd
index 7c9eff4d2..472661dd1 100644
--- a/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.Rmd
+++ b/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.Rmd
@@ -84,7 +84,7 @@ cl_df <- data.frame(
     # list all ancestors and descendants calculate total
     ancestors = list(ontologyIndex::get_ancestors(cl_ont, cl_ontology)),
     total_ancestors = length(ancestors),
-    descendants = list(ontologyIndex::get_descendants(cl_ont, cl_ontology)),
+    descendants = list(ontologyIndex::get_descendants(cl_ont, cl_ontology, exclude_roots = TRUE)),
     total_descendants = length(descendants)
   )
 ```
@@ -187,7 +187,7 @@ Ultimately, I would like to see if we can use that cutoff to decide if we should
 
 ```{r}
 # first set up the graph from cl ont
-parent_terms <- cl$parents
+parent_terms <- cl_ont$parents
 cl_graph <- igraph::make_graph(rbind(unlist(parent_terms), rep(names(parent_terms), lengths(parent_terms))))
 ```
 
@@ -384,6 +384,7 @@ print_df |>
 ```
 
 I'm torn on this one, because I do think it's helpful to know if something is of the myeloid lineage, but if we aren't keeping lymphocyte then I would argue we shouldn't keep myeloid leukocyte.
+Noting that after discussion we have decided to keep this one since T and B cells are much easier to differentiate based on gene expression alone than cells that are part of the myeloid lineage.
 
 #### Progenitor cell
 
@@ -395,6 +396,7 @@ print_df |>
 ```
 
 Same with `progenitor cell`, I do think it could be helpful to know that something may be a progenitor cell, but when you have a cell with the label for HSC and the label for cells like monocytes or osteoblasts, then maybe we are talking about a tumor cell instead.
+After discussion, we are going to remove progenitor cells.
 
 Along those same lines, I think the below terms, `lining cell` and `supporting cell`, are too broad even though they have few descendants.
 
@@ -426,13 +428,13 @@ lca_df |>
   unique()
 ```
 
-The only term in this list that I would be concerned about losing is "neuron".
+The only terms in this list that I would be concerned about losing are "neuron" and epithelial cells.
 Let's look at those combinations.
 
 #### Neuron
 
 ```{r}
-# blood cell
+# neuron
 print_df |>
   dplyr::filter(cl_annotation == "neuron")
 ```
@@ -440,6 +442,17 @@ print_df |>
 It looks like there are a lot of types of neurons in the PanglaoDB reference and only "neuron" as a term in Blueprint.
 Even though neuron has ~ 500 descendants, I think we should keep these labels.
 
+#### Epithelial cell
+
+```{r}
+# epithelial cell
+print_df |>
+  dplyr::filter(cl_annotation == "epithelial cell")
+```
+
+The PanglaoDB cell types seem to be more specific than the ones present in Blueprint Encode, similar to the observation with neurons.
+We should keep epithelial cell.
+
 ### Removing anything with more than 1 LCA
 
 One thing I noticed when looking at the labels that have less than the cutoff is that most of them are from scenarios where we have multiple LCAs.
@@ -592,8 +605,9 @@ I would use the following criteria to come up with my whitelist:
 
 - Pairs should not have more than 1 LCA, with the exception of the matches that have the label hematopoietic precursor cell.
 - The LCA should have equal to or less than 170 total descendants.
-- We whould include the term for `neuron` even though it has 500 descendants.
-- Terms that are too broad (like `supporting cell`, `blood cell`, `bone cell`, `lining cell`) should be removed.
+- We should include the term for `neuron` and `epithelial cell` even though they do not pass the threshold for number of descendants.
+- Terms that are too broad should be removed.
+This includes: `lining cell`, `blood cell`, `progenitor cell`, `bone cell`, and `supporting cell`
 
 Alternatively, rather than eliminate terms that are too broad we could look at the similarity index for individual matches and decide on a case by case basis if those should be allowed.
 Although I still think having a term that is too broad, even if it's a good match, is not super informative.
diff --git a/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.html b/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.html
index 474052d82..cd0998b3e 100644
--- a/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.html
+++ b/analyses/cell-type-consensus/exploratory-notebooks/01-reference-exploration.html
@@ -11,7 +11,7 @@
-
+
## Rows: 178 Columns: 3
-## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────
+## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): ontology_id, human_readable_value, panglao_cell_type
##
@@ -545,7 +545,7 @@ Full cell ontology
# list all ancestors and descendants calculate total
ancestors = list(ontologyIndex::get_ancestors(cl_ont, cl_ontology)),
total_ancestors = length(ancestors),
- descendants = list(ontologyIndex::get_descendants(cl_ont, cl_ontology)),
+ descendants = list(ontologyIndex::get_descendants(cl_ont, cl_ontology, exclude_roots = TRUE)),
total_descendants = length(descendants)
)
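To make the effect of `exclude_roots = TRUE` concrete, here is a minimal sketch on a toy ontology. The hierarchy below is made up for illustration and is not the real Cell Ontology structure; `toy_parents` and `toy_ont` are hypothetical names. By default `get_descendants()` returns the query term along with its descendants, so each count is one higher than the number of true descendants.

```r
library(ontologyIndex)

# toy hierarchy (hypothetical, for illustration only):
# each entry lists the parents of the named term
toy_parents <- list(
  "cell" = character(0),
  "leukocyte" = "cell",
  "lymphocyte" = "leukocyte",
  "T cell" = "lymphocyte",
  "B cell" = "lymphocyte"
)
toy_ont <- ontology_index(parents = toy_parents)

# default behavior: the query term itself is included in the result
with_root <- get_descendants(toy_ont, "lymphocyte")

# exclude_roots = TRUE drops the query term, leaving only its descendants
without_root <- get_descendants(toy_ont, "lymphocyte", exclude_roots = TRUE)

length(with_root)
length(without_root)
```

This is consistent with the table of total descendants later in this diff, where every count drops by one (for example, bone cell goes from 39 to 38).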
The vertical lines in the below plot indicate the value for cell
@@ -597,7 +597,7 @@
It looks like most cell types have very few descendants, so let’s zoom into the area below 500 to get a better look.
ggplot(cl_df, aes(x = total_descendants)) +
@@ -619,7 +619,7 @@ Full cell ontology
## Warning: Removed 14 rows containing non-finite outside the scale range (`stat_density()`).
## Warning: Removed 3 rows containing missing values or values outside the scale range (`geom_vline()`).
## Warning: Removed 3 rows containing missing values or values outside the scale range (`geom_text()`).
-
+
Here we see a much larger range of values and that cell types become
more general as the number of descendants goes up. However, this
distribution alone is probably not helpful in determining a cutoff. The
@@ -634,7 +634,7 @@ Latest common ancestor (LCA) between PanglaoDB and Blueprint
for assigning cell types with CellAssign) and the
BlueprintEncodeData reference from celldex
(used for assigning cell types with SingleR). The LCA
-refers to the latest term in the cell ontology heirarchy that is common
+refers to the latest term in the cell ontology hierarchy that is common
between two terms. I will use the ontoProc::findCommonAncestors()
function to get the LCA for each combination.
Note that it is possible to have more than one LCA for a set of
@@ -644,7 +644,7 @@ Latest common ancestor (LCA) between PanglaoDB and Blueprint
I would like to see if we can use that cutoff to decide if we should
keep the LCA term as the consensus label or use “Unknown”.
# first set up the graph from cl ont
-parent_terms <- cl$parents
+parent_terms <- cl_ont$parents
cl_graph <- igraph::make_graph(rbind(unlist(parent_terms), rep(names(parent_terms), lengths(parent_terms))))
# get a data frame with all combinations of panglao and blueprint terms
# one row for each combination
@@ -661,7 +661,7 @@ Latest common ancestor (LCA) between PanglaoDB and Blueprint
dplyr::rowwise() |>
dplyr::mutate(
# least common shared ancestor
- lca = list(rownames(ontoProc::findCommonAncestors(blueprint_ontology, panglao_ontology, g = g)))
+ lca = list(rownames(ontoProc::findCommonAncestors(blueprint_ontology, panglao_ontology, g = cl_graph)))
)
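As a rough illustration of what an LCA is and why a pair of terms can have more than one, here is a small sketch that uses only `igraph`. It is not how `ontoProc::findCommonAncestors()` is implemented; the helper functions `ancestors_of()` and `latest_common_ancestors()` and the toy hierarchy are hypothetical and exist only for this example. The graph is built the same way as `cl_graph` above, with edges pointing from parent to child.

```r
library(igraph)

# toy hierarchy (hypothetical, not the real Cell Ontology):
# T cell and B cell each have two parents
toy_parents <- list(
  "cell" = character(0),
  "leukocyte" = "cell",
  "lymphocyte" = "leukocyte",
  "mononuclear cell" = "leukocyte",
  "T cell" = c("lymphocyte", "mononuclear cell"),
  "B cell" = c("lymphocyte", "mononuclear cell"),
  "monocyte" = "mononuclear cell"
)

# each column of the matrix is one parent -> child edge
toy_graph <- make_graph(
  rbind(unlist(toy_parents), rep(names(toy_parents), lengths(toy_parents)))
)

# all terms reachable by following edges backwards (includes the term itself)
ancestors_of <- function(graph, term) {
  as_ids(subcomponent(graph, term, mode = "in"))
}

# common ancestors that have no other common ancestor below them
latest_common_ancestors <- function(graph, term_a, term_b) {
  common <- intersect(ancestors_of(graph, term_a), ancestors_of(graph, term_b))
  is_latest <- vapply(common, function(term) {
    below <- as_ids(subcomponent(graph, term, mode = "out"))
    !any(setdiff(common, term) %in% below)
  }, logical(1))
  common[is_latest]
}

# two LCAs: both parents are shared and neither is an ancestor of the other
lca_t_b <- latest_common_ancestors(toy_graph, "T cell", "B cell")

# a single LCA: only the mononuclear cell branch is shared
lca_t_mono <- latest_common_ancestors(toy_graph, "T cell", "monocyte")
```

In this toy case `lca_t_b` holds two terms, which is the kind of pair that ends up with a `total_lca` greater than 1.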
## Warning in dplyr::left_join(dplyr::left_join(dplyr::rename(expand.grid(panglao_df$panglao_ontology, : Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 49 of `x` matches multiple rows in `y`.
@@ -693,8 +693,8 @@ Latest common ancestor (LCA) between PanglaoDB and Blueprint
dplyr::mutate(lca = dplyr::if_else(blueprint_ontology == panglao_ontology, blueprint_ontology, lca)) |>
# join in information for each of the lca terms including name, number of ancestors and descendants
dplyr::left_join(cl_df, by = c("lca" = "cl_ontology"))
-## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 7967 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
-## 18, 19, 20, ...].
+## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 7967 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
+## 20, ...].
Distribution of ancestors and descendants
ggplot(lca_df, aes(x = total_ancestors)) +
@@ -728,7 +728,7 @@ Distribution of ancestors and descendants
x = "Total number of descendants",
y = "Density"
)
-
+
Let’s zoom into the area below 1000, since we already know we would
want to exclude anything above that based on this plot.
ggplot(lca_df, aes(x = total_descendants)) +
@@ -750,7 +750,7 @@ Distribution of ancestors and descendants
## Warning: Removed 6856 rows containing non-finite outside the scale range (`stat_density()`).
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_vline()`).
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_text()`).
-
+
We can use the vertical lines for cells of interest to help us define
a potential cutoff based on the granularity we would like to see in our
consensus label. We want to be able to label things like T cell, but we
@@ -815,7 +815,7 @@ Defining a cutoff for number of descendants
x = "cell type",
y = "Total descendants"
)
-
+
There are a few terms that I think might be more broad than we want
like blood cell, bone cell,
supporting cell, and lining cell. I’m on the
@@ -1730,7 +1730,10 @@ Myeloid leukocyte
I’m torn on this one, because I do think it’s helpful to know if
something is of the myeloid lineage, but if we aren’t keeping lymphocyte
-then I would argue we shouldn’t keep myeloid leukocyte.
+then I would argue we shouldn’t keep myeloid leukocyte. Noting that
+after discussion we have decided to keep this one since T and B cells
+are much easier to differentiate based on gene expression alone than
+cells that are part of the myeloid lineage.
Progenitor cell
@@ -1920,7 +1923,7 @@ Progenitor cell
helpful to know that something may be a progenitor cell, but when you
have a cell with the label for HSC and the label for cells like
monocytes or osteoblasts, then maybe we are talking about a tumor cell
-instead.
+instead. After discussion, we are going to remove progenitor cells.
Along those same lines, I think the below terms,
lining cell and supporting cell, are too broad
even though they have few descendants.
@@ -2090,11 +2093,11 @@ Discarded cell types
## [13] "secretory cell" "connective tissue cell" "electrically responsive cell"
## [16] "contractile cell" "epithelial cell" "neuron"
## [19] "neural cell"
-The only term in this list that I would be concerned about losing is
-“neuron”. Let’s look at those combinations.
+The only terms in this list that I would be concerned about losing
+are “neuron” and epithelial cells. Let’s look at those combinations.
Neuron
-# blood cell
+# neuron
print_df |>
dplyr::filter(cl_annotation == "neuron")
@@ -2329,6 +2332,1023 @@ Neuron
reference and only “neuron” as a term in Blueprint. Even though neuron
has ~ 500 descendants, I think we should keep these labels.
+
+Epithelial cell
+# epithelial cell
+print_df |>
+ dplyr::filter(cl_annotation == "epithelial cell")
+| blueprint_ontology | blueprint_annotation_main | blueprint_annotation_fine | panglao_ontology | panglao_annotation | total_lca | lca | cl_annotation |
+|---|---|---|---|---|---|---|---|
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000622 | acinar cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:1000488 | cholangiocyte | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000166 | chromaffin cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000584 | enterocyte | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000164 | enteroendocrine cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000065 | ependymal cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000066 | epithelial cell | 0 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000160 | goblet cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000501 | granulosa cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000182 | hepatocyte | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0005006 | ionocyte | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000312 | keratinocyte | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000077 | mesothelial cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000185 | myoepithelial cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000165 | neuroendocrine cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002167 | olfactory epithelial cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000510 | paneth cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000162 | parietal cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002481 | peritubular myoid cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000652 | pinealocyte | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000653 | podocyte | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000209 | taste receptor cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000731 | urothelial cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002368 | respiratory epithelial cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002370 | respiratory goblet cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000171 | pancreatic A cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000169 | type B pancreatic cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000706 | choroid plexus epithelial cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000158 | club cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002250 | intestinal crypt stem cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000173 | pancreatic D cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002305 | epithelial cell of distal tubule | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002079 | pancreatic ductal cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000504 | enterochromaffin-like cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0005019 | pancreatic epsilon cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002258 | thyroid follicular cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002179 | foveolar cell of stomach | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000696 | PP cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000155 | peptic cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002292 | type I cell of carotid body | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0005010 | renal intercalated cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:1000909 | kidney loop of Henle epithelial cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002326 | luminal epithelial cell of mammary gland | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002327 | mammary gland epithelial cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000242 | Merkel cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000682 | M cell of gut | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002199 | oxyphil cell of parathyroid gland | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000446 | chief cell of parathyroid gland | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0005009 | renal principal cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002306 | epithelial cell of proximal tubule | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002062 | pulmonary alveolar type 1 cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002063 | pulmonary alveolar type 2 cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:1001596 | salivary gland glandular cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002140 | acinar cell of sebaceous gland | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0000216 | Sertoli cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002562 | hair germinal matrix cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000066 | Epithelial cells | Epithelial cells | CL:0002204 | brush cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000622 | acinar cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:1000488 | cholangiocyte | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000584 | enterocyte | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000164 | enteroendocrine cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000066 | epithelial cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000160 | goblet cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000501 | granulosa cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000182 | hepatocyte | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0005006 | ionocyte | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000185 | myoepithelial cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000510 | paneth cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000162 | parietal cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000653 | podocyte | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000209 | taste receptor cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000731 | urothelial cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002368 | respiratory epithelial cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002370 | respiratory goblet cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000171 | pancreatic A cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000169 | type B pancreatic cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000158 | club cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002250 | intestinal crypt stem cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000173 | pancreatic D cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002305 | epithelial cell of distal tubule | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002079 | pancreatic ductal cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000504 | enterochromaffin-like cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0005019 | pancreatic epsilon cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002258 | thyroid follicular cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002179 | foveolar cell of stomach | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000696 | PP cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000155 | peptic cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0005010 | renal intercalated cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:1000909 | kidney loop of Henle epithelial cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002326 | luminal epithelial cell of mammary gland | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002327 | mammary gland epithelial cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000682 | M cell of gut | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002199 | oxyphil cell of parathyroid gland | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0000446 | chief cell of parathyroid gland | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0005009 | renal principal cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002306 | epithelial cell of proximal tubule | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:1001596 | salivary gland glandular cell | 1 | CL:0000066 | epithelial cell |
+| CL:0000312 | Keratinocytes | Keratinocytes | CL:0002204 | brush cell | 1 | CL:0000066 | epithelial cell |
+
+The PanglaoDB cell types seem to be more specific than the ones
+present in Blueprint Encode, similar to the observation with neurons. We
+should keep epithelial cell.
+
Removing anything with more than 1 LCA
@@ -2354,99 +3374,99 @@ Removing anything with more than 1 LCA
bone cell
-39
+38
blood cell
-42
+41
perivascular cell
-42
+41
stromal cell
-54
+53
supporting cell
-62
+61
hematopoietic precursor cell
-106
+105
lining cell
-121
+120
myeloid leukocyte
-166
+165
progenitor cell
-166
+165
mononuclear phagocyte
-170
+169
phagocyte (sensu Vertebrata)
-176
+175
contractile cell
-178
+177
defensive cell
-200
+199
professional antigen presenting cell
-213
+212
connective tissue cell
-224
+223
myeloid cell
-248
+247
stuff accumulating cell
-267
+266
precursor cell
-272
+271
secretory cell
-458
+457
mononuclear cell
-504
+503
leukocyte
-541
+540
electrically responsive cell
-674
+673
hematopoietic cell
-685
+684
eukaryotic cell
-2646
+2645
@@ -2472,8 +3492,8 @@ Removing anything with more than 1 LCA
# which cell types are now missing from the list to keep
setdiff(celltypes_to_keep, updated_celltypes)
-## [1] "blood cell" "hematopoietic precursor cell" "lining cell"
-## [4] "perivascular cell" "supporting cell"
+## [1] "blood cell" "hematopoietic precursor cell" "lining cell" "perivascular cell"
+## [5] "supporting cell"
It looks like I am losing a few terms I already said were not
specific and then a few other terms, like “hematopoietic precursor cell”
and “perivascular cell”. I’ll look at both of those to confirm we would
@@ -2889,16 +3909,18 @@ Conclusions
matches that have the label hematopoietic precursor cell.
The LCA should have equal to or less than 170 total
descendants.
-We whould include the term for neuron even though it
-has 500 descendants.
-Terms that are too broad (like supporting cell,
-blood cell, bone cell,
-lining cell) should be removed.
+We should include the term for neuron and
+epithelial cell even though they do not pass the threshold
+for number of descendants.
+Terms that are too broad should be removed. This includes:
+lining cell, blood cell,
+progenitor cell, bone cell, and
+supporting cell
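The criteria above could be applied with a single filter along these lines. This is only a sketch on a toy data frame, not the notebook's actual implementation: the column names mirror `total_lca`, `total_descendants`, and `cl_annotation` used earlier, but `toy_lca_df`, the helper vectors, and several of the values (marked in the comments) are hypothetical placeholders.

```r
library(dplyr)

# toy input: one row per candidate consensus label
# counts for blood cell (41), hematopoietic precursor cell (105),
# progenitor cell (165) and leukocyte (540) follow the table above;
# the remaining counts and all total_lca values are placeholders
toy_lca_df <- data.frame(
  cl_annotation = c(
    "T cell", "neuron", "epithelial cell", "blood cell",
    "leukocyte", "hematopoietic precursor cell", "progenitor cell"
  ),
  total_lca = c(1, 1, 1, 1, 1, 2, 1),
  total_descendants = c(60, 500, 900, 41, 540, 105, 165)
)

descendant_cutoff <- 170
# kept even though they exceed the cutoff
always_keep <- c("neuron", "epithelial cell")
# removed even though they pass the cutoff
too_broad <- c(
  "lining cell", "blood cell", "progenitor cell", "bone cell", "supporting cell"
)
# allowed to have more than one LCA
multi_lca_exception <- "hematopoietic precursor cell"

whitelist_df <- toy_lca_df |>
  dplyr::filter(
    total_lca <= 1 | cl_annotation %in% multi_lca_exception,
    total_descendants <= descendant_cutoff | cl_annotation %in% always_keep,
    !cl_annotation %in% too_broad
  )

whitelist_df$cl_annotation
```

Keeping the exceptions in named vectors rather than hard-coding them in the filter makes it easier to revisit individual terms after further discussion.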
Alternatively, rather than eliminate terms that are too broad we
could look at the similarity index for individual matches and decide on
a case by case basis if those should be allowed. Although I still think
-having a term that is too braod, even if it’s a good match, is not super
+having a term that is too broad, even if it’s a good match, is not super
informative.
@@ -2906,7 +3928,7 @@ Session info
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
-## Running under: macOS Sonoma 14.4
+## Running under: macOS Sequoia 15.2
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
@@ -2925,40 +3947,37 @@ Session info
## [1] ggplot2_3.5.1
##
## loaded via a namespace (and not attached):
-## [1] RColorBrewer_1.1-3 jsonlite_1.8.9 magrittr_2.0.3 gypsum_1.2.0
-## [5] farver_2.1.2 rmarkdown_2.29 zlibbioc_1.52.0 vctrs_0.6.5
-## [9] memoise_2.0.1 DelayedMatrixStats_1.28.0 htmltools_0.5.8.1 S4Arrays_1.6.0
-## [13] polynom_1.4-1 AnnotationHub_3.14.0 curl_6.0.1 Rhdf5lib_1.28.0
-## [17] SparseArray_1.6.0 rhdf5_2.50.0 sass_0.4.9 alabaster.base_1.6.1
-## [21] bslib_0.8.0 htmlwidgets_1.6.4 httr2_1.0.7 cachem_1.1.0
-## [25] igraph_2.1.1 mime_0.12 lifecycle_1.0.4 pkgconfig_2.0.3
-## [29] Matrix_1.7-1 R6_2.5.1 fastmap_1.2.0 GenomeInfoDbData_1.2.13
-## [33] MatrixGenerics_1.18.0 shiny_1.9.1 digest_0.6.37 colorspace_2.1-1
-## [37] AnnotationDbi_1.68.0 S4Vectors_0.44.0 rprojroot_2.0.4 ExperimentHub_2.14.0
-## [41] GenomicRanges_1.58.0 RSQLite_2.3.9 filelock_1.0.3 labeling_0.4.3
-## [45] fansi_1.0.6 httr_1.4.7 polyclip_1.10-7 abind_1.4-8
-## [49] compiler_4.4.2 bit64_4.5.2 withr_3.0.2 DBI_1.2.3
-## [53] ontologySimilarity_2.7 HDF5Array_1.34.0 ggforce_0.4.2 alabaster.ranges_1.6.0
-## [57] alabaster.schemas_1.6.0 MASS_7.3-61 quantreg_5.99.1 rappdirs_0.3.3
-## [61] DelayedArray_0.32.0 ggpp_0.5.8-1 tools_4.4.2 httpuv_1.6.15
-## [65] glue_1.8.0 rhdf5filters_1.18.0 promises_1.3.2 grid_4.4.2
-## [69] generics_0.1.3 gtable_0.3.6 tzdb_0.4.0 tidyr_1.3.1
-## [73] hms_1.1.3 utf8_1.2.4 XVector_0.46.0 BiocGenerics_0.52.0
-## [77] BiocVersion_3.20.0 pillar_1.9.0 stringr_1.5.1 vroom_1.6.5
-## [81] later_1.4.1 splines_4.4.2 dplyr_1.1.4 tweenr_2.0.3
-## [85] BiocFileCache_2.14.0 lattice_0.22-6 survival_3.7-0 renv_1.0.11
-## [89] bit_4.5.0.1 SparseM_1.84-2 tidyselect_1.2.1 Biostrings_2.74.0
-## [93] knitr_1.49 ggpmisc_0.6.1 IRanges_2.40.0 ontologyPlot_1.7
-## [97] SummarizedExperiment_1.36.0 stats4_4.4.2 xfun_0.49 Biobase_2.66.0
-## [101] matrixStats_1.4.1 DT_0.33 stringi_1.8.4 UCSC.utils_1.2.0
-## [105] paintmap_1.0 yaml_2.3.10 evaluate_1.0.1 tibble_3.2.1
-## [109] Rgraphviz_2.50.0 alabaster.matrix_1.6.1 BiocManager_1.30.25 graph_1.84.0
-## [113] cli_3.6.3 ontologyIndex_2.12 xtable_1.8-4 reticulate_1.40.0
-## [117] jquerylib_0.1.4 munsell_0.5.1 Rcpp_1.0.13-1 GenomeInfoDb_1.42.1
-## [121] dbplyr_2.5.0 ontoProc_2.0.0 png_0.1-8 parallel_4.4.2
-## [125] MatrixModels_0.5-3 readr_2.1.5 blob_1.2.4 splus2R_1.3-5
-## [129] sparseMatrixStats_1.18.0 alabaster.se_1.6.0 scales_1.3.0 purrr_1.0.2
-## [133] crayon_1.5.3 rlang_1.1.4 KEGGREST_1.46.0 celldex_1.16.0
+## [1] celldex_1.16.0 DBI_1.2.3 httr2_1.0.7 rlang_1.1.4
+## [5] magrittr_2.0.3 matrixStats_1.4.1 gypsum_1.2.0 compiler_4.4.2
+## [9] RSQLite_2.3.9 DelayedMatrixStats_1.28.0 png_0.1-8 vctrs_0.6.5
+## [13] pkgconfig_2.0.3 crayon_1.5.3 fastmap_1.2.0 dbplyr_2.5.0
+## [17] XVector_0.46.0 labeling_0.4.3 utf8_1.2.4 promises_1.3.2
+## [21] rmarkdown_2.29 tzdb_0.4.0 graph_1.84.0 UCSC.utils_1.2.0
+## [25] purrr_1.0.2 bit_4.5.0.1 xfun_0.49 zlibbioc_1.52.0
+## [29] cachem_1.1.0 splus2R_1.3-5 GenomeInfoDb_1.42.1 jsonlite_1.8.9
+## [33] blob_1.2.4 later_1.4.1 rhdf5filters_1.18.0 DelayedArray_0.32.0
+## [37] Rhdf5lib_1.28.0 parallel_4.4.2 R6_2.5.1 bslib_0.8.0
+## [41] reticulate_1.40.0 jquerylib_0.1.4 GenomicRanges_1.58.0 Rcpp_1.0.13-1
+## [45] SummarizedExperiment_1.36.0 knitr_1.49 readr_2.1.5 IRanges_2.40.0
+## [49] httpuv_1.6.15 Matrix_1.7-1 igraph_2.1.1 tidyselect_1.2.1
+## [53] abind_1.4-8 yaml_2.3.10 curl_6.0.1 ontologySimilarity_2.7
+## [57] lattice_0.22-6 tibble_3.2.1 shiny_1.9.1 Biobase_2.66.0
+## [61] withr_3.0.2 KEGGREST_1.46.0 evaluate_1.0.1 ontologyIndex_2.12
+## [65] BiocFileCache_2.14.0 alabaster.schemas_1.6.0 ExperimentHub_2.14.0 Biostrings_2.74.0
+## [69] pillar_1.9.0 BiocManager_1.30.25 filelock_1.0.3 MatrixGenerics_1.18.0
+## [73] DT_0.33 renv_1.0.11 stats4_4.4.2 generics_0.1.3
+## [77] vroom_1.6.5 rprojroot_2.0.4 BiocVersion_3.20.0 S4Vectors_0.44.0
+## [81] hms_1.1.3 sparseMatrixStats_1.18.0 munsell_0.5.1 scales_1.3.0
+## [85] alabaster.base_1.6.1 xtable_1.8-4 glue_1.8.0 alabaster.ranges_1.6.0
+## [89] alabaster.matrix_1.6.1 tools_4.4.2 ontologyPlot_1.7 AnnotationHub_3.14.0
+## [93] ontoProc_2.0.0 rhdf5_2.50.0 grid_4.4.2 tidyr_1.3.1
+## [97] AnnotationDbi_1.68.0 colorspace_2.1-1 GenomeInfoDbData_1.2.13 HDF5Array_1.34.0
+## [101] cli_3.6.3 rappdirs_0.3.3 fansi_1.0.6 S4Arrays_1.6.0
+## [105] dplyr_1.1.4 Rgraphviz_2.50.0 gtable_0.3.6 alabaster.se_1.6.0
+## [109] sass_0.4.9 digest_0.6.37 BiocGenerics_0.52.0 paintmap_1.0
+## [113] SparseArray_1.6.0 htmlwidgets_1.6.4 farver_2.1.2 memoise_2.0.1
+## [117] htmltools_0.5.8.1 lifecycle_1.0.4 httr_1.4.7 mime_0.12
+## [121] bit64_4.5.2