From 9f10320acb6b39a32f351faa90cb4111418065ef Mon Sep 17 00:00:00 2001
From: Ally Hawkins
+This notebook aims to identify a set of consensus labels between cell
+types in the PanglaoDB and Blueprint Encode references.
+
+Below I will calculate the total number of ancestors and the total
+number of descendants for each term in the full cell type ontology and
+then show the distributions for those statistics. This will give us an
+idea of the range of values we expect to see when looking at the
+PanglaoDB and Blueprint Encode references.
+
+The vertical lines in the below plot indicate the value for cell
+types of varying granularity.
+
+Generally, the number of ancestors increases as cell types get more
+specific. However, the range of values is small, and some cell types
+share the same value even though they are probably not at the same
+level of granularity. Below we will look at the total number of
+descendants.
+
+It looks like most cell types have very few descendants, so let’s
+zoom into the area below 500 to get a better look.
+
+Here we see a much larger range of values and that cell types become
+more general as the number of descendants goes up. However, this
+distribution alone is probably not helpful in determining a cutoff. In
+the next section, we will look at this distribution specifically for
+cell types present in our references, PanglaoDB and Blueprint Encode.
+
+This section will look at identifying the latest common ancestor
+(LCA) between all possible combinations of terms from PanglaoDB (used
+for assigning cell types with
+
+Note that it is possible to have more than one LCA for a set of
+terms. To start, I will keep all LCA terms found.
+
+For each LCA, I will again look at the total number of ancestors and
+descendants and see if I can identify an appropriate cutoff. Ultimately,
+I would like to see if we can use that cutoff to decide if we should
+keep the LCA term as the consensus label or use “Unknown”.
+
+Let’s zoom into the area below 1000, since we already know we would
+want to exclude anything above that based on this plot.
+
+We can use the vertical lines for cells of interest to help us define
+a potential cutoff based on the granularity we would like to see in our
+consensus label. We want to be able to label things like T cell, but we
+don’t want to label anything as lymphocyte as that’s probably not
+helpful. I don’t see any obvious cutoffs that may be present in the
+total number of ancestors, but the number of descendants is likely to be
+informative. I think it might be a good idea to start by drawing a line
+at the local maxima between the T cell and lymphocyte lines on the
+number of descendants graph.
+
+First we will find the value for the first peak shown in the
+distribution. This is likely to be a good cutoff for deciding which LCA
+labels to keep.
+
+Below is the list of all consensus cell type labels that we will be
+keeping if we were to just use this cutoff.
+
+We can also look at all the cell types we are keeping and the total
+number of descendants to see if there are any that we may not want to
+include because the term is too broad.
+
+There are a few terms that I think might be more broad than we want
+like
+
+One could also argue to remove
+
+Below are tables that look specifically at the combinations of cell
+type annotations that resulted in some of the terms that I might
+consider removing.
+
+I think I’m in favor of not having a “blood cell” label, since I’m
+not sure that it’s helpful. Also, if two different methods label
+something a platelet and a neutrophil, then perhaps that label is
+inaccurate and it’s really a tumor cell.
+
+I think I would also remove bone cell, since hematopoietic stem cells
+and osteoclasts seem pretty different to me.
+
+I’m torn on this one, because I do think it’s helpful to know if
+something is of the myeloid lineage, but if we aren’t keeping lymphocyte
+then I would argue we shouldn’t keep myeloid leukocyte.
+
+Same with
+
+Along those same lines, I think the below terms,
+
+We can also look at what cell type labels we are excluding when using
+this cutoff to see if there are any terms we might actually want to
+keep instead.
+
+The only term in this list that I would be concerned about losing is
+“neuron”. Let’s look at those combinations.
+
+It looks like there are a lot of types of neurons in the PanglaoDB
+reference and only “neuron” as a term in Blueprint. Even though neuron
+has ~500 descendants, I think we should keep these labels.
+
+One thing I noticed when looking at the labels that have fewer
+descendants than the cutoff is that most of them are from scenarios
+where we have multiple LCAs. Maybe in the case where we have multiple
+LCAs we are already too broad and we should just eliminate those matches
+from the beginning. Here I’m looking at the total number of descendants
+for all terms that show up because a term has multiple LCAs.
+
+It looks like most of these terms are pretty broad and are either
+much higher than the cutoff or right around the cutoff with a few
+exceptions. Things like “bone cell” and “supporting cell” have few
+descendants, but I would still argue these are very broad terms and not
+useful.
+
+I’m going to filter out any matches that show two LCA terms first and
+then use the cutoff to define labels we would keep. I’ll also look to
+see what cell types we lose when we add this extra filtering step to be
+sure they are ones that we want to lose.
+
+It looks like I am losing a few terms I already said were not
+specific and then a few other terms, like “hematopoietic precursor cell”
+and “perivascular cell”. I’ll look at both of those to confirm we would
+not want them.
+
+It looks like here we should be keeping these matches because both
+references have these labels as hematopoietic stem and progenitor cells.
+I think in the context of pediatric cancer having this label would be
+helpful, so maybe we shouldn’t remove all terms that have 2 LCAs.
+
+Let’s look at what the other LCA is for an example set.
+
+It looks like these terms have both
+
+I would remove
+
+An alternative approach would be to calculate the similarity
+index between each set of terms and define a cutoff for which set of
+terms are similar. This is a value on a 0-1 scale where 0 indicates no
+similarity and 1 indicates the terms are equal.
+
+Although this could provide a metric that we could use to define
+similar cell types, we would still have to identify the label to use
+which would most likely be the LCA. Even if the similarity index is
+close to 1, if the LCA term is not informative then I don’t know that we
+would want to use that.
+
+However, we could use this to finalize the actual pairs of terms that
+we trust. For example, if the LCA for a pair is
+
+Below I’ll calculate the similarity index for each set of terms and
+plot the distribution. Then we will look at the values for pairs that
+have an LCA that passes the total descendants threshold we set to see if
+those pairs have a higher similarity index.
+
+This looks as I expected, with most of the pairs that pass the total
+descendants cutoff having a higher similarity index than those that do
+not pass. There is still some overlap though, so perhaps even if a set of
+terms shares an LCA that passes the threshold, the actual terms being
+compared may be further apart than we would like.
+
+Now let’s look at the similarity index for various LCA terms. Here
+each LCA term is its own plot and the vertical lines are the similarity
+index for each pair of terms that results in that LCA.
+
+It looks like terms that are more granular like T and B cell have
+higher similarity index values than terms that are less granular which
+is what we would expect. However, within terms like myeloid leukocyte
+and even T cell we do see a range of values. We could dig deeper into
+which pairs are resulting in which similarity index values if we wanted
+to, but I think that might be a future direction if we feel like the
+similarity index is something that could be useful.
+
+Based on these findings, I think it might be best to create a
+reference that has all possible pairs of labels between PanglaoDB and
+Blueprint Encode and the resulting consensus label for those pairs. To
+do this we could come up with a whitelist of LCA terms that we would be
+comfortable including and all other cell types would be unknowns. I
+would use the following criteria to come up with my whitelist:
+
+Alternatively, rather than eliminate terms that are too broad we
+could look at the similarity index for individual matches and decide on
+a case-by-case basis if those should be allowed. However, I still think
+having a term that is too broad, even if it’s a good match, is not super
+informative.
+
Summary of cell type ontologies in reference files
+Ally Hawkins
+2024-12-12
+
+
+
+Setup
+
+suppressPackageStartupMessages({
+ # load required packages
+ library(ggplot2)
+})
+
+# Set default ggplot theme
+theme_set(
+ theme_bw()
+)
+# The base path for the OpenScPCA repository, found by its (hidden) .git directory
+repository_base <- rprojroot::find_root(rprojroot::is_git_root)
+
+# The path to this module
+ref_dir <- file.path(repository_base, "analyses", "cell-type-consensus", "references")
+
+# path to ref file for panglao
+panglao_file <- file.path(ref_dir, "panglao-cell-type-ontologies.tsv")
+# grab obo file
+cl_ont <- ontologyIndex::get_ontology("http://purl.obolibrary.org/obo/cl-basic.obo")
+
+# read in panglao file
+panglao_df <- readr::read_tsv(panglao_file) |>
+ # rename columns to have panglao in them for easy joining later
+ dplyr::select(
+ panglao_ontology = "ontology_id",
+ panglao_annotation = "human_readable_value"
+ )
+## Rows: 178 Columns: 3
+## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────
+## Delimiter: "\t"
+## chr (3): ontology_id, human_readable_value, panglao_cell_type
+##
+## ℹ Use `spec()` to retrieve the full column specification for this data.
+## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
+# grab singler ref from celldex
+blueprint_ref <- celldex::BlueprintEncodeData()
+# get ontologies and human readable name into data frame
+blueprint_df <- data.frame(
+ blueprint_ontology = blueprint_ref$label.ont,
+ blueprint_annotation_main = blueprint_ref$label.main,
+ blueprint_annotation_fine = blueprint_ref$label.fine
+) |>
+ unique()
Full cell ontology
+
+# turn cl_ont into data frame with one row per term
+cl_df <- data.frame(
+ cl_ontology = cl_ont$id,
+ cl_annotation = cl_ont$name
+) |>
+ dplyr::rowwise() |>
+ dplyr::mutate(
+ # list all ancestors and descendants, then count each
+ ancestors = list(ontologyIndex::get_ancestors(cl_ont, cl_ontology)),
+ total_ancestors = length(ancestors),
+ descendants = list(ontologyIndex::get_descendants(cl_ont, cl_ontology)),
+ total_descendants = length(descendants)
+ )
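To make the two statistics concrete, here is a minimal base R sketch on a made-up six-term ontology. The term names and parent links are illustrative only and are not taken from the Cell Ontology. In this sketch a term counts as its own ancestor and its own descendant, which, as far as I know, matches the default behaviour of the `ontologyIndex` functions used above.

```r
# Toy ontology stored as a named list (child -> parents).
# Names and links are illustrative, not real CL terms.
toy_parents <- list(
  cell = character(0),
  leukocyte = "cell",
  lymphocyte = "leukocyte",
  "T cell" = "lymphocyte",
  "B cell" = "lymphocyte",
  "memory T cell" = "T cell"
)

# Walk up the parent links and collect every ancestor, including the term itself
toy_ancestors <- function(term, parents) {
  found <- term
  queue <- parents[[term]]
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!current %in% found) {
      found <- c(found, current)
      queue <- c(queue, parents[[current]])
    }
  }
  found
}

# A term's descendants are all terms that list it among their ancestors
toy_descendants <- function(term, parents) {
  is_descendant <- vapply(
    names(parents),
    function(x) term %in% toy_ancestors(x, parents),
    logical(1)
  )
  names(parents)[is_descendant]
}

# One row per term with both totals
toy_counts <- data.frame(
  term = names(toy_parents),
  total_ancestors = vapply(
    names(toy_parents),
    function(x) length(toy_ancestors(x, toy_parents)),
    integer(1)
  ),
  total_descendants = vapply(
    names(toy_parents),
    function(x) length(toy_descendants(x, toy_parents)),
    integer(1)
  ),
  row.names = NULL
)
toy_counts
```

Specific terms such as "memory T cell" end up with many ancestors and few descendants, while general terms such as "cell" show the opposite pattern.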
+celltypes_of_interest <- c("eukaryotic cell", "lymphocyte", "leukocyte", "hematopoietic cell", "T cell", "endothelial cell", "smooth muscle cell", "memory T cell")
+line_df <- cl_df |>
+ dplyr::filter(cl_annotation %in% celltypes_of_interest) |>
+ dplyr::select(cl_annotation, total_descendants, total_ancestors) |>
+ unique()
+
+# group any labels that have the same number of ancestors
+ancestor_labels_df <- line_df |>
+ dplyr::group_by(total_ancestors) |>
+ dplyr::summarise(cl_annotation = paste(cl_annotation, collapse = ","))
+
+# make density plots showing distribution of ancestors and descendants
+ggplot(cl_df, aes(x = total_ancestors)) +
+ geom_density(fill = "#00274C", alpha = 0.5) +
+ geom_vline(data = ancestor_labels_df,
+ mapping = aes(xintercept = total_ancestors),
+ lty = 2) +
+ geom_text(
+ data = ancestor_labels_df,
+ mapping = aes(x = total_ancestors, y = 0.04, label = cl_annotation),
+ angle = 90,
+ vjust = -0.5
+ ) +
+ labs(
+ x = "Number of ancestors",
+ y = "Density"
+ )
+
+ggplot(cl_df, aes(x = total_descendants)) +
+ geom_density(fill = "#FFCB05", alpha = 0.5) +
+ geom_vline(data = line_df,
+ mapping = aes(xintercept = total_descendants),
+ lty = 2) +
+ geom_text(
+ data = line_df,
+ mapping = aes(x = total_descendants, y = 0.6, label = cl_annotation),
+ angle = 90,
+ vjust = -0.5
+ ) +
+ labs(
+ x = "Number of descendants",
+ y = "Density"
+ )
+ggplot(cl_df, aes(x = total_descendants)) +
+ geom_density(fill = "#FFCB05", alpha = 0.5) +
+ geom_vline(data = line_df,
+ mapping = aes(xintercept = total_descendants),
+ lty = 2) +
+ geom_text(
+ data = line_df,
+ mapping = aes(x = total_descendants, y = 0.6, label = cl_annotation),
+ angle = 90,
+ vjust = -0.5
+ ) +
+ labs(
+ x = "Number of descendants",
+ y = "Density"
+ ) +
+ xlim(c(0,500))
+## Warning: Removed 14 rows containing non-finite outside the scale range (`stat_density()`).
+## Warning: Removed 3 rows containing missing values or values outside the scale range (`geom_vline()`).
+
+## Warning: Removed 3 rows containing missing values or values outside the scale range (`geom_text()`).
Latest common ancestor (LCA) between PanglaoDB and Blueprint Encode
+
+This section will look at identifying the latest common ancestor
+(LCA) between all possible combinations of terms from PanglaoDB (used
+for assigning cell types with CellAssign) and the BlueprintEncodeData
+reference from celldex (used for assigning cell types with SingleR).
+The LCA refers to the latest term in the cell ontology hierarchy that
+is common between two terms. I will use the
+ontoProc::findCommonAncestors() function to get the LCA for each
+combination.
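As an illustration of what an LCA is, and why a pair of terms can have more than one, here is a minimal base R sketch on a made-up ontology. It is not how `ontoProc` computes common ancestors. The parent links below are invented for the example, and the helper names `all_ancestors()` and `latest_common_ancestors()` are hypothetical.

```r
# Toy ontology (child -> parents); links are invented for illustration
toy_parents <- list(
  cell = character(0),
  leukocyte = "cell",
  "myeloid leukocyte" = "leukocyte",
  "mononuclear cell" = "leukocyte",
  lymphocyte = "leukocyte",
  "T cell" = "lymphocyte",
  "B cell" = "lymphocyte",
  monocyte = c("myeloid leukocyte", "mononuclear cell"),
  macrophage = c("myeloid leukocyte", "mononuclear cell")
)

# All ancestors of a term, including the term itself
all_ancestors <- function(term, parents) {
  found <- term
  queue <- parents[[term]]
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!current %in% found) {
      found <- c(found, current)
      queue <- c(queue, parents[[current]])
    }
  }
  found
}

# Common ancestors that are not themselves an ancestor of another common ancestor
latest_common_ancestors <- function(term_a, term_b, parents) {
  common <- intersect(all_ancestors(term_a, parents), all_ancestors(term_b, parents))
  is_latest <- vapply(common, function(candidate) {
    others <- setdiff(common, candidate)
    !any(vapply(
      others,
      function(other) candidate %in% all_ancestors(other, parents),
      logical(1)
    ))
  }, logical(1))
  common[is_latest]
}

# a single LCA
latest_common_ancestors("T cell", "B cell", toy_parents)
# two LCAs, because both parents are shared and neither is above the other
latest_common_ancestors("monocyte", "macrophage", toy_parents)
# a broader LCA for less related terms
latest_common_ancestors("monocyte", "T cell", toy_parents)
```

The second call shows the multiple-LCA case: when two terms share several parents that are not ancestors of one another, every one of them qualifies as a latest common ancestor.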
+# first set up the graph from cl ont
+parent_terms <- cl_ont$parents
+cl_graph <- igraph::make_graph(rbind(unlist(parent_terms), rep(names(parent_terms), lengths(parent_terms))))
+# get a data frame with all combinations of panglao and blueprint terms
+# one row for each combination
+all_ref_df <- expand.grid(panglao_df$panglao_ontology,
+ blueprint_df$blueprint_ontology) |>
+ dplyr::rename(
+ panglao_ontology = "Var1",
+ blueprint_ontology = "Var2"
+ ) |>
+ # add in the human readable values for each ontology term
+ dplyr::left_join(blueprint_df, by = "blueprint_ontology") |>
+ dplyr::left_join(panglao_df, by = "panglao_ontology") |>
+ tidyr::drop_na() |>
+ dplyr::rowwise() |>
+ dplyr::mutate(
+ # latest common ancestor(s) shared by the two terms
+ lca = list(rownames(ontoProc::findCommonAncestors(blueprint_ontology, panglao_ontology, g = cl_graph)))
+ )
+## Warning in dplyr::left_join(dplyr::left_join(dplyr::rename(expand.grid(panglao_df$panglao_ontology, : Detected an unexpected many-to-many relationship between `x` and `y`.
+## ℹ Row 49 of `x` matches multiple rows in `y`.
+## ℹ Row 99 of `y` matches multiple rows in `x`.
+## ℹ If a many-to-many relationship is expected, set `relationship = "many-to-many"` to silence this warning.
+## Warning: There were 23859 warnings in `dplyr::mutate()`.
+## The first warning was:
+## ℹ In argument: `lca = list(...)`.
+## ℹ In row 1.
+## Caused by warning in `dim()`:
+## ! The dim() method for DataFrameList objects is deprecated. Please use dims() on these objects instead.
+## ℹ Run `dplyr::last_dplyr_warnings()` to see the 23858 remaining warnings.
+lca_df <- all_ref_df |>
+ dplyr::mutate(
+ total_lca = length(lca), # max is three terms
+ lca = paste0(lca, collapse = ",") # make it easier to split the df
+ ) |>
+ # split each lca term into its own column
+ tidyr::separate(lca, into = c("lca_1", "lca_2", "lca_3"), sep = ",") |>
+ tidyr::pivot_longer(
+ cols = dplyr::starts_with("lca"),
+ names_to = "lca_number",
+ values_to = "lca"
+ ) |>
+ tidyr::drop_na() |>
+ dplyr::select(-lca_number) |>
+ # account for any cases where the ontology IDs are exact matches
+ # r complains about doing this earlier since the lca column holds lists until now
+ dplyr::mutate(lca = dplyr::if_else(blueprint_ontology == panglao_ontology, blueprint_ontology, lca)) |>
+ # join in information for each of the lca terms including name, number of ancestors and descendants
+ dplyr::left_join(cl_df, by = c("lca" = "cl_ontology"))
+## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 7967 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
+## 18, 19, 20, ...].
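The reshaping above assumes at most three LCA terms per pair, which is why `separate()` warns about missing pieces. As a hedged alternative, the sketch below expands a list column into one row per LCA in base R without fixing that maximum. The data frame is made up for illustration; `tidyr::unnest_longer()` would likely do the same job inside a pipeline.

```r
# Made-up pairs with a list column of LCA terms (one or more per pair)
pairs_df <- data.frame(
  panglao_annotation = c("monocyte", "T cell"),
  blueprint_annotation = c("Macrophages", "B-cells"),
  stringsAsFactors = FALSE
)
pairs_df$lca <- list(c("myeloid leukocyte", "mononuclear cell"), "lymphocyte")

# Number of LCA terms found for each pair
n_lca <- lengths(pairs_df$lca)

# Repeat each pair once per LCA term and flatten the list column
long_df <- data.frame(
  pairs_df[
    rep(seq_len(nrow(pairs_df)), times = n_lca),
    c("panglao_annotation", "blueprint_annotation")
  ],
  total_lca = rep(n_lca, times = n_lca),
  lca = unlist(pairs_df$lca),
  row.names = NULL
)
long_df
```

Each pair now appears once per LCA term, with `total_lca` recording how many LCAs the pair had in total.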
Distribution of ancestors and descendants
+
+
+ggplot(lca_df, aes(x = total_ancestors)) +
+ geom_density() +
+ geom_vline(data = ancestor_labels_df,
+ mapping = aes(xintercept = total_ancestors),
+ lty = 2) +
+ geom_text(
+ data = ancestor_labels_df,
+ mapping = aes(x = total_ancestors, y = 0.6, label = cl_annotation),
+ angle = 90,
+ vjust = -0.5
+ ) +
+ labs(
+ x = "Total number of ancestors",
+ y = "Density"
+ )
+
+ggplot(lca_df, aes(x = total_descendants)) +
+ geom_density() +
+ geom_vline(data = line_df,
+ mapping = aes(xintercept = total_descendants),
+ lty = 2) +
+ geom_text(
+ data = line_df,
+ mapping = aes(x = total_descendants, y = 0.002, label = cl_annotation),
+ angle = 90,
+ vjust = -0.5
+ ) +
+ labs(
+ x = "Total number of descendants",
+ y = "Density"
+ )
+ggplot(lca_df, aes(x = total_descendants)) +
+ geom_density() +
+ geom_vline(data = line_df,
+ mapping = aes(xintercept = total_descendants),
+ lty = 2) +
+ geom_text(
+ data = line_df,
+ mapping = aes(x = total_descendants, y = 0.002, label = cl_annotation),
+ angle = 90,
+ vjust = -0.5
+ ) +
+ xlim(c(0, 1000)) +
+ labs(
+ x = "Total number of descendants",
+ y = "Density"
+ )
+## Warning: Removed 6856 rows containing non-finite outside the scale range (`stat_density()`).
+## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_vline()`).
+
+## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_text()`).
Defining a cutoff for number of descendants
+
+peak_idx <- splus2R::peaks(lca_df$total_descendants)
+cutoff <- lca_df$total_descendants[peak_idx] |>
+ min() # find the smallest peak and use that as the cutoff for number of descendants
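For comparison, here is a minimal base R sketch of locating the first peak of the smoothed distribution itself, which is how I read "the first peak shown in the distribution". This is an assumption about intent and not what the chunk above does: that chunk passes the raw `total_descendants` column to `splus2R::peaks()`. The counts and the bandwidth below are made up for illustration.

```r
# Made-up descendant counts: a cluster of specific terms near 20
# and a smaller cluster of broad terms near 200
toy_total_descendants <- c(
  rep(c(15, 18, 20, 22, 25), times = 60),
  rep(c(180, 200, 220), times = 30)
)

# Kernel density estimate; the bandwidth of 5 is an arbitrary choice for this toy data
dens <- density(toy_total_descendants, bw = 5)

# A local maximum is a grid point where the slope switches from rising to falling
slope_sign <- sign(diff(dens$y))
peak_idx <- which(diff(slope_sign) < 0) + 1

# x position of the first (left-most) peak, a candidate cutoff
first_peak <- min(dens$x[peak_idx])
first_peak
```

The result depends on the bandwidth, so any cutoff found this way would still need to be checked against known terms such as T cell and lymphocyte.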
+celltypes_to_keep <- lca_df |>
+ dplyr::filter(total_descendants <= cutoff) |>
+ dplyr::pull(cl_annotation) |>
+ unique()
+
+celltypes_to_keep
+## [1] "myeloid leukocyte" "granulocyte" "neutrophil"
+## [4] "blood cell" "mononuclear phagocyte" "progenitor cell"
+## [7] "monocyte" "hematopoietic precursor cell" "T cell"
+## [10] "CD4-positive, alpha-beta T cell" "mature alpha-beta T cell" "mature T cell"
+## [13] "regulatory T cell" "memory T cell" "natural killer cell"
+## [16] "innate lymphoid cell" "B cell" "lymphocyte of B lineage"
+## [19] "mature B cell" "naive B cell" "memory B cell"
+## [22] "somatic stem cell" "stem cell" "hematopoietic stem cell"
+## [25] "bone cell" "macrophage" "erythroid lineage cell"
+## [28] "megakaryocyte" "endothelial cell" "lining cell"
+## [31] "dendritic cell" "eosinophil" "plasma cell"
+## [34] "chondrocyte" "stromal cell" "extracellular matrix secreting cell"
+## [37] "fibroblast" "smooth muscle cell" "muscle cell"
+## [40] "melanocyte" "cell of skeletal muscle" "ecto-epithelial cell"
+## [43] "keratinocyte" "squamous epithelial cell" "epidermal cell"
+## [46] "blood vessel endothelial cell" "microvascular endothelial cell" "adipocyte"
+## [49] "pericyte" "perivascular cell" "supporting cell"
+## [52] "astrocyte" "glial cell" "macroglial cell"
+## [55] "neuron associated cell" "mesangial cell"
+
+# pull out the cell types and total descendants for cell types to keep
+plot_celltype_df <- lca_df |>
+ dplyr::filter(cl_annotation %in% celltypes_to_keep) |>
+ dplyr::select(cl_annotation, total_descendants) |>
+ unique()
+
+# bar chart showing total number of descendants for each cell type
+ggplot(plot_celltype_df, aes(x = reorder(cl_annotation, total_descendants), y = total_descendants)) +
+ geom_bar(stat = "identity") +
+ theme(
+ axis.text.x = element_text(angle = 90)
+ ) +
+ labs(
+ x = "cell type",
+ y = "Total descendants"
+ )
+There are a few terms that I think might be more broad than we want,
+like blood cell, bone cell, supporting cell, and lining cell. I’m on the
+fence about keeping myeloid leukocyte and progenitor cell. I think if we
+wanted to remove those terms we could move our cutoff to be the same
+number of descendants as T cell, since we do want to keep that.
+
+One could also argue to remove stromal cell or extracellular matrix
+secreting cell.
+
Blood cell
+
+print_df <- lca_df |>
+ dplyr::select(blueprint_ontology, blueprint_annotation_main, blueprint_annotation_fine, panglao_ontology, panglao_annotation, total_lca, lca, cl_annotation)
+
+# blood cell
+print_df |>
+ dplyr::filter(cl_annotation == "blood cell")
+
+
+
+
+
+| blueprint_ontology | blueprint_annotation_main | blueprint_annotation_fine | panglao_ontology | panglao_annotation | total_lca | lca | cl_annotation |
+|---|---|---|---|---|---|---|---|
+| CL:0000775 | Neutrophils | Neutrophils | CL:0000233 | platelet | 2 | CL:0000081 | blood cell |
+| CL:0000232 | Erythrocytes | Erythrocytes | CL:0000767 | basophil | 2 | CL:0000081 | blood cell |
+| CL:0000232 | Erythrocytes | Erythrocytes | CL:0000771 | eosinophil | 2 | CL:0000081 | blood cell |
+| CL:0000232 | Erythrocytes | Erythrocytes | CL:0000775 | neutrophil | 2 | CL:0000081 | blood cell |
+| CL:0000232 | Erythrocytes | Erythrocytes | CL:0000233 | platelet | 2 | CL:0000081 | blood cell |
+| CL:0000771 | Eosinophils | Eosinophils | CL:0000233 | platelet | 2 | CL:0000081 | blood cell |
+Bone cell
+
+# bone cell
+print_df |>
+ dplyr::filter(cl_annotation == "bone cell")
+
+
+
+
+
+| blueprint_ontology | blueprint_annotation_main | blueprint_annotation_fine | panglao_ontology | panglao_annotation | total_lca | lca | cl_annotation |
+|---|---|---|---|---|---|---|---|
+| CL:0000557 | HSC | GMP | CL:0000092 | osteoclast | 2 | CL:0001035 | bone cell |
+| CL:0000557 | HSC | GMP | CL:0000137 | osteocyte | 1 | CL:0001035 | bone cell |
+Myeloid leukocyte
+
+# myeloid leukocyte cell
+print_df |>
+ dplyr::filter(cl_annotation == "myeloid leukocyte")
+
+
+
+
+
+| blueprint_ontology | blueprint_annotation_main | blueprint_annotation_fine | panglao_ontology | panglao_annotation | total_lca | lca | cl_annotation |
+|---|---|---|---|---|---|---|---|
+| CL:0000775 | Neutrophils | Neutrophils | CL:0000583 | alveolar macrophage | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000775 | Neutrophils | Neutrophils | CL:0000235 | macrophage | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000775 | Neutrophils | Neutrophils | CL:0000097 | mast cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000775 | Neutrophils | Neutrophils | CL:0000576 | monocyte | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000775 | Neutrophils | Neutrophils | CL:0000576 | monocyte | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000775 | Neutrophils | Neutrophils | CL:0000092 | osteoclast | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000775 | Neutrophils | Neutrophils | CL:0000091 | Kupffer cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000775 | Neutrophils | Neutrophils | CL:0000453 | Langerhans cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000775 | Neutrophils | Neutrophils | CL:0000129 | microglial cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000775 | Neutrophils | Neutrophils | CL:0000889 | myeloid suppressor cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000775 | Neutrophils | Neutrophils | CL:0000576 | monocyte | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000775 | Neutrophils | Neutrophils | CL:0000576 | monocyte | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000775 | Neutrophils | Neutrophils | CL:0000874 | splenic red pulp macrophage | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000576 | Monocytes | Monocytes | CL:0000583 | alveolar macrophage | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000576 | Monocytes | Monocytes | CL:0000767 | basophil | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000576 | Monocytes | Monocytes | CL:0000771 | eosinophil | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000576 | Monocytes | Monocytes | CL:0000235 | macrophage | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000576 | Monocytes | Monocytes | CL:0000097 | mast cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000576 | Monocytes | Monocytes | CL:0000775 | neutrophil | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000576 | Monocytes | Monocytes | CL:0000092 | osteoclast | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000576 | Monocytes | Monocytes | CL:0000091 | Kupffer cell | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000576 | Monocytes | Monocytes | CL:0000453 | Langerhans cell | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000576 | Monocytes | Monocytes | CL:0000129 | microglial cell | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000576 | Monocytes | Monocytes | CL:0000889 | myeloid suppressor cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000576 | Monocytes | Monocytes | CL:0000874 | splenic red pulp macrophage | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000235 | Macrophages | Macrophages | CL:0000767 | basophil | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000235 | Macrophages | Macrophages | CL:0000771 | eosinophil | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000235 | Macrophages | Macrophages | CL:0000097 | mast cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000235 | Macrophages | Macrophages | CL:0000576 | monocyte | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000235 | Macrophages | Macrophages | CL:0000576 | monocyte | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000235 | Macrophages | Macrophages | CL:0000775 | neutrophil | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000235 | Macrophages | Macrophages | CL:0000092 | osteoclast | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000235 | Macrophages | Macrophages | CL:0000453 | Langerhans cell | 3 | CL:0000766 | myeloid leukocyte |
+| CL:0000235 | Macrophages | Macrophages | CL:0000889 | myeloid suppressor cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000235 | Macrophages | Macrophages | CL:0000576 | monocyte | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000235 | Macrophages | Macrophages | CL:0000576 | monocyte | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000863 | Macrophages | Macrophages M1 | CL:0000767 | basophil | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000863 | Macrophages | Macrophages M1 | CL:0000771 | eosinophil | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000863 | Macrophages | Macrophages M1 | CL:0000097 | mast cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000863 | Macrophages | Macrophages M1 | CL:0000576 | monocyte | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000863 | Macrophages | Macrophages M1 | CL:0000576 | monocyte | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000863 | Macrophages | Macrophages M1 | CL:0000775 | neutrophil | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000863 | Macrophages | Macrophages M1 | CL:0000092 | osteoclast | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000863 | Macrophages | Macrophages M1 | CL:0000453 | Langerhans cell | 3 | CL:0000766 | myeloid leukocyte |
+| CL:0000863 | Macrophages | Macrophages M1 | CL:0000889 | myeloid suppressor cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000863 | Macrophages | Macrophages M1 | CL:0000576 | monocyte | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000863 | Macrophages | Macrophages M1 | CL:0000576 | monocyte | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000890 | Macrophages | Macrophages M2 | CL:0000767 | basophil | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000890 | Macrophages | Macrophages M2 | CL:0000771 | eosinophil | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000890 | Macrophages | Macrophages M2 | CL:0000097 | mast cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000890 | Macrophages | Macrophages M2 | CL:0000576 | monocyte | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000890 | Macrophages | Macrophages M2 | CL:0000576 | monocyte | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000890 | Macrophages | Macrophages M2 | CL:0000775 | neutrophil | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000890 | Macrophages | Macrophages M2 | CL:0000092 | osteoclast | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000890 | Macrophages | Macrophages M2 | CL:0000453 | Langerhans cell | 3 | CL:0000766 | myeloid leukocyte |
+| CL:0000890 | Macrophages | Macrophages M2 | CL:0000889 | myeloid suppressor cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000890 | Macrophages | Macrophages M2 | CL:0000576 | monocyte | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000890 | Macrophages | Macrophages M2 | CL:0000576 | monocyte | 2 | CL:0000766 | myeloid leukocyte |
+| CL:0000771 | Eosinophils | Eosinophils | CL:0000583 | alveolar macrophage | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000771 | Eosinophils | Eosinophils | CL:0000235 | macrophage | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000771 | Eosinophils | Eosinophils | CL:0000097 | mast cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000771 | Eosinophils | Eosinophils | CL:0000576 | monocyte | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000771 | Eosinophils | Eosinophils | CL:0000576 | monocyte | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000771 | Eosinophils | Eosinophils | CL:0000092 | osteoclast | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000771 | Eosinophils | Eosinophils | CL:0000091 | Kupffer cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000771 | Eosinophils | Eosinophils | CL:0000453 | Langerhans cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000771 | Eosinophils | Eosinophils | CL:0000129 | microglial cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000771 | Eosinophils | Eosinophils | CL:0000889 | myeloid suppressor cell | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000771 | Eosinophils | Eosinophils | CL:0000576 | monocyte | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000771 | Eosinophils | Eosinophils | CL:0000576 | monocyte | 1 | CL:0000766 | myeloid leukocyte |
+| CL:0000771 | Eosinophils | Eosinophils | CL:0000874 | splenic red pulp macrophage | 1 | CL:0000766 | myeloid leukocyte |
+Progenitor cell
+
+# progenitor cell
+print_df |>
+ dplyr::filter(cl_annotation == "progenitor cell") |>
+ head(n=15) # there's a lot of these so let's only print out some
+
+
+
+
+
+| blueprint_ontology | blueprint_annotation_main | blueprint_annotation_fine | panglao_ontology | panglao_annotation | total_lca | lca | cl_annotation |
+|---|---|---|---|---|---|---|---|
+| CL:0000576 | Monocytes | Monocytes | CL:0000765 | erythroblast | 2 | CL:0011026 | progenitor cell |
+| CL:0000576 | Monocytes | Monocytes | CL:0000037 | hematopoietic stem cell | 2 | CL:0011026 | progenitor cell |
+| CL:0000576 | Monocytes | Monocytes | CL:0000062 | osteoblast | 1 | CL:0011026 | progenitor cell |
+| CL:0000576 | Monocytes | Monocytes | CL:0000158 | club cell | 1 | CL:0011026 | progenitor cell |
+| CL:0000576 | Monocytes | Monocytes | CL:0000038 | erythroid progenitor cell | 2 | CL:0011026 | progenitor cell |
+| CL:0000576 | Monocytes | Monocytes | CL:4042021 | neuronal-restricted precursor | 1 | CL:0011026 | progenitor cell |
+| CL:0000576 | Monocytes | Monocytes | CL:0002453 | oligodendrocyte precursor cell | 1 | CL:0011026 | progenitor cell |
+| CL:0000576 | Monocytes | Monocytes | CL:0002351 | progenitor cell of endocrine pancreas | 1 | CL:0011026 | progenitor cell |
+| CL:0000050 | HSC | MEP | CL:0000765 | erythroblast | 2 | CL:0011026 | progenitor cell |
+| CL:0000050 | HSC | MEP | CL:0000037 | hematopoietic stem cell | 2 | CL:0011026 | progenitor cell |
+| CL:0000050 | HSC | MEP | CL:0000576 | monocyte | 2 | CL:0011026 | progenitor cell |
+| CL:0000050 | HSC | MEP | CL:0000576 | monocyte | 2 | CL:0011026 | progenitor cell |
+| CL:0000050 | HSC | MEP | CL:0000062 | osteoblast | 1 | CL:0011026 | progenitor cell |
+| CL:0000050 | HSC | MEP | CL:0000158 | club cell | 1 | CL:0011026 | progenitor cell |
+| CL:0000050 | HSC | MEP | CL:0000038 | erythroid progenitor cell | 3 | CL:0011026 | progenitor cell |
+Same with progenitor cell: I do think it could be helpful to know that
+something may be a progenitor cell, but when you have a cell with the
+label for HSC and the label for cells like monocytes or osteoblasts,
+then maybe we are talking about a tumor cell instead.
+
+Along those same lines, I think the below terms, lining cell and
+supporting cell, are too broad even though they have few descendants.
+
Lining cell
+
+# lining cell
+print_df |>
+ dplyr::filter(cl_annotation == "lining cell")
+
+
+
+
+
+| blueprint_ontology | blueprint_annotation_main | blueprint_annotation_fine | panglao_ontology | panglao_annotation | total_lca | lca | cl_annotation |
+|---|---|---|---|---|---|---|---|
+| CL:0000115 | Endothelial cells | Endothelial cells | CL:0000077 | mesothelial cell | 2 | CL:0000213 | lining cell |
+| CL:0000115 | Endothelial cells | Endothelial cells | CL:0002481 | peritubular myoid cell | 2 | CL:0000213 | lining cell |
+| CL:0000115 | Endothelial cells | Endothelial cells | CL:0000216 | Sertoli cell | 2 | CL:0000213 | lining cell |
+| CL:2000008 | Endothelial cells | mv Endothelial cells | CL:0000077 | mesothelial cell | 2 | CL:0000213 | lining cell |
+| CL:2000008 | Endothelial cells | mv Endothelial cells | CL:0002481 | peritubular myoid cell | 2 | CL:0000213 | lining cell |
+| CL:2000008 | Endothelial cells | mv Endothelial cells | CL:0000216 | Sertoli cell | 2 | CL:0000213 | lining cell |
+Supporting cell
+
+# supporting cell
+print_df |>
+ dplyr::filter(cl_annotation == "supporting cell")
+
+
+
+
+
+blueprint_ontology
+blueprint_annotation_main
+blueprint_annotation_fine
+panglao_ontology
+panglao_annotation
+total_lca
+lca
+cl_annotation
+
+
+CL:0000669
+Pericytes
+Pericytes
+CL:0000216
+Sertoli cell
+2
+CL:0000630
+supporting cell
+
+
+
+CL:0000650
+Mesangial cells
+Mesangial cells
+CL:0000216
+Sertoli cell
+2
+CL:0000630
+supporting cell
+Discarded cell types
+
+lca_df |>
+ dplyr::filter(total_descendants > cutoff) |>
+ dplyr::pull(cl_annotation) |>
+ unique()
+## [1] "leukocyte" "eukaryotic cell" "myeloid cell"
+## [4] "cell" "hematopoietic cell" "mononuclear cell"
+## [7] "stuff accumulating cell" "precursor cell" "phagocyte (sensu Vertebrata)"
+## [10] "defensive cell" "lymphocyte" "professional antigen presenting cell"
+## [13] "secretory cell" "connective tissue cell" "electrically responsive cell"
+## [16] "contractile cell" "epithelial cell" "neuron"
+## [19] "neural cell"
Neuron
+
+# neuron
+print_df |>
+ dplyr::filter(cl_annotation == "neuron")
+
+
+
+
+
+blueprint_ontology
+blueprint_annotation_main
+blueprint_annotation_fine
+panglao_ontology
+panglao_annotation
+total_lca
+lca
+cl_annotation
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0000109
+adrenergic neuron
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0000108
+cholinergic neuron
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0000166
+chromaffin cell
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0000700
+dopaminergic neuron
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0007011
+enteric neuron
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:1001509
+glycinergic neuron
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0000099
+interneuron
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0000100
+motor neuron
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0000165
+neuroendocrine cell
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0000540
+neuron
+0
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0008025
+noradrenergic neuron
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0000210
+photoreceptor cell
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0000740
+retinal ganglion cell
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0000850
+serotonergic neuron
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:4023169
+trigeminal neuron
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0000695
+Cajal-Retzius cell
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0000617
+GABAergic neuron
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0000679
+glutamatergic neuron
+1
+CL:0000540
+neuron
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0000121
+Purkinje cell
+1
+CL:0000540
+neuron
+
+
+
+CL:0000540
+Neurons
+Neurons
+CL:0000598
+pyramidal neuron
+1
+CL:0000540
+neuron
+Removing anything with more than 1 LCA
+
+lca_df |>
+ dplyr::filter(total_lca > 1) |>
+ dplyr::select(cl_annotation, total_descendants) |>
+ unique() |>
+ dplyr::arrange(total_descendants)
+
+
+
+
+
+
+cl_annotation
+total_descendants
+
+
+bone cell
+39
+
+
+blood cell
+42
+
+
+perivascular cell
+42
+
+
+stromal cell
+54
+
+
+supporting cell
+62
+
+
+hematopoietic precursor cell
+106
+
+
+lining cell
+121
+
+
+myeloid leukocyte
+166
+
+
+progenitor cell
+166
+
+
+mononuclear phagocyte
+170
+
+
+phagocyte (sensu Vertebrata)
+176
+
+
+contractile cell
+178
+
+
+defensive cell
+200
+
+
+professional antigen presenting cell
+213
+
+
+connective tissue cell
+224
+
+
+myeloid cell
+248
+
+
+stuff accumulating cell
+267
+
+
+precursor cell
+272
+
+
+secretory cell
+458
+
+
+mononuclear cell
+504
+
+
+leukocyte
+541
+
+
+electrically responsive cell
+674
+
+
+hematopoietic cell
+685
+
+
+
+eukaryotic cell
+2646
+
+# remove any combinations with more than one lca
+filtered_lca_df <- lca_df |>
+ dplyr::filter(total_lca < 2)
+
+# get a list of cell types to keep based on cutoff
+updated_celltypes <- filtered_lca_df |>
+ dplyr::filter(total_descendants <= cutoff) |>
+ dplyr::pull(cl_annotation) |>
+ unique()
+
+# which cell types are now missing from the list to keep
+setdiff(celltypes_to_keep, updated_celltypes)
+## [1] "blood cell" "hematopoietic precursor cell" "lining cell"
+## [4] "perivascular cell" "supporting cell"
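The two filters above (at most one LCA, and a descendant count at or below the cutoff) can be combined into a single consensus label column. Below is a minimal sketch of one way to do that, not the notebook's actual implementation. `toy_lca_df`, `toy_cutoff`, and `keep_anyway` are hypothetical names, the cutoff value is made up, and the descendant count for `T cell` is illustrative only. The exception list reflects the idea of keeping terms like `neuron` even when they exceed the cutoff.

```r
# toy stand-in for lca_df; values are illustrative, not real results
toy_lca_df <- data.frame(
  cl_annotation = c("T cell", "neuron", "leukocyte", "lining cell"),
  total_lca = c(1, 1, 1, 2),
  total_descendants = c(50, 500, 541, 121)
)

# hypothetical cutoff and list of broad terms to keep regardless of cutoff
toy_cutoff <- 170
keep_anyway <- c("neuron", "epithelial cell")

consensus_df <- toy_lca_df |>
  dplyr::mutate(
    # keep the LCA name only when there is a single LCA and the term is
    # specific enough (or explicitly allowed); otherwise label as Unknown
    consensus_label = dplyr::if_else(
      total_lca < 2 & (total_descendants <= toy_cutoff | cl_annotation %in% keep_anyway),
      cl_annotation,
      "Unknown"
    )
  )

consensus_df$consensus_label
```

With these toy values, `T cell` and `neuron` keep their names, while `leukocyte` (too many descendants) and `lining cell` (more than one LCA) become "Unknown".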
Hematopoietic precursor cell
+
+print_df |>
+ dplyr::filter(cl_annotation == "hematopoietic precursor cell")
+
+
+
+
+
+blueprint_ontology
+blueprint_annotation_main
+blueprint_annotation_fine
+panglao_ontology
+panglao_annotation
+total_lca
+lca
+cl_annotation
+
+
+CL:0000050
+HSC
+MEP
+CL:0000037
+hematopoietic stem cell
+2
+CL:0008001
+hematopoietic precursor cell
+
+
+CL:0000050
+HSC
+MEP
+CL:0000038
+erythroid progenitor cell
+3
+CL:0008001
+hematopoietic precursor cell
+
+
+CL:0000037
+HSC
+HSC
+CL:0000038
+erythroid progenitor cell
+2
+CL:0008001
+hematopoietic precursor cell
+
+
+CL:0000837
+HSC
+MPP
+CL:0000037
+hematopoietic stem cell
+2
+CL:0008001
+hematopoietic precursor cell
+
+
+CL:0000837
+HSC
+MPP
+CL:0000038
+erythroid progenitor cell
+2
+CL:0008001
+hematopoietic precursor cell
+
+
+CL:0000051
+HSC
+CLP
+CL:0000037
+hematopoietic stem cell
+2
+CL:0008001
+hematopoietic precursor cell
+
+
+CL:0000051
+HSC
+CLP
+CL:0000038
+erythroid progenitor cell
+2
+CL:0008001
+hematopoietic precursor cell
+
+
+CL:0000557
+HSC
+GMP
+CL:0000037
+hematopoietic stem cell
+2
+CL:0008001
+hematopoietic precursor cell
+
+
+CL:0000557
+HSC
+GMP
+CL:0000038
+erythroid progenitor cell
+3
+CL:0008001
+hematopoietic precursor cell
+
+
+CL:0000049
+HSC
+CMP
+CL:0000037
+hematopoietic stem cell
+2
+CL:0008001
+hematopoietic precursor cell
+
+
+
+CL:0000049
+HSC
+CMP
+CL:0000038
+erythroid progenitor cell
+2
+CL:0008001
+hematopoietic precursor cell
+
+lca_df |>
+ dplyr::filter(panglao_ontology == "CL:0000037" & blueprint_ontology == "CL:0000050") |>
+ dplyr::select(blueprint_annotation_main, blueprint_annotation_fine, panglao_annotation, cl_annotation)
+
+
+
+
+
+blueprint_annotation_main
+blueprint_annotation_fine
+panglao_annotation
+cl_annotation
+
+
+HSC
+MEP
+hematopoietic stem cell
+hematopoietic precursor cell
+
+
+
+HSC
+MEP
+hematopoietic stem cell
+progenitor cell
+hematopoietic precursor cell and progenitor cell as LCAs. Personally, I would keep the term
+for hematopoietic precursor cell because I think it’s more
+informative and specific to the type of progenitor cell.
+Perivascular cell
+
+print_df |>
+ dplyr::filter(cl_annotation == "perivascular cell")
+
+
+
+
+
+blueprint_ontology
+blueprint_annotation_main
+blueprint_annotation_fine
+panglao_ontology
+panglao_annotation
+total_lca
+lca
+cl_annotation
+
+
+CL:0000669
+Pericytes
+Pericytes
+CL:0000359
+vascular associated smooth muscle cell
+3
+CL:4033054
+perivascular cell
+
+
+CL:0000669
+Pericytes
+Pericytes
+CL:0000359
+vascular associated smooth muscle cell
+3
+CL:4033054
+perivascular cell
+
+
+CL:0000669
+Pericytes
+Pericytes
+CL:0000359
+vascular associated smooth muscle cell
+3
+CL:4033054
+perivascular cell
+
+
+CL:0000669
+Pericytes
+Pericytes
+CL:0000359
+vascular associated smooth muscle cell
+3
+CL:4033054
+perivascular cell
+
+
+CL:0000650
+Mesangial cells
+Mesangial cells
+CL:0000359
+vascular associated smooth muscle cell
+3
+CL:4033054
+perivascular cell
+
+
+CL:0000650
+Mesangial cells
+Mesangial cells
+CL:0000359
+vascular associated smooth muscle cell
+3
+CL:4033054
+perivascular cell
+
+
+CL:0000650
+Mesangial cells
+Mesangial cells
+CL:0000359
+vascular associated smooth muscle cell
+3
+CL:4033054
+perivascular cell
+
+
+
+CL:0000650
+Mesangial cells
+Mesangial cells
+CL:0000359
+vascular associated smooth muscle cell
+3
+CL:4033054
+perivascular cell
+perivascular cell, since the cell type
+labels from PanglaoDB and Blueprint are pretty different from each
+other.
+Similarity index
+T cell we
+can look at the similarity index to confirm that specific pair of terms
+has high similarity.
+information_content <- ontologySimilarity::descendants_IC(cl_ont)
+
+# get similarity index for each set of terms
+si_df <- lca_df |>
+ dplyr::rowwise() |>
+ dplyr::mutate(
+ similarity_index = ontologySimilarity::get_sim_grid(ontology = cl_ont,
+ term_sets = list(panglao_ontology, blueprint_ontology)) |>
+ ontologySimilarity::get_sim()
+ )
+
+si_df <- si_df |>
+ dplyr::mutate(
+ lca_threshold = dplyr::if_else(total_descendants <= cutoff, "PASS", "FAIL")
+ )
+
+ggplot(si_df, aes(x = similarity_index, fill = lca_threshold)) +
+ geom_density(bw = 0.05, alpha = 0.5) +
+ labs(
+ x = "Similarity index",
+ y = "Density"
+ )
+celltypes_to_plot <- c("myeloid leukocyte", "T cell", "cell", "supporting cell", "B cell")
+
+celltypes_to_plot |>
+ purrr::map(\(celltype){
+ line_df <- si_df |>
+ dplyr::filter(cl_annotation == celltype) |>
+ dplyr::select(cl_annotation, similarity_index) |>
+ unique()
+
+ ggplot(si_df, aes(x = similarity_index)) +
+ geom_density() +
+ geom_vline(data = line_df,
+ mapping = aes(xintercept = similarity_index),
+ lty = 2) +
+ labs(
+ x = "Similarity index",
+ y = "Density",
+ title = celltype
+ )
+
+ })
+
+## [[1]]
+
+##
+## [[2]]
+
+##
+## [[3]]
+
+##
+## [[4]]
+
+##
+## [[5]]
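To build some intuition for the similarity index plotted above, here is a small base R sketch of a Lin-style similarity on a made-up ontology. This is an illustration under assumptions, not a description of the `ontologySimilarity` internals. I am assuming the package's default term similarity is Lin's measure and that information content is derived from how many terms sit beneath a given term. The `parents` list and the helpers `get_anc()` and `lin_sim()` are hypothetical.

```r
# toy ontology as a named list of parents (hypothetical structure)
parents <- list(
  cell = character(0),
  leukocyte = "cell",
  "T cell" = "leukocyte",
  "B cell" = "leukocyte",
  neuron = "cell"
)

# all ancestors of a term, including the term itself
get_anc <- function(term) {
  p <- parents[[term]]
  unique(c(term, unlist(lapply(p, get_anc))))
}
ancestors <- lapply(names(parents), get_anc)
names(ancestors) <- names(parents)

# information content: -log of the fraction of terms that are the term
# itself or one of its descendants, so broad terms get low values
n_terms <- length(parents)
ic <- sapply(names(parents), function(term) {
  n_desc <- sum(sapply(ancestors, function(a) term %in% a))
  -log(n_desc / n_terms)
})

# Lin similarity: twice the information content of the most informative
# shared ancestor, divided by the summed information content of both terms
lin_sim <- function(a, b) {
  shared <- intersect(ancestors[[a]], ancestors[[b]])
  2 * max(ic[shared]) / (ic[[a]] + ic[[b]])
}

lin_sim("T cell", "B cell")
lin_sim("T cell", "neuron")
```

In this toy case the pair that shares `leukocyte` scores higher than the pair that only shares the root `cell`, which scores 0. That matches the intuition that a pair whose only common ancestor is very broad should have low similarity.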
Conclusions
+
+
+neuron even though it
+has 500 descendants.
+supporting cell,
+blood cell, bone cell,
+lining cell) should be removed.
+Session info
+
+sessionInfo()
+## R version 4.4.2 (2024-10-31)
+## Platform: aarch64-apple-darwin20
+## Running under: macOS Sonoma 14.4
+##
+## Matrix products: default
+## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
+## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
+##
+## locale:
+## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
+##
+## time zone: America/Chicago
+## tzcode source: internal
+##
+## attached base packages:
+## [1] stats graphics grDevices datasets utils methods base
+##
+## other attached packages:
+## [1] ggplot2_3.5.1
+##
+## loaded via a namespace (and not attached):
+## [1] RColorBrewer_1.1-3 jsonlite_1.8.9 magrittr_2.0.3 gypsum_1.2.0
+## [5] farver_2.1.2 rmarkdown_2.29 zlibbioc_1.52.0 vctrs_0.6.5
+## [9] memoise_2.0.1 DelayedMatrixStats_1.28.0 htmltools_0.5.8.1 S4Arrays_1.6.0
+## [13] polynom_1.4-1 AnnotationHub_3.14.0 curl_6.0.1 Rhdf5lib_1.28.0
+## [17] SparseArray_1.6.0 rhdf5_2.50.0 sass_0.4.9 alabaster.base_1.6.1
+## [21] bslib_0.8.0 htmlwidgets_1.6.4 httr2_1.0.7 cachem_1.1.0
+## [25] igraph_2.1.1 mime_0.12 lifecycle_1.0.4 pkgconfig_2.0.3
+## [29] Matrix_1.7-1 R6_2.5.1 fastmap_1.2.0 GenomeInfoDbData_1.2.13
+## [33] MatrixGenerics_1.18.0 shiny_1.9.1 digest_0.6.37 colorspace_2.1-1
+## [37] AnnotationDbi_1.68.0 S4Vectors_0.44.0 rprojroot_2.0.4 ExperimentHub_2.14.0
+## [41] GenomicRanges_1.58.0 RSQLite_2.3.9 filelock_1.0.3 labeling_0.4.3
+## [45] fansi_1.0.6 httr_1.4.7 polyclip_1.10-7 abind_1.4-8
+## [49] compiler_4.4.2 bit64_4.5.2 withr_3.0.2 DBI_1.2.3
+## [53] ontologySimilarity_2.7 HDF5Array_1.34.0 ggforce_0.4.2 alabaster.ranges_1.6.0
+## [57] alabaster.schemas_1.6.0 MASS_7.3-61 quantreg_5.99.1 rappdirs_0.3.3
+## [61] DelayedArray_0.32.0 ggpp_0.5.8-1 tools_4.4.2 httpuv_1.6.15
+## [65] glue_1.8.0 rhdf5filters_1.18.0 promises_1.3.2 grid_4.4.2
+## [69] generics_0.1.3 gtable_0.3.6 tzdb_0.4.0 tidyr_1.3.1
+## [73] hms_1.1.3 utf8_1.2.4 XVector_0.46.0 BiocGenerics_0.52.0
+## [77] BiocVersion_3.20.0 pillar_1.9.0 stringr_1.5.1 vroom_1.6.5
+## [81] later_1.4.1 splines_4.4.2 dplyr_1.1.4 tweenr_2.0.3
+## [85] BiocFileCache_2.14.0 lattice_0.22-6 survival_3.7-0 renv_1.0.11
+## [89] bit_4.5.0.1 SparseM_1.84-2 tidyselect_1.2.1 Biostrings_2.74.0
+## [93] knitr_1.49 ggpmisc_0.6.1 IRanges_2.40.0 ontologyPlot_1.7
+## [97] SummarizedExperiment_1.36.0 stats4_4.4.2 xfun_0.49 Biobase_2.66.0
+## [101] matrixStats_1.4.1 DT_0.33 stringi_1.8.4 UCSC.utils_1.2.0
+## [105] paintmap_1.0 yaml_2.3.10 evaluate_1.0.1 tibble_3.2.1
+## [109] Rgraphviz_2.50.0 alabaster.matrix_1.6.1 BiocManager_1.30.25 graph_1.84.0
+## [113] cli_3.6.3 ontologyIndex_2.12 xtable_1.8-4 reticulate_1.40.0
+## [117] jquerylib_0.1.4 munsell_0.5.1 Rcpp_1.0.13-1 GenomeInfoDb_1.42.1
+## [121] dbplyr_2.5.0 ontoProc_2.0.0 png_0.1-8 parallel_4.4.2
+## [125] MatrixModels_0.5-3 readr_2.1.5 blob_1.2.4 splus2R_1.3-5
+## [129] sparseMatrixStats_1.18.0 alabaster.se_1.6.0 scales_1.3.0 purrr_1.0.2
+## [133] crayon_1.5.3 rlang_1.1.4 KEGGREST_1.46.0 celldex_1.16.0
Summary of cell type ontologies in
reference files
Ally Hawkins
-2024-12-12
+2024-12-17
@@ -512,7 +512,7 @@ Setup
panglao_annotation = "human_readable_value"
)
## Rows: 178 Columns: 3
-## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────
+## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): ontology_id, human_readable_value, panglao_cell_type
##
@@ -545,7 +545,7 @@
Full cell ontology
# list all ancestors and descendants calculate total
ancestors = list(ontologyIndex::get_ancestors(cl_ont, cl_ontology)),
total_ancestors = length(ancestors),
- descendants = list(ontologyIndex::get_descendants(cl_ont, cl_ontology)),
+ descendants = list(ontologyIndex::get_descendants(cl_ont, cl_ontology, exclude_roots = TRUE)),
total_descendants = length(descendants)
)Full cell ontology
x = "Number of descendants",
y = "Density"
)
-
+
ggplot(cl_df, aes(x = total_descendants)) +
@@ -619,7 +619,7 @@
Full cell ontology
## Warning: Removed 14 rows containing non-finite outside the scale range (`stat_density()`).
## Warning: Removed 3 rows containing missing values or values outside the scale range (`geom_vline()`).
-
+
## Warning: Removed 3 rows containing missing values or values outside the scale range (`geom_text()`).
Latest common ancestor (LCA) between PanglaoDB and Blueprint
for assigning cell types with
CellAssign
) and the
BlueprintEncodeData
reference from celldex
(used for assigning cell types with SingleR
). The LCA
-refers to the latest term in the cell ontology heirarchy that is common
+refers to the latest term in the cell ontology hierarchy that is common
between two terms. I will use the ontoProc::findCommonAncestors()
function to get the LCA for each combination.
Note that it is possible to have more than one LCA for a set of @@ -644,7 +644,7 @@
# first set up the graph from cl ont
-parent_terms <- cl$parents
+parent_terms <- cl_ont$parents
cl_graph <- igraph::make_graph(rbind(unlist(parent_terms), rep(names(parent_terms), lengths(parent_terms))))
# get a data frame with all combinations of panglao and blueprint terms
# one row for each combination
@@ -661,7 +661,7 @@ Latest common ancestor (LCA) between PanglaoDB and Blueprint
dplyr::rowwise() |>
dplyr::mutate(
# least common shared ancestor
- lca = list(rownames(ontoProc::findCommonAncestors(blueprint_ontology, panglao_ontology, g = g)))
+ lca = list(rownames(ontoProc::findCommonAncestors(blueprint_ontology, panglao_ontology, g = cl_graph)))
)
## Warning in dplyr::left_join(dplyr::left_join(dplyr::rename(expand.grid(panglao_df$panglao_ontology, : Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 49 of `x` matches multiple rows in `y`.
@@ -693,8 +693,8 @@ Latest common ancestor (LCA) between PanglaoDB and Blueprint
dplyr::mutate(lca = dplyr::if_else(blueprint_ontology == panglao_ontology, blueprint_ontology, lca)) |>
# join in information for each of the lca terms including name, number of ancestors and descendants
dplyr::left_join(cl_df, by = c("lca" = "cl_ontology"))
-## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 7967 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
-## 18, 19, 20, ...].
+## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 7967 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
+## 20, ...].
ggplot(lca_df, aes(x = total_ancestors)) +
@@ -728,7 +728,7 @@ Distribution of ancestors and descendants
x = "Total number of descendants",
y = "Density"
)
-
+
 Let’s zoom into the area below 1000, since we already know we would want to exclude anything above that based on this plot.
ggplot(lca_df, aes(x = total_descendants)) +
@@ -750,7 +750,7 @@ Distribution of ancestors and descendants
## Warning: Removed 6856 rows containing non-finite outside the scale range (`stat_density()`).
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_vline()`).
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_text()`).
-
+
We can use the vertical lines for cells of interest to help us define
a potential cutoff based on the granularity we would like to see in our
consensus label. We want to be able to label things like T cell, but we
@@ -815,7 +815,7 @@
Defining a cutoff for number of descendants
x = "cell type",
y = "Total descendants"
)
-
+
There are a few terms that I think might be more broad than we want
like blood cell
, bone cell
,
supporting cell
, and lining cell
. I’m on the
@@ -1730,7 +1730,10 @@
I’m torn on this one, because I do think it’s helpful to know if something is of the myeloid lineage, but if we aren’t keeping lymphocyte -then I would argue we shouldn’t keep myeloid leukocyte.
+then I would argue we shouldn’t keep myeloid leukocyte. Noting that +after discussion we have decided to keep this one since T and B cells +are much easier to differentiate based on gene expression alone than +cells that are part of the myeloid lineage.Along those same lines, I think the below terms,
lining cell
and supporting cell
, are too broad
even though they have few descendants.
The only term in this list that I would be concerned about losing is -“neuron”. Let’s look at those combinations.
+The only terms in this list that I would be concerned about losing +are “neuron” and epithelial cells. Let’s look at those combinations.
# blood cell
+# neuron
print_df |>
dplyr::filter(cl_annotation == "neuron")
@@ -2329,6 +2332,1023 @@ Neuron
reference and only “neuron” as a term in Blueprint. Even though neuron
has ~ 500 descendants, I think we should keep these labels.
+
+Epithelial cell
+# epithelial cell
+print_df |>
+ dplyr::filter(cl_annotation == "epithelial cell")
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+blueprint_ontology
+blueprint_annotation_main
+blueprint_annotation_fine
+panglao_ontology
+panglao_annotation
+total_lca
+lca
+cl_annotation
+
+
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000622
+acinar cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:1000488
+cholangiocyte
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000166
+chromaffin cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000584
+enterocyte
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000164
+enteroendocrine cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000065
+ependymal cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000066
+epithelial cell
+0
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000160
+goblet cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000501
+granulosa cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000182
+hepatocyte
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0005006
+ionocyte
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000312
+keratinocyte
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000077
+mesothelial cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000185
+myoepithelial cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000165
+neuroendocrine cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002167
+olfactory epithelial cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000510
+paneth cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000162
+parietal cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002481
+peritubular myoid cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000652
+pinealocyte
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000653
+podocyte
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000209
+taste receptor cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000731
+urothelial cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002368
+respiratory epithelial cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002370
+respiratory goblet cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000171
+pancreatic A cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000169
+type B pancreatic cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000706
+choroid plexus epithelial cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000158
+club cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002250
+intestinal crypt stem cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000173
+pancreatic D cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002305
+epithelial cell of distal tubule
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002079
+pancreatic ductal cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000504
+enterochromaffin-like cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0005019
+pancreatic epsilon cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002258
+thyroid follicular cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002179
+foveolar cell of stomach
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000696
+PP cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000155
+peptic cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002292
+type I cell of carotid body
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0005010
+renal intercalated cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:1000909
+kidney loop of Henle epithelial cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002326
+luminal epithelial cell of mammary gland
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002327
+mammary gland epithelial cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000242
+Merkel cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000682
+M cell of gut
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002199
+oxyphil cell of parathyroid gland
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000446
+chief cell of parathyroid gland
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0005009
+renal principal cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002306
+epithelial cell of proximal tubule
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002062
+pulmonary alveolar type 1 cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002063
+pulmonary alveolar type 2 cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:1001596
+salivary gland glandular cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002140
+acinar cell of sebaceous gland
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0000216
+Sertoli cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002562
+hair germinal matrix cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000066
+Epithelial cells
+Epithelial cells
+CL:0002204
+brush cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000622
+acinar cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:1000488
+cholangiocyte
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000584
+enterocyte
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000164
+enteroendocrine cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000066
+epithelial cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000160
+goblet cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000501
+granulosa cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000182
+hepatocyte
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0005006
+ionocyte
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000185
+myoepithelial cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000510
+paneth cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000162
+parietal cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000653
+podocyte
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000209
+taste receptor cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000731
+urothelial cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0002368
+respiratory epithelial cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0002370
+respiratory goblet cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000171
+pancreatic A cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000169
+type B pancreatic cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000158
+club cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0002250
+intestinal crypt stem cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000173
+pancreatic D cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0002305
+epithelial cell of distal tubule
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0002079
+pancreatic ductal cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000504
+enterochromaffin-like cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0005019
+pancreatic epsilon cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0002258
+thyroid follicular cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0002179
+foveolar cell of stomach
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000696
+PP cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000155
+peptic cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0005010
+renal intercalated cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:1000909
+kidney loop of Henle epithelial cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0002326
+luminal epithelial cell of mammary gland
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0002327
+mammary gland epithelial cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000682
+M cell of gut
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0002199
+oxyphil cell of parathyroid gland
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0000446
+chief cell of parathyroid gland
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0005009
+renal principal cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0002306
+epithelial cell of proximal tubule
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:1001596
+salivary gland glandular cell
+1
+CL:0000066
+epithelial cell
+
+
+CL:0000312
+Keratinocytes
+Keratinocytes
+CL:0002204
+brush cell
+1
+CL:0000066
+epithelial cell
+
+
+
+
+The PanglaoDB cell types seem to be more specific than the ones
+present in Blueprint Encode, similar to the observation with neurons. We
+should keep epithelial cell.
+
## [1] "blood cell" "hematopoietic precursor cell" "lining cell"
-## [4] "perivascular cell" "supporting cell"
+## [1] "blood cell" "hematopoietic precursor cell" "lining cell" "perivascular cell"
+## [5] "supporting cell"
It looks like I am losing a few terms I already said were not specific and then a few other terms, like “hematopoietic precursor cell” and “perivascular cell”. I’ll look at both of those to confirm we would @@ -2889,16 +3909,18 @@
neuron
even though it
-has 500 descendants.supporting cell
,
-blood cell
, bone cell
,
-lining cell
) should be removed.neuron
and
+epithelial cell
even though they do not pass the threshold
+for number of descendants.lining cell
, blood cell
,
+progenitor cell
, bone cell
, and
+supporting cell
Alternatively, rather than eliminate terms that are too broad we could look at the similarity index for individual matches and decide on a case by case basis if those should be allowed. Although I still think -having a term that is too braod, even if it’s a good match, is not super +having a term that is too broad, even if it’s a good match, is not super informative.
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
-## Running under: macOS Sonoma 14.4
+## Running under: macOS Sequoia 15.2
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
@@ -2925,40 +3947,37 @@ Session info
## [1] ggplot2_3.5.1
##
## loaded via a namespace (and not attached):
-## [1] RColorBrewer_1.1-3 jsonlite_1.8.9 magrittr_2.0.3 gypsum_1.2.0
-## [5] farver_2.1.2 rmarkdown_2.29 zlibbioc_1.52.0 vctrs_0.6.5
-## [9] memoise_2.0.1 DelayedMatrixStats_1.28.0 htmltools_0.5.8.1 S4Arrays_1.6.0
-## [13] polynom_1.4-1 AnnotationHub_3.14.0 curl_6.0.1 Rhdf5lib_1.28.0
-## [17] SparseArray_1.6.0 rhdf5_2.50.0 sass_0.4.9 alabaster.base_1.6.1
-## [21] bslib_0.8.0 htmlwidgets_1.6.4 httr2_1.0.7 cachem_1.1.0
-## [25] igraph_2.1.1 mime_0.12 lifecycle_1.0.4 pkgconfig_2.0.3
-## [29] Matrix_1.7-1 R6_2.5.1 fastmap_1.2.0 GenomeInfoDbData_1.2.13
-## [33] MatrixGenerics_1.18.0 shiny_1.9.1 digest_0.6.37 colorspace_2.1-1
-## [37] AnnotationDbi_1.68.0 S4Vectors_0.44.0 rprojroot_2.0.4 ExperimentHub_2.14.0
-## [41] GenomicRanges_1.58.0 RSQLite_2.3.9 filelock_1.0.3 labeling_0.4.3
-## [45] fansi_1.0.6 httr_1.4.7 polyclip_1.10-7 abind_1.4-8
-## [49] compiler_4.4.2 bit64_4.5.2 withr_3.0.2 DBI_1.2.3
-## [53] ontologySimilarity_2.7 HDF5Array_1.34.0 ggforce_0.4.2 alabaster.ranges_1.6.0
-## [57] alabaster.schemas_1.6.0 MASS_7.3-61 quantreg_5.99.1 rappdirs_0.3.3
-## [61] DelayedArray_0.32.0 ggpp_0.5.8-1 tools_4.4.2 httpuv_1.6.15
-## [65] glue_1.8.0 rhdf5filters_1.18.0 promises_1.3.2 grid_4.4.2
-## [69] generics_0.1.3 gtable_0.3.6 tzdb_0.4.0 tidyr_1.3.1
-## [73] hms_1.1.3 utf8_1.2.4 XVector_0.46.0 BiocGenerics_0.52.0
-## [77] BiocVersion_3.20.0 pillar_1.9.0 stringr_1.5.1 vroom_1.6.5
-## [81] later_1.4.1 splines_4.4.2 dplyr_1.1.4 tweenr_2.0.3
-## [85] BiocFileCache_2.14.0 lattice_0.22-6 survival_3.7-0 renv_1.0.11
-## [89] bit_4.5.0.1 SparseM_1.84-2 tidyselect_1.2.1 Biostrings_2.74.0
-## [93] knitr_1.49 ggpmisc_0.6.1 IRanges_2.40.0 ontologyPlot_1.7
-## [97] SummarizedExperiment_1.36.0 stats4_4.4.2 xfun_0.49 Biobase_2.66.0
-## [101] matrixStats_1.4.1 DT_0.33 stringi_1.8.4 UCSC.utils_1.2.0
-## [105] paintmap_1.0 yaml_2.3.10 evaluate_1.0.1 tibble_3.2.1
-## [109] Rgraphviz_2.50.0 alabaster.matrix_1.6.1 BiocManager_1.30.25 graph_1.84.0
-## [113] cli_3.6.3 ontologyIndex_2.12 xtable_1.8-4 reticulate_1.40.0
-## [117] jquerylib_0.1.4 munsell_0.5.1 Rcpp_1.0.13-1 GenomeInfoDb_1.42.1
-## [121] dbplyr_2.5.0 ontoProc_2.0.0 png_0.1-8 parallel_4.4.2
-## [125] MatrixModels_0.5-3 readr_2.1.5 blob_1.2.4 splus2R_1.3-5
-## [129] sparseMatrixStats_1.18.0 alabaster.se_1.6.0 scales_1.3.0 purrr_1.0.2
-## [133] crayon_1.5.3 rlang_1.1.4 KEGGREST_1.46.0 celldex_1.16.0
+## [1] celldex_1.16.0 DBI_1.2.3 httr2_1.0.7 rlang_1.1.4
+## [5] magrittr_2.0.3 matrixStats_1.4.1 gypsum_1.2.0 compiler_4.4.2
+## [9] RSQLite_2.3.9 DelayedMatrixStats_1.28.0 png_0.1-8 vctrs_0.6.5
+## [13] pkgconfig_2.0.3 crayon_1.5.3 fastmap_1.2.0 dbplyr_2.5.0
+## [17] XVector_0.46.0 labeling_0.4.3 utf8_1.2.4 promises_1.3.2
+## [21] rmarkdown_2.29 tzdb_0.4.0 graph_1.84.0 UCSC.utils_1.2.0
+## [25] purrr_1.0.2 bit_4.5.0.1 xfun_0.49 zlibbioc_1.52.0
+## [29] cachem_1.1.0 splus2R_1.3-5 GenomeInfoDb_1.42.1 jsonlite_1.8.9
+## [33] blob_1.2.4 later_1.4.1 rhdf5filters_1.18.0 DelayedArray_0.32.0
+## [37] Rhdf5lib_1.28.0 parallel_4.4.2 R6_2.5.1 bslib_0.8.0
+## [41] reticulate_1.40.0 jquerylib_0.1.4 GenomicRanges_1.58.0 Rcpp_1.0.13-1
+## [45] SummarizedExperiment_1.36.0 knitr_1.49 readr_2.1.5 IRanges_2.40.0
+## [49] httpuv_1.6.15 Matrix_1.7-1 igraph_2.1.1 tidyselect_1.2.1
+## [53] abind_1.4-8 yaml_2.3.10 curl_6.0.1 ontologySimilarity_2.7
+## [57] lattice_0.22-6 tibble_3.2.1 shiny_1.9.1 Biobase_2.66.0
+## [61] withr_3.0.2 KEGGREST_1.46.0 evaluate_1.0.1 ontologyIndex_2.12
+## [65] BiocFileCache_2.14.0 alabaster.schemas_1.6.0 ExperimentHub_2.14.0 Biostrings_2.74.0
+## [69] pillar_1.9.0 BiocManager_1.30.25 filelock_1.0.3 MatrixGenerics_1.18.0
+## [73] DT_0.33 renv_1.0.11 stats4_4.4.2 generics_0.1.3
+## [77] vroom_1.6.5 rprojroot_2.0.4 BiocVersion_3.20.0 S4Vectors_0.44.0
+## [81] hms_1.1.3 sparseMatrixStats_1.18.0 munsell_0.5.1 scales_1.3.0
+## [85] alabaster.base_1.6.1 xtable_1.8-4 glue_1.8.0 alabaster.ranges_1.6.0
+## [89] alabaster.matrix_1.6.1 tools_4.4.2 ontologyPlot_1.7 AnnotationHub_3.14.0
+## [93] ontoProc_2.0.0 rhdf5_2.50.0 grid_4.4.2 tidyr_1.3.1
+## [97] AnnotationDbi_1.68.0 colorspace_2.1-1 GenomeInfoDbData_1.2.13 HDF5Array_1.34.0
+## [101] cli_3.6.3 rappdirs_0.3.3 fansi_1.0.6 S4Arrays_1.6.0
+## [105] dplyr_1.1.4 Rgraphviz_2.50.0 gtable_0.3.6 alabaster.se_1.6.0
+## [109] sass_0.4.9 digest_0.6.37 BiocGenerics_0.52.0 paintmap_1.0
+## [113] SparseArray_1.6.0 htmlwidgets_1.6.4 farver_2.1.2 memoise_2.0.1
+## [117] htmltools_0.5.8.1 lifecycle_1.0.4 httr_1.4.7 mime_0.12
+## [121] bit64_4.5.2
## Rows: 178 Columns: 3
-## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): ontology_id, human_readable_value, panglao_cell_type
##
@@ -693,8 +693,8 @@ Latest common ancestor (LCA) between PanglaoDB and Blueprint
dplyr::mutate(lca = dplyr::if_else(blueprint_ontology == panglao_ontology, blueprint_ontology, lca)) |>
# join in information for each of the lca terms including name, number of ancestors and descendants
dplyr::left_join(cl_df, by = c("lca" = "cl_ontology"))
-## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 7967 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
-## 20, ...].
+## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 7967 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
+## 18, 19, 20, ...].
ggplot(lca_df, aes(x = total_ancestors)) +
@@ -1733,7 +1733,7 @@ Myeloid leukocyte
then I would argue we shouldn’t keep myeloid leukocyte. Noting that
after discussion we have decided to keep this one since T and B cells
are much easier to differentiate based on gene expression alone than
-cells that are party of the myeloid lineage.
+cells that are part of the myeloid lineage.
The PanglaoDB cell types seem to be more specific than the ones present in Blueprint Encode, similar to the observation with neurons. We
-should keep epithelial cell.
+should keep epithelial cell in the cases where the Blueprint Encode
+annotation is
Epithelial cells
but not when it is
+Keratinocytes
.
## [1] "blood cell" "hematopoietic precursor cell" "lining cell" "perivascular cell"
-## [5] "supporting cell"
+## [1] "blood cell" "hematopoietic precursor cell" "lining cell"
+## [4] "perivascular cell" "supporting cell"
It looks like I am losing a few terms I already said were not specific and then a few other terms, like “hematopoietic precursor cell” and “perivascular cell”. I’ll look at both of those to confirm we would
@@ -3909,9 +3911,12 @@
neuron
and
+neuron
and
epithelial cell
even though they do not pass the threshold
-for number of descendants.epithelial cell
should
+only be included if the Blueprint Encode name is
+Epithelial cells
and not
+Keratinocytes
.
lining cell
, blood cell
,
progenitor cell
, bone cell
, and