This repository includes the R package NEST-Score
(NEighborhood Sample homogeneiTy-Score) as introduced in “Reconstitution of Human
Brain Cell Diversity in Organoids via Four Protocols” (Naas et al.
2024, bioRxiv).
It is compatible with a typical Seurat (Hao et al. 2024) workflow to analyse single-cell
RNA-sequencing (scRNA-seq) data.
You can install the development version of the NEST-Score
R package with:
# install.packages("devtools")
This is an example workflow using NEST-Score on the PBMC 3k dataset
provided by SeuratData (see the respective GitHub repository). For details on
the scRNA-seq object pre-processing, see the corresponding Seurat tutorial.
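# packages used in this workflow (the NEST-Score package must be loaded as well)
library(Seurat)
library(SeuratData)
library(ggplot2)
obj <- UpdateSeuratObject(pbmc3k)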
# quality control
obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")
obj <- subset(obj, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
# pre-processing
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj, nfeatures = 2000)
obj <- ScaleData(obj)
obj <- RunPCA(obj)
obj <- RunUMAP(obj, reduction = "pca", dims = 1:10)
DimPlot(obj, group.by = "seurat_annotations")
Let’s create an artificial sample assignment sample_random for every
cell in the dataset, which has non-uniform global frequencies and shows
near-perfect mixing of samples across the whole dataset:
# draw one label per cell; ncol(obj) is the number of cells
obj$sample_random <- sample(c(rep(1, 3), rep(2, 3), 3, 4), ncol(obj), replace = TRUE)
obj$sample_random <- paste0("sample ", obj$sample_random)
table(obj$sample_random)
#> sample 1 sample 2 sample 3 sample 4
#> 1009 987 319 323
DimPlot(obj, group.by = "sample_random")
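The non-uniform frequencies follow directly from the sampling pool: labels 1 and 2 each appear three times in it, labels 3 and 4 once. As a small base-R sketch (the total of 2638 cells is simply the sum of the counts printed above), the expected number of cells per sample can be worked out as:

```r
# Sampling pool used above: values 1 and 2 appear three times, 3 and 4 once
pool <- c(rep(1, 3), rep(2, 3), 3, 4)

# Probability of drawing each sample label
probs <- table(pool) / length(pool)
probs

# Expected number of cells per sample for a dataset of 2638 cells
n_cells <- 2638
expected <- round(probs * n_cells)
expected
```

The observed counts scatter around these expectations because the labels are drawn at random.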
To evaluate how well the samples mix quantitatively rather than only visually, we can compute the cell-wise NEST-Score as follows:
NESTres <- NESTscore(obj, group_by = "sample_random", k_nn = 30, ndims = 50)
FeaturePlot(NESTres$seuratobj, features = "NESTscore_sample_random", order = TRUE) +
scale_color_viridis_c(limits = NESTres$NESTscore_limits)
We can observe uniformly high NEST-Scores since all samples are
represented in all cell neighborhoods (k_nn = 30 nearest neighbors in
the ndims = 50 dimensional principal component space).
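To make the notion of a cell neighborhood more concrete, the following base-R sketch builds k-nearest-neighbor neighborhoods in a principal component space and tabulates the sample composition of each one. It is an illustration on simulated data only and is not the NEST-Score computation itself (see the paper for the definition); the toy matrix, the reduced `ndims` value and all helper names are assumptions made for this example.

```r
set.seed(42)
k_nn  <- 30   # neighborhood size, as in the NESTscore() call
ndims <- 5    # number of principal components kept (toy value)

# Toy expression-like matrix: 300 cells x 20 features
x <- matrix(rnorm(300 * 20), nrow = 300)

# Random sample labels with the same 3:3:1:1 weighting as sample_random
labels <- sample(paste0("sample ", c(rep(1, 3), rep(2, 3), 3, 4)),
                 nrow(x), replace = TRUE)

# Principal component embedding, restricted to the first ndims components
pcs <- prcomp(x)$x[, seq_len(ndims)]

# k nearest neighbors of each cell by Euclidean distance in PC space
# (position 1 of the ordering is the cell itself and is skipped)
d  <- as.matrix(dist(pcs))
nn <- t(apply(d, 1, function(v) order(v)[2:(k_nn + 1)]))

# Sample composition of every neighborhood (rows: cells, columns: samples)
lv <- sort(unique(labels))
comp <- t(vapply(seq_len(nrow(nn)), function(i) {
  as.numeric(table(factor(labels[nn[i, ]], levels = lv))) / k_nn
}, numeric(length(lv))))
colnames(comp) <- lv

# Average neighborhood composition next to the global sample frequencies
global <- as.numeric(table(factor(labels, levels = lv))) / length(labels)
round(rbind(neighborhood = colMeans(comp), global = global), 2)
```

With randomly assigned labels, the average neighborhood composition should stay close to the global frequencies, which is the situation the uniformly high scores above reflect.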
For comparison, let’s create a second assignment in which two groups of cell clusters each consist of a single sample:
obj$sample_grouped <- sample(1:2, ncol(obj), replace = TRUE)
obj$sample_grouped[obj$seurat_annotations %in% c("B")] <- 3
obj$sample_grouped[obj$seurat_annotations %in% c("CD14+ Mono",
"FCGR3A+ Mono",
"DC")] <- 4
obj$sample_grouped <- paste0("sample ", obj$sample_grouped)
DimPlot(obj, group.by = "sample_grouped")
Again, we can compute the NEST-Score as follows:
NESTres <- NESTscore(obj, group_by = "sample_grouped", k_nn = 30, ndims = 50)
FeaturePlot(NESTres$seuratobj, features = "NESTscore_sample_grouped", order = TRUE) +
scale_color_viridis_c(limits = NESTres$NESTscore_limits)
For this assignment, we can see that cells of samples 3 and 4 have low NEST-Scores since they do not mix with any cells of samples 1 and 2. In contrast, cells of samples 1 and 2 mix with each other and hence have higher NEST-Scores.
The NEST-Score function also provides pairwise evaluations of how well two considered samples mix on average:
NESTres <- NESTscore(obj, group_by = "sample_random", k_nn = 30, ndims = 50,
return_pairwise_eval = TRUE, show_heatmap = TRUE)
NESTres <- NESTscore(obj, group_by = "sample_grouped", k_nn = 30, ndims = 50,
return_pairwise_eval = TRUE, show_heatmap = TRUE)
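As a rough intuition for what a pairwise view of mixing can look like, the base-R sketch below computes, for every pair of samples, the average fraction of one sample's cells among the neighbors of the other sample's cells. This is a simplified stand-in on simulated data and not the pairwise evaluation implemented in the package; the toy embedding, the labels and the name `pair_mix` are assumptions made for this example.

```r
set.seed(7)
k_nn <- 30

# Toy embedding with two well-separated clusters of 150 cells each
emb <- rbind(matrix(rnorm(300, mean = 0), ncol = 2),
             matrix(rnorm(300, mean = 20), ncol = 2))

# Hypothetical labels: samples 1 and 2 share the first cluster,
# sample 3 occupies the second cluster alone
labels <- c(sample(c("sample 1", "sample 2"), 150, replace = TRUE),
            rep("sample 3", 150))

# k nearest neighbors of each cell (the cell itself is skipped)
d  <- as.matrix(dist(emb))
nn <- t(apply(d, 1, function(v) order(v)[2:(k_nn + 1)]))

# pair_mix[a, b]: average fraction of sample-b cells among the
# neighbors of sample-a cells
lv <- sort(unique(labels))
pair_mix <- sapply(lv, function(b) {
  frac_b <- rowMeans(matrix(labels[nn] == b, nrow = nrow(nn)))
  vapply(lv, function(a) mean(frac_b[labels == a]), numeric(1))
})
round(pair_mix, 2)
```

In this toy setting the entries linking sample 3 to samples 1 and 2 are zero, because their cells never share a neighborhood, while samples 1 and 2 contribute substantially to each other's neighborhoods.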
