7 Seurat
Seurat is another R package for single cell analysis, developed by the Satija Lab. In this module, we will repeat many of the same analyses we did with SingleCellExperiment
, while noting differences between them.
Important note: In this workshop, we use Seurat
v4 (4.4.0). The Seurat
package is currently transitioning to v5, and some of the code is not compatible between versions. If you end up working with Seurat
v5 in the near future, the following shows how to adapt the workshop code.
Seurat v3, v4 | Seurat v5 | |
---|---|---|
Accessing counts
|
seurat[["RNA"]]@counts
|
seurat[["RNA"]]$counts
|
Accessing data
|
seurat[["RNA"]]@data
|
seurat[["RNA"]]$data
|
Initial value of data
|
Identical to counts
|
Does not exist until you normalize counts
|
Creating another assay, like CPM
|
CreateAssayObject()
|
CreateAssay5Object()
|
Integration | As in the Integration chapter |
Requires JoinLayers() before looking at differential expression between integrated datasets
|
Seurat
documentation for more details.
First we load the packages we will need.
library(Seurat)
library(ggplot2)
library(gridExtra)
7.1 Build a fake Seurat object from scratch
First we generate the same small matrix of counts as we did for SingleCellExperiment
. Note that instead of creating row and column names as metadata, we should attach them to the counts matrix.
set.seed(42)
<- 12
num_genes <- 8
num_cells <- as.integer(rexp(num_genes*num_cells, rate = 0.5))
raw_counts <- matrix(raw_counts, nrow = num_genes, ncol = num_cells)
raw_counts
rownames(raw_counts) <- paste("Gene", 1:num_genes, sep = "-")
colnames(raw_counts) <- paste("Cell", 1:num_cells, sep = "-")
And finally we create and inspect a Seurat
object.
<- CreateSeuratObject(counts = raw_counts, project = "Fake Seurat")
fake_seurat fake_seurat
## An object of class Seurat
## 12 features across 8 samples within 1 assay
## Active assay: RNA (12 features, 0 variable features)
## 2 layers present: counts, data
The default assay is called “RNA” and is accessible from fake_seurat@assays$RNA
or just fake_seurat[["RNA"]]
. This stores the raw counts in its counts
slot, while its data
slot is initially equal to counts
but reserved for normalized data.
Assays(fake_seurat)
## [1] "RNA"
"RNA"]]@counts fake_seurat[[
## 12 x 8 sparse Matrix of class "dgCMatrix"
## Cell-1 Cell-2 Cell-3 Cell-4 Cell-5 Cell-6 Cell-7 Cell-8
## Gene-1 . . 2 . . . 1 2
## Gene-2 1 . 1 . 13 2 1 .
## Gene-3 . 2 2 . . . 2 .
## Gene-4 . . . 3 3 . 13 .
## Gene-5 . . . 1 2 . 2 .
## Gene-6 2 1 2 1 . . . 1
## Gene-7 . 2 1 . 1 3 1 5
## Gene-8 . . 6 2 . 1 2 3
## Gene-9 2 9 . 3 2 . 2 4
## Gene-10 1 1 . . 1 . . 2
## Gene-11 2 9 3 8 . 11 . 1
## Gene-12 4 . 1 2 16 . . 3
all(fake_seurat[["RNA"]]@data == fake_seurat[["RNA"]]@counts)
## [1] TRUE
Note that counts
is a sparse matrix by default.
7.1.1 Metadata
The meta.data
slot is created with three columns by default.
head(fake_seurat@meta.data)
## orig.ident nCount_RNA nFeature_RNA
## Cell-1 Fake Seurat 12 6
## Cell-2 Fake Seurat 24 6
## Cell-3 Fake Seurat 18 8
## Cell-4 Fake Seurat 20 7
## Cell-5 Fake Seurat 38 7
## Cell-6 Fake Seurat 17 4
orig.ident
is the project name we assigned this Seurat
object, nCount_RNA
is the total counts for each cell (like sum
in SingleCellExperiment
), and nFeature_RNA
is the number of genes with nonzero counts for each cell (like detected
).
7.1.2 Normalization and multiple assays
The Seurat
normalization functions work slightly differently than in SingleCellExperiment
, where multiple assays like logcounts
, normcounts
, and cpm
naturally coexist. Instead, Seurat
expects you to explicitly create a new assay for each (non-default) one, starting from the same counts
. The transformed data are assigned to the new assay’s data
slot, as in this example for counts per million. To compute CPM, use the relative counts (“RC”) method.
"CPM"]] <- CreateAssayObject(counts = fake_seurat[["RNA"]]@counts)
fake_seurat[[<- NormalizeData(fake_seurat, assay = "CPM", normalization.method = "RC", scale.factor = 1000000)
fake_seurat "CPM"]]@data fake_seurat[[
## 12 x 8 sparse Matrix of class "dgCMatrix"
## Cell-1 Cell-2 Cell-3 Cell-4 Cell-5 Cell-6 Cell-7
## Gene-1 . . 111111.11 . . . 41666.67
## Gene-2 83333.33 . 55555.56 . 342105.26 117647.06 41666.67
## Gene-3 . 83333.33 111111.11 . . . 83333.33
## Gene-4 . . . 150000 78947.37 . 541666.67
## Gene-5 . . . 50000 52631.58 . 83333.33
## Gene-6 166666.67 41666.67 111111.11 50000 . . .
## Gene-7 . 83333.33 55555.56 . 26315.79 176470.59 41666.67
## Gene-8 . . 333333.33 100000 . 58823.53 83333.33
## Gene-9 166666.67 375000.00 . 150000 52631.58 . 83333.33
## Gene-10 83333.33 41666.67 . . 26315.79 . .
## Gene-11 166666.67 375000.00 166666.67 400000 . 647058.82 .
## Gene-12 333333.33 . 55555.56 100000 421052.63 . .
## Cell-8
## Gene-1 95238.10
## Gene-2 .
## Gene-3 .
## Gene-4 .
## Gene-5 .
## Gene-6 47619.05
## Gene-7 238095.24
## Gene-8 142857.14
## Gene-9 190476.19
## Gene-10 95238.10
## Gene-11 47619.05
## Gene-12 142857.14
To compute the log-normalized counts, use the (default) “LogNormalize” method. Note that the log-normalized counts use a different scale factor than in SingleCellExperiment
, but this does not affect downstream computations.
<- NormalizeData(fake_seurat) # normalization.method = "LogNormalize"
fake_seurat "RNA"]]@data fake_seurat[[
## 12 x 8 sparse Matrix of class "dgCMatrix"
## Cell-1 Cell-2 Cell-3 Cell-4 Cell-5 Cell-6 Cell-7 Cell-8
## Gene-1 . . 7.014015 . . . 6.034684 6.860015
## Gene-2 6.726633 . 6.321767 . 8.137996 7.071124 6.034684 .
## Gene-3 . 6.726633 7.014015 . . . 6.726633 .
## Gene-4 . . . 7.313887 6.672632 . 8.597420 .
## Gene-5 . . . 6.216606 6.267800 . 6.726633 .
## Gene-6 7.419181 6.034684 7.014015 6.216606 . . . 6.167916
## Gene-7 . 6.726633 6.321767 . 5.576547 7.476306 6.034684 7.775676
## Gene-8 . . 8.112028 6.908755 . 6.378826 6.726633 7.265130
## Gene-9 7.419181 8.229778 . 7.313887 6.267800 . 6.726633 7.552637
## Gene-10 6.726633 6.034684 . . 5.576547 . . 6.860015
## Gene-11 7.419181 8.229778 7.419181 8.294300 . 8.775177 . 6.167916
## Gene-12 8.112028 . 6.321767 6.908755 8.345580 . . 7.265130
Assays(fake_seurat)
## [1] "RNA" "CPM"
@active.assay fake_seurat
## [1] "RNA"
Now we have two assays, “RNA” and “CPM”. Both have the same counts
but different data
, and “RNA” is the active (default) one which (in this case) contains the log-normalized data.
7.2 Import a real one from our dataset
First we import the same data as before, using the Read10X()
function.
<- Read10X("data/filtered_feature_bc_matrix/")
my_counts dim(my_counts)
## [1] 34218 5780
<- CreateSeuratObject(counts = my_counts, project = "Arabidopsis")
my_seurat my_seurat
## An object of class Seurat
## 34218 features across 5780 samples within 1 assay
## Active assay: RNA (34218 features, 0 variable features)
## 2 layers present: counts, data
# The following are equivalent, as you can see
# dim(my_seurat@assays$RNA@counts)
# dim(my_seurat[["RNA"]]@counts)
# dim(my_seurat[["RNA"]])
dim(my_seurat)
## [1] 34218 5780
dim(my_seurat@meta.data)
## [1] 5780 3
head(my_seurat@meta.data)
## orig.ident nCount_RNA nFeature_RNA
## AAACCCACACGCGCAT-1 Arabidopsis 2285 1382
## AAACCCACAGAAGTTA-1 Arabidopsis 2609 1525
## AAACCCACAGACAAAT-1 Arabidopsis 4712 2606
## AAACCCACATTGACAC-1 Arabidopsis 6133 3434
## AAACCCAGTAGCTTTG-1 Arabidopsis 1243 998
## AAACCCAGTCCAGTTA-1 Arabidopsis 1095 914
The metadata at this point are orig.ident
(the project name), nCount_RNA
(sum
from my_SCE), and nFeature_RNA
(detected
from my_SCE).
We can also add the predetermined clusters.
<- readRDS("data/clusters_5.rds")
df.clusters $cell_barcode <- substring(df.clusters$cell_barcode, 1, 18)
df.clusters@meta.data <- merge(my_seurat@meta.data, df.clusters, by.x = "row.names", by.y = "cell_barcode", all.x = TRUE)
my_seurat# It converted row.names(my_seurat@meta.data) to a spurious column called Row.names,
# so clean up
row.names(my_seurat@meta.data) <- my_seurat@meta.data$Row.names
@meta.data$Row.names <- NULL
my_seurathead(my_seurat@meta.data)
## orig.ident nCount_RNA nFeature_RNA cluster_id
## AAACCCACACGCGCAT-1 Arabidopsis 2285 1382 15
## AAACCCACAGAAGTTA-1 Arabidopsis 2609 1525 15
## AAACCCACAGACAAAT-1 Arabidopsis 4712 2606 4
## AAACCCACATTGACAC-1 Arabidopsis 6133 3434 14
## AAACCCAGTAGCTTTG-1 Arabidopsis 1243 998 17
## AAACCCAGTCCAGTTA-1 Arabidopsis 1095 914 9
## supercluster
## AAACCCACACGCGCAT-1 Endodermal cells
## AAACCCACAGAAGTTA-1 Endodermal cells
## AAACCCACAGACAAAT-1 Atrichoblasts
## AAACCCACATTGACAC-1 Endodermal cells
## AAACCCAGTAGCTTTG-1 Stele cells
## AAACCCAGTCCAGTTA-1 Meristematic cells
# Assign display colors for the 21 clusters and 6 superclusters
# (close to those used in Farmer et al.)
<- c(
cluster_colors "cornflowerblue", "darkblue", "cyan",
"springgreen", "olivedrab", "chartreuse", "darkgreen",
"gray", "black", "saddlebrown",
"slateblue", "purple", "pink", "lightpink3", "hotpink", "darkred",
"lightsalmon", "orange", "tan", "firebrick1", "indianred"
)<- c("blue", "green", "black", "magenta", "red", "orange") supercluster_colors
7.2.1 Creating other metadata columns
Here we will create artificial metadata columns corresponding to the Arabidopsis thaliana gene name prefixes (AT1-AT5, ATC, ATM, and ENS). We use Seurat
’s PercentageFeatureSet()
function to compute the percentage of cell counts belonging to each category.
<- substring(rownames(my_seurat), 1, 3)
gene_prefix <- sort(unique(gene_prefix))
feature_categories for (fc in feature_categories) {
<- paste0("pct_", fc)
fc_name <- PercentageFeatureSet(my_seurat, pattern = paste0("^", fc))
my_seurat[[fc_name]]
}head(my_seurat@meta.data)
## orig.ident nCount_RNA nFeature_RNA cluster_id
## AAACCCACACGCGCAT-1 Arabidopsis 2285 1382 15
## AAACCCACAGAAGTTA-1 Arabidopsis 2609 1525 15
## AAACCCACAGACAAAT-1 Arabidopsis 4712 2606 4
## AAACCCACATTGACAC-1 Arabidopsis 6133 3434 14
## AAACCCAGTAGCTTTG-1 Arabidopsis 1243 998 17
## AAACCCAGTCCAGTTA-1 Arabidopsis 1095 914 9
## supercluster pct_AT1 pct_AT2 pct_AT3 pct_AT4
## AAACCCACACGCGCAT-1 Endodermal cells 24.11379 16.10503 22.27571 17.41794
## AAACCCACAGAAGTTA-1 Endodermal cells 26.94519 15.82982 20.96589 14.25834
## AAACCCACAGACAAAT-1 Atrichoblasts 25.65789 16.02292 20.47963 16.08659
## AAACCCACATTGACAC-1 Endodermal cells 27.19713 15.32692 18.21295 16.24001
## AAACCCAGTAGCTTTG-1 Stele cells 24.29606 16.41191 21.31939 16.00965
## AAACCCAGTCCAGTTA-1 Meristematic cells 25.20548 13.78995 20.45662 15.89041
## pct_AT5 pct_ATC pct_ATM pct_ENS
## AAACCCACACGCGCAT-1 19.69365 0.04376368 0.35010941 0
## AAACCCACAGAAGTTA-1 21.08087 0.00000000 0.91989268 0
## AAACCCACAGACAAAT-1 21.73175 0.00000000 0.02122241 0
## AAACCCACATTGACAC-1 20.80548 0.35871515 1.85879667 0
## AAACCCAGTAGCTTTG-1 21.64119 0.08045052 0.24135157 0
## AAACCCAGTCCAGTTA-1 23.74429 0.18264840 0.73059361 0
pdf("pdf/at1-5.pdf")
VlnPlot(my_seurat, features = paste0("pct_", "AT", 1:5), pt.size = 0, ncol = 5, fill.by = "feature")
dev.off()
pdf("pdf/atc-atm-ens.pdf")
VlnPlot(my_seurat, features = paste0("pct_", c("ATC", "ATM", "ENS")), pt.size = 0, ncol = 3, fill.by = "feature")
dev.off()
The chloroplast, mitochondrial, and non-protein coding genes do not contribute much, so we are justified in eliminating them.
7.3 Quality control
We can decide which cells to discard using fixed thresholds. Before we filter out the discards, let’s illustrate FeatureScatter()
which creates scatter plots for visualizing feature-feature relationships. It can be used for anything calculated by the object, i.e. columns in object metadata, PC scores etc.
$discard <- (my_seurat$nCount_RNA < 1000 | my_seurat$nFeature_RNA < 750 | is.na(my_seurat$cluster_id)) my_seurat
pdf("pdf/seurat-fixed-threshold-qc.pdf")
FeatureScatter(my_seurat, feature1 = "nCount_RNA", feature2 = "nFeature_RNA", group.by = "discard", cols = c("blue", "tan")) +
labs(color = "discard")
dev.off()
Now we can discard those cells and the undesired genes (“ATC”, “ATM”, “ENS”, and those with no cell counts).
<- (rowSums(my_seurat[["RNA"]]@counts) > 0)
genes_to_keep <- genes_to_keep & gene_prefix %in% paste0("AT", 1:5)
genes_to_keep <- my_seurat[genes_to_keep, !my_seurat$discard]
my_seurat dim(my_seurat)
## [1] 23848 5151
7.4 Plots
VlnPlot()
shows the variation in individual features. By default, this overlays the violin plot with the jittered count values, so it is difficult to see. To remove them, we add the argument pt.size = 0
.
pdf("pdf/nCount-nFeature-by-cluster.pdf")
<- VlnPlot(my_seurat, features = "nCount_RNA", cols = cluster_colors, pt.size = 0, group.by = "cluster_id") +
plot1 xlab("cluster_id") +
NoLegend()
<- VlnPlot(my_seurat, features = "nFeature_RNA", cols = cluster_colors, pt.size = 0, group.by = "cluster_id") +
plot2 xlab("cluster_id") +
NoLegend()
grid.arrange(plot1, plot2, ncol = 1)
dev.off()
The group.by
argument determines which categories of cell to give their own violin plot. For example, to group by supercluster
instead of cluster_id
,
pdf("pdf/nCount-nFeature-by-supercluster.pdf")
<- VlnPlot(my_seurat, features = "nCount_RNA", cols = supercluster_colors, pt.size = 0, group.by = "supercluster") +
plot1 xlab("supercluster") +
NoLegend()
<- VlnPlot(my_seurat, features = "nFeature_RNA", cols = supercluster_colors, pt.size = 0, group.by = "supercluster") +
plot2 xlab("supercluster") +
NoLegend()
grid.arrange(plot1, plot2, ncol = 2)
dev.off()
We can also visualize the same data as a ridgeline plot.
pdf("pdf/nCount-nFeature-ridgeplots.pdf")
<- RidgePlot(my_seurat, features = "nCount_RNA", cols = supercluster_colors, group.by = "supercluster") +
plot1 ylab(NULL) +
NoLegend()
<- RidgePlot(my_seurat, features = "nFeature_RNA", cols = supercluster_colors, group.by = "supercluster") +
plot2 ylab(NULL) +
NoLegend()
grid.arrange(plot1, plot2, ncol = 1)
dev.off()
To combine all data into a single category, group by orig.ident
.
7.5 Normalization
Again, we can log-normalize our data just by running NormalizeData()
.
<- NormalizeData(my_seurat)
my_seurat "RNA"]]@counts[1:6, 1:3] my_seurat[[
## 6 x 3 sparse Matrix of class "dgCMatrix"
## AAACCCACACGCGCAT-1 AAACCCACAGAAGTTA-1 AAACCCACAGACAAAT-1
## AT1G01010 . 1 .
## AT1G01020 . . 1
## AT1G01030 . . .
## AT1G01040 . . 2
## AT1G03993 . . .
## AT1G01050 1 . .
"RNA"]]@data[1:6, 1:3] my_seurat[[
## 6 x 3 sparse Matrix of class "dgCMatrix"
## AAACCCACACGCGCAT-1 AAACCCACAGAAGTTA-1 AAACCCACAGACAAAT-1
## AT1G01010 . 1.58278 .
## AT1G01020 . . 1.138695
## AT1G01030 . . .
## AT1G01040 . . 1.657348
## AT1G03993 . . .
## AT1G01050 1.685227 . .
7.6 Identification of highly variable features
Another method of quality control is to determine the most variable genes and use these for dimensionality reduction. Here we use FindVariableFeatures()
to identify the 2000 most variable genes (with the top 10 labeled), and VariableFeaturePlot()
to visually compare them to the less variable ones.
<- FindVariableFeatures(my_seurat) # nfeatures = 2000 by default, unlike 500 as in SingleCellExperiment
my_seurat <- head(VariableFeatures(my_seurat), 10) top_10_genes
pdf("pdf/variable-features.pdf")
<- VariableFeaturePlot(my_seurat)
vf_plot LabelPoints(plot = vf_plot, points = top_10_genes, repel = TRUE, xnudge = 0, ynudge = 0)
dev.off()
7.7 Dimensionality reduction
7.7.1 PCA
In Seurat
, before computing principal components we must scale the (log-normalized) data by subtracting the mean and dividing by the standard deviation of each row. The features
argument allows your choice of genes to scale, but the default is to only scale the rows corresponding to the most variable genes.
<- ScaleData(my_seurat, verbose = FALSE)
my_seurat <- RunPCA(my_seurat, verbose = FALSE) my_seurat
We can view the individual PC values for each cell as follows.
# Just the first 6 rows and 5 columns
"pca"]]@cell.embeddings[1:6, 1:5] my_seurat[[
## PC_1 PC_2 PC_3 PC_4 PC_5
## AAACCCACACGCGCAT-1 -3.5730765 -0.720932 1.5214319 -3.1399856 4.6600356
## AAACCCACAGAAGTTA-1 -4.6125194 -21.638604 -5.1986060 -0.9866993 3.4633961
## AAACCCACAGACAAAT-1 1.8597980 2.741490 1.7616183 10.2241086 1.9462996
## AAACCCACATTGACAC-1 -0.4922986 -8.080446 -0.8845399 0.8801926 0.2192683
## AAACCCAGTAGCTTTG-1 -3.5552763 1.851916 4.3438592 -3.5047218 -2.2177276
## AAACCCAGTCCAGTTA-1 -4.0690844 3.278787 -0.1907693 -1.4210729 -3.8602152
The VizDimLoadings()
function shows which genes are most strongly associated with each principal component.
pdf("pdf/vizdimloadings.pdf")
VizDimLoadings(my_seurat, dims = 1:4, nfeatures = 10, reduction = "pca")
dev.off()
DimHeatmap()
sorts both genes and cells by principal component values. Genes with the most negative PC values are on top, and those with the most positive values are on the bottom, while cells are arranged from left to right. In this default color scheme, magenta represents positive correlations and yellow represents negative ones, with no correlation in black.
pdf("pdf/dimheatmaps.pdf")
DimHeatmap(my_seurat, dims = 1:4, nfeatures = 10, ncol = 2, cells = 500)
dev.off()
VizDimLoadings()
. Do they agree?
Click for answer
Yes, remember to count positive-valued genes from the bottom and negative-valued genes from the top of each heatmap.The ElbowPlot()
function shows the amount of variance in each principal component, giving us a sense of how many PCs we should retain in subsequent computations.
pdf("pdf/elbow-plot-seurat.pdf")
ElbowPlot(my_seurat, ndims = 50)
dev.off()
It looks like we would not gain much by retaining more than 20 PCs in our dimensionality reduction computations.
Finally, we can create our dimensionality reduction plots with DimPlot()
.
pdf("pdf/pca-by-cluster-seurat.pdf")
DimPlot(my_seurat, reduction = "pca", cols = cluster_colors, group.by = "cluster_id", label.size = 4, repel = TRUE, label = TRUE)
dev.off()
pdf("pdf/pca-by-supercluster-seurat.pdf")
DimPlot(my_seurat, reduction = "pca", cols = supercluster_colors, group.by = "supercluster", label.size = 4, repel = TRUE, label = TRUE) + NoLegend()
dev.off()
We do not really need the reduction = "pca"
argument in this initial example, as we have only run PCA so far. Seurat
looks for existing reduced dimensions in order: first UMAP, then t-SNE, then PCA. (Therefore, we would do it this way if we wanted to visualize PCA later after having computed UMAP, for example.)
7.7.2 t-SNE
We repeat the same analysis for t-SNE, using 20 principal components.
set.seed(42)
<- RunTSNE(my_seurat, dims = 1:20, verbose = FALSE)
my_seurat "tsne"]]@cell.embeddings[1:6, ] my_seurat[[
## tSNE_1 tSNE_2
## AAACCCACACGCGCAT-1 9.1858828 -22.510883
## AAACCCACAGAAGTTA-1 0.3305121 -43.819644
## AAACCCACAGACAAAT-1 28.1141137 1.241850
## AAACCCACATTGACAC-1 -21.5127426 -31.453517
## AAACCCAGTAGCTTTG-1 -12.3916993 -9.794599
## AAACCCAGTCCAGTTA-1 -13.2210028 15.448777
pdf("pdf/tsne-by-cluster-seurat.pdf")
DimPlot(my_seurat, cols = cluster_colors, group.by = "cluster_id", label.size = 4, repel = TRUE, label = TRUE)
dev.off()
pdf("pdf/tsne-by-supercluster-seurat.pdf")
DimPlot(my_seurat, cols = supercluster_colors, group.by = "supercluster", label.size = 4, repel = TRUE, label = TRUE) + NoLegend()
dev.off()
7.7.3 UMAP
And the same for UMAP,
set.seed(42)
<- RunUMAP(my_seurat, dims = 1:20, verbose = FALSE)
my_seurat "umap"]]@cell.embeddings[1:6, ] my_seurat[[
## UMAP_1 UMAP_2
## AAACCCACACGCGCAT-1 -5.46506841 9.20502553
## AAACCCACAGAAGTTA-1 -10.09535081 7.70838723
## AAACCCACAGACAAAT-1 8.11869615 0.06980795
## AAACCCACATTGACAC-1 -13.14318997 0.89037666
## AAACCCAGTAGCTTTG-1 -6.06361253 -5.82680860
## AAACCCAGTCCAGTTA-1 0.01529878 -4.98024049
pdf("pdf/umap-by-cluster-seurat.pdf")
DimPlot(my_seurat, cols = cluster_colors, group.by = "cluster_id", label.size = 4, repel = TRUE, label = TRUE)
dev.off()
pdf("pdf/umap-by-supercluster-seurat.pdf")
DimPlot(my_seurat, cols = supercluster_colors, group.by = "supercluster", label.size = 4, repel = TRUE, label = TRUE) + NoLegend()
dev.off()
7.8 Differential gene expression
Feature plots reveal differential expression of marker genes in different cell types. Here we create feature plots for the most strongly associated gene from the first 4 principal components (from the VizDimLoadings()
example above).
Idents(my_seurat) <- my_seurat$cluster_id
<- c("AT5G17390", "AT1G68850", "AT5G22310", "AT3G16340") my_genes
pdf("pdf/featureplots-seurat.pdf")
FeaturePlot(my_seurat, features = my_genes, label.size = 4, repel = TRUE, label = TRUE)
dev.off()
## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider increasing max.overlaps
## ggrepel: 1 unlabeled data points (too many overlaps). Consider increasing max.overlaps
## ggrepel: 1 unlabeled data points (too many overlaps). Consider increasing max.overlaps
## ggrepel: 1 unlabeled data points (too many overlaps). Consider increasing max.overlaps
An alternative is to create violin plots for genes of interest.
pdf("pdf/violin-featureplots-seurat.pdf")
VlnPlot(my_seurat, features = my_genes, cols = cluster_colors, pt.size = 0, group.by = "cluster_id", log = TRUE, ncol = 2) +
xlab("cluster_id") +
NoLegend()
dev.off()
Another is to make dot plots of gene expression.
pdf("pdf/dotplot-seurat.pdf")
DotPlot(my_seurat, features = top_10_genes, cols = c("darkblue", "red"), cluster.idents = TRUE, group.by = "supercluster") +
ylab("supercluster") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
dev.off()
Seurat
also has a more direct statistical way to automatically suggest which genes go with which clusters (or any category you specify using Idents()
), FindAllMarkers()
. The following code identifies the top 3 genes for each cluster. First we must load the dplyr
package for its data frame manipulation functions.
library(dplyr)
Idents(my_seurat) <- my_seurat$cluster_id
<- FindAllMarkers(my_seurat, only.pos = TRUE, min.pct = 0.5, logfc.threshold = 0.5, verbose = FALSE)
all_markers head(all_markers)
## p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene
## AT2G20520 5.283152e-85 3.077935 0.750 0.037 1.259926e-80 1 AT2G20520
## AT4G08400 2.328373e-81 4.675131 0.821 0.048 5.552703e-77 1 AT4G08400
## AT3G49960 1.064398e-80 4.071788 0.929 0.063 2.538376e-76 1 AT3G49960
## AT5G04960 4.239402e-79 4.015624 0.964 0.071 1.011013e-74 1 AT5G04960
## AT5G06630 1.586581e-78 5.431388 0.929 0.067 3.783679e-74 1 AT5G06630
## AT1G12040 2.055730e-78 3.985362 0.893 0.060 4.902504e-74 1 AT1G12040
<- as.data.frame(top_n(group_by(all_markers, cluster), n = 3, wt = avg_log2FC))
top_3_markers top_3_markers
## p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene
## 1 4.805192e-55 5.432073 1.000 0.122 1.145942e-50 1 AT3G54590
## 2 2.837409e-50 5.997661 0.929 0.114 6.766654e-46 1 AT2G24980
## 3 3.182976e-42 5.437437 0.964 0.156 7.590760e-38 1 AT3G54580
## 4 0.000000e+00 4.957604 0.975 0.054 0.000000e+00 2 AT1G12560
## 5 0.000000e+00 3.866195 0.944 0.037 0.000000e+00 2 AT1G50930
## 6 2.869877e-270 3.852518 0.975 0.104 6.844082e-266 2 AT4G25820
## 7 0.000000e+00 2.871506 0.655 0.009 0.000000e+00 3 AT1G27140
## 8 3.860169e-127 4.049835 0.905 0.151 9.205730e-123 3 AT1G66270
## 9 2.229071e-93 3.141852 0.672 0.099 5.315887e-89 3 AT1G04040
## 10 2.041199e-183 3.446692 0.605 0.103 4.867850e-179 4 AT4G18550
## 11 1.004522e-135 3.698562 0.562 0.118 2.395585e-131 4 AT3G48340
## 12 5.911867e-106 3.583753 0.568 0.154 1.409862e-101 4 AT3G45010
## 13 0.000000e+00 2.774994 0.926 0.290 0.000000e+00 5 AT1G66800
## 14 1.111775e-269 2.668145 0.842 0.264 2.651361e-265 5 AT2G37130
## 15 4.451567e-196 2.459206 0.827 0.328 1.061610e-191 5 AT5G42600
## 16 2.080539e-252 3.791566 0.733 0.038 4.961670e-248 6 AT1G53830
## 17 3.200221e-230 4.671823 0.600 0.026 7.631887e-226 6 AT4G33790
## 18 5.994578e-169 4.446089 0.658 0.051 1.429587e-164 6 AT4G15290
## 19 7.799618e-93 2.334195 0.620 0.121 1.860053e-88 7 AT1G28290
## 20 1.277339e-30 1.648956 0.647 0.327 3.046197e-26 7 AT1G29418
## 21 1.228976e-15 1.893333 0.599 0.408 2.930861e-11 7 AT1G66280
## 22 8.331900e-115 2.330073 0.541 0.072 1.986992e-110 8 AT5G44530
## 23 2.777944e-55 2.329416 0.519 0.132 6.624841e-51 8 AT1G62480
## 24 1.646121e-38 1.748039 0.616 0.250 3.925669e-34 8 AT4G17390
## 25 6.391708e-175 2.579441 0.547 0.058 1.524295e-170 9 AT1G09200
## 26 2.550174e-171 2.500638 0.773 0.143 6.081655e-167 9 AT1G56110
## 27 3.131177e-136 2.873593 0.883 0.311 7.467230e-132 9 AT1G29418
## 28 0.000000e+00 4.604532 0.941 0.027 0.000000e+00 10 AT4G23800
## 29 1.244835e-232 4.449378 0.542 0.020 2.968683e-228 10 AT4G33270
## 30 6.606529e-183 4.896985 0.551 0.029 1.575525e-178 10 AT4G33260
## 31 1.435710e-297 2.385858 0.561 0.057 3.423882e-293 11 AT4G15393
## 32 2.925117e-263 2.579904 0.727 0.150 6.975819e-259 11 AT4G02520
## 33 1.708945e-260 2.455158 0.816 0.216 4.075492e-256 11 AT4G30270
## 34 8.658822e-184 3.241874 0.588 0.061 2.064956e-179 12 AT5G49360
## 35 1.947868e-109 2.793009 0.570 0.104 4.645277e-105 12 AT3G23470
## 36 7.482203e-84 2.653856 0.842 0.364 1.784356e-79 12 AT4G30170
## 37 0.000000e+00 3.988574 0.658 0.022 0.000000e+00 13 AT4G11290
## 38 3.143627e-259 3.355122 0.663 0.043 7.496923e-255 13 AT5G24580
## 39 2.772143e-153 3.390540 0.842 0.163 6.611006e-149 13 AT1G05260
## 40 1.651738e-291 5.907585 0.694 0.056 3.939064e-287 14 AT1G45201
## 41 3.694241e-209 5.466461 0.532 0.045 8.810027e-205 14 AT5G37690
## 42 1.391073e-198 5.244063 0.560 0.056 3.317431e-194 14 AT2G23540
## 43 0.000000e+00 4.118525 0.992 0.287 0.000000e+00 15 AT3G32980
## 44 0.000000e+00 3.940808 0.859 0.056 0.000000e+00 15 AT2G28470
## 45 0.000000e+00 3.527007 0.912 0.112 0.000000e+00 15 AT3G22600
## 46 0.000000e+00 5.761647 0.982 0.042 0.000000e+00 16 AT2G36100
## 47 0.000000e+00 5.667331 0.901 0.028 0.000000e+00 16 AT5G65530
## 48 0.000000e+00 5.389908 0.973 0.029 0.000000e+00 16 AT5G42180
## 49 3.959335e-197 3.154383 0.751 0.102 9.442222e-193 17 AT1G75500
## 50 1.905619e-146 2.733056 0.631 0.096 4.544521e-142 17 AT2G02100
## 51 6.921911e-141 2.925161 0.524 0.065 1.650737e-136 17 AT3G53100
## 52 0.000000e+00 3.824625 0.779 0.038 0.000000e+00 18 AT3G62040
## 53 0.000000e+00 3.717356 0.831 0.044 0.000000e+00 18 AT1G31050
## 54 3.359451e-282 4.306949 0.563 0.063 8.011618e-278 18 AT1G32450
## 55 0.000000e+00 5.671221 0.837 0.017 0.000000e+00 19 AT4G00670
## 56 0.000000e+00 5.656709 0.977 0.023 0.000000e+00 19 AT2G16740
## 57 4.953494e-253 5.684909 0.942 0.054 1.181309e-248 19 AT5G09220
## 58 3.674197e-137 4.604391 0.929 0.052 8.762225e-133 20 AT2G31900
## 59 4.528058e-128 4.367504 0.738 0.033 1.079851e-123 20 AT1G12080
## 60 4.792214e-118 4.217155 0.952 0.067 1.142847e-113 20 AT1G77690
## 61 3.591989e-214 3.077428 0.610 0.021 8.566176e-210 21 AT1G47410
## 62 1.747501e-150 3.292408 0.732 0.051 4.167440e-146 21 AT4G32880
## 63 1.691360e-116 3.564900 0.512 0.031 4.033555e-112 21 AT5G19530
Click for answer
Idents(my_seurat) <- my_seurat$supercluster
<- FindAllMarkers(my_seurat, only.pos = TRUE, min.pct = 0.5, logfc.threshold = 0.5, verbose = FALSE)
all_markers <- as.data.frame(top_n(group_by(all_markers, cluster), n = 10, wt = avg_log2FC))
top_10_markers top_10_markers
## p_val avg_log2FC pct.1 pct.2 p_val_adj cluster
## 1 0.000000e+00 4.713490 0.683 0.045 0.000000e+00 Trichoblasts
## 2 0.000000e+00 3.518780 0.644 0.044 0.000000e+00 Trichoblasts
## 3 0.000000e+00 3.392073 0.706 0.044 0.000000e+00 Trichoblasts
## 4 4.696692e-252 3.378538 0.781 0.116 1.120067e-247 Trichoblasts
## 5 3.449318e-248 3.483590 0.712 0.090 8.225933e-244 Trichoblasts
## 6 4.581437e-243 3.890838 0.846 0.158 1.092581e-238 Trichoblasts
## 7 1.735074e-238 3.485318 0.703 0.096 4.137804e-234 Trichoblasts
## 8 5.904813e-227 3.463225 0.761 0.123 1.408180e-222 Trichoblasts
## 9 7.477153e-209 3.630900 0.699 0.112 1.783151e-204 Trichoblasts
## 10 3.987865e-164 3.499831 0.585 0.089 9.510261e-160 Trichoblasts
## 11 0.000000e+00 2.329371 0.673 0.124 0.000000e+00 Atrichoblasts
## 12 0.000000e+00 2.286634 0.774 0.251 0.000000e+00 Atrichoblasts
## 13 0.000000e+00 2.274846 0.704 0.180 0.000000e+00 Atrichoblasts
## 14 0.000000e+00 2.251108 0.814 0.290 0.000000e+00 Atrichoblasts
## 15 0.000000e+00 2.207429 0.614 0.124 0.000000e+00 Atrichoblasts
## 16 1.919420e-282 2.124907 0.750 0.273 4.577434e-278 Atrichoblasts
## 17 2.933857e-270 2.474596 0.704 0.255 6.996661e-266 Atrichoblasts
## 18 8.836606e-268 2.619090 0.676 0.219 2.107354e-263 Atrichoblasts
## 19 6.477013e-249 2.162646 0.632 0.190 1.544638e-244 Atrichoblasts
## 20 1.674176e-240 2.349853 0.680 0.229 3.992575e-236 Atrichoblasts
## 21 1.428019e-211 2.316819 0.609 0.121 3.405540e-207 Meristematic cells
## 22 9.043039e-209 2.077853 1.000 0.977 2.156584e-204 Meristematic cells
## 23 4.040242e-201 2.224239 0.725 0.208 9.635169e-197 Meristematic cells
## 24 4.390477e-178 2.114723 0.711 0.221 1.047041e-173 Meristematic cells
## 25 2.750088e-172 2.637504 0.745 0.290 6.558409e-168 Meristematic cells
## 26 9.064828e-166 1.918239 0.655 0.182 2.161780e-161 Meristematic cells
## 27 4.234139e-162 1.881389 0.616 0.161 1.009757e-157 Meristematic cells
## 28 1.610003e-157 1.953758 0.691 0.223 3.839536e-153 Meristematic cells
## 29 1.588954e-156 1.911681 0.711 0.245 3.789338e-152 Meristematic cells
## 30 1.278088e-131 1.966833 0.591 0.187 3.047985e-127 Meristematic cells
## 31 0.000000e+00 1.996938 0.803 0.200 0.000000e+00 Cortical cells
## 32 0.000000e+00 1.852046 0.592 0.036 0.000000e+00 Cortical cells
## 33 3.268771e-246 1.898365 0.727 0.190 7.795364e-242 Cortical cells
## 34 2.324000e-244 1.962013 0.639 0.138 5.542274e-240 Cortical cells
## 35 2.504481e-241 2.126654 0.973 0.692 5.972686e-237 Cortical cells
## 36 1.599894e-240 2.388098 0.817 0.301 3.815427e-236 Cortical cells
## 37 2.878450e-228 1.829322 0.830 0.304 6.864527e-224 Cortical cells
## 38 1.792085e-199 1.805577 0.749 0.275 4.273765e-195 Cortical cells
## 39 3.937027e-192 2.108964 0.668 0.213 9.389021e-188 Cortical cells
## 40 2.001277e-179 2.182676 0.569 0.150 4.772645e-175 Cortical cells
## 41 0.000000e+00 4.542391 0.911 0.213 0.000000e+00 Endodermal cells
## 42 0.000000e+00 3.846652 0.582 0.085 0.000000e+00 Endodermal cells
## 43 0.000000e+00 3.689689 0.633 0.058 0.000000e+00 Endodermal cells
## 44 0.000000e+00 3.567181 0.525 0.044 0.000000e+00 Endodermal cells
## 45 0.000000e+00 3.482247 0.645 0.080 0.000000e+00 Endodermal cells
## 46 0.000000e+00 3.272092 0.512 0.046 0.000000e+00 Endodermal cells
## 47 0.000000e+00 3.084530 0.512 0.032 0.000000e+00 Endodermal cells
## 48 0.000000e+00 3.083063 0.588 0.086 0.000000e+00 Endodermal cells
## 49 0.000000e+00 2.975978 0.554 0.041 0.000000e+00 Endodermal cells
## 50 1.270629e-280 2.864026 0.671 0.218 3.030195e-276 Endodermal cells
## 51 0.000000e+00 3.418204 0.804 0.082 0.000000e+00 Stele cells
## 52 0.000000e+00 3.254163 0.569 0.053 0.000000e+00 Stele cells
## 53 0.000000e+00 3.057052 0.507 0.037 0.000000e+00 Stele cells
## 54 5.860327e-291 2.675344 0.599 0.108 1.397571e-286 Stele cells
## 55 4.894555e-266 3.329565 0.501 0.075 1.167253e-261 Stele cells
## 56 3.483802e-212 2.189974 0.937 0.656 8.308171e-208 Stele cells
## 57 1.972447e-192 2.131032 0.598 0.183 4.703893e-188 Stele cells
## 58 5.662993e-179 2.879122 0.535 0.163 1.350511e-174 Stele cells
## 59 3.720749e-147 2.524200 0.645 0.308 8.873243e-143 Stele cells
## 60 1.243051e-104 1.938841 0.504 0.224 2.964428e-100 Stele cells
## gene
## 1 AT1G12560
## 2 AT5G22410
## 3 AT1G48930
## 4 AT5G44020
## 5 AT3G54590
## 6 AT3G28550
## 7 AT4G25820
## 8 AT3G54580
## 9 AT3G62680
## 10 AT2G24980
## 11 AT3G12977
## 12 AT1G51830
## 13 AT3G09220
## 14 AT4G37070
## 15 AT1G73300
## 16 AT3G29250
## 17 AT1G66800
## 18 AT2G37130
## 19 AT1G14960
## 20 AT4G15230
## 21 AT1G56110
## 22 AT3G06355
## 23 AT4G17390
## 24 AT3G60245
## 25 AT1G29418
## 26 AT2G18020
## 27 AT1G52300
## 28 AT2G34480
## 29 AT4G16720
## 30 AT1G48920
## 31 AT5G42250
## 32 AT3G46500
## 33 AT5G25460
## 34 AT3G51860
## 35 AT1G21310
## 36 AT4G30170
## 37 AT5G42590
## 38 AT3G26470
## 39 AT4G30270
## 40 AT4G02520
## 41 AT3G32980
## 42 AT1G04220
## 43 AT1G05260
## 44 AT2G48130
## 45 AT3G22600
## 46 AT2G28470
## 47 AT1G08670
## 48 AT1G61590
## 49 AT5G41040
## 50 AT2G16750
## 51 AT4G34600
## 52 AT2G40900
## 53 AT1G31050
## 54 AT2G43910
## 55 AT4G01450
## 56 AT2G21660
## 57 AT3G14990
## 58 AT5G23050
## 59 AT5G59780
## 60 AT1G69850
<- intersect(top_3_markers$gene, top_10_markers$gene)
matching_markers matching_markers
## [1] "AT3G54590" "AT2G24980" "AT3G54580" "AT1G12560" "AT4G25820" "AT1G66800"
## [7] "AT2G37130" "AT1G29418" "AT4G17390" "AT1G56110" "AT4G02520" "AT4G30270"
## [13] "AT4G30170" "AT1G05260" "AT3G32980" "AT2G28470" "AT3G22600" "AT1G31050"
18 marker genes appear in both sets, out of ~60 in each set.
7.9 Clustering
We can use SingleR
to predict cell types for a Seurat
object, as we did for a SingleCellExperiment
. However, instead of passing a Seurat
object as the test
or ref
argument, we must pass its normalized counts matrix (data
).
library(SingleR)
<- readRDS("data/another_seurat.rds")
another_seurat <- SingleR(test = another_seurat[["RNA"]]@data, ref = my_seurat[["RNA"]]@data, labels = my_seurat$cluster_id) predicted_clusters
Seurat
can also assign clusters for you. Its FindClusters()
function does not use k-means as in the previous chapter, but a shared nearest neighbor clustering algorithm. We need not specify a number of desired clusters.
<- FindNeighbors(my_seurat, dims = 1:10, verbose = FALSE)
my_seurat <- FindClusters(my_seurat, resolution = 0.5, verbose = FALSE)
my_seurat table(my_seurat$seurat_clusters)
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## 650 540 521 511 483 467 418 347 248 215 192 134 132 116 91 86
pdf("pdf/umap-by-computed-cluster-seurat.pdf")
DimPlot(my_seurat, label.size = 4, repel = TRUE, label = TRUE)
dev.off()
This method groups our cells into 16 clusters instead of 21. Note that they start from 0 and go in the seurat_clusters
column of the metadata.
7.10 SCTransform normalization
The SCTransform()
function (in package Seurat
) provides an alternative to log-normalization, based on regularized negative binomial regression. It is designed to better capture the variation due to biology over spurious variation.
<- SCTransform(my_seurat, ncells = ncol(my_seurat), verbose = FALSE)
my_seurat my_seurat
## An object of class Seurat
## 45005 features across 5151 samples within 2 assays
## Active assay: SCT (21157 features, 3000 variable features)
## 3 layers present: counts, data, scale.data
## 1 other assay present: RNA
## 3 dimensional reductions calculated: pca, tsne, umap
We have created a new assay called “SCT” and made it the active (default) one. Now we can recompute the reduced dimensions based on SCTransform
.
<- RunPCA(my_seurat, verbose = FALSE)
my_seurat <- RunUMAP(my_seurat, dims = 1:20, verbose = FALSE) my_seurat
pdf("pdf/umap-by-cluster-sct-seurat.pdf")
DimPlot(my_seurat, cols = cluster_colors, group.by = "cluster_id", label.size = 4, repel = TRUE, label = TRUE)
dev.off()
pdf("pdf/umap-by-supercluster-sct-seurat.pdf")
DimPlot(my_seurat, cols = supercluster_colors, group.by = "supercluster", label.size = 4, repel = TRUE, label = TRUE) + NoLegend()
dev.off()
SCTransform
hold that after normalizing this way, we should use more principal components when running UMAP. Try it with 40 PCs (dimensions) instead of 20. What do you think?
Click for answer
Rerun UMAP with 40 PCs, then make the dimensionality reduction charts.
<- RunUMAP(my_seurat, dims = 1:40, verbose = FALSE) my_seurat
pdf("pdf/umap-by-cluster-sct-seurat-40-pcs.pdf")
DimPlot(my_seurat, cols = cluster_colors, group.by = "cluster_id", label.size = 4, repel = TRUE, label = TRUE)
dev.off()
pdf("pdf/umap-by-supercluster-sct-seurat-40-pcs.pdf")
DimPlot(my_seurat, cols = supercluster_colors, group.by = "supercluster", label.size = 4, repel = TRUE, label = TRUE) + NoLegend()
dev.off()
Note that the PCA and UMAP dimensionality reductions now live in the “SCT” assay instead of the original “RNA” assay, leaving only t-SNE in “RNA”. If you wanted to recreate our initial PCA or UMAP plots, you would have to change the active assay back to “RNA” and recompute them there (or reload my_seurat
from a previously saved state).
@reductions my_seurat
## $pca
## A dimensional reduction object with key PC_
## Number of dimensions: 50
## Number of cells: 5151
## Projected dimensional reduction calculated: FALSE
## Jackstraw run: FALSE
## Computed using assay: SCT
##
## $tsne
## A dimensional reduction object with key tSNE_
## Number of dimensions: 2
## Number of cells: 5151
## Projected dimensional reduction calculated: FALSE
## Jackstraw run: FALSE
## Computed using assay: RNA
##
## $umap
## A dimensional reduction object with key UMAP_
## Number of dimensions: 2
## Number of cells: 5151
## Projected dimensional reduction calculated: FALSE
## Jackstraw run: FALSE
## Computed using assay: SCT
7.11 Conversion between Seurat and SingleCellExperiment
After trying both SingleCellExperiment
and Seurat
, you may develop a preference for one of them and end up using it in your work. In case a collaborator shares a dataset in the other format, you need to be able to convert it to yours. You may also want to use functionality that exists in only one format.
The as.SingleCellExperiment()
function (from package Seurat
) provides a quick way to convert an existing Seurat
object to SingleCellExperiment
. (We will of course need to reload the SingleCellExperiment
package.)
library(SingleCellExperiment)
<- as.SingleCellExperiment(fake_seurat)
fake_SCE_2 counts(fake_SCE_2)
## 12 x 8 sparse Matrix of class "dgCMatrix"
## Cell-1 Cell-2 Cell-3 Cell-4 Cell-5 Cell-6 Cell-7 Cell-8
## Gene-1 . . 2 . . . 1 2
## Gene-2 1 . 1 . 13 2 1 .
## Gene-3 . 2 2 . . . 2 .
## Gene-4 . . . 3 3 . 13 .
## Gene-5 . . . 1 2 . 2 .
## Gene-6 2 1 2 1 . . . 1
## Gene-7 . 2 1 . 1 3 1 5
## Gene-8 . . 6 2 . 1 2 3
## Gene-9 2 9 . 3 2 . 2 4
## Gene-10 1 1 . . 1 . . 2
## Gene-11 2 9 3 8 . 11 . 1
## Gene-12 4 . 1 2 16 . . 3
logcounts(fake_SCE_2)
## 12 x 8 sparse Matrix of class "dgCMatrix"
## Cell-1 Cell-2 Cell-3 Cell-4 Cell-5 Cell-6 Cell-7 Cell-8
## Gene-1 . . 7.014015 . . . 6.034684 6.860015
## Gene-2 6.726633 . 6.321767 . 8.137996 7.071124 6.034684 .
## Gene-3 . 6.726633 7.014015 . . . 6.726633 .
## Gene-4 . . . 7.313887 6.672632 . 8.597420 .
## Gene-5 . . . 6.216606 6.267800 . 6.726633 .
## Gene-6 7.419181 6.034684 7.014015 6.216606 . . . 6.167916
## Gene-7 . 6.726633 6.321767 . 5.576547 7.476306 6.034684 7.775676
## Gene-8 . . 8.112028 6.908755 . 6.378826 6.726633 7.265130
## Gene-9 7.419181 8.229778 . 7.313887 6.267800 . 6.726633 7.552637
## Gene-10 6.726633 6.034684 . . 5.576547 . . 6.860015
## Gene-11 7.419181 8.229778 7.419181 8.294300 . 8.775177 . 6.167916
## Gene-12 8.112028 . 6.321767 6.908755 8.345580 . . 7.265130
colData(fake_SCE_2)
## DataFrame with 8 rows and 6 columns
## orig.ident nCount_RNA nFeature_RNA nCount_CPM nFeature_CPM ident
## <factor> <numeric> <integer> <numeric> <integer> <factor>
## Cell-1 Fake Seurat 12 6 12 6 Fake Seurat
## Cell-2 Fake Seurat 24 6 24 6 Fake Seurat
## Cell-3 Fake Seurat 18 8 18 8 Fake Seurat
## Cell-4 Fake Seurat 20 7 20 7 Fake Seurat
## Cell-5 Fake Seurat 38 7 38 7 Fake Seurat
## Cell-6 Fake Seurat 17 4 17 4 Fake Seurat
## Cell-7 Fake Seurat 24 8 24 8 Fake Seurat
## Cell-8 Fake Seurat 21 8 21 8 Fake Seurat
Similarly, the as.Seurat()
function (from package SeuratObject
) converts a SingleCellExperiment
object to Seurat
.
<- as.Seurat(fake_SCE_2)
fake_seurat_2 "RNA"]]@counts fake_seurat_2[[
## 12 x 8 sparse Matrix of class "dgCMatrix"
## Cell-1 Cell-2 Cell-3 Cell-4 Cell-5 Cell-6 Cell-7 Cell-8
## Gene-1 . . 2 . . . 1 2
## Gene-2 1 . 1 . 13 2 1 .
## Gene-3 . 2 2 . . . 2 .
## Gene-4 . . . 3 3 . 13 .
## Gene-5 . . . 1 2 . 2 .
## Gene-6 2 1 2 1 . . . 1
## Gene-7 . 2 1 . 1 3 1 5
## Gene-8 . . 6 2 . 1 2 3
## Gene-9 2 9 . 3 2 . 2 4
## Gene-10 1 1 . . 1 . . 2
## Gene-11 2 9 3 8 . 11 . 1
## Gene-12 4 . 1 2 16 . . 3
"RNA"]]@data fake_seurat_2[[
## 12 x 8 sparse Matrix of class "dgCMatrix"
## Cell-1 Cell-2 Cell-3 Cell-4 Cell-5 Cell-6 Cell-7 Cell-8
## Gene-1 . . 7.014015 . . . 6.034684 6.860015
## Gene-2 6.726633 . 6.321767 . 8.137996 7.071124 6.034684 .
## Gene-3 . 6.726633 7.014015 . . . 6.726633 .
## Gene-4 . . . 7.313887 6.672632 . 8.597420 .
## Gene-5 . . . 6.216606 6.267800 . 6.726633 .
## Gene-6 7.419181 6.034684 7.014015 6.216606 . . . 6.167916
## Gene-7 . 6.726633 6.321767 . 5.576547 7.476306 6.034684 7.775676
## Gene-8 . . 8.112028 6.908755 . 6.378826 6.726633 7.265130
## Gene-9 7.419181 8.229778 . 7.313887 6.267800 . 6.726633 7.552637
## Gene-10 6.726633 6.034684 . . 5.576547 . . 6.860015
## Gene-11 7.419181 8.229778 7.419181 8.294300 . 8.775177 . 6.167916
## Gene-12 8.112028 . 6.321767 6.908755 8.345580 . . 7.265130
@meta.data fake_seurat_2
## orig.ident nCount_RNA nFeature_RNA nCount_CPM nFeature_CPM ident
## Cell-1 Fake Seurat 12 6 12 6 Fake Seurat
## Cell-2 Fake Seurat 24 6 24 6 Fake Seurat
## Cell-3 Fake Seurat 18 8 18 8 Fake Seurat
## Cell-4 Fake Seurat 20 7 20 7 Fake Seurat
## Cell-5 Fake Seurat 38 7 38 7 Fake Seurat
## Cell-6 Fake Seurat 17 4 17 4 Fake Seurat
## Cell-7 Fake Seurat 24 8 24 8 Fake Seurat
## Cell-8 Fake Seurat 21 8 21 8 Fake Seurat
Be sure to examine the converted object to make sure that all of the data you expect actually exist.