CellDEEP reduces scRNA-seq sparsity by pooling cells into pseudocells before differential expression (DE) testing.
FindMarker.CellDEEP prepares the metadata internally, so you can pass your own column names directly. Key parameters to set:

- group_id, sample_id, cluster_id: names of the metadata columns in your Seurat object
- ident.1, ident.2: the two levels of group_id to compare
- cell_selection: how to select cells for pooling ("kmean" or "random")
- readcounts: how to aggregate counts within each pooled cell ("sum" or "mean")
- min_cells_per_subgroup: minimum number of cells required in each sample-cluster subgroup for pooling
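
To make the readcounts options concrete, here is a minimal base-R sketch of collapsing cells into pseudocells by summing or averaging their counts. The toy matrix, the pool assignment and all object names are hypothetical; the sketch only illustrates the arithmetic and the drop in sparsity, not how CellDEEP chooses which cells to pool.

```r
# Hypothetical toy counts: 3 genes x 6 cells (not data from this vignette)
counts <- matrix(
  c(0, 2, 1, 0, 3, 0,
    5, 0, 0, 1, 0, 2,
    0, 0, 4, 0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:3), paste0("Cell", 1:6))
)

# Hypothetical assignment of the six cells to two pseudocells of three cells
pool_id <- c("P1", "P1", "P1", "P2", "P2", "P2")

# readcounts = "sum": add up the counts of the member cells.
# rowsum() sums rows by group, so transpose to cells x genes and back.
pooled_sum <- t(rowsum(t(counts), group = pool_id))

# readcounts = "mean": divide each pseudocell column by its number of cells
pool_size <- as.vector(table(pool_id)[colnames(pooled_sum)])
pooled_mean <- sweep(pooled_sum, 2, pool_size, "/")

# Fraction of zero entries before and after pooling
mean(counts == 0)      # 10 of 18 entries are zero
mean(pooled_sum == 0)  # no zeros remain
```

Using sweep() rather than plain division keeps the result correct when pseudocells contain different numbers of cells.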
de.test <- FindMarker.CellDEEP(
sim,
group_id = "Status",
sample_id = "DonorID",
cluster_id = "cluster_id",
Pool = TRUE,
test.use = "wilcox",
n_cells = 3,
min_cells_per_subgroup = 1,
cell_selection = "random",
readcounts = "sum",
logfc.threshold = 0.25,
ident.1 = "Case",
ident.2 = "Control"
)
#> Start Pooling.....
#> Pooling...
#> Warning: Data is of class matrix. Coercing to dgCMatrix.
#> FindMarker running.....
#> 1st ident is:
#> Case
#> 2nd ident is:
#> Control
#> group by:
#> group_id
#> Normalizing layer: counts
#> Finding variable features for layer counts
#> Centering and scaling data matrix
#> For a (much!) faster implementation of the Wilcoxon Rank Sum Test,
#> (default method for FindMarkers) please install the presto package
#> --------------------------------------------
#> install.packages('devtools')
#> devtools::install_github('immunogenomics/presto')
#> --------------------------------------------
#> After installation of presto, Seurat will automatically use the more
#> efficient implementation (no further action necessary).
#> This message will be shown once per session
#> 20
#> Gene1728 Gene1992 Gene1626 Gene1864 Gene1715 Gene1807

To get pooled objects without running DE immediately, call CellDEEP.Kmean() or CellDEEP.Random() directly.
Their min_cells_per_subgroup argument is the minimum number of cells
required in each sample_id × cluster_id subgroup before
pooling is performed.
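
The subgroup rule can be sketched in base R as below. This is a hypothetical helper, not CellDEEP's code: it assumes n_cells is the number of cells per pseudocell, shuffles cells within each sample_id × cluster_id subgroup (mimicking cell_selection = "random"), skips subgroups smaller than min_cells_per_subgroup, and leaves unpooled any cells that cannot fill a complete pseudocell. CellDEEP may handle leftover cells differently.

```r
set.seed(1)

# Hypothetical per-cell metadata using the standardized field names:
# D1/A has 7 cells, D1/B has 2 cells, D2/A has 5 cells
meta <- data.frame(
  sample_id  = rep(c("D1", "D2"), times = c(9, 5)),
  cluster_id = c(rep("A", 7), rep("B", 2), rep("A", 5))
)

# Hypothetical helper (not part of CellDEEP): returns a pseudocell label per
# cell, or NA for cells that are not pooled
pool_random_sketch <- function(meta, n_cells, min_cells_per_subgroup) {
  subgroup <- paste(meta$sample_id, meta$cluster_id, sep = "_")
  sizes <- table(subgroup)
  # Skip sample x cluster subgroups that are too small to pool
  keep <- names(sizes)[sizes >= min_cells_per_subgroup]
  pool_id <- rep(NA_character_, nrow(meta))
  for (g in keep) {
    idx <- which(subgroup == g)
    idx <- idx[sample.int(length(idx))]      # random order within the subgroup
    n_pools <- length(idx) %/% n_cells       # number of full pseudocells
    if (n_pools == 0) next
    used <- idx[seq_len(n_pools * n_cells)]  # leftover cells stay NA here
    pool_id[used] <- paste0(g, "_P", rep(seq_len(n_pools), each = n_cells))
  }
  pool_id
}

pool_id <- pool_random_sketch(meta, n_cells = 3, min_cells_per_subgroup = 3)
table(pool_id, useNA = "ifany")
```

With these settings subgroup D1_B (2 cells) is skipped, three pseudocells of 3 cells are formed, and 5 of the 14 cells end up unpooled (NA).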
The pooling functions expect standardized metadata fields
(sample_id, group_id,
cluster_id), so run prepare_data() once beforehand to map your own column names onto them:
pool_input <- prepare_data(
sim,
sample_id = "DonorID",
group_id = "Status",
cluster_id = "cluster_id"
)
pooled_kmean <- CellDEEP.Kmean(
pool_input,
readcounts = "sum",
n_cells = 3,
min_cells_per_subgroup = 1,
assay_name = "RNA"
)
#> Pooling...
#> Warning: Data is of class matrix. Coercing to dgCMatrix.
#> Drop out cell number during kmean pooling is:
#> 24
pooled_kmean
#> An object of class Seurat
#> 2000 features across 56 samples within 1 assay
#> Active assay: RNA (2000 features, 0 variable features)
#> 1 layer present: counts
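
In the Seurat summary above, each "sample" is one pseudocell. For cell_selection = "kmean", a plausible reading is that similar cells are pooled together. The sketch below illustrates that idea with stats::kmeans() on a hypothetical 2-D embedding of a single subgroup, choosing k so that clusters hold about n_cells cells on average and summing counts per cluster. The embedding, the data and the choice of k are assumptions; CellDEEP's actual k-means procedure, including which cells it drops, may differ.

```r
set.seed(42)

# Hypothetical 2-D embedding of 12 cells from one sample x cluster subgroup:
# two tight, well-separated clouds of 6 cells each
emb <- rbind(
  matrix(rnorm(12, mean = 0, sd = 0.1), ncol = 2),
  matrix(rnorm(12, mean = 5, sd = 0.1), ncol = 2)
)
cloud <- rep(1:2, each = 6)

# Hypothetical counts for the same 12 cells (5 genes)
counts <- matrix(rpois(5 * 12, lambda = 1), nrow = 5,
                 dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:12)))

n_cells <- 3
k <- nrow(emb) %/% n_cells   # aim for clusters of about n_cells cells
km <- kmeans(emb, centers = k, nstart = 10)

# Sum counts within each k-means cluster to form one pseudocell per cluster
pooled <- t(rowsum(t(counts), group = km$cluster))
dim(pooled)
```

Unlike random pooling, k-means does not force equal cluster sizes, so a pseudocell here can contain more or fewer than n_cells cells; in exchange, each pseudocell only mixes cells that are close in the embedding.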
pooled_random <- CellDEEP.Random(
pool_input,
readcounts = "sum",
n_cells = 5,
min_cells_per_subgroup = 1,
assay_name = "RNA"
)
#> Pooling...
#> Warning: Data is of class matrix. Coercing to dgCMatrix.
pooled_random
#> An object of class Seurat
#> 2000 features across 32 samples within 1 assay
#> Active assay: RNA (2000 features, 0 variable features)
#> 1 layer present: counts

If no genes pass the adjusted p-value filter in this small example
dataset, try a larger dataset or set full_list = TRUE.
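
To see why few genes may survive the filter, the sketch below adjusts p-values on a hypothetical DE table. It assumes Seurat-style columns (p_val, avg_log2FC, p_val_adj) and a 0.05 cutoff. Seurat's FindMarkers documentation describes p_val_adj as a Bonferroni correction using all genes in the dataset; whether FindMarker.CellDEEP adjusts and filters the same way is an assumption here.

```r
# Hypothetical DE table with Seurat-style columns (values are made up)
de <- data.frame(
  p_val      = c(1e-6, 0.001, 0.02, 0.3, 0.8),
  avg_log2FC = c(1.2, -0.9, 0.6, 0.3, -0.1),
  row.names  = paste0("Gene", 1:5)
)

# Bonferroni adjustment: p_val times the number of genes tested, capped at 1
de$p_val_adj <- p.adjust(de$p_val, method = "bonferroni")

# Genes passing an assumed 0.05 adjusted p-value cutoff
sig <- de[de$p_val_adj < 0.05, ]
nrow(sig)
```

Because Bonferroni multiplies each p-value by the number of genes, Gene3 (p_val = 0.02) already fails with only five genes tested; with 2000 features and few pseudocells, the filter can easily remove every gene.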