Hi, I'm facing issues with the SketchData() function.
The pipeline I am following starts from a large dataset and uses the integration approach described in the "Sketch integration" vignette (https://satijalab.org/seurat/articles/parsebio_sketch_integration). I successfully applied this pipeline to an object containing all cell types, and it worked perfectly when splitting by another variable ("study", about 30 datasets). However, issues arise when I attempt the same approach on a subset of specific cell types (reclustering), splitting by "orig.ident" (one value per sample, almost 400 samples). After subsetting the initial object, I tried re-running the integration steps by sample. I am primarily encountering two problems:
Running SketchData() results in an error stating that VariableFeatures() could not be found, even though I ran FindVariableFeatures() and the information is stored in seurat_object@assays[["RNA"]]@meta.data. Could this issue be related to object size (splitting across 400+ samples) or storage limitations? A smaller subset of samples did not produce this error.
My modified approach splits the object before performing the integration steps, which differs from the order described in the sketch vignette. This discrepancy was also raised on GitHub (join layers or not ( contradicting vignettes) standard workflow and sketch #9013). In the standard integration vignette, NormalizeData() is applied after splitting the object, while in the sketch integration vignette it is applied before. Could you clarify the correct order when integrating data from different sources? My current understanding is that normalization should ideally be performed after splitting the object; is this correct? Could this change in the pipeline affect the results when splitting by additional variables?
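For reference, the order I am describing can be sketched roughly as below. This is only an illustration, not a confirmed-correct recipe: the function calls follow the vignettes, while the object name `seurat_subset`, the value `ncells = 5000`, and the assay name `"sketch"` are placeholder assumptions.

```r
library(Seurat)

# seurat_subset: hypothetical name for the object after subsetting the
# cell types of interest. Assumes the RNA assay layers are currently joined.

# Split the RNA assay into one layer per sample (almost 400 layers)
seurat_subset[["RNA"]] <- split(seurat_subset[["RNA"]], f = seurat_subset$orig.ident)

# Normalize and find variable features AFTER splitting
# (the order used in the standard integration vignette)
seurat_subset <- NormalizeData(seurat_subset)
seurat_subset <- FindVariableFeatures(seurat_subset, verbose = FALSE)

# The step that raises the VariableFeatures() error
seurat_subset <- SketchData(
  object = seurat_subset,
  ncells = 5000,               # placeholder: cells sampled per layer
  method = "LeverageScore",
  sketched.assay = "sketch"    # placeholder assay name
)
```

In the sketch integration vignette, by contrast, NormalizeData() is called on the whole object before the split() step.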
Furthermore, the user in the GitHub thread mentioned the use of JoinLayers() before performing FindNeighbors(), FindClusters(), and RunUMAP(), as described in the integration vignette. However, this step is not included in either the sketch data pipeline or the Seurat v5 integration vignette (https://satijalab.org/seurat/articles/seurat5_integration). Could you please clarify whether re-joining layers is necessary before these downstream analyses?
Thank you very much for your support and guidance on these questions. I truly appreciate it.