Error in SketchData() due to variable features #9487

calafell · 2024-11-15T14:56:14Z

Hi, I'm facing issues with the SketchData() function.
The pipeline I am following starts from a large dataset and uses the integration approach described in the "Sketch integration" vignette (https://satijalab.org/seurat/articles/parsebio_sketch_integration). I successfully applied this pipeline to an object containing all cell types, split by another variable ("study", nearly 30 datasets), and it worked perfectly. However, issues arise when I attempt the same approach on a subset of specific cell types (reclustering), splitting by "orig.ident" (one layer per sample, almost 400 samples). After subsetting the initial object, I tried re-running the integration steps by sample. I am encountering two main problems:

1. Running SketchData() fails with the error "No variable features found" raised from VariableFeatures(), even though I executed FindVariableFeatures() and the information is stored in seurat_object@assays[["RNA"]]@meta.data. Could this issue be related to object size (splitting across 400+ samples) or storage limitations? A smaller subset of samples did not produce this error.
2. My modified approach splits the object before performing the integration steps, which differs from the order described in the vignette. This question was also raised on GitHub ("join layers or not (contradicting vignettes) standard workflow and sketch", #9013). In the standard integration vignette, NormalizeData() is applied after splitting the object, while in the sketch integration vignette it is applied before. Could you clarify the correct order for data integration from different sources? My current understanding is that normalization should ideally be performed after splitting the object. Is this correct? Could this change in the pipeline impact results when splitting by additional variables?

Furthermore, the user in the GitHub thread mentioned the use of JoinLayers() before performing FindNeighbors(), FindClusters(), and RunUMAP(), as described in the integration vignette. However, this step is not included in either the sketch data pipeline or the Seurat v5 integration vignette (https://satijalab.org/seurat/articles/seurat5_integration). Could you please clarify whether re-joining layers is necessary before these downstream analyses?

Thank you very much for your support and guidance on these questions. I truly appreciate it.

## First part that doesn't give me problems
seurat_object[["RNA"]] <- split(seurat_object[["RNA"]], f = seurat_object$orig.ident)

seurat_object <- NormalizeData(seurat_object)

seurat_object <- FindVariableFeatures(seurat_object, verbose = TRUE, nfeatures = 5000)
gc()

## Part in which the error appears
seurat_object <- SketchData(object = seurat_object, ncells = 300,
                            method = "LeverageScore", sketched.assay = "sketch",
                            verbose = TRUE)

[1] "Starting Sketch"
Calcuating Leverage Score
Error in `VariableFeatures()`:
! No variable features found
Backtrace:
     x
  1. \-Seurat::SketchData(...)
  3.   +-Seurat::LeverageScore(...)
  4.   \-Seurat:::LeverageScore.Seurat(...)
  5.     +-Seurat::LeverageScore(...)
  6.     \-Seurat:::LeverageScore.StdAssay(...)
  7.       +-Seurat::LeverageScore(...)
  8.       +-SeuratObject::LayerData(...)
  9.       +-SeuratObject:::LayerData.Assay5(...)
  10.       | \-features %||% dnames[[1L]]
 11.       +-SeuratObject::VariableFeatures(...)
 12.       \-SeuratObject:::VariableFeatures.Assay5(object = object, method = vf.method, layer = l)
 13.         \-rlang::abort(message = msg)

Warning message:
In CheckMetaVarName(object = object, var.name = var.name) :
  leverage.score is already existed in the meta.data. leverage.score.1 will store leverage score value
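
One thing that may be worth checking, since the backtrace ends in VariableFeatures.Assay5() being called for a single layer: after subsetting to specific cell types, some of the samples may contain very few cells. The sketch below is a rough diagnostic and not a confirmed cause. It uses made-up sample labels (`orig_ident`) standing in for seurat_object$orig.ident, and `layers_without_vf()` is a hypothetical helper name that only wraps Layers() and VariableFeatures() from SeuratObject.

```r
# Hypothetical per-cell sample labels; with the real object use seurat_object$orig.ident.
orig_ident <- c(rep("s1", 500), rep("s2", 12), rep("s3", 350), rep("s4", 3))
ncells_requested <- 300  # same value as ncells in the SketchData() call

# Cells per sample, and the samples that hold fewer cells than requested.
cells_per_sample <- table(orig_ident)
small_samples <- names(cells_per_sample)[cells_per_sample < ncells_requested]
print(small_samples)

# Hypothetical helper (needs SeuratObject installed when it is called): returns
# the "data" layers for which VariableFeatures() errors or comes back empty.
layers_without_vf <- function(assay) {
  layers <- SeuratObject::Layers(object = assay, search = "data")
  has_vf <- vapply(layers, function(l) {
    vf <- tryCatch(
      SeuratObject::VariableFeatures(object = assay, layer = l),
      error = function(e) character(0)
    )
    length(vf) > 0
  }, logical(1))
  layers[!has_vf]
}

# Usage on the real object (not run here):
# layers_without_vf(seurat_object[["RNA"]])
```

If the helper returns any layers, comparing them with `small_samples` would show whether the failing layers are the samples with few cells.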

# R session info
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Madrid
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BPCells_0.2.0         Seurat.utils_2.8.0    magrittr_2.0.3       
 [4] ggExpress_0.9.0       MarkdownHelpers_1.0.7 ggpubr_0.6.0         
 [7] CodeAndRoll2_2.6.0    Stringendo_0.6.0      lubridate_1.9.3      
[10] forcats_1.0.0         stringr_1.5.1         dplyr_1.1.4          
[13] purrr_1.0.2           readr_2.1.5           tidyr_1.3.1          
[16] tibble_3.2.1          ggplot2_3.5.1         tidyverse_2.0.0      
[19] data.table_1.16.2     scCustomize_2.1.2     Seurat_5.0.0         
[22] SeuratObject_5.0.2    sp_2.1-4             

loaded via a namespace (and not attached):
  [1] IRanges_2.36.0           R.methodsS3_1.8.2        vroom_1.6.5             
  [4] goftest_1.2-3            Biostrings_2.68.1        vctrs_0.6.5             
  [7] spatstat.random_3.3-2    RApiSerialize_0.1.3      digest_0.6.37           
 [10] png_0.1-8                shape_1.4.6.1            ggrepel_0.9.6           
 [13] deldir_2.0-4             parallelly_1.39.0        MASS_7.3-60.0.1         
 [16] tictoc_1.2.1             reshape2_1.4.4           httpuv_1.6.15           
 [19] foreach_1.5.2            BiocGenerics_0.48.1      qvalue_2.34.0           
 [22] withr_3.0.2              ggrastr_1.0.2            ggfun_0.1.5             
 [25] survival_3.6-4           memoise_2.0.1            ggbeeswarm_0.7.2        
 [28] clusterProfiler_4.10.1   janitor_2.2.0            gson_0.1.0              
 [31] princurve_2.1.6          tidytree_0.4.6           zoo_1.8-12              
 [34] GlobalOptions_0.1.2      gtools_3.9.5             pbapply_1.7-2           
 [37] R.oo_1.26.0              rematch2_2.1.2           KEGGREST_1.40.1         
 [40] promises_1.3.0           httr_1.4.7               rstatix_0.7.2           
 [43] globals_0.16.3           fitdistrplus_1.2-1       stringfish_0.16.0       
 [46] rstudioapi_0.16.0        ggVennDiagram_1.5.2      miniUI_0.1.1.1          
 [49] generics_0.1.3           DOSE_3.28.2              S4Vectors_0.40.2        
 [52] zlibbioc_1.46.0          ggraph_2.2.1             polyclip_1.10-7         
 [55] GenomeInfoDbData_1.2.10  xtable_1.8-4             GenomicRanges_1.52.1    
 [58] hms_1.1.3                irlba_2.3.5.1            qs_0.26.3               
 [61] colorspace_2.1-1         ROCR_1.0-11              VennDiagram_1.7.3       
 [64] reticulate_1.39.0        spatstat.data_3.1-2      lmtest_0.9-40           
 [67] snakecase_0.11.1         later_1.3.2              viridis_0.6.5           
 [70] ggtree_3.10.1            lattice_0.22-6           spatstat.geom_3.3-3     
 [73] future.apply_1.11.3      scattermore_1.2          shadowtext_0.1.4        
 [76] cowplot_1.1.3            matrixStats_1.4.1        DatabaseLinke.R_1.7.0   
 [79] RcppAnnoy_0.0.22         pillar_1.9.0             nlme_3.1-164            
 [82] iterators_1.0.14         caTools_1.18.3           compiler_4.3.3          
 [85] RSpectra_0.16-2          stringi_1.8.4            tensor_1.5              
 [88] plyr_1.8.9               crayon_1.5.3             abind_1.4-8             
 [91] gridGraphics_0.5-1       sm_2.2-6.0               SoupX_1.6.2             
 [94] graphlayouts_1.1.1       bit_4.0.5                fastmatch_1.1-4         
 [97] codetools_0.2-20         paletteer_1.6.0          plotly_4.10.4           
[100] mime_0.12                splines_4.3.3            circlize_0.4.16         
[103] Rcpp_1.0.13-1            fastDummies_1.7.4        sparseMatrixStats_1.14.0
[106] HDO.db_0.99.1            blob_1.2.4               utf8_1.2.4              
[109] fs_1.6.5                 listenv_0.9.1            checkmate_2.3.1         
[112] job_0.3.1                HGNChelper_0.8.14        openxlsx_4.2.6.1        
[115] ggsignif_0.6.4           ggplotify_0.1.2          Matrix_1.6-5            
[118] tzdb_0.4.0               tweenr_2.0.3             pkgconfig_2.0.3         
[121] pheatmap_1.0.12          tools_4.3.3              cachem_1.1.0            
[124] RSQLite_2.3.7            viridisLite_0.4.2        DBI_1.2.2               
[127] splitstackshape_1.4.8    fastmap_1.2.0            scales_1.3.0            
[130] grid_4.3.3               ica_1.0-3                broom_1.0.5             
[133] patchwork_1.3.0          ggprism_1.0.5            dotCall64_1.2           
[136] carData_3.0-5            RANN_2.6.2               farver_2.1.2            
[139] tidygraph_1.3.1          scatterpie_0.2.3         MatrixGenerics_1.12.3   
[142] cli_3.6.3                stats4_4.3.3             leiden_0.4.3.1          
[145] lifecycle_1.0.4          uwot_0.2.2               Biobase_2.62.0          
[148] lambda.r_1.2.4           sessioninfo_1.2.2        backports_1.4.1         
[151] ggcorrplot_0.1.4.1       BiocParallel_1.36.0      timechange_0.3.0        
[154] gtable_0.3.6             ggridges_0.5.6           progressr_0.15.0        
[157] parallel_4.3.3           ape_5.8                  jsonlite_1.8.9          
[160] RcppHNSW_0.6.0           bitops_1.0-9             bit64_4.0.5             
[163] Rtsne_0.17               yulab.utils_0.1.6        spatstat.utils_3.1-1    
[166] zip_2.3.1                RcppParallel_5.1.8       MarkdownReports_4.7.1   
[169] futile.options_1.0.1     GOSemSim_2.28.1          spatstat.univar_3.1-1   
[172] R.utils_2.12.3           lazyeval_0.2.2           shiny_1.9.1             
[175] htmltools_0.5.8.1        enrichplot_1.22.0        GO.db_3.17.0            
[178] sctransform_0.4.1        rappdirs_0.3.3           formatR_1.14            
[181] glue_1.8.0               spam_2.11-0              ReadWriter_1.5.4        
[184] httr2_1.0.1              XVector_0.40.0           RCurl_1.98-1.14         
[187] treeio_1.26.0            futile.logger_1.4.3      gridExtra_2.3           
[190] igraph_2.1.1             R6_2.5.1                 gplots_3.2.0            
[193] cluster_2.1.6            clipr_0.8.0              aplot_0.2.3             
[196] GenomeInfoDb_1.36.4      vioplot_0.5.0            tidyselect_1.2.1        
[199] vipor_0.4.7              ggforce_0.4.2            car_3.1-2               
[202] AnnotationDbi_1.62.2     future_1.34.0            munsell_0.5.1           
[205] KernSmooth_2.23-22       EnhancedVolcano_1.20.0   htmlwidgets_1.6.4       
[208] fgsea_1.28.0             RColorBrewer_1.1-3       rlang_1.1.4             
[211] spatstat.sparse_3.1-0    spatstat.explore_3.3-3   colorRamps_2.3.4        
[214] fansi_1.0.6              beeswarm_0.4.0

adyzh · 2024-12-06T21:24:47Z

Hi @calafell ,
Thank you for reaching out! We’re always grateful when folks take the time to help make Seurat better 🙂.

Unfortunately, with the details provided, we cannot reproduce your issue. Please provide a minimal reproducible example that demonstrates the issue using one of the datasets available through SeuratData.
For NormalizeData(), it depends on what method of normalization you are applying. I believe CLR requires normalizing before splitting.
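
To illustrate why the order can matter for some methods but not others, here is a small base-R sketch on made-up counts. It does not call Seurat: `log_normalize()` and `clr_by_feature()` are simplified re-implementations written for this example, assuming CLR is computed across cells for each feature (which, as far as I understand the documentation, corresponds to margin = 1 in NormalizeData()). The sample labels are hypothetical.

```r
# Toy counts: 3 features (rows) x 4 cells (columns); values are made up.
counts <- matrix(
  c(1, 0, 3,
    2, 5, 0,
    0, 4, 4,
    6, 1, 2),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)
sample_id <- c("A", "A", "B", "B")  # hypothetical per-cell sample labels

# Per-cell log-normalization: each column is scaled by its own total only.
log_normalize <- function(m, scale_factor = 1e4) {
  log1p(sweep(m, 2, colSums(m), "/") * scale_factor)
}

# Centered log-ratio of one vector (simplified re-implementation).
clr <- function(x) {
  log1p(x / exp(sum(log1p(x[x > 0])) / length(x)))
}

# CLR per feature across cells: every row uses all cells present in the matrix.
clr_by_feature <- function(m) {
  out <- m
  for (i in seq_len(nrow(m))) out[i, ] <- clr(m[i, ])
  out
}

# Normalize each sample separately, then restore the original column order.
normalize_split <- function(m, groups, f) {
  parts <- lapply(split(seq_len(ncol(m)), groups),
                  function(idx) f(m[, idx, drop = FALSE]))
  out <- do.call(cbind, parts)
  out[, colnames(m), drop = FALSE]
}

# Does normalizing after splitting give the same values as normalizing before?
log_same <- isTRUE(all.equal(log_normalize(counts),
                             normalize_split(counts, sample_id, log_normalize)))
clr_same <- isTRUE(all.equal(clr_by_feature(counts),
                             normalize_split(counts, sample_id, clr_by_feature)))
print(c(log_same = log_same, clr_same = clr_same))
```

In this toy case the per-cell log-normalization is unaffected by splitting, while the per-feature CLR values change because each feature is centered against a different set of cells.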

calafell added the "bug" (Something isn't working) label on Nov 15, 2024

adyzh closed this as completed Dec 6, 2024
