Error in SketchData() due to variable features #9487

Open
calafell opened this issue Nov 15, 2024 · 0 comments
Labels
bug Something isn't working

@calafell

Hi, I'm facing issues with the SketchData() function.
The pipeline I am following starts from a large dataset and uses the integration approach described in the "Sketch integration" vignette (https://satijalab.org/seurat/articles/parsebio_sketch_integration). I successfully applied this pipeline to an object containing all cell types, split by another variable ("study", roughly 30 datasets), and it worked perfectly. However, problems arise when I try the same approach on a subset of specific cell types (reclustering), splitting by "orig.ident" (one level per sample, almost 400 samples). After subsetting the initial object, I re-ran the integration steps by sample. I am mainly encountering two problems:

  1. Running SketchData() fails with an error from VariableFeatures() saying that no variable features were found, even though I ran FindVariableFeatures() and the results are stored in seurat_object@assays[["RNA"]]@meta.data. Could this be related to object size (splitting across 400+ samples) or storage limitations? A smaller subset of samples did not produce this error.
  2. My modified approach splits the object before performing the integration steps, which differs from the order described in the vignette. This was also raised in another GitHub issue ("join layers or not (contradicting vignettes) standard workflow and sketch", #9013). In the standard integration vignette, NormalizeData() is applied after splitting the object, while in the sketch integration vignette it is applied before. Could you clarify the correct order when integrating data from different sources? My current understanding is that normalization should ideally be performed after splitting the object. Is this correct? Could this change in the pipeline affect the results when splitting by additional variables?

Furthermore, the user in the GitHub thread mentioned the use of JoinLayers() before performing FindNeighbors(), FindClusters(), and RunUMAP(), as described in the integration vignette. However, this step is not included in either the sketch data pipeline or the Seurat v5 integration vignette (https://satijalab.org/seurat/articles/seurat5_integration). Could you please clarify whether re-joining layers is necessary before these downstream analyses?

Thank you very much for your support and guidance on these questions. I truly appreciate it.

## First part that doesn't give me problems
seurat_object[["RNA"]] <- split(seurat_object[["RNA"]], f = seurat_object$orig.ident)

seurat_object <- NormalizeData(seurat_object)

seurat_object <- FindVariableFeatures(seurat_object, nfeatures = 5000, verbose = TRUE)
gc()

## Part in which the error appears
seurat_object <- SketchData(object = seurat_object, ncells = 300,
                            method = "LeverageScore", sketched.assay = "sketch",
                            verbose = TRUE)

[1] "Starting Sketch"
Calcuating Leverage Score
Error in `VariableFeatures()`:
! No variable features found
Backtrace:
     x
  1. \-Seurat::SketchData(...)
  3.   +-Seurat::LeverageScore(...)
  4.   \-Seurat:::LeverageScore.Seurat(...)
  5.     +-Seurat::LeverageScore(...)
  6.     \-Seurat:::LeverageScore.StdAssay(...)
  7.       +-Seurat::LeverageScore(...)
  8.       +-SeuratObject::LayerData(...)
  9.       +-SeuratObject:::LayerData.Assay5(...)
 10.       | \-features %||% dnames[[1L]]
 11.       +-SeuratObject::VariableFeatures(...)
 12.       \-SeuratObject:::VariableFeatures.Assay5(object = object, method = vf.method, layer = l)
 13.         \-rlang::abort(message = msg)

Warning message:
In CheckMetaVarName(object = object, var.name = var.name) :
  leverage.score is already existed in the meta.data. leverage.score.1 will store leverage score value
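
One check that might help narrow down question 1 is to count how many cells each sample contributes before sketching. The following is a minimal base-R sketch and not part of Seurat: `samples_below()` is a hypothetical helper, the toy labels stand in for `seurat_object$orig.ident`, and the idea that samples with fewer cells than `ncells = 300` are connected to the error is an untested assumption.

```r
# Hypothetical helper (not a Seurat function): given one group label per cell,
# return the names of the groups that contain fewer than `min_cells` cells.
samples_below <- function(groups, min_cells) {
  counts <- table(groups)
  names(counts)[as.vector(counts) < min_cells]
}

# Toy labels standing in for seurat_object$orig.ident (assumption)
toy_ident <- c(rep("s1", 5), rep("s2", 2), rep("s3", 400))

# Threshold chosen to match ncells = 300 in the SketchData() call
small <- samples_below(toy_ident, min_cells = 300)
print(small)  # s1 and s2 fall below the threshold
```

On the real object the call would presumably be `samples_below(seurat_object$orig.ident, min_cells = 300)`. If the flagged samples are the ones missing from the smaller subset that worked, that would point towards small layers rather than overall object size.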

# R session info
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Madrid
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BPCells_0.2.0         Seurat.utils_2.8.0    magrittr_2.0.3       
 [4] ggExpress_0.9.0       MarkdownHelpers_1.0.7 ggpubr_0.6.0         
 [7] CodeAndRoll2_2.6.0    Stringendo_0.6.0      lubridate_1.9.3      
[10] forcats_1.0.0         stringr_1.5.1         dplyr_1.1.4          
[13] purrr_1.0.2           readr_2.1.5           tidyr_1.3.1          
[16] tibble_3.2.1          ggplot2_3.5.1         tidyverse_2.0.0      
[19] data.table_1.16.2     scCustomize_2.1.2     Seurat_5.0.0         
[22] SeuratObject_5.0.2    sp_2.1-4             

loaded via a namespace (and not attached):
  [1] IRanges_2.36.0           R.methodsS3_1.8.2        vroom_1.6.5             
  [4] goftest_1.2-3            Biostrings_2.68.1        vctrs_0.6.5             
  [7] spatstat.random_3.3-2    RApiSerialize_0.1.3      digest_0.6.37           
 [10] png_0.1-8                shape_1.4.6.1            ggrepel_0.9.6           
 [13] deldir_2.0-4             parallelly_1.39.0        MASS_7.3-60.0.1         
 [16] tictoc_1.2.1             reshape2_1.4.4           httpuv_1.6.15           
 [19] foreach_1.5.2            BiocGenerics_0.48.1      qvalue_2.34.0           
 [22] withr_3.0.2              ggrastr_1.0.2            ggfun_0.1.5             
 [25] survival_3.6-4           memoise_2.0.1            ggbeeswarm_0.7.2        
 [28] clusterProfiler_4.10.1   janitor_2.2.0            gson_0.1.0              
 [31] princurve_2.1.6          tidytree_0.4.6           zoo_1.8-12              
 [34] GlobalOptions_0.1.2      gtools_3.9.5             pbapply_1.7-2           
 [37] R.oo_1.26.0              rematch2_2.1.2           KEGGREST_1.40.1         
 [40] promises_1.3.0           httr_1.4.7               rstatix_0.7.2           
 [43] globals_0.16.3           fitdistrplus_1.2-1       stringfish_0.16.0       
 [46] rstudioapi_0.16.0        ggVennDiagram_1.5.2      miniUI_0.1.1.1          
 [49] generics_0.1.3           DOSE_3.28.2              S4Vectors_0.40.2        
 [52] zlibbioc_1.46.0          ggraph_2.2.1             polyclip_1.10-7         
 [55] GenomeInfoDbData_1.2.10  xtable_1.8-4             GenomicRanges_1.52.1    
 [58] hms_1.1.3                irlba_2.3.5.1            qs_0.26.3               
 [61] colorspace_2.1-1         ROCR_1.0-11              VennDiagram_1.7.3       
 [64] reticulate_1.39.0        spatstat.data_3.1-2      lmtest_0.9-40           
 [67] snakecase_0.11.1         later_1.3.2              viridis_0.6.5           
 [70] ggtree_3.10.1            lattice_0.22-6           spatstat.geom_3.3-3     
 [73] future.apply_1.11.3      scattermore_1.2          shadowtext_0.1.4        
 [76] cowplot_1.1.3            matrixStats_1.4.1        DatabaseLinke.R_1.7.0   
 [79] RcppAnnoy_0.0.22         pillar_1.9.0             nlme_3.1-164            
 [82] iterators_1.0.14         caTools_1.18.3           compiler_4.3.3          
 [85] RSpectra_0.16-2          stringi_1.8.4            tensor_1.5              
 [88] plyr_1.8.9               crayon_1.5.3             abind_1.4-8             
 [91] gridGraphics_0.5-1       sm_2.2-6.0               SoupX_1.6.2             
 [94] graphlayouts_1.1.1       bit_4.0.5                fastmatch_1.1-4         
 [97] codetools_0.2-20         paletteer_1.6.0          plotly_4.10.4           
[100] mime_0.12                splines_4.3.3            circlize_0.4.16         
[103] Rcpp_1.0.13-1            fastDummies_1.7.4        sparseMatrixStats_1.14.0
[106] HDO.db_0.99.1            blob_1.2.4               utf8_1.2.4              
[109] fs_1.6.5                 listenv_0.9.1            checkmate_2.3.1         
[112] job_0.3.1                HGNChelper_0.8.14        openxlsx_4.2.6.1        
[115] ggsignif_0.6.4           ggplotify_0.1.2          Matrix_1.6-5            
[118] tzdb_0.4.0               tweenr_2.0.3             pkgconfig_2.0.3         
[121] pheatmap_1.0.12          tools_4.3.3              cachem_1.1.0            
[124] RSQLite_2.3.7            viridisLite_0.4.2        DBI_1.2.2               
[127] splitstackshape_1.4.8    fastmap_1.2.0            scales_1.3.0            
[130] grid_4.3.3               ica_1.0-3                broom_1.0.5             
[133] patchwork_1.3.0          ggprism_1.0.5            dotCall64_1.2           
[136] carData_3.0-5            RANN_2.6.2               farver_2.1.2            
[139] tidygraph_1.3.1          scatterpie_0.2.3         MatrixGenerics_1.12.3   
[142] cli_3.6.3                stats4_4.3.3             leiden_0.4.3.1          
[145] lifecycle_1.0.4          uwot_0.2.2               Biobase_2.62.0          
[148] lambda.r_1.2.4           sessioninfo_1.2.2        backports_1.4.1         
[151] ggcorrplot_0.1.4.1       BiocParallel_1.36.0      timechange_0.3.0        
[154] gtable_0.3.6             ggridges_0.5.6           progressr_0.15.0        
[157] parallel_4.3.3           ape_5.8                  jsonlite_1.8.9          
[160] RcppHNSW_0.6.0           bitops_1.0-9             bit64_4.0.5             
[163] Rtsne_0.17               yulab.utils_0.1.6        spatstat.utils_3.1-1    
[166] zip_2.3.1                RcppParallel_5.1.8       MarkdownReports_4.7.1   
[169] futile.options_1.0.1     GOSemSim_2.28.1          spatstat.univar_3.1-1   
[172] R.utils_2.12.3           lazyeval_0.2.2           shiny_1.9.1             
[175] htmltools_0.5.8.1        enrichplot_1.22.0        GO.db_3.17.0            
[178] sctransform_0.4.1        rappdirs_0.3.3           formatR_1.14            
[181] glue_1.8.0               spam_2.11-0              ReadWriter_1.5.4        
[184] httr2_1.0.1              XVector_0.40.0           RCurl_1.98-1.14         
[187] treeio_1.26.0            futile.logger_1.4.3      gridExtra_2.3           
[190] igraph_2.1.1             R6_2.5.1                 gplots_3.2.0            
[193] cluster_2.1.6            clipr_0.8.0              aplot_0.2.3             
[196] GenomeInfoDb_1.36.4      vioplot_0.5.0            tidyselect_1.2.1        
[199] vipor_0.4.7              ggforce_0.4.2            car_3.1-2               
[202] AnnotationDbi_1.62.2     future_1.34.0            munsell_0.5.1           
[205] KernSmooth_2.23-22       EnhancedVolcano_1.20.0   htmlwidgets_1.6.4       
[208] fgsea_1.28.0             RColorBrewer_1.1-3       rlang_1.1.4             
[211] spatstat.sparse_3.1-0    spatstat.explore_3.3-3   colorRamps_2.3.4        
[214] fansi_1.0.6              beeswarm_0.4.0      	
@calafell calafell added the bug Something isn't working label Nov 15, 2024