Adding 00-reference to build azimuth kidney reference #706
Hi @maud-p, thanks for filing this! It looks like there is some overlap with the files changed in #704, which makes sense – we'll just need to be careful about how we merge both pull requests in.
I am returning some initial feedback here that I'd put in two main categories:
- Where to use `params`. If we wanted a better understanding of how the parameters that are currently hardcoded in the notebook affected the results, it would be easier to test if we could pass new values (e.g., the seed being set) to `rmarkdown::render()` and compare the resulting notebooks. Using these also has the benefit of organizing values in one place, which can be very helpful!
  - This advice might be less impactful if we can't get the Azimuth issue sorted (I will need to take a closer look at that), but having things in one place is still helpful, in my opinion.
- What paths to use. I think `scratch` is what you will want to use for the transient file you download, and you'll want to use `results` instead of `marker-sets` for the result files.
Let me know if you have any questions 😄
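The `params` suggestion could be sketched roughly as follows. This is a minimal illustration, not the module's code: the parameter name `seed`, both helper names, and the output file naming are assumptions, and the notebook's YAML header would need to declare `seed` under `params:` for the render call to work.

```r
# Hypothetical helper: build an output file name that records the seed used.
seed_output_name <- function(seed) {
  paste0("00_fetal_reference_kidney_seed-", seed, ".html")
}

# Hypothetical helper: render the notebook with one seed value.
# Assumes the notebook declares `seed` in its YAML `params:` block.
render_with_seed <- function(notebook, seed, output_dir) {
  rmarkdown::render(
    input = notebook,
    params = list(seed = seed),
    output_file = seed_output_name(seed),
    output_dir = output_dir
  )
}

# Names of the reports that rendering with two seeds would produce
# (nothing is rendered here)
seeds <- c(2024L, 12345L)
report_names <- vapply(seeds, seed_output_name, character(1))
```

Calling `render_with_seed()` once per seed would then give one HTML report per value, which can be compared side by side.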
analyses/cell-type-wilms-tumor-06/00_fetal_reference_kidney.Rmd
rds data can be downloaded using the URL https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds

Please note that this download link permanently references the current version of the dataset (08/2024).
If this dataset is updated, a new download link will be created that permanently references the next version of this dataset.

We save it in the marker-sets folder of the module.
Note to the DataLab: should we save it somewhere else?
I suggest this rds file could be placed here transiently and removed once the reference is build?

```{r path_to_data}
path_to_data <- file.path(module_base, "marker-sets/fetal_full.rds")

url = "https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds"

download.file(url, path_to_data)
```
I think we could move this URL to a parameter with this as the default. I'll comment above with that suggestion.
analyses/cell-type-wilms-tumor-06/00_fetal_reference_kidney.Rmd
rds data can be downloaded using the URL https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds

Please note that this download link permanently references the current version of the dataset (08/2024).
If this dataset is updated, a new download link will be created that permanently references the next version of this dataset.

You would need to change this text if using `params`.
analyses/cell-type-wilms-tumor-06/00_fetal_reference_kidney.Rmd
## Output file

The azimuth compatible fetal kidney reference will be saved in the marker-sets folder from the module.
Update this text to reflect whatever happens in the chunk below!
Briefly, this will download the reference data from the cellxgene platform: https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds
and create an azimuth compatible Seurat object that will be saved in the marker-sets forlder as ref.Rds and idx.annoy files.

Update to reflect the use of `params` (to let folks know where to look to see what is being used!) and where it will be saved.
library(SCpubr)
library(tidyverse)
library(patchwork)
library(SeuratWrappers)

I was poking around locally to see if I could better understand the Azimuth problem, and I think we need to get `SeuratWrappers` into the `renv.lock` file.
I was able to install from RStudio within the Docker container with:
`remotes::install_github("satijalab/seurat-wrappers@8d46d6c47c089e193fe5c02a8c23970715918aa9")`
(This is the most recent commit.)
If you run `renv::snapshot()`, I expect some packages might get removed because the code in this branch doesn't account for what is getting added in #704. That would be okay, though; we'd need to resolve it in whichever branch gets merged second.

```{r create_ref, echo=TRUE, fig.height=7, fig.width=12, message=FALSE, warning=FALSE, out.width='100%'}
options(future.globals.maxSize= 891289600000)
Fetal_kidney <- AzimuthReference(
```

I am wondering if you have had any luck running `AzimuthReference` within a script? 🤔 Because if so, maybe we make this a script instead of a notebook.
Edited to add: If we did use a script, we'd probably want to use `optparse` to specify the different parameters (i.e., it would replace our `params` strategy).
I know very little about Azimuth, so forgive me if this is a naive question! How is the reference generated here different from what is available on Zenodo? https://zenodo.org/records/4738021#.YJIW4C2ZNQI
Edit: I assume the difference is in the input downloaded from CELLxGENE?

I tried running the same code in a script instead of an RMarkdown notebook, but got the same result: I couldn't run it with one click on "Source". 🤔
I am new to this, but I am now under the impression we can use Seurat to accomplish many of the same things as Azimuth: https://azimuth.hubmapconsortium.org/#General
Can I run the app myself?
The source code is available here. However, for users interested in performing these analyses outside the context of the Azimuth app, we suggest using Seurat v4 and using our vignette on Mapping and annotating query datasets as an example. You can also download a Seurat v4 R script from the app once your analysis is complete to reproduce the results locally.
(h/t @jashapiro)
Following those links, I assume we'd want to use this section as a reference: https://satijalab.org/seurat/articles/integration_mapping.html#cell-type-classification-using-an-integrated-reference
Being unable to run this successfully except for chunk by chunk gives me pause – it would be hard to test this notebook in GitHub Actions.

The reference you suggest could also be of interest, but from a quick look it contains 15 different organs. The one I wanted to use is composed only of kidney cells, and from what I understood, the annotations were done by kidney experts.
But I could give a try with the one you suggest, might be more straightforward!
I'll compare the two on few samples.

> But I could give a try with the one you suggest, might be more straightforward! I'll compare the two on few samples.

I would be very curious about this result in general!
Another option would be to try to use Seurat #706 (comment) with the kidney dataset.
I am trying to avoid the `AzimuthReference()` bug if at all possible 😄
Thank you very much for looking into Seurat/Azimuth!
The label transfer using FindTransferAnchors and TransferData (described in the link you suggested https://satijalab.org/seurat/articles/integration_mapping.html#cell-type-classification-using-an-integrated-reference) is what I used before Seurat and Azimuth v5! I'll go back to it!
Co-authored-by: Jaclyn Taroni <19534205+jaclyn-taroni@users.noreply.github.com>
Thank you very much @jaclyn-taroni for all your feedback and suggestions; everything is clear so far and makes a lot of sense! I'd like to try:
You are right, we might not need to build a reference in the end :)
Hi @jaclyn-taroni!
My approaches
I tested 3 different approaches:
You can find one RMarkdown template (in notebook_template) per approach. The idea was to then select 1 or 2 to be rendered for all samples. I knitted the 3 templates for sample 176, and you can find the HTML reports in the folder notebook/00-reference. Of note: I started to write an R script
Result 1:

Thanks, @maud-p! It's very interesting that the "irrelevant" tissues are not a problem when using the entire fetal reference. I plan to take a closer look this afternoon (Eastern US).
I just wanted to let you know that I solved the plot rendering problem by using the following output settings:
output:
  html_document:
    toc: yes
    toc_float: yes
    code_folding: hide
    highlight: pygments
    df_print: kable
    self_contained: yes
    mode: selfcontained
Hi @maud-p,
With #704 and these new additions, I’m getting a fuller picture of where the module will end up. As a result, I’m going to propose some changes to the structure to help us reduce runtimes and ensure consistency across notebooks.
I’ll summarize the high-level thoughts here:
- The download and preparation of the fetal kidney reference should go into its own script, so you don’t have to repeat that across notebooks or for individual samples. I’ve added a comment that contains what I think that script would look like and described how I would use it/how it fits together with the notebooks in other comments.
  - This would be the first thing you would run in `00_reference.R`
  - The second thing you’d run in `00_reference.R` would be a version of `00_fetal_reference_kidney.Rmd` that does the exploration and marker gene calculations.
- Create functions for common Seurat steps, such as converting from `SingleCellExperiment` and normalization, dimension reduction, etc.; then you can source these in whichever notebook you need them and be consistent in their application without repeating code.
  - In the future, you might create a script just for converting to Seurat objects. I would make that a later pull request if you think it’s a good idea; this one is fairly large and complex without those changes.
- Make sure you save any outputs of notebooks you expect to use in later steps, like the Seurat objects and results of transfer, in `results`!
- Remove data that you can get from the data release from the Git repository.
- I think there’s value in using `RunAzimuth()` consistently across the two atlases if you’re going to use both atlases. So, I would only run the `Seuratv4` notebook on a handful of samples to keep a record of the consistency between Seurat and Azimuth results.
- In terms of naming and organization:
  - I would rename the Cao and Stewart templates to be prefixed with `00a`, `00b`, etc., so it’s clear they are part of the reference stuff. I would also use the atlas name instead of the author’s name in the filenames.
  - I thought it might be helpful if each sample gets its own directory in `notebook/00-reference`.
General comment — if you’re going to be comparing the two atlases down the line, I think we should be considering how to do that quantitatively in addition to the qualitative plots you have here in future pull requests. All the more reason to be saving the outputs!
Once you make these changes, I will file a PR to this branch to set up running CI to test the module. I think that’s the best way to demonstrate how that should work, and then you could maintain that workflow in future PRs if you are interested in learning about it!
Thanks again for your contributions 😄 ! If you have any questions or run into issues, let us know!
I was glad to see that the plotting issue was resolved because it was on my list to figure out before returning this review, so nice work on that 👍 🚀 I had not ever seen that before!
Please remove this file. Generally, anything that can be downloaded via the data download script should not be tracked in the Git repository.
We want people to use the data release, not what is committed to the repository, because 1. they could easily get out of sync if the data is updated in a future release, 2. we want to make sure people agree to the Terms of Use before accessing the data, and 3. we want to be wary tracking files we don't need to track (i.e., because they are in the release) because it can degrade Git performance over time.
I will add a comment in `00_reference.R` about how to accomplish the same thing using the release.
Using this reference, we test two workflows for label transfer:

- `Azimuth v5`,
- `Seurat v4` as described here https://satijalab.org/seurat/articles/integration_mapping.html#cell-type-classification-using-an-integrated-reference

Admittedly, I am very confused about Seurat's versioning scheme.
If I look at the Seurat version in the output of `sessionInfo()`, I see it is version `5.1.0`. So, what makes this Seurat v4 instead of v5? Is it some of the options that you use in lines 216-221?

Hm, sorry for the confusion. You are right, I used `Seurat v5`, but the workflow was developed with a lower version of `Seurat`. I should remove the version! (Or maybe not, as it seems that we will only go with `Azimuth` :) )
Sorry for the confusion!
download.file(params$ref_url, path_to_reference)
seurat <- readRDS(path_to_reference)

options(future.globals.maxSize= 891289600000)
s <- SCTransform(seurat, verbose = FALSE, method = "glmGamPoi", conserve.memory = TRUE)
s <- RunPCA(s, npcs = 50, verbose = FALSE)
s <- RunUMAP(s, dims = 1:50, verbose = FALSE, return.model = TRUE)

Fetal_kidney <- AzimuthReference(
  s,
  refUMAP = "umap",
  refDR = "pca",
  refAssay = "SCT",
  dims = 1:50,
  k.param = 31,
  plotref = "umap",
  plot.metadata = NULL,
  ori.index = NULL,
  colormap = NULL,
  assays = NULL,
  metadata = c("compartment", "cell_type"),
  reference.version = "0.0.0",
  verbose = FALSE
)

I expect these steps take a long time, and I see you are doing it here and in the `00_fetal_reference_kidney.Rmd` notebook. As far as I can tell, the individual sample you'll use the reference for doesn't matter. So, I think you could put the download and `AzimuthReference` step in a script so you don't have to repeat it between notebooks and for every sample:
#!/usr/bin/env Rscript
# Download the fetal kidney dataset and create a reference for use with
# Azimuth
#
# USAGE:
# Rscript download-build-kidney-reference.R \
# --url https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds \
# --output_dir ../results/references \
# --seed 2024
#
library(optparse)
library(Seurat)
library(Azimuth)
# Parse arguments --------------------------------------------------------------
# set up arguments
option_list <- list(
make_option(
opt_str = c("-u", "--url"),
type = "character",
default = "https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds",
help = "The URL of the fetal kidney atlas from CELLxGENE"
),
make_option(
opt_str = c("-d", "--output_dir"),
type = "character",
default = "results/references",
help = "Output directory for the Azimuth reference, relative to your current directory"
),
make_option(
opt_str = c("-s", "--seed"),
type = "integer",
default = 12345,
help = "Seed passed to set.seed()"
)
)
opts <- parse_args(OptionParser(option_list = option_list))
# Download data ----------------------------------------------------------------
project_root <- rprojroot::find_root(rprojroot::is_git_root)
path_to_data <- file.path(
project_root,
"analyses",
"cell-type-wilms-tumor-06",
"scratch",
"fetal_kidney.rds"
)
download.file(url = opts$url, destfile = path_to_data)
# Read in data -----------------------------------------------------------------
seurat <- readRDS(path_to_data)
# Transform and dimension reduction --------------------------------------------
set.seed(opts$seed)
options(future.globals.maxSize = 891289600000)
s <- SCTransform(
seurat,
verbose = FALSE,
method = "glmGamPoi",
conserve.memory = TRUE
)
s <- RunPCA(s, npcs = 50, verbose = FALSE)
s <- RunUMAP(s, dims = 1:50, verbose = FALSE, return.model = TRUE)
# Create reference -------------------------------------------------------------
options(future.globals.maxSize = 891289600000)
fetal_kidney <- AzimuthReference(
s,
refUMAP = "umap",
refDR = "pca",
refAssay = "SCT",
dims = 1:50,
k.param = 31,
plotref = "umap",
plot.metadata = NULL,
ori.index = NULL,
colormap = NULL,
assays = NULL,
metadata = c("compartment", "cell_type"),
reference.version = "0.0.0",
verbose = FALSE
)
# Save reference ---------------------------------------------------------------
# Create directory if it doesn't exist yet
dir.create(opts$output_dir, recursive = TRUE, showWarnings = FALSE)
# Save annoy index
SaveAnnoyIndex(
object = fetal_kidney[["refdr.annoy.neighbors"]],
file = file.path(opts$output_dir, "idx.annoy")
)
# Save reference
saveRDS(object = fetal_kidney, file = file.path(opts$output_dir, "ref.Rds"))
Bonus: I tested that `AzimuthReference()` runs without error in a script! 🎉
That script ☝🏻 uses the `optparse` package, which would need to be added to `renv.lock`.
I will assume this is saved as `scripts/download-and-create-fetal-kidney-ref.R` in other comments.
Thank you so much !

A comment on naming: I might number this with something like `00a` instead of `01`, and I would consider specifying the tissue of the atlas used (fetal all vs. fetal kidney) instead of the author so it would be easier to tell what atlas is being used at a glance if you're not familiar with the primary literature.

While renaming, I just thought: would it be OK to go with
- "00a_fetal_all_reference_Cao.Rmd"
- "00b_fetal_kidney_reference_Stewart.Rmd"?
Just to make sure it is clear that they are not from the same reference, i.e., fetal_kidney is not a subset of fetal_all?
Yes, sounds good 👍🏻
### Find marker genes for each of the compartment

```{r markers_compatment, fig.width=8, fig.height=7, out.width='100%'}
```

Would it be appropriate to save the output of this chunk in `results`? Will the results be used later?
### Find marker genes for each of the cell types

```{r markers_cell, fig.width=15, fig.height=17, out.width='100%'}
```

Would it be appropriate to save the output of this chunk in `results`? Will the results be used later?
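If the marker genes are needed later, one way to keep them would be to write the table to `results`. A minimal sketch in base R, assuming the chunk produces a data frame: the toy `markers` table, its column names, and the file name are made up for illustration, and a temporary directory stands in for the module directory.

```r
# Toy stand-in for a marker gene table (columns are illustrative only)
markers <- data.frame(
  gene = c("GENE1", "GENE2"),
  cluster = c("stroma", "immune"),
  avg_log2FC = c(1.5, 2.1),
  stringsAsFactors = FALSE
)

# Stand-in for the module directory; in the module this would be the real path
module_base <- tempdir()
results_dir <- file.path(module_base, "results", "references")
dir.create(results_dir, recursive = TRUE, showWarnings = FALSE)

# Write the table as TSV so later steps can read it back
markers_file <- file.path(results_dir, "compartment_markers.tsv")
write.table(markers, markers_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the round trip
markers_check <- read.delim(markers_file, stringsAsFactors = FALSE)
```

A plain TSV keeps the output readable outside of R, which may help if the results are compared across atlases later.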
```{r fig.height=6, fig.width=6, message=FALSE, warnings=FALSE}

anchors <- FindTransferAnchors(reference = s, query = srat, dims = 1:50,
                               reference.reduction = "pca")
predictions <- TransferData(anchorset = anchors, refdata = s$cell_type, dims = 1:50)
srat <- AddMetaData(srat, metadata = predictions$predicted.id, col.name = "predicted.cell_type")

predictions <- TransferData(anchorset = anchors, refdata = s$compartment, dims = 1:50)
srat <- AddMetaData(srat, metadata = predictions$predicted.id, col.name = "predicted.compartment")
```

Moving forward, if you will also use the all fetal tissue atlas with `RunAzimuth()`, I do think there is some value in being consistent in how you apply the two references (even if the results of the `Seurat` and `RunAzimuth()` methods are close!)
# Label transfer from the Stewart reference using Seurat
rmarkdown::render(input = "notebook_template/01_fetal_reference_Stewart_Seuratv4.Rmd",
                  params = list(scpca_project_id = metadata$scpca_project_id[metadata$scpca_sample_id == i], sample_id = i),
                  output_format = "html_document",
                  output_file = paste0("01_fetal_reference_Stewart_Seuratv4_", i, ".html"),
                  output_dir = "notebook/00-reference")
I would maybe move this out to a different loop that just covers a handful of samples, and then the loop it is currently inside would cover all samples.

Do you think there's any value in each sample getting its own folder in `notebook/00-reference`? If so, the Seurat v4 notebook loop would come second, and then the first step in this loop through all the samples would be creating the sample-specific directory with `dir.create()` or similar.
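The two-loop layout with per-sample folders might look something like this. It is a rough sketch only: the sample IDs are placeholders, the subset size is arbitrary, `tempdir()` stands in for the module directory, and the `rmarkdown::render()` calls are left as comments.

```r
# Placeholder sample IDs; in the module these would come from the metadata
sample_ids <- c("SAMPLE_A", "SAMPLE_B", "SAMPLE_C", "SAMPLE_D")

# Stand-in for the module directory
module_dir <- tempdir()
notebook_dir <- file.path(module_dir, "notebook", "00-reference")

# First loop: every sample gets its own directory, then the Azimuth notebooks
for (sample_id in sample_ids) {
  sample_dir <- file.path(notebook_dir, sample_id)
  dir.create(sample_dir, recursive = TRUE, showWarnings = FALSE)
  # rmarkdown::render(..., output_dir = sample_dir) for the Azimuth templates
}

# Second loop: the Seurat comparison runs on a handful of samples only
seurat_subset <- head(sample_ids, 2)
for (sample_id in seurat_subset) {
  sample_dir <- file.path(notebook_dir, sample_id)
  # rmarkdown::render(..., output_dir = sample_dir) for the Seurat template
}
```

Because the directories are created in the first loop, the second loop can assume each `sample_dir` already exists.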
# Render the reports for (all) samples in the project
for (i in metadata$scpca_sample_id[9:11]) {
  # Label transfer from the Cao reference using Azimuth
  rmarkdown::render(input = "notebook_template/01_fetal_reference_Cao_Azimuth.Rmd",

If you create a path using `file.path()` and `module_dir`, you don't have to worry about where the script is being run from – it should always work as expected.
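For example, the template path could be built along these lines. This is a sketch under assumptions: the module directory is supplied by the caller (the script suggested earlier in this thread locates the repository root with `rprojroot`), the helper name `template_path` is hypothetical, and the `module_dir` value is a placeholder.

```r
# Hypothetical helper: path to a notebook template inside the module
template_path <- function(module_dir, template_name) {
  file.path(module_dir, "notebook_template", template_name)
}

# Placeholder module directory for illustration
module_dir <- file.path("path", "to", "analyses", "cell-type-wilms-tumor-06")
cao_template <- template_path(module_dir, "01_fetal_reference_Cao_Azimuth.Rmd")
```

With an absolute `module_dir`, the resulting path no longer depends on the working directory of the calling script.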
Thank you very much @jaclyn-taroni! Your reviews and the ones from @sjspielman are super clear and useful!
Hi @jaclyn-taroni, thank you for your review. I tried to apply the changes :) The loop over the samples is performed in three steps:
For the input and output, I went for the following strategy: At the end of the workflow, we have a
I tried to run it on 11 samples, and it seems to work :-) Results should be loaded on the S3 bucket. Please let me know if something is unclear! Thank you!
Thanks for the updates, @maud-p! This is looking really good!
I haven't tested all of the suggestions I'm returning, so if you use any of them, please make sure they work as expected 😄
I filed maud-p#4 so that we can make sure all the code runs without error on the test data. I'd like for that to go into this branch before we merge to `main`. That way, we might catch bugs or edge cases we missed from running it on the 11 samples + my reading the code and be able to fix them!
...ll-type-wilms-tumor-06/notebook_template/00b_characterize_fetal_kidney_reference_Stewart.Rmd
...s/cell-type-wilms-tumor-06/notebook_template/02a_label-transfer_fetal_full_reference_Cao.Rmd
...-type-wilms-tumor-06/notebook_template/02b_label-transfer_fetal_kidney_reference_Stewart.Rmd
Hi @jaclyn-taroni, thank you so much! I will test this now!

Uncomment workflow triggers, download relevant project, and run workflow

Hi @maud-p - I wanted to let you know I am looking into the latest failure. We have used a strategy where we use

I understand now, thank you very much @jaclyn-taroni!! I didn't realize that in CI the samples were downsampled. But then I am a bit afraid, it seems that decreasing
I'll also try to play with this and keep you updated.

Oh no 😕 Well, if we can't figure it out shortly, we should try to get this in without CI and figure that part out in a later PR.

@jaclyn-taroni I am on it, I might have found a solution modifying a bit the
Another parameter to reduce is
But then comes another ERROR a bit later because of a bug in the
As

FYI @jaclyn-taroni, I first tried to copy your strategy of using params to change behavior based on whether or not a notebook is running in CI (https://github.com/jaclyn-taroni/OpenScPCA-analysis/tree/jaclyn-taroni/install-rhtslib/analyses/cell-type-wilms-tumor-06), but it failed. I might have messed something up... I thus decided to adapt the behavior depending on the size of the sample. Not as elegant, but temporary :)

I will take a look! Here's the docs page about test data: https://openscpca.readthedocs.io/en/latest/getting-started/accessing-resources/getting-access-to-data/#accessing-test-data. But you are probably looking for the module instead: https://github.com/AlexsLemonade/OpenScPCA-analysis/tree/main/analyses/simulate-sce We use the default 100 cells to simulate. The problem could be the nature of the simulated data itself, too.

Thanks, I have the impression the workflow does not go to the

Okay, @maud-p - let's do this:
Then I will give this a final review for correctness, etc. and merge it without CI. We'll figure out CI in a follow-up PR.

@jaclyn-taroni I created a new PR #737 with a new branch at the stage of
Sorry, I wasn't sure how to roll back to this state. Also sorry that I couldn't find a way to use this
Thank you very much for your help!! One question: after you merge it, should I rather try to solve this issue or continue the analysis? I guess these could be two independent PRs? Thank you again!!

I think you picked the easiest way to do it, so that totally works 👍
No worries – sorry it has been such a pain! 😅
I'd recommend moving on to clustering, and we will try to figure out the best thing to do internally at the Data Lab. If increasing the number of cells is what is necessary, we'd have to figure out the best approach.

Closing in favor of #737
Purpose/implementation Section
Please link to the GitHub issue that this pull request addresses.
#703
What is the goal of this pull request?
I wrote an RMarkdown script to
This will be used in the module to perform label transfer from the human fetal kidney atlas to the Wilms tumor samples.
Briefly describe the general approach you took to achieve this goal.
I used a different approach than the one described in issue #703.
I figured out that I can download the human fetal kidney data from cellxgene as an rds object.
url = "https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds"
This way, I didn't need to create a conda/renv environment.
The Dockerfile from this PR is the same as the Dockerfile from PR #704.
Of note, however, using your documentation, I was able to build a conda/renv container that can run the scripts I have written so far, in case we need it in the future :)
If known, do you anticipate filing additional pull requests to complete this analysis module?
Yes! The next one will be to implement the label transfer in the sample report.
Results
What is the name of your results bucket on S3?
here is the command I used to upload the result to the bucket:
What types of results does your code produce (e.g., table, figure)?
The Azimuth-compatible reference, in the form of 2 files:
When running 00_fetal_reference_kidney.Rmd, these two files will be saved in the module folder/marker-genes.
What is your summary of the results?
We built a reference that is compatible with Azimuth label transfer.
Provide directions for reviewers
I have one issue: with the RMarkdown notebook 00_fetal_reference_kidney.Rmd, we can only build the reference by manually running each chunk; it does not work when we knit the notebook to an HTML report.
I could isolate the problem to the AzimuthReference function; it might be related to this issue: satijalab/azimuth#219
What are the software and computational requirements needed to be able to run the code in this PR?
Are there particularly areas you'd like reviewers to have a close look at?
Is there anything that you want to discuss further?
Author checklists
Check all those that apply.
Note that you may find it easier to check off these items after the pull request is actually filed.
Analysis module and review
README.md has been updated to reflect code changes in this pull request.
Reproducibility checklist
Dockerfile.
environment.yml file.
renv.lock file.