fragments_dir in ArchR2Signac #3

Closed
yutongo opened this issue Oct 21, 2022 · 21 comments

@yutongo

yutongo commented Oct 21, 2022

Thank you for the package to convert an ArchRProject to a SeuratObject!

I have a question about fragments_dir in the function ArchR2Signac. Could you provide an example of fragments_dir for the ArchRProject? I tried the folder containing "fragments.tsv.gz(tbi)" or "fragments.arrow" but got this error:

Error in CreateFragmentObject(path = fragments, cells = cells, validate.fragments = validate.fragments, :
Fragment file does not exist.

Thank you.

@rootze rootze self-assigned this Oct 24, 2022
@rootze
Collaborator

rootze commented Oct 24, 2022

Hello @yutongo, Thanks for using ArchR2Signac.

For example, if you used 10x Genomics and processed your fastq.gz files with cellranger-atac count, the output (fragments.tsv.gz) will likely be in a directory under the path ./SampleID/outs/.

fragments_dir <- "path_to_cellranger_atac_output" # the directory before "/outs/" for all samples
So fragments_dir is the path leading to the fragments.tsv.gz files. Because the code appends the /outs/ part of the path by default, you only need to provide everything before /outs/.

Example :
/home/PATH_to_ProjectFolder/cellranger_out/Sample10/

I hope this makes sense. Please let me know if this solves your problem. Thank you!
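
One way to see where the fragment files actually sit relative to a candidate fragments_dir is to list them recursively. The sketch below is only a base R diagnostic: the helper name find_fragment_files and the filename pattern are assumptions made for illustration and are not part of ArchRtoSignac.

```r
# Hypothetical helper (not part of ArchRtoSignac): list fragment files found
# under a directory, as paths relative to that directory.
find_fragment_files <- function(fragments_dir, pattern = "fragments\\.tsv\\.gz$") {
  if (!dir.exists(fragments_dir)) {
    stop("Directory does not exist: ", fragments_dir)
  }
  list.files(fragments_dir, pattern = pattern, recursive = TRUE)
}

# Demo on a temporary directory that mimics a cellranger-atac layout.
demo_dir <- file.path(tempdir(), "cellranger_out")
dir.create(file.path(demo_dir, "Sample10", "outs"),
           recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, "Sample10", "outs", "fragments.tsv.gz"))

found <- find_fragment_files(demo_dir)
print(found)
```

If the files show up as <sample>/outs/fragments.tsv.gz, the directory that was listed is the part of the path that comes before the sample folders and /outs/.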

@yutongo
Author

yutongo commented Oct 24, 2022

Hello

Thank you for the quick response.

However, I got the fragment files from SnapTools instead of cellranger-atac count, and I only have fragments.tsv.gz(tbi) files, which are sufficient input for ArchR. Using the path to the fragments.tsv.gz(tbi) files produces the error above. What other files do I need in order to use the function CreateFragmentObject?

Thank you.

@rootze
Collaborator

rootze commented Oct 25, 2022

Hello @yutongo,

Thanks for letting me know. I have never worked with SnapTools before, but it should be a quick fix on my end to update the package or the function to fit your input.
If you could share the full path to two of your fragments.tsv.gz(tbi) files as examples, as well as a tree listing of the other files in the same folder, I will make the changes accordingly in the source code.

Also, I will research the SnapTools output and file formats, and make changes as soon as I can.
Thank you for your understanding.

Regards,
Ze

@yutongo
Author

yutongo commented Oct 25, 2022

Hello Ze,

Thank you!

Path:
/Users/yutongo/Documents/ATAC/ATAC_ALL_NEW

Tree list:
.
├── sample1.tsv.gz
├── sample1.tsv.gz.tbi
├── sample2.tsv.gz
├── sample2.tsv.gz.tbi
├── sample3.tsv.gz
├── sample3.tsv.gz.tbi
├── sample4.tsv.gz
├── sample4.tsv.gz.tbi
├── sample1.arrow
├── sample2.arrow
├── sample3.arrow
├── sample4.arrow
├── ArchRLogs
│   ├── ArchR-addClusters-c7937f9bf4f5-Date-2022-10-22_Time-22-54-51.log
│   ├── ArchR-addDoubletScores-c79341433395-Date-2022-10-21_Time-09-59-08.log
│   ├── ArchR-addGeneIntegrationMatrix-c79324480789-Date-2022-10-22_Time-23-15-40.log
│   ├── ArchR-addIterativeLSI-c793f9f96b4-Date-2022-10-23_Time-10-24-35.log
│   ├── ArchR-createArrows-c7932e5bf973-Date-2022-10-21_Time-07-53-55.log
│   ├── ArchR-getMarkerFeatures-c793476a91b8-Date-2022-10-21_Time-12-04-32.log
│   ├── ArchR-plotEmbedding-c793f2d90f1-Date-2022-10-23_Time-16-53-21.log
│   ├── ArchR-plotFragmentSizes-c79310cbe77a-Date-2022-10-21_Time-10-29-08.log
│   ├── ArchR-plotMarkerHeatmap-c793254baab0-Date-2022-10-21_Time-12-15-09.log
│   └── ArchR-plotTSSEnrichment-c7933aca0333-Date-2022-10-21_Time-11-07-06.log
├── HemeTutorial
│   ├── ArrowFiles
│   │   ├── sample1.arrow
│   │   ├── sample2.arrow
│   │   ├── sample3.arrow
│   │   └── sample4.arrow
│   ├── Embeddings
│   │   └── Save-Uwot-UMAP-Params-IterativeLSI2-c7935e9fc38f-Date-2022-10-21_Time-12-04-17.tar
│   ├── ImputeWeights
│   │   ├── Impute-Weights-Rep-1
│   │   └── Impute-Weights-Rep-2
│   ├── IterativeLSI2
│   │   ├── Save-LSI-Iteration-1.pdf
│   │   ├── Save-LSI-Iteration-1.rds
│   │   ├── Save-LSI-Iteration-2.pdf
│   │   ├── Save-LSI-Iteration-2.rds
│   │   ├── Save-LSI-Iteration-3.pdf
│   │   └── Save-LSI-Iteration-3.rds
│   ├── Plots
│   │   ├── GeneScores-Marker-Heatmap.pdf
│   │   ├── GeneScores-Markerall-Heatmap.pdf
│   │   ├── Plot-UMAP-Marker-Genes-RNA-W-Imputation.pdf
│   │   ├── Plot-UMAP-Marker-Genes-WO-Imputation.pdf
│   │   ├── Plot-UMAP-RNA-Integration.pdf
│   │   ├── Plot-UMAP-Remap-Clusters.pdf
│   │   ├── Plot-UMAP-Sample-Clusters.pdf
│   │   ├── QC-Sample-FragSizes-TSSProfile.pdf
│   │   ├── QC-Sample-Statistics.pdf
│   │   ├── percentage_cluster.pdf
│   │   └── percentage_cluster_col1.pdf
│   └── RNAIntegration
│       └── GeneIntegrationMatrix
│           ├── Save-Block1-JointCCA-UMAP.pdf
│           ├── Save-Block1-JointCCA.rds
│           ├── Save-Block2-JointCCA-UMAP.pdf
│           ├── Save-Block2-JointCCA.rds
│           ├── Save-Block3-JointCCA-UMAP.pdf
│           ├── Save-Block3-JointCCA.rds
│           ├── Save-Block4-JointCCA-UMAP.pdf
│           ├── Save-Block4-JointCCA.rds
│           ├── Save-Block5-JointCCA-UMAP.pdf
│           └── Save-Block5-JointCCA.rds
├── QualityControl
│   ├── sample1
│   │   ├── sample1-Doublet-Summary.pdf
│   │   ├── sample1-Doublet-Summary.rds
│   │   ├── sample1-Fragment_Size_Distribution.pdf
│   │   ├── sample1-Pre-Filter-Metadata.rds
│   │   └── sample1-TSS_by_Unique_Frags.pdf
│   ├── sample2
│   │   ├── sample2-Doublet-Summary.pdf
│   │   ├── sample2-Doublet-Summary.rds
│   │   ├── sample2-Fragment_Size_Distribution.pdf
│   │   ├── sample2-Pre-Filter-Metadata.rds
│   │   └── sample2-TSS_by_Unique_Frags.pdf
│   ├── sample3
│   │   ├── sample3-Doublet-Summary.pdf
│   │   ├── sample3-Doublet-Summary.rds
│   │   ├── sample3-Fragment_Size_Distribution.pdf
│   │   ├── sample3-Pre-Filter-Metadata.rds
│   │   └── sample3-TSS_by_Unique_Frags.pdf
│   └── sample4
│       ├── sample4-Doublet-Summary.pdf
│       ├── sample4-Doublet-Summary.rds
│       ├── sample4-Fragment_Size_Distribution.pdf
│       ├── sample4-Pre-Filter-Metadata.rds
│       └── sample4-TSS_by_Unique_Frags.pdf
├── Save-ProjHeme2
│   ├── ArrowFiles
│   │   ├── sample1.arrow
│   │   ├── sample2.arrow
│   │   ├── sample3.arrow
│   │   └── sample4.arrow
│   ├── Embeddings
│   │   └── Save-Uwot-UMAP-Params-IterativeLSI2-c7935e9fc38f-Date-2022-10-21_Time-12-04-17.tar
│   ├── IterativeLSI2
│   │   ├── Save-LSI-Iteration-1.pdf
│   │   ├── Save-LSI-Iteration-1.rds
│   │   ├── Save-LSI-Iteration-2.pdf
│   │   ├── Save-LSI-Iteration-2.rds
│   │   ├── Save-LSI-Iteration-3.pdf
│   │   └── Save-LSI-Iteration-3.rds
│   ├── Plots
│   │   ├── GeneScores-Marker-Heatmap.pdf
│   │   ├── GeneScores-Markerall-Heatmap.pdf
│   │   ├── Plot-UMAP-Marker-Genes-RNA-W-Imputation.pdf
│   │   ├── Plot-UMAP-Marker-Genes-WO-Imputation.pdf
│   │   ├── Plot-UMAP-RNA-Integration.pdf
│   │   ├── Plot-UMAP-Remap-Clusters.pdf
│   │   ├── Plot-UMAP-Sample-Clusters.pdf
│   │   ├── QC-Sample-FragSizes-TSSProfile.pdf
│   │   ├── QC-Sample-Statistics.pdf
│   │   ├── percentage_cluster.pdf
│   │   └── percentage_cluster_col1.pdf
│   ├── RNAIntegration
│   │   ├── GeneIntegrationMatrix
│   │   ├── Save-Block1-JointCCA-UMAP.pdf
│   │   ├── Save-Block1-JointCCA.rds
│   │   ├── Save-Block2-JointCCA-UMAP.pdf
│   │   ├── Save-Block2-JointCCA.rds
│   │   ├── Save-Block3-JointCCA-UMAP.pdf
│   │   ├── Save-Block3-JointCCA.rds
│   │   ├── Save-Block4-JointCCA-UMAP.pdf
│   │   ├── Save-Block4-JointCCA.rds
│   │   ├── Save-Block5-JointCCA-UMAP.pdf
│   │   └── Save-Block5-JointCCA.rds
│   └── Save-ArchR-Project.rds
├── tmp
│   └── tmp-c793fe721b3-Date-2022-10-21_Time-10-07-53
│       ├── ArrowFiles
│       └── IterativeLSI

@rootze
Collaborator

rootze commented Oct 25, 2022

@yutongo Thank you for letting me know about this. I will make some adjustments to the source code and update you later. I'm a little busy this week, but I will definitely fix this error before next week. Thank you for your understanding.

@rootze
Collaborator

rootze commented Oct 26, 2022

@yutongo I have updated the package to fit your file and path format. Please update the ArchRtoSignac package before using it; it should be at version 1.0.1.
To convert ArchR to Signac with the function ArchR2Signac, in your example:

fragments_dir <- '/Users/yutongo/Documents/ATAC/ATAC_ALL_NEW/'

#Conversion function
seurat_atac <- ArchR2Signac(
  ArchRProject = YOUR_ArchRProj, # YOUR_ArchRProj is your ArchRProject  
  fragments_dir = fragments_dir, 
  pm = pm, # getting peak matrix
  fragments_fromcellranger = "NO",
  fragments_file_extension = '.tsv.gz',
  refversion = 'hg38', # make sure this fits your choice 
  annotation = annotations
)

I have tested the above update on the ArchR-provided samples. Please let me know whether it works for you. Thank you!
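
Before running the conversion, it can help to confirm that a fragment file exists for every sample. The sketch below assumes that, for non-Cell Ranger input, the expected path is fragments_dir + sample name + fragments_file_extension (which matches the file names in this thread but is an assumption about the package internals). The helper check_fragment_paths is hypothetical and not part of ArchRtoSignac.

```r
# Hypothetical helper: build the fragment paths implied by the naming scheme
# in this thread (fragments_dir + sample name + extension) and flag missing
# files. fragments_dir is assumed to end with a trailing slash.
check_fragment_paths <- function(fragments_dir, samples, extension = ".tsv.gz") {
  paths <- paste0(fragments_dir, samples, extension)
  data.frame(
    sample = samples,
    path = paths,
    exists = file.exists(paths),
    index_exists = file.exists(paste0(paths, ".tbi")),
    stringsAsFactors = FALSE
  )
}

# Demo with a temporary directory; in practice `samples` would come from
# unique(ArchRProject@cellColData$Sample).
base_dir <- file.path(tempdir(), "ATAC_ALL_NEW")
dir.create(base_dir, showWarnings = FALSE)
demo_dir <- paste0(base_dir, "/")
file.create(paste0(demo_dir, c("sample1.tsv.gz", "sample1.tsv.gz.tbi")))

report <- check_fragment_paths(demo_dir, c("sample1", "sample2"))
print(report[, c("sample", "exists", "index_exists")])
```

Any row with exists equal to FALSE points to a sample whose name does not match a file under that naming scheme.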

Also, for a detailed step-by-step tutorial for ArchRtoSignac, you can find it in the STAR protocol: https://doi.org/10.1016/j.xpro.2022.101491

Thank you once again for using ArchRtoSignac

Ze

@yutongo
Author

yutongo commented Oct 27, 2022

Hi Ze,

Thank you for the help!

However, I ran into a new error when following the example above:
'Error in CreateFragmentObject(path = fragments, cells = cells, validate.fragments = validate.fragments, :
Incorrect number of columns found in fragment file'

Is it because of the filtering steps in ArchR? I tried adding the option 'samples = rownames(MY_ArchRProj)',
but got the error:
'Error in ArchR2Signac(ArchRProject = MY_ArchRProj, refversion = "mm10", fragments_dir = "/Users/yutongo/Documents/ATAC/ATAC_ALL_NEW/", :
unused argument (alist())'.

Do you have any suggestions?

Thanks a lot.

@rootze
Collaborator

rootze commented Oct 27, 2022

> 'Error in CreateFragmentObject(path = fragments, cells = cells, validate.fragments = validate.fragments, :
> Incorrect number of columns found in fragment file'

@yutongo Sorry to hear that you ran into a different issue.

Would you mind providing the complete code you ran? Any further information from your ArchRProject metadata would help me understand what might be causing the error. I don't know what is left in your data or what you did in your filtering steps, but most likely I think you're right.

> Is it because of the filtering steps in ArchR? I tried adding the option 'samples = rownames(MY_ArchRProj)',

Also, what is in your ArchRProject@cellColData$Sample?
Please run table(ArchRProject@cellColData$Sample) to check.
I have a few ideas about what might be causing the problem, and I think you're on the right track. So let's fix the problem together.
In ArchR2Signac, you don't necessarily need to provide the sample list, since by default it is taken from your ArchRProject. I am not sure what rownames(MY_ArchRProj) would return.

@rootze
Collaborator

rootze commented Nov 15, 2022

The ArchRtoSignac package has been updated to version 1.0.2 to solve this problem.
For a possible way of reformatting fragment files to match the cellranger-atac count output, see stuart-lab/signac#748

@rootze rootze closed this as completed Nov 15, 2022
@rootze rootze added the bug Something isn't working label Nov 15, 2022
@mkojima123

Thank you for the convenient package!
I am using ArchRtoSignac version 1.0.3, but the same error occurred.

Error in CreateFragmentObject(path = fragments, cells = cells, validate.fragments = validate.fragments,  :
  Incorrect number of columns found in fragment file

Are there any hints?
Thank you.

@rootze
Collaborator

rootze commented Apr 18, 2023

@mkojima123 Thanks for using the ArchRtoSignac package. Please provide more information, such as which technology was used to generate your scATAC-seq data (for example, 10x, snapATAC, or something else). Additionally, what code did you run? What fragment path(s) did you provide? The more information you can provide, the easier it is for me to interpret your error. Thank you.

@rootze rootze reopened this Apr 18, 2023
@mkojima123

Thank you for the quick reply.

The fragment files are from public data, not 10x data.
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE184462

And, I ran this code.

seurat_atac <- ArchR2Signac(
  ArchRProject = proj,
  refversion = "hg38",
  fragments_dir = fragments_dir,
  pm = pkm,
  fragments_fromcellranger = "No",
  fragments_file_extension = ".fragments.txt.gz",
  annotation = annotations
)

There are *.fragments.txt.gz and *.fragments.txt.gz.tbi in fragments_dir.

@rootze
Collaborator

rootze commented Apr 18, 2023

> It is fragment of public data. Not 10x data.
> https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE184462

Oh, I see. By skimming through the link and the paper, it seems to me that this is a snapATAC dataset, which means there is a column missing in the fragment file compared to scATAC-seq from 10x. The default format of Signac is 10x. So you need to add another column to fill the fifth column, a readSupport column, in your fragment file. But please double-check whether this is snapATAC first before moving forward. I hope this makes sense.

This is the latest format for 10x Genomics scATAC fragment files: https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/output/fragments
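
A quick way to check how many columns a fragment file really has is to read its first few lines. 10x-style fragment files have five tab-separated columns (chromosome, start, end, barcode, read support). The sketch below uses only base R; the helper name count_fragment_columns is made up for illustration, and the demo file is synthetic.

```r
# Hypothetical helper: report how many tab-separated columns the first few
# data lines of a (gzipped) fragment file contain.
count_fragment_columns <- function(path, n = 100) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  lines <- readLines(con, n = n)
  lines <- lines[!startsWith(lines, "#")]  # skip header comment lines
  unique(lengths(strsplit(lines, "\t", fixed = TRUE)))
}

# Demo: write a small gzipped file whose rows carry six columns.
demo_file <- tempfile(fileext = ".tsv.gz")
con <- gzfile(demo_file, open = "wt")
writeLines(c(
  "GL000008.2\t61\t133\tTGCACATTACAGAATGGCACTG\t3\t.",
  "GL000008.2\t61\t402\tCCTACGAGAGAGTGCCTAACAA\t4\t."
), con)
close(con)

n_cols <- count_fragment_columns(demo_file)
print(n_cols)
```

A result other than 5, or more than one distinct value, suggests the file does not follow the 10x layout.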

@mkojima123

Thank you for your kind explanation.
As you mentioned, it seems they used their own scripts after SnapATAC.

> So you need to add another column to fill the fifth column, a readSupport column, in your fragment file.

Actually, looking at this fragment file, the readSupport column is already filled, like this:

GL000008.2      61      133     TGCACATTACAGAATGGCACTG  3       .
GL000008.2      61      402     CCTACGAGAGAGTGCCTAACAA  4       .
GL000008.2      450     619     CCGCGTAAGTCGAACGATACAG  1       .

Please let me know if my understanding is wrong.
Thank you.

@rootze
Collaborator

rootze commented Apr 18, 2023

Yeah, it seems that way. What is the 6th (last) column, by the way? Maybe it has something to do with that; I am not sure. Sorry, I have never worked on snapATAC data before, but I am happy to help as much as I can.
By the way, what do you supply for fragments_dir? Could you give me an example, and also an example of one of the fragment paths? Just checking.

@mkojima123

It's probably the strand. Almost all samples have . in this column.

An example fragments_dir is /home/name/work/scATAC/all/, and an example fragment file is sample1__GSM5589375_liver_SM-A8WNZ_rep1.fragments.txt.gz.
I renamed the fragment files to match the Arrow file names, e.g. sample1__GSM5589375_liver_SM-A8WNZ_rep1.arrow

@rootze
Collaborator

rootze commented Apr 19, 2023

@mkojima123 If your complete path to the fragment file is /home/name/work/scATAC/all/sample1__GSM5589375_liver_SM-A8WNZ_rep1.fragments.txt.gz, and the error has nothing to do with the 6th column, then I am running out of ideas at this stage.
Arrow files are not what Signac needs. The error you got comes from the Signac package.
If you could provide your code, from constructing the ArchR project through the conversion to Signac with ArchRtoSignac, maybe I can look at it and reproduce your error.

@mkojima123

I see. Thank you.
I'll check about Signac. If something comes up, I will let you know.

@mkojima123

Hi.
It worked by deleting the 6th column. It needed to have 5 columns.

Thank you for your support.
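
For anyone hitting the same error, dropping the extra column can be sketched in base R as below. This is an illustrative sketch, not the exact commands used in this thread: the helper name keep_first_five_columns and the chunk size are assumptions, and the output is written as plain text.

```r
# Hypothetical helper: copy a gzipped fragment file, keeping only the first
# five tab-separated columns. Comment lines starting with "#" are kept as-is.
keep_first_five_columns <- function(in_path, out_path, chunk_size = 100000L) {
  in_con <- gzfile(in_path, open = "rt")
  on.exit(close(in_con), add = TRUE)
  out_con <- file(out_path, open = "wt")
  on.exit(close(out_con), add = TRUE)
  repeat {
    lines <- readLines(in_con, n = chunk_size)
    if (length(lines) == 0L) break
    is_comment <- startsWith(lines, "#")
    fields <- strsplit(lines[!is_comment], "\t", fixed = TRUE)
    lines[!is_comment] <- vapply(
      fields,
      function(f) paste(f[seq_len(min(5L, length(f)))], collapse = "\t"),
      character(1)
    )
    writeLines(lines, out_con)
  }
  invisible(out_path)
}

# Demo on a synthetic one-line file with six columns.
demo_in <- tempfile(fileext = ".txt.gz")
con <- gzfile(demo_in, open = "wt")
writeLines("GL000008.2\t61\t133\tTGCACATTACAGAATGGCACTG\t3\t.", con)
close(con)

demo_out <- tempfile(fileext = ".tsv")
keep_first_five_columns(demo_in, demo_out)
trimmed <- readLines(demo_out)
print(trimmed)
```

Signac reads fragment files through a tabix index, so the trimmed file would still need to be bgzip-compressed and indexed afterwards, for example with the bgzip and tabix command-line tools or with Rsamtools::bgzip and Rsamtools::indexTabix.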

@rootze
Collaborator

rootze commented Apr 24, 2023

@mkojima123 Great! Glad it works. Thanks again for using ArchRtoSignac. I will close this issue. Please feel free to reopen it or open a new issue if you have more questions.

@rootze rootze closed this as completed Apr 24, 2023
@A-legac45

Hello, I am also having trouble with the fragment files, which are the output of Cell Ranger for 10x multiome data.

seurat_atac <- ArchR2Signac(
  ArchRProject = project_Peaks_MACS2_RES0.9,
  refversion = "mm10",
  #samples = samplelist, # list of samples in the ArchRProject (default will use ArchRProject@cellColData$Sample but another list can be provided)
  fragments_dir = inputFiles,
  pm = pkm, # peak matrix from getPeakMatrix()
  fragments_file_extension = '_fragments.tsv.gz',
  fragments_fromcellranger = "YES", # fragments_fromcellranger This is an Yes or No selection ("NO" | "N" | "No" or "YES" | "Y" | "Yes")
  annotation = annotation # annotation from getAnnotation()
)
[1] "In Progress:"
[1] "Prepare Seurat list for each sample"
[1] "First_try_multiomic_archr"
[1] 121983 3733
Error in CreateFragmentObject(path = fragments, cells = cells, validate.fragments = validate.fragments, :
Fragment file does not exist.

For InputFiles I tried several values:
InputFiles <- c("/Users/alegac/Library/CloudStorage/OneDrive-INSTITUTCURIE/Projet_linda_mutiomic_novembre/KDI_2017426_2023-11-15_16-59-39/")
InputFiles <- c("/Users/alegac/Library/CloudStorage/OneDrive-INSTITUTCURIE/Projet_linda_mutiomic_novembre/KDI_2017426_2023-11-15_16-59-39")
InputFiles <- c("/Users/alegac/Library/CloudStorage/OneDrive-INSTITUTCURIE/Projet_linda_mutiomic_novembre/KDI_2017426_2023-11-15_16-59-39/atac_fragments.tsv.gz")

Thanks for your help.
