fragments_dir in ArchR2Signac #3

yutongo · 2022-10-21T06:30:27Z

Thank you for the package to convert an ArchRProject to a SeuratObject!

I have a question about fragments_dir in the function ArchR2Signac. Could you provide an example of fragments_dir for the ArchRProject? I tried the folder containing "fragments.tsv.gz(tbi)" or "fragments.arrow" but got the error:

Error in CreateFragmentObject(path = fragments, cells = cells, validate.fragments = validate.fragments, :
Fragment file does not exist.

Thank you.

rootze · 2022-10-24T22:05:24Z

Hello @yutongo, Thanks for using ArchR2Signac.

For example, if you use 10X Genomics and cellranger-atac count to process your fastq.gz files, you will likely have the cellranger-atac count outputs (fragments.tsv.gz) in a directory under the path ./SampleID/outs/.

fragments_dir <- "path_to_cellranger_atac_output" # the directory before "/outs/" for all samples

So fragments_dir is the directory that leads to the fragments.tsv.gz files. Since the code appends the /outs/ part by default, you only need to provide everything before /outs/.

Example :
/home/PATH_to_ProjectFolder/cellranger_out/Sample10/
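
If it helps, the layout described above can be checked before calling ArchR2Signac. This is only a minimal sketch, not the package's own code: the helper name check_cellranger_fragments is hypothetical, and it assumes that each sample directory (the part before /outs/) contains outs/fragments.tsv.gz with a .tbi index next to it.

```r
# Hypothetical helper (not part of ArchRtoSignac): for each sample directory,
# report whether outs/fragments.tsv.gz and its .tbi index are present.
check_cellranger_fragments <- function(sample_dirs) {
  # Drop trailing slashes so that both "dir" and "dir/" give the same path
  sample_dirs <- sub("/+$", "", sample_dirs)
  fragments <- file.path(sample_dirs, "outs", "fragments.tsv.gz")
  data.frame(
    sample_dir = sample_dirs,
    fragments = fragments,
    fragments_found = file.exists(fragments),
    index_found = file.exists(paste0(fragments, ".tbi")),
    stringsAsFactors = FALSE
  )
}

# Usage with the placeholder path from the example above
print(check_cellranger_fragments("/home/PATH_to_ProjectFolder/cellranger_out/Sample10/"))
```

A FALSE in fragments_found would point to the same situation as the "Fragment file does not exist" error.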

I hope this makes sense. Please let me know if this solves your problem. Thank you!

yutongo · 2022-10-24T23:44:40Z

Hello

Thank you for the quick response.

However, I got the fragment files from SnapTools instead of cellranger-atac count, so I only have the fragments.tsv.gz(tbi) files, which are sufficient as inputs for ArchR. Providing the path to the fragments.tsv.gz(tbi) files produces the error above. What other files do I need in order to use the function CreateFragmentObject?

Thank you.

rootze · 2022-10-25T02:10:46Z

Hello @yutongo,

Thanks for letting me know. I have never worked with SnapTools before, but it should be a quick fix on my end to update the package or the function to fit your input.
If you could share the full path to two of your fragments.tsv.gz(tbi) files as examples, as well as a tree listing of the other files in the same folder, I will make the changes accordingly in the source code.

I will also look into the SnapTools output and file formats, and make changes as soon as I can.
Thank you for your understanding.

Regards,
Ze

yutongo · 2022-10-25T05:40:58Z

Hello Ze,

Thank you!

Path:
/Users/yutongo/Documents/ATAC/ATAC_ALL_NEW

Tree list:
.
├── sample1.tsv.gz
├── sample1.tsv.gz.tbi
├── sample2.tsv.gz
├── sample2.tsv.gz.tbi
├── sample3.tsv.gz
├── sample3.tsv.gz.tbi
├── sample4.tsv.gz
├── sample4.tsv.gz.tbi
├── sample1.arrow
├── sample2.arrow
├── sample3.arrow
├── sample4.arrow
├── ArchRLogs
│   ├── ArchR-addClusters-c7937f9bf4f5-Date-2022-10-22_Time-22-54-51.log
│   ├── ArchR-addDoubletScores-c79341433395-Date-2022-10-21_Time-09-59-08.log
│   ├── ArchR-addGeneIntegrationMatrix-c79324480789-Date-2022-10-22_Time-23-15-40.log
│   ├── ArchR-addIterativeLSI-c793f9f96b4-Date-2022-10-23_Time-10-24-35.log
│   ├── ArchR-createArrows-c7932e5bf973-Date-2022-10-21_Time-07-53-55.log
│   ├── ArchR-getMarkerFeatures-c793476a91b8-Date-2022-10-21_Time-12-04-32.log
│   ├── ArchR-plotEmbedding-c793f2d90f1-Date-2022-10-23_Time-16-53-21.log
│   ├── ArchR-plotFragmentSizes-c79310cbe77a-Date-2022-10-21_Time-10-29-08.log
│   ├── ArchR-plotMarkerHeatmap-c793254baab0-Date-2022-10-21_Time-12-15-09.log
│   └── ArchR-plotTSSEnrichment-c7933aca0333-Date-2022-10-21_Time-11-07-06.log
├── HemeTutorial
│   ├── ArrowFiles
│   │   ├── sample1.arrow
│   │   ├── sample2.arrow
│   │   ├── sample3.arrow
│   │   └── sample4.arrow
│   ├── Embeddings
│   │   └── Save-Uwot-UMAP-Params-IterativeLSI2-c7935e9fc38f-Date-2022-10-21_Time-12-04-17.tar
│   ├── ImputeWeights
│   │   ├── Impute-Weights-Rep-1
│   │   └── Impute-Weights-Rep-2
│   ├── IterativeLSI2
│   │   ├── Save-LSI-Iteration-1.pdf
│   │   ├── Save-LSI-Iteration-1.rds
│   │   ├── Save-LSI-Iteration-2.pdf
│   │   ├── Save-LSI-Iteration-2.rds
│   │   ├── Save-LSI-Iteration-3.pdf
│   │   └── Save-LSI-Iteration-3.rds
│   ├── Plots
│   │   ├── GeneScores-Marker-Heatmap.pdf
│   │   ├── GeneScores-Markerall-Heatmap.pdf
│   │   ├── Plot-UMAP-Marker-Genes-RNA-W-Imputation.pdf
│   │   ├── Plot-UMAP-Marker-Genes-WO-Imputation.pdf
│   │   ├── Plot-UMAP-RNA-Integration.pdf
│   │   ├── Plot-UMAP-Remap-Clusters.pdf
│   │   ├── Plot-UMAP-Sample-Clusters.pdf
│   │   ├── QC-Sample-FragSizes-TSSProfile.pdf
│   │   ├── QC-Sample-Statistics.pdf
│   │   ├── percentage_cluster.pdf
│   │   └── percentage_cluster_col1.pdf
│   └── RNAIntegration
│       └── GeneIntegrationMatrix
│           ├── Save-Block1-JointCCA-UMAP.pdf
│           ├── Save-Block1-JointCCA.rds
│           ├── Save-Block2-JointCCA-UMAP.pdf
│           ├── Save-Block2-JointCCA.rds
│           ├── Save-Block3-JointCCA-UMAP.pdf
│           ├── Save-Block3-JointCCA.rds
│           ├── Save-Block4-JointCCA-UMAP.pdf
│           ├── Save-Block4-JointCCA.rds
│           ├── Save-Block5-JointCCA-UMAP.pdf
│           └── Save-Block5-JointCCA.rds
├── QualityControl
│   ├── sample1
│   │   ├── sample1-Doublet-Summary.pdf
│   │   ├── sample1-Doublet-Summary.rds
│   │   ├── sample1-Fragment_Size_Distribution.pdf
│   │   ├── sample1-Pre-Filter-Metadata.rds
│   │   └── sample1-TSS_by_Unique_Frags.pdf
│   ├── sample2
│   │   ├── sample2-Doublet-Summary.pdf
│   │   ├── sample2-Doublet-Summary.rds
│   │   ├── sample2-Fragment_Size_Distribution.pdf
│   │   ├── sample2-Pre-Filter-Metadata.rds
│   │   └── sample2-TSS_by_Unique_Frags.pdf
│   ├── sample3
│   │   ├── sample3-Doublet-Summary.pdf
│   │   ├── sample3-Doublet-Summary.rds
│   │   ├── sample3-Fragment_Size_Distribution.pdf
│   │   ├── sample3-Pre-Filter-Metadata.rds
│   │   └── sample3-TSS_by_Unique_Frags.pdf
│   └── sample4
│       ├── sample4-Doublet-Summary.pdf
│       ├── sample4-Doublet-Summary.rds
│       ├── sample4-Fragment_Size_Distribution.pdf
│       ├── sample4-Pre-Filter-Metadata.rds
│       └── sample4-TSS_by_Unique_Frags.pdf
├── Save-ProjHeme2
│   ├── ArrowFiles
│   │   ├── sample1.arrow
│   │   ├── sample2.arrow
│   │   ├── sample3.arrow
│   │   └── sample4.arrow
│   ├── Embeddings
│   │   └── Save-Uwot-UMAP-Params-IterativeLSI2-c7935e9fc38f-Date-2022-10-21_Time-12-04-17.tar
│   ├── IterativeLSI2
│   │   ├── Save-LSI-Iteration-1.pdf
│   │   ├── Save-LSI-Iteration-1.rds
│   │   ├── Save-LSI-Iteration-2.pdf
│   │   ├── Save-LSI-Iteration-2.rds
│   │   ├── Save-LSI-Iteration-3.pdf
│   │   └── Save-LSI-Iteration-3.rds
│   ├── Plots
│   │   ├── GeneScores-Marker-Heatmap.pdf
│   │   ├── GeneScores-Markerall-Heatmap.pdf
│   │   ├── Plot-UMAP-Marker-Genes-RNA-W-Imputation.pdf
│   │   ├── Plot-UMAP-Marker-Genes-WO-Imputation.pdf
│   │   ├── Plot-UMAP-RNA-Integration.pdf
│   │   ├── Plot-UMAP-Remap-Clusters.pdf
│   │   ├── Plot-UMAP-Sample-Clusters.pdf
│   │   ├── QC-Sample-FragSizes-TSSProfile.pdf
│   │   ├── QC-Sample-Statistics.pdf
│   │   ├── percentage_cluster.pdf
│   │   └── percentage_cluster_col1.pdf
│   ├── RNAIntegration
│   │   ├── GeneIntegrationMatrix
│   │   ├── Save-Block1-JointCCA-UMAP.pdf
│   │   ├── Save-Block1-JointCCA.rds
│   │   ├── Save-Block2-JointCCA-UMAP.pdf
│   │   ├── Save-Block2-JointCCA.rds
│   │   ├── Save-Block3-JointCCA-UMAP.pdf
│   │   ├── Save-Block3-JointCCA.rds
│   │   ├── Save-Block4-JointCCA-UMAP.pdf
│   │   ├── Save-Block4-JointCCA.rds
│   │   ├── Save-Block5-JointCCA-UMAP.pdf
│   │   └── Save-Block5-JointCCA.rds
│   └── Save-ArchR-Project.rds
├── tmp
│   └── tmp-c793fe721b3-Date-2022-10-21_Time-10-07-53
│       ├── ArrowFiles
│       └── IterativeLSI

rootze · 2022-10-25T17:17:50Z

@yutongo Thank you for letting me know about this. I will make some adjustments to the source code and update you later. I'm a little busy this week, but I will definitely fix this error before next week. Thank you for your understanding.

rootze · 2022-10-26T05:03:51Z

@yutongo I have updated the package to fit your file and path format. Please update the ArchRtoSignac package before using it; it should now be at version 1.0.1.
To convert ArchR to Signac with the function ArchR2Signac, in your example:

fragments_dir <- '/Users/yutongo/Documents/ATAC/ATAC_ALL_NEW/'

#Conversion function
seurat_atac <- ArchR2Signac(
  ArchRProject = YOUR_ArchRProj, # YOUR_ArchRProj is your ArchRProject  
  fragments_dir = fragments_dir, 
  pm = pm, # getting peak matrix
  fragments_fromcellranger = "NO",
  fragments_file_extension = '.tsv.gz',
  refversion = 'hg38', # make sure this fits your choice 
  annotation = annotations
)

I have tested the above update on the ArchR-provided samples, but please let me know whether it works for you. Thank you!
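
For the non-cellranger layout, the file names can be checked up front with a small sketch like the one below. It is not the package's code: the helper name expected_fragment_files is hypothetical, and it assumes that each fragment file is named as the sample name followed by fragments_file_extension, directly inside fragments_dir (as in the sample1.tsv.gz listing above).

```r
# Hypothetical helper (not part of ArchRtoSignac): list the fragment files
# expected for a flat layout, i.e. <fragments_dir>/<sample><extension>.
expected_fragment_files <- function(fragments_dir, samples, extension = ".tsv.gz") {
  # Drop trailing slashes so that both "dir" and "dir/" give the same path
  dir_clean <- sub("/+$", "", fragments_dir)
  fragments <- file.path(dir_clean, paste0(samples, extension))
  data.frame(
    sample = samples,
    fragments = fragments,
    fragments_found = file.exists(fragments),
    index_found = file.exists(paste0(fragments, ".tbi")),
    stringsAsFactors = FALSE
  )
}

# Usage with the directory and sample names from this thread
print(expected_fragment_files(
  "/Users/yutongo/Documents/ATAC/ATAC_ALL_NEW/",
  paste0("sample", 1:4)
))
```

The sample names passed in should match the values in ArchRProject@cellColData$Sample.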

Also, for a detailed step-by-step tutorial for ArchRtoSignac, you can find it in the STAR protocol: https://doi.org/10.1016/j.xpro.2022.101491

Thank you once again for using ArchRtoSignac.

Ze

yutongo · 2022-10-27T03:13:09Z

Hi Ze,

Thank you for the help!

However, I ran into a new error when following the above example:
'Error in CreateFragmentObject(path = fragments, cells = cells, validate.fragments = validate.fragments, :
Incorrect number of columns found in fragment file'

Is it because of the filtering steps in ArchR? I tried adding the option 'samples = rownames(MY_ArchRProj)',
but got this error:
'Error in ArchR2Signac(ArchRProject = MY_ArchRProj, refversion = "mm10", fragments_dir = "/Users/yutongo/Documents/ATAC/ATAC_ALL_NEW/", :
unused argument (alist())'.

Do you have any suggestions?

Thanks a lot.

rootze · 2022-10-27T05:48:11Z

> 'Error in CreateFragmentObject(path = fragments, cells = cells, validate.fragments = validate.fragments, :
> Incorrect number of columns found in fragment file'

@yutongo Sorry to hear that you ran into a different issue.

Would you mind providing the complete code you ran? Any further information from your ArchRProject metadata would help me understand what might cause the error. I don't know what is left in your data or what you did in your filtering steps, but I think you are most likely right.

> Is it because of the filtering steps in ArchR? I tried adding the option 'samples = rownames(MY_ArchRProj)',

Also, what is in your ArchRProject@cellColData$Sample?
Please run table(ArchRProject@cellColData$Sample) to check.
I have a few ideas about what might cause the problem, and I think you're on the right track, so let's fix it together.
In ArchR2Signac, you don't need to provide the sample list, since by default it is taken from your ArchRProject. I am not sure what rownames(MY_ArchRProj) would return.

rootze · 2022-11-15T09:04:22Z

The ArchRtoSignac package has been updated to version 1.0.2 to solve this problem.
For a possible way of reformatting fragment files to match the cellranger-atac count output, see stuart-lab/signac#748

mkojima123 · 2023-04-17T23:33:46Z

Thank you for the convenient package!
I use ArchRtoSignac version 1.0.3, but the same error occurred.

Error in CreateFragmentObject(path = fragments, cells = cells, validate.fragments = validate.fragments,  :
  Incorrect number of columns found in fragment file

Are there any hints?
Thank you.

rootze · 2023-04-18T00:14:34Z

@mkojima123 Thanks for using the ArchRtoSignac package. Please provide more information, such as which technology your scATAC-seq data came from (for example, 10x, snapATAC, or something else). Additionally, what code did you run? What fragment path(s) did you provide? The more information you can provide, the easier it is for me to interpret your error. Thank you.

mkojima123 · 2023-04-18T05:46:07Z

Thank you for the quick reply.

The fragments are from a public dataset, not 10x data:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE184462

I ran this code:

seurat_atac <- ArchR2Signac(
  ArchRProject = proj,
  refversion = "hg38",
  fragments_dir = fragments_dir,
  pm = pkm,
  fragments_fromcellranger = "No",
  fragments_file_extension = ".fragments.txt.gz",
  annotation = annotations
)

There are *.fragments.txt.gz and *.fragments.txt.gz.tbi in fragments_dir.

rootze · 2023-04-18T06:41:00Z

> It is fragment of public data. Not 10x data.
> https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE184462

Oh, I see. From skimming the link and the paper, this looks like a snapATAC dataset, which would mean the fragment file is missing a column compared to 10x scATAC-seq output. Signac expects the 10x format by default, so you would need to add the fifth column, a readSupport column, to your fragment file. But please double-check that this is snapATAC data before moving forward. I hope this makes sense.

This is the latest format for 10x Genomics scATAC fragment files: https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/output/fragments
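
To see how many columns a fragment file actually has, a quick check along these lines may help. It is a minimal sketch in base R: the helper name count_fragment_columns is hypothetical, and it assumes a tab-separated, gzip- or bgzip-compressed file whose header lines, if any, start with "#".

```r
# Hypothetical helper: tabulate the number of tab-separated fields found in
# the first n lines of a compressed fragment file.
count_fragment_columns <- function(path, n = 1000) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  lines <- readLines(con, n = n)
  # Ignore "#" header lines and empty lines
  lines <- lines[!startsWith(lines, "#") & nzchar(lines)]
  table(lengths(strsplit(lines, "\t", fixed = TRUE)))
}

# Usage (the path is a placeholder):
# count_fragment_columns("path/to/sample.fragments.txt.gz")
```

The names of the returned table are the column counts found; a 10x-style fragment file has five columns (chromosome, start, end, barcode, read support), so any other value is worth investigating.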

mkojima123 · 2023-04-18T07:56:26Z

Thank you for your kind explanation.
As you mentioned, it seems the authors used custom scripts after SnapATAC.

> So you need to add another column to fill the fifth column, a readSupport column, in your fragment file.

Actually, looking at this fragment file, the readSupport column is already filled, like this:

GL000008.2      61      133     TGCACATTACAGAATGGCACTG  3       .
GL000008.2      61      402     CCTACGAGAGAGTGCCTAACAA  4       .
GL000008.2      450     619     CCGCGTAAGTCGAACGATACAG  1       .

Please let me know if my understanding is wrong.
Thank you.

rootze · 2023-04-18T17:12:30Z

Yeah, it seems that way. What is the 6th (last) column, by the way? Maybe the error has something to do with that; I am not sure. Sorry, I have never worked with snapATAC data before, but I am happy to help as much as I can.
Also, what do you supply for fragments_dir? Could you give me an example, along with the full path to one of the fragment files? Just checking.

mkojima123 · 2023-04-19T08:32:09Z

It's probably the strand. Almost all samples have "." in this column.

An example fragments_dir is /home/name/work/scATAC/all/, and a fragment file name is sample1__GSM5589375_liver_SM-A8WNZ_rep1.fragments.txt.gz.
I renamed the fragment files to match the Arrow file names, e.g. sample1__GSM5589375_liver_SM-A8WNZ_rep1.arrow.

rootze · 2023-04-19T20:35:33Z

@mkojima123 If the complete path to your fragment file is /home/name/work/scATAC/all/sample1__GSM5589375_liver_SM-A8WNZ_rep1.fragments.txt.gz and the error has nothing to do with the 6th column, then I am running out of ideas at this stage.
Arrow files are not what Signac needs. The error you got comes from the Signac package.
If you could provide your code, from constructing the ArchR project through the transfer to Signac with ArchRtoSignac, maybe I can look at it and reproduce your error.

mkojima123 · 2023-04-20T01:08:46Z

I see. Thank you.
I'll check about Signac. If something comes up, I will let you know.

mkojima123 · 2023-04-24T04:23:17Z

Hi.
It worked after I deleted the 6th column. The fragment file needed to have 5 columns.

Thank you for your support.
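
For anyone hitting the same error, the column trimming described above could be sketched in base R as follows. This is an illustration under assumptions, not the exact commands used in this thread: the helper name trim_fragment_columns is hypothetical, the input is assumed to be a tab-separated, gzip- or bgzip-compressed file, and the output is written as plain text.

```r
# Hypothetical helper: copy a compressed fragment file to a plain-text file,
# keeping only the first `keep` tab-separated columns of each data line.
trim_fragment_columns <- function(input, output, keep = 5, chunk_lines = 100000L) {
  in_con <- gzfile(input, open = "rt")
  on.exit(close(in_con), add = TRUE)
  out_con <- file(output, open = "wt")
  on.exit(close(out_con), add = TRUE)
  repeat {
    # Read the file in chunks so large fragment files do not fill memory
    lines <- readLines(in_con, n = chunk_lines)
    if (length(lines) == 0) break
    # "#" header lines and empty lines are copied through unchanged
    is_data <- !startsWith(lines, "#") & nzchar(lines)
    fields <- strsplit(lines[is_data], "\t", fixed = TRUE)
    lines[is_data] <- vapply(
      fields,
      function(f) paste(f[seq_len(min(keep, length(f)))], collapse = "\t"),
      character(1)
    )
    writeLines(lines, out_con)
  }
  invisible(output)
}

# Usage (paths are placeholders):
# trim_fragment_columns("sample1.fragments.txt.gz", "sample1.fragments.txt")
```

Signac reads fragment files through a tabix index, so the trimmed file would still need to be compressed with bgzip and indexed with tabix (both from htslib) before passing it to ArchR2Signac.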

rootze · 2023-04-24T18:07:37Z

@mkojima123 Great! Glad it works. Thanks again for using ArchRtoSignac. I will close this issue. Please feel free to reopen it or open a new issue if you have more questions.

A-legac45 · 2023-12-21T12:46:53Z

Hello, I am also having trouble with the fragment files, which are the output of cellranger for 10x multiomic data.

seurat_atac <- ArchR2Signac(
  ArchRProject = project_Peaks_MACS2_RES0.9,
  refversion = "mm10",
  #samples = samplelist, # list of samples in the ArchRProject (default will use ArchRProject@cellColData$Sample but another list can be provided)
  fragments_dir = inputFiles,
  pm = pkm, # peak matrix from getPeakMatrix(),
  fragments_file_extension = '_fragments.tsv.gz',
  fragments_fromcellranger = "YES", # fragments_fromcellranger This is an Yes or No selection ("NO" | "N" | "No" or "YES" | "Y" | "Yes")
  annotation = annotation # annotation from getAnnotation()
)
[1] "In Progress:"
[1] "Prepare Seurat list for each sample"
[1] "First_try_multiomic_archr"
[1] 121983 3733
Error in CreateFragmentObject(path = fragments, cells = cells, validate.fragments = validate.fragments, :
Fragment file does not exist.

For InputFiles, I tried several values:
InputFiles <- c("/Users/alegac/Library/CloudStorage/OneDrive-INSTITUTCURIE/Projet_linda_mutiomic_novembre/KDI_2017426_2023-11-15_16-59-39/")
InputFiles <- c("/Users/alegac/Library/CloudStorage/OneDrive-INSTITUTCURIE/Projet_linda_mutiomic_novembre/KDI_2017426_2023-11-15_16-59-39")
InputFiles <- c("/Users/alegac/Library/CloudStorage/OneDrive-INSTITUTCURIE/Projet_linda_mutiomic_novembre/KDI_2017426_2023-11-15_16-59-39/atac_fragments.tsv.gz")

Thanks for your help.

rootze self-assigned this Oct 24, 2022

rootze mentioned this issue Oct 26, 2022

Error in CreateFragmentObject #2

Closed

rootze mentioned this issue Nov 14, 2022

Adaptation: snapTool fragment files #14

Closed

3 tasks

rootze closed this as completed Nov 15, 2022

rootze added the bug Something isn't working label Nov 15, 2022

rootze reopened this Apr 18, 2023

rootze closed this as completed Apr 24, 2023
