Build genome database for your own genome and #414

Open
Rafaelsoler13 opened this issue Apr 28, 2023 · 7 comments

Comments

@Rafaelsoler13

Hello,

I am trying to run the pipeline for chicken samples and have tried to create a custom genome reference for it. However, after finishing the steps in build_genome_database.md [https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/docs/build_genome_database.md], the process yields a tsv file but fails to create the tss file, reg2map, etc. Are these files needed to run the pipeline? If so, what can I do to get them? (The document linked above doesn't say anything about it.)

Also, I am trying to run the pipeline with the json file generated, and it gives me these errors in the alignment:

~/ENCODE_workflow/atac-seq-pipeline-master/atac/6e18ee31-db88-4de9-b841-2b2f40910291/metadata.json
2023-04-28 16:00:02,728|caper.cromwell|INFO| Workflow failed. Auto-troubleshooting...
* Started troubleshooting workflow: id=6e18ee31-db88-4de9-b841-2b2f40910291, status=Failed
* Found failures JSON object.
[
    {
        "causedBy": [
            {
                "message": "Job atac.align:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job atac.align_mito:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job atac.align:1:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            }
        ],
        "message": "Workflow failed"
    }
]
* Recursively finding failures in calls (tasks)...

==== NAME=atac.align_mito, STATUS=RetryableFailure, PARENT=
SHARD_IDX=0, RC=1, JOB_ID=309275
START=2023-04-28T13:47:31.248Z, END=2023-04-28T13:50:22.132Z
STDOUT=~/ENCODE_workflow/atac-seq-pipeline-master/atac/6e18ee31-db88-4de9-b841-2b2f40910291/call-align_mito/shard-0/execution/stdout
STDERR=~/ENCODE_workflow/atac-seq-pipeline-master/atac/6e18ee31-db88-4de9-b841-2b2f40910291/call-align_mito/shard-0/execution/stderr
STDERR_CONTENTS=
Traceback (most recent call last):
  File "/software/atac-seq-pipeline/src/encode_task_bowtie2.py", line 192, in <module>
    main()
  File "/software/atac-seq-pipeline/src/encode_task_bowtie2.py", line 169, in main
    args.out_dir)
  File "/software/atac-seq-pipeline/src/encode_task_bowtie2.py", line 102, in bowtie2_pe
    tmp_bam=tmp_bam,
  File "/software/atac-seq-pipeline/src/encode_lib_common.py", line 359, in run_shell_cmd
    raise Exception(err_str)
Exception: PID=39, PGID=39, RC=127, DURATION_SEC=0.0
STDERR=/bin/bash: line 1: -1: command not found

This is the json file:

{
    "atac.title" : "Chicken_test_atac_ENCODE",
    "atac.description" : "Test performed to validate the ENCODE pipeline in Chicken",

    "atac.pipeline_type" : "atac",
    "atac.align_only" : false,
    "atac.true_rep_only" : false,

    "atac.genome_tsv" : "/media/victor/disco1/ATAC_non_canonical_species/ENCODE_test_files/chicken_GRCg7b.tsv",

    "atac.paired_end" : true,

    "atac.fastqs_rep1_R1" : [ "~/ATAC_non_canonical_species/raw_data/SRR19213758_1.fastq.gz" ],
    "atac.fastqs_rep1_R2" : [ "~/ATAC_non_canonical_species/raw_data/SRR19213758_2.fastq.gz" ],
    "atac.fastqs_rep2_R1" : [ "~/ATAC_non_canonical_species/raw_data/SRR19213759_1.fastq.gz" ],
    "atac.fastqs_rep2_R2" : [ "~/ATAC_non_canonical_species/raw_data/SRR19213759_2.fastq.gz" ],

    "atac.auto_detect_adapter" : false,

    "atac.multimapping" : 8
}

And this is the tsv file:

ref_fa | ~/ATAC_non_canonical_species/ENCODE_test_files/chicken_GRCg7b.gz
ref_mito_fa | ~/ATAC_non_canonical_species/ENCODE_test_files/chicken_GRCg7b.chrM.fa.gz
mito_chr_name | chrM
regex_bfilt_peak_chr_name | chr[\dWZ]+
chrsz | ~/ATAC_non_canonical_species/ENCODE_test_files/chicken_GRCg7b.chrom.sizes
gensz | 1053332251
bowtie2_idx_tar | ~/ATAC_non_canonical_species/ENCODE_test_files/bowtie2_index/chicken_GRCg7b.tar.gz
bowtie2_mito_idx_tar | ~/ATAC_non_canonical_species/ENCODE_test_files/bowtie2_index/chicken_GRCg7b.chrM.fa.tar.gz
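
As a quick sanity check on the `regex_bfilt_peak_chr_name` row above, the pattern can be tried against a few chromosome names. This is only a sketch: the names in the list are hypothetical examples (not taken from the chrom.sizes file), and it assumes that a name is kept only when the whole name matches the pattern, which may differ from how the pipeline applies it.

```python
import re

# Pattern copied from the regex_bfilt_peak_chr_name row of the tsv file.
pattern = re.compile(r"chr[\dWZ]+")

# Hypothetical chromosome names, for illustration only.
names = ["chr1", "chr28", "chrW", "chrZ", "chrM", "chrUn_random", "NC_052532.1"]

# Assumption: a name passes the filter only if the entire name matches.
kept = [name for name in names if pattern.fullmatch(name)]
print(kept)
```

Under that assumption, names that do not follow the `chr` + digits/W/Z convention would be filtered out, so it is worth comparing the pattern with the first column of the chrom.sizes file.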

Best,

Rafael

@sufyazi

sufyazi commented May 3, 2023

Hi there,

The error message says `STDERR=/bin/bash: line 1: -1: command not found`, so I wonder if this is just a case of dependencies not being installed. Can you double-check what line 1 is referring to here?

Your tsv file looks fine; maybe try using absolute paths (so replace ~ with the full path), and double-check that all the files are where they should be?
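
The path check suggested above could be scripted roughly as follows. This is a minimal sketch, assuming the genome tsv is tab-separated with the key in the first column and the value in the second; `check_genome_tsv` is a hypothetical helper written for this thread, not part of the pipeline.

```python
import os


def check_genome_tsv(tsv_path):
    """Return (key, resolved_path) pairs for local paths that do not exist."""
    missing = []
    with open(tsv_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Skip blank lines and lines without a tab separator.
            if not line.strip() or "\t" not in line:
                continue
            key, value = line.split("\t", 1)
            value = value.strip()
            # Assumption: only values starting with "/" or "~" are local paths;
            # entries such as chrM or the genome size are skipped.
            if not (value.startswith("/") or value.startswith("~")):
                continue
            # Expand "~" so the path that gets checked is absolute.
            resolved = os.path.abspath(os.path.expanduser(value))
            if not os.path.exists(resolved):
                missing.append((key, resolved))
    return missing
```

Calling `check_genome_tsv("chicken_GRCg7b.tsv")` would return an empty list when every listed file is found, and otherwise the keys and expanded paths that need fixing.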

@Rafaelsoler13
Author

I used absolute paths to run it but it still does not work. The error is with Bowtie2:

***
Error: Must specify at least one read input with -U/-1/-2
(ERR): bowtie2-align exited with value 1
STDOUT=

But I am specifying the fastq files correctly:

    "atac.fastqs_rep1_R1" : [ "/media/analysis/ATAC_non_canonical_species/raw_data/SRR19213758_1.fastq.gz" ],
    "atac.fastqs_rep1_R2" : [ "/media/analysis/ATAC_non_canonical_species/raw_data/SRR19213758_2.fastq.gz" ],
    "atac.fastqs_rep2_R1" : [ "/media/analysis/ATAC_non_canonical_species/raw_data/SRR19213759_1.fastq.gz" ],
    "atac.fastqs_rep2_R2" : [ "/media/analysis/ATAC_non_canonical_species/raw_data/SRR19213759_2.fastq.gz" ],

@Rafaelsoler13
Author

I tried to align the samples using Bowtie2 directly on my PC, and it actually works:

bowtie2 --very-sensitive -p 8 -X 2000 -x chicken_bowtie -1 raw_data/SRR19213758_1.fastq.gz -2 raw_data/SRR19213758_2.fastq.gz -S SRR19213758.sam
  11553902 (100.00%) were paired; of these:
    2217557 (19.19%) aligned concordantly 0 times
    8884460 (76.90%) aligned concordantly exactly 1 time
    451885 (3.91%) aligned concordantly >1 times
    ----
    2217557 pairs aligned concordantly 0 times; of these:
      820611 (37.01%) aligned discordantly 1 time
    ----
    1396946 pairs aligned 0 times concordantly or discordantly; of these:
      2793892 mates make up the pairs; of these:
        2499890 (89.48%) aligned 0 times
        215809 (7.72%) aligned exactly 1 time
        78193 (2.80%) aligned >1 times
89.18% overall alignment rate
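
For reference, the overall rate in this summary can be reproduced from the counts above. For paired-end data Bowtie2 reports it per mate, so the sketch below only re-derives the reported figure from the numbers in the log; the variable names are illustrative.

```python
# Counts copied from the Bowtie2 summary above.
pairs = 11553902            # read pairs in the input
mates_unaligned = 2499890   # mates that aligned 0 times

# Each pair contributes two mates.
total_mates = 2 * pairs

# Overall alignment rate = share of mates that aligned at least once.
overall_rate = 100 * (1 - mates_unaligned / total_mates)
print(f"{overall_rate:.2f}% overall alignment rate")
```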

What could be happening?

@sufyazi

sufyazi commented May 4, 2023

Interesting. As I am not a developer I don't think I can help beyond this. There must be something else wrong either during the building of the custom genome, or your installation. Have you tried running a test sample using the default human genome? You should consider trying that first to rule out bad installation.

@Rafaelsoler13
Author

Yes! Actually the tutorial ran without a problem!

@leepc12
Contributor

leepc12 commented May 16, 2023

Sorry for the late response. You don't need those files (tss file, reg2map, etc.). They are extra data for some additional analyses in the pipeline.
Disable the analyses that use those data by adding the following to your input JSON.

{
  "atac.enable_tss_enrich" : false,
  "atac.enable_annot_enrich" : false,
  "atac.enable_compare_to_roadmap" : false,
  "atac.enable_gc_bias" : false
}
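
If editing the input JSON by hand is inconvenient, the four keys above can be merged into an existing file with a short script. This is a sketch only: `disable_optional_analyses` and the file names in the usage note are hypothetical, and the keys are the ones listed above.

```python
import json

# Keys taken from the JSON snippet above.
DISABLE_FLAGS = {
    "atac.enable_tss_enrich": False,
    "atac.enable_annot_enrich": False,
    "atac.enable_compare_to_roadmap": False,
    "atac.enable_gc_bias": False,
}


def disable_optional_analyses(input_json_path, output_json_path):
    """Copy an input JSON, switching off the analyses that need extra data."""
    with open(input_json_path) as handle:
        config = json.load(handle)
    # Existing keys are kept; the four flags are added or overwritten.
    config.update(DISABLE_FLAGS)
    with open(output_json_path, "w") as handle:
        json.dump(config, handle, indent=4)
    return config
```

For example, `disable_optional_analyses("input.json", "input_no_extras.json")` would write a new file and leave the original untouched.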

How did you run Caper? It looks like it ran inside a Docker container. What is the exact command line you used to run Caper, e.g. `caper run atac.wdl -i input.json --docker`?

@junjiemama

> Yes! Actually the tutorial run without a problem!

Hello, can you share the instructions for building a genome for a non-model organism? I ran into so many errors. Thanks in advance.

Mary
