Unable to run the pipeline. invalid jar file error #420

Open
Shashankti opened this issue Jun 22, 2023 · 3 comments

Shashankti commented Jun 22, 2023

Describe the bug

Thank you for the pipeline. However, I am having trouble running it on an HPC cluster with SLURM and Singularity. The run fails with the error:
STDERR=Error: Invalid or corrupt jarfile /usr/users/shashank.tiwari/.caper/womtool_jar/womtool-82.jar

These are the input commands used:

#!/bin/bash
#SBATCH -o outfile-%J
#SBATCH -t 22:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=shashank.tiwari@mpi-dortmund.mpg.de
#SBATCH -c 62
#SBATCH --mem=264G


module load singularity

# Run the program:
INPUT_JSON="~/atac-seq-pipeline/example_input_json/edited.json"

caper hpc submit atac.wdl -i "${INPUT_JSON}" --singularity --leader-job-name Leader
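
One detail in the script above that may be worth ruling out: Bash does not expand `~` inside double quotes, so `INPUT_JSON` holds a literal tilde. Whether Caper expands it on its own is not confirmed in this thread. The sketch below only shows the difference, plus a pre-flight existence check that uses `${HOME}` instead; the check is an illustrative addition, not part of the original script.

```shell
# Inside double quotes the shell keeps "~" as a literal character.
quoted="~/atac-seq-pipeline/example_input_json/edited.json"

# ${HOME} is still expanded inside double quotes.
expanded="${HOME}/atac-seq-pipeline/example_input_json/edited.json"

printf 'quoted:   %s\n' "$quoted"
printf 'expanded: %s\n' "$expanded"

# Hypothetical pre-flight check before handing the path to caper.
if [ ! -f "$expanded" ]; then
    echo "input JSON not found: $expanded" >&2
fi
```

If the file is reported as missing here, the submit command would likely not find it either.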

OS/Platform

  • OS/Platform: Scientific Linux 7 / HPC-GWDG
  • Conda version: 4.10.1 (pipeline run with Singularity)
  • Pipeline version: v2.2.2
  • Caper version: 2.3.1

Caper configuration file

Paste contents of ~/.caper/default.conf.

backend=slurm

# SLURM partition. DEFINE ONLY IF REQUIRED BY YOUR CLUSTER'S POLICY.
# You must define it for Stanford Sherlock.
slurm-partition=medium

# SLURM account. DEFINE ONLY IF REQUIRED BY YOUR CLUSTER'S POLICY.
# You must define it for Stanford SCG.
slurm-account=all

# Local directory for localized files and Cromwell's intermediate files.
# If not defined then Caper will make .caper_tmp/ on CWD or `local-out-dir`.
# /tmp is not recommended since Caper store localized data files here.
local-loc-dir=/scratch1/users/shashank.tiwari/ATACSeq/

cromwell=/usr/users/shashank.tiwari/.caper/cromwell_jar/cromwell-82.jar
womtool=/usr/users/shashank.tiwari/.caper/womtool_jar/womtool-82.jar

Input JSON file

Paste contents of your input JSON file.

{
    "atac.title" : "SW5 (paired end)",
    "atac.description" : "Input JSON for SW5 paired ended sample.",

    "atac.pipeline_type" : "atac",
    "atac.align_only" : false,
    "atac.true_rep_only" : false,

    "atac.genome_tsv" : "/scratch1/users/shashank.tiwari/Genome/hg38.tsv",

    "atac.paired_end" : true,

    "10-040_S2_R1" : [ "/scratch1/users/shashank.tiwari/ATACSeq/Data/Raw/10-040_S2_L001_R1_001.fastq.gz" ],
    "10-040_S2_R2" : [ "/scratch1/users/shashank.tiwari/ATACSeq/Data/Raw/10-040_S2_L001_R2_001.fastq.gz" ],
    "10-230_S29_R1" : [ "/scratch1/users/shashank.tiwari/ATACSeq/Data/Raw/10-230-PAR_S29_L002_R1_001.fastq.gz" ],
    "10-230_S29_R2" : [ "/scratch1/users/shashank.tiwari/ATACSeq/Data/Raw/10-230-PAR_S29_L002_R2_001.fastq.gz" ],


    "atac.auto_detect_adapter" : true,

    "atac.multimapping" : 4
}
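
Cromwell input JSON keys are normally written as `<workflow>.<input>`, so keys such as `10-040_S2_R1` above stand out; the pipeline's input documentation uses names like `atac.fastqs_rep1_R1` for FASTQ lists. As a rough way to spot such keys, the following sketch prints those that lack the `atac.` prefix. It is a line-based grep, not a JSON parser, and the helper name `list_unqualified_keys` is hypothetical.

```shell
# Hypothetical helper: print keys in an input JSON that do not start with "atac.".
# Assumes one key per line, as in the file pasted above.
list_unqualified_keys() {
    grep -oE '^[[:space:]]*"[^"]+"[[:space:]]*:' "$1" \
        | sed -E 's/^[[:space:]]*"([^"]+)".*/\1/' \
        | grep -v '^atac\.' || true   # no match is not an error here
}

# Example call, guarded so it is skipped when the file is absent.
input_json="${HOME}/atac-seq-pipeline/example_input_json/edited.json"
if [ -f "$input_json" ]; then
    list_unqualified_keys "$input_json"
fi
```

Any key it prints is a candidate for renaming to the fully qualified form the workflow expects.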


Troubleshooting result

If you ran caper run without Caper server then Caper automatically runs a troubleshooter for failed workflows. Find troubleshooting result in the bottom of Caper's screen log.

If you ran caper submit with a running Caper server then first find your workflow ID (1st column) with caper list and run caper debug [WORKFLOW_ID].

Paste troubleshooting result.

2023-06-22 14:24:14,770|caper.server_heartbeat|ERROR| Failed to read from a heartbeat file. ~/.caper/default_server_heartbeat
Traceback (most recent call last):
  File "/var/spool/slurm/d/job16044927/slurm_script", line 13, in <module>
    main()
  File "/usr/users/shashank.tiwari/.conda/envs/zarp/lib/python3.11/site-packages/caper/cli.py", line 712, in main
    client(parsed_args)
  File "/usr/users/shashank.tiwari/.conda/envs/zarp/lib/python3.11/site-packages/caper/cli.py", line 318, in client
    subcmd_troubleshoot(c, args)
  File "/usr/users/shashank.tiwari/.conda/envs/zarp/lib/python3.11/site-packages/caper/cli.py", line 582, in subcmd_troubleshoot
    cm = get_single_cromwell_metadata_obj(caper_client, args, 'troubleshoot/debug')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/users/shashank.tiwari/.conda/envs/zarp/lib/python3.11/site-packages/caper/cli.py", line 530, in get_single_cromwell_metadata_obj
    metadata_objs = caper_client.metadata(
                    ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/users/shashank.tiwari/.conda/envs/zarp/lib/python3.11/site-packages/caper/caper_client.py", line 126, in metadata
    return self._cromwell_rest_api.get_metadata(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/users/shashank.tiwari/.conda/envs/zarp/lib/python3.11/site-packages/caper/cromwell_rest_api.py", line 242, in get_metadata
    valid_workflow_ids = self.find_valid_workflow_ids(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/users/shashank.tiwari/.conda/envs/zarp/lib/python3.11/site-packages/caper/cromwell_rest_api.py", line 218, in find_valid_workflow_ids
    workflows = self.find(
                ^^^^^^^^^^
  File "/usr/users/shashank.tiwari/.conda/envs/zarp/lib/python3.11/site-packages/caper/cromwell_rest_api.py", line 477, in find
    result_by_labels = self.find_by_labels(
                       ^^^^^^^^^^^^^^^^^^^^
  File "/usr/users/shashank.tiwari/.conda/envs/zarp/lib/python3.11/site-packages/caper/cromwell_rest_api.py", line 422, in find_by_labels
    resp = self.__request_get(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/users/shashank.tiwari/.conda/envs/zarp/lib/python3.11/site-packages/caper/cromwell_rest_api.py", line 46, in wrapper
    raise ConnectionError(message) from None
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8000): Max retries exceeded with url: /api/workflows/v1/query?additionalQueryResultFields=labels&includeSubworkflows=False&labelor=caper-str-label%3A1604482$

Failed to connect to Cromwell server. Check if Caper server is running. Also check if hostname and port are correct. method=GET, url=http://localhost:8000/api/workflows/v1/query?additionalQueryResultFields=labels&includeSubworkflows=Fa$

leepc12 (Contributor) commented Jun 23, 2023

Please add the following line to your ~/.bashrc:

module load singularity

Also check whether the JAR file named in the error (the Womtool JAR here) is corrupted:

java -jar /usr/users/shashank.tiwari/.caper/womtool_jar/womtool-82.jar --version
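
The error can also come from the Java runtime rather than from the download. Cromwell's documentation lists Java 11 as the requirement for recent releases, so a quick look at `java -version` may be useful before re-downloading anything. A minimal sketch, assuming the usual `version "..."` format in the first line of output; `java_major` is a hypothetical helper, not part of Caper.

```shell
# Hypothetical helper: extract the major Java version from the first line
# of `java -version` output, e.g. 'java version "1.7.0_261"' -> 7.
java_major() {
    # Pull out the quoted version string (1.7.0_261, 11.0.19, 17, ...).
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$ver" in
        # Legacy scheme (Java 8 and older): 1.<major>.<minor>
        1.*) ver=${ver#1.}; printf '%s\n' "${ver%%.*}" ;;
        # Current scheme: <major>[.<minor>...]
        *)   printf '%s\n' "${ver%%[.+-]*}" ;;
    esac
}

# Report the runtime found on PATH, if any.
if command -v java >/dev/null 2>&1; then
    first_line=$(java -version 2>&1 | head -n 1)
    major=$(java_major "$first_line")
    echo "Detected Java major version: ${major:-unknown}"
    if [ "${major:-0}" -lt 11 ]; then
        echo "Java runtime looks older than 11" >&2
    fi
fi
```

On a cluster, the `java` found inside a batch job can differ from the one on the login node, so running the check from within the job script is likely more telling.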

@Shashankti (Author)

Thank you for your reply. I fixed the issue with womtool.jar by changing the Java version from:

java version "1.7.0_261"
OpenJDK Runtime Environment (rhel-2.6.22.2.el7_8-x86_64 u261-b02)
OpenJDK 64-Bit Server VM (build 24.261-b02, mixed mode)

to:

openjdk version "17" 2021-09-14
OpenJDK Runtime Environment Temurin-17+35 (build 17+35)
OpenJDK 64-Bit Server VM Temurin-17+35 (build 17+35, mixed mode, sharing)

The pipeline now starts, but I get the following error during the run:

2023-06-23 22:16:09,111|caper.cromwell_workflow_monitor|INFO| Workflow: id=784566ac-044a-42b7-aec7-641f3dcb3d5d, status=Failed
2023-06-23 22:16:33,278|caper.cromwell_metadata|INFO| Wrote metadata file. /home/mpg02/MDMP/shashank.tiwari/atac-seq-pipeline/atac/784566ac-044a-42b7-aec7-641f3dcb3d5d/metadata.json
2023-06-23 22:16:33,279|caper.cromwell|INFO| Workflow failed. Auto-troubleshooting...
2023-06-23 22:16:33,324|caper.nb_subproc_thread|ERROR| Cromwell failed. returncode=1
2023-06-23 22:16:33,324|caper.cli|ERROR| Check stdout in /home/mpg02/MDMP/shashank.tiwari/atac-seq-pipeline/cromwell.out.1
* Started troubleshooting workflow: id=784566ac-044a-42b7-aec7-641f3dcb3d5d, status=Failed
* Found failures JSON object.
[
    {
        "causedBy": [
            {
                "message": "Job atac.frac_mito:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            { 
                "message": "Job atac.filter:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            }
        ],
        "message": "Workflow failed"
    }
]
* Recursively finding failures in calls (tasks)...

==== NAME=atac.filter, STATUS=RetryableFailure, PARENT=
SHARD_IDX=0, RC=1, JOB_ID=16051282
START=2023-06-23T17:46:07.349Z, END=2023-06-23T17:46:53.076Z
STDOUT=/home/mpg02/MDMP/shashank.tiwari/atac-seq-pipeline/atac/784566ac-044a-42b7-aec7-641f3dcb3d5d/call-filter/shard-0/execution/stdout
STDERR=/home/mpg02/MDMP/shashank.tiwari/atac-seq-pipeline/atac/784566ac-044a-42b7-aec7-641f3dcb3d5d/call-filter/shard-0/execution/stderr
STDERR_CONTENTS=
Traceback (most recent call last):
  File "/software/atac-seq-pipeline/src/encode_task_filter.py", line 438, in <module>
    main()
  File "/software/atac-seq-pipeline/src/encode_task_filter.py", line 344, in main
    args.nth, args.mem_gb, args.out_dir)
  File "/software/atac-seq-pipeline/src/encode_task_filter.py", line 125, in rm_unmapped_lowq_reads_pe
    res_param=get_samtools_res_param('view', nth=nth),
  File "/software/atac-seq-pipeline/src/encode_lib_common.py", line 359, in run_shell_cmd
    raise Exception(err_str)
Exception: PID=262464, PGID=262464, RC=1, DURATION_SEC=0.1
STDERR=[E::hts_open_format] Failed to open file /home/mpg02/MDMP/shashank.tiwari/atac-seq-pipeline/atac/784566ac-044a-42b7-aec7-641f3dcb3d5d/call-filter/shard-0/inputs/87421767/10-040_S2_L001_R1_001.trim.srt.bam
samtools view: failed to open "/home/mpg02/MDMP/shashank.tiwari/atac-seq-pipeline/atac/784566ac-044a-42b7-aec7-641f3dcb3d5d/call-filter/shard-0/inputs/87421767/10-040_S2_L001_R1_001.trim.srt.bam" for reading: No such file or directory
STDOUT=
ln: failed to access '/home/mpg02/MDMP/shashank.tiwari/atac-seq-pipeline/atac/784566ac-044a-42b7-aec7-641f3dcb3d5d/call-filter/shard-0/execution/*.dup.qc': No such file or directory
ln: failed to access '/home/mpg02/MDMP/shashank.tiwari/atac-seq-pipeline/atac/784566ac-044a-42b7-aec7-641f3dcb3d5d/call-filter/shard-0/execution/*.bai': No such file or directory
ln: failed to access '/home/mpg02/MDMP/shashank.tiwari/atac-seq-pipeline/atac/784566ac-044a-42b7-aec7-641f3dcb3d5d/call-filter/shard-0/execution/*.lib_complexity.qc': No such file or directory
ln: failed to access '/home/mpg02/MDMP/shashank.tiwari/atac-seq-pipeline/atac/784566ac-044a-42b7-aec7-641f3dcb3d5d/call-filter/shard-0/execution/*.samstats.qc': No such file or directory

[cromwell.txt](https://github.com/ENCODE-DCC/atac-seq-pipeline/files/11868104/cromwell.txt)

I have also attached the output cromwell file in this message. Please let me know what you think about the problem and if I should raise this in a new issue.

Thank you for your help
cromwell.txt

Shashankti (Author) commented Jun 26, 2023

To add to the above, the Cromwell log file indicates a problem with the call-frac_mito task:

 WorkflowManagerActor: Workflow e0e0e71d-4ee2-4957-8457-0cff43d290b5 failed (during ExecutingWorkflowState): Job atac.frac_mito:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.

Check the content of stderr for potential additional information: /scratch1/users/shashank.tiwari/atac-seq-pipeline/atac/e0e0e71d-4ee2-4957-8457-0cff43d290b5/call-frac_mito/shard-0/attempt-2/execution/stderr.

The stderr file shows:

FileNotFoundError: [Errno 2] No such file or directory: '/scratch1/users/shashank.tiwari/atac-seq-pipeline/atac/e0e0e71d-4ee2-4957-8457-0cff43d290b5/call-frac_mito/shard-0/attempt-2/inputs/-1695124415/10-040_S2_L001_R1_001.trim.srt.no_rt.no_chrM.samstats.qc'
ln: failed to access '/scratch1/users/shashank.tiwari/atac-seq-pipeline/atac/e0e0e71d-4ee2-4957-8457-0cff43d290b5/call-frac_mito/shard-0/attempt-2/execution/*.frac_mito.qc': No such file or directory

However, on checking the directory, I can see that the samstats file is present in the specified location:

lrwxrwxrwx 1 shashank.tiwari MDMP 210 26. Jun 13:44 10-040_S2_L001_R1_001.trim.srt.no_chrM.samstats.qc -> /home/mpg02/MDMP/shashank.tiwari/atac-seq-pipeline/atac/784566ac-044a-42b7-aec7-641f3dcb3d5d/call-align/shard-0/execution/glob-bc1
gwdu101:106 13:59:58 /scratch1/users/shashank.tiwari/atac-seq-pipeline/atac/e0e0e71d-4ee2-4957-8457-0cff43d290b5/call-frac_mito/shard-0/attempt-2/inputs/-1695124415 > nano 10-040_S2_L001_R1_001.trim.srt.no_chrM.samstats.qc 
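
A symlink can show up in `ls -l` and still fail to open if its target is missing or is not visible from where the task runs (for example, a `/home` path that is not bound inside the container). This is only one possible explanation for the error above. A minimal sketch for listing unresolved links in an inputs directory, with `list_broken_links` as a hypothetical helper name:

```shell
# Hypothetical helper: list symlinks under a directory whose targets cannot be resolved.
# With -L, find follows links, so anything still reported as type "l" is a dangling link.
list_broken_links() {
    find -L "$1" -type l 2>/dev/null
}

# Example call (path taken from the listing above), left commented out:
# list_broken_links /scratch1/users/shashank.tiwari/atac-seq-pipeline/atac/e0e0e71d-4ee2-4957-8457-0cff43d290b5/call-frac_mito/shard-0/attempt-2/inputs
```

Running it from the same environment as the failing task (for instance inside the container) is likely more informative than running it on the login node, since the two may see different filesystems.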
