test hic fail on aws #152

Open
jarekgeneg opened this issue Mar 15, 2022 · 7 comments

Comments

@jarekgeneg

When we run the hic test on AWS, it fails because a file can't be read from S3.

OS/Platform

  • OS: Ubuntu 18.04.6 on AWS
  • Conda version: none
  • Pipeline version: tried on dev and 1.11.2
  • Caper version: 2.1.3

Caper configuration file
backend=aws
no-server-heartbeat=True
max-concurrent-workflows=300
max-concurrent-tasks=1000
local-out-dir=/opt/caper/local_out_dir
local-loc-dir=/opt/caper/local_loc_dir
aws-batch-arn=arn:aws:batch:eu-west-2:my_id:job-queue/caper-queue
aws-region=eu-west-2
aws-out-dir=s3://caper-hic
aws-loc-dir=s3://caper-hic/.caper_tmp
cromwell=https://storage.googleapis.com/caper-data/cromwell/cromwell-65-d16af26-SNAP.jar
db=postgresql
postgresql-db-ip=xxxx
postgresql-db-port=5432
postgresql-db-user=xxxxx
postgresql-db-password=xxxxx
postgresql-db-name=xxxxxx

Input JSON file
standard test file: tests/functional/json/test_hic.json

Error log
Started troubleshooting workflow: id=756ed754-2f3d-46e4-8323-194f21bf11fb, status=Failed
Found failures JSON object.
[
  {
    "causedBy": [
      {
        "causedBy": [
          {
            "causedBy": [
              {
                "message": "s3://s3.amazonaws.com/caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-get_ligation_site_regex/get_ligation_site_regex-rc.txt",
                "causedBy": []
              }
            ],
            "message": "Could not read from s3://caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-get_ligation_site_regex/get_ligation_site_regex-rc.txt: s3://s3.amazonaws.com/caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-get_ligation_site_regex/get_ligation_site_regex-rc.txt"
          }
        ],
        "message": "[Attempted 1 time(s)] - IOException: Could not read from s3://caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-get_ligation_site_regex/get_ligation_site_regex-rc.txt: s3://s3.amazonaws.com/caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-get_ligation_site_regex/get_ligation_site_regex-rc.txt"
      },
      {
        "causedBy": [
          {
            "causedBy": [
              {
                "message": "s3://s3.amazonaws.com/caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-normalize_assembly_name/normalize_assembly_name-rc.txt",
                "causedBy": []
              }
            ],
            "message": "Could not read from s3://caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-normalize_assembly_name/normalize_assembly_name-rc.txt: s3://s3.amazonaws.com/caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-normalize_assembly_name/normalize_assembly_name-rc.txt"
          }
        ],
        "message": "[Attempted 1 time(s)] - IOException: Could not read from s3://caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-normalize_assembly_name/normalize_assembly_name-rc.txt: s3://s3.amazonaws.com/caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-normalize_assembly_name/normalize_assembly_name-rc.txt"
      }
    ],
    "message": "Workflow failed"
  }
]

  • Recursively finding failures in calls (tasks)...

==== NAME=hic.normalize_assembly_name, STATUS=Failed, PARENT=
SHARD_IDX=-1, RC=None, JOB_ID=74e5e04d-1af7-4a18-97e7-555fc23f58bd
START=2022-03-15T10:05:48.022Z, END=2022-03-15T10:09:36.113Z
STDOUT=s3://caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-normalize_assembly_name/normalize_assembly_name-stdout.log
STDERR=s3://caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-normalize_assembly_name/normalize_assembly_name-stderr.log

==== NAME=hic.get_ligation_site_regex, STATUS=Failed, PARENT=
SHARD_IDX=-1, RC=None, JOB_ID=32b91045-cf5f-4bca-8f07-4d266d13f97c
START=2022-03-15T10:05:48.585Z, END=2022-03-15T10:09:26.285Z
STDOUT=s3://caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-get_ligation_site_regex/get_ligation_site_regex-stdout.log
STDERR=s3://caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-get_ligation_site_regex/get_ligation_site_regex-stderr.log

The machine was provisioned with the Caper AWS create-environment script.
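
The failures JSON in the error log nests each root cause several `causedBy` levels deep. As a rough sketch only (this is not Caper's own troubleshooting code), the innermost messages can be pulled out with a short recursive walk. The `leaf_messages` helper and the abbreviated sample data with `example-bucket` paths are hypothetical stand-ins for the real log.

```python
import json


def leaf_messages(failure):
    """Collect the messages of the innermost causedBy entries."""
    causes = failure.get("causedBy", [])
    if not causes:
        # No deeper cause: this entry is a root cause.
        return [failure.get("message", "")]
    found = []
    for cause in causes:
        found.extend(leaf_messages(cause))
    return found


# Hypothetical, abbreviated stand-in for the failures JSON shown in the log.
failures = json.loads("""
[
  {
    "message": "Workflow failed",
    "causedBy": [
      {
        "message": "[Attempted 1 time(s)] - IOException: Could not read from s3://example-bucket/call-a/a-rc.txt",
        "causedBy": [
          {
            "message": "s3://s3.amazonaws.com/example-bucket/call-a/a-rc.txt",
            "causedBy": []
          }
        ]
      }
    ]
  }
]
""")

for failure in failures:
    for message in leaf_messages(failure):
        print(message)
```

Run against the real failures JSON, this would list only the `-rc.txt` paths that could not be read, one per failed call.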

@paul-sud
Collaborator

Usually such an error indicates that the job didn't even start executing on Batch. Do you have the backend log available? Looking at your Caper config, you have the following value:

aws-batch-arn=arn:aws:batch:eu-west-2:my_id:job-queue/caper-queue

The my_id looks a bit odd to me; I think you'd usually put your AWS account number there.
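
A quick way to sanity-check the value before starting Caper is a small pattern match. This is a minimal sketch, assuming the standard `aws` partition and a 12-digit account ID; the `check_queue_arn` helper and the pattern are illustrative and not part of Caper.

```python
import re

# Assumed shape: arn:aws:batch:<region>:<12-digit account ID>:job-queue/<queue name>
# Other partitions (for example aws-cn) are deliberately not handled here.
BATCH_QUEUE_ARN = re.compile(
    r"^arn:aws:batch:([a-z0-9-]+):(\d{12}):job-queue/(.+)$"
)


def check_queue_arn(arn):
    """Return the parsed ARN fields, or None if the ARN does not fit the pattern."""
    match = BATCH_QUEUE_ARN.match(arn)
    if match is None:
        return None
    region, account_id, queue = match.groups()
    return {"region": region, "account_id": account_id, "queue": queue}


# The redacted value from the config does not pass, as expected.
print(check_queue_arn("arn:aws:batch:eu-west-2:my_id:job-queue/caper-queue"))
# A value with a placeholder 12-digit account number does.
print(check_queue_arn("arn:aws:batch:eu-west-2:123456789012:job-queue/caper-queue"))
```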

@jarekgeneg
Author

jarekgeneg commented Mar 16, 2022

Regarding the Caper config: yes, the real account name or queue number is there; I just hid it.
AWS Batch starts a virtual machine and then just fails with an error.

And about the backend log: where can I find it?

@jarekgeneg
Author

jarekgeneg commented Mar 16, 2022

Log from the backend:

cromwell_encodedcc_hic-pipeline_1_11_23f2a62da3521400809715244c3eeb9fcfce048da/default/dce5fe3c972e4392a3777d8b608e9d7d:

2022-03-15T10:08:57.007Z  /bin/bash: /var/scratch/fetch_and_run.sh: Is a directory
@ingestionTime: 1647338937082
@log: 076634410064:/aws/batch/job
@logstream: cromwell_encodedcc_hic-pipeline_1_11_23f2a62da3521400809715244c3eeb9fcfce048da/default/dce5fe3c972e4392a3777d8b608e9d7d
@message: /bin/bash: /var/scratch/fetch_and_run.sh: Is a directory
@timestamp: 1647338937007

@paul-sud
Collaborator

The backend log should be a file named something like get_ligation_site_regex.log. The workflow metadata JSON will point you to the right file on S3; in the JSON it should be under the key calls > hic.get_ligation_site_regex > backendLogs > log. This file contains more details about anything that went wrong on the provisioned machine before the task started executing, for instance any issue with localizing input data.
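
The key path described above can be followed with a few lines of Python. This is a sketch under assumptions: it treats each entry under `calls` as a list of attempts (the usual shape of Cromwell metadata), and the `backend_log_paths` helper and the sample `example-bucket` path are hypothetical.

```python
def backend_log_paths(metadata, call_name):
    """Return the backendLogs > log value for every attempt of one call."""
    # Assumption: metadata["calls"][call_name] is a list with one dict per attempt.
    attempts = metadata.get("calls", {}).get(call_name, [])
    return [
        attempt["backendLogs"]["log"]
        for attempt in attempts
        if "log" in attempt.get("backendLogs", {})
    ]


# Hypothetical stand-in for a parsed metadata.json (load the real one with json.load).
metadata = {
    "calls": {
        "hic.get_ligation_site_regex": [
            {
                "backendLogs": {
                    "log": "s3://example-bucket/hic/example-id/call-get_ligation_site_regex/get_ligation_site_regex.log"
                }
            }
        ]
    }
}

for path in backend_log_paths(metadata, "hic.get_ligation_site_regex"):
    print(path)
```

If the list comes back empty, the metadata has no backend log recorded for that call, which is itself a hint that the job never got far enough to write one.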

@jarekgeneg
Author

Question:

In metadata.json:

{
"causedBy": [{
"causedBy": [{
"message": "s3://s3.amazonaws.com/caper-hic/hic/bd74874b-ebed-4b73-8058-f484ac8695bc/call-normalize_assembly_name/normalize_assembly_name-rc.txt",
"causedBy": []
}],
"message": "Could not read from s3://caper-hic/hic/bd74874b-ebed-4b73-8058-f484ac8695bc/call-normalize_assembly_name/normalize_assembly_name-rc.txt: s3://s3.amazonaws.com/caper-hic/hic/bd74874b-ebed-4b73-8058-f484ac8695bc/call-normalize_assembly_name/normalize_assembly_name-rc.txt"
}

Is the URI s3://s3.amazonaws.com/caper-hic/.... valid?
According to the AWS docs (https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html#accessing-a-bucket-using-S3-format), AWS doesn't recommend this type of URI, which is what the hic pipeline uses.

On AWS we gave AWS Batch and AWS EC2 full access to S3; when the pipeline starts, directories and files are created on the S3 volume.

Anyway, I can't find any backend log. Maybe the script that is run to create these files can't move files to S3? The same script creates stdout and stderr for the process (I think), and I can't find those files on the S3 volume either.
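
To compare the two URI forms that appear in the error message, here is a small sketch that reduces both to a bucket and key. It assumes that the `s3.amazonaws.com` host in the longer form is an endpoint prefix placed in front of the bucket name; that reading is an assumption and is not confirmed anywhere in this thread. The `bucket_and_key` helper and the `example-id` path are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: this host is an endpoint prefix, not a bucket name.
ENDPOINT_HOST = "s3.amazonaws.com"


def bucket_and_key(uri):
    """Split an s3:// URI into (bucket, key), tolerating an endpoint host prefix."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {uri}")
    path = parsed.path.lstrip("/")
    if parsed.netloc == ENDPOINT_HOST:
        # Form: s3://s3.amazonaws.com/<bucket>/<key>
        bucket, _, key = path.partition("/")
    else:
        # Form: s3://<bucket>/<key>
        bucket, key = parsed.netloc, path
    return bucket, key


short_form = "s3://caper-hic/hic/example-id/call-x/x-rc.txt"
long_form = "s3://s3.amazonaws.com/caper-hic/hic/example-id/call-x/x-rc.txt"
print(bucket_and_key(short_form) == bucket_and_key(long_form))
```

Under that assumption both forms name the same object, so the unusual-looking URI would not by itself explain the read failure.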

@leepc12

leepc12 commented Mar 17, 2022

I think something went wrong with Caper's AWS backend.

/bin/bash: /var/scratch/fetch_and_run.sh: Is a directory

I will test it on AWS soon and let you know but until then please try on Google Cloud Platform. It's much more stable.

@mziebagg

mziebagg commented Mar 18, 2022

Dear all, thanks a lot for your help.
We are Genegoggle, a startup, and we received some free credits on AWS, so we can't easily migrate to Google Cloud Platform (costs).

@leepc12 On AWS we have prepared an account/organization and configured a VM for the hic pipeline.
If it would speed things up, we can give you access to this test environment.

If you find something, that would be great. Thanks in advance.

BR
Michał
