
AWS configuration - can't find awscli #1064

Closed
olgabot opened this issue Mar 7, 2019 · 16 comments

Comments

@olgabot

olgabot commented Mar 7, 2019

Bug report

Expected behavior and actual behavior

I expected the tutorial.nf run on AWS Batch to produce the same output as in the tutorial description and as I got when running it locally.

Instead, the CloudWatch logs from AWS Batch show that the file /home/ec2-user/miniconda/bin/aws does not exist:

[Screenshot: CloudWatch log from the AWS Batch job, 2019-03-07]

Steps to reproduce the problem

Here is a GitHub repo with the tutorial code and the setup. I use make run to run this command:

	nextflow run tutorial.nf \
		-work-dir s3://olgabot-maca/nextflow-workdir-test/ \
		-bucket-dir s3://olgabot-maca/nextflow-bucket-dir-test/ \
		-with-trace -with-timeline -with-dag -with-report -latest -resume

I followed the AWS custom AMI instructions to install the aws cli, and the executable is definitely there:

[ec2-user@ip-172-31-20-19 ~]$ ll ~/miniconda/bin/aws
-rwxrwxr-x 1 ec2-user ec2-user 834 Mar  7 02:43 /home/ec2-user/miniconda/bin/aws

Here is the nextflow config:

process.executor = 'awsbatch'
process.queue = 'nextflow'
process.container = 'ubuntu'

executor.awscli = '/home/ec2-user/miniconda/bin/aws'

aws {
    region = 'us-west-2'

    client {
        maxConnections = 20
        connectionTimeout = 10000
        uploadStorageClass = 'INTELLIGENT_TIERING'
        storageEncryption = 'AES256'
    }
}

executor {
    name = 'local'
    // Maximum number of cpus
    cpus = 4
}

cloud {
    // Amazon ECS-optimized Nextflow image with AWS cli
    imageId = 'ami-0c323ba3e98b979f9'
    instanceType = 'm4.xlarge'
    subnetId = 'subnet-05222a43'
    autoscale {
        enabled = true
        maxInstances = 20
        terminateWhenIdle = true
    }
}

Program output

Nextflow log

Mar-06 18:45:28.023 [main] DEBUG nextflow.cli.Launcher - $> nextflow run tutorial.nf -work-dir 's3://olgabot-maca/nextflow-workdir-test/' -bucket-dir 's3://olgabot-maca/nextflow-bucket-dir-test/' -with-trace -with-timeline -with-dag -with-report -latest -resume
Mar-06 18:45:28.254 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 19.01.0
Mar-06 18:45:28.276 [main] INFO  nextflow.cli.CmdRun - Launching `tutorial.nf` [naughty_kimura] - revision: 361b274147
Mar-06 18:45:28.303 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: /Users/olgabot/code/nextflow-test/nextflow.config
Mar-06 18:45:28.304 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /Users/olgabot/code/nextflow-test/nextflow.config
Mar-06 18:45:28.368 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Mar-06 18:45:29.386 [main] DEBUG nextflow.Session - Session uuid: 8a41ba94-aeb8-477f-85ab-968c510f1d0a
Mar-06 18:45:29.386 [main] DEBUG nextflow.Session - Run name: naughty_kimura
Mar-06 18:45:29.387 [main] DEBUG nextflow.Session - Executor pool size: 8
Mar-06 18:45:39.422 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 19.01.0 build 5050
  Modified: 22-01-2019 11:19 UTC (03:19 PDT)
  System: Mac OS X 10.12.6
  Runtime: Groovy 2.5.5 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11
  Encoding: UTF-8 (UTF-8)
  Process: 23308@Olgas-MacBook-Pro.local [192.168.1.14]
  CPUs: 8 - Mem: 16 GB (109.3 MB) - Swap: 1 GB (668.8 MB)
Mar-06 18:45:39.459 [main] DEBUG nextflow.file.FileHelper - Creating a file system instance for provider: S3FileSystemProvider
Mar-06 18:45:39.473 [main] DEBUG nextflow.Global - Using AWS credential defined in `default` section in file: /Users/olgabot/.aws/credentials
Mar-06 18:45:39.491 [main] DEBUG nextflow.file.FileHelper - AWS S3 config details: {secret_key=Zl6hSJ.., max_connections=20, upload_storage_class=INTELLIGENT_TIERING, storage_encryption=AES256, access_key=AKIAI2.., region=us-west-2, connection_timeout=10000}
Mar-06 18:45:40.940 [main] DEBUG nextflow.Session - Work-dir: s3://olgabot-maca/nextflow-workdir-test [Mac OS X]
Mar-06 18:45:40.941 [main] DEBUG nextflow.Session - Bucket-dir: s3://olgabot-maca/nextflow-bucket-dir-test
Mar-06 18:45:40.941 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /Users/olgabot/code/nextflow-test/bin
Mar-06 18:45:41.335 [main] DEBUG nextflow.Session - Session start invoked
Mar-06 18:45:41.340 [main] DEBUG nextflow.processor.TaskDispatcher - Dispatcher > start
Mar-06 18:45:41.341 [main] DEBUG nextflow.trace.TraceFileObserver - Flow starting -- trace file: /Users/olgabot/code/nextflow-test/trace.txt
Mar-06 18:45:41.350 [main] DEBUG nextflow.script.ScriptRunner - > Script parsing
Mar-06 18:45:41.728 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Mar-06 18:45:42.036 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: awsbatch
Mar-06 18:45:42.039 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'awsbatch'
Mar-06 18:45:42.050 [main] DEBUG nextflow.executor.Executor - Initializing executor: awsbatch
Mar-06 18:45:42.054 [main] INFO  nextflow.executor.Executor - [warm up] executor > awsbatch
Mar-06 18:45:42.087 [main] DEBUG nextflow.util.ThrottlingExecutor - Creating throttling executor with opts: nextflow.util.ThrottlingExecutor$Options(poolName:AWSBatch-executor, limiter:RateLimiter[stableRate=50.0qps], poolSize:40, maxPoolSize:40, queueSize:5000, maxRetries:10, keepAlive:1m, autoThrottle:true, errorBurstDelay:1s, rampUpInterval:100, rampUpFactor:1.2, rampUpMaxRate:1.7976931348623157E308, backOffFactor:2.0, backOffMinRate:0.0166666667, retryDelay:1s)
Mar-06 18:45:42.105 [main] DEBUG nextflow.util.ThrottlingExecutor - Creating throttling executor with opts: nextflow.util.ThrottlingExecutor$Options(poolName:AWSBatch-reaper, limiter:RateLimiter[stableRate=50.0qps], poolSize:40, maxPoolSize:40, queueSize:5000, maxRetries:10, keepAlive:1m, autoThrottle:true, errorBurstDelay:1s, rampUpInterval:100, rampUpFactor:1.2, rampUpMaxRate:1.7976931348623157E308, backOffFactor:2.0, backOffMinRate:0.0166666667, retryDelay:1s)
Mar-06 18:45:42.106 [main] DEBUG n.cloud.aws.batch.AwsBatchExecutor - Creating parallel monitor for executor 'awsbatch' > pollInterval=10s; dumpInterval=5m
Mar-06 18:45:42.119 [main] DEBUG nextflow.processor.TaskDispatcher - Starting monitor: ParallelPollingMonitor
Mar-06 18:45:42.120 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: awsbatch)
Mar-06 18:45:42.123 [main] DEBUG nextflow.executor.Executor - Invoke register for executor: awsbatch
Mar-06 18:45:42.181 [main] DEBUG nextflow.Global - Using AWS credential defined in `default` section in file: /Users/olgabot/.aws/credentials
Mar-06 18:45:42.291 [main] DEBUG nextflow.Session - >>> barrier register (process: splitLetters)
Mar-06 18:45:42.294 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > splitLetters -- maxForks: 8
Mar-06 18:45:42.404 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: awsbatch
Mar-06 18:45:42.405 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'awsbatch'
Mar-06 18:45:42.405 [main] DEBUG nextflow.executor.Executor - Initializing executor: awsbatch
Mar-06 18:45:42.407 [main] DEBUG nextflow.Session - >>> barrier register (process: convertToUpper)
Mar-06 18:45:42.408 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > convertToUpper -- maxForks: 8
Mar-06 18:45:42.445 [main] DEBUG nextflow.script.ScriptRunner - > Await termination 
Mar-06 18:45:42.446 [main] DEBUG nextflow.Session - Session await
Mar-06 18:45:45.002 [AWSBatch-executor-1] DEBUG n.c.aws.batch.AwsBatchTaskHandler - [AWS BATCH] Created job definition name=nf-ubuntu:4; container=ubuntu
Mar-06 18:45:45.096 [AWSBatch-executor-1] DEBUG n.c.aws.batch.AwsBatchTaskHandler - [AWS BATCH] submitted > job=9b068fe2-4c17-4e20-93f6-503663405719; work-dir=s3://olgabot-maca/nextflow-bucket-dir-test/30/fd8289a3b34fd3802d3dbe5cf1068d
Mar-06 18:45:45.096 [AWSBatch-executor-1] INFO  nextflow.Session - [30/fd8289] Submitted process > splitLetters
Mar-06 18:50:52.217 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor awsbatch > tasks to be completed: 1 -- pending tasks are shown below
~> TaskHandler[id: 1; name: splitLetters; status: SUBMITTED; exit: -; error: -; workDir: s3://olgabot-maca/nextflow-bucket-dir-test/30/fd8289a3b34fd3802d3dbe5cf1068d]
Mar-06 18:51:32.670 [Task monitor] DEBUG n.c.aws.batch.AwsBatchTaskHandler - [AWS BATCH] Cannot read exitstatus for task: `splitLetters`
java.nio.file.NoSuchFileException: /olgabot-maca/nextflow-bucket-dir-test/30/fd8289a3b34fd3802d3dbe5cf1068d/.exitcode
	at com.upplication.s3fs.S3FileSystemProvider.newInputStream(S3FileSystemProvider.java:275)
	at java.nio.file.Files.newInputStream(Files.java:152)
	at java.nio.file.Files.newBufferedReader(Files.java:2784)
	at org.codehaus.groovy.runtime.NioGroovyMethods.newReader(NioGroovyMethods.java:1430)
	at org.codehaus.groovy.runtime.NioGroovyMethods.getText(NioGroovyMethods.java:423)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.runtime.metaclass.ReflectionMetaMethod.invoke(ReflectionMetaMethod.java:54)
	at org.codehaus.groovy.runtime.metaclass.NewInstanceMetaMethod.invoke(NewInstanceMetaMethod.java:56)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:326)
	at groovy.lang.MetaClassImpl.getProperty(MetaClassImpl.java:1859)
	at groovy.lang.MetaClassImpl.getProperty(MetaClassImpl.java:3797)
	at groovy.lang.DelegatingMetaClass.getProperty(DelegatingMetaClass.java:130)
	at org.codehaus.groovy.runtime.InvokerHelper.getProperty(InvokerHelper.java:195)
	at org.codehaus.groovy.runtime.callsite.PojoMetaClassGetPropertySite.getProperty(PojoMetaClassGetPropertySite.java:36)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callGetProperty(AbstractCallSite.java:298)
	at nextflow.cloud.aws.batch.AwsBatchTaskHandler.readExitFile(AwsBatchTaskHandler.groovy:243)
	at nextflow.cloud.aws.batch.AwsBatchTaskHandler.checkIfCompleted(AwsBatchTaskHandler.groovy:232)
	at nextflow.processor.TaskPollingMonitor.checkTaskStatus(TaskPollingMonitor.groovy:612)
	at nextflow.processor.TaskPollingMonitor.checkAllTasks(TaskPollingMonitor.groovy:539)
	at nextflow.processor.TaskPollingMonitor.pollLoop(TaskPollingMonitor.groovy:418)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:104)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:326)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1217)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
	at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:1011)
	at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:994)
	at org.codehaus.groovy.runtime.InvokerHelper.invokeMethodSafe(InvokerHelper.java:97)
	at nextflow.processor.TaskPollingMonitor$_start_closure2.doCall(TaskPollingMonitor.groovy:302)
	at nextflow.processor.TaskPollingMonitor$_start_closure2.call(TaskPollingMonitor.groovy)
	at groovy.lang.Closure.run(Closure.java:492)
	at java.lang.Thread.run(Thread.java:748)
Mar-06 18:51:32.671 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: splitLetters; status: COMPLETED; exit: -; error: -; workDir: s3://olgabot-maca/nextflow-bucket-dir-test/30/fd8289a3b34fd3802d3dbe5cf1068d]
Mar-06 18:51:32.741 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump output of process 'null' -- Cause: java.nio.file.NoSuchFileException: /var/folders/70/9pmmxs613fg12b1kl7gkgfjm0000gn/T/temp-s3-3077773917845044133/.command.out
Mar-06 18:51:32.808 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump error of process 'null' -- Cause: java.nio.file.NoSuchFileException: /var/folders/70/9pmmxs613fg12b1kl7gkgfjm0000gn/T/temp-s3-3983631003767390998/.command.err
Mar-06 18:51:32.863 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump error of process 'null' -- Cause: java.nio.file.NoSuchFileException: /var/folders/70/9pmmxs613fg12b1kl7gkgfjm0000gn/T/temp-s3-8051283222490433711/.command.log
Mar-06 18:51:32.864 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'splitLetters'

Caused by:
  Process `splitLetters` terminated for an unknown reason -- Likely it has been terminated by the external system

Command executed:

  printf 'Hello world!' | split -b 6 - chunk_

Command exit status:
  -

Command output:
  (empty)

Work dir:
  s3://olgabot-maca/nextflow-bucket-dir-test/30/fd8289a3b34fd3802d3dbe5cf1068d

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
Mar-06 18:51:32.872 [Actor Thread 1] DEBUG nextflow.Session - <<< barrier arrive (process: splitLetters)
Mar-06 18:51:32.874 [Actor Thread 5] DEBUG nextflow.Session - <<< barrier arrive (process: convertToUpper)
Mar-06 18:51:32.875 [main] DEBUG nextflow.Session - Session await > all process finished
Mar-06 18:51:32.964 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: Process `splitLetters` terminated for an unknown reason -- Likely it has been terminated by the external system
Mar-06 18:51:33.163 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump error of process 'null' -- Cause: java.nio.file.NoSuchFileException: /var/folders/70/9pmmxs613fg12b1kl7gkgfjm0000gn/T/temp-s3-7504635121669872489/.command.err
Mar-06 18:51:33.209 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump output of process 'null' -- Cause: java.nio.file.NoSuchFileException: /var/folders/70/9pmmxs613fg12b1kl7gkgfjm0000gn/T/temp-s3-5077009789467940105/.command.out
Mar-06 18:51:33.214 [main] DEBUG nextflow.Session - Session await > all barriers passed
Mar-06 18:51:33.214 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: awsbatch)
Mar-06 18:51:33.271 [main] DEBUG nextflow.processor.TaskRun - Unable to dump error of process 'null' -- Cause: java.nio.file.NoSuchFileException: /var/folders/70/9pmmxs613fg12b1kl7gkgfjm0000gn/T/temp-s3-8434136164046564476/.command.err
Mar-06 18:51:33.325 [main] DEBUG nextflow.processor.TaskRun - Unable to dump output of process 'null' -- Cause: java.nio.file.NoSuchFileException: /var/folders/70/9pmmxs613fg12b1kl7gkgfjm0000gn/T/temp-s3-976114138694934598/.command.out
Mar-06 18:51:33.331 [main] DEBUG nextflow.trace.StatsObserver - Workflow completed > WorkflowStats[succeedCount=0; failedCount=1; ignoredCount=0; cachedCount=0; succeedDuration=0ms; failedDuration=331ms; cachedDuration=0ms]
Mar-06 18:51:33.331 [main] DEBUG nextflow.trace.TraceFileObserver - Flow completing -- flushing trace file
Mar-06 18:51:33.334 [main] DEBUG nextflow.trace.ReportObserver - Flow completing -- rendering html report
Mar-06 18:51:33.371 [main] DEBUG nextflow.trace.ReportObserver - Execution report summary data:
  {"splitLetters":{"cpu":null,"mem":null,"time":{"mean":331,"min":331,"q1":331,"q2":331,"q3":331,"max":331,"minLabel":"splitLetters","maxLabel":"splitLetters","q1Label":"splitLetters","q2Label":"splitLetters","q3Label":"splitLetters"},"reads":null,"writes":null,"cpuUsage":null,"memUsage":null,"timeUsage":null}}
Mar-06 18:51:34.963 [main] DEBUG nextflow.trace.TimelineObserver - Flow completing -- rendering html timeline
Mar-06 18:51:34.997 [main] DEBUG nextflow.CacheDB - Closing CacheDB done
Mar-06 18:51:34.998 [main] DEBUG nextflow.Session - AWS S3 uploader shutdown
Mar-06 18:51:35.027 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

Environment

  • Nextflow version: N E X T F L O W ~ version 19.01.0
  • Java version:
 java -showversion
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
  • Operating system: macOS

Additional context

My hunch is that the No such file or directory error happens because the file doesn't exist in the ubuntu Docker image running the workflow, but I thought that shouldn't matter and that the Docker image and the AMI could be independent. Am I misunderstanding something?

@KochTobi

KochTobi commented Mar 7, 2019

Hi :)
Maybe double check that the awscli is installed correctly on your AMI. Also check that you specified the right custom AMI in your compute environment.
You also specify two executors, process.executor = 'awsbatch' and

executor {
         name = 'local'
         // Maximum number of cpus
         cpus = 4
}

maybe try removing the latter.
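
For example, a minimal config sketch (using the queue and AMI-side CLI path from the original post; adjust to your setup) that keeps only the awsbatch-related settings:

process.executor = 'awsbatch'
process.queue = 'nextflow'
process.container = 'ubuntu'

executor.awscli = '/home/ec2-user/miniconda/bin/aws'

aws {
    region = 'us-west-2'
}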

@olgabot
Author

olgabot commented Mar 7, 2019

Hmm I'm confused because the instructions say to add this line:

executor.awscli = '/home/ec2-user/miniconda/bin/aws'

And there's definitely no file named awscli in that folder on the image:

[ec2-user@ip-172-31-20-19 ~]$ ll miniconda/bin/aws*
-rwxrwxr-x 1 ec2-user ec2-user  834 Mar  7 02:43 miniconda/bin/aws
-rwxrwxr-x 2 ec2-user ec2-user  204 Mar  7 00:21 miniconda/bin/aws_bash_completer
-rwxrwxr-x 2 ec2-user ec2-user 1432 Mar  7 00:21 miniconda/bin/aws.cmd
-rwxrwxr-x 1 ec2-user ec2-user 1155 Mar  7 02:43 miniconda/bin/aws_completer
-rwxrwxr-x 2 ec2-user ec2-user 1807 Mar  7 00:21 miniconda/bin/aws_zsh_completer.sh

I'll try removing the local executor and see if that helps.

@KochTobi

KochTobi commented Mar 7, 2019

Maybe also check your compute environment. The cloud config scope does not affect the awsbatch executor, so you need to specify the AMI ami-0c323ba3e98b979f9 in your compute environment via the web console.

@rsuchecki
Contributor

And there's definitely no file named awscli in that folder on the image:

[ec2-user@ip-172-31-20-19 ~]$ ll miniconda/bin/aws*
-rwxrwxr-x 1 ec2-user ec2-user  834 Mar  7 02:43 miniconda/bin/aws

Notice that the file is aws, that is /home/ec2-user/miniconda/bin/aws; awscli is not a file but just the name of the config option to which the path to aws is assigned.
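
So the config line references that aws binary directly, for example (a sketch; in newer Nextflow versions the equivalent option is aws.batch.cliPath):

executor.awscli = '/home/ec2-user/miniconda/bin/aws'    // path to the aws executable itself

// newer-style equivalent
aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'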

@pditommaso
Member

Closing because there's no more feedback.

@shubhamsendre

shubhamsendre commented Mar 30, 2021

Hello @olgabot, I am also encountering the same issue with awscli but have failed to solve it. If you have solved this problem, can you please guide me through the steps you took to solve it?
Thank you.

@Volodymyr128

Doesn't work for me either. I configured a custom AMI with the AWS CLI pre-installed according to the official instructions. Even with aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws' in nextflow.config I get:

bash: line 1: /home/ec2-user/miniconda/bin/aws: No such file or directory

When I log in to the EC2 machine created by AWS Batch, I can see that /home/ec2-user/miniconda/bin/aws exists.
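
One way to narrow this down (a sketch, assuming you can reach the Batch-launched EC2 host and identify the task's container) is to check whether the path is visible inside the running task container, not just on the host:

# on the EC2 host started by the Batch compute environment
docker ps                                                     # find the task container ID
docker exec <container-id> ls -l /home/ec2-user/miniconda/bin/aws
docker inspect <container-id> --format '{{json .Mounts}}'     # which host paths are mounted into the container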

@nathan-will

@Volodymyr128 Did you find a resolution for this? I have the same problem.

@HarryMWinters

Bump. Same issue.

@HarryMWinters

HarryMWinters commented Feb 4, 2023

I'm not sure whether @olgabot and I experienced the same issue but we got the same error message and I ended up here. Here's how I solved my issue.

Background

When process.executor is awsbatch and executor.awscli is specified, Nextflow mounts the grandparent directory of the awscli path from the host instance into the container running the job (see AwsBatchTaskHandler.groovy#L499). Nextflow also uses the awscli path to invoke aws from within the container. This leads to several problems.

Problem 1 - The container cannot execute because it cannot find the AWS binary.

This manifested for me in Batch Job CloudWatch logs like the following.

bash: line 1: /usr/bin/aws: No such file or directory

The problem here is that the process within the container is looking for a binary at /usr/bin/aws and not finding it. In my case this was particularly troubling, since when I spun up a container based on the same image, /usr/bin/aws worked perfectly. The issue arose because of the implicit volume mounting of the /usr directory from the host EC2 instance into the container at /usr. Since the host EC2 instance did not have the AWS CLI installed, mounting the host's /usr directory into the Docker container masked the AWS CLI binary.
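
The masking effect can be reproduced outside of Batch; a minimal sketch using the public ubuntu image and a throwaway empty host directory:

# a directory that has content in the image
docker run --rm ubuntu ls /usr/share/doc | head

# bind-mounting an empty host directory over the same path hides the image's contents
mkdir -p /tmp/empty
docker run --rm -v /tmp/empty:/usr/share/doc ubuntu ls /usr/share/doc    # now empty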

You can check for unexpected mounts and volumes by examining the AWS Batch jobs that are created when you try to run Nextflow.
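
For example (a sketch; substitute a job ID from one of your failed runs), the describe-jobs output shows the mount points and volumes attached to the container:

aws batch describe-jobs --jobs <job-id> \
    --query 'jobs[].container.{mountPoints: mountPoints, volumes: volumes}'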

I got past this by using the Amazon Linux 2022 ECS-optimized AMI for my Batch Compute Environment and pointing the awscli argument at the AWS CLI path on the EC2 instance.

Problem 2 - Odd compatibility errors

bash: /lib/x86_64-linux-gnu/libc.so.6: version 'GLIBC_2.33' not found (required by bash)

Unfortunately, passing --awscli /usr/bin/aws replaces the entire /usr directory in the container with that of the host. Since many binaries are installed to /usr/bin, this causes breakage when a container process calls a program that doesn't exist on the host.

I fixed this by installing the AWS CLI in its own directory structure and passing that.

Other Problems

  • If your AWS CLI path is too short Nextflow tries to mount a volume at the container root and breaks.

Solution

  1. Spin up an EC2 instance running the Amazon Linux 2022 ECS-optimized AMI. (Technically, any ECS-compatible OS should work.)
  2. SSH into the EC2 instance.
  3. Make the necessary directory structure: mkdir -p /nextflow_tools/bin. The names are up to you, but the path should be two levels deep.
  4. Install the AWS CLI within your new directories.
# Download CLI installer
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"

# Download unzip utility
yum install unzip

# Unzip the installer into a temporary directory
unzip awscliv2.zip -d  _tmp_aws_cli_installer

# cd into the installer directory. I'm not sure this is necessary, but I ran into some weird symlink issues without this.
# I was probably just tired
cd _tmp_aws_cli_installer

# Run the install. Keep the binary and its dependencies in /nextflow_tools/bin
./aws/install -i /nextflow_tools/bin/aws_cli_files -b /nextflow_tools/bin/

# Check that the installation worked  
/nextflow_tools/bin/aws

# Remove the tmp install folder
cd && rm -r _tmp_aws_cli_installer
  5. Make an AMI from the instance.
  6. Use that AMI in the AWS Batch Compute Environment your AWS Batch queue feeds into.
  7. Update the Nextflow awscli argument to /nextflow_tools/bin/aws (see the config sketch below).
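
For step 7, the relevant part of nextflow.config would look something like this (a sketch; the queue and region are placeholders, and older Nextflow versions used executor.awscli instead of aws.batch.cliPath):

process.executor = 'awsbatch'
process.queue = '<your-batch-queue>'

aws {
    region = '<your-region>'
    batch {
        cliPath = '/nextflow_tools/bin/aws'
    }
}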

Conclusion

The awscli parameter does some strange things. Good luck

EDIT: Edit for clarity.

@BJWiley233

@Volodymyr128 how do you "log in to the EC2 machine created by AWS Batch"? I can easily SSH into an instance launched from the AMI I indicated in the "Image ID override" section of my compute environment, but how do I confirm these are the same machines:

  1. "login to EC2 machine created by AWS batch"
  2. machine indicated in "Image ID override"

@BJWiley233

BJWiley233 commented Mar 12, 2023

MAKE SURE YOU RUN /path/to/aws configure in your instance before you make an AMI out of it!!!!!! That way you have access to download the .command.run and .command.sh files from your S3 bucket.
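
If you want to do this non-interactively before imaging, a sketch (the values are placeholders):

/path/to/aws configure set aws_access_key_id <your-access-key-id>
/path/to/aws configure set aws_secret_access_key <your-secret-access-key>
/path/to/aws configure set region <your-region>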

@ahummel25

ahummel25 commented Oct 15, 2023

(quoting @HarryMWinters' solution above)

@HarryMWinters Have you by chance gotten this solution to work with a Docker container running on Batch? I've built the following Docker image based on your solution because I also keep getting aws not found errors.

# use the upstream nextflow container as a base image
ARG VERSION=latest
FROM nextflow/nextflow:${VERSION} AS build

FROM amazonlinux:2023 AS final
COPY --from=build /usr/local/bin/nextflow /usr/bin/nextflow

RUN yum update -y --allowerasing \
	&& yum install -y --allowerasing \
	curl \
	hostname \
	java \
	unzip \
	&& yum clean -y all
RUN rm -rf /var/cache/yum

RUN mkdir -p /nextflow_tools/bin
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
RUN yum install unzip
RUN unzip awscliv2.zip -d _tmp_aws_cli_installer
RUN cd _tmp_aws_cli_installer \
	&& ./aws/install -i /nextflow_tools/bin/aws_cli_files -b /nextflow_tools/bin/ \
	&& /nextflow_tools/bin/aws --version \
	&& cd - && rm -r _tmp_aws_cli_installer

ENV JAVA_HOME /usr/lib/jvm/jre-openjdk/

# # invoke nextflow once to download dependencies
RUN nextflow -version

# # install a custom entrypoint script that handles being run within an AWS Batch Job
COPY nextflow.aws.sh /opt/bin/nextflow.aws.sh
RUN chmod +x /opt/bin/nextflow.aws.sh

WORKDIR /opt/work
ENTRYPOINT ["/opt/bin/nextflow.aws.sh"]

I'm trying to run the very simple nextflow-io/hello, but I cannot get it to work. I keep getting bash: line 1: aws: command not found.

This is the Batch command I'm running:

aws batch submit-job \
    --job-name nf-hello \
    --job-queue low-priority-ec2 \
    --job-definition nextflow-dev \
    --container-overrides command=nextflow-io/hello \
    --region us-east-2 \
    --scheduling-priority-override 1 \
    --share-identifier default

I've also tried setting both executor.awscli and aws.batch.cliPath to "/nextflow_tools/bin/aws". Both fail.

@olgabot
Author

olgabot commented Oct 17, 2023

Hi @ahummel25, the one thing I did that was the most helpful was a BUNCH of ls -lha in lots of folders. Try adding these to your Dockerfile or to the script you're running in Nextflow:

ls -lha /usr/bin
ls -lha /usr/local/bin
ls -lha /nextflow_tools/bin

Hope that helps. Good luck!

@stevekm
Contributor

stevekm commented Oct 18, 2023

I see this is an old issue, but it sounds similar to #1865 and aws/aws-cli#4971, where the container used inside the AWS Batch job was not compatible with the aws program; when I had this issue recently, I changed my container from Alpine Linux to Ubuntu and that fixed it.

@HarryMWinters

@HarryMWinters Have you by chance gotten this solution to work with a docker container running on batch? I've built the following docker image following your solution because I also keep getting aws not found errors.

Hey there @ahummel25,

It has been some time since I worked on this, so please forgive any lapses in memory. It's also worth mentioning that I continued encountering issues with Nextflow (core and pipelines), which led me to use Redun and code the pipeline myself. Obviously, YMMV.

First, I think my solution was for AWS Batch and I don't think I needed to make a custom docker image. Just a custom Amazon Machine Image. All of the installation and path manipulation steps relate to making a custom Amazon Machine Image, which is then used when spinning up EC2 instances for the Batch Compute Environment. Batch uses the EC2 instances as docker hosts, and Nextflow mounts the AWS-CLI directory from the host into the docker container.

Second, I agree with @olgabot. IMO, you can never have too many ls calls in this situation. I'm particularly curious whether the binary path contains anything.

Questions:

  1. What are the contents of nextflow.aws.sh?
  2. What AMI are you using in your Batch Compute Environment?
  3. Do the containers you spin up on Batch have a volume mount? If so, what are the host and container paths?

TL;DR

I made a custom Amazon Machine Image (not a Docker image) to get past my error.
