Error staging directory from S3 as input #3110

Closed
jeremykap opened this issue Aug 10, 2022 · 18 comments

@jeremykap

jeremykap commented Aug 10, 2022

Bug report

Expected behavior and actual behavior

I'd like to be able to stage a directory from S3 as input to a task, but I get a java.lang.UnsupportedOperationException when doing so. I'd expect this to work as it does locally, where the local path is symlinked into the task working directory and behaves like any other file.

Steps to reproduce the problem

This workflow takes a directory path as the parameter input_dir and lists its contents in a file:

nextflow.enable.dsl=2


process ListFolder {
    input: 
        path(directory)
    output:
        path("dir_contents.txt")
    """
    ls ${directory}/ > dir_contents.txt
    """
}

workflow {
    input_ch = Channel.fromPath(params.input_dir)
    ListFolder(input_ch)
}

Program output

When running with a local path it succeeds; when running with an S3 path it fails:

nextflow run test.nf --input_dir {S3_PATH}
N E X T F L O W  ~  version 22.04.5
Launching `test.nf` [jovial_hugle] DSL2 - revision: 5877c29902
[-        ] process > ListFolder -
Error executing process > 'ListFolder'

Caused by:
  java.lang.UnsupportedOperationException

from .nextflow.log:

Aug-10 21:15:39.290 [Actor Thread 3] DEBUG nextflow.plugin.PluginUpdater - Installing plugin nf-amazon version: 1.7.2
Aug-10 21:15:39.298 [Actor Thread 3] INFO  org.pf4j.AbstractPluginManager - Plugin 'nf-amazon@1.7.2' resolved
Aug-10 21:15:39.298 [Actor Thread 3] INFO  org.pf4j.AbstractPluginManager - Start plugin 'nf-amazon@1.7.2'
Aug-10 21:15:39.316 [Actor Thread 3] DEBUG nextflow.plugin.BasePlugin - Plugin started nf-amazon@1.7.2
Aug-10 21:15:39.332 [Actor Thread 3] DEBUG nextflow.file.FileHelper - > Added 'S3FileSystemProvider' to list of installed providers [s3]
Aug-10 21:15:39.333 [Actor Thread 3] DEBUG nextflow.file.FileHelper - Started plugin 'nf-amazon' required to handle file: {S3_PATH}
Aug-10 21:15:39.343 [Actor Thread 3] DEBUG nextflow.file.FileHelper - Creating a file system instance for provider: S3FileSystemProvider
Aug-10 21:15:39.353 [Actor Thread 3] DEBUG nextflow.Global - Using AWS credential defined in `default` section in file: /home/ec2-user/.aws/credentials
Aug-10 21:15:39.354 [Actor Thread 3] DEBUG nextflow.file.FileHelper - AWS S3 config details: {secret_key={SECRET_KEY} region=eu-central-1, max_error_retry=5, access_key={ACCESS_EY}
Aug-10 21:15:39.879 [Actor Thread 3] DEBUG c.u.s3fs.S3FileSystemProvider - Using S3 multi-part downloader
Aug-10 21:15:39.882 [Actor Thread 3] DEBUG c.u.s3fs.ng.S3ParallelDownload - Creating S3 download thread pool: workers=10; chunkSize=10 MB; queueSize=10000; max-mem=1 GB; maxAttempts=5; maxDelay=1m 30s; pool-capacity=103
Aug-10 21:15:40.488 [Actor Thread 3] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=2; maxSize=2; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Aug-10 21:15:40.494 [FileTransfer-1] DEBUG nextflow.file.FilePorter - Copying foreign file {S3_PATH} to work dir: /home/ec2-user/work/stage/9f/1e82631cb4bf2e8225d1d8fdf3a65e/test
Aug-10 21:15:40.552 [Actor Thread 3] ERROR nextflow.processor.TaskProcessor - Error executing process > 'ListFolder'

Caused by:
  java.lang.UnsupportedOperationException

java.lang.UnsupportedOperationException: null
	at com.upplication.s3fs.S3FileSystemProvider.getFileAttributeView(S3FileSystemProvider.java:697)
	at java.base/java.nio.file.Files.getFileAttributeView(Files.java:1776)
	at java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:221)
	at java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:282)
	at java.base/java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:328)
	at java.base/java.nio.file.Files.walkFileTree(Files.java:2792)
	at nextflow.file.CopyMoveHelper.copyDirectory(CopyMoveHelper.java:174)
	at nextflow.file.CopyMoveHelper.copyToForeignTarget(CopyMoveHelper.java:202)
	at nextflow.file.FileHelper.copyPath(FileHelper.groovy:939)
	at nextflow.file.FilePorter$FileTransfer.stageForeignFile0(FilePorter.groovy:294)
	at nextflow.file.FilePorter$FileTransfer.stageForeignFile(FilePorter.groovy:261)
	at nextflow.file.FilePorter$FileTransfer.run(FilePorter.groovy:247)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Aug-10 21:15:40.564 [Actor Thread 3] DEBUG nextflow.Session - Session aborted -- Cause: java.lang.UnsupportedOperationException

Environment

  • Nextflow version: Seen in both 22.04.5 + 21.10.6
  • Java version: OpenJDK 11.0.13 (build 11.0.13+7-b1751.21)
  • Operating system: macOS + Linux
  • Bash version: 4.2.46(2)

Additional context

Using this to stage a directory of BWA indices so I don't have to specify the individual files as S3 inputs. I'd like to download them from S3 so I can run the alignment on AWS Batch.

@pditommaso
Member

Can you please include the full .nextflow.log file?

@jeremykap
Author

Aug-11 12:58:29.198 [main] DEBUG nextflow.cli.Launcher - $> nextflow run test.nf --input_dir {S3_PATH}
Aug-11 12:58:29.377 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 22.04.5
Aug-11 12:58:29.537 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 from script declararion
Aug-11 12:58:29.582 [main] INFO  nextflow.cli.CmdRun - Launching `test.nf` [special_marconi] DSL2 - revision: 33d8e41c6c
Aug-11 12:58:29.609 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; plugins-dir=/home/ec2-user/.nextflow/plugins; core-plugins: nf-amazon@1.7.2,nf-azure@0.13.2,nf-console@1.0.3,nf-ga4gh@1.0.3,nf-google@1.1.4,nf-sqldb@0.4.0,nf-tower@1.4.0
Aug-11 12:58:29.611 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
Aug-11 12:58:29.637 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Aug-11 12:58:29.639 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Aug-11 12:58:29.644 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Aug-11 12:58:29.665 [main] INFO  org.pf4j.AbstractPluginManager - No plugins
Aug-11 12:58:29.770 [main] DEBUG nextflow.Session - Session uuid: ee669ace-6b73-4495-b682-48a574fccdea
Aug-11 12:58:29.770 [main] DEBUG nextflow.Session - Run name: special_marconi
Aug-11 12:58:29.773 [main] DEBUG nextflow.Session - Executor pool size: 2
Aug-11 12:58:29.913 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 22.04.5 build 5708
  Created: 15-07-2022 16:09 UTC 
  System: Linux 5.10.130-118.517.amzn2.x86_64
  Runtime: Groovy 3.0.10 on OpenJDK 64-Bit Server VM 11.0.13+7-b1751.21
  Encoding: UTF-8 (UTF-8)
  Process: 4052@{INSTANCE}
  CPUs: 2 - Mem: 7.8 GB (6.7 GB) - Swap: 0 (0)
Aug-11 12:58:29.960 [main] DEBUG nextflow.Session - Work-dir: /home/ec2-user/work [xfs]
Aug-11 12:58:29.960 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /home/ec2-user/bin
Aug-11 12:58:29.979 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
Aug-11 12:58:29.996 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Aug-11 12:58:30.044 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
Aug-11 12:58:30.066 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 3; maxThreads: 1000
Aug-11 12:58:30.200 [main] DEBUG nextflow.Session - Session start invoked
Aug-11 12:58:31.229 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Aug-11 12:58:31.488 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
Aug-11 12:58:31.488 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Aug-11 12:58:31.494 [main] DEBUG nextflow.executor.Executor - [warm up] executor > local
Aug-11 12:58:31.502 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=2; memory=7.8 GB; capacity=2; pollInterval=100ms; dumpInterval=5m
Aug-11 12:58:31.661 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: ListFolder
Aug-11 12:58:31.662 [main] DEBUG nextflow.Session - Ignite dataflow network (2)
Aug-11 12:58:31.669 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > ListFolder
Aug-11 12:58:31.674 [main] DEBUG nextflow.script.ScriptRunner - > Await termination 
Aug-11 12:58:31.674 [main] DEBUG nextflow.Session - Session await
Aug-11 12:58:31.687 [PathVisitor-1] DEBUG nextflow.plugin.PluginUpdater - Installing plugin nf-amazon version: 1.7.2
Aug-11 12:58:31.705 [PathVisitor-1] INFO  org.pf4j.AbstractPluginManager - Plugin 'nf-amazon@1.7.2' resolved
Aug-11 12:58:31.706 [PathVisitor-1] INFO  org.pf4j.AbstractPluginManager - Start plugin 'nf-amazon@1.7.2'
Aug-11 12:58:31.788 [PathVisitor-1] DEBUG nextflow.plugin.BasePlugin - Plugin started nf-amazon@1.7.2
Aug-11 12:58:31.853 [PathVisitor-1] DEBUG nextflow.file.FileHelper - > Added 'S3FileSystemProvider' to list of installed providers [s3]
Aug-11 12:58:31.854 [PathVisitor-1] DEBUG nextflow.file.FileHelper - Started plugin 'nf-amazon' required to handle file: {S3_PATH}
Aug-11 12:58:31.878 [PathVisitor-1] DEBUG nextflow.file.FileHelper - Creating a file system instance for provider: S3FileSystemProvider
Aug-11 12:58:31.898 [PathVisitor-1] DEBUG nextflow.Global - Using AWS credential defined in `default` section in file: /home/ec2-user/.aws/credentials
Aug-11 12:58:31.900 [PathVisitor-1] DEBUG nextflow.file.FileHelper - AWS S3 config details: {secret_key={KEY}.., region=eu-central-1, max_error_retry=5, access_key={KEY}..}
Aug-11 12:58:32.749 [PathVisitor-1] DEBUG c.u.s3fs.S3FileSystemProvider - Using S3 multi-part downloader
Aug-11 12:58:32.753 [PathVisitor-1] DEBUG c.u.s3fs.ng.S3ParallelDownload - Creating S3 download thread pool: workers=10; chunkSize=10 MB; queueSize=10000; max-mem=1 GB; maxAttempts=5; maxDelay=1m 30s; pool-capacity=103
Aug-11 12:58:33.561 [Actor Thread 3] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=2; maxSize=2; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Aug-11 12:58:33.566 [FileTransfer-1] DEBUG nextflow.file.FilePorter - Copying foreign file {S3_PATH} to work dir: /home/ec2-user/work/stage/9f/1e82631cb4bf2e8225d1d8fdf3a65e/test
Aug-11 12:58:33.624 [Actor Thread 3] ERROR nextflow.processor.TaskProcessor - Error executing process > 'ListFolder (1)'

Caused by:
  java.lang.UnsupportedOperationException

java.lang.UnsupportedOperationException: null
	at com.upplication.s3fs.S3FileSystemProvider.getFileAttributeView(S3FileSystemProvider.java:697)
	at java.base/java.nio.file.Files.getFileAttributeView(Files.java:1776)
	at java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:221)
	at java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:282)
	at java.base/java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:328)
	at java.base/java.nio.file.Files.walkFileTree(Files.java:2792)
	at nextflow.file.CopyMoveHelper.copyDirectory(CopyMoveHelper.java:174)
	at nextflow.file.CopyMoveHelper.copyToForeignTarget(CopyMoveHelper.java:202)
	at nextflow.file.FileHelper.copyPath(FileHelper.groovy:939)
	at nextflow.file.FilePorter$FileTransfer.stageForeignFile0(FilePorter.groovy:294)
	at nextflow.file.FilePorter$FileTransfer.stageForeignFile(FilePorter.groovy:261)
	at nextflow.file.FilePorter$FileTransfer.run(FilePorter.groovy:247)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Aug-11 12:58:33.645 [Actor Thread 3] DEBUG nextflow.Session - Session aborted -- Cause: java.lang.UnsupportedOperationException
Aug-11 12:58:33.678 [Actor Thread 3] DEBUG nextflow.Session - The following nodes are still active:
[process] ListFolder
  status=ACTIVE
  port 0: (queue) closed; channel: directory
  port 1: (cntrl) -     ; channel: $

Aug-11 12:58:33.682 [main] DEBUG nextflow.Session - Session await > all process finished
Aug-11 12:58:33.682 [main] DEBUG nextflow.Session - Session await > all barriers passed
Aug-11 12:58:33.696 [main] DEBUG nextflow.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=0; failedCount=0; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=0ms; failedDuration=0ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=0; peakCpus=0; peakMemory=0; ]
Aug-11 12:58:33.872 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Aug-11 12:58:33.872 [main] INFO  org.pf4j.AbstractPluginManager - Stop plugin 'nf-amazon@1.7.2'
Aug-11 12:58:33.872 [main] DEBUG nextflow.plugin.BasePlugin - Plugin stopped nf-amazon
Aug-11 12:58:33.874 [main] DEBUG c.u.s3fs.ng.S3ParallelDownload - Shutdown S3 downloader
Aug-11 12:58:33.874 [main] DEBUG c.u.s3fs.ng.S3ParallelDownload - Shutdown S3 downloader - done
Aug-11 12:58:33.897 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

@pditommaso
Member

What's the output of java -version?

@jeremykap
Author

openjdk 11.0.13 2021-10-19
OpenJDK Runtime Environment JBR-11.0.13.7-1751.21-jcef (build 11.0.13+7-b1751.21)
OpenJDK 64-Bit Server VM JBR-11.0.13.7-1751.21-jcef (build 11.0.13+7-b1751.21, mixed mode)

@pditommaso
Member

The error is caused by the use of a non-official Java distribution, likely installed via Conda. See here for details #2841 (comment)
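One way to check whether the active Java is one of these JetBrains Runtime (JBR) builds is to look for the `JBR` marker in the `java -version` banner, as in the output posted above. This is a minimal sketch, assuming JBR builds always carry that marker; the function name `is_jbr_java` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the `java -version` banner
# read from stdin comes from a JetBrains Runtime (JBR) build.
# Assumption: JBR builds include "JBR" in the banner, as in this issue.
is_jbr_java() {
  grep -q 'JBR'
}

# `java -version` prints its banner on stderr, so merge it into stdout.
if command -v java >/dev/null 2>&1; then
  if java -version 2>&1 | is_jbr_java; then
    echo "WARNING: JetBrains Runtime (JBR) detected; S3 directory staging may fail" >&2
  else
    echo "Java build does not look like JBR"
  fi
fi
```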

@jeremykap
Author

Yup, switching to a different OpenJDK version fixed it. Thanks!

@nick-youngblut
Contributor

> The error is caused by the use of a non-official Java distribution, likely installed via Conda

@pditommaso At least for nf-core, the docs state that Nextflow can be installed via Conda (Bioconda). Do you think one should NOT use Conda to install it? That would also mean not using the nf-core devcontainer (https://github.com/nf-core/tools/blob/master/.devcontainer/devcontainer.json), since it installs Nextflow via Conda.

@pditommaso
Member

Yeah, it should be avoided. Even better, someone should look into patching the Conda (Bioconda) recipe for Java. Tagging @ewels.

@matthdsm
Contributor

Any tips on which specific Java version should be used, then?

@pditommaso
Member

For example, AdoptOpenJDK, or just use https://sdkman.io/ to install it.

@matthdsm
Contributor

So nothing that can be done within the constraints of bioconda/conda-forge?

@pditommaso
Member

The conda-forge recipe must be updated to use an official Java version.

@rpetit3

rpetit3 commented Jan 17, 2023

I think as long as conda-forge is included (e.g. via -c or ~/.condarc), things should be fine, since it builds OpenJDK from https://github.com/openjdk.

Here's the OpenJDK from conda-forge:

mamba create -y -n nextflow -c conda-forge -c bioconda nextflow
mamba activate nextflow
java --version
openjdk 17.0.3-internal 2022-04-19
OpenJDK Runtime Environment (build 17.0.3-internal+0-adhoc..src)
OpenJDK 64-Bit Server VM (build 17.0.3-internal+0-adhoc..src, mixed mode, sharing)

But the issue arises when conda-forge is not included and conda falls back on the main channel, which provides the JBR build:

mamba create -y -n nextflow -c bioconda nextflow
mamba activate nextflow
java --version
openjdk 11.0.13 2021-10-19
OpenJDK Runtime Environment JBR-11.0.13.7-1751.21-jcef (build 11.0.13+7-b1751.21)
OpenJDK 64-Bit Server VM JBR-11.0.13.7-1751.21-jcef (build 11.0.13+7-b1751.21, mixed mode)

@pditommaso
Member

@ewels
Member

ewels commented Jan 17, 2023

The docs already talk about setting up the conda channels properly, but we should remove the -c bioconda from the example command there. Then it will fail if the channels aren't configured as described.

The devcontainer setup should be fine, as it's based on the GitPod image, which configures the channels properly before installing Nextflow.
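For reference, the channel setup the Bioconda documentation recommended around the time of this thread looked roughly like the commands below. Putting conda-forge at the highest priority with strict channel priority should make conda resolve `openjdk` from conda-forge rather than from the main channel's JBR build. Treat this as a sketch: it edits your `~/.condarc`, the current Bioconda and nf-core docs may differ, and the environment name `env_nf` is only an example.

```shell
# `conda config --add` prepends, so the channel added LAST ends up with
# the HIGHEST priority (conda-forge here).
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

# Create a dedicated environment (rather than installing into `base`),
# then confirm which Java build was resolved.
conda create --name env_nf nextflow
conda activate env_nf
java -version
```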

@ewels
Member

ewels commented Jan 17, 2023

@rpetit3 what do you think about making the bioconda recipe require > v11 jdk so that it can't install the JBR one?

It'd be a bit of a hack, but it might help.

@rpetit3

rpetit3 commented Jan 17, 2023

I like your suggestion of dropping the -c bioconda. I would only suggest merging the conda create and conda install commands, to prevent users from going directly to conda install and installing Nextflow into the base environment.

As for the version pinning, we can do that. The only potential issue would be version conflicts, but that might actually be useful to prevent future Conda-related issues from appearing here.

I think if it becomes too common, we could add a post-link check that warns users when an unsupported version of Java is installed and explains how to fix it. This check would happen every time an environment containing Nextflow is activated.

Let me know what you want to do.
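A warning that runs on every activation, as suggested above, could be sketched as a script in the environment's `etc/conda/activate.d/` directory, which conda sources on each `conda activate`. Everything here is hypothetical: the file name, the function name `nextflow_warn_if_jbr`, and the message text; detection assumes the `JBR` marker in the `java -version` banner.

```shell
# Hypothetical file: $CONDA_PREFIX/etc/conda/activate.d/nextflow-java-check.sh
# conda *sources* activate.d scripts, so this must never call `exit`.

# Reads a `java -version` banner on stdin; prints a warning on stderr and
# returns 1 if it looks like a JetBrains Runtime (JBR) build, else returns 0.
nextflow_warn_if_jbr() {
  if grep -q 'JBR'; then
    echo "WARNING: this environment provides a JetBrains Runtime (JBR) build of Java," >&2
    echo "which breaks staging S3 directories in Nextflow (see issue #3110)." >&2
    echo "Fix: recreate the environment with conda-forge as the highest-priority channel." >&2
    return 1
  fi
  return 0
}

# `java -version` writes to stderr; `|| true` keeps activation from failing.
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | nextflow_warn_if_jbr || true
fi
```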

@ewels
Member

ewels commented Jan 17, 2023

nf-core docs updated in nf-core/website#1538 👍🏻

7 participants