Flux jobs that timeout are silently marked as completed #5736

Open
epstein6 opened this issue Jan 31, 2025 · 0 comments

Bug report

Expected behavior and actual behavior

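Expected: when a Flux job exceeds the `time` limit of its process, Nextflow should report the task as failed. Actual: with the flux executor, the timed-out task is silently marked as completed with exit status 0, and the workflow reports success.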

Steps to reproduce the problem

nextflow run main.nf -process.executor=flux

main.nf:

process GOTTCHA2 {
    time "1min"
//    container 'quay.io/biocontainers/gottcha2'
    script:
    """
        sleep 360
    """
}

workflow {
    GOTTCHA2 ()
}

Program output

 N E X T F L O W   ~  version 24.11.0-edge

Launching `modules/local/gottcha2/main.nf` [ecstatic_cray] DSL2 - revision: 364e2f58fb

executor >  flux (1)
[40/55d301] process > GOTTCHA2 [100%] 1 of 1 ✔
Completed at: 31-Jan-2025 15:39:38
Duration    : 1m 41s
CPU hours   : (a few seconds)
Succeeded   : 1

Contents of .nextflow.log:

Jan-31 15:37:56.080 [main] DEBUG nextflow.cli.Launcher - $> nextflow run modules/local/gottcha2/main.nf -process.executor=flux
Jan-31 15:37:56.184 [main] DEBUG nextflow.cli.CmdRun - N E X T F L O W  ~  version 24.11.0-edge
Jan-31 15:37:56.201 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/g/g20/epstein6/.nextflow/plugins; core-plugins: nf-amazon@2.10.0,nf-azure@1.11.0,nf-cloudcache@0.4.2,nf-codecommit@0.2.2,nf-console@1.1.4,nf-google@1.16.0,nf-tower@1.9.3,nf-wave@1.8.0
Jan-31 15:37:56.224 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Jan-31 15:37:56.225 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Jan-31 15:37:56.227 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.12.0 in 'deployment' mode
Jan-31 15:37:56.239 [main] INFO  org.pf4j.AbstractPluginManager - No plugins
Jan-31 15:37:56.287 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: /usr/WS2/epstein6/nf-core-test/nextflow.config
Jan-31 15:37:56.289 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /usr/WS2/epstein6/nf-core-test/nextflow.config
Jan-31 15:37:56.318 [main] DEBUG n.secret.LocalSecretsProvider - Secrets store: /g/g20/epstein6/.nextflow/secrets/store.json
Jan-31 15:37:56.322 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@6f63c44f] - activable => nextflow.secret.LocalSecretsProvider@6f63c44f
Jan-31 15:37:56.327 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Jan-31 15:37:57.972 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 by global default
Jan-31 15:37:57.986 [main] DEBUG nextflow.cli.CmdRun - Launching `modules/local/gottcha2/main.nf` [ecstatic_cray] DSL2 - revision: 364e2f58fb
Jan-31 15:37:57.987 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins declared=[nf-schema@2.3.0]
Jan-31 15:37:57.988 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
Jan-31 15:37:57.988 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[nf-schema@2.3.0]
Jan-31 15:37:57.988 [main] DEBUG nextflow.plugin.PluginUpdater - Installing plugin nf-schema version: 2.3.0
Jan-31 15:37:57.997 [main] INFO  org.pf4j.AbstractPluginManager - Plugin 'nf-schema@2.3.0' resolved
Jan-31 15:37:57.998 [main] INFO  org.pf4j.AbstractPluginManager - Start plugin 'nf-schema@2.3.0'
Jan-31 15:37:58.004 [main] DEBUG nextflow.plugin.BasePlugin - Plugin started nf-schema@2.3.0
Jan-31 15:37:58.046 [main] DEBUG nextflow.Session - Session UUID: 8940bf40-d113-4aa4-ab8d-0b7b23a2739f
Jan-31 15:37:58.046 [main] DEBUG nextflow.Session - Run name: ecstatic_cray
Jan-31 15:37:58.047 [main] DEBUG nextflow.Session - Executor pool size: 96
Jan-31 15:37:58.052 [main] DEBUG nextflow.file.FilePorter - File porter settings maxRetries=3; maxTransfers=50; pollTimeout=null
Jan-31 15:37:58.056 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=288; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Jan-31 15:37:58.074 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 24.11.0-edge build 5929
  Created: 03-12-2024 09:30 UTC (01:30 PDT)
  System: Linux 4.18.0-553.34.1.1toss.t4.x86_64
  Runtime: Groovy 4.0.24 on OpenJDK 64-Bit Server VM 17.0.14-internal+0-adhoc..src
  Encoding: UTF-8 (UTF-8)
  Process: 3120233@<machine>
  CPUs: 96 - Mem: 502 GB (238.3 GB) - Swap: 0 (0)
Jan-31 15:37:58.108 [main] DEBUG nextflow.Session - Work-dir: /usr/WS2/epstein6/nf-core-test/work [nfs]
Jan-31 15:37:58.110 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /usr/WS2/epstein6/nf-core-test/modules/local/gottcha2/bin
Jan-31 15:37:58.126 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
Jan-31 15:37:58.138 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Jan-31 15:37:58.162 [main] DEBUG nextflow.Session - Observer factory: ValidationObserverFactory
Jan-31 15:37:58.207 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
Jan-31 15:37:58.217 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 97; maxThreads: 1000
Jan-31 15:37:58.334 [main] DEBUG nextflow.Session - Session start
Jan-31 15:37:58.337 [main] DEBUG nextflow.trace.TraceFileObserver - Workflow started -- trace file: /usr/WS2/epstein6/nf-core-test/null/pipeline_info/execution_trace_2025-01-31_15-37-56.txt
Jan-31 15:37:58.434 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Jan-31 15:37:58.496 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: flux
Jan-31 15:37:58.497 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'flux'
Jan-31 15:37:58.505 [main] DEBUG nextflow.executor.Executor - [warm up] executor > flux
Jan-31 15:37:58.511 [main] DEBUG n.processor.TaskPollingMonitor - Creating task monitor for executor 'flux' > capacity: 100; pollInterval: 5s; dumpInterval: 5m 
Jan-31 15:37:58.512 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: flux)
Jan-31 15:37:58.515 [main] DEBUG n.executor.AbstractGridExecutor - Creating executor 'flux' > queue-stat-interval: 1m
Jan-31 15:37:58.587 [main] DEBUG nextflow.Session - Config process names validation disabled as requested
Jan-31 15:37:58.588 [main] DEBUG nextflow.Session - Igniting dataflow network (1)
Jan-31 15:37:58.588 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > GOTTCHA2
Jan-31 15:37:58.589 [main] DEBUG nextflow.script.ScriptRunner - Parsed script files:
  Script_df8ffcb13a6bf64d: /usr/WS2/epstein6/nf-core-test/modules/local/gottcha2/main.nf
Jan-31 15:37:58.589 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination 
Jan-31 15:37:58.589 [main] DEBUG nextflow.Session - Session await
Jan-31 15:37:58.734 [Task submitter] DEBUG nextflow.executor.FluxExecutor - Custom memory request is not currently supported by Flux.
Jan-31 15:37:59.490 [Task submitter] DEBUG nextflow.executor.GridTaskHandler - [FLUX] submitted process GOTTCHA2 > jobId: fZkDQVg53Qs; workDir: /usr/WS2/epstein6/nf-core-test/work/40/55d3012a6120940e499b27603e370e
Jan-31 15:37:59.491 [Task submitter] INFO  nextflow.Session - [40/55d301] Submitted process > GOTTCHA2
Jan-31 15:39:33.541 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: fZkDQVg53Qs; id: 1; name: GOTTCHA2; status: COMPLETED; exit: 0; error: -; workDir: /usr/WS2/epstein6/nf-core-test/work/40/55d3012a6120940e499b27603e370e started: 1738366683522; exited: 2025-01-31T23:39:01.911591Z; ]
Jan-31 15:39:33.543 [Task monitor] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'TaskFinalizer' minSize=10; maxSize=288; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Jan-31 15:39:33.577 [main] DEBUG nextflow.Session - Session await > all processes finished
Jan-31 15:39:33.579 [TaskFinalizer-1] DEBUG nextflow.trace.TraceRecord - Not a valid trace `realtime` value: '2'
Jan-31 15:39:38.535 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: flux) - terminating tasks monitor poll loop
Jan-31 15:39:38.536 [main] DEBUG nextflow.Session - Session await > all barriers passed
Jan-31 15:39:38.542 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'TaskFinalizer' shutdown completed (hard=false)
Jan-31 15:39:38.551 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=1; failedCount=0; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=1m 30s; failedDuration=0ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=1; peakCpus=1; peakMemory=6 GB; ]
Jan-31 15:39:38.551 [main] DEBUG nextflow.trace.TraceFileObserver - Workflow completed -- saving trace file
Jan-31 15:39:38.553 [main] DEBUG nextflow.trace.ReportObserver - Workflow completed -- rendering execution report
Jan-31 15:39:38.919 [main] DEBUG nextflow.trace.TimelineObserver - Workflow completed -- rendering execution timeline
Jan-31 15:39:39.059 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Jan-31 15:39:39.077 [main] INFO  org.pf4j.AbstractPluginManager - Stop plugin 'nf-schema@2.3.0'
Jan-31 15:39:39.077 [main] DEBUG nextflow.plugin.BasePlugin - Plugin stopped nf-schema
Jan-31 15:39:39.078 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'FileTransfer' shutdown completed (hard=false)
Jan-31 15:39:39.078 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
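As a sanity check on the `Task completed` entry in the log, the `started` field (epoch milliseconds) can be subtracted from the `exited` field (ISO-8601, UTC) to see how long the job actually ran. A minimal Python sketch that uses only those two logged values:

```python
from datetime import datetime, timedelta, timezone

# Values copied from the "Task completed" log entry.
STARTED_MS = 1738366683522                  # `started:` field, epoch milliseconds
EXITED_ISO = "2025-01-31T23:39:01.911591Z"  # `exited:` field, ISO-8601 UTC

# Build the start time from integer milliseconds to avoid float rounding.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
started = EPOCH + timedelta(milliseconds=STARTED_MS)

# fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it.
exited = datetime.fromisoformat(EXITED_ISO.replace("Z", "+00:00"))

elapsed = (exited - started).total_seconds()
print(f"started {started.isoformat()}  exited {exited.isoformat()}")
print(f"job ran for {elapsed:.1f} s; `sleep 360` needs 360 s")
```

The job ran for about 58 seconds. That is roughly the 1-minute limit set by `time "1min"` and far short of the 360 s the script sleeps, so the job was evidently stopped at the limit even though Nextflow recorded exit status 0.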

Environment

  • Nextflow version: 24.11.0-edge build 5929 (commit b582cb5719620c0d232faca439beb7268129929c)
  • Java version: openjdk 17.0.14-internal 2025-01-21
  • Operating system: Linux (kernel 4.18.0-553.34.1.1toss.t4.x86_64)
  • Shell: zsh 5.5.1 (x86_64-redhat-linux-gnu)

Additional context

I tried updating the status codes mapped in FluxExecutor.groovy (https://github.com/nextflow-io/nextflow/blob/master/modules/nextflow/src/main/groovy/nextflow/executor/FluxExecutor.groovy#L169), since they don't quite match the statuses Flux reports, but that didn't help. The exit code recorded in the task work directory (the .exitcode file) is 0, and I'm not sure where that value comes from or whether it is correct.
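For context on the status-code question, here is a hypothetical Python sketch (not Nextflow's code) of a map from Flux's two-character status abbreviations to coarse queue states that keeps TIMEOUT distinct from a clean completion. The abbreviations (`D`, `P`, `S`, `R`, `C`, `CD`, `F`, `CA`, `TO`) are assumed from the `status_abbrev` field described in flux-jobs(1), the input is assumed to look like the output of something like `flux jobs -a -n -o '{id.f58} {status_abbrev}'`, and `QueueState`, `parse_status` and `task_succeeded` are illustrative names. Because a status map by itself presumably only tells whether the job is still queued, the sketch also checks the exit code separately.

```python
from enum import Enum
from typing import Dict


class QueueState(Enum):
    """Hypothetical coarse queue states (illustrative, not Nextflow's enum)."""
    PENDING = "pending"
    RUNNING = "running"
    HOLD = "hold"
    DONE = "done"        # job left the queue after a clean completion
    ERROR = "error"      # job left the queue abnormally
    UNKNOWN = "unknown"  # abbreviation not recognized


# Assumed Flux `status_abbrev` values (see flux-jobs(1)); verify against your Flux version.
FLUX_STATUS_MAP: Dict[str, QueueState] = {
    "D": QueueState.HOLD,      # DEPEND
    "P": QueueState.PENDING,   # PRIORITY
    "S": QueueState.PENDING,   # SCHED
    "R": QueueState.RUNNING,   # RUN
    "C": QueueState.RUNNING,   # CLEANUP: still finishing, not yet inactive
    "CD": QueueState.DONE,     # COMPLETED
    "F": QueueState.ERROR,     # FAILED
    "CA": QueueState.ERROR,    # CANCELED
    "TO": QueueState.ERROR,    # TIMEOUT: must not count as a clean completion
}


def parse_status(output: str) -> Dict[str, QueueState]:
    """Parse '<jobid> <status_abbrev>' lines into a job-id -> QueueState map."""
    states: Dict[str, QueueState] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        job_id, abbrev = fields
        states[job_id] = FLUX_STATUS_MAP.get(abbrev, QueueState.UNKNOWN)
    return states


def task_succeeded(state: QueueState, exit_code: int) -> bool:
    """Succeed only on a clean Flux completion AND a zero exit code."""
    return state is QueueState.DONE and exit_code == 0
```

For example, `task_succeeded(parse_status("fZkDQVg53Qs TO")["fZkDQVg53Qs"], 0)` returns `False`, which is the outcome this report expects for a timed-out job even when .exitcode contains 0.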

Thanks in advance!
