This repository has been archived by the owner on Apr 19, 2023. It is now read-only.

[BUG] Failed to publish file across filesystems #265

Closed
cflerin opened this issue Dec 7, 2020 · 11 comments
Labels
bug Something isn't working

Comments

@cflerin
Member

cflerin commented Dec 7, 2020

Describe the bug
All publish steps fail to complete when the Nextflow work directory and the current working directory are on different filesystems.

To Reproduce
Steps to reproduce the behavior:

  1. Configure with these options:

Use the NXF_WORK environment variable to direct all working files to a scratch drive that's on a different filesystem than the current working directory. Test with any of the test profiles:

nextflow pull vib-singlecell-nf/vsn-pipelines -r v0.23.0

export NXF_WORK=/ddn1/vol1/site_scratch/leuven/325/vsc32528/vsn_tmp
  2. Run using this entry point:
nextflow run vib-singlecell-nf/vsn-pipelines -profile scenic,test__scenic,singularity -entry scenic -r v0.23.0
  3. See error:
    These are warnings, and the pipeline reports that it completes successfully, but there is no output data in the out directory:
WARN: Failed to publish file: /ddn1/vol1/site_scratch/leuven/325/vsc32528/vsn_tmp/77/c96d18799ed09bb8629fa2b43066fc/expr_mat_tiny.loom; to: /ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/analysis/pbmc_atac/analysis/nextflow/cross_filesystem_publish/out/scenic/scenic_CI/cistarget/expr_mat_tiny.loom [link] -- See log file for details
WARN: Failed to publish file: /ddn1/vol1/site_scratch/leuven/325/vsc32528/vsn_tmp/77/c96d18799ed09bb8629fa2b43066fc/scenic_CI__reg_mtf.csv.gz; to: /ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/analysis/pbmc_atac/analysis/nextflow/cross_filesystem_publish/out/scenic/scenic_CI/cistarget/scenic_CI__reg_mtf.csv.gz [link] -- See log file for details
WARN: Failed to publish file: /ddn1/vol1/site_scratch/leuven/325/vsc32528/vsn_tmp/e4/e0b523315c1aa3a54f20007a054098/expr_mat_tiny.loom; to: /ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/analysis/pbmc_atac/analysis/nextflow/cross_filesystem_publish/out/scenic/scenic_CI/aucell/expr_mat_tiny.loom [link] -- See log file for details
WARN: Failed to publish file: /ddn1/vol1/site_scratch/leuven/325/vsc32528/vsn_tmp/e4/e0b523315c1aa3a54f20007a054098/scenic_CI__auc_mtf.loom; to: /ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/analysis/pbmc_atac/analysis/nextflow/cross_filesystem_publish/out/scenic/scenic_CI/aucell/scenic_CI__auc_mtf.loom [link] -- See log file for details
WARN: Failed to publish file: /ddn1/vol1/site_scratch/leuven/325/vsc32528/vsn_tmp/da/146559f2ea88adaccba7d20b0aa383/scenic_visualize.loom; to: /ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/analysis/pbmc_atac/analysis/nextflow/cross_filesystem_publish/out/scenic/scenic_CI/SCENIC_SCope_output.loom [link] -- See log file for details
WARN: Failed to publish file: /ddn1/vol1/site_scratch/leuven/325/vsc32528/vsn_tmp/62/52a6bc5eddae9cd9ba5916966dc4ae/scenic_CI.SCENIC.loom; to: /ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/analysis/pbmc_atac/analysis/nextflow/cross_filesystem_publish/out/data/scenic_CI.SCENIC.loom [link] -- See log file for details

And an excerpt from the log reports Invalid cross-device link:

java.nio.file.FileSystemException: /ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/analysis/pbmc_atac/analysis/nextflow/cross_filesystem_publish/out/data/scenic_CI.SCENIC.loom -> /ddn1/vol1/site_scratch/leuven/325/vsc32528/vsn_tmp/62/52a6bc5eddae9cd9ba5916966dc4ae/scenic_CI.SCENIC.loom: Invalid cross-device link
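
"Invalid cross-device link" is the error the operating system returns when a hard link is requested between two different filesystems. One quick way to check whether a work directory and an output directory share a filesystem is to compare their device numbers. The sketch below assumes GNU coreutils stat (as shipped with CentOS); the function name same_filesystem is made up for illustration.

```shell
# Hypothetical helper: succeeds (exit 0) if both paths are on the same
# filesystem, fails with 1 if they are not, and with 2 if stat fails.
# Assumes GNU coreutils stat, where the %d format prints the device number.
same_filesystem() {
    dev_a=$(stat -c %d -- "$1") || return 2
    dev_b=$(stat -c %d -- "$2") || return 2
    [ "$dev_a" = "$dev_b" ]
}

# Example (paths are placeholders):
# if same_filesystem "$NXF_WORK" ./out; then
#     echo "same filesystem, hard links should be possible"
# else
#     echo "different filesystems, consider publishing by copy"
# fi
```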

Expected behavior
Cross-filesystem publishing should work.

Screenshots
NA

Please complete the following information:

  • OS: CentOS Linux release 7.8.2003 (Core)
  • Nextflow Version: 20.04.1
  • vsn-pipelines Version: v0.23.0

Additional context
NA

@cflerin added the bug label Dec 7, 2020
@KrisDavie
Member

I think this just needs to be documented so users understand that links don't work across filesystems. Our other option is to publish by copying rather than sym/hard linking. That will always work, but of course it will take more space when people don't clear their work folders.
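
To make the tradeoff concrete, here is a minimal shell sketch of "link when possible, copy otherwise". It is not how Nextflow publishes files internally; the function name publish_file is hypothetical, and the fallback simply treats any failed hard link (for example across filesystems) as a reason to copy.

```shell
# Hypothetical helper: publish SRC to DST with a hard link when possible,
# falling back to a full copy when the link cannot be created.
publish_file() {
    src=$1
    dst=$2
    # Nothing to do if DST already refers to the same file as SRC.
    [ "$src" -ef "$dst" ] && return 0
    mkdir -p -- "$(dirname -- "$dst")" || return 1
    # A hard link costs no extra space; a copy always works but doubles usage.
    ln -f -- "$src" "$dst" 2>/dev/null || cp -- "$src" "$dst"
}

# Example (paths are placeholders):
# publish_file work/ab/cdef/result.loom out/data/result.loom
```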

@cflerin
Member Author

cflerin commented Dec 7, 2020

There must be a better solution than to just say that this won't work at all. As a user, if I see that the NXF_WORK environment variable is available in Nextflow, then it's reasonable to expect that it would work here.

Possible solutions:

  1. Switching everything to copy would indeed be a quick fix for all of this, and working directories should be treated as temporary use in any case.
  2. We can provide a parameter in the config that controls whether publish is done via hardlink or copy.

With option 2, the publish directives become:

process SC__PUBLISH {

    publishDir "${params.global.outdir}/data/intermediate", \
        mode: "${params.utils.publish.mode}", \
        saveAs: {
            filename -> "${outputFileName}"
        }

...

I've tested this briefly and it works for the symlink, link, and copy modes in params.utils.publish.mode. I would also remove overwrite: true in this case to avoid re-copying large files, which can take a significant amount of time.

@dweemx
Contributor

dweemx commented Dec 7, 2020

I like option 2 better (as you described), probably with symlink as the default.
Removing overwrite: true is also a good idea.

@cflerin
Member Author

cflerin commented Dec 8, 2020

I implemented option 2 above, but using link as the default (this is how it was in the existing code anyway).

This was referenced Jan 19, 2021
@Zifeng-L

> I implemented option 2 above, but using link as the default (this is how it was in the existing code anyway).

Same issue here! I pulled v0.26.1 and still had the same problem. How can we get the notebooks now?

@cflerin
Member Author

cflerin commented Oct 13, 2021

Hi @Zifeng1995, I think you can solve this by changing the publish mode to 'copy' in your config file and restarting the pipeline with resume enabled.

@Zifeng-L

Hi @cflerin, I am new to Nextflow. I tried copying the publish directive into my config file, but it did not work.

manifest {
   name = 'vib-singlecell-nf/vsn-pipelines'
   description = 'A repository of pipelines for single-cell data in Nextflow DSL2'
   homePage = 'https://github.com/vib-singlecell-nf/vsn-pipelines'
   version = '0.26.1'
   mainScript = 'main.nf'
   defaultBranch = 'master'
   nextflowVersion = '!>=20.10.0'
}

params {
   global {
      project_name = '10x_PBMC'
      outdir = 'out'
   }
   misc {
      test {
         enabled = false
      }
   }
   utils {
      container = 'vibsinglecellnf/utils:0.4.0'
      publish {
         compressionLevel = 6
         annotateWithBatchVariableName = false
         mode = 'link'
      }
   }
   sc {
      file_converter {
         off = 'h5ad'
         tagCellWithSampleId = true
         remove10xGEMWell = false
         useFilteredMatrix = true
         makeVarIndexUnique = false
      }
      scanpy {
         container = 'vibsinglecellnf/scanpy:0.5.2'
         report {
            annotations_to_plot = []
         }
         feature_selection {
            report_ipynb = '/src/scanpy/bin/reports/sc_select_variable_genes_report.ipynb'
            method = 'mean_disp_plot'
            minMean = 0.0125
            maxMean = 3
            minDisp = 0.5
            off = 'h5ad'
         }
         feature_scaling {
            method = 'zscore_scale'
            maxSD = 10
            off = 'h5ad'
         }
         neighborhood_graph {
            nPcs = 50
            off = 'h5ad'
         }
         dim_reduction {
            report_ipynb = '/src/scanpy/bin/reports/sc_dim_reduction_report.ipynb'
            pca {
               method = 'pca'
               nComps = 50
               off = 'h5ad'
            }
            umap {
               method = 'umap'
               off = 'h5ad'
            }
            tsne {
               method = 'tsne'
               off = 'h5ad'
            }
         }
         clustering {
            preflight_checks = true
            report_ipynb = '/src/scanpy/bin/reports/sc_clustering_report.ipynb'
            method = 'louvain'
            resolution = 0.8
            off = 'h5ad'
         }
         marker_genes {
            method = 'wilcoxon'
            ngenes = 0
            groupby = 'louvain'
            off = 'h5ad'
         }
         filter {
            report_ipynb = '/src/scanpy/bin/reports/sc_filter_qc_report.ipynb'
            cellFilterStrategy = 'fixedthresholds'
            cellFilterMinNGenes = 200
            cellFilterMaxNGenes = 4000
            cellFilterMaxPercentMito = 0.15
            geneFilterMinNCells = 3
            off = 'h5ad'
            outdir = 'out'
         }
         data_transformation {
            method = 'log1p'
            off = 'h5ad'
         }
         normalization {
            method = 'cpx'
            countsPerCellAfter = 10000
            off = 'h5ad'
         }
      }
      scope {
         genome = ''
         tree {
            level_1 = ''
            level_2 = ''
            level_3 = ''
         }
      }
   }
   data {
      tenx {
         cellranger_mex = 'data/10x/1k_pbmc/1k_pbmc_*/outs/'
      }
   }
}

process SC__PUBLISH {
    publishDir "${params.global.outdir}/data/intermediate", 
	mode: "${params.utils.publish.mode}", \
        saveAs: {
            filename -> "${outputFileName}"
        }


process {
   executor = 'local'
   cpus = 2
   memory = '60 GB'
   clusterOptions = '-A cluster_account'
   withLabel:compute_resources__default {
      time = '1h'
   }
   withLabel:compute_resources__minimal {
      cpus = 1
      memory = '1 GB'
   }
   withLabel:compute_resources__mem {
      cpus = 4
      memory = '160 GB'
   }
   withLabel:compute_resources__cpu {
      cpus = 20
      memory = '80 GB'
   }
   withLabel:compute_resources__report {
      maxForks = 2
      cpus = 1
      memory = '160 GB'
   }
   withLabel:compute_resources__24hqueue {
      time = '24h'
   }
}

timeline {
   enabled = true
   file = 'out/nextflow_reports/execution_timeline.html'
}

report {
   enabled = true
   file = 'out/nextflow_reports/execution_report.html'
}

trace {
   enabled = true
   file = 'out/nextflow_reports/execution_trace.txt'
}

dag {
   enabled = true
   file = 'out/nextflow_reports/pipeline_dag.svg'
}

min {
   enabled = false
}

vsc {
   enabled = false
}

docker {
   enabled = true
   runOptions = '-i -v /cluster/home/zfli:/cluster/home/zfli'
}

@cflerin
Member Author

cflerin commented Oct 13, 2021

OK, take that publish step out (process SC__PUBLISH) and go back to your original config. This is the section you need to change:

   utils {
      container = 'vibsinglecellnf/utils:0.4.0'
      publish {
         compressionLevel = 6
         annotateWithBatchVariableName = false
         mode = 'link'
      }

Make sure to set mode = 'copy' instead of 'link'; this should fix your hardlink issue with the notebooks. Then re-run the pipeline with resume: nextflow run [...] -resume.
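
If editing the file by hand is inconvenient, the same change can be sketched with sed. This is only an illustration: the function name set_publish_mode_copy and the file names are hypothetical, and the pattern rewrites every line of the exact form mode = 'link' in the file, so it assumes the publish block is the only place that setting appears.

```shell
# Hypothetical helper: write a copy of config file $1 to $2 with every
# "mode = 'link'" line switched to "mode = 'copy'". Indentation is kept.
set_publish_mode_copy() {
    sed "s/^\([[:space:]]*mode[[:space:]]*=[[:space:]]*\)'link'/\1'copy'/" "$1" > "$2"
}

# Example (file names are placeholders):
# set_publish_mode_copy my.config my_copy.config
# grep -n "mode" my_copy.config   # inspect the result before re-running
```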

@Zifeng-L

It still did not work after setting mode = 'copy'.
This is part of my config file:

params {
   global {
      project_name = '10x_PBMC'
      outdir = 'out'
   }
   misc {
      test {
         enabled = false
      }
   }
   utils {
      container = 'vibsinglecellnf/utils:0.4.0'
      publish {
         compressionLevel = 6
         annotateWithBatchVariableName = false
         mode = 'copy'
      }
   }

These are the warnings:

WARN: Failed to publish file: /cluster/home/zfli/test/single_sample_test/work/5e/e00c91db073fe4576372b28d935b48/1k_pbmc_v3_chemistry.SC__H5AD_TO_LOOM.loom; to: /cluster/home/zfli/test/single_sample_test/out/loom/1k_pbmc_v3_chemistry.SCope_output.loom [link] -- See log file for details
WARN: Failed to publish file: /cluster/home/zfli/test/single_sample_test/work/a6/a699ab4cc3a5b58bd7beb10bd99a9a/1k_pbmc_v2_chemistry.SC__H5AD_TO_LOOM.loom; to: /cluster/home/zfli/test/single_sample_test/out/loom/1k_pbmc_v2_chemistry.SCope_output.loom [link] -- See log file for details

@cflerin
Member Author

cflerin commented Oct 13, 2021

It seems there are a few places where the publish mode is hardcoded in the loomHandler.nf processes.
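
One way to look for such hardcoded settings in a local checkout is a recursive grep over the .nf files. This is a rough sketch: the function name find_hardcoded_publish_modes is hypothetical, and the pattern assumes the mode is written as a quoted literal such as mode: 'link', which may not match how every process declares it.

```shell
# Hypothetical helper: list "mode:" settings under directory $1 that are
# written as a literal 'link' or 'symlink' rather than taken from a parameter.
# Prints file name, line number, and the matching line.
find_hardcoded_publish_modes() {
    grep -rnE --include='*.nf' "mode:[[:space:]]*['\"](link|symlink)['\"]" "$1"
}

# Example (directory is a placeholder for a local clone of the pipeline):
# find_hardcoded_publish_modes path/to/vsn-pipelines/src
```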

But to get these files immediately you can just copy them using the full source and destination paths from the warning, for example:

cp \
  /cluster/home/zfli/test/single_sample_test/work/5e/e00c91db073fe4576372b28d935b48/1k_pbmc_v3_chemistry.SC__H5AD_TO_LOOM.loom \
  /cluster/home/zfli/test/single_sample_test/out/loom/1k_pbmc_v3_chemistry.SCope_output.loom
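
With many warnings, the same copy can be scripted. The sketch below assumes each warning has exactly the format shown in this thread ("Failed to publish file: SRC; to: DST [link] -- See log file for details") and that paths do not themselves contain "; to: " or " [". The function name copy_failed_publishes is hypothetical.

```shell
# Hypothetical helper: read "Failed to publish file" warnings on stdin and
# copy each source file to the destination named in the warning.
copy_failed_publishes() {
    while IFS= read -r line; do
        # Skip anything that is not a publish warning.
        case "$line" in
            *"Failed to publish file: "*"; to: "*) ;;
            *) continue ;;
        esac
        rest=${line#*Failed to publish file: }   # drop everything before SRC
        src=${rest%%; to: *}                     # SRC is the part before "; to: "
        dst=${rest#*; to: }                      # DST plus the trailing note
        dst=${dst% \[*}                          # strip " [link] -- See log ..."
        mkdir -p -- "$(dirname -- "$dst")" && cp -- "$src" "$dst"
    done
}

# Example (warnings.txt is a placeholder file holding the pasted warnings):
# copy_failed_publishes < warnings.txt
```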

@Zifeng-L

> It seems there are a few places where the publish mode is hardcoded in the loomHandler.nf processes.
>
> But to get these files immediately you can just copy them using the full source and destination paths from the warning, for example:
>
> cp \
>   /cluster/home/zfli/test/single_sample_test/work/5e/e00c91db073fe4576372b28d935b48/1k_pbmc_v3_chemistry.SC__H5AD_TO_LOOM.loom \
>   /cluster/home/zfli/test/single_sample_test/out/loom/1k_pbmc_v3_chemistry.SCope_output.loom

Thanks for your help! I got it!

4 participants