
djwhiteastro/NERSC-to-GridPP


NERSC-to-GridPP

Tools to 'simplify' transferring data from NERSC to GridPP storage elements.

Requires

  • DIRAC tools - tested with version 6.21.10
  • GFAL2 (and Python bindings) - tested with version 2.15.2
  • Access to relevant systems from local client machine (e.g. via voms-proxy-init).

Environment setup

During development of this tool, a bug was found in the version of GFAL available in a CernVM via CVMFS. A fix was created and can be sourced as shown below, but it is incompatible with DIRAC. As things currently stand, GFAL and the DIRAC tools cannot both be used in the same environment. The script has therefore been split into two stages (the previous version is retained for posterity, and for ease of update if this is fixed later on).

If using a CernVM (with SL6/CentOS 6), the following will source the correct version of GFAL:

source /cvmfs/grid.cern.ch/umd-sl6ui-test/etc/profile.d/setup-ui-example.sh

To access DIRAC tools, in a clean environment, use:

source /cvmfs/ganga.cern.ch/dirac_ui/bashrc 

In both cases, a valid proxy is required. For ease of use, request the longest validity period allowed:

voms-proxy-init --voms lsst --valid 24:00

If using a system without CernVM, similar incompatibilities are present. You must still use a separate environment for each stage of the script.
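Since both stages depend on a valid proxy, it can be useful to check the remaining proxy lifetime before starting a long run. The following is a minimal sketch, not part of this repository: the function names are hypothetical, and it assumes that `voms-proxy-info --timeleft` is available on the PATH and prints the remaining lifetime in seconds.

```python
import subprocess


def parse_timeleft(output):
    """Return the remaining seconds parsed from the command output.

    Anything that cannot be parsed as an integer is treated as 0
    (i.e. no usable proxy).
    """
    try:
        return max(0, int(output.strip().splitlines()[-1]))
    except (ValueError, IndexError):
        return 0


def proxy_seconds_left():
    """Ask voms-proxy-info for the remaining lifetime (assumed CLI behaviour)."""
    try:
        out = subprocess.check_output(["voms-proxy-info", "--timeleft"])
    except (OSError, subprocess.CalledProcessError):
        # Command missing, or it exited with an error (e.g. no proxy found).
        return 0
    return parse_timeleft(out.decode())


def proxy_ok(min_seconds=3600):
    """True if at least min_seconds of proxy lifetime remain."""
    return proxy_seconds_left() >= min_seconds
```

A caller could, for example, stop with a clear message when `proxy_ok()` is false rather than failing part-way through a batch.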

Usage

Since the required packages are incompatible, the tool now has two distinct steps, and uses a tracking file to record previously transferred files and relevant metadata.

Transfer:

python transfer.py -o <track-file> -s <source-dir> -d <dest-dir>

The script should only transfer files that have not already been transferred correctly. This is useful because transferring large amounts of data can take longer than the proxy is valid for.
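The resume behaviour described above can be sketched as follows. This is an illustration only, not the code in `transfer.py`: the JSON layout of the track-file, the `status` and `size` fields, and the `copy_func` callable (a stand-in for the actual GFAL copy) are all assumptions made for the example.

```python
import json
import os


def load_track_file(path):
    """Return the tracking dict, or an empty one if no file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as fh:
        return json.load(fh)


def save_track_file(path, track):
    """Write the tracking dict back to disk as JSON."""
    with open(path, "w") as fh:
        json.dump(track, fh, indent=2, sort_keys=True)


def pending_files(source_files, track):
    """List files that still need transferring.

    source_files maps file name -> size in bytes. A file is skipped only
    if the track-file records it as done with the same size.
    """
    return [
        name
        for name, size in sorted(source_files.items())
        if track.get(name, {}).get("status") != "done"
        or track[name].get("size") != size
    ]


def transfer_all(source_files, track_path, copy_func):
    """Copy every pending file, recording the outcome after each one."""
    track = load_track_file(track_path)
    for name in pending_files(source_files, track):
        try:
            copy_func(name)  # hypothetical stand-in for the real copy
            track[name] = {"status": "done", "size": source_files[name]}
        except Exception as exc:
            track[name] = {"status": "failed", "error": str(exc)}
        # Save after every file so an expired proxy loses no progress.
        save_track_file(track_path, track)
    return track
```

Running `transfer_all` a second time with the same track-file retries only the entries that are missing or marked as failed.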

Future improvements:

  • Multiprocessing to transfer multiple files at once
  • Use the local track-file to speed up batch transfers and avoid checking files through GFAL (which takes a significant amount of time when many files must be checked)
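As a rough idea of how the first improvement might look, the sketch below runs several copies concurrently. It uses a thread pool from `concurrent.futures` rather than the `multiprocessing` module, on the assumption that transfers are I/O-bound. `copy_one` is a hypothetical stand-in for a single-file transfer, and whether the underlying transfer library can safely be called from several threads would need to be checked first.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def transfer_parallel(names, copy_one, max_workers=4):
    """Run copy_one(name) for each name, a few at a time.

    Returns a dict mapping each name to "done" or "failed: <reason>".
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the file name it is transferring.
        futures = {pool.submit(copy_one, name): name for name in names}
        for fut in as_completed(futures):
            name = futures[fut]
            err = fut.exception()
            results[name] = "done" if err is None else "failed: %s" % err
    return results
```

The returned dict could then be merged into the track-file, so that failed transfers are retried on the next run.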

Register:

python register.py -i <track-file> -l <LFN-path> -e <storage-element>

Again, this should (in theory) work for files already registered, but extensive testing is still ongoing.
