# xtreemfs-slurm

Scripts to deploy XtreemFS on a SLURM cluster.
These scripts distribute XtreemFS over allocated Slurm nodes.

1. Adjust to your environment

   First, please adjust env.sh. Afterwards, use xtreemfs_slurm.sh to start and stop a distributed environment. It can also clone a git repository (clone) or clean up (cleanup) the local XtreemFS folder for the current Slurm job on each allocated node. An environment file containing useful variables about the started XtreemFS environment is created in the current job folder. A usage sketch follows after this list.

2. How do I allocate Slurm nodes?

   Use the following statement, where -N is the number of nodes to allocate:

   ```
   $ salloc -N2 -p <partition> -A <account> --exclusive
   ```

3. How do I limit the number of OSD servers?

   Set the variable NUMBER_OF_NODES in env.sh to the number of OSD servers, plus one or two depending on SAME_DIR_AND_MRC_NODE. For example, for four OSD servers and a shared DIR and MRC server, set NUMBER_OF_NODES to five (four + one). See the env.sh sketch after this list.

4. Something went wrong while stopping a server. How do I fix a broken server?

   Connect to the specific node with:

   ```
   $ srun -N1-1 --nodelist=node-n[xx] --pty bash
   ```

   To stop a running server, locate its PID file and call kill, e.g.:

   ```
   $ cat /local/$USER/xtreemfs/xxxxx/MRC.pid
   12345
   $ kill 12345
   ```

   or, in short:

   ```
   $ kill $(</local/$USER/xtreemfs/xxxxx/MRC.pid)
   ```

   If the current job folder has already been deleted, retrieve the process ID and kill it:

   ```
   $ ps -ax | grep java
   $ kill 12345
   ```

   A loop over all remaining PID files is sketched after this list.

5. There are still traces left on the Slurm nodes after an error.

   Call cleanup, or connect to each node (see item 4) and remove the folder:

   ```
   $ rm -r /local/$USER/xtreemfs
   ```
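As a usage sketch for item 1: the subcommand names start, stop, clone, and cleanup come from the description above, but the exact invocation below is an assumption; check the script's own usage output in your checkout.

```bash
# Assumed workflow: allocate nodes first, then drive xtreemfs_slurm.sh.
salloc -N5 -p <partition> -A <account> --exclusive

./xtreemfs_slurm.sh start     # start the DIR, MRC, and OSD servers on the allocated nodes
./xtreemfs_slurm.sh stop      # stop all servers again
./xtreemfs_slurm.sh cleanup   # remove the local XtreemFS folder on each node
```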
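For item 3, a minimal env.sh excerpt: NUMBER_OF_NODES and SAME_DIR_AND_MRC_NODE are the variables named above; the value format of SAME_DIR_AND_MRC_NODE is an assumption.

```bash
# Four OSD servers plus one shared DIR/MRC node -> five nodes in total.
NUMBER_OF_NODES=5

# Assumed semantics: "true" means DIR and MRC share one node (add one node);
# "false" means they run on separate nodes (add two nodes).
SAME_DIR_AND_MRC_NODE="true"
```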
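For item 4, a sketch that generalizes the single-server kill to every server whose PID file is still present in the job folder; xxxxx again stands for the actual Slurm job ID and is left unfilled.

```bash
# Stop every server with a surviving PID file in the job folder.
for pidfile in /local/$USER/xtreemfs/xxxxx/*.pid; do
    kill "$(< "$pidfile")"    # same as the single-server kill shown above
done
```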