Rubra: a bioinformatics pipeline.
---------------------------------
https://github.com/bjpop/rubra
License:
--------
Rubra is licensed under the MIT license. See LICENSE.txt.
Description:
------------
Rubra is a pipeline system for bioinformatics workflows. It is built on top
of the Ruffus (http://www.ruffus.org.uk/) Python library, and adds support
for running pipeline stages on a distributed compute cluster.
Authors:
--------
Bernie Pope, Clare Sloggett, Gayle Philip, Matthew Wakefield
Installation:
-------------
To install, clone this repository and run `setup.py`:
    git clone https://github.com/bjpop/rubra
    cd rubra
    python setup.py install
If you are on a system where you do not have administrative privileges, we
suggest using virtualenv ( http://www.virtualenv.org/ ). On HPC systems you
may find virtualenv is already installed.
Usage:
------
usage: rubra [-h] PIPELINE_FILE --config CONFIG_FILE
             [CONFIG_FILE ...] [--verbose {0,1,2}]
             [--style {print,run,touchfiles,flowchart}] [--force TASKNAME]
             [--end TASKNAME] [--rebuild {fromstart,fromend}]

A bioinformatics pipeline system.

optional arguments:
  -h, --help            show this help message and exit
  PIPELINE_FILE         Your Ruffus pipeline stages (a Python module)
  --config CONFIG_FILE [CONFIG_FILE ...]
                        One or more configuration files (Python modules)
  --verbose {0,1,2}     Output verbosity level: 0 = quiet; 1 = normal; 2 =
                        chatty (default is 1)
  --style {print,run,touchfiles,flowchart}
                        Pipeline behaviour: print; run; touchfiles; flowchart
                        (default is print)
  --force TASKNAME      tasks which are forced to be out of date regardless
                        of timestamps
  --end TASKNAME        end points (tasks) for the pipeline
  --rebuild {fromstart,fromend}
                        rebuild outputs by working back from end tasks or
                        forwards from start tasks (default is fromstart)
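
As a rough illustration of how the options in the usage message fit together,
here is a minimal argparse sketch. It is a reconstruction from the help text
only, not Rubra's actual source; the function name build_parser and the
attribute name "pipeline" are assumptions made for this example.

```python
import argparse


def build_parser():
    """Hypothetical parser reconstructed from the usage message."""
    parser = argparse.ArgumentParser(
        prog="rubra", description="A bioinformatics pipeline system.")
    # The pipeline file is positional: it is given without a flag.
    parser.add_argument("pipeline", metavar="PIPELINE_FILE",
                        help="Your Ruffus pipeline stages (a Python module)")
    # At least one configuration file is required.
    parser.add_argument("--config", metavar="CONFIG_FILE", nargs="+",
                        required=True,
                        help="One or more configuration files (Python modules)")
    parser.add_argument("--verbose", type=int, choices=(0, 1, 2), default=1)
    parser.add_argument("--style",
                        choices=("print", "run", "touchfiles", "flowchart"),
                        default="print")
    parser.add_argument("--force", metavar="TASKNAME")
    parser.add_argument("--end", metavar="TASKNAME")
    parser.add_argument("--rebuild", choices=("fromstart", "fromend"),
                        default="fromstart")
    return parser


args = build_parser().parse_args(
    ["example_pipeline.py", "--config", "example_config.py", "--style", "run"])
print(args.pipeline, args.config, args.style, args.verbose, args.rebuild)
```

Options that are left out fall back to the defaults listed in the help text,
so here the verbosity is 1 and the rebuild mode is fromstart.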
Example:
--------
Below is a little example pipeline which you can find in the Rubra source
tree. It counts the number of lines in two files (test/data1.txt and
test/data2.txt), and then sums the results together.
    rubra example_pipeline.py --config example_config.py --style run
There are 2 lines in the first file and 1 line in the second file. So the
result is 3, which is written to the output file test/total.txt.
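
For reference, the arithmetic that the example pipeline performs can be
sketched in plain Python. This is not Rubra or Ruffus code and it runs nothing
on a cluster; the function names count_lines and total_lines are made up for
illustration, and the counting only roughly mirrors what `wc -l` reports.

```python
def count_lines(path):
    """Count the lines in one file, roughly like the line-counting stage."""
    with open(path) as handle:
        return sum(1 for _ in handle)


def total_lines(paths):
    """Sum the per-file line counts, roughly like the totalling stage."""
    return sum(count_lines(path) for path in paths)
```

With the two input files described above, total_lines would add 2 and 1 to
give 3.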
The PIPELINE_FILE argument is a Python script which contains the actual
code for each pipeline stage (using Ruffus notation). The --config
argument is a Python script which contains configuration options for the
whole pipeline, plus options for each stage (including the shell command
to run in the stage). The --style argument says what to do with the pipeline:
"run" means "perform the out-of-date steps in the pipeline". The default
style is "print" which just displays what the pipeline would do if it were
run. You can get a diagram of the pipeline using the "flowchart" style. You
can touch all files in order using the "touchfiles" style, which is mostly
useful for forcing Ruffus to acknowledge that a set of steps is up to date.
Configuration:
--------------
Configuration options are written into one or more Python scripts, which
are passed to Rubra via the --config command line argument.
Some options are required, and some are, well, optional.
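
Because the configuration files are ordinary Python scripts, one way to picture
how several of them could be combined is the sketch below. It is only an
assumption about the general approach, not Rubra's loader: the function name
load_config is hypothetical, and so is the rule that later files override
earlier ones.

```python
import runpy


def load_config(paths):
    """Run each config script and merge its top-level names.

    Later files override earlier ones; that precedence is an assumption.
    """
    merged = {}
    for path in paths:
        namespace = runpy.run_path(path)
        # Skip dunder names such as __name__ that runpy adds itself.
        merged.update({key: value for key, value in namespace.items()
                       if not key.startswith("__")})
    return merged
```

Calling load_config with the paths given to --config would return a dictionary
holding names such as pipeline, stageDefaults and stages.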
Options for the whole pipeline:
-------------------------------
pipeline = {
    "logDir": "log",
    "logFile": "pipeline.log",
    "procs": 2,
    "end": ["total"],
}
Options for each stage of the pipeline:
---------------------------------------
stageDefaults = {
    "distributed": False,
    "walltime": "00:10:00",
    "memInGB": 1,
    "queue": "batch",
    "modules": ["python-gcc"]
}
stages = {
    "countLines": {
        "command": "wc -l %file > %out",
    },
    "total": {
        "command": "./test/total.py %files > %out",
    },
}
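
To make the relationship between stageDefaults, stages and the %name
placeholders concrete, here is a small sketch. It assumes that a stage's own
options take precedence over stageDefaults and that each %name is replaced by a
value of the same name; neither rule is confirmed by this README. The helpers
stage_options and expand, and the output path test/data1.count, are
hypothetical.

```python
import re

stageDefaults = {"distributed": False, "walltime": "00:10:00", "memInGB": 1}
stages = {"countLines": {"command": "wc -l %file > %out"}}


def stage_options(name):
    """Per-stage options, with stageDefaults filling in anything missing."""
    merged = dict(stageDefaults)
    merged.update(stages[name])
    return merged


def expand(command, **values):
    """Replace each %name placeholder with the matching keyword value."""
    # Matching whole words keeps %file from clobbering part of %files.
    return re.sub(r"%(\w+)", lambda m: str(values[m.group(1)]), command)


opts = stage_options("countLines")
print(expand(opts["command"], file="test/data1.txt", out="test/data1.count"))
```

The countLines stage sets only a command, so in this sketch it picks up its
walltime, memory and other settings from stageDefaults.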