window
The window program extracts data as sliding windows, in an easy-to-read format.
The usage:
bwtool window - slide a window across the bigWig and at each step print data
in a format like:
chrom<TAB>start<TAB>end<TAB>val_start,val_start+1,val_start+2,...,val_end
usage:
bwtool window size file.bw
options:
-step=n skip n bases when sliding window (default 1)
-skip-NA don't output lines (windows) containing any NA values
-center print start and end coordinates of the middle of the window
with size step such that the start/ends are connected each
line (if step < size)
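The coordinate arithmetic behind -center can be sketched as below. This is inferred from the examples on this page (a 5-base window at 0-5 with -step=3 is reported as 1-4), not taken from the bwtool source. The function name center_coords is made up for illustration, and the sketch assumes that size minus step is even; how bwtool rounds otherwise is not modeled.

```shell
# Hypothetical helper (not part of bwtool): rewrite start/end so each line
# covers only the middle `step` bases of a window of `size` bases.
# Assumes (size - step) is even.
center_coords() {
    awk -v size="$1" -v step="$2" 'BEGIN { OFS = "\t" }
    {
        offset = (size - step) / 2      # bases trimmed from each side
        $2 = $2 + offset                # new start
        $3 = $2 + step                  # new end, exactly step bases later
        print
    }'
}
```

Applied to un-centered output, for instance `bwtool window 5 main.bigWig -step=3 | center_coords 5 3`, consecutive lines then abut rather than overlap.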
Using the same example from the aggregate page:
I can very simply ask for 5-base windows of data, at every base:
$ bwtool window 5 main.bigWig
chr 0 5 1.00,2.00,5.00,6.00,5.00
chr 1 6 2.00,5.00,6.00,5.00,3.00
chr 2 7 5.00,6.00,5.00,3.00,3.00
chr 3 8 6.00,5.00,3.00,3.00,5.00
chr 4 9 5.00,3.00,3.00,5.00,5.00
chr 5 10 3.00,3.00,5.00,5.00,5.00
chr 6 11 3.00,5.00,5.00,5.00,6.00
chr 7 12 5.00,5.00,5.00,6.00,6.00
chr 8 13 5.00,5.00,6.00,6.00,0.00
chr 9 14 5.00,6.00,6.00,0.00,2.00
chr 10 15 6.00,6.00,0.00,2.00,3.00
chr 11 16 6.00,0.00,2.00,3.00,3.00
chr 12 17 0.00,2.00,3.00,3.00,10.00
chr 13 18 2.00,3.00,3.00,10.00,4.00
chr 14 19 3.00,3.00,10.00,4.00,4.00
chr 15 20 3.00,10.00,4.00,4.00,2.00
chr 16 21 10.00,4.00,4.00,2.00,2.00
chr 17 22 4.00,4.00,2.00,2.00,2.00
chr 18 23 4.00,2.00,2.00,2.00,1.00
chr 19 24 2.00,2.00,2.00,1.00,NA
chr 20 25 2.00,2.00,1.00,NA,NA
chr 21 26 2.00,1.00,NA,NA,NA
chr 22 27 1.00,NA,NA,NA,NA
chr 23 28 NA,NA,NA,NA,2.00
chr 24 29 NA,NA,NA,2.00,3.00
chr 25 30 NA,NA,2.00,3.00,4.00
chr 26 31 NA,2.00,3.00,4.00,6.00
chr 27 32 2.00,3.00,4.00,6.00,6.00
chr 28 33 3.00,4.00,6.00,6.00,4.00
chr 29 34 4.00,6.00,6.00,4.00,4.00
chr 30 35 6.00,6.00,4.00,4.00,4.00
chr 31 36 6.00,4.00,4.00,4.00,2.00
Those NAs may not be desired. To simplify things downstream, you can use -fill=value to substitute a value for the missing data, or -skip-NA to drop every window that contains an NA:
$ bwtool window 5 main.bigWig -skip-NA
chr 0 5 1.00,2.00,5.00,6.00,5.00
chr 1 6 2.00,5.00,6.00,5.00,3.00
chr 2 7 5.00,6.00,5.00,3.00,3.00
chr 3 8 6.00,5.00,3.00,3.00,5.00
chr 4 9 5.00,3.00,3.00,5.00,5.00
chr 5 10 3.00,3.00,5.00,5.00,5.00
chr 6 11 3.00,5.00,5.00,5.00,6.00
chr 7 12 5.00,5.00,5.00,6.00,6.00
chr 8 13 5.00,5.00,6.00,6.00,0.00
chr 9 14 5.00,6.00,6.00,0.00,2.00
chr 10 15 6.00,6.00,0.00,2.00,3.00
chr 11 16 6.00,0.00,2.00,3.00,3.00
chr 12 17 0.00,2.00,3.00,3.00,10.00
chr 13 18 2.00,3.00,3.00,10.00,4.00
chr 14 19 3.00,3.00,10.00,4.00,4.00
chr 15 20 3.00,10.00,4.00,4.00,2.00
chr 16 21 10.00,4.00,4.00,2.00,2.00
chr 17 22 4.00,4.00,2.00,2.00,2.00
chr 18 23 4.00,2.00,2.00,2.00,1.00
chr 27 32 2.00,3.00,4.00,6.00,6.00
chr 28 33 3.00,4.00,6.00,6.00,4.00
chr 29 34 4.00,6.00,6.00,4.00,4.00
chr 30 35 6.00,6.00,4.00,4.00,4.00
chr 31 36 6.00,4.00,4.00,4.00,2.00
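For comparison, the same filtering can be done downstream when the window output comes from somewhere that was not run with -skip-NA. The following is a minimal sketch; drop_na_windows is a hypothetical name, not a bwtool feature, and it assumes missing values are spelled exactly NA in the fourth column, as in the listings above.

```shell
# Hypothetical downstream filter (not part of bwtool): keep only the lines
# whose comma-separated value list (4th column) has no NA entry.
drop_na_windows() {
    awk '$4 !~ /(^|,)NA(,|$)/'
}
```

Usage would look like `bwtool window 5 main.bigWig | drop_na_windows`. Using -skip-NA directly is preferable when possible, since the unwanted lines are then never written at all.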
Note also that the output of the window program can be extremely large. Think for a minute about whether you have the space to store 1000 bp windows at every base, with 4-decimal precision, uncompressed, for a genome-wide bigWig. You might not: at roughly 7 bytes per value (for example "1.0000,"), each line is about 7 kB, and a genome of about 3 billion bases (roughly human-sized) would need on the order of 20 TB. The best case is that the output is piped straight into something else that promptly does a calculation of some sort. Otherwise, this is a good time to limit decimal places and perhaps use the -step option to jump a specified number of bases rather than going base by base. For example:
$ bwtool window 5 main.bigWig -skip-NA -decimals=0 -step=3
chr 0 5 1,2,5,6,5
chr 3 8 6,5,3,3,5
chr 6 11 3,5,5,5,6
chr 9 14 5,6,6,0,2
chr 12 17 0,2,3,3,10
chr 15 20 3,10,4,4,2
chr 18 23 4,2,2,2,1
chr 27 32 2,3,4,6,6
chr 30 35 6,6,4,4,4
That trims things down quite a bit. At this point, it may also be convenient to change the coordinates so that they reflect the region each window is being boiled down to. Suppose the idea is to pipe the window data into something that averages it, and then make a new bedGraph from the averages. A first attempt, using for example a small awk program (window_ave.awk):
$ bwtool window 5 main.bigWig -skip-NA -step=3 -decimals=0 | awk -f window_ave.awk
chr 0 5 3.8
chr 3 8 4.4
chr 6 11 4.8
chr 9 14 3.8
chr 12 17 3.6
chr 15 20 4.6
chr 18 23 2.2
chr 27 32 4.2
chr 30 35 4.8
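This is not the answer, because the windows overlap (each is 5 bases long but starts only 3 bases after the previous one), and overlapping intervals are not allowed in the bedGraph format. The -center option helps with this problem by reporting only the middle step-sized part of each window, so that consecutive lines abut instead of overlapping: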
$ bwtool window 5 main.bigWig -skip-NA -step=3 -decimals=0 -center | awk -f window_ave.awk
chr 1 4 3.8
chr 4 7 4.4
chr 7 10 4.8
chr 10 13 3.8
chr 13 16 3.6
chr 16 19 4.6
chr 19 22 2.2
chr 28 31 4.2
chr 31 34 4.8
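The window_ave.awk script itself is not shown on this page. As a rough idea of what such an averaging step could look like, here is a minimal sketch wrapped in a shell function. The name window_ave and the choice of a plain arithmetic mean over every value in the window are assumptions, not the original script.

```shell
# Hypothetical stand-in for window_ave.awk (the original is not shown).
# Input:  chrom, start, end, comma-separated values (bwtool window output).
# Output: chrom, start, end, mean of the values, tab-separated.
# Assumes NA-free input (e.g. from -skip-NA); awk would count "NA" as 0.
window_ave() {
    awk 'BEGIN { OFS = "\t" }
    {
        n = split($4, vals, ",")        # vals[1..n] hold the window values
        sum = 0
        for (i = 1; i <= n; i++) sum += vals[i]
        if (n > 0) print $1, $2, $3, sum / n
    }'
}
```

It would be used in place of the awk call above, for instance `bwtool window 5 main.bigWig -skip-NA -step=3 -decimals=0 -center | window_ave`. Because the mean is taken over the rounded values when -decimals=0 is given, dropping that option gives a more precise average at the cost of more text passing through the pipe.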