Andy Pohl edited this page Jan 29, 2014 · 3 revisions

The paste program extracts data, base by base, from multiple bigWigs simultaneously. Earlier versions of bwtool had programs to multiply or add two bigWigs together. Eventually it became clear that little is gained by adding complexity in that direction, and that bwtool would be a lot more useful if it simply presented the data in a form that a small script can use to perform the calculation. Wig files have a vertical data format, but there is no guarantee that the data in one wig file will align with another wig file base by base, line after line: chromosomes may be stored in any order in a wig file, and data may be missing in different regions. paste takes care of this alignment issue and accepts an arbitrary number of bigWigs. Usage:

bwtool paste - simultaneously output same regions of multiple bigWigs
usage:
   bwtool paste input1.bw input2.bw input3.bw ...
options:
   -header           put header with labels from file or filenames
   -consts=c1,c2...  add constants to output lines
   -skip-NA          don't output lines (bases) where one of the inputs is NA

Examples

With the two main example bigWigs, using paste the default way will result in the following:

$ bwtool paste main.bigWig second.bigWig 
chr	0	1	1.00	4.00
chr	1	2	2.00	2.00
chr	2	3	5.00	3.00
chr	3	4	6.00	4.00
chr	4	5	5.00	4.00
chr	5	6	3.00	3.00
chr	6	7	3.00	3.00
chr	7	8	5.00	7.00
chr	8	9	5.00	8.00
chr	9	10	5.00	7.00
chr	10	11	6.00	7.00
chr	11	12	6.00	5.00
chr	12	13	0.00	1.00
chr	13	14	2.00	2.00
chr	14	15	3.00	3.00
chr	15	16	3.00	3.00
chr	16	17	10.00	4.00
chr	17	18	4.00	4.00
chr	18	19	4.00	4.00
chr	19	20	2.00	2.00
chr	20	21	2.00	2.00
chr	21	22	2.00	2.00
chr	22	23	1.00	1.00
chr	23	24	NA	1.00
chr	24	25	NA	1.00
chr	25	26	NA	2.00
chr	26	27	NA	1.00
chr	27	28	2.00	2.00
chr	28	29	3.00	3.00
chr	29	30	4.00	4.00
chr	30	31	6.00	4.00
chr	31	32	6.00	2.00
chr	32	33	4.00	2.00
chr	33	34	4.00	2.00
chr	34	35	4.00	2.00
chr	35	36	2.00	2.00
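
Because every line carries the same base from each bigWig, the kind of small script mentioned at the top of this page is easy to write. The following is a minimal Python sketch, not part of bwtool, that adds the value columns of each line together. The function name `sum_paste_lines` is hypothetical, and treating a line with any NA as NA is an assumption about the desired behaviour rather than something bwtool prescribes.

```python
def sum_paste_lines(lines):
    """Yield (chrom, start, end, total) for each line of paste output.

    total is None when any of the value columns is NA (an assumed policy).
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and an optional -header line
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        values = fields[3:]  # one column per input bigWig
        if "NA" in values:
            total = None
        else:
            total = sum(float(v) for v in values)
        yield chrom, start, end, total


# Two lines copied from the output above; pass sys.stdin instead to use
# the function at the end of a pipe from bwtool paste.
sample = [
    "chr\t0\t1\t1.00\t4.00\n",
    "chr\t23\t24\tNA\t1.00\n",
]
for chrom, start, end, total in sum_paste_lines(sample):
    text = "NA" if total is None else f"{total:.2f}"
    print(f"{chrom}\t{start}\t{end}\t{text}")
```

Replacing `sum` with a product or a difference gives the other calculations that earlier versions of bwtool provided as separate programs.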

As in the window program, it is useful to adjust the number of decimals with the -decimals option. It also helps to keep track of which column represents which bigWig by adding a header with the -header option. Finally, NA values may or may not be useful, depending on the circumstance. If they are not wanted, the -skip-NA option eliminates output lines where any of the bigWigs has a missing base. All three options appear in this example:

$ bwtool paste main.bigWig second.bigWig -decimals=3 -header -skip-NA
#chrom	chromStart	chromEnd	main.bigWig	second.bigWig
chr	0	1	1.000	4.000
chr	1	2	2.000	2.000
chr	2	3	5.000	3.000
chr	3	4	6.000	4.000
chr	4	5	5.000	4.000
chr	5	6	3.000	3.000
chr	6	7	3.000	3.000
chr	7	8	5.000	7.000
chr	8	9	5.000	8.000
chr	9	10	5.000	7.000
chr	10	11	6.000	7.000
chr	11	12	6.000	5.000
chr	12	13	0.000	1.000
chr	13	14	2.000	2.000
chr	14	15	3.000	3.000
chr	15	16	3.000	3.000
chr	16	17	10.000	4.000
chr	17	18	4.000	4.000
chr	18	19	4.000	4.000
chr	19	20	2.000	2.000
chr	20	21	2.000	2.000
chr	21	22	2.000	2.000
chr	22	23	1.000	1.000
chr	27	28	2.000	2.000
chr	28	29	3.000	3.000
chr	29	30	4.000	4.000
chr	30	31	6.000	4.000
chr	31	32	6.000	2.000
chr	32	33	4.000	2.000
chr	33	34	4.000	2.000
chr	34	35	4.000	2.000
chr	35	36	2.000	2.000
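
With -header and -skip-NA together, the output is straightforward to load by column label. The sketch below is a hedged illustration in Python, not part of bwtool: the function name `read_labeled_columns` is hypothetical, and it assumes the header line starts with `#` as shown above, that labels are unique, and that no NA values remain in the input.

```python
def read_labeled_columns(lines):
    """Return {label: [values...]} from paste output produced with -header.

    Assumes -skip-NA was used, so every value parses as a float.
    """
    labels = None
    columns = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if fields[0].startswith("#"):
            # Header: the first three fields name the coordinates,
            # the remaining fields label the bigWig columns.
            labels = fields[3:]
            columns = {label: [] for label in labels}
            continue
        if labels is None:
            raise ValueError("no header line found; run paste with -header")
        for label, value in zip(labels, fields[3:]):
            columns[label].append(float(value))
    return columns


# Header and first two data lines copied from the example above.
sample = [
    "#chrom\tchromStart\tchromEnd\tmain.bigWig\tsecond.bigWig\n",
    "chr\t0\t1\t1.000\t4.000\n",
    "chr\t1\t2\t2.000\t2.000\n",
]
columns = read_labeled_columns(sample)
for label, values in columns.items():
    print(label, sum(values) / len(values))  # mean of each column
```

Keying the columns by label rather than by position keeps a downstream script correct if the order of the bigWigs on the command line changes.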