File Format
A 3dg file stores the spatial position of each genomic bin. It is formatted as follows (separated by \t):
chr start end x y z is_valid
chr is the chromosome. start and end give the genomic position. x, y and z give the spatial position. is_valid denotes whether the bin is valid (1 for valid, 0 for invalid). Usually, the bins with the fewest contacts in the Hi-C map are regarded as invalid. If you cannot tell whether a bin is valid, add 1 at the end of each row.
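When a 3dg file has only the six columns chr, start, end, x, y, z, the missing is_valid column can be filled with 1 as described above. The following is a minimal sketch of that step, not part of the tool itself; the function name `add_valid_flag` is hypothetical, and it assumes the input rows are tab-separated with no header line.

```python
def add_valid_flag(lines):
    """Append is_valid=1 to every 6-column row; leave other rows unchanged."""
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Rows with chr, start, end, x, y, z but no is_valid get a trailing 1.
        if len(fields) == 6:
            fields.append("1")
        out.append("\t".join(fields))
    return out


if __name__ == "__main__":
    rows = [
        "chr1\t0\t100000\t0.12\t-0.40\t1.05\n",     # no is_valid column
        "chr1\t100000\t200000\t0.20\t-0.35\t1.10\t0\n",  # already flagged
    ]
    for row in add_valid_flag(rows):
        print(row)
```

Rows that already carry a seventh column are passed through untouched, so the function can be applied to a file that is only partly annotated.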
The index file is used throughout the workflow. It indexes the genomic bins for faster searching. It is formatted as follows (separated by \t):
chr start end index
chr is the chromosome. start and end give the genomic position. index is the index of the genomic bin. This file can be created using bedtools, as follows:
bedtools makewindows -g REF/GENOME/PATH -w RESOLUTION | awk -v OFS="\t" 'BEGIN{i=0}{print $1, $2, $3, i; i+=1}' > OUT_FILE
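For readers who want to see what the command above produces, here is a rough Python sketch of the same binning: fixed-width windows per chromosome, with the last window cut at the chromosome end, and a running 0-based index across all chromosomes. The function name `make_index` and the example chromosome sizes are hypothetical; the bedtools command remains the documented way to build the file.

```python
def make_index(chrom_sizes, resolution):
    """Return (chr, start, end, index) rows for fixed-width genomic bins.

    chrom_sizes: iterable of (chromosome name, length in bp) pairs.
    resolution:  bin width in bp.
    """
    rows = []
    index = 0  # running index, continues across chromosomes
    for chrom, size in chrom_sizes:
        for start in range(0, size, resolution):
            end = min(start + resolution, size)  # truncate the last bin
            rows.append((chrom, start, end, index))
            index += 1
    return rows


if __name__ == "__main__":
    # Toy sizes for illustration only.
    for row in make_index([("chr1", 250), ("chr2", 100)], 100):
        print("\t".join(str(field) for field in row))
```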
Marker files are BED files that give the marker enrichment along the genome. We recommend using the fold-change-over-control file, because it preserves the enrichment information for the whole genome. The file is in normal BED format (separated by \t), as follows:
chr start end value
chr is the chromosome. start and end give the genomic position. value is the enrichment.
den_dtp is a TSV file storing the density and DisTP of each bin. It is formatted as follows:
chr start end index is_valid x_loc y_loc z_loc density DisTP
chr is the chromosome. start and end give the genomic position. is_valid denotes whether the bin is valid, and is the same as the is_valid value in the input. x_loc, y_loc and z_loc give the spatial index of the bin (see Methods in the paper).
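A den_dtp file can be read with the standard csv module using the column order listed above. This is a sketch under assumptions rather than the tool's own reader: the function name `read_den_dtp` is hypothetical, the file is assumed to be tab-separated, a header row is skipped if one happens to be present, and the location, density and DisTP columns are parsed as floats.

```python
import csv
import io

# Column order as listed in the format description.
DEN_DTP_COLUMNS = [
    "chr", "start", "end", "index", "is_valid",
    "x_loc", "y_loc", "z_loc", "density", "DisTP",
]


def read_den_dtp(handle, valid_only=True):
    """Parse den_dtp rows into dicts; optionally keep only valid bins."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != len(DEN_DTP_COLUMNS):
            continue  # skip blank or malformed lines
        if row[0] == "chr" and row[1] == "start":
            continue  # assumption: skip a header row if the file has one
        rec = dict(zip(DEN_DTP_COLUMNS, row))
        for key in ("start", "end", "index", "is_valid"):
            rec[key] = int(rec[key])
        for key in ("x_loc", "y_loc", "z_loc", "density", "DisTP"):
            rec[key] = float(rec[key])
        if valid_only and rec["is_valid"] != 1:
            continue
        records.append(rec)
    return records


if __name__ == "__main__":
    # Made-up values, for illustration only.
    text = (
        "chr1\t0\t100000\t0\t1\t3\t4\t5\t12.0\t0.8\n"
        "chr1\t100000\t200000\t1\t0\t3\t4\t6\t2.0\t1.5\n"
    )
    for rec in read_den_dtp(io.StringIO(text)):
        print(rec["chr"], rec["index"], rec["density"], rec["DisTP"])
```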
After D2 mapping, we place the bins on a density-DisTP matrix. The matrix is two-dimensional, with density on the x-axis and DisTP on the y-axis. For efficient storage, we use np.reshape to flatten the matrix to one dimension and write it to the hist file. The first two lines of the hist file describe the x-axis and y-axis of the matrix. The third line is the header for the remaining lines.
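The reshape step can be illustrated on its own. The sketch below only shows how a two-dimensional matrix is flattened with np.reshape and restored from its shape; it does not reproduce the hist file layout, which is not fully specified here. The function names `flatten_matrix` and `restore_matrix` are hypothetical, and NumPy's default row-major order is assumed.

```python
import numpy as np


def flatten_matrix(matrix):
    """Return the original shape and a 1-D, row-major copy of the values."""
    matrix = np.asarray(matrix)
    return matrix.shape, np.reshape(matrix, -1)


def restore_matrix(flat, shape):
    """Rebuild the 2-D matrix from the flat values and the saved shape."""
    return np.reshape(np.asarray(flat), shape)


if __name__ == "__main__":
    # Toy 2 x 3 count matrix, for illustration only.
    counts = np.array([[0, 1, 2],
                       [3, 4, 5]])
    shape, flat = flatten_matrix(counts)
    print(shape, flat.tolist())
    print(np.array_equal(restore_matrix(flat, shape), counts))
```

Restoring the matrix requires knowing its two axis lengths, which is why the axis information has to be stored alongside the flattened values.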