-
Notifications
You must be signed in to change notification settings - Fork 183
Arrowhead
Arrowhead is an algorithm for finding contact domains.
This is the usage that most users will likely use (more detailed usage below):
arrowhead <HiC file> <output_file>
Upon a successful run of Arrowhead, output_file will contain all the contact domains found along the diagonal in this format.
See this Colab notebook with an example run: notebook
arrowhead local/folder/HIC006.hic local/folder/contact_domains_list
This command will run Arrowhead on HIC006 at resolution 5 kB or 10 kB (depending on the map's resolution) and save all contact domains to the contact_domains_list file.
arrowhead https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined_30.hic contact_domains_list
This command will run Arrowhead at resolution 5kB on the GM12878 HiC map (high resolution) and save all contact domains to the contact_domains_list file. Note: these are the settings used to generate the official GM12878 contact domain list.
Default parameters for arrowhead described below.
arrowhead [-c chromosome(s)] [-m matrix size] [-r resolution] [--threads num_threads]
[-k normalization (NONE/VC/VC_SQRT/KR)] <HiC file>
<output_file> [feature_list] [control_list]
The required arguments are:
- <HiC file>: Address of HiC file which should end with ".hic". This is the file you would load into Juicebox. URLs or local addresses may be used. Running Arrowhead on MAPQ>30 and MAPQ>0 files generally gives comprable results.
- <output_file>: Final list of all contact domains found by Arrowhead. Can be visualized directly in Juicebox as a 2D annotation.
-- NOTE -- If you want to find scores for a feature and control list, both must be provided:
- [feature_list]: Feature list of loops/domains for which block scores are to be calculated
- [control_list]: Control list of loops/domains for which block scores are to be calculated
The optional arguments are:
-
-c <String(s)>
Chromosome(s) on which Arrowhead will be run. The number/letter for the chromosome can be used with or without appending the "chr" string. Multiple chromosomes can be specified using commas (e.g. 1,chr2,X,chrY) -
-m <int>
Size of the sliding window along the diagonal in which contact domains will be found. Must be an even number as (m/2) is used as the increment for the sliding window. (Default 2000) -
-r <int>
resolution for which Arrowhead will be run. Generally, 5kB (5000) or 10kB (10000) resolution is used depending on the depth of sequencing in the HiC file(s). -
-k <NONE/VC/VC_SQRT/KR>
Normalizations (case sensitive) that can be selected. Generally, KR (Knight-Ruiz) balancing should be used when available. -
--threads <int>
Number of threads to use (Arrowhead is multi-threaded). As of Juicer Tools Version 1.13.02, the default number of threads used is 1. Passing in a value of 0 will result in the jar calculating the number of available threads. Passing in a value >0 will result in that value being used directly.
Arrowhead uses the following parameters if optional flags are not provided.
Medium resolution maps:
-c (all chromosomes)
-m 2000
-r 10000
-k KR
High resolution maps:
-c (all chromosomes)
-m 2000
-r 5000
-k KR
The contact domain list created by Arrowhead will start with a header line, followed by a line for every contact domain. By default, the file should contain 12 fields per line in the following format:
chromosome1 x1 x2 chromosome2 y1 y2 color
corner_score Uvar Lvar Usign Lsign
Explanations of each field are as follows:
- chromosome = the chromosome that the domain is located on
- x1,x2/y1,y2 = the interval spanned by the domain (contact domains manifest as squares on the diagonal of a Hi-C matrix and as such: x1=y1, x2=y2)
- color = the color that the feature will be rendered as if loaded in Juicebox
- corner_score = the corner score, a score indicating the likelihood that a pixel is at the corner of a contact domain. Higher values indicate a greater likelihood of being at the corner of a domain (labeled
f1
in old format) - Uvar = the variance of the upper triangle (labeled
f2
in old format) - Lvar = the variance of the lower triangle (labeled
f3
in old format) - Usign = -1*(sum of the sign of the entries in the upper triangle) (labeled
f4
in old format) - Lsign = sum of the sign of the entries in the lower triangle (labeled
f5
in old format)
See Section IV.a.3 of the Extended Experimental Procedures of Rao, Huntley et al. Cell 2014 for more details.