hickit.js:457: Error #24

chenggang108 · 2020-04-26T02:04:16Z

Hi

I run into the problem as following, Can you please help with it?

hickit.js:457: Error: [File] Fail to open the file
	file = new File(args[getopt.ind]);
        ^
Error: [File] Fail to open the file
    at hic_bedflt (hickit.js:457:9)
    at main (hickit.js:723:28)
    at hickit.js:730:1

Thanks

Gang

The text was updated successfully, but these errors were encountered:

tanlongzhi · 2020-04-26T03:09:40Z

Hi Gang,

It seems the BED file you're using (for hickit.js bedflt) cannot be opened. Can you tell us the command you're using, and attach your input files?

Best,
Tan

chenggang108 · 2020-04-26T04:37:30Z

Hi Tan,

Here is how I ran the program

`./k8 hickit.js sam2seg -v b6_cast_snp_chr_parentheses.tsv aln.sam.gz | ./k8 hickit.js chronly - | ./k8 hickit.js bedflt par.bed - | gzip > contacts.seg.gz

The sam and tsv files are too big to attach. Do I need to provide a bed file? I thought that's generated by the previous step.

Thank you very much,

Gang

`

tanlongzhi · 2020-04-26T04:44:40Z

Hi Gang,

This error is caused by missing the file par.bed during the step | ./k8 hickit.js bedflt par.bed -.

Depending on the purpose of your analysis, a BED file par.bed may be required. In particular, for haplotype imputation and 3D modeling of a single diploid male cell, this file must be supplied, so that contacts involving the PARs can be removed (because this repo does not support 3D modeling of PARs in male cells).

If that is not your purpose, you can simply delete the step | ./k8 hickit.js bedflt par.bed -.

Best,
Tan

chenggang108 · 2020-04-26T04:52:12Z

Hi Tan.

Thank you so much. I am kind of new in the field of data analysis. Did nor realize the need for the bed file. I do no have the file in hand. Can you please kindly provide it?

Thanks

Gang

tanlongzhi · 2020-04-26T05:13:38Z

Hi Gang,

Here I've attached the BED files for the human hg19 genome and for the mouse mm10 genome, respectively. The files are quite simple, containing just the (0-based) genomic coordinates of each PAR.

hg19.par.bed.gz
mm10.par.bed.gz

Best,
Tan

chenggang108 · 2020-04-26T05:20:54Z

Hi Tan,

Thank you so much,

Gang

chenggang108 · 2020-04-26T05:30:20Z

I got the information when after running, Does it mean there is something wrong with my SNP file?

WARNING: a new allele A on read E00526:261:H773CCCX2:7:1101:11485:62980 at position chr4:37046608 (not C/T)
WARNING: a new allele C on read E00526:261:H773CCCX2:7:1101:13778:27187 at position chr7:87372925 (not G/A)
WARNING: a new allele C on read E00526:261:H773CCCX2:7:1101:13839:27222 at position chr7:87372925 (not G/A)
WARNING: a new allele T on read E00526:261:H773CCCX2:7:1101:17614:21948 at position chr9:33098399 (not A/G)
WARNING: a new allele T on read E00526:261:H773CCCX2:7:1101:18923:20454 at position chr10:84269456 (not G/C)
WARNING: a new allele A on read E00526:261:H773CCCX2:7:1101:18944:34518 at position chr9:73823632 (not T/C)
WARNING: conflicting phase at a segment of read E00526:261:H773CCCX2:7:1101:22617:60571
WARNING: a new allele T on read E00526:261:H773CCCX2:7:1101:31477:59991 at position chr10:35550755 (not C/G)
WARNING: a new allele A on read E00526:261:H773CCCX2:7:1101:31619:13580 at position chr4:22307624 (not C/G)
.
.
.
.

tanlongzhi · 2020-04-26T06:02:11Z

This is completely normal. I typically redirect these warning messages to a log file, so they don't flood your screen:

hickit -i contacts.seg.gz -o - 2> contacts.pairs.log | bgzip > contacts.pairs.gz

chenggang108 · 2020-04-27T00:42:33Z

Thank you, Tan,

chenggang108 · 2020-04-27T01:09:50Z

Can I skip the impute step?

chenggang108 · 2020-04-27T01:11:52Z

It looks the program does not allow me to use the raw contact.pair file directly.

./hickit -i contacts.pairs.gz -Sr1m -c1 -r10m -c5 -o no_imput.flt.pairs
[M::hk_map_read] read 675992 pairs
[M::hk_pair_filter_close_legs] filtered out 0/675992 pairs with close legs
[M::hk_pair_dedup] duplicate rate: 0.00% = 0 / 675992
hickit: main.c:188: main: Assertion `(m->cols & 0x3c) == 0x3c' failed.
Aborted

tanlongzhi · 2020-04-27T03:22:22Z

Technically you can skip imputation; but you can only use contacts whose both legs are haplotype-resolved. In other words, for each (non-header) row of contacts.pairs.gz, the two columns phase0 and phase1 must be either 0 or 1, but never ..

Therefore, to skip imputation, you must first remove any contacts with . in either of the two phase columns. You may do so using for example awk.

chenggang108 · 2020-04-30T00:26:04Z

Than you Tan,

I got my first reconstruction from my data.

Best

Gang

tanlongzhi · 2020-05-01T14:40:56Z

Hi, I can't find the comment from your latest email, but you need to use the latest version of this repo to have the #unit line in the output .3dg file.

chenggang108 · 2020-05-01T14:52:20Z

Hi Tan,

Sorry that I delete the previous issue. Because I found that my .3dg does not have #unit line. I am regenerating the file. But it looks the new file does not have the line either. Everything looks fine except the #unit line. What may cause this?

tanlongzhi · 2020-05-01T14:53:52Z

Again, you need to use a new version such as v0.1.1.

chenggang108 · 2020-05-01T14:56:46Z

I used curl -L https://github.com/lh3/hickit/releases/download/v0.1/hickit-0.1_x64-linux.tar.bz2 | tar -jxf -
to download the files.

chenggang108 · 2020-05-01T14:58:10Z

is it 0.1 not 0.1.1?

tanlongzhi · 2020-05-01T15:00:27Z

You downloaded an older release.

chenggang108 · 2020-05-01T15:06:19Z

Ok, got it.
Thank you

chenggang108 · 2020-05-06T22:58:54Z

Hi Tan,@tanlongzhi

I checked my imputation with hickit -i contacts.pairs.gz --out-val impute.val and found that the accuracy of intra-chromosome contacts close to the diagonal is only 0.5 in the first line, with the probability threshold of 0.99. I wonder why is that.

Is it possible that there are too few hits with the threshold of 0.99? For example, two hits with a threshold of 0.99, one is correct but the other one is wrong.

The following is how it looks:

0.99 0.5452 0.5000 0.4394 0.9999 0.4592 0.9999
0.98 0.5581 0.9995 0.4496 1.0000 0.4699 1.0000
0.97 0.5595 0.9998 0.4535 1.0000 0.4734 1.0000
0.96 0.5619 0.9999 0.4574 1.0000 0.4770 1.0000
0.95 0.5641 1.0000 0.4613 0.9996 0.4806 0.9997
0.94 0.5657 0.9977 0.4654 0.9996 0.4842 0.9990
0.93 0.5668 0.9974 0.4695 0.9993 0.4877 0.9987
0.92 0.5677 0.9970 0.4736 0.9991 0.4912 0.9984
0.91 0.5684 0.9966 0.4778 0.9989 0.4948 0.9981
0.90 0.5688 0.9956 0.4820 0.9986 0.4983 0.9977
0.89 0.5692 0.9946 0.4862 0.9987 0.5017 0.9974
0.88 0.5696 0.9926 0.4902 0.9981 0.5051 0.9963
0.87 0.5701 0.9926 0.4946 0.9972 0.5088 0.9959
0.86 0.5706 0.9906 0.4988 0.9954 0.5123 0.9940
0.85 0.5712 0.9814 0.5030 0.9944 0.5158 0.9907
0.84 0.5722 0.9728 0.5072 0.9921 0.5194 0.9866
0.83 0.5732 0.9681 0.5117 0.9899 0.5232 0.9839
0.82 0.5742 0.9646 0.5160 0.9880 0.5269 0.9817
0.81 0.5753 0.9602 0.5204 0.9848 0.5307 0.9783
0.80 0.5769 0.9492 0.5246 0.9804 0.5344 0.9723
0.79 0.5790 0.9406 0.5293 0.9754 0.5386 0.9664
0.78 0.5812 0.9326 0.5339 0.9687 0.5427 0.9595
0.77 0.5840 0.9167 0.5389 0.9625 0.5474 0.9508
0.76 0.5875 0.9057 0.5441 0.9542 0.5522 0.9419
0.75 0.5915 0.8902 0.5495 0.9463 0.5574 0.9322
0.74 0.5961 0.8733 0.5555 0.9345 0.5631 0.9193
0.73 0.6011 0.8598 0.5618 0.9217 0.5692 0.9064
0.72 0.6068 0.8449 0.5686 0.9100 0.5758 0.8940
0.71 0.6130 0.8313 0.5760 0.8964 0.5830 0.8805
0.70 0.6205 0.8144 0.5840 0.8835 0.5909 0.8666
0.69 0.6289 0.8043 0.5926 0.8665 0.5994 0.8516
0.68 0.6385 0.7898 0.6019 0.8519 0.6087 0.8371
0.67 0.6488 0.7770 0.6120 0.8373 0.6189 0.8230
0.66 0.6606 0.7613 0.6228 0.8233 0.6299 0.8087
0.65 0.6728 0.7495 0.6345 0.8089 0.6417 0.7951
0.64 0.6861 0.7356 0.6469 0.7960 0.6543 0.7821
0.63 0.7009 0.7241 0.6606 0.7807 0.6681 0.7679
0.62 0.7167 0.7089 0.6751 0.7663 0.6829 0.7534
0.61 0.7335 0.6968 0.6907 0.7536 0.6987 0.7408
0.60 0.7513 0.6900 0.7073 0.7412 0.7155 0.7298
0.59 0.7708 0.6759 0.7245 0.7280 0.7332 0.7166

tanlongzhi · 2020-05-07T02:19:02Z

Hi @chenggang108, I think your guess (low 50% imputation accuracy at 0.99 threshold is caused by statistical noise from small sample size) is probably right.

If you'd like to look further, @lh3 once wrote an undocumented debug mode for imputation validation:

hickit -i contacts.pairs.gz --val-radius=10m --dbg-val --out-val=impute.dbg.val 2> impute.dbg.val.log

This mode should output true and imputed haplotypes.

chenggang108 · 2020-05-20T03:00:10Z

Hi @tanlongzhi Longzhi,

Hope you are doing well. I am trying to figure out the calculation you did in the paper. Something I am not quite sure. I need to bother you again.

My first question is about the unit used for all the values. I want to make sure that the units of the coordinate in the 3dg file is 'particle radii', right? So that the values from ''dip-c pd'' and ''dip-c dist'' are all with 'particle radii' as units. Am I right?

The second question is about the potential somatic pairing. I am trying to figure out how you did it. Did you generate leg file for each haploid chromosome and did the pairwise distance? For example: dip-c pd -1 chr1P.leg -2 chr1M.leg cell1.3dg. It looks too much.

Thanks
Gang

tanlongzhi · 2020-05-20T06:41:37Z

Hi @chenggang108,

Yes, as long as you used scripts/hickit_3dg_to_3dg_rescale_unit.sh to convert hickit .3dg files to dip-c .3dg files.
For each 20-kb bin, its distance to its homologous locus is calculated with dip-c color -h.

chenggang108 · 2020-05-21T02:36:24Z

Thank you so much

chenggang108 · 2020-05-23T23:56:13Z

Hi @tanlongzhi ,

Sorry that I need to bother you again. I wonder is it true that we should get a very similar structure for one cell no matter how the seed is selected for hickit.

For example, if I run:

for rep in `seq 1 100`
do
  hickit -s${rep} -i impute.pairs.gz -Sr1m -c1 -r10m -c2 -b4m -b1m -O 1m.${rep}.3dg -b200k -O 200k.${rep}.3dg -D5 -b50k -O 50k.${rep}.3dg -D5 -b20k -O 20k.${rep}.3dg
done

I will have 100 3dg files for every resolution. If everything goes right, the RMSD between any two 3dg files from the 100 files should be very small, let's say RMSD< 3 or 2. The must be some wrong if it is not true.

I am not sure whether I am right.

Thanks

Gang

tanlongzhi · 2020-05-24T00:57:43Z

Yes, that's exactly how we determined which cell is good at which resolution. We took this RMSD concept (here implemented as "dip-c align") from Stevens et al. Nature 2017.

Note that one must take the RMSD values with a grain of salt. The 3D modeling algorithm is not perfect, and may thus become overconfident at a structure (RMSD too low) or terrible at finding a good structure (RMSD too high).

chenggang108 · 2020-05-24T01:12:23Z

So, usually, RMSD must be < 2?

tanlongzhi · 2020-05-24T01:33:09Z

I typically do RMSD ≤ 1.5.

chenggang108 · 2020-05-24T01:40:41Z

Thank you

chenggang108 · 2020-06-08T02:22:32Z

Hi @tanlongzhi,

I have a question about the 3D-modeling of mitotic cells. Are there any prophase or metaphase cells among all the cells you analyzed? Does the program work on these cells?

Thanks

Gang

tanlongzhi · 2020-06-08T02:31:53Z

We had one GM12878 cell at the very end of mitosis. It's a daughter cell right after cytokinesis. You can see its arrangement like one half of an anaphase cell. Please refer to the main figure of the paper for which cell.

In general, mitotic cells stand out because of their very low inter-chromosomal contacts.

I heard that Dr. Fraser's group managed to adapt the Dip-C algorithm for 3D-modeling mitotic cells (before cytokinesis); but I'm not aware of the details.

chenggang108 · 2020-06-08T03:04:18Z

Does the compaction of the genome affect the modeling?

tanlongzhi · 2020-06-08T03:09:47Z

I don't think so. According to electron microscopy (e.g. ChromEMT), DNA density in mitotic cells is not that high.

The main problem (before cytokinesis) is the presence of 4 copies (rather than 2) of each chromosome.

chenggang108 · 2020-06-08T04:04:21Z

Thank you

hickit.js:457: Error #24

hickit.js:457: Error #24

Comments

chenggang108 commented Apr 26, 2020

tanlongzhi commented Apr 26, 2020

chenggang108 commented Apr 26, 2020

tanlongzhi commented Apr 26, 2020 • edited Loading

chenggang108 commented Apr 26, 2020

tanlongzhi commented Apr 26, 2020

chenggang108 commented Apr 26, 2020

chenggang108 commented Apr 26, 2020

tanlongzhi commented Apr 26, 2020

chenggang108 commented Apr 27, 2020

chenggang108 commented Apr 27, 2020

chenggang108 commented Apr 27, 2020

tanlongzhi commented Apr 27, 2020

chenggang108 commented Apr 30, 2020

tanlongzhi commented May 1, 2020

chenggang108 commented May 1, 2020

tanlongzhi commented May 1, 2020

chenggang108 commented May 1, 2020

chenggang108 commented May 1, 2020

tanlongzhi commented May 1, 2020

chenggang108 commented May 1, 2020

chenggang108 commented May 6, 2020 • edited Loading

tanlongzhi commented May 7, 2020

chenggang108 commented May 20, 2020

tanlongzhi commented May 20, 2020

chenggang108 commented May 21, 2020

chenggang108 commented May 23, 2020

tanlongzhi commented May 24, 2020 • edited Loading

chenggang108 commented May 24, 2020

tanlongzhi commented May 24, 2020

chenggang108 commented May 24, 2020

chenggang108 commented Jun 8, 2020

tanlongzhi commented Jun 8, 2020

chenggang108 commented Jun 8, 2020

tanlongzhi commented Jun 8, 2020

chenggang108 commented Jun 8, 2020

tanlongzhi commented Apr 26, 2020 •

edited

Loading

chenggang108 commented May 6, 2020 •

edited

Loading

tanlongzhi commented May 24, 2020 •

edited

Loading