v2.0 issues
arangrhie committed Dec 2, 2022
1 parent e2fb383 commit ad6f5df
Showing 6 changed files with 1,056 additions and 7 deletions.
12 changes: 5 additions & 7 deletions README.md
@@ -6,6 +6,7 @@ For any downstream analysis, please use the following files:
 * Het sites: <ver.>/chm13.draft_<ver.>.curated_sv.20210612.vcf, <ver.>/chm13.draft_<ver.>.hets_combined.20210615.bed
 
 ## Releases
+* 2022-12-02 Issues added for v2.0. X and Y were used simultaneously in T2T-HG002XYv2.7, and issues found on the Y are appended to v1.1_issues.bed. Note that the sequencing data used are from HG002.
 * 2021-10-13 Het regions lifted over from v1.0 to v1.1
 * 2021-06-23 Updating 3 additional issues and adding error k-mers in v1.0 and v1.1
 * 2021-06-15 Validated het SVs and clusters of heterozygous sites in v1.0 assembly
@@ -28,14 +29,11 @@ Brief descriptions are provided for
 * [Error Detection](https://github.com/marbl/CHM13-issues/blob/main/error_detection.md)
 * [Heterozygous variants in CHM13](https://github.com/marbl/CHM13-issues/blob/main/het_variants.md)
 
-More details will be provided in [T2T-Polish](https://github.com/arangrhie/T2T-Polish).
+More details on the polishing and evaluation methods applied to CHM13 are available in [T2T-Polish](https://github.com/arangrhie/T2T-Polish). For the methods used to polish and evaluate the Y, see this [preprint](https://doi.org/10.1101/2022.12.01.518724).
 
 ## Citation
 
 Please cite the papers below if any of the materials posted in this GitHub repository are used:
-
-Mc Cartney AM, Shafin K, Alonge M et al., Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies. Nat Methods (2022) https://www.nature.com/articles/s41592-022-01440-3
-
-(bioRxiv version: https://doi.org/10.1101/2021.07.02.450803)
-
-Nurk S, Koren S, Rhie A, and Rautiainen M et al., The complete sequence of a human genome. Science (2022) https://www.science.org/doi/10.1126/science.abj6987
+- Issues found in HG002XYv2.7, especially the Y: Rhie A, Nurk S, Cechova M, Hoyt S, Taylor DJ et al., The complete sequence of a human Y chromosome. bioRxiv (2022) https://doi.org/10.1101/2022.12.01.518724
+- General methods for finding issues and hets: Mc Cartney AM, Shafin K, Alonge M et al., Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies. Nat Methods (2022) https://doi.org/10.1038/s41592-022-01440-3 (bioRxiv version: https://doi.org/10.1101/2021.07.02.450803)
+- Issues for v0.9-v1.1: Nurk S, Koren S, Rhie A, and Rautiainen M et al., The complete sequence of a human genome. Science (2022) https://doi.org/10.1126/science.abj6987
63 changes: 63 additions & 0 deletions v2.0/issues_raw/hg002XYv2.7_hifi.pri.markersandlength.issues.bed
@@ -0,0 +1,63 @@
chrX 111789 124348 Low_GA/TC 100 . 111789 124348 153,153,255
chrX 464345 466804 Low_GA/TC 100 . 464345 466804 153,153,255
chrX 952516 977463 Low_GA/TC 100 . 952516 977463 153,153,255
chrX 1013607 1022517 Low_GA/TC 100 . 1013607 1022517 153,153,255
chrX 1169056 1199860 Low_GA/TC 100 . 1169056 1199860 153,153,255
chrX 1271779 1288246 Low_GA/TC 100 . 1271779 1288246 153,153,255
chrX 1481327 1493730 Low_GA/TC 100 . 1481327 1493730 153,153,255
chrX 1507903 1536405 Low_GA/TC 100 . 1507903 1536405 153,153,255
chrX 1656815 1666791 Low_GA/TC 100 . 1656815 1666791 153,153,255
chrX 1741755 1756333 Low_GA/TC 100 . 1741755 1756333 153,153,255
chrX 1841479 1841670 Low_GA/TC 100 . 1841479 1841670 153,153,255
chrX 2058369 2067082 Low 100 . 2058369 2067082 204,0,0
chrX 53074107 53074468 Low_AT 100 . 53074107 53074468 204,153,255
chrX 59152747 59175740 Low 100 . 59152747 59175740 204,0,0
chrX 59526514 59551616 Low 100 . 59526514 59551616 204,0,0
chrX 59619212 59642810 Low 100 . 59619212 59642810 204,0,0
chrX 59989683 60022043 Low 100 . 59989683 60022043 204,0,0
chrX 70196748 70217523 Low 100 . 70196748 70217523 204,0,0
chrX 70247031 70264821 Low 100 . 70247031 70264821 204,0,0
chrX 106439648 106440791 Low_GA/TC 100 . 106439648 106440791 153,153,255
chrX 114264697 114298621 Low_GA/TC 100 . 114264697 114298621 153,153,255
chrX 123809020 123820204 Low_GA/TC 100 . 123809020 123820204 153,153,255
chrY 17104 18186 Low 100 . 17104 18186 204,0,0
chrY 17408 18432 Clipped 100 . 17408 18432 153,153,153
chrY 160996 177655 Low_Qual 100 . 160996 177655 204,0,0
chrY 378455 383588 Low_GA/TC 100 . 378455 383588 153,153,255
chrY 443956 456524 Low_GA/TC 100 . 443956 456524 153,153,255
chrY 1011556 1025684 Low_Qual 100 . 1011556 1025684 204,0,0
chrY 1222520 1231947 Low_GA/TC 100 . 1222520 1231947 153,153,255
chrY 1239627 1240846 Low_GA/TC 100 . 1239627 1240846 153,153,255
chrY 1329712 1344829 Low_GA/TC 100 . 1329712 1344829 153,153,255
chrY 1378171 1379807 Low_GA/TC 100 . 1378171 1379807 153,153,255
chrY 1536858 1549302 Low_GA/TC 100 . 1536858 1549302 153,153,255
chrY 1565551 1595335 Low_Qual 100 . 1565551 1595335 204,0,0
chrY 1716287 1722799 Low_GA/TC 100 . 1716287 1722799 153,153,255
chrY 1806170 1811785 Low_GA/TC 100 . 1806170 1811785 153,153,255
chrY 1988968 1994281 Low_GA/TC 100 . 1988968 1994281 153,153,255
chrY 2124419 2131881 Low 100 . 2124419 2131881 204,0,0
chrY 32870443 32871393 Low_AT 100 . 32870443 32871393 204,153,255
chrY 37012166 37013162 Low_AT 100 . 37012166 37013162 204,153,255
chrY 37490885 37490905 Low_AT 100 . 37490885 37490905 204,153,255
chrY 37497796 37500859 Low_AT 100 . 37497796 37500859 204,153,255
chrY 38668791 38674960 Low_AT 100 . 38668791 38674960 204,153,255
chrY 40125391 40131766 Low_AT 100 . 40125391 40131766 204,153,255
chrY 42701050 42702863 Low_AT 100 . 42701050 42702863 204,153,255
chrY 42847777 42849936 Low_AT 100 . 42847777 42849936 204,153,255
chrY 43271560 43282928 Low_AT 100 . 43271560 43282928 204,153,255
chrY 43358895 43375703 Low_AT 100 . 43358895 43375703 204,153,255
chrY 46504723 46516658 Low_AT 100 . 46504723 46516658 204,153,255
chrY 46602730 46604942 Low_AT 100 . 46602730 46604942 204,153,255
chrY 47775928 47775977 Low_AT 100 . 47775928 47775977 204,153,255
chrY 49294426 49297464 Low_AT 100 . 49294426 49297464 204,153,255
chrY 49602111 49606377 Low_AT 100 . 49602111 49606377 204,153,255
chrY 49651185 49672128 Low_AT 100 . 49651185 49672128 204,153,255
chrY 50650067 50650218 Low_AT 100 . 50650067 50650218 204,153,255
chrY 53129348 53147569 Low_AT 100 . 53129348 53147569 204,153,255
chrY 53133312 53134336 Clipped 100 . 53133312 53134336 153,153,153
chrY 53744112 53745202 Low_AT 100 . 53744112 53745202 204,153,255
chrY 56066458 56068192 Low_AT 100 . 56066458 56068192 204,153,255
chrY 58942886 58942957 Low_AT 100 . 58942886 58942957 204,153,255
chrY 59102175 59131089 Low_AT 100 . 59102175 59131089 204,153,255
chrY 61405166 61409209 Low_AT 100 . 61405166 61409209 204,153,255
chrY 61421838 61424056 Low_AT 100 . 61421838 61424056 204,153,255
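To work with the nine-column issue records above, a short script can tally the number of intervals and the bases they cover per chromosome and issue label. This is a minimal sketch, not part of the repository: it assumes whitespace-separated records in the standard UCSC BED9 column order (chrom, start, end, name, score, strand, thickStart, thickEnd, itemRgb) with 0-based, half-open coordinates, and `summarize_issues` is a hypothetical helper name.

```python
from collections import defaultdict


def summarize_issues(lines):
    """Count intervals and sum their lengths per (chrom, issue label).

    Assumes whitespace-separated BED9 records (chrom, start, end, name, score,
    strand, thickStart, thickEnd, itemRgb) with 0-based, half-open coordinates,
    so each interval covers end - start bases. Overlapping intervals that share
    a label are not merged, so their shared bases are counted more than once.
    """
    totals = defaultdict(lambda: [0, 0])  # (chrom, name) -> [count, total_bp]
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional header lines a BED file may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        entry = totals[(chrom, name)]
        entry[0] += 1
        entry[1] += end - start
    return {key: (count, bp) for key, (count, bp) in totals.items()}


if __name__ == "__main__":
    # A few records taken from hg002XYv2.7_hifi.pri.markersandlength.issues.bed
    example = [
        "chrY 17104 18186 Low 100 . 17104 18186 204,0,0",
        "chrY 17408 18432 Clipped 100 . 17408 18432 153,153,153",
        "chrY 160996 177655 Low_Qual 100 . 160996 177655 204,0,0",
        "chrY 2124419 2131881 Low 100 . 2124419 2131881 204,0,0",
    ]
    for (chrom, name), (count, bp) in sorted(summarize_issues(example).items()):
        print(f"{chrom}\t{name}\t{count}\t{bp}")
```

On these four records it reports two `Low` intervals totalling 8,544 bp, one `Clipped` interval of 1,024 bp, and one `Low_Qual` interval of 16,659 bp. For a whole file, an open file handle can be passed as `lines`.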
16 changes: 16 additions & 0 deletions v2.0/issues_raw/hg002XYv2.7_merqury_hybrid_error.bed
@@ -0,0 +1,16 @@
chrX 34770652 34770687
chrX 69746592 69746619
chrX 98419015 98419048
chrX 99494537 99494561
chrX 118615807 118615841
chrX 122423772 122423793
chrX 149333396 149333427
chrY 174609 174634
chrY 174636 174657
chrY 174661 174682
chrY 174732 174757
chrY 1012054 1012113
chrY 1012661 1012683
chrY 1029257 1029278
chrY 1594903 1594924
chrY 21427265 21427295
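Several of the Merqury error intervals above sit only a few bases apart (for example, the four chrY records starting at 174609). A minimal sketch of merging such nearby intervals into clusters follows, assuming three-column BED records with 0-based, half-open coordinates; `parse_bed3`, `merge_intervals`, and the `max_gap` threshold are hypothetical and are not defined by the repository.

```python
def parse_bed3(lines):
    """Read (chrom, start, end) tuples from whitespace-separated BED lines."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and not fields[0].startswith("#"):
            records.append((fields[0], int(fields[1]), int(fields[2])))
    return records


def merge_intervals(intervals, max_gap=0):
    """Merge same-chromosome intervals that overlap or lie <= max_gap bp apart.

    With half-open coordinates, the gap between [s1, e1) and a later [s2, e2)
    is s2 - e1, so max_gap=0 merges only overlapping or directly adjacent intervals.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    # Four chrY records from hg002XYv2.7_merqury_hybrid_error.bed
    errors = parse_bed3([
        "chrY 174609 174634",
        "chrY 174636 174657",
        "chrY 174661 174682",
        "chrY 174732 174757",
    ])
    for chrom, start, end in merge_intervals(errors, max_gap=10):
        print(chrom, start, end)
```

With `max_gap=10`, these four records collapse into two clusters, 174609-174682 and 174732-174757. The gap threshold is an arbitrary illustrative choice and would need to be set to suit the analysis.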
@@ -0,0 +1,23 @@
chrY 32270368 32452817 High 100 . 32270368 32452817 153,102,255
chrY 32348160 32349184 Clipped 100 . 32348160 32349184 153,153,153
chrY 32358400 32359424 Clipped 100 . 32358400 32359424 153,153,153
chrY 32360448 32361472 Clipped 100 . 32360448 32361472 153,153,153
chrY 32377856 32378880 Clipped 100 . 32377856 32378880 153,153,153
chrY 32408576 32409600 Clipped 100 . 32408576 32409600 153,153,153
chrY 32410624 32411648 Clipped 100 . 32410624 32411648 153,153,153
chrY 32416768 32417792 Clipped 100 . 32416768 32417792 153,153,153
chrY 32420864 32421888 Clipped 100 . 32420864 32421888 153,153,153
chrY 32424960 32425984 Clipped 100 . 32424960 32425984 153,153,153
chrY 33844815 33846968 High 100 . 33844815 33846968 153,102,255
chrY 33859931 33860799 High 100 . 33859931 33860799 153,102,255
chrY 34058753 34065421 High 100 . 34058753 34065421 153,102,255
chrY 42045397 42046647 High 100 . 42045397 42046647 153,102,255
chrY 42092549 42092770 High 100 . 42092549 42092770 153,102,255
chrY 42131101 42188575 High 100 . 42131101 42188575 153,102,255
chrY 42165248 42166272 Clipped 100 . 42165248 42166272 153,153,153
chrY 42194253 42196099 High 100 . 42194253 42196099 153,102,255
chrY 47017218 47018428 High 100 . 47017218 47018428 153,102,255
chrY 47046812 47050387 High 100 . 47046812 47050387 153,102,255
chrY 47337415 47345371 High 100 . 47337415 47345371 153,102,255
chrY 59111424 59116544 Clipped 100 . 59111424 59116544 153,153,153
chrY 59112370 59115793 Low_AT 100 . 59112370 59115793 204,153,255
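In the 23-line file above (its file name was not captured on this page), several `Clipped` windows fall inside larger `High` intervals. A minimal sketch for listing such overlaps follows, assuming whitespace-separated BED records with the issue label in column 4 and half-open coordinates; `parse_labeled_bed` and `overlapping_pairs` are hypothetical helper names.

```python
def parse_labeled_bed(lines):
    """Read (chrom, start, end, name) tuples from whitespace-separated BED lines."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 4 and not fields[0].startswith("#"):
            records.append((fields[0], int(fields[1]), int(fields[2]), fields[3]))
    return records


def overlapping_pairs(records, query="Clipped", target="High"):
    """Pair every `query`-labeled interval with each `target`-labeled interval it overlaps.

    Half-open intervals [s1, e1) and [s2, e2) overlap iff s1 < e2 and s2 < e1.
    This is a plain O(n * m) scan, which is adequate for files of this size.
    """
    queries = [r for r in records if r[3] == query]
    targets = [r for r in records if r[3] == target]
    return [
        (q, t)
        for q in queries
        for t in targets
        if q[0] == t[0] and q[1] < t[2] and t[1] < q[2]
    ]


if __name__ == "__main__":
    records = parse_labeled_bed([
        "chrY 32270368 32452817 High 100 . 32270368 32452817 153,102,255",
        "chrY 32348160 32349184 Clipped 100 . 32348160 32349184 153,153,153",
        "chrY 42131101 42188575 High 100 . 42131101 42188575 153,102,255",
        "chrY 42165248 42166272 Clipped 100 . 42165248 42166272 153,153,153",
        "chrY 59111424 59116544 Clipped 100 . 59111424 59116544 153,153,153",
    ])
    for clipped, high in overlapping_pairs(records):
        print(clipped[:3], "overlaps", high[:3])
```

Here the last `Clipped` window (59111424-59116544) pairs with no `High` interval; calling `overlapping_pairs(records, target="Low_AT")` on the full file would instead match it with the `Low_AT` record at 59112370-59115793.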