A tandem repeat (TR) catalog generated from high-quality long-read human genome assemblies

This repository contains the analysis scripts that were used to generate the TR catalog from public diploid long-read human genome assemblies from the following data sources:

Workflow

Mapping of TRs from assemblies to the reference genome

Catalog

v1

Haplotype names, separated by semi-colons, are listed in the first header line, which begins with '#'.
Column descriptions:

Column	Description
chrom	chromosome
start	start coordinate
end	end coordinate
motif	consensus repeat motif
copy_numbers	copy numbers in haplotypes separated by semi-colons ('-' for missing genotypes)
sizes	sizes (bp) in haplotypes separated by semi-colons ('-' for missing genotypes)
motifs	motifs in haplotypes separated by semi-colons ('-' for missing genotypes)
max_change	maximum change in size (bp), over all haplotypes, relative to the reference genome size
num_samples	number of samples with genotype
num_calls	number of haplotypes with genotype
motif_frequency	number of haplotypes associated with each observed motif, e.g. CAG(10);CAA(2)
feature	gene element overlapped. Format: gene|transcript|element, where element is one of exon#, intron#, utr5, utr3, cds, promoter, exon_bound (exon boundary)
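
The column layout above can be read with a few lines of Python. The following is only a minimal sketch, not part of the repository's scripts. It assumes that the catalog is a tab-separated text file, that the columns appear in the order listed in the table, and that the per-haplotype values in copy_numbers, sizes and motifs follow the same order as the haplotype names in the '#' header line. The function names, haplotype names and example row are made up for illustration.

```python
from typing import Dict, List, Optional

# Column order as listed in the table above (assumed to match the file).
COLUMNS = [
    "chrom", "start", "end", "motif", "copy_numbers", "sizes", "motifs",
    "max_change", "num_samples", "num_calls", "motif_frequency", "feature",
]


def parse_haplotype_names(header_line: str) -> List[str]:
    """Read haplotype names from the first header line ('#' + names joined by ';')."""
    return header_line.lstrip("#").strip().split(";")


def split_per_haplotype(field: str, haplotypes: List[str]) -> Dict[str, Optional[str]]:
    """Map each haplotype to its value; '-' (missing genotype) becomes None."""
    values = field.split(";")
    if len(values) != len(haplotypes):
        raise ValueError("number of values does not match number of haplotypes")
    return {h: (None if v == "-" else v) for h, v in zip(haplotypes, values)}


def parse_motif_frequency(field: str) -> Dict[str, int]:
    """Parse a string such as 'CAG(10);CAA(2)' into {'CAG': 10, 'CAA': 2}."""
    counts: Dict[str, int] = {}
    for item in field.split(";"):
        motif, _, rest = item.partition("(")
        counts[motif] = int(rest.rstrip(")"))
    return counts


def parse_row(line: str, haplotypes: List[str]) -> dict:
    """Parse one tab-separated catalog row into a dictionary keyed by column name."""
    row: dict = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
    row["start"] = int(row["start"])
    row["end"] = int(row["end"])
    # Per-haplotype values are kept as strings; convert them as needed.
    for column in ("copy_numbers", "sizes", "motifs"):
        row[column] = split_per_haplotype(row[column], haplotypes)
    row["motif_frequency"] = parse_motif_frequency(row["motif_frequency"])
    return row


# Made-up header and row, for illustration only (not real catalog data).
header = "#sampleA.h1;sampleA.h2;sampleB.h1"
line = "\t".join([
    "chr1", "100", "130", "CAG", "10;-;12", "30;-;36", "CAG;-;CAG",
    "6", "2", "2", "CAG(2)", "GENE1|TX1|exon1",
])

haplotypes = parse_haplotype_names(header)
record = parse_row(line, haplotypes)
print(record["copy_numbers"])
print(record["motif_frequency"])
```

Keying the per-haplotype columns by haplotype name makes missing genotypes explicit (None) and avoids relying on positional indexes when comparing haplotypes.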
