UniTato: a web server for evidence and community based Unification of poTato gene models

NIB-SI/unitato

UniTato

DOI

bioRxiv

Input files

GFF3

FASTA

Annotations

  • panTranscriptome components: ./input/5cv_weak-components.txt
  • ITAG-PGSC-pairs: ./input/ITAG-PGSC-pairs.xlsx
  • ITAG v4 CDS length and GC content: ./input/Solanum_tuberosum-ITAG_DM_v1_cds_GC-len (see https://doi.org/10.1038/s41597-020-00581-4)
  • PGSC v4 CDS length and GC content: ./input/Solanum_tuberosum_PGSC_merged_GC-len (see https://doi.org/10.1038/s41597-020-00581-4)
  • Desiree, Rywal, and PW363 CDS and transcripts:
    • stCuSTr-D_cds_representatives.fasta
    • stCuSTr-D_tr_representatives.fasta
    • stCuSTr-P_cds_representatives.fasta
    • stCuSTr-P_tr_representatives.fasta
    • stCuSTr-R_cds_representatives.fasta
    • stCuSTr-R_tr_representatives.fasta
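
As an illustration of what the "CDS length and GC content" tables summarise, the sketch below computes both values per record from FASTA-formatted text. It is a minimal example under stated assumptions, not the procedure used to build the files above: the helper names (read_fasta, length_and_gc) and the two example records are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {record_id: sequence}."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token of the header.
            current_id = line[1:].split()[0]
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line.upper())
    return {rid: "".join(chunks) for rid, chunks in records.items()}


def length_and_gc(sequence):
    """Return (length, GC fraction) for one nucleotide sequence."""
    length = len(sequence)
    if length == 0:
        return 0, 0.0
    gc = sequence.count("G") + sequence.count("C")
    return length, gc / length


# Hypothetical records, for illustration only (not taken from the input files).
example = """>cds1 hypothetical example record
ATGGCC
GATTAA
>cds2
ATGCGCGCTAG
"""

stats = {rid: length_and_gc(seq) for rid, seq in read_fasta(example).items()}
for rid, (length, gc) in stats.items():
    print(f"{rid}\t{length}\t{gc:.3f}")
```

GC content is reported here as a fraction of all bases in the record; whether the files in ./input/ use a fraction or a percentage is not stated in this README.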

Liftoff-specific files

  • Chromosome: ./input/pairs chroms.txt
  • Unplaced: ./input/unplaced_DM404.txt

Proteomes

Software

For more information see README in scripts

R packages

For more information see sessionInfo() in R Markdown files (.html)

Output

  • Translation table: ./output/v4-v6.1_translationTable.xlsx
  • GFFs:
    • flank-based: ./output/matched-unmatched-gff/
    • unified v6v4 GFF: UniTato.gff
  • miniprot detailed results: ./output/miniprot/
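
The GFF outputs can be inspected with a few lines of code. The sketch below assumes standard nine-column, tab-separated GFF3 with key=value attributes in the last column. The function names and the example records (geneA, geneB) are hypothetical and do not come from UniTato.gff.

```python
from urllib.parse import unquote


def parse_gff3_attributes(field):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute values.
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def iter_features(gff_text, feature_type="gene"):
    """Yield one dict per feature of the requested type."""
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        yield {
            "seqid": cols[0],
            "source": cols[1],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "attributes": parse_gff3_attributes(cols[8]),
        }


# Hypothetical GFF3 content, for illustration only.
example_gff = "\n".join([
    "##gff-version 3",
    "\t".join(["chr01", "example", "gene", "100", "900", ".", "+", ".",
               "ID=geneA;Name=example%20gene"]),
    "\t".join(["chr01", "example", "mRNA", "100", "900", ".", "+", ".",
               "ID=geneA.1;Parent=geneA"]),
    "\t".join(["chr02", "example", "gene", "50", "400", ".", "-", ".",
               "ID=geneB"]),
])

genes = list(iter_features(example_gff))
gene_ids = [g["attributes"]["ID"] for g in genes]
print(gene_ids)
```

To read a real file, pass the result of open(path).read() as gff_text; for large GFFs a line-by-line reader would be preferable to loading the whole file.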

Reports

  • many-to-many matches: ./reports/overlaps.xlsx

  • Venn: ./reports/01_Venn_wm.tiff

  • pan-transcriptome ITAG-PGSC pair matching dependent on the F parameter: ./reports/ITAG-PGSC_F-dependent_pairs-matches.txt

  • DMv6.1 wm scaffolds without gene features: ./reports/v6scaffolds-without-v6genes.fasta

  • Chord diagram visualisation

  • Chr12 inversion visualisation
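
The idea behind the "scaffolds without gene features" report can be sketched as a set difference between the sequence IDs in an assembly and the sequence IDs that carry at least one gene feature in a GFF3 file. This is an illustrative sketch only: the procedure actually used in this repository is not described here, and the function name and example IDs are hypothetical.

```python
def scaffolds_without_genes(scaffold_ids, gff_text):
    """Return the scaffold IDs (in input order) that have no 'gene' feature."""
    with_genes = set()
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        # Column 1 is the sequence ID, column 3 the feature type.
        if len(cols) >= 3 and cols[2] == "gene":
            with_genes.add(cols[0])
    return [sid for sid in scaffold_ids if sid not in with_genes]


# Hypothetical sequence IDs and annotation, for illustration only.
all_ids = ["chr01", "scaffold_A", "scaffold_B"]
annotation = "\n".join([
    "##gff-version 3",
    "\t".join(["chr01", "example", "gene", "1", "500", ".", "+", ".", "ID=g1"]),
    "\t".join(["scaffold_B", "example", "gene", "10", "90", ".", "-", ".", "ID=g2"]),
])

empty = scaffolds_without_genes(all_ids, annotation)
print(empty)
```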

Additional information

DMv6.1

“High-confidence gene models”, as defined by Pham et al. (2020), are based on the following criteria:

  • Transcripts per Million value greater than 0 in at least one RNA-Seq library
  • Gene models that have a match to a PFAM domain are considered high-confidence
  • Gene models that are partial or have matches to transposable element-related PFAM domains are excluded from the high-confidence model set
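
The criteria above can be sketched as a simple filter. This sketch makes one explicit assumption that the list does not state: expression (TPM > 0 in at least one library) and a PFAM match are treated as alternative sources of support, while the exclusions (partial models, transposable element-related PFAM domains) always take precedence. The GeneModel class, its fields, and the example values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GeneModel:
    """Hypothetical container for the evidence attached to one gene model."""
    gene_id: str
    tpm_by_library: dict                     # library name -> TPM value
    pfam_domains: list = field(default_factory=list)
    is_partial: bool = False
    has_te_pfam: bool = False                # match to a TE-related PFAM domain


def is_high_confidence(model):
    """Apply the listed criteria; the OR between evidence types is an assumption."""
    # Exclusions take precedence over any supporting evidence.
    if model.is_partial or model.has_te_pfam:
        return False
    expressed = any(tpm > 0 for tpm in model.tpm_by_library.values())
    has_pfam = len(model.pfam_domains) > 0
    return expressed or has_pfam


# Hypothetical gene models, for illustration only.
models = [
    GeneModel("g1", {"leaf": 3.2, "tuber": 0.0}),
    GeneModel("g2", {"leaf": 0.0}, pfam_domains=["PF_example"]),
    GeneModel("g3", {"leaf": 5.0}, is_partial=True),
    GeneModel("g4", {"leaf": 1.0}, pfam_domains=["PF_te_example"], has_te_pfam=True),
    GeneModel("g5", {"leaf": 0.0}),
]

high_confidence_ids = [m.gene_id for m in models if is_high_confidence(m)]
print(high_confidence_ids)
```

If the criteria are instead meant to be applied jointly (expression and a PFAM match both required), replacing the final `or` with `and` gives that stricter reading.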
