This folder contains the assemblies presented in the manuscript. They have also been submitted to NCBI and should be available there soon.
Files related to primary contigs are marked with a 'p' in the file name, and files related to haplotigs with an 'h'.
For example, Pst_104E_v13_p_ctg.fa contains the nucleotide sequences of all primary contigs. Primary contigs and haplotigs are linked by their names: haplotigs associated with primary contig pcontig_000 are named hcontig_000_XXX.
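Because the link between haplotigs and primary contigs is encoded purely in the names, it can be recovered with a simple pattern match. The sketch below is illustrative and not part of the release. It assumes that haplotig IDs follow the hcontig_<NNN>_<suffix> pattern shown above and that the ID is the first word of each FASTA header. The haplotig file name in the usage line (Pst_104E_v13_h_ctg.fa) is likewise an assumption based on the 'p'/'h' convention.

```python
import re
from collections import defaultdict

# Assumed naming pattern: haplotigs are "hcontig_<NNN>_<suffix>" and belong to
# primary contig "pcontig_<NNN>" (inferred from pcontig_000 / hcontig_000_XXX).
HAPLOTIG_RE = re.compile(r"^hcontig_(\d+)_(\w+)$")


def primary_for_haplotig(name):
    """Return the name of the primary contig a haplotig is associated with."""
    match = HAPLOTIG_RE.match(name)
    if match is None:
        raise ValueError(f"not a haplotig name: {name!r}")
    return f"pcontig_{match.group(1)}"


def group_haplotigs(names):
    """Group haplotig names by their associated primary contig, keeping input order."""
    groups = defaultdict(list)
    for name in names:
        groups[primary_for_haplotig(name)].append(name)
    return dict(groups)


def fasta_ids(path):
    """Yield sequence IDs (first word after '>') from a FASTA file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                yield line[1:].split()[0]


if __name__ == "__main__":
    # Hypothetical file name; adjust to the actual haplotig FASTA in this folder.
    for primary, haplotigs in group_haplotigs(fasta_ids("Pst_104E_v13_h_ctg.fa")).items():
        print(primary, len(haplotigs))
```

Using only the first word of each header keeps the match robust if the FASTA headers carry extra descriptive text after the ID.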
The file extensions encode the following:
*.anno.gff3 contains the annotation of all protein-coding genes.
*.cds.fa contains the nucleotide sequences of all coding sequences.
*.gbk contains everything in GenBank format.
*.protein.fa contains the amino acid sequences of all proteins.
*.REPET.gff contains the repetitive element annotation generated by REPET.
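As an illustration of how the extensions and the 'p'/'h' marker combine, the hypothetical helper below labels a file name with its contig set and content type. It assumes that annotation files share the stem of the sequence files (e.g. Pst_104E_v13_p_ctg.protein.fa) and that the contig set is marked by "_p_" or "_h_" in the name. Neither assumption is stated explicitly above, so check them against the actual files.

```python
# Extension -> content, taken from the list above. The plain ".fa" entry
# (contig nucleotide sequences) comes from the Pst_104E_v13_p_ctg.fa example.
# Order matters: compound suffixes such as ".cds.fa" must be tested before ".fa".
SUFFIXES = [
    (".anno.gff3", "protein-coding gene annotation"),
    (".cds.fa", "coding sequences (nucleotide)"),
    (".protein.fa", "protein sequences (amino acid)"),
    (".REPET.gff", "repetitive element annotation (REPET)"),
    (".gbk", "GenBank record"),
    (".fa", "contig sequences (nucleotide)"),
]


def describe(filename):
    """Return (contig_set, content) for a file name in this folder (hypothetical helper).

    contig_set is "primary", "haplotig" or None; the "_p_"/"_h_" markers are an
    assumption based on the Pst_104E_v13_p_ctg.fa example.
    """
    content = next((desc for suffix, desc in SUFFIXES if filename.endswith(suffix)), None)
    if content is None:
        raise ValueError(f"unrecognised extension: {filename!r}")
    if "_p_" in filename:
        contig_set = "primary"
    elif "_h_" in filename:
        contig_set = "haplotig"
    else:
        contig_set = None
    return contig_set, content


if __name__ == "__main__":
    print(describe("Pst_104E_v13_p_ctg.fa"))
```

Checking the compound suffixes first is the key design choice: a naive lookup on the last extension alone would report every .cds.fa and .protein.fa file as plain contig sequence.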