Wishlist for Snakemake pipeline upgrades

General

Align output of case step with input to local analysis scripts.
- Currently, a function in the local analysis script converts the data objects into the inputs expected when generating an instance of the custom Python candidate mutation table class.
- Ideally, build_candidate_mutation_table.py would be harmonized with the local analysis scripts: it would produce data objects of the right type and dimensions and generate an instance of the candidate mutation table class directly.
- It would also be great to generate an instance of the new coverage matrix class as part of the cluster step. This would likewise involve modifying build_candidate_mutation_table.py.
...

Use updated version of samtools for rule mpileup2vcf (env: samtools15_bcftools12.yaml)
- Version in rule sam2bam was already updated to enable deduplication (env: samtools115.yaml)
Fix outstanding issues with de-duplicating reads and metagenomic data files
...

Harmonize data objects coming out of GUS to match the input data objects for the local python scripts
- Swap the axes of the data arrays so that axis 0 represents samples and axis 1 represents position on the genome
- Make sure all arrays are numpy arrays
Compute and save a simplified coverage matrix that includes median coverage per sample over each contig
- Code for computing this already exists in the local analysis script
- Inputs to build_candidate_mutation_table.py will need to be updated to include the reference genome (so that it is possible to know where the contig boundaries are)
Make it possible to align a group of samples to multiple reference genomes
...
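
A rough sketch of the axis reordering and the per-contig median coverage described above, assuming NumPy. The function names (to_samples_first, median_coverage_per_contig) and the contig_lengths argument are hypothetical and are not taken from build_candidate_mutation_table.py or the local analysis script; contig lengths are assumed to have been read from the reference genome already, in genome order.

```python
import numpy as np


def to_samples_first(arr, sample_axis, position_axis):
    """Return arr as a numpy array with samples on axis 0 and positions on axis 1.

    sample_axis and position_axis say where those dimensions sit in the input.
    Any remaining axes keep their relative order after the first two.
    """
    arr = np.asarray(arr)  # lists, tuples, etc. become numpy arrays
    if arr.ndim < 2:
        raise ValueError("expected an array with at least 2 dimensions")
    return np.moveaxis(arr, (sample_axis, position_axis), (0, 1))


def median_coverage_per_contig(coverage, contig_lengths):
    """Median coverage of each sample over each contig.

    coverage: array of shape (n_samples, n_positions), contigs concatenated
        in genome order.
    contig_lengths: length of each contig, in the same order (hypothetical
        input, assumed to come from the reference genome).
    Returns an array of shape (n_samples, n_contigs).
    """
    coverage = np.asarray(coverage)
    contig_lengths = np.asarray(contig_lengths, dtype=int)
    if contig_lengths.sum() != coverage.shape[1]:
        raise ValueError("contig lengths do not add up to the number of positions")
    # Column indices at which every contig after the first one starts.
    boundaries = np.cumsum(contig_lengths)[:-1]
    blocks = np.split(coverage, boundaries, axis=1)
    return np.column_stack([np.median(block, axis=1) for block in blocks])


# Toy input: 3 positions x 2 samples, i.e. positions on axis 0.
raw = [[1, 2], [3, 4], [5, 6]]
coverage = to_samples_first(raw, sample_axis=1, position_axis=0)
simple_coverage = median_coverage_per_contig(coverage, contig_lengths=[2, 1])
```

In the toy example, coverage has shape (2, 3) and simple_coverage has shape (2, 2): one row per sample and one column per contig.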

Start using bakta instead of prokka for annotations
Add ability to make lineage co-assemblies with a specified number of reads per sample
...