Releases: tjparnell/biotoolbox

Bio-ToolBox-v2.01

30 Oct 22:35
  • Update chromosome sorting to properly handle chromosomal arms, for
    example with Drosophila
  • Change .groups.txt group file name to .col_groups.txt when writing
    column metadata file for scripts get_binned_data.pl and get_relative_data.pl
  • Change back to _summary.txt file name when writing a summary file
  • Change --blacklist option to --exclude in bam2wig.pl
  • Improve error handling scenarios in data2wig.pl, including invalid indexes
  • Fix bugs in manipulate_datasets.pl, including missing lines in the view function
    and restricting the addname function to only update a proper "Name" column
  • Update any remaining POD text references about 0-base indexing to 1-base

BioToolBox-v2.00

20 Jul 22:11
  • MAJOR UPDATE: Change all internal and user-oriented column indexing
    to 1-base instead of 0-base indexing, i.e. column numbers are now
    listed beginning with 1 instead of 0. WARNING!!! THIS WILL BREAK ALL
    PRE-EXISTING SCRIPTS AND CODE THAT USES HARD-CODED COLUMN INDEXES!!!
  • MAJOR UPDATE: Use a single unified Bio::ToolBox::Parser module with
    subclasses for bed, gff, gtf, and ucsc table formats. NOTE: This changed
    name capitalization of Bio::ToolBox::Parser subclasses from parser
  • Improve parsing of gtf files, especially with duplicate tags
  • Replace the old table sorting algorithm with one that uses numeric,
    mixed digit-string, and/or string sorting
  • Improved accuracy of detecting standard columns such as name, ID, start, etc
  • Add support for column median and trimmed-mean methods when generating a
    summary file
  • Fix bug with filtering features by transcript_support_level and gencode
  • Remove Bio::Seq::IO requirement for writing fasta files in data2fasta.pl
  • Changed behavior to always use 1-base coordinate when generating coordinate
    strings, which is standard behavior e.g. with HTSlib (samtools and tabix) queries
  • Improve support for coordinate lookup in merge_datasets.pl, including
    handling either 1-base or 0-base coordinate strings.
  • Always report both transcript and gene IDs and names in text output
    from get_gene_regions.pl
  • Add bedpe format support
  • Fix bugs with parsing file headers and assigning standard column metadata
  • Fix bug with naming empirically derived introns
  • Add option to skip chromosomes in get_gene_regions.pl
  • Add option to adjust relative coordinates based on narrowPeak peak
    in get_features.pl
  • Fix edge-case bugs with low-level bam parsing
  • Speed up certain stats functions and improve detection of numbers
    in manipulate_datasets.pl
  • Remove defunct supplementary tables in ucsc_table2gff3.pl
  • Improve data format verification, and only run it when reading and writing
  • Improve error reporting in scripts
  • Use a proper prompting module for user-input
  • Fix massive numbers of perlcritic and perltidy issues
  • Hundreds of other bug fixes
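
The two 1-base changes above (column indexes and coordinate strings) can be
illustrated with a minimal sketch. This is not Bio::ToolBox code: the function
names to_one_based_column and region_string are hypothetical, and the input
interval is assumed to be BED-style (0-base start, half-open end).

```python
def to_one_based_column(index0: int) -> int:
    """Convert a hard-coded 0-base column index to its 1-base equivalent.

    Hypothetical helper for migrating old scripts; not part of Bio::ToolBox.
    """
    if index0 < 0:
        raise ValueError("column index must be non-negative")
    return index0 + 1


def region_string(chrom: str, start0: int, end: int) -> str:
    """Build a 1-base, inclusive 'chrom:start-end' string.

    Assumes the input is a BED-style interval: 0-base start, half-open end.
    Only the start needs shifting, because a half-open 0-base end equals
    the inclusive 1-base end.
    """
    if start0 < 0 or end <= start0:
        raise ValueError("invalid interval")
    return f"{chrom}:{start0 + 1}-{end}"
```

For example, a script that previously referred to column 3 (0-base) would now
refer to column 4, and the BED interval chr1 99 200 becomes the region string
chr1:100-200, the form that samtools and tabix accept.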

BioToolBox-v1.691

13 Oct 18:11
  • Fix critical error in script get_relative_data.pl
  • Fix prerequisite version numbers leading to build failures
  • Change a private function to a public function

BioToolBox-v1.69

24 Sep 02:19
  • Revise genomic sorting by introducing a sane, logical chromosome
    ordering that smartly handles numerical, Roman, contigs, and
    alternate names. Sorting is done by both start and end coordinates.
    Sorting speed modestly improved.
  • Improve handling of coordinates of Data Feature objects, including
    caching and setting.
  • Add support for narrowPeak summit coordinate as reference point
    in multiple scripts.
  • Improve handling of databases, including bigWigSet feature types.
    Make simplification of dataset names a little less aggressive.
  • Include options for excluding chromosomes and/or intervals when
    generating a new list of genomic bins
  • Improve tasting of file formats, keeping the file format of parsed files
    in the Data object.
  • Allow non-stranded values when parsing UCSC files, including bed files.
  • Optimize scoring subroutines
  • Remove legacy subroutines from utility module
  • Include new test file for utility functions
  • Numerous other small changes and fixes
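
As a rough illustration of the kind of chromosome ordering described above,
the sketch below sorts numeric names first, then (optionally) Roman numerals,
then everything else alphabetically. The actual ordering rules used by
Bio::ToolBox are not given in these notes, so the grouping, the roman switch,
and the function names here are all assumptions.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text: str) -> int:
    """Convert a Roman numeral such as 'XIV' to an integer."""
    total = 0
    previous = 0
    # Walk right to left; a smaller value before a larger one is subtracted.
    for char in reversed(text):
        value = ROMAN_VALUES[char]
        if value < previous:
            total -= value
        else:
            total += value
            previous = value
    return total


def chromosome_key(name: str, roman: bool = False):
    """Return a sort key: numeric names, then Roman, then other names.

    The 'roman' switch is a hypothetical way to resolve the ambiguity of
    names like 'X', which is a sex chromosome in some genomes and
    chromosome 10 in others (for example yeast).
    """
    core = re.sub(r"^chr", "", name, flags=re.IGNORECASE)
    if core.isdigit():
        return (0, int(core), "")
    if roman and re.fullmatch(r"[IVXLCDM]+", core):
        return (1, roman_to_int(core), "")
    return (2, 0, core)
```

With the default settings, sorted(names, key=chromosome_key) places chr2
before chr10; passing roman=True through a lambda places chrIV before chrIX.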

BioToolBox-v1.68

24 Jan 19:31
  • Script bam2wig.pl can now record both ends of paired-end
    fragments, rather than faking it as single-end. Paired-end start
    now respects orientation. Added new option to only record either
    the first or second read in a pair. Added new option to ignore
    zero intervals when writing bedGraph format. Changed multi-hit
    scoring to preferentially use NH instead of IH.
  • Scripts get_binned_data.pl and get_relative_data.pl now
    can write out column names and associated datasets in separate
    groups file for use in plotting. Also specify score decimal format.
  • Script get_features.pl has new option to only keep features with
    explicit tag value.
  • High level ToolBox convenience function parse_file now includes
    basic default subfeatures exon, cds, and utr.
  • Efficiency improvements in loading large text files by going
    back to chomp. Should still fail appropriately with wrong
    line endings.
  • Feature objects now allow certain attribute methods to be both
    get and set, including seq_id, start, end, strand, name, and
    type, so long as the table does not contain parsed or database
    SeqFeature objects.
  • Add Data object function to return any single row Feature without
    having to use an iterator.
  • Add high level function for iterating over Bam alignments.
  • Add support for intron subfeatures in Feature objects and data
    collection scripts.
  • Allow bigWigToBedGraph to be explicitly used
  • Better handling of verified dataset names
  • Bug fixes and improvements in identifying database file
    formats and loading adapters.
  • Bug fix in writing bgzip files.
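
The NH-over-IH preference mentioned for bam2wig.pl can be sketched as follows.
NH and IH are standard optional SAM tags (the number of reported alignments
for the query, and the number of alignments stored in the file for the query,
respectively). The notes do not say how the tag value is turned into a score,
so the 1/hits fractional weight and the function name multihit_weight are
assumptions.

```python
from typing import Mapping


def multihit_weight(tags: Mapping[str, int]) -> float:
    """Return a fractional score for an alignment based on its hit count.

    Prefers the NH tag and falls back to IH; an alignment with neither tag
    is treated as a unique hit. The 1/hits weighting is an assumed scheme,
    not necessarily what bam2wig.pl does.
    """
    for tag in ("NH", "IH"):
        hits = tags.get(tag)
        if hits is not None and hits > 0:
            return 1.0 / hits
    return 1.0
```

An alignment tagged with both NH=4 and IH=2 is weighted by NH under this
scheme, giving 0.25 rather than 0.5.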

BioToolBox-v1.67

09 Nov 17:00
  • Add new option of smart coverage to script bam2wig that smartly handles paired-end alignments with gaps (introns)
  • Add capability to collect from multiple datasets at once for scripts get_binned_data and get_relative_data. Summary files can now handle multiple datasets.
  • Allow specific number of up and down windows in script get_relative_data.
  • Add option to provide list of specific feature IDs to script get_features.
  • Write shift correlation region data from bam2wig.
  • Improve GTF export.
  • Add utility function to simplify dataset names, used in data collection scripts. Strips path and everything after first period from dataset file names.
  • Improve sort function in manipulate_datasets to take a range of columns and sort by their mean. The addname function will also overwrite a feature name if present.
  • Adjust logic for setting a file extension when none is provided.
  • Lots of additional minor fixes and changes
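
The dataset name simplification described above (strip the path and everything
after the first period) can be sketched in a few lines. The function name
simplify_dataset_name is hypothetical, and the real utility may handle more
cases than this.

```python
import os


def simplify_dataset_name(path: str) -> str:
    """Strip the directory path and everything after the first period."""
    base = os.path.basename(path)
    return base.split(".", 1)[0]
```

Under this rule, /data/chip/sample1.rep2.bw becomes sample1, which also shows
why an aggressive rule can collapse distinct files into the same name.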

BioToolBox-v1.66

04 Jun 03:30
  • Optimize data2wig fast mode, about 3 times faster
  • Summary files now use a cleaned-up column name. Fix
    bugs with summary file generation.
  • Bam2wig now properly reports alignment counts for each
    strand when provided with multiple input bam files
    (previously reported the same number).
  • Fix bug where the Big adapter would crash when the search
    coordinate was out of bounds, unlike UCSC, HTS, and Sam.
  • Improve GTF export with correct formatting and no longer
    export transcript lines.
  • Improve GTF parsing where both transcripts and genes are
    inferred but coordinates were not updated correctly.

BioToolBox-v1.65

24 Feb 03:36
  • Add function to read directly from bigWig files, and add
    support for bigWig files to script manipulate_wig
  • Added options for filtering transcript Gencode or biotype
    in script get_gene_regions.
  • Added option to discard low count features from script
    get_datasets.
  • Add option to explicitly set number of columns of output
    bed file in script data2bed
  • Update script get_feature_info to work with annotation files
  • Optimize data2wig to handle fast option in more scenarios
  • Coordinate string generation in manipulate_datasets takes
    start values as is
  • Bug fixes in Bio::ToolBox, get_relative_data,
    manipulate_datasets, and more

BioToolBox-v1.64

02 Jan 19:39
  • Added support for Encode gappedPeak files. Also support for
    gleaning file formats from bed track lines. This should make
    future file formats easier to support.
  • Fix critical bug with skipping duplicate features from GTF
    files, particularly from Ensembl where exons share the same exon ID.
  • Fix double-counting of stranded alignments in bam2wig script.
    Also correctly set minimum paired-end size.
  • Fix bug to correctly count FPKM and TPM over length-adjusted
    features in script get_datasets.
  • Fix bug with filtering transcripts in script get_features.
  • Reset and clarify behavior regarding stop codons when parsing
    and exporting transcript features for various annotation formats.
  • Add single-letter option support to script get_gene_regions.
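
The FPKM and TPM fix above concerns which feature length enters the
calculation. The sketch below uses the standard textbook definitions and takes
the lengths as an explicit argument, so that adjusted lengths can be passed
in. It is not the get_datasets implementation; in particular, using the sum of
the supplied counts as the library total is an assumption.

```python
from typing import List, Sequence


def fpkm(counts: Sequence[float], lengths: Sequence[int]) -> List[float]:
    """Fragments per kilobase of feature per million counted fragments.

    'lengths' should be the lengths actually used for counting, i.e. the
    adjusted lengths when features were extended or trimmed.
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths)]


def tpm(counts: Sequence[float], lengths: Sequence[int]) -> List[float]:
    """Transcripts per million: length-normalized rates scaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths)]
    scale = sum(rates)
    if scale == 0:
        return [0.0 for _ in counts]
    return [rate * 1e6 / scale for rate in rates]
```

Because both measures divide by feature length, passing the original rather
than the adjusted length would skew every value for that feature, which is
the class of error the fix addresses.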

BioToolBox-v1.63

24 Oct 03:51
  • Added minimal Cram file support through the HTS adapter.
    Currently only supports the reference fasta listed in the Cram
    file header.
  • Added fast paired-end option and paired-end start point options
    to script bam2wig. Temporary files now written to a temporary
    subdirectory, which can be specified. Extreme depth can now be
    handled properly by using 32 bit integers instead of 16. Splice
    segments can now be fractionally counted.
  • Brought back and updated old script correlate_position_data to
    identify positional shifts in nucleosome or ChIP signal peaks.
  • Added new SeqFeature methods to duplicate objects and delete
    subfeatures.
  • Added option to format result numbers in script get_datasets.
  • Fix numerous small bugs in scripts data2gff, data2fasta,
    get_intersecting_features, get_relative_data, and more