Releases: tjparnell/biotoolbox
Bio-ToolBox-v2.01
- Update chromosome sorting to properly handle chromosomal arms, for example with Drosophila
- Change .groups.txt group file name to .col_groups.txt when writing column metadata file for scripts get_binned_data.pl and get_relative_data.pl
- Change back to _summary.txt file name when writing a summary file
- Change --blacklist option to --exclude in bam2wig.pl
- Improve error handling scenarios in data2wig.pl, including invalid indexes
- Fix bugs in manipulate_datasets.pl, including missing lines in the view function and restricting the addname function to only update a proper "Name" column
- Update any remaining POD text references about 0-base indexing to 1-base
BioToolBox-v2.00
- MAJOR UPDATE: Change all internal and user-oriented column indexing to 1-base instead of 0-base indexing, i.e. column numbers are now listed beginning with 1 instead of 0. WARNING!!! THIS WILL BREAK ALL PRE-EXISTING SCRIPTS AND CODE THAT USES HARD-CODED COLUMN INDEXES!!!
- MAJOR UPDATE: Use a single unified Bio::ToolBox::Parser module with subclasses for bed, gff, gtf, and ucsc table formats. NOTE: This changed name capitalization of Bio::ToolBox::Parser subclasses from parser
- Improve parsing of gtf files, especially with duplicate tags
- Replaced old table sorting algorithm to use numeric, mixed digit-string, and/or string sorting
- Improved accuracy of detecting standard columns such as name, ID, start, etc
- Add support for column median and trimmed-mean methods when generating a summary file
- Fix bug with filtering features by transcript_support_level and gencode
- Remove Bio::SeqIO requirement for writing fasta files in data2fasta.pl
- Changed behavior to always use 1-base coordinate when generating coordinate strings, which is standard behavior e.g. with HTSlib (samtools and tabix) queries
- Improve support for coordinate lookup in merge_datasets.pl, including handling either 1-base or 0-base coordinate strings.
- Always report both IDs and names for transcripts and genes in text output from get_gene_regions.pl
- Add bedpe format support
- Fix bugs with parsing file headers and assigning standard column metadata
- Fix bug with naming empirically derived introns
- Add option to skip chromosomes in get_gene_regions.pl
- Add option to adjust relative coordinates based on narrowPeak peak in get_features.pl
- Fix edge-case bugs with low-level bam parsing
- Speed up certain stats functions and improve detection of numbers in manipulate_datasets.pl
- Remove defunct supplementary tables in ucsc_table2gff3.pl
- Improve data format verification, and only run it when reading and writing
- Improve error reporting in scripts
- Use a proper prompting module for user-input
- Fix massive numbers of perlcritic and perltidy issues
- Hundreds of other bug fixes
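
The two 1-base changes in this release affect any code that hard-codes column indexes or builds region strings by hand. The following is a minimal Python sketch of the arithmetic involved, not Bio::ToolBox code. Both function names are hypothetical, and the input is assumed to be a BED-style interval (0-based start, exclusive end).

```python
def to_one_based_column(zero_based_index):
    """Convert a hard-coded 0-based column index to its 1-based equivalent."""
    if zero_based_index < 0:
        raise ValueError("column index must not be negative")
    return zero_based_index + 1


def bed_to_region_string(chrom, bed_start, bed_end):
    """Build a 1-based, inclusive region string from a BED-style interval.

    BED starts are 0-based and BED ends are exclusive, so only the start
    needs to shift by one; the end value stays the same.
    """
    return f"{chrom}:{bed_start + 1}-{bed_end}"


print(to_one_based_column(3))                 # 4
print(bed_to_region_string("chr1", 99, 200))  # chr1:100-200
```

Scripts that previously passed a column index of N on the command line would, under this reading, need N + 1 after upgrading.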
BioToolBox-v1.691
- Fix critical error in script get_relative_data.pl
- Fix prerequisite version numbers leading to build failures
- Change a private function to a public function
BioToolBox-v1.69
- Revise genomic sorting by introducing a sane, logical chromosome ordering that smartly handles numerical, Roman, contigs, and alternate names. Sorting is done by both start and end coordinates. Sorting speed modestly improved.
- Improve handling of coordinates of Data Feature objects, including caching and setting.
- Add support for narrowPeak summit coordinate as reference point in multiple scripts.
- Improve handling of databases, including bigWigSet feature types. Make simplification of dataset names a little less aggressive.
- Include options for excluding chromosomes and/or intervals when generating a new list of genomic bins
- Improve tasting of file formats, keeping the file format of parsed files in the Data object.
- Allow non-stranded values when parsing UCSC files, including bed files.
- Optimize scoring subroutines
- Remove legacy subroutines from utility module
- Include new test file for utility functions
- Numerous other small changes and fixes
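
To illustrate what a chromosome ordering of this kind might look like, here is a rough Python sketch. It is not the Bio::ToolBox algorithm: the tiering (numeric names first, then Roman numerals, then everything else alphabetically), the `roman` switch, and all function names are assumptions made for the example. Roman parsing is opt-in because a name such as chrX is ambiguous between a sex chromosome and the numeral 10.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text):
    """Loosely convert a Roman numeral to an int; None if other characters occur.

    Malformed numerals (e.g. "IIII") are not rejected.
    """
    if not text or any(ch not in ROMAN_VALUES for ch in text):
        return None
    total = 0
    for ch, nxt in zip(text, text[1:] + " "):
        value = ROMAN_VALUES[ch]
        # A smaller value before a larger one is subtracted (IV = 4).
        if nxt in ROMAN_VALUES and ROMAN_VALUES[nxt] > value:
            total -= value
        else:
            total += value
    return total


def chrom_key(name, roman=False):
    """Sort key: numeric names, then Roman names (if enabled), then the rest."""
    base = re.sub(r"^chr", "", name, flags=re.IGNORECASE)
    if base.isdecimal():
        return (0, int(base), "")
    if roman:
        value = roman_to_int(base.upper())
        if value is not None:
            return (1, value, "")
    return (2, 0, base)


def sort_features(features, roman=False):
    """Sort (chrom, start, end) tuples by chromosome, then start, then end."""
    return sorted(features, key=lambda f: (chrom_key(f[0], roman), f[1], f[2]))


print(sort_features([("chr10", 5, 9), ("chr2", 100, 200), ("chr2", 100, 150)]))
print(sorted(["chrIX", "chrIV", "chrXVI", "chrI"], key=lambda n: chrom_key(n, roman=True)))
```

A plain string sort would place chr10 before chr2 and chrIX before chrV, which is the kind of result a dedicated key avoids.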
BioToolBox-v1.68
- Script bam2wig.pl can now record both ends of paired-end fragments, rather than faking it as single-end. Paired-end start now respects orientation. Added new option to only record either first or second read in a pair. Added new option to ignore zero intervals when writing bedGraph format. Changed multi-hit scoring to preferentially use NH instead of IH.
- Scripts get_binned_data.pl and get_relative_data.pl now can write out column names and associated datasets in separate groups file for use in plotting. Also specify score decimal format.
- Script get_features.pl has new option to only keep features with explicit tag value.
- High level ToolBox convenience function parse_file now includes basic default subfeatures exon, cds, and utr.
- Efficiency improvements in loading large text files by going back to chomp. Should still fail appropriately with wrong line endings.
- Feature objects now allow certain attribute methods to be both get and set, including seq_id, start, end, strand, name, and type, so long as the table does not contain parsed or database SeqFeature objects.
- Add Data object function to return any single row Feature without having to use an iterator.
- Add high level function for iterating over Bam alignments.
- Add support for intron subfeatures in Feature objects and data collection scripts.
- Allow bigWigToBedGraph to be explicitly used
- Better handling of verified dataset names
- Bug fixes and improvements in identifying database file formats and loading adapters.
- Bug fix in writing bgzip files.
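
The NH and IH tags mentioned for bam2wig.pl are optional SAM fields: NH is the number of reported alignments containing the read, and IH is the number of alignments for the read stored in the file. A common way to down-weight multi-mapped reads is to score each alignment as 1 divided by the hit count. The Python sketch below shows that idea with NH preferred over IH. Whether bam2wig.pl scores exactly this way is an assumption; the function names and the plain-dict tag representation are hypothetical.

```python
def hit_count(tags):
    """Return the number of hits for an alignment, preferring NH over IH.

    `tags` is a plain dict of optional SAM tags (an assumption for this
    sketch). Alignments with neither tag are treated as uniquely mapped.
    """
    for key in ("NH", "IH"):
        value = tags.get(key)
        if value is not None and value > 0:
            return value
    return 1


def fractional_score(tags):
    """Score one alignment as 1/hits so multi-mapped reads count less."""
    return 1.0 / hit_count(tags)


print(fractional_score({"NH": 4, "IH": 2}))  # 0.25
print(fractional_score({"IH": 2}))           # 0.5
print(fractional_score({}))                  # 1.0
```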
BioToolBox-v1.67
- Add new option of smart coverage to script bam2wig that smartly handles paired-end alignments with gaps (introns)
- Add capability to collect from multiple datasets at once for scripts get_binned_data and get_relative_data. Summary files can now handle multiple datasets.
- Allow specific number of up and down windows in script get_relative_data.
- Add option to provide list of specific feature IDs to script get_features.
- Write shift correlation region data from bam2wig.
- Improve GTF export.
- Add utility function to simplify dataset names, used in data collection scripts. Strips path and everything after first period from dataset file names.
- Improve sort function in manipulate_datasets to take a range of columns and sort by their mean. The addname function will also overwrite a feature name if present.
- Adjust logic for setting a file extension when none is provided.
- Lots of additional minor fixes and changes
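
The dataset name simplification described in this release (strip the path and everything after the first period) can be sketched in a few lines of Python. This is an illustration of the stated rule only, with a hypothetical function name; it does not reflect later releases, which note that the simplification was made less aggressive.

```python
import os


def simplify_dataset_name(path):
    """Strip the directory path and everything after the first period."""
    return os.path.basename(path).split(".", 1)[0]


print(simplify_dataset_name("/data/chip/sample1.rep2.bw"))  # sample1
print(simplify_dataset_name("H3K4me3.bam"))                 # H3K4me3
```

Note that under this rule two files such as sample1.rep1.bw and sample1.rep2.bw collapse to the same name.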
BioToolBox-v1.66
- Optimize data2wig fast mode, about 3 times faster
- Summary files now use a cleaned-up column name. Fix bugs with summary file generation.
- Bam2wig now properly reports alignment counts for each strand when provided with multiple input bam files (previously reported the same number).
- Fix bug where the Big adapter would crash when search coordinate was out of bound, unlike UCSC, HTS, and Sam.
- Improve GTF export with correct formatting and no longer export transcript lines.
- Improve GTF parsing where both transcripts and genes are inferred but coordinates were not updated correctly.
BioToolBox-v1.65
- Add function to read directly from bigWig files, and add support for bigWig files to script manipulate_wig
- Added options for filtering transcript Gencode or biotype in script get_gene_regions.
- Added option to discard low count features from script get_datasets.
- Add option to explicitly set number of columns of output bed file in script data2bed
- Update script get_feature_info to work with annotation files
- Optimize data2wig to handle fast option in more scenarios
- Coordinate string generation in manipulate_datasets takes start values as is
- Bug fixes in Bio::ToolBox, get_relative_data, manipulate_datasets, more
BioToolBox-v1.64
- Added support for Encode gappedPeak files. Also support for gleaning file formats from bed track lines. This should make new file formats easier to support in the future.
- Fix critical bug with skipping duplicate features from GTF files, particularly from Ensembl where exons share the same exon ID.
- Fix double-counting of stranded alignments in bam2wig script. Also correctly set minimum paired-end size.
- Fix bug to correctly count FPKM and TPM over length-adjusted features in script get_datasets.
- Fix bug with filtering transcripts in script get_features.
- Reset and clarify behavior regarding stop codons when parsing and exporting transcript features for various annotation formats.
- Add single-letter option support to script get_gene_regions.
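
For context on the FPKM and TPM fix, both measures depend on the feature length actually used for counting, so an adjusted length has to be the one that enters the formula. The Python sketch below uses the standard textbook definitions; it is not taken from get_datasets, and the function and parameter names are hypothetical.

```python
def fpkm(count, length_bp, total_mapped):
    """Fragments per kilobase of feature per million mapped fragments."""
    return count * 1e9 / (length_bp * total_mapped)


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    return [r / total * 1e6 for r in rates]


# Two features with the same read density but different lengths.
counts = [100, 200]
lengths = [1000, 2000]  # use the adjusted lengths here, not the original ones
print([fpkm(c, l, 1_000_000) for c, l in zip(counts, lengths)])
print(tpm(counts, lengths))
```

Passing the original rather than the adjusted length would skew both values in proportion to the length difference.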
BioToolBox-v1.63
- Added minimal Cram file support through the HTS adapter. Currently only supports the reference fasta listed in the Cram file header.
- Added fast paired-end option and paired-end start point options to script bam2wig. Temporary files now written to a temporary subdirectory, which can be specified. Extreme depth can now be handled properly by using 32 bit integers instead of 16. Splice segments can now be fractionally counted.
- Brought back and updated old script correlate_position_data to identify positional shifts in nucleosome or ChIP signal peaks.
- Added new SeqFeature methods to duplicate objects and delete subfeatures.
- Added option to format result numbers in script get_datasets.
- Fix numerous small bugs in scripts data2gff, data2fasta, get_intersecting_features, get_relative_data, and more
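
The move from 16-bit to 32-bit integers for depth matters because a 16-bit unsigned value tops out at 65,535. The Python sketch below demonstrates that limit with the standard `struct` module. It only illustrates the storage ceiling: how bam2wig stores its counts internally (signedness, byte order, packing) is not stated in these notes, so the unsigned little-endian layout and the function name are assumptions.

```python
import struct


def pack_depths(depths, type_code):
    """Pack per-base depths as little-endian unsigned integers.

    type_code "H" is a 16-bit value (max 65,535); "I" is 32-bit.
    """
    return struct.pack(f"<{len(depths)}{type_code}", *depths)


depths = [12, 70000, 3]  # the middle position exceeds the 16-bit range

try:
    pack_depths(depths, "H")
    fits_16 = True
except struct.error:
    fits_16 = False

packed_32 = pack_depths(depths, "I")
print(fits_16)         # False
print(len(packed_32))  # 12 bytes: three 4-byte values
```

The cost of the wider type is twice the storage per position.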