Releases · tjparnell/biotoolbox

30 Oct 22:35

tjparnell

version_2.01

597975b

Bio-ToolBox-v2.01 Latest

Update chromosome sorting to properly handle chromosomal arms, for
example with Drosophila
Change .groups.txt group file name to .col_groups.txt when writing
column metadata file for scripts get_binned_data.pl and get_relative_data.pl
Change back to _summary.txt file name when writing a summary file
Change --blacklist option to --exclude in bam2wig.pl
Improve error handling scenarios in data2wig.pl, including invalid indexes
Fix bugs in manipulate_datasets.pl, including missing lines in the view function
and restricting the addname function to only update a proper "Name" column
Update any remaining POD text references about 0-base indexing to 1-base
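
To illustrate the chromosome-arm sorting item above, here is a minimal sketch of a sort key that keeps Drosophila-style arms (2L, 2R, 3L, 3R) together under their chromosome number. The function name `arm_sort_key` and the ordering rules are assumptions made for illustration. This is not the Bio::ToolBox implementation.

```python
import re

# Hypothetical pattern: optional "chr" prefix, a number, an optional arm letter.
_ARM = re.compile(r"^(?:chr)?(\d+)([LR])?$", re.IGNORECASE)


def arm_sort_key(name):
    """Sort numbered chromosomes first (by number, then arm), others by name."""
    match = _ARM.match(name)
    if match:
        number = int(match.group(1))
        arm = (match.group(2) or "").upper()
        return (0, number, arm, "")
    # Anything else (X, Y, contigs) sorts after the numbered chromosomes.
    return (1, 0, "", name)


names = ["3R", "X", "2R", "4", "3L", "2L", "Y"]
print(sorted(names, key=arm_sort_key))
# ['2L', '2R', '3L', '3R', '4', 'X', 'Y']
```

A plain string sort happens to work for this small list. The numeric part of the key matters once chromosome numbers reach two digits.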

20 Jul 22:11

tjparnell

version_2.00

906b438

BioToolBox-v2.00

MAJOR UPDATE: Change all internal and user-oriented column indexing
to 1-base instead of 0-base indexing, i.e. column numbers are now
listed beginning with 1 instead of 0. WARNING!!! THIS WILL BREAK ALL
PRE-EXISTING SCRIPTS AND CODE THAT USES HARD-CODED COLUMN INDEXES!!!
MAJOR UPDATE: Use a single unified Bio::ToolBox::Parser module with
subclasses for bed, gff, gtf, and ucsc table formats. NOTE: This changes
the capitalization of the subclass namespace from "parser" to "Parser"
Improve parsing of gtf files, especially with duplicate tags
Replaced old table sorting algorithm to use numeric, mixed digit-string,
and/or string sorting
Improved accuracy of detecting standard columns such as name, ID, start, etc
Add support for column median and trimmed-mean methods when generating a
summary file
Fix bug with filtering features by transcript_support_level and gencode
Remove Bio::Seq::IO requirement for writing fasta files in data2fasta.pl
Changed behavior to always use 1-base coordinate when generating coordinate
strings, which is standard behavior e.g. with HTSlib (samtools and tabix) queries
Improve support for coordinate lookup in merge_datasets.pl, including
handling either 1-base or 0-base coordinate strings.
Always report both IDs and names for transcripts and genes in text output
from get_gene_regions.pl
Add bedpe format support
Fix bugs with parsing file headers and assigning standard column metadata
Fix bug with naming empirically derived introns
Add option to skip chromosomes in get_gene_regions.pl
Add option to adjust relative coordinates based on narrowPeak peak
in get_features.pl
Fix edge-case bugs with low-level bam parsing
Speed up certain stats functions and improve detection of numbers
in manipulate_datasets.pl
Remove defunct supplementary tables in ucsc_table2gff3.pl
Improve data format verification, and only run it when reading and writing
Improve error reporting in scripts
Use a proper prompting module for user-input
Fix massive numbers of perlcritic and perltidy issues
Hundreds of other bug fixes
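
The switch to 1-base column indexing is the change most likely to affect existing code, so a small sketch may help. It is written in Python purely for illustration, since the toolbox itself is Perl. The helper names `column_by_number` and `migrate_index` are hypothetical and are not part of Bio::ToolBox.

```python
def column_by_number(row, number):
    """Return a value from a row using a 1-base column number."""
    if number < 1 or number > len(row):
        raise IndexError(f"column {number} is outside 1..{len(row)}")
    # Python lists are 0-base internally, so shift by one.
    return row[number - 1]


def migrate_index(old_zero_base_index):
    """Convert a hard-coded pre-2.00 (0-base) index to a 1-base number."""
    return old_zero_base_index + 1


row = ["chr1", "100", "200", "geneA"]
old_name_index = 3  # hard-coded in a script written for 0-base indexing
new_name_index = migrate_index(old_name_index)
print(new_name_index, column_by_number(row, new_name_index))
# 4 geneA
```

Under this convention, an index of 0 is no longer a valid column number. A script that still passes 0 should fail loudly rather than silently read the wrong column.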

13 Oct 18:11

tjparnell

v1.691

95942f6

BioToolBox-v1.691

Fix critical error in script get_relative_data.pl
Fix prerequisite version numbers leading to build failures
Change a private function to a public function

24 Sep 02:19

tjparnell

v1.69

20a65f5

BioToolBox-v1.69

Revise genomic sorting by introducing a sane, logical chromosome
ordering that smartly handles numeric and Roman numeral names, contigs,
and alternate names. Sorting is done by both start and end coordinates.
Sorting speed modestly improved.
Improve handling of coordinates of Data Feature objects, including
caching and setting.
Add support for narrowPeak summit coordinate as reference point
in multiple scripts.
Improve handling of databases, including bigWigSet feature types.
Make simplification of dataset names a little less aggressive.
Include options for excluding chromosomes and/or intervals when
generating a new list of genomic bins
Improve tasting of file formats, keeping the file format of parsed files
in the Data object.
Allow non-stranded values when parsing UCSC files, including bed files.
Optimize scoring subroutines
Remove legacy subroutines from utility module
Include new test file for utility functions
Numerous other small changes and fixes
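
As a rough illustration of the chromosome ordering described at the top of this release, the sketch below orders numeric names numerically, treats names as Roman numerals only when no plain numeric names are present, and pushes sex or mitochondrial chromosomes and contigs to the end. The rules and the names `roman_to_int` and `sort_chromosomes` are assumptions made for this example. The actual Bio::ToolBox logic is not shown in these notes and may differ.

```python
import re

_ROMAN_VALUES = {"I": 1, "V": 5, "X": 10}
# Strict Roman numerals from I to XXXIX (enough for yeast-style names).
_ROMAN_RE = re.compile(r"^(X{0,3})(IX|IV|V?I{0,3})$")
# Conventional trailing order for non-numeric chromosomes (assumed).
_SPECIAL_RANK = {"X": 0, "Y": 1, "M": 2, "MT": 2}


def roman_to_int(text):
    """Convert a valid Roman numeral made of I, V, X to an integer."""
    total, prev = 0, 0
    for ch in reversed(text):
        value = _ROMAN_VALUES[ch]
        if value < prev:
            total -= value
        else:
            total += value
            prev = value
    return total


def sort_chromosomes(names):
    """Sort names: numbered first, then X/Y/M, then everything else."""
    stripped = {n: re.sub(r"^chr", "", n, flags=re.IGNORECASE) for n in names}
    # Only read names as Roman numerals when no plain numbers are present,
    # so that a human "chrX" is not mistaken for chromosome ten.
    use_roman = not any(s.isdigit() for s in stripped.values())

    def key(name):
        s = stripped[name]
        if s.isdigit():
            return (0, int(s), "")
        if use_roman and s and _ROMAN_RE.match(s):
            return (0, roman_to_int(s), "")
        if s.upper() in _SPECIAL_RANK:
            return (1, _SPECIAL_RANK[s.upper()], "")
        return (2, 0, s)

    return sorted(names, key=key)


print(sort_chromosomes(["chrIX", "chrM", "chrII", "chrV", "chrX", "chrI"]))
# ['chrI', 'chrII', 'chrV', 'chrIX', 'chrX', 'chrM']
print(sort_chromosomes(["chr10", "chrX", "chr2", "chrUn_gl000220", "chrM", "chr1", "chrY"]))
# ['chr1', 'chr2', 'chr10', 'chrX', 'chrY', 'chrM', 'chrUn_gl000220']
```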

24 Jan 19:31

tjparnell

v1.68

a0702d1

BioToolBox-v1.68

Script bam2wig.pl can now record both ends of paired-end
fragments, rather than faking it as single-end. Paired-end start
now respects orientation. Added new option to only record either
the first or second read in a pair. Added new option to ignore
zero intervals when writing bedGraph format. Changed multi-hit
scoring to preferentially use NH instead of IH.
Scripts get_binned_data.pl and get_relative_data.pl now
can write out column names and associated datasets in separate
groups file for use in plotting. Also specify score decimal format.
Script get_features.pl has new option to only keep features with
explicit tag value.
High level ToolBox convenience function parse_file now includes
basic default subfeatures exon, cds, and utr.
Efficiency improvements in loading large text files by going
back to chomp. Should still fail appropriately with wrong
line endings.
Feature objects now allow certain attribute methods to be both
get and set, including seq_id, start, end, strand, name, and
type, so long as the table does not contain parsed or database
SeqFeature objects.
Add Data object function to return any single row Feature without
having to use an iterator.
Add high level function for iterating over Bam alignments.
Add support for intron subfeatures in Feature objects and data
collection scripts.
Allow bigWigToBedGraph to be explicitly used
Better handling of verified dataset names
Bug fixes and improvements in identifying database file
formats and loading adapters.
Bug fix in writing bgzip files.
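
The multi-hit scoring change for bam2wig.pl mentioned in this release can be pictured with a short sketch. It assumes that each alignment is down-weighted to 1/N, where N comes from the SAM `NH` tag when present and falls back to `IH` otherwise. The fractional weighting and the function name `multihit_score` are assumptions for illustration. Only the preference for NH over IH comes from the notes above.

```python
def multihit_score(tags):
    """Return a fractional score for one alignment.

    tags is a dict of optional SAM tags, e.g. {"NH": 4, "IH": 2}.
    NH is checked first, then IH; with neither tag the score is 1.
    """
    for tag in ("NH", "IH"):
        hits = tags.get(tag)
        if hits is not None and hits > 0:
            return 1.0 / hits
    return 1.0


print(multihit_score({"NH": 4, "IH": 2}))  # 0.25
print(multihit_score({"IH": 2}))           # 0.5
print(multihit_score({}))                  # 1.0
```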

09 Nov 17:00

tjparnell

v1.67

dd7af43

BioToolBox-v1.67

Add new option of smart coverage to script bam2wig that smartly handles paired-end alignments with gaps (introns)
Add capability to collect from multiple datasets at once for scripts get_binned_data and get_relative_data. Summary files can now handle multiple datasets.
Allow specific number of up and down windows in script get_relative_data.
Add option to provide list of specific feature IDs to script get_features.
Write shift correlation region data from bam2wig.
Improve GTF export.
Add utility function to simplify dataset names, used in data collection scripts. Strips path and everything after first period from dataset file names.
Improve sort function in manipulate_datasets to take a range of columns and sort by their mean. Also, the addname function will now overwrite a feature name if present.
Adjust logic for setting a file extension when none is provided.
Lots of additional minor fixes and changes
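
The dataset-name simplification described above (strip the path and everything after the first period) is simple enough to sketch directly. The function name `simplify_dataset_name` is hypothetical, and the toolbox's own utility function may handle more cases.

```python
import os


def simplify_dataset_name(path):
    """Strip the directory path and everything after the first period."""
    base = os.path.basename(path)
    return base.split(".", 1)[0]


print(simplify_dataset_name("/data/chipseq/sample1.rep2.bw"))
# sample1
```

Two files such as `sample1.rep1.bw` and `sample1.rep2.bw` collapse to the same name under this rule.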

04 Jun 03:30

tjparnell

v1.66

2ab769e

BioToolBox-v1.66

Optimize data2wig fast mode, about 3 times faster
Summary files now use a cleaned-up column name. Fix
bugs with summary file generation.
Bam2wig now properly reports alignment counts for each
strand when provided with multiple input bam files
(previously reported the same number).
Fix bug where the Big adapter would crash when search
coordinate was out of bound, unlike UCSC, HTS, and Sam.
Improve GTF export with correct formatting and no longer
export transcript lines.
Improve GTF parsing when both transcripts and genes are
inferred but coordinates were not updated correctly.

24 Feb 03:36

tjparnell

v1.65

18d9023

BioToolBox-v1.65

Add function to read directly from bigWig files, and add
support for bigWig files to script manipulate_wig
Added options for filtering transcript Gencode or biotype
in script get_gene_regions.
Added option to discard low count features from script
get_datasets.
Add option to explicitly set number of columns of output
bed file in script data2bed
Update script get_feature_info to work with annotation files
Optimize data2wig to handle fast option in more scenarios
Coordinate string generation in manipulate_datasets takes
start values as is
Bug fixes in Bio::ToolBox, get_relative_data,
manipulate_datasets, more
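
The coordinate string item above is easier to follow with an example. The sketch below builds a `chrom:start-end` string using the start value as is, and optionally shifts a 0-base start (as found in BED files) to the 1-base convention used by region queries in tools such as samtools and tabix. The function name and the `zero_base_start` flag are hypothetical and do not reflect the manipulate_datasets interface.

```python
def coordinate_string(chrom, start, end, zero_base_start=False):
    """Build a chrom:start-end string.

    By default the start is used as is. Set zero_base_start=True when the
    start comes from a 0-base, half-open interval such as a BED record.
    """
    if zero_base_start:
        start += 1
    return f"{chrom}:{start}-{end}"


print(coordinate_string("chr1", 101, 200))                        # chr1:101-200
print(coordinate_string("chr1", 100, 200, zero_base_start=True))  # chr1:101-200
```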

02 Jan 19:39

tjparnell

v1.64

c035000

BioToolBox-v1.64

Added support for Encode gappedPeak files. Also support for
gleaning file formats from bed track lines. This should make
additional file formats easier to support in the future.
Fix critical bug with skipping duplicate features from GTF
files, particularly from Ensembl where exons share the same exon ID.
Fix double-counting of stranded alignments in bam2wig script.
Also correctly set minimum paired-end size.
Fix bug to correctly count FPKM and TPM over length-adjusted
features in script get_datasets.
Fix bug with filtering transcripts in script get_features.
Reset and clarify behavior regarding stop codons when parsing
and exporting transcript features for various annotation formats.
Add single-letter option support to script get_gene_regions.

24 Oct 03:51

tjparnell

v1.63

98a119c

BioToolBox-v1.63

Added minimal Cram file support through the HTS adapter.
Currently only supports the reference fasta listed in the Cram
file header.
Added fast paired-end option and paired-end start point options
to script bam2wig. Temporary files now written to a temporary
subdirectory, which can be specified. Extreme depth can now be
handled properly by using 32 bit integers instead of 16. Splice
segments can now be fractionally counted.
Brought back and updated old script correlate_position_data to
identify positional shifts in nucleosome or ChIP signal peaks.
Added new SeqFeature methods to duplicate objects and delete
subfeatures.
Added option to format result numbers in script get_datasets.
Fix numerous small bugs in scripts data2gff, data2fasta,
get_intersecting_features, get_relative_data, and more
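
The note above about 32-bit versus 16-bit integers for extreme depth can be illustrated with Python's `struct` module. This sketch assumes unsigned storage, which the notes do not specify, and only shows why a 16-bit counter cannot hold a depth above 65535.

```python
import struct


def fits(fmt, value):
    """Return True if value can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


depth = 70_000
print(fits("<H", depth))  # unsigned 16-bit: False
print(fits("<I", depth))  # unsigned 32-bit: True
```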
