All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Version checking from cmd line
- seekr_domain_pearson can run without reference path. (See Issue)
- `black` formatting!
- Unofficial support for different alphabets
- Additional log transformation option
- Length normalization divisor (length to [length -k +1])
- Includes a small fix to the length normalization step of the core k-mer counting code. In v1.4.2, k-mer counts were normalized to the number of basepairs in each input sequence. In v1.4.3, k-mer counts are normalized to the number of k-mers in each input sequence ([length_of_sequence] -[k-mer_length]+1). This is a minor change for correctness that should not meaningfully affect results.
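
The difference between the two divisors can be sketched as follows. This is a minimal illustration and not the SEEKR source: `count_kmers`, `length_normalize`, and the `per_kmer` switch are hypothetical names, and any per-kb scaling is left out.

```python
from itertools import product


def count_kmers(seq, k, alphabet="ACGT"):
    """Count overlapping k-mers; k-mers with letters outside `alphabet` are skipped."""
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    return counts


def length_normalize(counts, seq_len, k, per_kmer=True):
    """Divide raw counts by the number of k-mers (v1.4.3 style) or by basepairs (v1.4.2 style)."""
    divisor = seq_len - k + 1 if per_kmer else seq_len
    return {kmer: c / divisor for kmer, c in counts.items()}


seq = "AACGTTAACG"  # 10 bp, so 9 overlapping 2-mers
k = 2
raw = count_kmers(seq, k)
old = length_normalize(raw, len(seq), k, per_kmer=False)  # divides by 10
new = length_normalize(raw, len(seq), k, per_kmer=True)   # divides by 9
```

With the k-mer divisor, the normalized values of a sequence made only of alphabet letters sum to 1; with the basepair divisor they sum to slightly less, and the gap shrinks as sequences get longer.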
- Behavior of --log2 argument
- Includes a redesigned flag to indicate the method of k-mer standardization, and an additional option for k-mer standardization: --log2 [1,2,3] or -l [1,2,3]. These options correspond to log-transforming pre-standardization, post-standardization, or no log-transform, respectively.
  - `--log2 post` (Default standardization method). This is the same default standardization method used in SEEKR v1.4.2. For a given set of sequences, k-mers are counted, then length normalized (counts per kb of sequence), then z-scores for each k-mer are calculated, and then these z-scores are log2-transformed. See PMID 31097619 for examples and an in-depth description of the rationale for using log2-transformed z-scores as a default.
  - `--log2 none` is the same as the no-log transformation option (-nl) of seekr v1.4.2. Here, after k-mers are counted and length-normalized, z-scores are calculated and used without any transformation. This was the approach used in our original SEEKR publication (PMID 30224646).
  - `--log2 pre` is the additional/new option for standardization. In this approach, k-mers are counted across a set of sequences and then length normalized (counts per kb of sequence). For each k-mer that has a zero-count value in each sequence, a pseudo-count of 1 is added; this allows the k-mer count values to be log2 transformed. Z-scores are calculated after log2-transformation of k-mer counts, and these z-scores can then be used directly for comparisons. In effect, this standardization method is not much different from the default option of --log2 2 (users can compare for themselves). It is, however, a slightly cleaner heuristic. In the time since our original two publications using seekr, we have noted that k-mer counts in the mouse and human transcriptomes tend to follow a log-normal distribution; thus, we currently favor this method of standardization.
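
The order of operations in the three modes can be sketched in plain Python. This is an illustration under stated assumptions, not the SEEKR implementation: `zscore_columns` and `standardize` are hypothetical helpers, the input is assumed to be already length-normalized (rows are sequences, columns are k-mers), the pseudo-count of 1 is added to every value for simplicity, and the shift applied before log2 in the `post` branch is an assumption made so that negative z-scores can be log-transformed.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Z-score each column (k-mer) across all rows (sequences)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, mus, sds)]
        for row in rows
    ]


def standardize(rows, log2="post"):
    """Apply one of the three standardization orders described above."""
    if log2 == "pre":
        # Assumption: pseudo-count of 1 on every value, then log2, then z-score.
        logged = [[math.log2(v + 1) for v in row] for row in rows]
        return zscore_columns(logged)
    z = zscore_columns(rows)
    if log2 == "none":
        return z
    if log2 == "post":
        # Assumption: shift so the smallest z-score maps to 1 before log2.
        shift = abs(min(min(row) for row in z)) + 1
        return [[math.log2(v + shift) for v in row] for row in z]
    raise ValueError(f"unknown log2 mode: {log2!r}")


# Toy length-normalized counts: 3 sequences x 2 k-mers.
rows = [[0.0, 4.0], [2.0, 4.0], [4.0, 10.0]]
pre = standardize(rows, "pre")
post = standardize(rows, "post")
none = standardize(rows, "none")
```

In the `pre` and `none` modes every k-mer column ends up centered on zero, while in this sketch of `post` all values are non-negative because the log is taken last.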
- Includes small updates to “notes” and “Help” sections of README.md
- `seekr_pwms` is now callable from the command line
- `seekr_gen_rand_rnas` is live
- Updated README
- Let `seekr_visualize_distro` handle other matrices
- In `seekr_domain_pearson`, change the way percentiles are calculated, to now be relative to a reference fasta.
- Improve error when passing a bad release to `seekr_download_gencode`.
- `seekr_canonical_gencode` command line script filters for -001 transcripts.
- Example integration script in the test directory.
- Add legacy option to continue using Louvain instead of Leiden.
- Travis CI automatic push testing.
- `seekr_visualize_distro` command makes distribution of r-values.
- `seekr_domain_pearson` command line script compares queries and domains in targets.
- Separate fasta.Downloader's url building from file downloading functionality.
- Convert arguments to integers appropriately in console_scripts.
- Add help strings to `seed` docs.
- 'None' is no longer a part of downloaded file names.
- Provide unique default path for dumping gml file if one isn't provided.
- In `seekr_graph`, change '-l', '--limit' to '-t', '--threshold'.
- Set default community detection to Leiden.
- Remove testing of fasta file downloading.
- This CHANGELOG was added!
- Users can see all commands and examples by running "seekr".
- Users can download fasta files from Gencode with `seekr_download`.
- Log2 normalization is now possible and on by default.
- Users can find Louvain based transcript communities with `seekr_graph`.
- Package info is now in `__version__.py` (modeled after `requests`). This allows users to do things like `seekr.__version__` in a REPL.
- Removed examples of `k=7` from README.
- Stop building list twice while reading fasta data.
- All commands now start with 'seekr_' (e.g. 'pearson' is now 'seekr_pearson')
- README has undergone a large re-write to reflect changes and new features.
- Console commands produce plain text files by default, instead of binary.
- Requirements are all described in setup.py; requirements.txt has been removed.