Skip to content

Latest commit

 

History

History
196 lines (158 loc) · 6.48 KB

release.rst

File metadata and controls

196 lines (158 loc) · 6.48 KB

History

Release v0.4.1 (10/01/2025)

  • RDR: --minINCLUDE supports both INT and FLOAT. If INT, the minimum length of included part within specific feature. If FLOAT, the minimum fraction of included part within specific feature.
  • BAF: add index to the output folder of each step, e.g., "1_pileup".
  • docs: update TODO list.

Release v0.4.0 (06/12/2024)

  • RDR: add --minINCLUDE option for read filtering, which is the minimum length of included part within specific feature. For example, if the genomic range of a feature is chr1:1000-3000, and one fetched read (100bp) aligned to two locus, chr1:601-660 (60bp) and chr1:3801-3840 (40bp), then no any part of the read is actually included within the feature, hence it will be filtered by --minINCLUDE=30, whereas older versions of xcltk may keep the read. Note, as the feature counting in RDR is performmed independently for each feature, so one read filtered by --minINCLUDE in one feature may still be fetched and counted by other features.
  • update docstring, using the numpydoc style.
  • add TODO list in docs/TODO.md.
  • fix typo.

Release v0.3.1 (06/06/2024)

  • BAF: in ref_phasing, use multiprocessing to phase SNPs of one chromosome per subprocess.
  • specify dtype of column 0 as str in pd.read_csv() when loading region file.

Release v0.3.0 (11/05/2024)

The v0.2.x was skipped since this new version has several substantial updates:

  • BAF: do reference phasing on local machines instead of using online service.
  • BAF & RDR: better support well-based (e.g., SMART-seq) data without the need to merge the input BAM files first;
  • coding improvement using a more unified framework, mainly using the fc (feature counting) and utils sub-modules.

Feature enhancement

BAF part:

  • add xcltk baf command line tool to support reference phasing on local machines instead of using online service.
  • xcltk allelefc: better support well-based (e.g., SMART-seq) data without the need to merge the input BAM files first;
  • xcltk allelefc: both REF and ALT allele counting will exclude the UMIs/reads mapped to both alleles when no_dup_hap is True.

RDR part:

  • better support well-based (e.g., SMART-seq) data without the need to merge the input BAM files first;
  • re-implement the xcltk basefc using the fc (feature counting) framework.

Preprocess:

  • re-implement the preprocess pipeline by (1) replace the bash scripts with python functions, e.g.,

    wrapping SNP calling (previously baf_pre_phase.sh) into xcltk.baf.genotype::pileup(); reference phasing locally with xcltk.baf.genotype::ref_phasing(); wrapping allele-specific feature counting (previously baf_post_phase.sh) with xcltk.baf.count::afc_wrapper().

    1. further wrap the three functions into a pipeline implemented as a sub-module xcltk.baf.pipeline and also as a command line tool xcltk baf.

Others:

  • rename the cmdline command xcltk pileup to xcltk allelefc.
  • make the cmdline options more unified, e.g., "--samList" and "--ncores" in "xcltk allelefc", "xcltk pipeline", and "xcltk basefc".
  • usage() functions by default output to stdout instead of stderr.
  • cmdline "--help" option exit code changes from 1 to 0.
  • add/update a few util sub-modules such as vcf.py, xlog.py etc.
  • add post_hoc scripts for post-processing xcltk output.
  • initialize "data" dir and add feature annotation files.

Release v0.1.16 (28/01/2023)

  • baf: add reference phasing correction (xcltk rpc).
  • preprocess: restructure, update scripts and data.
  • rdr: output 4-column features.

Release v0.1.15 (17/07/2022)

  • baf_pre_impute: keep het SNPs only after calling germline SNPs
  • baf_post_impute: output all regions when running xcltk pileup
  • rdr: fix a bug that pysam was not imported.

Release v0.1.14 (01/05/2022)

update baf haploblock pileup:
  • re-implement the module
  • fix the double counting issue of UMIs or reads when aggregating phased SNPs (some UMIs or reads could cover more than one SNPs)
  • fix the issue that some UMIs are aligned to both haplotype alleles (--countDupHap)
  • add an option to output all regions (--outputAllReg)

Release v0.1.13 (02/08/2021)

  • rdr: fix program suspension caused by unmatched chrom

Release v0.1.12 (26/02/2021)

  • baf_pre: add --umi and --duplicates options

Release v0.1.11 (28/01/2021)

re-implement fixref with pysam:
  • support genome fasta as ref (-r)
  • support gzip/bgzip input and output vcf
  • support multiple alt alleles
  • support multiple samples
  • indels would be filtered
  • support only ploidy = 2 for now

Release v0.1.10 (09/01/2021)

  • baf_post: support multiple BAMs
  • baf_pileup: set cellTAG None when given bam list
  • copy barcode file for baf_pileup and copy barcode & region files for phase_snp
  • basefc: replace region.stop with region.end
  • small fixes

Release v0.1.9 (04/01/2021)

  • baf_pileup: add --uniqCOUNT
  • specify sample ID through cmdline option

Release v0.1.8 (31/12/2020)

  • phase_snp: fix load_phase
  • baf_post: update pileup cmdline

Release v0.1.7 (29/12/2020)

  • add pileup module and fix double counting

Release v0.1.6 (28/12/2020)

  • phase_snp: support bed,gff,tsv for input region
  • phase_snp: support vcf as input for phase file
  • add gzip support for region sub-module
  • baf_pre_impute: add -C/--call option and use cellsnp-lite by default to call germline SNPs instead of freebayes

Release v0.1.5 (19/12/2020)

  • small fix
  • baf_pre_impute and baf_pileup pass tests

Release v0.1.4 (17/12/2020)

  • add baf_pileup pipeline

Release v0.1.3 (16/12/2020)

  • add baf_pre_imputation pipeline

Release v0.1.2 (15/12/2020)

  • add utils

Release v0.1.1 (14/12/2020)

  • add fixref

Release v0.1.0 (13/12/2020)

  • add feature-count

Release v0.0.2 (13/12/2020)

  • add xcltk cmdline

Release v0.0.1 (12/12/2020)

  • init modules: baf, rdr and reg
  • add cmdline apps: xcltk-baf, xcltk-rdr and xcltk-reg