Update the docs #236

Merged
merged 47 commits into from
Oct 11, 2024
76d25f4
Update docstrings
robomics Aug 27, 2024
4b7efe8
Update the README
robomics Aug 27, 2024
a4ec9f1
Update CLI reference
robomics Aug 27, 2024
d7772db
Update requirements.txt for the docs
robomics Aug 27, 2024
48adae4
Update test dataset download instructions
robomics Aug 27, 2024
cde25b0
Fix typo
robomics Aug 27, 2024
4815d0a
Add docs for hictk metadata
robomics Aug 27, 2024
1adccb6
Update CLI reference [no ci]
robomics Sep 19, 2024
fbdcb59
Add missing SPDX header [no ci]
robomics Sep 24, 2024
a64422a
Merge branch 'main' into docs/update
robomics Sep 26, 2024
cedf5a9
Update docs for C++ API
robomics Sep 26, 2024
fde7770
Update docs and tutorials for hictk
robomics Sep 27, 2024
9ef8937
hictk load: document supported compression algorithms
robomics Sep 27, 2024
dc66df5
Update docs for To*Matrix transformers
robomics Sep 28, 2024
7ec1bab
Merge branch 'main' into docs/update
robomics Sep 30, 2024
0c76328
Update CITATION.cff and add workflow to lint CITATION.cff
robomics Sep 30, 2024
dd27ca7
Add pre-commit hook to lint the docs
robomics Sep 30, 2024
64516ff
Update CLI reference
robomics Sep 30, 2024
c840e23
Revert "Add pre-commit hook to lint the docs"
robomics Sep 30, 2024
e9ec932
Merge branch 'main' into docs/update
robomics Sep 30, 2024
4bc81c6
[no ci]
robomics Sep 30, 2024
e92a878
[no ci]
robomics Sep 30, 2024
a8151fd
Fix PDF docs
robomics Sep 30, 2024
f296071
Rewrite generate_cli_reference script in python
robomics Sep 30, 2024
1f432a2
Fix incorrect display of std::uint8_t default values in the CLI help …
robomics Sep 30, 2024
f8fa4ce
Remove unnecessary extension from docs/conf.py
robomics Sep 30, 2024
0ac5fbd
Switch to using build.commands in .readthedocs.yaml in preparation fo…
robomics Sep 30, 2024
ff96f8f
[no ci]
robomics Sep 30, 2024
e331870
Bugfix [no ci]
robomics Sep 30, 2024
2a3b0cc
Check for broken links when building the docs
robomics Sep 30, 2024
86b4a9e
Fix broken link [no ci]
robomics Sep 30, 2024
fd6b66c
Fix PDF docs [no ci]
robomics Sep 30, 2024
4f3951e
Update doc URLS
robomics Sep 30, 2024
5ff23bc
Add script to automate updating doc links in index.rst
robomics Sep 30, 2024
dc17385
Bugfix [no ci]
robomics Sep 30, 2024
5b8ed03
Improve formatting [no ci]
robomics Sep 30, 2024
f96d851
Run linkcheck after patching index.rst [no ci]
robomics Sep 30, 2024
dd9f2b6
Update links to the doc in the readme [no ci]
robomics Oct 1, 2024
f0080ce
Merge branch 'main' into docs/update
robomics Oct 9, 2024
56d01ff
Update CLI reference [no ci]
robomics Oct 9, 2024
5f0793c
Merge branch 'main' into docs/update
robomics Oct 10, 2024
bc776d7
Fix permissions
robomics Oct 10, 2024
1176186
Update API docs [no ci]
robomics Oct 10, 2024
a91dffb
Bugfix [no ci]
robomics Oct 10, 2024
11270c1
Fix hictk * --help messages [no ci]
robomics Oct 10, 2024
d1f4f9a
Merge branch 'main' into docs/update
robomics Oct 10, 2024
ff71292
Address clang-tidy warnings
robomics Oct 10, 2024
62 changes: 62 additions & 0 deletions .github/workflows/lint-cff.yml
@@ -0,0 +1,62 @@
# Copyright (C) 2024 Roberto Rossini <roberros@uio.no>
# SPDX-License-Identifier: MIT

name: Lint CITATION.cff

on:
push:
branches: [main]
paths:
- ".github/workflows/lint-cff.yml"
- "CITATION.cff"

pull_request:
paths:
- ".github/workflows/lint-cff.yml"
- "CITATION.cff"

# https://stackoverflow.com/a/72408109
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true

defaults:
run:
shell: bash

jobs:
lint-cff:
runs-on: ubuntu-latest
name: Lint CITATION.cff

steps:
- uses: actions/checkout@v4
with:
sparse-checkout: CITATION.cff
sparse-checkout-cone-mode: false

- name: Generate DESCRIPTION file
run: |
cat << EOF > DESCRIPTION
Package: hictk
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R:
person("First", "Last", , "first.last@example.com", role = c("aut", "cre"))
Description: What the package does (one paragraph).
License: MIT
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
Imports:
cffr
EOF

- name: Setup R
uses: r-lib/actions/setup-r@v2

- name: Add requirements
uses: r-lib/actions/setup-r-dependencies@v2

- name: Lint CITATION.cff
run: Rscript -e 'cffr::cff_validate("CITATION.cff")'
21 changes: 11 additions & 10 deletions .readthedocs.yaml
@@ -5,18 +5,19 @@
version: 2

build:
os: ubuntu-22.04
apt_packages:
- librsvg2-bin
os: ubuntu-24.04
tools:
python: "3.11"
python: "3.12"

sphinx:
configuration: docs/conf.py

python:
install:
- requirements: docs/requirements.txt
commands:
- pip install -r docs/requirements.txt
- docs/update_index_links.py --root-dir "$PWD" --inplace
- make -C docs linkcheck
- make -C docs html
- make -C docs latexpdf
- mkdir -p "$READTHEDOCS_OUTPUT/pdf"
- cp -r docs/_build/html "$READTHEDOCS_OUTPUT/"
- cp docs/_build/latex/hictk.pdf "$READTHEDOCS_OUTPUT/pdf/"

formats:
- pdf
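
The `build.commands` steps added above can be reproduced outside Read the Docs. The following is a minimal sketch, not part of this PR: it lists the same commands as argv lists and optionally runs them in order. The function names `docs_build_commands` and `run_docs_build` are hypothetical, and `output_dir` stands in for the `$READTHEDOCS_OUTPUT` variable used in the config.

```python
import shlex
import subprocess


def docs_build_commands(repo_root: str, output_dir: str) -> list[list[str]]:
    """Return the steps from build.commands in .readthedocs.yaml as argv lists.

    `output_dir` plays the role of $READTHEDOCS_OUTPUT (an assumption made
    for local use; on Read the Docs the variable is set by the platform).
    """
    return [
        ["pip", "install", "-r", "docs/requirements.txt"],
        ["docs/update_index_links.py", "--root-dir", repo_root, "--inplace"],
        ["make", "-C", "docs", "linkcheck"],
        ["make", "-C", "docs", "html"],
        ["make", "-C", "docs", "latexpdf"],
        ["mkdir", "-p", f"{output_dir}/pdf"],
        ["cp", "-r", "docs/_build/html", f"{output_dir}/"],
        ["cp", "docs/_build/latex/hictk.pdf", f"{output_dir}/pdf/"],
    ]


def run_docs_build(repo_root: str, output_dir: str, dry_run: bool = True) -> list[str]:
    """Render each step as a shell line; execute it only when dry_run is False."""
    rendered = []
    for argv in docs_build_commands(repo_root, output_dir):
        rendered.append(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing step, like the hosted build.
            subprocess.run(argv, cwd=repo_root, check=True)
    return rendered
```

Calling `run_docs_build(".", "/tmp/rtd-output")` only prints nothing and returns the rendered command lines; passing `dry_run=False` would actually run them, and the `--inplace` step would then modify files in the working tree.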
35 changes: 29 additions & 6 deletions CITATION.cff
@@ -15,8 +15,19 @@ abstract: 'Blazing fast toolkit to work with .hic and .cool files.'
doi: '10.5281/zenodo.8214220'
url: 'https://github.com/paulsengroup/hictk'
repository-code: 'https://github.com/paulsengroup/hictk'
repository-artifact: 'https://github.com/paulsengroup/hictk/pkgs/container/hictk'
type: software
license: MIT
keywords:
- bioinformatics
- cxx
- conversion
- cooler
- cli-application
- hic
- cxx17
- cxx-library
- hictk
preferred-citation:
type: article
authors:
@@ -30,10 +41,22 @@ preferred-citation:
orcid: 'https://orcid.org/0000-0002-7918-5495'
email: jonas.paulsen@ibv.uio.no
affiliation: 'Department of Biosciences, University of Oslo'
doi: '10.1101/2023.11.26.568707'
url: 'https://doi.org/10.1101/2023.11.26.568707'
journal: 'Cold Spring Harbor Laboratory'
year: 2023
month: 11
doi: '10.1093/bioinformatics/btae408'
url: 'https://academic.oup.com/bioinformatics/article/40/7/btae408/7698028'
journal: 'Bioinformatics'
year: 2024
month: 06
title: 'hictk: blazing fast toolkit to work with .hic and .cool files'
abstract: 'We developed hictk, a toolkit that can transparently operate on .hic and .cool files with excellent performance. The toolkit is written in C++ and consists of a C++ library with Python bindings as well as CLI tools to perform common operations directly from the shell, including converting between .hic and .mcool formats. We benchmark the performance of hictk and compare it with other popular tools and libraries. We conclude that hictk significantly outperforms existing tools while providing the flexibility of natively working with both file formats without code duplication.'
abstract: >
Hi-C is gaining prominence as a method for mapping genome organization.
With declining sequencing costs and a growing demand for higher-resolution data, efficient tools for processing Hi-C datasets at different resolutions are crucial.
Over the past decade, the .hic and Cooler file formats have become the de-facto standard to store interaction matrices produced by Hi-C experiments in binary format.
Interoperability issues make it unnecessarily difficult to convert between the two formats and to develop applications that can process each format natively.

We developed hictk, a toolkit that can transparently operate on .hic and .cool files with excellent performance.
The toolkit is written in C++ and consists of a C++ library with Python and R bindings as well as CLI tools to perform common operations directly from the shell, including converting between .hic and .mcool formats. We benchmark the performance of hictk and compare it with other popular tools and libraries.
We conclude that hictk significantly outperforms existing tools while providing the flexibility of natively working with both file formats without code duplication.

The hictk library, Python bindings and CLI tools are released under the MIT license as a multi-platform application available at github.com/paulsengroup/hictk.
Pre-built binaries for Linux and macOS are available on bioconda.
Python bindings for hictk are available on GitHub at github.com/paulsengroup/hictkpy, while R bindings are available on GitHub at github.com/paulsengroup/hictkR.
37 changes: 19 additions & 18 deletions README.md
@@ -23,41 +23,42 @@ hictk is a blazing fast toolkit to work with .hic and .cool files.

This repository hosts `hictk`: a set of CLI tools to work with Cooler, as well as `libhictk`: the C++ library underlying `hictk`.

Python bindings for `libhictk` are available at [paulsengroup/hictkpy](https://github.com/paulsengroup/hictkpy).
Python bindings for `libhictk` are available at [paulsengroup/hictkpy](https://github.com/paulsengroup/hictkpy), while R bindings are published at [paulsengroup/hictkR](https://github.com/paulsengroup/hictkR).

hictk is capable of reading files in `.cool`, `.mcool`, `.scool` and `.hic` format (including hic v9) as well as writing `.hic`, `.cool` and `.mcool` files.

## Installing hictk

hictk is developed on Linux and tested on Linux, MacOS and Windows.

hictk can be installed using containers, bioconda or directly from source. Refer to [Installation](https://hictk.readthedocs.io/en/latest/installation.html) for more information.
hictk can be installed using containers, bioconda or directly from source. Refer to [Installation](https://hictk.readthedocs.io/en/stable/installation.html) for more information.

## Running hictk

hictk provides the following subcommands:

| subcommand | description |
| ---------------------- | ---------------------------------------------------------------------------------- |
| **balance** | Balance HiC matrices using ICE, SCALE or VC. |
| **convert** | Convert matrices to a different format. |
| **dump** | Dump data from .hic and Cooler files to stdout. |
| **fix-mcool** | Fix corrupted .mcool files. |
| **load** | Build .cool and .hic files from interactions in various text formats. |
| **merge** | Merge multiple Cooler or .hic files into a single file. |
| **rename-chromosomes** | Rename chromosomes found in a Cooler file. |
| **validate** | Validate .hic and Cooler files. |
| **zoomify** | Convert single-resolution Cooler and .hic files to multi-resolution by coarsening. |

Refer to [Quickstart (CLI)](https://hictk.readthedocs.io/en/latest/quickstart_cli.html) and [CLI Reference](https://hictk.readthedocs.io/en/latest/cli_reference.html) for more details.
| subcommand | description |
| ---------------------- | ---------------------------------------------------------------------------------------------- |
| **balance** | Balance Hi-C files using ICE, SCALE, or VC. |
| **convert** | Convert Hi-C files between different formats. |
| **dump** | Read interactions and other kinds of data from .hic and Cooler files and write them to stdout. |
| **fix-mcool** | Fix corrupted .mcool files. |
| **load** | Build .cool and .hic files from interactions in various text formats. |
| **merge** | Merge multiple Cooler or .hic files into a single file. |
| **metadata** | Print file metadata to stdout. |
| **rename-chromosomes** | Rename chromosomes found in a Cooler file. |
| **validate** | Validate .hic and Cooler files. |
| **zoomify** | Convert single-resolution Cooler and .hic files to multi-resolution by coarsening. |

Refer to [Quickstart (CLI)](https://hictk.readthedocs.io/en/stable/quickstart_cli.html) and [CLI Reference](https://hictk.readthedocs.io/en/stable/cli_reference.html) for more details.

## Using libhictk

libhictk can be installed in various way, including with Conan and CMake FetchContent. Section [Quickstart (API)](https://hictk.readthedocs.io/en/latest/quickstart_api.html) of hictk documentation contains further details on how this can be accomplished.
libhictk can be installed in various way, including with Conan and CMake FetchContent. Section [Quickstart (API)](https://hictk.readthedocs.io/en/stable/quickstart_api.html) of hictk documentation contains further details on how this can be accomplished.

[Quickstart (API)](https://hictk.readthedocs.io/en/latest/quickstart_api.html) also showcases the basic functionality offered by libhictk. For more complex examples refer to the sample programs under the [examples/](./examples/) folder as well as to the [source code](./src/hictk/) of hictk.
[Quickstart (API)](https://hictk.readthedocs.io/en/stable/quickstart_api.html) also showcases the basic functionality offered by libhictk. For more complex examples refer to the sample programs under the [examples/](./examples/) folder as well as to the [source code](./src/hictk/) of hictk.

The public C++ API of hictk is documented in the [C++ API Reference](https://hictk.readthedocs.io/en/latest/cpp_api/index.html) section of hictk documentation.
The public C++ API of hictk is documented in the [C++ API Reference](https://hictk.readthedocs.io/en/stable/cpp_api/index.html) section of hictk documentation.

## Citing

Binary file added docs/assets/4dnucleome_bug_notice.pdf
Binary file not shown.
28 changes: 15 additions & 13 deletions docs/balancing_matrices.rst
@@ -27,20 +27,22 @@ The following is an example showing how to balance a .cool file using ICE.

user@dev:/tmp$ hictk balance ice 4DNFIZ1ZVXC8.mcool::/resolutions/1000

[2023-10-01 13:18:02.119] [info]: Running hictk v0.0.2-f83f93e
[2023-10-01 13:18:02.130] [info]: Writing interactions to temporary file /tmp/4DNFIZ1ZVXC8.tmp0...
[2023-10-01 13:18:05.098] [info]: Initializing bias vector...
[2023-10-01 13:18:05.099] [info]: Masking rows with fewer than 10 nnz entries...
[2023-10-01 13:18:06.298] [info]: Masking rows using mad_max=5...
[2023-10-01 13:18:06.971] [info]: Iteration 1: 36874560.192587376
[2023-10-01 13:18:07.634] [info]: Iteration 2: 21347543.04950776
[2023-10-01 13:18:08.307] [info]: Iteration 3: 7819314.542541969
[2024-09-26 16:02:19.731] [info]: Running hictk v1.0.0-fbdcb591
[2024-09-26 16:02:19.731] [info]: balancing using ICE (GW_ICE)
[2024-09-26 16:02:19.734] [info]: Writing interactions to temporary file /tmp/hictk-tmp-XXXX1ZC9FF/4DNFIZ1ZVXC8.mcool.tmp...
[2024-09-26 16:02:22.480] [info]: Initializing bias vector...
[2024-09-26 16:02:22.482] [info]: Masking rows with fewer than 10 nnz entries...
[2024-09-26 16:02:23.392] [info]: Masking rows using mad_max=5...
[2024-09-26 16:02:23.860] [info]: Iteration 1: 36452362.243888594
[2024-09-26 16:02:24.327] [info]: Iteration 2: 21649057.88060747
[2024-09-26 16:02:24.792] [info]: Iteration 3: 7890065.688497526
...
[2023-10-01 13:19:20.365] [info]: Iteration 105: 2.1397932757529552e-05
[2023-10-01 13:19:21.146] [info]: Iteration 106: 1.6604770462001875e-05
[2023-10-01 13:19:21.870] [info]: Iteration 107: 1.2885285040054778e-05
[2023-10-01 13:19:22.608] [info]: Iteration 108: 9.99900768769869e-06
[2023-10-01 13:19:22.619] [info]: Writing weights to 4DNFIZ1ZVXC8.mcool::/resolutions/1000/bins/weight...
[2024-09-26 16:03:12.285] [info]: Iteration 107: 2.0533518142916073e-05
[2024-09-26 16:03:12.752] [info]: Iteration 108: 1.601698258037195e-05
[2024-09-26 16:03:13.216] [info]: Iteration 109: 1.2493901433163442e-05
[2024-09-26 16:03:13.681] [info]: Iteration 110: 9.745791018854495e-06
[2024-09-26 16:03:13.707] [info]: Writing weights to 4DNFIZ1ZVXC8.mcool::/resolutions/1000/bins/GW_ICE...
[2024-09-26 16:03:13.708] [info]: Linking weights to 4DNFIZ1ZVXC8.mcool::/resolutions/1000/bins/weight...

When balancing files in .mcool or .hic formats, all resolutions are balanced.
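
To give an intuition for what the iterations in the log are doing, here is a toy sketch of iterative correction (ICE) on a small dense symmetric matrix. It is not hictk's implementation: the function name `ice_balance`, the multiplicative weight convention, the empty-row mask, and the use of the variance of the row sums as the stopping criterion are all assumptions made for illustration. Real tools work on sparse data and apply stricter masks, such as the minimum-nnz and `mad_max` filters mentioned in the log.

```python
import math


def ice_balance(matrix, tol=1e-5, max_iters=500):
    """Toy ICE on a dense symmetric matrix given as a list of lists.

    Returns (weights, variance, iterations). The balanced value of a pixel
    is matrix[i][j] * weights[i] * weights[j] (assumed convention).
    """
    n = len(matrix)
    # Mask rows without any interaction (a simplified stand-in for real filters).
    masked = [sum(row) == 0 for row in matrix]
    kept = [i for i in range(n) if not masked[i]]
    if not kept:
        return [math.nan] * n, math.nan, 0

    weights = [1.0] * n
    variance = math.inf
    iterations = 0

    for iterations in range(1, max_iters + 1):
        # Row sums of the matrix corrected with the current weights.
        marginals = {
            i: sum(matrix[i][j] * weights[i] * weights[j] for j in kept)
            for i in kept
        }
        mean = sum(marginals.values()) / len(kept)
        variance = sum((m - mean) ** 2 for m in marginals.values()) / len(kept)
        if variance < tol:
            break  # row sums are (nearly) equal: the matrix is balanced
        # Rows with above-average coverage are scaled down, and vice versa.
        for i in kept:
            weights[i] /= marginals[i] / mean

    # Masked rows carry no usable weight.
    for i in range(n):
        if masked[i]:
            weights[i] = math.nan
    return weights, variance, iterations
```

For example, `ice_balance([[4, 2, 1], [2, 3, 1], [1, 1, 5]])` returns weights under which all three corrected row sums are approximately equal, together with the final variance and the number of iterations used.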
