Commit 8bae973 (1 parent: 2aef8b6)
Showing 21 changed files with 259 additions and 55 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
# Required | ||
version: 2 | ||
|
||
build: | ||
os: "ubuntu-20.04" | ||
apt_packages: | ||
- libsndfile1 | ||
tools: | ||
python: "3.10" | ||
|
||
python: | ||
install: | ||
- method: pip | ||
path: . | ||
extra_requirements: | ||
- docs |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Changelog | ||
|
||
<!--next-version-placeholder--> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Code of Conduct | ||
|
||
Everyone interacting in the project's codebases and documentation is expected to follow the [PyPA Code of Conduct](https://www.pypa.io/en/latest/code-of-conduct/). This includes, but is not limited to, issue trackers, chat rooms, mailing lists, and other virtual or real-life communication. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
# Contributing | ||
|
||
## Development workflow | ||
|
||
Hi there! This repository follows the [GitHub flow](https://docs.github.com/en/get-started/quickstart/github-flow). The GitHub flow consists of the main branch and many feature branches. Generally speaking, the main branch accepts no direct commits and can only be updated by rebasing and merging feature branches. Feature branches are used for development work such as new features, bug fixes, refactoring, and experiments. The GitHub flow keeps the main branch in a working state, with documentation and tests. | ||
|
||
## Commit | ||
|
||
This repository uses the [Angular commit style](https://github.com/angular/angular.js/blob/master/DEVELOPERS.md#commit-message-format), which looks like this: | ||
|
||
```shell | ||
<type>(optional scope): short summary in present tense | ||
|
||
(optional body: explains motivation for the change) | ||
|
||
(optional footer: note BREAKING CHANGES here, and issues to be closed) | ||
``` | ||
|
||
Generally speaking, you need to at least specify a type and a short summary for each commit. `<type>` refers to the kind of change made and is usually one of: | ||
|
||
- `feat`: A new feature. | ||
- `fix`: A bug fix. | ||
- `docs`: Documentation changes. | ||
- `style`: Changes that do not affect the meaning of the code (white space, formatting, missing semi-colons, etc.). | ||
- `refactor`: A code change that neither fixes a bug nor adds a feature. | ||
- `perf`: A code change that improves performance. | ||
- `test`: Changes to the test framework. | ||
- `build`: Changes to the build process or tools. | ||
|
||
Because commit messages follow this standardized Angular commit style, the continuous integration configuration will automatically bump version numbers based on keywords it finds in commit messages. | ||
|
||
## References | ||
|
||
- [Git for Professionals Tutorial - Tools & Concepts for Mastering Version Control with Git](https://www.youtube.com/watch?v=Uszj_k0DGsg) | ||
- [GitHub flow](https://docs.github.com/en/get-started/quickstart/github-flow) | ||
- [How to Write a Git Commit Message](https://cbea.ms/git-commit/) |
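Review note: the commit convention in CONTRIBUTING.md above can be illustrated with a small parser. This is a minimal sketch, not the repository's CI logic. The mapping from commit type to version bump (`feat` to minor, `fix` and `perf` to patch, a `BREAKING CHANGE` note to major) is an assumption based on common semantic-versioning practice, and the names `HEADER`, `BUMP_BY_TYPE`, and `suggest_bump` are hypothetical.

```python
import re
from typing import Optional

# Matches the header line "<type>(optional scope): short summary".
HEADER = re.compile(r"^(?P<type>\w+)(\((?P<scope>[^)]+)\))?: (?P<summary>.+)$")

# Assumed mapping; it is not taken from this repository's CI configuration.
BUMP_BY_TYPE = {"feat": "minor", "fix": "patch", "perf": "patch"}


def suggest_bump(message: str) -> Optional[str]:
    """Return "major", "minor", "patch", or None for one commit message."""
    lines = message.strip().splitlines()
    if not lines:
        return None
    match = HEADER.match(lines[0])
    if match is None:
        # The header does not follow the "<type>(scope): summary" shape.
        return None
    if "BREAKING CHANGE" in message:
        # A breaking-change note anywhere in the message wins over the type.
        return "major"
    return BUMP_BY_TYPE.get(match.group("type"))
```

Under these assumptions, a message such as `feat(docs): add sphinx config` suggests a minor bump, while `docs: update readme` suggests no bump at all.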
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# Minimal makefile for Sphinx documentation | ||
|
||
# You can set these variables from the command line. | ||
SPHINXOPTS = | ||
SPHINXBUILD = python -m sphinx | ||
SPHINXPROJ = mixsim | ||
SOURCEDIR = source | ||
BUILDDIR = _build | ||
|
||
# Put it first so that "make" without argument is like "make help". | ||
help: | ||
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) | ||
|
||
.PHONY: help Makefile | ||
|
||
# Catch-all target: route all unknown targets to Sphinx using the new | ||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). | ||
%: Makefile | ||
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) |
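Review note: the catch-all target in the Makefile above forwards any target name to Sphinx's "make mode" (`-M`). As a rough sketch, the same command line can be assembled in Python. The function name `sphinx_make_mode_command` is hypothetical, and the default directories simply mirror `SOURCEDIR` and `BUILDDIR` from the Makefile.

```python
import sys


def sphinx_make_mode_command(target, source_dir="source", build_dir="_build", extra_opts=()):
    """Build the argument list equivalent to `make <target>` in the Makefile."""
    # `python -m sphinx -M <target> <sourcedir> <builddir> [options]`
    return [sys.executable, "-m", "sphinx", "-M", target, source_dir, build_dir, *extra_opts]


if __name__ == "__main__":
    # Printing only; pass the list to subprocess.run(..., check=True) to build.
    print(sphinx_make_mode_command("html"))
```

Running the returned command requires Sphinx to be installed in the active environment, for example through the `docs` extra defined in this commit.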
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
div.wy-nav-content { | ||
max-width: 800px; | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
import importlib_metadata | ||
|
||
# -- Project information ----------------------------------------------------- | ||
project = "FullSubNet" | ||
author = "HAO Xiang <haoxiangsnr@gmail.com>" | ||
project_copyright = "2022, HAO Xiang" | ||
release = importlib_metadata.version(project) | ||
version = ".".join(release.split(".")[:2]) # e.g., "0.3" means the major version is "0" and the minor version is "3" | ||
|
||
# -- MetaConfig configuration --------------------------------------------------- | ||
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"] | ||
extensions = [ | ||
"myst_parser", # markdown file parser. | ||
"sphinx.ext.todo", # enable the todo. | ||
"sphinx.ext.autodoc", # provide automatic documentation for modules (*.py), classes, functions, and type hints. | ||
"sphinx.ext.autosummary", # auto-generate the summary (including links) of the modules. | ||
"sphinx.ext.intersphinx", # enable cross-referencing between Sphinx projects. | ||
"sphinx.ext.viewcode", # add a helpful link to the source code of each object in the API reference sheet. | ||
"sphinx.ext.mathjax", # enable math support in the documentation. | ||
"sphinx.ext.napoleon", # [ordered] parse Google-style docstrings and convert them to reStructuredText. | ||
"sphinxcontrib.autodoc_pydantic", # generate suitable documentation for pydantic models. | ||
] | ||
|
||
# -- Extension configuration ------------------------------------------------- | ||
napoleon_numpy_docstring = False | ||
napoleon_attr_annotations = True | ||
intersphinx_mapping = { | ||
"python": ("https://docs.python.org/3", None), | ||
"numpy": ("https://numpy.org/doc/stable/", None), | ||
} | ||
autosummary_generate = True | ||
autodoc_mock_imports = ["soundfile", "gpuRIR"] | ||
autodoc_pydantic_model_signature_prefix = "Config" | ||
autodoc_pydantic_member_order = "bysource" | ||
autodoc_pydantic_model_show_field_summary = False | ||
autodoc_pydantic_model_show_json = False | ||
autodoc_pydantic_model_show_validator_members = False | ||
autodoc_pydantic_model_show_validator_summary = False | ||
autodoc_pydantic_model_summary_list_order = "bysource" | ||
autodoc_pydantic_model_list_validators = False | ||
autodoc_pydantic_field_signature_prefix = "option" | ||
|
||
# -- Options for HTML output ------------------------------------------------- | ||
html_theme = "sphinx_rtd_theme" | ||
html_context = { | ||
"display_github": True, # edit on GitHub, see https://github.com/readthedocs/sphinx_rtd_theme/issues/529 | ||
"github_user": "haoxiangsnr", | ||
"github_repo": "FullSubNet", | ||
"github_version": "main", | ||
} | ||
html_static_path = ["_static"] | ||
html_css_files = [ | ||
"css/custom.css", | ||
] |
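Review note: `conf.py` above derives the short `version` from the full `release` string by keeping the first two dot-separated components. The sketch below isolates that step so it can be checked on its own. The helper names `short_version` and `installed_release` are hypothetical, and the lookup uses the standard-library `importlib.metadata` (Python 3.8+) rather than the `importlib_metadata` backport imported in the file.

```python
from importlib import metadata
from typing import Optional


def short_version(release: str) -> str:
    """Keep only the major and minor components of a release string."""
    return ".".join(release.split(".")[:2])


def installed_release(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

With the `version = "0.0.1"` declared in `pyproject.toml`, `short_version("0.0.1")` yields `"0.0"`. Note that `metadata.version` only succeeds when the package is installed in the build environment, which the Read the Docs configuration in this commit arranges via `pip`.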
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
============================================ | ||
Welcome to FullSubNet's documentation! | ||
============================================ | ||
|
||
FullSubNet is a full-band and sub-band fusion model for single-channel real-time speech enhancement. Full-band and sub-band refer to models that take full-band and sub-band noisy spectral features as input and output full-band and sub-band speech targets, respectively. The sub-band model processes each frequency independently. Its input consists of one frequency and several context frequencies, and its output is the prediction of the clean speech target for the corresponding frequency. These two types of models have distinct characteristics. The full-band model can capture the global spectral context and long-distance cross-band dependencies, but it lacks the ability to model signal stationarity and attend to local spectral patterns. The sub-band model is just the opposite. In our proposed FullSubNet, we connect a pure full-band model and a pure sub-band model sequentially and use practical joint training to integrate the advantages of both types of models. We conducted experiments on the DNS Challenge (INTERSPEECH 2020) dataset to evaluate the proposed method. Experimental results show that full-band and sub-band information are complementary and that FullSubNet can effectively integrate them. Moreover, the performance of FullSubNet exceeds that of the top-ranked methods in the DNS Challenge (INTERSPEECH 2020). | ||
|
||
.. toctree:: | ||
:caption: Getting started | ||
:maxdepth: 1 | ||
:titlesonly: | ||
|
||
usage/prerequisites.md | ||
usage/getting_started.md | ||
usage/release.md | ||
usage/perf.md | ||
usage/presentation.md | ||
|
||
.. toctree:: | ||
:caption: Reference | ||
:maxdepth: 1 | ||
:titlesonly: | ||
|
||
reference/contributing.md | ||
reference/conduct.md | ||
reference/changelog.md | ||
|
||
|
||
Indices and tables | ||
------------------ | ||
|
||
* :ref:`genindex` | ||
* :ref:`modindex` | ||
* :ref:`search` |
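Review note: the index page above says the sub-band model's input "consists of one frequency and several context frequencies". The sketch below shows one way such inputs could be assembled for a single spectral frame. It is an illustration under stated assumptions, not the project's implementation: the reflect handling at the band edges, the parameter name `num_neighbors`, and both function names are hypothetical.

```python
def reflect_index(index, size):
    """Mirror an out-of-range index back into [0, size) without repeating the edge."""
    if index < 0:
        return -index
    if index >= size:
        return 2 * (size - 1) - index
    return index


def subband_units(frame, num_neighbors):
    """For each frequency bin, collect the bin plus num_neighbors bins on each side.

    frame: list of per-frequency magnitudes for one time frame.
    Returns one list of length 2 * num_neighbors + 1 per frequency bin.
    """
    size = len(frame)
    if num_neighbors >= size:
        raise ValueError("num_neighbors must be smaller than the number of bins")
    units = []
    for center in range(size):
        unit = [
            frame[reflect_index(center + offset, size)]
            for offset in range(-num_neighbors, num_neighbors + 1)
        ]
        units.append(unit)
    return units
```

For a four-bin frame `[1, 2, 3, 4]` with one neighbor on each side, the unit for the lowest bin is `[2, 1, 2]` and the unit for the highest bin is `[3, 4, 3]`, so every frequency gets an input of the same width.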
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
```{include} ../../../CHANGELOG.md | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
```{include} ../../../CODE_OF_CONDUCT.md | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
```{include} ../../../CONTRIBUTING.md | ||
``` |
File renamed without changes
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Performance | ||
|
||
![perf](fullsubnet-result.png) |
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# FullSubNet presentation | ||
|
||
[![Click it to show a video](https://i.imgur.com/s3mq7NNl.png)](https://youtu.be/XJeE-MWDlk0 "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement") |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
# FullSubNet Checkpoints and RIRs | ||
|
||
## Checkpoints | ||
|
||
This [release](https://github.com/haoxiangsnr/FullSubNet/releases) has two model checkpoints. All checkpoints include "model_state_dict", "optimizer_state_dict", and some other meta information. | ||
|
||
The first model checkpoint is the original model checkpoint at the 58th epoch. The performance is shown in this table: | ||
|
||
| | With Reverb | | | | No Reverb | | | | | ||
|:----------:|:-----------:|:-------:|:------:|:-----:|:---------:|:-------:|:------:|:-----:| | ||
| Method | WB-PESQ | NB-PESQ | SI-SDR | STOI | WB-PESQ | NB-PESQ | SI-SDR | STOI | | ||
| FullSubNet | 2.987 | 3.496 | 15.756 | 0.926 | 2.889 | 3.385 | 17.635 | 0.964 | | ||
|
||
In addition, some people are interested in the performance when using cumulative normalization. The following results are for a pre-trained FullSubNet that uses cumulative normalization: | ||
|
||
| | With Reverb | | | | No Reverb | | | | | ||
|:----------:|:-----------:|:-------:|:------:|:-----:|:---------:|:-------:|:------:|:-----:| | ||
| Method | WB-PESQ | NB-PESQ | SI-SDR | STOI | WB-PESQ | NB-PESQ | SI-SDR | STOI | | ||
| FullSubNet (Cumulative Norm) | 2.978 | 3.503 | 15.820 | 0.928 | 2.863 | 3.376 | 17.913 | 0.964 | | ||
|
||
If you want to run inference or fine-tune based on these checkpoints, please check the usage in the documentation. | ||
|
||
## Room Impulse Responses | ||
|
||
As mentioned in the paper, the room impulse responses (RIRs) come from the Multichannel Impulse Response Database and the Reverb Challenge dataset. Please download the zip package "RIR (Multichannel Impulse Response Database + The REVERB challenge).zip" if you would like to retrain the FullSubNet. | ||
|
||
Note that the zip package includes a folder "rir" and a file "rir.txt". The folder "rir" contains all single-channel RIRs extracted separately from the above two datasets. The suffix (e.g., "m_<n>") of the filename is the microphone index. The file "rir.txt" is just a list of the paths of all RIRs. Please modify it to fit your setup before you use it. | ||
|
||
If you would like to extract the channels yourself, you can download the RIRs from the following pages: | ||
1. Multichannel Impulse Response Database: https://www.eng.biu.ac.il/~gannot/RIR_DATABASE/ | ||
2. The REVERB challenge data: https://reverb2014.dereverberation.com/tools/reverb_tools_for_Generate_mcTrainData.tgz and https://reverb2014.dereverberation.com/tools/reverb_tools_for_Generate_SimData.tgz |
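Review note: the release notes above ask users to modify "rir.txt" to fit their setup. A minimal sketch of that step follows, assuming the file holds one path per line and that all RIR files sit directly in one folder. The function name `rebase_rir_paths`, the directory names, and the example filenames in the usage note are hypothetical.

```python
from pathlib import PurePosixPath


def rebase_rir_paths(lines, new_root):
    """Point every listed RIR path at new_root while keeping the filename."""
    rebased = []
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines, such as a trailing newline at the end of the file.
            continue
        filename = PurePosixPath(line).name
        rebased.append(str(PurePosixPath(new_root) / filename))
    return rebased
```

Under these assumptions, a list read from "rir.txt" with entries such as `/old/rir/a_m_1.wav` would be rewritten to `/data/rir/a_m_1.wav` when `new_root` is `/data/rir`, and the result can be written back one path per line.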
File renamed without changes
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
# ----------------- Build System ----------------- | ||
[build-system] | ||
requires = ["flit_core >=3.2,<4"] | ||
build-backend = "flit_core.buildapi" | ||
|
||
# ----------------- Metadata ----------------- | ||
[project] | ||
name = "FullSubNet" | ||
description = "FullSubNet is a full-band and sub-band fusion model for single-channel real-time speech enhancement." | ||
authors = [{ name = "HAO Xiang", email = "haoxiangsnr@gmail.com" }] | ||
readme = "README.md" | ||
requires-python = ">=3.10" | ||
version = "0.0.1" | ||
classifiers = [ | ||
"Programming Language :: Python :: 3.10", | ||
"License :: OSI Approved :: MIT License", | ||
"Environment :: GPU :: NVIDIA CUDA", | ||
"Operating System :: OS Independent", | ||
] | ||
keywords = ["speech enhancement", "single-channel"] | ||
dependencies = [ | ||
"webrtcvad", | ||
"numpy", | ||
"scipy", | ||
"matplotlib", | ||
"geomdl", | ||
"joblib", | ||
"librosa", | ||
"pyroomacoustics", | ||
"soundfile", | ||
"toml", | ||
"tqdm", | ||
"pydantic", | ||
"typing_inspect", | ||
] | ||
[project.optional-dependencies] | ||
test = ["pytest", "pytest-cov"] | ||
docs = ["sphinx-rtd-theme", "myst-nb", "autodoc_pydantic"] # TODO add python-semantic-release? | ||
build = ["flit"] | ||
[project.urls] | ||
Documentation = "https://FullSubNet.readthedocs.io/en/latest/" | ||
Source = "https://github.com/haoxiangsnr/FullSubNet" |