Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use thread-safe loading of mel matrix IN audio.py #1511

Merged
merged 5 commits into from
Nov 6, 2023

Conversation

mzamini92
Copy link
Contributor

The mel_filters function is using a np.load function to load a pre-computed mel filterbank matrix. This function is not thread-safe, which means that if it is called from multiple threads at the same time, it may corrupt the data.

To fix this, you can use the torch.load function instead. This function is thread-safe, so it will not corrupt the data if it is called from multiple threads at the same time.

 The `mel_filters` function is using a `np.load` function to load a pre-computed mel filterbank matrix. This function is not thread-safe, which means that if it is called from multiple threads at the same time, it may corrupt the data.

To fix this, you can use the `torch.load` function instead. This function is thread-safe, so it will not corrupt the data if it is called from multiple threads at the same time.
@jstoone
Copy link

jstoone commented Jul 9, 2023

Hi @mzamini92 this is great! I would suggest that you update the PR title, so that it reflects what the overall change strives to address.

Something like:

Use thread-safe loading of mel matrix

@mzamini92 mzamini92 changed the title Update audio.py Use thread-safe loading of mel matrix IN audio.py Jul 9, 2023
@mzamini92
Copy link
Contributor Author

Hi @mzamini92 this is great! I would suggest that you update the PR title, so that it reflects what the overall change strives to address.

Something like:

Use thread-safe loading of mel matrix

Thanks for your comment. Done.

@jongwook
Copy link
Collaborator

Hi, thanks for pointing this out. Is there a documentation stating that np.load is not thread-safe? I think using torch.load() is a better idea in either case, but the docstring and the .npz file should be updated in conjunction.

@mzamini92
Copy link
Contributor Author

mzamini92 commented Sep 19, 2023

Hi, thanks for pointing this out. Is there a documentation stating that np.load is not thread-safe? I think using torch.load() is a better idea in either case, but the docstring and the .npz file should be updated in conjunction.

Hi @jongwook . yes
https://numpy.org/devdocs/reference/generated/numpy.load.html
In the warning it says: Loading files that contain object arrays uses the pickle module, which is not secure against erroneous or maliciously constructed data. Consider passing allow_pickle=False to load data that is known not to contain object arrays for the safer handling of untrusted sources.
also, updated the docstring below.

updated the docstring
@jongwook
Copy link
Collaborator

jongwook commented Nov 6, 2023

The warning is not related to thread safety, and torch.load() also has the same security hazard regarding unpickling. Also, torch.save does not perform compression unlike np.savez_compressed, which would add a huge binary to the git repo or make it slightly more annoying having to call a decompress function. I'll just add allow_pickle=False instead to address your concern.

@jongwook jongwook merged commit 7dfcd56 into openai:main Nov 6, 2023
8 checks passed
abyesilyurt pushed a commit to abyesilyurt/whisper that referenced this pull request Nov 13, 2023
* Update audio.py

 The `mel_filters` function is using a `np.load` function to load a pre-computed mel filterbank matrix. This function is not thread-safe, which means that if it is called from multiple threads at the same time, it may corrupt the data.

To fix this, you can use the `torch.load` function instead. This function is thread-safe, so it will not corrupt the data if it is called from multiple threads at the same time.

* Update audio.py

updated the docstring

* allow_pickle=False

* newline

---------

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
yaomingamd pushed a commit to ROCm/whisper that referenced this pull request Nov 27, 2023
* Update audio.py

 The `mel_filters` function is using a `np.load` function to load a pre-computed mel filterbank matrix. This function is not thread-safe, which means that if it is called from multiple threads at the same time, it may corrupt the data.

To fix this, you can use the `torch.load` function instead. This function is thread-safe, so it will not corrupt the data if it is called from multiple threads at the same time.

* Update audio.py

updated the docstring

* allow_pickle=False

* newline

---------

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants