Benchmarks #773
Conversation
I'd like to suggest using https://pypi.org/project/nvidia-ml-py/. Here's an example of how I'm using it: https://github.com/BBC-Esq/VectorDB-Plugin-for-LM-Studio/blob/main/src/metrics_bar.py
As an aside, it's my understanding that nvidia-ml-py (i.e. the one I'm using) imports as pynvml, NOT as nvidia-ml-py, which is what confused me initially. Multiple forks exist, some with similarly named PyPI packages, so I'm about 85%-95% sure I have the official one. ;-) The only way it'd matter is if, for example, Nvidia updates nvidia-ml-py (e.g. to support modern GPUs) while an unofficial fork doesn't. Can't speak to Hugging Face; perhaps they're using an unofficial fork that's regularly updated by its maintainer. Either way, just thought I'd raise the issue in case it matters to you!
Don't know if it's helpful, but here's a script of mine that collects more than just GPU VRAM usage, which might be useful for benchmarking. I've found that GPU/CUDA utilization is a useful metric, as is power usage: https://github.com/BBC-Esq/Nvidia_Gpu_Monitor/blob/main/metrics_pynvml.py
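For reference, a minimal sketch of that kind of polling loop, assuming the official nvidia-ml-py package (which imports as pynvml); the device index, sample count, and poll interval are arbitrary choices for illustration:

```python
# Minimal sketch: poll VRAM, GPU utilization, and power draw via NVML.
# Assumes the official nvidia-ml-py package, which imports as pynvml.
import time

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU; index is an assumption

try:
    for _ in range(5):  # five samples, one second apart (arbitrary)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # NVML reports milliwatts
        print(f"vram={mem.used / 1024**2:.0f} MiB  gpu={util.gpu}%  power={power_w:.1f} W")
        time.sleep(1.0)
finally:
    pynvml.nvmlShutdown()
```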
If CTranslate2 ever supports ROCm, that'd be great, and if you ever want to benchmark on AMD GPUs, I've struggled unsuccessfully to do it myself. I don't own an AMD GPU (I almost bought one solely to program on), but anyway, here's where my research left off...
Thanks, I will look at this.
Unfortunately, I don't have one either.
Benchmarks
This PR introduces functionality to benchmark memory usage, Word Error Rate (WER), and speed for faster-whisper.
1. Memory
GPU
Use a Python thread with the py3nvml module to monitor GPU memory: the thread continuously samples GPU memory consumption at a set interval and returns the maximum memory observed while the inference function runs (see the sketch below).
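A minimal sketch of this monitoring-thread approach, assuming py3nvml's NVML-style API, device index 0, and a 0.5 s poll interval (all illustrative choices):

```python
# Minimal sketch: sample GPU memory in a background thread while a function runs,
# and report the peak. Assumes py3nvml's NVML-style API and GPU index 0.
import threading
import time

from py3nvml.py3nvml import (
    nvmlInit,
    nvmlShutdown,
    nvmlDeviceGetHandleByIndex,
    nvmlDeviceGetMemoryInfo,
)


def measure_peak_gpu_memory(func, *args, interval=0.5, **kwargs):
    """Run func(*args, **kwargs) and return (result, peak GPU memory in bytes)."""
    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(0)
    peak = 0
    stop = threading.Event()

    def poll():
        nonlocal peak
        while not stop.is_set():
            peak = max(peak, nvmlDeviceGetMemoryInfo(handle).used)
            time.sleep(interval)

    monitor = threading.Thread(target=poll, daemon=True)
    monitor.start()
    try:
        result = func(*args, **kwargs)
    finally:
        stop.set()
        monitor.join()
        nvmlShutdown()
    return result, peak
```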
RAM
Use the memory_profiler module to measure the maximum increase in memory usage (a sketch follows).
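A minimal sketch, assuming a recent memory_profiler where memory_usage(..., max_usage=True) returns a single float; the 0.1 s sampling interval is arbitrary:

```python
# Minimal sketch: peak RAM increase while a function runs, via memory_profiler.
# Assumes a recent memory_profiler where max_usage=True returns a float (MiB).
from memory_profiler import memory_usage


def measure_ram_increase(func, *args, **kwargs):
    # Briefly sample the current process to establish a baseline RSS (in MiB).
    baseline = max(memory_usage(-1, interval=0.1, timeout=1))
    # Peak RSS observed while func runs.
    peak = memory_usage((func, args, kwargs), interval=0.1, max_usage=True)
    return peak - baseline  # maximum increase in MiB
```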
2. WER
Evaluate the faster-whisper model on the LibriSpeech validation-clean dataset in streaming mode, meaning no audio data has to be downloaded to your local device.
Use the jiwer and evaluate modules from Hugging Face to calculate WER (see the sketch below).
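A minimal sketch of the idea; the model size ("small"), 100-sample subset, and crude lowercase normalization are illustrative assumptions, not necessarily what the PR does:

```python
# Minimal sketch: stream LibriSpeech validation-clean and compute WER.
# Model size, sample count, and normalization are illustrative assumptions.
import evaluate
from datasets import load_dataset
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cuda", compute_type="float16")
wer_metric = evaluate.load("wer")  # wraps jiwer under the hood
dataset = load_dataset("librispeech_asr", "clean", split="validation", streaming=True)

references, hypotheses = [], []
for sample in dataset.take(100):  # subset size is arbitrary
    audio = sample["audio"]["array"].astype("float32")  # 16 kHz mono
    segments, _ = model.transcribe(audio, language="en")
    hypotheses.append(" ".join(s.text for s in segments).strip().lower())
    references.append(sample["text"].lower())  # LibriSpeech text is uppercase

print("WER:", wer_metric.compute(references=references, predictions=hypotheses))
```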
3. Speed
Calculate the min of
args.repeat x time-averaged
over 10 inference function calls.Feel free to edit/update more benchmarking methods !
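A minimal sketch of that timing scheme, where inference stands in for the benchmarked call and repeat mirrors args.repeat (both hypothetical names here):

```python
# Minimal sketch: take the minimum over `repeat` runs of the average latency
# across 10 calls. `inference` is a placeholder for the benchmarked function.
import time


def benchmark_speed(inference, repeat=3, n_calls=10):
    averages = []
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(n_calls):
            inference()
        averages.append((time.perf_counter() - start) / n_calls)
    return min(averages)  # the least noise-inflated average
```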