Releases · huggingface/optimum-benchmark
v0.5.0
What's Changed
- Fix images building by @IlyasMoutawwakil in #242
- Faster quality check by @IlyasMoutawwakil in #243
- Decode output of `nvmlDeviceGetName` to avoid JSON serialize issue by @KeitaW in #240
- Fix makefile typo by @IlyasMoutawwakil in #244
- fix neural compressor backend by @baptistecolle in #245
- Update cuda images by @IlyasMoutawwakil in #246
- Add t4 for llm perf leaderboard by @baptistecolle in #238
- add optimum-intel ipex backend into benchmark by @yao-matrix in #250
- WIP fix rocm runners by @baptistecolle in #249
- Code Style by @IlyasMoutawwakil in #254
- Update ROCm by @IlyasMoutawwakil in #253
- Build from source quantization packages by @baptistecolle in #239
- Fix py-txi ci by @IlyasMoutawwakil in #255
- Fix API tests on ROCm by @IlyasMoutawwakil in #256
- Refine cpu Dockerfile for better performance and add ipex_bert example by @yao-matrix in #257
- Add support for intel in leaderboard by @baptistecolle in #248
- fix broken canonical list by @baptistecolle in #262
- Fix broken canonical list by @baptistecolle in #264
- Fix issue with CodeCarbon lock by @regisss in #265
- Set is_distributed false by default in vllm by @asesorov in #266
- fix broken cuda and rocm images by @baptistecolle in #263
- Styling by @IlyasMoutawwakil in #267
- Labeling system in CI by @IlyasMoutawwakil in #268
- Multi-gpu vllm by @IlyasMoutawwakil in #269
- fix multi gpu ipc by @IlyasMoutawwakil in #270
- Allow multiple runs and handle connection communication errors by @IlyasMoutawwakil in #271
- Removing barriers by @IlyasMoutawwakil in #273
- Update readme with IPEX by @IlyasMoutawwakil in #274
- Distributed trt-llm by @IlyasMoutawwakil in #275
- ipex backend enhancements by @yao-matrix in #272
- Bump version by @IlyasMoutawwakil in #278
- Pass backend name to EnergyTracker in Training scenario by @asesorov in #279
- move to new runners by @glegendre01 in #281
- Markdown Report by @IlyasMoutawwakil in #280
- dev version by @IlyasMoutawwakil in #284
- Using intermediate env vars in CI by @IlyasMoutawwakil in #290
- remove old code linked to llm-perf leaderboard by @baptistecolle in #291
- Image Text To Text Support by @IlyasMoutawwakil in #296
- Feat: reimplement vllm backend beam search using logprobs by @vicoooo26 in #293
- Add the logic for Energy Star by @regisss in #261
- Add torchao to optimum as a pytorch backend configuration by @jerryzh168 in #297
- fix llamacpp and windows libuv by @IlyasMoutawwakil in #298
- Remove DP vs TP distinction and simplify aggregation across processes by @IlyasMoutawwakil in #299
- Fix misc test by @IlyasMoutawwakil in #300
- Adding latency and memory to energy star by @IlyasMoutawwakil in #302
- Add per_step diffusion measurements by @IlyasMoutawwakil in #303
- Fixes by @IlyasMoutawwakil in #304
- Secure Instinct CI by @IlyasMoutawwakil in #301
- Remove non maintained backends (llm-swarm, inc) by @IlyasMoutawwakil in #305
- Test examples by @IlyasMoutawwakil in #306
- Optional backend kwargs by @IlyasMoutawwakil in #307
- Fix trt llm by @IlyasMoutawwakil in #308
- Protect hf token by @IlyasMoutawwakil in #309
- Preparing for version 0.5 and checking CI by @IlyasMoutawwakil in #310
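Among the fixes above, #240 addresses a common pitfall: older pynvml versions return the device name from `nvmlDeviceGetName` as raw bytes, which `json.dumps` rejects. A minimal sketch of the decode-before-serialize idea (the helper name is hypothetical, not optimum-benchmark's actual code):

```python
import json

def ensure_str(value):
    """Decode bytes to str so the value can be JSON-serialized.

    Older pynvml versions return device names as raw bytes;
    json.dumps raises TypeError on bytes, so we decode first.
    """
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value

# Simulated raw return value, as older pynvml versions produce:
raw_name = b"NVIDIA A100-SXM4-80GB"
print(json.dumps({"gpu_name": ensure_str(raw_name)}))
```

Newer pynvml releases already return `str`, so the guard is a no-op there, which keeps the helper safe across versions.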
New Contributors
- @KeitaW made their first contribution in #240
- @yao-matrix made their first contribution in #250
- @asesorov made their first contribution in #266
- @glegendre01 made their first contribution in #281
- @vicoooo26 made their first contribution in #293
- @jerryzh168 made their first contribution in #297
Full Changelog: v0.4.0...v0.5.0
v0.4.0
What's Changed
- Refactor backends and add `load` tracking by @IlyasMoutawwakil in #227
- Update readme by @IlyasMoutawwakil in #228
- Update vllm backend to support offline and online serving modes by @IlyasMoutawwakil in #232
- Misc CI updates and multi-platform support by @IlyasMoutawwakil in #233
- Add llama.cpp backend by @baptistecolle in #231
- Misc changes and fixes for llama cpp by @IlyasMoutawwakil in #236
- release by @IlyasMoutawwakil in #237
New Contributors
- @baptistecolle made their first contribution in #231
Full Changelog: v0.3.1...v0.4.0
v0.3.1
What's Changed
- Fix per token latency by @IlyasMoutawwakil in #223
- Per token latency outliers by @IlyasMoutawwakil in #225
- Patch release by @IlyasMoutawwakil in #224
Full Changelog: v0.3.0...v0.3.1
v0.3.0
What's Changed
- Remove experiment schema by @IlyasMoutawwakil in #210
- Numactl support by @IlyasMoutawwakil in #211
- Fix sentence transformers models by @IlyasMoutawwakil in #212
- Enable security checks by @mfuntowicz in #216
- Fix `PyTorchBackend` TP vs DP inputs distribution across replicas and shards by @IlyasMoutawwakil in #218
- Pin eager attn in torch-ort backend by @IlyasMoutawwakil in #219
- Fix INC by @IlyasMoutawwakil in #220
- bump version 0.3.0 by @IlyasMoutawwakil in #221
New Contributors
- @mfuntowicz made their first contribution in #216
Full Changelog: v0.2.1...v0.3.0
v0.2.1
What's Changed
- Llm perf update by @IlyasMoutawwakil in #206
- Fix diffusers repo id naming by @IlyasMoutawwakil in #208
- Release by @IlyasMoutawwakil in #209
Full Changelog: v0.2.0...v0.2.1
v0.2.0
What's Changed
- [feature][refactor] Optimum-Benchmark API by @IlyasMoutawwakil in #118
- [feature][refactor] Benchmark Reporting + Hub Mixin by @IlyasMoutawwakil in #122
- [feature][refactor] Better Metrics and Trackers by @IlyasMoutawwakil in #124
- moved complex examples by @IlyasMoutawwakil in #127
- Fix ort inputs filtering by @IlyasMoutawwakil in #129
- Support per token measurements through logits processor by @IlyasMoutawwakil in #130
- Fix git revisions by @IlyasMoutawwakil in #131
- Support rocm benchmarking with text generation inference backend by @IlyasMoutawwakil in #132
- Use Py-TGI and add testing by @IlyasMoutawwakil in #134
- Text2Text input generator by @IlyasMoutawwakil in #139
- Faster DeepSpeed engine initialization by @IlyasMoutawwakil in #140
- llm-swarm backend integration for slurm clusters by @IlyasMoutawwakil in #142
- Better hub utils by @IlyasMoutawwakil in #143
- Support Py-TXI (TGI and TEI) by @IlyasMoutawwakil in #147
- Migrate CUDA CI workflows by @IlyasMoutawwakil in #156
- Fix: Enable Energy Calculation in Benchmarking by Implementing Subtraction Method by @karthickai in #149
- add test configurations for quantization with onnxruntime, awq, bnb (#95) by @aliabdelkader in #144
- Fix gptq exllamav2 check by @IlyasMoutawwakil in #152
- Fix gptq exllamav2 check by @IlyasMoutawwakil in #157
- Compute the real prefill latency using the logits processor by @IlyasMoutawwakil in #150
- zentorch plugin support by @IlyasMoutawwakil in #162
- torch compile diffusers vae by @IlyasMoutawwakil in #163
- Fix Exllama V2 typo by @IlyasMoutawwakil in #165
- Added test llama-2-7b with GPTQ quant. scheme by @lopozz in #141
- Update zentorch plugin by @IlyasMoutawwakil in #167
- Fix `to_csv` and `to_dataframe` by @IlyasMoutawwakil in #168
- add test configurations to run Torch compile (#95) by @aliabdelkader in #155
- Images builder CI by @IlyasMoutawwakil in #171
- Refactor test configs and CI by @IlyasMoutawwakil in #170
- Add OpenVINO GPU support by @helena-intel in #172
- Update readme, examples, makefile by @IlyasMoutawwakil in #173
- Add energy star benchmark by @regisss in #169
- Remove rocm5.6 support and add global vram tracking using pyrsmi by @IlyasMoutawwakil in #174
- Explicitly passing visible devices to isolation process by @IlyasMoutawwakil in #177
- Trackers revamp by @IlyasMoutawwakil in #178
- Full hub mixin integration by @IlyasMoutawwakil in #179
- Fix tasks by @IlyasMoutawwakil in #181
- fix py-txi by @IlyasMoutawwakil in #182
- specify which repo to push to by @IlyasMoutawwakil in #183
- Refactor prefill and inference benchmark by @IlyasMoutawwakil in #184
- Add LLM-Perf script and CI by @IlyasMoutawwakil in #185
- Fix isolation by @IlyasMoutawwakil in #186
- First warmup with the same input/output as benchmark by @IlyasMoutawwakil in #188
- Remove unnecessary surrogates attached to double quotes by @yamaura in #192
- save benchmark in files instead of passing them through a queue by @IlyasMoutawwakil in #191
- [refactor] add scenarios, drop experiments by @IlyasMoutawwakil in #187
- Use queues to not pollute cwd by @IlyasMoutawwakil in #193
- Update llm perf by @IlyasMoutawwakil in #195
- Gather llm perf benchmarks by @IlyasMoutawwakil in #198
- Build and Publish Images by @IlyasMoutawwakil in #199
- Communicate error/exception/traceback with main process by @IlyasMoutawwakil in #200
- vLLM backend by @IlyasMoutawwakil in #196
- update readme by @IlyasMoutawwakil in #201
- added 1xA100 by @IlyasMoutawwakil in #202
- Test quality for different python versions by @IlyasMoutawwakil in #203
- release by @IlyasMoutawwakil in #204
- v0.2.0 release bis by @IlyasMoutawwakil in #205
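Several v0.2.0 entries (#130, #150) measure per-token and prefill latency by hooking a logits processor into generation, since it is invoked once per decoding step. A stdlib-only sketch of that idea, with hypothetical names rather than optimum-benchmark's actual classes:

```python
import time

class PerTokenLatencyTracker:
    """Callable that a generation loop invokes once per decoding step.

    Recording a timestamp at every call lets us derive per-token
    latencies; the gap between generation start and the first call
    approximates the prefill latency.
    """

    def __init__(self):
        self.timestamps = []

    def __call__(self, input_ids=None, scores=None):
        self.timestamps.append(time.perf_counter())
        return scores  # pass logits through unchanged

    def latencies(self, start_time):
        """Deltas between consecutive events, anchored at start_time."""
        points = [start_time] + self.timestamps
        return [b - a for a, b in zip(points, points[1:])]

tracker = PerTokenLatencyTracker()
start = time.perf_counter()
for _ in range(3):  # stand-in for a model's decoding loop
    tracker(None, None)
per_token = tracker.latencies(start)
print(len(per_token))  # one latency measurement per decoded token
```

In a real setup the tracker would be passed to the framework's generation call as a logits processor; the sketch only illustrates why that hook point yields per-step timings.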
New Contributors
- @karthickai made their first contribution in #149
- @aliabdelkader made their first contribution in #144
- @lopozz made their first contribution in #141
- @helena-intel made their first contribution in #172
- @regisss made their first contribution in #169
- @yamaura made their first contribution in #192
Full Changelog: 0.0.1...v0.2.0