Key Changes
- Observability improvements in JetStream Server (Prometheus metrics)
- TensorBoard support for remote access
- Engine API update for TTFT and TPOT measurements
- Hugging Face tokenizer support
- Copybara G3 support
- Threading optimizations
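For readers unfamiliar with the two latency figures named above: TTFT is the time to first token and TPOT is the time per output token. The sketch below shows one common way to derive both from timestamps. It is illustrative only, not JetStream's Engine API; `LatencyStats`, `latency_stats`, and their parameters are hypothetical names introduced here, and the formula assumes TPOT is averaged over the tokens that follow the first one.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LatencyStats:
    ttft: float  # seconds from request start to the first token
    tpot: float  # mean seconds per output token after the first


def latency_stats(request_start: float, token_times: Sequence[float]) -> LatencyStats:
    """Derive TTFT and TPOT from a request start time and per-token arrival times.

    Hypothetical helper: all timestamps are assumed to be in seconds on the
    same clock, with token_times in arrival order.
    """
    if not token_times:
        raise ValueError("at least one token timestamp is required")
    ttft = token_times[0] - request_start
    if len(token_times) == 1:
        # Only one token was produced, so there is no decode interval to average.
        return LatencyStats(ttft=ttft, tpot=0.0)
    decode_time = token_times[-1] - token_times[0]
    tpot = decode_time / (len(token_times) - 1)
    return LatencyStats(ttft=ttft, tpot=tpot)
```

For example, a request started at t=0.0 whose tokens arrive at 0.5, 0.75, and 1.0 seconds has a TTFT of 0.5 s and a TPOT of 0.25 s under this definition.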
What's Changed
- Add tensorboard plugin dep for remote access by @JoeZijunZhou in #97
- Update benchmark config for xlml automation by @morgandu in #96
- Minor fix by @morgandu in #98
- Add ssh port forward support for profile readme by @FanhaiLu1 in #99
- Add inference sampling utils in JetStream by @JoeZijunZhou in #100
- Add profiling server for proxy backend by @zhihaoshan-google in #101
- Change jetstream_slots_available_percentage to jetstream_slots_used_percentage by @Bslabe123 in #102
- Bump urllib3 from 2.2.0 to 2.2.2 in the pip group across 1 directory by @dependabot in #104
- Added jetstream_transfer_backlog_size and jetstream_generate_backlog_size metrics by @Bslabe123 in #103
- Update docs for benchmark warmup mode by @JoeZijunZhou in #106
- Update docs with metrics observation instructions by @Bslabe123 in #107
- Prefill return first token by @jwyang-google in #105
- change the detokenization thread to return the actual eos token. by @jwyang-google in #108
- Add loadgen in dev image by @morgandu in #109
- Bump certifi from 2024.2.2 to 2024.7.4 in the pip group by @dependabot in #110
- Bump zipp from 3.17.0 to 3.19.1 in the pip group by @dependabot in #111
- Model warmup support with AOT and endpoint for JetStream by @vivianrwu in #92
- Cleanup orchestrator proto by @JoeZijunZhou in #112
- Update images for mlperf by @morgandu in #113
- image fix by @morgandu in #114
- del prefill_result & update dev image by @morgandu in #116
- Fix benchmark script for saving benchmark result by @lsy323 in #117
- Add jetstream_server_startup_latency metric by @Bslabe123 in #118
- Add http server to JetStream by @JoeZijunZhou in #115
- Free engine resource for the slot after finished one request decoding by @FanhaiLu1 in #119
- Add jetstream_request_success_count metric by @Bslabe123 in #124
- Request input/output size metrics by @Bslabe123 in #123
- Makefile by @Bslabe123 in #125
- Various request time metrics by @Bslabe123 in #121
- Standalone JetStream removes pinned deps by @JoeZijunZhou in #129
- Update deps file by @JoeZijunZhou in #130
- Manual model warmup to resolve AOT model warmup performance degradation by @vivianrwu in #126
- Update JetStream instructions by @yeandy in #132
- Add an optional parameter for sampling in prefill / sample. by @qihqi in #133
- remove excessive logs in production run by changing from DEBUG to INFO by @jwyang-google in #134
- Change the default message for requester.py and remove mlperf 4.1 install for proxy version support. by @zhihaoshan-google in #136
- Change previewutilities -> pathwaysutils by @vivianrwu in #138
- Add option to use hf tokenizer by @RissyRan in #147
- Rename third_party folder to Avoid Copybara g3 Errors by @jyj0w0 in #148
- add separate prefill detokenization thread by @zhihaoshan-google in #152
- Revert the change created by copybara by @jyj0w0 in #156
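The rename in #102 switches the slots metric from reporting free capacity to reporting occupied capacity. The two views are complements of each other, as the sketch below shows. It is an assumption-laden illustration, not the server's code: `used_fraction` and its parameters are hypothetical, and the value is assumed to be a fraction in [0, 1] of a fixed slot count, which these notes do not confirm.

```python
def used_fraction(available_slots: int, total_slots: int) -> float:
    """Return the share of decode slots in use, as a fraction in [0, 1].

    Hypothetical helper: assumes a fixed total slot count and that
    "used" is simply the complement of "available".
    """
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    if not 0 <= available_slots <= total_slots:
        raise ValueError("available_slots must be between 0 and total_slots")
    return (total_slots - available_slots) / total_slots
```

Dashboards or alerts built on the old metric would need the inverse threshold after the rename: an alert on low availability becomes an alert on high usage.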
New Contributors
- @lsy323 made their first contribution in #117
- @RissyRan made their first contribution in #147
- @jyj0w0 made their first contribution in #148
Full Changelog: v0.2.2...v0.3