Release v0.3 · AI-Hypercomputer/JetStream

Key Changes

Observability improvements in JetStream Server (Prometheus metrics)
TensorBoard support for remote access
Engine API update for TTFT and TPOT measurements
Hugging Face tokenizer support
Copybara G3 support
Threading optimizations
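
For readers unfamiliar with the two latency metrics named above, here is a minimal sketch of how TTFT (time to first token) and TPOT (time per output token) are commonly derived from timestamps. It uses the generic definitions and is not JetStream's Engine API: `LatencyStats`, `latency_stats`, and their parameters are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LatencyStats:
    """Hypothetical container; not a JetStream type."""
    ttft_s: float  # time to first token, in seconds
    tpot_s: float  # mean time per output token after the first, in seconds


def latency_stats(request_time_s: float,
                  token_times_s: Sequence[float]) -> LatencyStats:
    """Compute TTFT and TPOT from a request timestamp and per-token timestamps.

    Assumes token_times_s is ordered and holds one timestamp per generated
    token, with the first entry being the token produced by prefill.
    """
    if not token_times_s:
        raise ValueError("need at least one token timestamp")
    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times_s[0] - request_time_s
    # TPOT: average gap between consecutive tokens after the first one.
    decode_steps = len(token_times_s) - 1
    if decode_steps > 0:
        tpot = (token_times_s[-1] - token_times_s[0]) / decode_steps
    else:
        tpot = 0.0  # a single-token response has no decode interval
    return LatencyStats(ttft_s=ttft, tpot_s=tpot)
```

Under these definitions, a request whose first token arrives from prefill has a measurable TTFT even before any decode step has run.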

What's Changed

Add tensorboard plugin dep for remote access by @JoeZijunZhou in #97
Update benchmark config for xlml automation by @morgandu in #96
Minor fix by @morgandu in #98
Add ssh port forward support for profile readme by @FanhaiLu1 in #99
Add inference sampling utils in JetStream by @JoeZijunZhou in #100
Add profiling server for proxy backend by @zhihaoshan-google in #101
Change jetstream_slots_available_percentage to jetstream_slots_used_percentage by @Bslabe123 in #102
Bump urllib3 from 2.2.0 to 2.2.2 in the pip group across 1 directory by @dependabot in #104
Added jetstream_transfer_backlog_size and jetstream_generate_backlog_size metrics by @Bslabe123 in #103
Update docs for benchmark warmup mode by @JoeZijunZhou in #106
Update docs with metrics observation instructions by @Bslabe123 in #107
Prefill return first token by @jwyang-google in #105
change the detokenization thread to return the actual eos token. by @jwyang-google in #108
Add loadgen in dev image by @morgandu in #109
Bump certifi from 2024.2.2 to 2024.7.4 in the pip group by @dependabot in #110
Bump zipp from 3.17.0 to 3.19.1 in the pip group by @dependabot in #111
Model warmup support with AOT and endpoint for JetStream by @vivianrwu in #92
Cleanup orchestrator proto by @JoeZijunZhou in #112
Update images for mlperf by @morgandu in #113
image fix by @morgandu in #114
del prefill_result & update dev image by @morgandu in #116
Fix benchmark script for saving benchmark result by @lsy323 in #117
Add jetstream_server_startup_latency metric by @Bslabe123 in #118
Add http server to JetStream by @JoeZijunZhou in #115
Free engine resource for the slot after finished one request decoding by @FanhaiLu1 in #119
Add jetstream_request_success_count metric by @Bslabe123 in #124
Request input/output size metrics by @Bslabe123 in #123
Makefile by @Bslabe123 in #125
Various request time metrics by @Bslabe123 in #121
Standalone JetStream removes pinned deps by @JoeZijunZhou in #129
Update deps file by @JoeZijunZhou in #130
Manual model warmup to resolve AOT model warmup performance degradation by @vivianrwu in #126
Update JetStream instructions by @yeandy in #132
Add an optional parameter for sampling in prefill / sample. by @qihqi in #133
remove excessive logs in production run by changing from DEBUG to INFO by @jwyang-google in #134
Change the default message for requester.py and remove mlperf 4.1 install for proxy version support. by @zhihaoshan-google in #136
Change previewutilities -> pathwaysutils by @vivianrwu in #138
Add option to use hf tokenizer by @RissyRan in #147
Rename third_party folder to Avoid Copybara g3 Errors by @jyj0w0 in #148
add separate prefill detokenization thread by @zhihaoshan-google in #152
Revert the change created by copybara by @jyj0w0 in #156

New Contributors

@lsy323 made their first contribution in #117
@RissyRan made their first contribution in #147
@jyj0w0 made their first contribution in #148

Full Changelog: v0.2.2...v0.3
