Releases · AI-Hypercomputer/JetStream
v0.3
Key Changes
- Observability improvements in JetStream Server (Prometheus metrics)
- TensorBoard support for remote access
- Engine API update for TTFT and TPOT measurements (see the sketch after this list)
- Hugging Face tokenizer support
- Copybara G3 support
- Threading optimizations
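TTFT (time to first token) and TPOT (time per output token) are the two latency measurements the Engine API update exposes: TTFT is the delay from request arrival until the first generated token, and TPOT is the average gap between subsequent tokens. The sketch below is a minimal illustration of those definitions in plain Python; it is not JetStream's Engine API code.

```python
# Minimal sketch of the TTFT/TPOT definitions only; JetStream's Engine API
# surfaces these measurements, but this code is illustrative, not the
# library's implementation.
from typing import List


def ttft(request_start: float, token_times: List[float]) -> float:
  """Time to first token: delay until the first decode step completes."""
  return token_times[0] - request_start


def tpot(token_times: List[float]) -> float:
  """Time per output token: average gap between consecutive tokens."""
  if len(token_times) < 2:
    return 0.0
  return (token_times[-1] - token_times[0]) / (len(token_times) - 1)


# Example: a request sent at t=0.0 whose tokens arrived at these times.
times = [0.35, 0.40, 0.45, 0.50]
print(ttft(0.0, times))  # 0.35 (seconds to the first token)
print(tpot(times))       # 0.05 (seconds per subsequent token)
```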
What's Changed
- Add tensorboard plugin dep for remote access by @JoeZijunZhou in #97
- Update benchmark config for xlml automation by @morgandu in #96
- Minor fix by @morgandu in #98
- Add ssh port forward support for profile readme by @FanhaiLu1 in #99
- Add inference sampling utils in JetStream by @JoeZijunZhou in #100
- Add profiling server for proxy backend by @zhihaoshan-google in #101
- Change jetstream_slots_available_percentage to jetstream_slots_used_percentage by @Bslabe123 in #102
- Bump urllib3 from 2.2.0 to 2.2.2 in the pip group across 1 directory by @dependabot in #104
- Added jetstream_transfer_backlog_size and jetstream_generate_backlog_size metrics by @Bslabe123 in #103
- Update docs for benchmark warmup mode by @JoeZijunZhou in #106
- Update docs with metrics observation instructions by @Bslabe123 in #107
- Prefill return first token by @jwyang-google in #105
- Change the detokenization thread to return the actual EOS token by @jwyang-google in #108
- Add loadgen in dev image by @morgandu in #109
- Bump certifi from 2024.2.2 to 2024.7.4 in the pip group by @dependabot in #110
- Bump zipp from 3.17.0 to 3.19.1 in the pip group by @dependabot in #111
- Model warmup support with AOT and endpoint for JetStream by @vivianrwu in #92
- Cleanup orchestrator proto by @JoeZijunZhou in #112
- Update images for mlperf by @morgandu in #113
- Image fix by @morgandu in #114
- del prefill_result & update dev image by @morgandu in #116
- Fix benchmark script for saving benchmark result by @lsy323 in #117
- Add jetstream_server_startup_latency metric by @Bslabe123 in #118
- Add http server to JetStream by @JoeZijunZhou in #115
- Free engine resource for the slot after finished one request decoding by @FanhaiLu1 in #119
- Add jetstream_request_success_count metric by @Bslabe123 in #124
- Request input/output size metrics by @Bslabe123 in #123
- Makefile by @Bslabe123 in #125
- Various request time metrics by @Bslabe123 in #121
- Standalone JetStream removes pinned deps by @JoeZijunZhou in #129
- Update deps file by @JoeZijunZhou in #130
- Manual model warmup to resolve AOT model warmup performance degradation by @vivianrwu in #126
- Update JetStream instructions by @yeandy in #132
- Add an optional parameter for sampling in prefill / sample. by @qihqi in #133
- Remove excessive logs in production runs by changing the log level from DEBUG to INFO by @jwyang-google in #134
- Change the default message for requester.py and remove mlperf 4.1 install for proxy version support. by @zhihaoshan-google in #136
- Change previewutilities -> pathwaysutils by @vivianrwu in #138
- Add option to use hf tokenizer by @RissyRan in #147
- Rename third_party folder to avoid Copybara g3 errors by @jyj0w0 in #148
- Add separate prefill detokenization thread by @zhihaoshan-google in #152
- Revert the change created by copybara by @jyj0w0 in #156
New Contributors
- @lsy323 made their first contribution in #117
- @RissyRan made their first contribution in #147
- @jyj0w0 made their first contribution in #148
Full Changelog: v0.2.2...v0.3
v0.2.2
Key Changes
- Enable observability in JetStream Server (Prometheus metrics; see the sketch after this list)
- Enable JAX profiler support on single-host JetStream Server
- Support both text and token ids I/O for JetStream Decode API
- Add health check API
- Support MLPerf evaluation
- Enable JetStream Server E2E tests
- Increase unit test coverage (>=96%)
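The Prometheus metrics introduced here (and extended in v0.3 with names such as jetstream_slots_used_percentage and jetstream_request_success_count) are exported over HTTP for scraping. Below is a minimal sketch using the standard prometheus_client library with two of the metric names from these releases; the registration code is illustrative, not JetStream's actual server code.

```python
# Illustrative sketch: exposing a gauge and a counter with the same names as
# two JetStream metrics via the standard prometheus_client library. The port
# and placeholder values are arbitrary; JetStream's real exporter lives in
# its server implementation.
import time

from prometheus_client import Counter, Gauge, start_http_server

slots_used = Gauge(
    "jetstream_slots_used_percentage",
    "Percentage of decode slots currently in use.",
)
request_success = Counter(
    "jetstream_request_success_count",
    "Number of requests that completed successfully.",
)

start_http_server(9100)  # Prometheus scrapes http://localhost:9100/metrics
while True:
  slots_used.set(42.0)   # placeholder value for the sketch
  request_success.inc()  # placeholder increment for the sketch
  time.sleep(5)
```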
What's Changed
- Accuracy eval mlperf by @jwyang-google in #76
- Add metadata metrics by @yeandy in #77
- Fix pad_tokens function description by @FanhaiLu1 in #80
- Prometheus Metrics by @Bslabe123 in #71
- Update JetStream grpc proto to support I/O with text and token ids by @JoeZijunZhou in #78
- Update benchmark script to easily test llama-3 by @bhavya01 in #83
- Unit test coverage cleanup by @JoeZijunZhou in #81
- Allow tokenizer to customize stop_tokens by @qihqi in #84
- Decode Batch Percentage Metrics/Improved Scraping by @Bslabe123 in #82
- Bump requests from 2.31.0 to 2.32.0 in the pip group across 1 directory by @dependabot in #86
- Add profiling support and update docs by @JoeZijunZhou in #85
- Add ray disaggregated serving support by @FanhaiLu1 in #87
- Ensure server warmup before benchmark by @JoeZijunZhou in #91
- Add healthcheck support for JetStream by @vivianrwu in #90
- Add JetStream E2E test CI by @JoeZijunZhou in #89
- Release v0.2.2 by @JoeZijunZhou in #95
New Contributors
- @jwyang-google made their first contribution in #76
- @Bslabe123 made their first contribution in #71
- @vivianrwu made their first contribution in #90
Full Changelog: v0.2.1...v0.2.2
v0.2.1
Key Changes
- Support Llama3 tokenizer
- JetStream Tokenizer refactor (see the interface sketch after this list)
- Disaggregation preparation work
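The tokenizer refactor (PRs #40 and #53 below) aligns tokenization behind an abstract interface so that the SentencePiece-backed Llama 2 tokenizer and the new Llama 3 tokenizer can be used interchangeably. The following is a hypothetical sketch of such an interface; the class and method names are illustrative, not JetStream's actual signatures.

```python
# Hypothetical sketch of an abstract tokenizer interface of the kind
# introduced by PR #53; names and signatures here are illustrative only.
import abc
from typing import List


class Tokenizer(abc.ABC):
  """Common surface for, e.g., Llama 2 and Llama 3 tokenizers."""

  @abc.abstractmethod
  def encode(self, text: str) -> List[int]:
    """Converts a string into token ids."""

  @abc.abstractmethod
  def decode(self, token_ids: List[int]) -> str:
    """Converts token ids back into a string."""

  @property
  @abc.abstractmethod
  def stop_tokens(self) -> List[int]:
    """Token ids that terminate generation for this model."""
```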
What's Changed
- Add sample_idx in InputRequest for debugging by @morgandu in #32
- Update README.md with user guides by @JoeZijunZhou in #34
- Update README.md with PT user guide by @JoeZijunZhou in #35
- Reorganize unit tests and update CICD by @JoeZijunZhou in #37
- Add badges for JetStream by @JoeZijunZhou in #38
- Bump idna from 3.6 to 3.7 by @dependabot in #39
- Reformat benchmark metrics by @yeandy in #42
- Update server host default value by @JoeZijunZhou in #43
- Refactor readme by @FanhaiLu1 in #41
- Add missing documentation by @FanhaiLu1 in #47
- Update README.md to fix broken link by @charbull in #50
- Add np padded token support by @FanhaiLu1 in #49
- Format token utils and test by @FanhaiLu1 in #51
- Align Tokenizer in JetStream by @JoeZijunZhou in #40
- Do nothing for nd array in copy_to_host_async by @FanhaiLu1 in #52
- Add jax_padding support driver and server lib by @FanhaiLu1 in #54
- Update maxtext user guide by @JoeZijunZhou in #56
- Fix benchmark script type issue by @JoeZijunZhou in #59
- Fix requester flag default value by @JoeZijunZhou in #60
- Fix float division by zero in benchmark by @FanhaiLu1 in #62
- Register IFRT proxy backend when proxy is defined in the jax_platforms by @zhihaoshan-google in #63
- Add an abstract class for Tokenizer by @bhavya01 in #53
- Refactor slice_to_num_chips to adapt to Cloud config by @zhihaoshan-google in #65
- Support llama3 tokenizer by @bhavya01 in #67
- Prerequisite work for supporting disaggregation by @zhihaoshan-google in #68
- Create __init__.py in Jetstream/third_party by @bhavya01 in #69
- Add tokenize_and_pad function for backward compatibility by @FanhaiLu1 in #70
- Release v0.2.1 by @JoeZijunZhou in #72
- Bump tqdm from 4.66.1 to 4.66.3 in the pip group across 1 directory by @dependabot in #73
- Release v0.2.1 with docs update by @JoeZijunZhou in #74
New Contributors
- @dependabot made their first contribution in #39
- @yeandy made their first contribution in #42
- @charbull made their first contribution in #50
- @zhihaoshan-google made their first contribution in #63
- @bhavya01 made their first contribution in #53
Full Changelog: v0.2.0...v0.2.1
v0.2.0
Major Changes
- Support JetStream MaxText inference on Cloud TPU VM
- Support JetStream PyTorch inference on Cloud TPU VM
- Support Continuous Batching with interleaved mode in JetStream
- Support online serving benchmarking (see the sketch after this list)
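The online serving benchmark reports aggregate statistics such as request throughput, token throughput, and per-request latency over a stream of timed requests. Here is a hypothetical sketch of that aggregation; the record fields and function names are illustrative, not the benchmark script's actual API.

```python
# Hypothetical sketch of the aggregation an online-serving benchmark
# performs; names are illustrative, not the benchmark script's API.
from dataclasses import dataclass
from typing import List


@dataclass
class RequestRecord:
  start: float        # wall-clock time the request was sent (seconds)
  end: float          # wall-clock time the last token arrived (seconds)
  output_tokens: int  # number of tokens generated


def summarize(records: List[RequestRecord]) -> None:
  wall = max(r.end for r in records) - min(r.start for r in records)
  total_tokens = sum(r.output_tokens for r in records)
  latencies = [r.end - r.start for r in records]
  print(f"requests/s  : {len(records) / wall:.2f}")
  print(f"tokens/s    : {total_tokens / wall:.2f}")
  print(f"mean latency: {sum(latencies) / len(latencies):.3f} s")


summarize([
    RequestRecord(start=0.0, end=1.2, output_tokens=64),
    RequestRecord(start=0.1, end=1.5, output_tokens=80),
])
```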
What's Changed
- Add unit tests CI github action by @JoeZijunZhou in #1
- Refine thread in orchestrator by @JoeZijunZhou in #2
- Optimize maximum threads to saturate decoding capacity by @JoeZijunZhou in #3
- Add benchmarks maximum threads config by @JoeZijunZhou in #4
- First support necessary for MaxText by @rwitten in #5
- Support gracefully stopping orchestrator and server by @JoeZijunZhou in #6
- Save request outputs and add eval accuracy support by @FanhaiLu1 in #8
- Use a parameter-based number as the inference request max output length by @FanhaiLu1 in #10
- Fix output token drop issue by @JoeZijunZhou in #9
- Add option to warm up by @qihqi in #11
- Replace token_list with generated_text in saved outputs by @FanhaiLu1 in #12
- Refine requester util by @JoeZijunZhou in #15
- Adds filtering for sharegpt based on conversation starter. by @patemotter in #17
- Allows more requests than available data. by @patemotter in #19
- Fix starvation with async server and interleaving optimization by @JoeZijunZhou in #13
- Add Token util unit test by @FanhaiLu1 in #20
- Fix llama2 decode bug in tokenizer by @FanhaiLu1 in #22
- Fix whitespace replacement bug by @FanhaiLu1 in #24
- Update benchmark to run openorca dataset by @morgandu in #21
- Add model ckpt conversion and AQT scripts for JetStream MaxText Serving by @JoeZijunZhou in #23
- Refactor to sample before tokenize by @morgandu in #26
- Update ckpt conversion scripts by @JoeZijunZhou in #25
- Move tokenizer model to third party llama2 by @FanhaiLu1 in #27
- Support JetStream MaxText user guide by @JoeZijunZhou in #28
- Enable pylint linter and pyink formatter by @JoeZijunZhou in #29
- Update README by @JoeZijunZhou in #30
- Release v0.2.0 by @JoeZijunZhou in #31
New Contributors
- @JoeZijunZhou made their first contribution in #1
- @rwitten made their first contribution in #5
- @FanhaiLu1 made their first contribution in #8
- @qihqi made their first contribution in #11
- @patemotter made their first contribution in #17
- @morgandu made their first contribution in #21
Full Changelog: https://github.com/google/JetStream/commits/v0.2.0