Update benchmark to run openorca dataset #21
Conversation
benchmarks/benchmark_serving.py
@@ -42,7 +42,7 @@
python -m benchmarks.benchmark_serving \
  --request-rate 1

e2e example: python3 benchmark_serving.py --tokenizer /home/rwitten/maxtext/assets/tokenizer --num-prompts 100 --dataset ~/ShareGPT_V3_unfiltered_cleaned_split.json
We could also update this example to reflect your change, along with the README in /benchmark.
We can merge this change first, since I need to release a new JetStream py package. We can do the refactor later, since the current sample-filter logic is identical for both datasets.
# Tokenize the prompts and completions.
prompts = dataset["prompts"]
outputs = dataset["results"]
n = len(prompts)
prompt_token_ids = tokenizer.tokenize(prompts)
output_token_ids = tokenizer.tokenize(outputs)
I feel we could extract this part into a separate function per dataset; the rest is identical, so we could keep it in the sample_request function.
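A rough sketch of the refactor suggested above, assuming hypothetical dataset layouts and helper names (none of these are the repo's actual API): the dataset-specific field extraction moves into small per-dataset helpers, while the shared tokenize-and-filter path stays in one sampling function.

```python
# Sketch only: function names, field names, and the filter threshold
# are assumptions for illustration, not the benchmark's real code.

def extract_sharegpt(dataset):
    # ShareGPT-style entries hold a list of conversation turns;
    # take the first human/assistant pair.
    return [
        (d["conversations"][0]["value"], d["conversations"][1]["value"])
        for d in dataset
        if len(d.get("conversations", [])) >= 2
    ]

def extract_openorca(dataset):
    # OpenOrca-style records expose parallel prompt/result columns.
    return list(zip(dataset["prompts"], dataset["results"]))

def sample_requests(pairs, tokenize, num_prompts):
    # Shared path for every dataset: tokenize each pair, drop
    # degenerate samples, stop once we have enough prompts.
    sampled = []
    for prompt, output in pairs:
        prompt_ids = tokenize(prompt)
        output_ids = tokenize(output)
        if len(prompt_ids) < 4 or len(output_ids) < 4:
            continue  # skip too-short sequences (assumed filter rule)
        sampled.append((prompt, len(prompt_ids), len(output_ids)))
        if len(sampled) == num_prompts:
            break
    return sampled
```

With this split, adding a new dataset only requires a new `extract_*` helper; the filtering and sampling behavior stays in one place.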
Discussed offline: some of the existing data processing may not be necessary. Will revisit and refactor the data-preprocessing part if needed.