
Modify benchmarks #563

Merged
merged 7 commits into from
Apr 12, 2024
Conversation

@dacorvo (Collaborator) commented Apr 11, 2024

What does this PR do?

Modify LLM benchmarks section:

  • now evaluates decode throughput only,
  • updated numbers for Llama 7B and Llama 13B using AWS Neuron SDK 2.18,
  • added Mistral v2.

@dacorvo dacorvo marked this pull request as ready for review April 11, 2024 11:59
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines 31 to 32
def get_input_ids(tokens, batch_size, input_length):
    return tokens.input_ids[:, :input_length].repeat((batch_size, 1))
Member

nit: if tokens.input_ids initially has a batch size > 1, the output will have a batch size of batch_size x original_batch_size; maybe just select the first batch before the repeat.
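A minimal sketch of the suggested fix, assuming `tokens.input_ids` is a 2-D PyTorch tensor (the helper name and demo tensor below are illustrative, not from the PR):

```python
import torch

def get_input_ids(input_ids, batch_size, input_length):
    # Keep only the first sequence so the output batch size is exactly
    # batch_size, even if the tokenized prompt already had a batch dim > 1.
    first = input_ids[:1, :input_length]
    return first.repeat((batch_size, 1))

ids = torch.arange(24).reshape(2, 12)  # pretend tokenizer output with batch size 2
out = get_input_ids(ids, batch_size=4, input_length=8)
print(out.shape)  # torch.Size([4, 8])
```

With the original `repeat` on the full tensor, the same call would have produced a batch of 8 rows instead of 4.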

Comment on lines 23 to 39

for model_name, model_configuration in model_configurations.items():
    model_id, batch_size, seq_length = model_configuration
    model = NeuronModelForCausalLM.from_pretrained(
        model_id,
        export=True,
        batch_size=batch_size,
        sequence_length=seq_length,
        auto_cast_type="bf16",
        num_cores=NUM_CORES,
    )
    with TemporaryDirectory() as tmpdir:
        model.save_pretrained(tmpdir)
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        tokenizer.save_pretrained(tmpdir)
        json_path = f"{model_name}.json"
        run(tmpdir, 256, 2048, json_path=json_path)
Member

Maybe add an `if __name__ == "__main__":` guard so this code doesn't run whenever the module is imported.
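A sketch of the suggested guard; `main()` here is a hypothetical wrapper around the benchmark loop above, not code from the PR:

```python
def main():
    # Hypothetical stand-in for the benchmark loop above.
    print("running benchmarks")

if __name__ == "__main__":
    # Runs only when the file is executed as a script,
    # not when it is imported as a module.
    main()
```

This way another script can `import` helpers from the benchmark file without triggering a full model export.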

Comment on lines +22 to +23
with open("./wiki.txt") as f:
    prompt = f.read()
Member

You read a 12k-line document and tokenize everything at once?

Collaborator Author

yes, tokenizer is a brute
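For reference, a minimal sketch of the kind of cap the reviewer hints at: reading only a bounded prefix instead of the whole file before tokenizing. `MAX_CHARS` and the stand-in text are hypothetical; the actual script reads wiki.txt whole.

```python
import io

MAX_CHARS = 16_384  # hypothetical cap, chosen as a rough character budget
text = "word " * 10_000  # stand-in for the contents of wiki.txt
# file.read(size) stops after at most `size` characters, so the tokenizer
# never sees the rest of the document.
prompt = io.StringIO(text).read(MAX_CHARS)
print(len(prompt))  # 16384
```

The same `read(MAX_CHARS)` call works on a real file handle opened with `open("./wiki.txt")`.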

@dacorvo dacorvo merged commit c8f15f9 into main Apr 12, 2024
1 check passed
@dacorvo dacorvo deleted the modify-benchmarks branch April 12, 2024 09:45