[Neuron] Adding support for context-length, token-gen buckets for latency optimization with the Neuron device.
Harsha Bikki committed Aug 27, 2024
1 parent 2b90eb9 commit d65bf7d
Showing 1 changed file with 5 additions and 1 deletion.
examples/offline_inference_neuron.py (6 changes: 5 additions & 1 deletion)
@@ -1,10 +1,14 @@
import os

from vllm import LLM, SamplingParams

# Builds the cache for the Neuron-compiled model.
os.environ['NEURONX_DUMP_TO'] = "./Cache"

# Creates XLA HLO graphs for all the context-length buckets.
os.environ['NEURON_CONTEXT_LENGTH_BUCKETS'] = "512,1024,2048"
# Creates XLA HLO graphs for all the token-gen buckets.
os.environ['NEURON_TOKEN_GEN_BUCKETS'] = "512,1024,2048"

# Sample prompts.
prompts = [
"Hello, my name is",
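For readers who want to see how these bucket settings fit into a complete run, below is a minimal, self-contained sketch of an offline Neuron inference script. Bucketing lets the Neuron compiler pre-build graphs for a fixed set of sequence lengths instead of recompiling per request. The model name, sequence lengths, and parallelism degree here are illustrative assumptions, not part of this commit.

import os

from vllm import LLM, SamplingParams

# Cache compiled Neuron artifacts so subsequent runs skip recompilation.
os.environ['NEURONX_DUMP_TO'] = "./Cache"
# Pre-compile XLA HLO graphs for the configured context-length buckets.
os.environ['NEURON_CONTEXT_LENGTH_BUCKETS'] = "512,1024,2048"
# Pre-compile XLA HLO graphs for the configured token-gen buckets.
os.environ['NEURON_TOKEN_GEN_BUCKETS'] = "512,1024,2048"

# Illustrative model and engine settings (assumptions, not from this commit).
llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    max_num_seqs=8,
    # The largest bucket configured above should cover max_model_len.
    max_model_len=2048,
    block_size=2048,
    device="neuron",
    tensor_parallel_size=2,
)

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)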
