
Commit
Merge pull request #2 from BBuf/fix_default_chunked_prefill_size_4090
fix chunked prefill size default value on GTX 4090
BBuf authored Nov 28, 2024
2 parents 2946943 + 12f55fd commit 4e419ac
Showing 1 changed file with 4 additions and 2 deletions.
python/sglang/srt/server_args.py (4 additions, 2 deletions)
@@ -169,9 +169,11 @@ def __post_init__(self):
             gpu_mem = get_amdgpu_memory_capacity()
         else:
             gpu_mem = get_nvgpu_memory_capacity()

         # If the GPU memory is less than 25GB (like GTX 4090) and the user hasn't manually specified the chunked prefill size, we reduce its default value by a factor of 4.
         if gpu_mem < 25000:
-            self.chunked_prefill_size //= 4  # make it 2048
             self.cuda_graph_max_bs = 4
+            if self.chunked_prefill_size == 8192:
+                self.chunked_prefill_size //= 4  # make it 2048
+                logger.info("Automatically adjust --chunked-prefill-size for small GPUs.")
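The effect of the fix can be sketched as a standalone function (the helper name and print calls are illustrative; the 25000 MB threshold, the 8192 default, and the divide-by-4 rule come from the diff above):

```python
# Default from server_args.py; shrinking only applies when the user kept it.
DEFAULT_CHUNKED_PREFILL_SIZE = 8192

def adjust_for_small_gpu(gpu_mem_mb: int, chunked_prefill_size: int) -> int:
    """Return the effective chunked prefill size for a given GPU memory (MB)."""
    if gpu_mem_mb < 25000:  # small GPUs such as a 4090 (~24 GB)
        # Only shrink the untouched default; a user-specified value is kept.
        if chunked_prefill_size == DEFAULT_CHUNKED_PREFILL_SIZE:
            chunked_prefill_size //= 4  # 8192 -> 2048
    return chunked_prefill_size

print(adjust_for_small_gpu(24000, 8192))  # default on a 24 GB GPU -> 2048
print(adjust_for_small_gpu(24000, 4096))  # user-specified value preserved -> 4096
print(adjust_for_small_gpu(80000, 8192))  # large GPU keeps the default -> 8192
```

Before this commit, the pre-check branch divided `chunked_prefill_size` by 4 unconditionally, so a value the user passed on the command line was also quartered; the added `== 8192` guard restricts the adjustment to the untouched default.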

# Choose kernel backends
