[Fix] Adjust default chunked prefill size and cuda graph max bs according to GPU memory capacity #1927
Annotations
2 errors
Benchmark offline throughput (w/o RadixAttention) (TP=2)
The operation was canceled.