[Fix] Adjust default chunked prefill size and cuda graph max bs according to GPU memory capacity #1927
| Job | Run time |
|---|---|
| | 5m 42s |
| | 10m 34s |
| | 13m 10s |
| | 15m 6s |
| | 9m 5s |
| | 15m 5s |
| | 12m 48s |
| | 15m 9s |
| | 15m 10s |
| | 3m 16s |
| | 1s |
| Total | 1h 55m 6s |