perf: change minimal kv_chunk_size back to 128 (#329)
yzh119 authored Jun 22, 2024
1 parent 1df7b03 commit f237f5f
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion include/flashinfer/attention/handler.cuh
@@ -566,7 +566,7 @@ cudaError_t PrefillSplitQOKVIndptr(bool& split_kv, uint32_t& split_max_batch_siz
// step 2: determine kv_chunk_size
std::tie(split_kv, kv_chunk_size, new_batch_size) =
PrefillBinarySearchKVChunkSize(max_grid_size, num_kv_heads, packed_qo_len_arr, kv_len_arr,
- qo_chunk_size, /*min_kv_chunk_size=*/(512 / page_size));
+ qo_chunk_size, /*min_kv_chunk_size=*/(128 / page_size));

// step 3: split qo_indptr and kv_indptr
total_num_tiles_q = 0;
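
To illustrate why the floor matters, here is a minimal sketch, not the FlashInfer implementation. It assumes that 128 and 512 are token counts, so dividing by `page_size` gives a minimum chunk size in pages, and that the search picks the smallest chunk size whose total chunk count fits a budget. `MinKVChunkPages`, `CeilDiv`, and `SmallestFittingChunk` are hypothetical names. The real `PrefillBinarySearchKVChunkSize` also takes `num_kv_heads`, `packed_qo_len_arr`, and `qo_chunk_size` into account, which this sketch omits.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper mirroring the (tokens / page_size) expression in the diff.
// Integer division truncates, so the result is 0 when page_size > min_tokens.
constexpr uint32_t MinKVChunkPages(uint32_t min_tokens, uint32_t page_size) {
  return min_tokens / page_size;
}

constexpr uint32_t CeilDiv(uint32_t a, uint32_t b) { return (a + b - 1) / b; }

// Hypothetical sketch: binary-search the smallest chunk size (in pages), no
// smaller than min_chunk, such that the total number of KV chunks across all
// requests is at most max_chunks. If no chunk size fits the budget, the
// longest KV length (one chunk per request) is returned.
uint32_t SmallestFittingChunk(const std::vector<uint32_t>& kv_len_pages,
                              uint32_t max_chunks, uint32_t min_chunk) {
  uint32_t low = std::max<uint32_t>(min_chunk, 1);  // guard against a zero floor
  uint32_t high = low;
  for (uint32_t len : kv_len_pages) high = std::max(high, len);
  while (low < high) {
    uint32_t mid = low + (high - low) / 2;
    uint32_t total = 0;
    for (uint32_t len : kv_len_pages) total += CeilDiv(len, mid);
    if (total > max_chunks) {
      low = mid + 1;  // too many chunks: the chunk size must grow
    } else {
      high = mid;  // fits: try a smaller chunk size
    }
  }
  return low;
}
```

With `page_size = 16`, the floor drops from 32 pages to 8 pages. For two requests of 64 pages each and a budget of 16 chunks, the sketch returns a chunk size of 8 pages (16 chunks in total) with the new floor, versus 32 pages (4 chunks in total) with the old one, so the lower floor leaves more room for parallelism across KV chunks.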