When distributed KV cache is enabled, a chunk that is fully hit in the KV cache skips the subsequent compute step to improve performance. However, because the compute step is skipped, the attn metadata is never generated and is left as None on the driver side. Under tensor parallelism, non-driver workers construct their attn metadata from the data the driver broadcasts, so the missing attn metadata on the driver causes metadata construction on the non-driver workers to fail, leading to runtime errors. To address this issue, this PR introduces a signal that allows non-driver workers to skip building attn metadata.

Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>
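The skip signal described above can be illustrated with a minimal sketch. It is not the PR's actual code: the `AttnMetadata` class, the `skip_attn_metadata` key, and both functions are hypothetical, and the tensor-parallel broadcast is simulated with a plain dict rather than a real collective. The driver always sends the flag, and a non-driver worker checks it before trying to rebuild attn metadata.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class AttnMetadata:
    # Hypothetical stand-in for a backend's attention metadata.
    num_prefill_tokens: int


def driver_prepare_broadcast(attn_metadata: Optional[AttnMetadata]) -> Dict[str, Any]:
    """Driver side: build the dict that would be broadcast to TP workers."""
    data: Dict[str, Any] = {}
    # Hypothetical signal: True when the compute step was skipped because
    # the chunk was fully hit in the KV cache, so no metadata exists.
    data["skip_attn_metadata"] = attn_metadata is None
    if attn_metadata is not None:
        data["num_prefill_tokens"] = attn_metadata.num_prefill_tokens
    return data


def worker_build_attn_metadata(data: Dict[str, Any]) -> Optional[AttnMetadata]:
    """Non-driver side: rebuild attn metadata unless the driver says to skip."""
    if data.get("skip_attn_metadata", False):
        return None
    # Without the signal, this lookup would raise KeyError whenever the
    # driver had no metadata to send.
    return AttnMetadata(num_prefill_tokens=data["num_prefill_tokens"])


if __name__ == "__main__":
    # Fully hit chunk: the driver has no metadata, the worker skips building it.
    print(worker_build_attn_metadata(driver_prepare_broadcast(None)))
    # Normal chunk: metadata is broadcast and rebuilt on the worker.
    print(worker_build_attn_metadata(driver_prepare_broadcast(AttnMetadata(num_prefill_tokens=16))))
```

Sending the flag unconditionally, rather than inferring a skip from missing keys, keeps the worker's decision explicit and avoids confusing a legitimately skipped step with a malformed broadcast.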