Dyn per token varun cleanup #227

varun-sundar-rabindranath · 2024-05-07T17:49:32Z

Description:

- Remove inline asm for float to int8 conversion
- Refactor reduction utils to add blockReduceMax
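The description does not show the code that replaces the inline asm for the float-to-int8 conversion. The sketch below assumes the removed asm rounded to nearest (ties to even) and saturated to the int8 range. It is a minimal host-side C++ version of those semantics that uses only standard library calls. The name `float_to_int8_rn` is hypothetical, and mapping NaN to 0 is an added assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper, not the code merged in #227: float -> int8 without
// inline asm. Rounds to nearest with ties to even (std::nearbyint under the
// default FE_TONEAREST rounding mode), then saturates to [-128, 127].
inline std::int8_t float_to_int8_rn(float x) {
  constexpr float kMin = static_cast<float>(std::numeric_limits<std::int8_t>::min());
  constexpr float kMax = static_cast<float>(std::numeric_limits<std::int8_t>::max());
  // Assumption: NaN maps to 0 (casting NaN to an integer is undefined in C++).
  if (std::isnan(x)) return 0;
  float r = std::nearbyint(x);    // round, e.g. 2.5f -> 2.0f, 3.5f -> 4.0f
  r = std::clamp(r, kMin, kMax);  // saturate; +inf -> 127, -inf -> -128
  return static_cast<std::int8_t>(r);
}
```

For example, `float_to_int8_rn(2.5f)` returns 2 because ties go to the even neighbor, and `float_to_int8_rn(300.0f)` saturates to 127.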

varun-sundar-rabindranath · 2024-05-07T17:51:43Z

csrc/reduction_utils.cuh

@@ -20,29 +20,47 @@
 #include "cuda_compat.h"

 namespace vllm {
+


Hey @bnellnm, I made some C++ interface changes to this file. Can you please take a look? Thanks.

Looks ok to me.
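The review thread concerns `csrc/reduction_utils.cuh`, but the page does not show the new `blockReduceMax` itself. One common way to add a max variant is to template the existing block reduction on its combining operation. Sum and max then share a single warp-shuffle plus shared-memory code path. The host-side C++ model below sketches that idea as a hedged illustration, not the code merged in #227. All names (`kWarpSize`, `warpReduceModel`, `blockReduceModel`, `blockReduceSumModel`, `blockReduceMaxModel`) are hypothetical, and vectors stand in for warp lanes and shared memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical host-side model; not the code in csrc/reduction_utils.cuh.
constexpr std::size_t kWarpSize = 32;  // assumption: NVIDIA warp width

// Models a butterfly (XOR) warp reduction. On a GPU each step would be
//   val = op(val, __shfl_xor_sync(0xffffffff, val, offset));
// here `lanes` stands in for the value held by each of the 32 lanes.
template <typename T, typename Op>
std::vector<T> warpReduceModel(std::vector<T> lanes, Op op) {
  for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
    std::vector<T> next(kWarpSize);
    for (std::size_t lane = 0; lane < kWarpSize; ++lane)
      next[lane] = op(lanes[lane], lanes[lane ^ offset]);
    lanes = next;
  }
  return lanes;  // every lane now holds the reduction of all 32 inputs
}

// Models a block reduction: each warp reduces its lanes, lane 0 of each warp
// publishes a partial (shared memory on a GPU), and one warp reduces the
// partials. Missing lanes are padded with `identity`.
template <typename T, typename Op>
T blockReduceModel(const std::vector<T>& threads, Op op, T identity) {
  // At most kWarpSize * kWarpSize = 1024 threads, the CUDA per-block limit.
  assert(threads.size() <= kWarpSize * kWarpSize);
  std::vector<T> partials(kWarpSize, identity);
  for (std::size_t warp = 0; warp * kWarpSize < threads.size(); ++warp) {
    std::vector<T> lanes(kWarpSize, identity);
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
      std::size_t tid = warp * kWarpSize + lane;
      if (tid < threads.size()) lanes[lane] = threads[tid];
    }
    partials[warp] = warpReduceModel(lanes, op)[0];
  }
  return warpReduceModel(partials, op)[0];
}

// Sum and max share one code path; only the operator and identity differ.
template <typename T>
T blockReduceSumModel(const std::vector<T>& v) {
  return blockReduceModel(v, std::plus<T>{}, T(0));
}

template <typename T>
T blockReduceMaxModel(const std::vector<T>& v) {
  auto max_op = [](T a, T b) { return std::max(a, b); };
  return blockReduceModel(v, max_op, std::numeric_limits<T>::lowest());
}
```

The max identity is `std::numeric_limits<T>::lowest()` rather than `min()`. For floating-point types, `min()` is the smallest positive normal value, so padded lanes would wrongly win over an all-negative input.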

Description:
- Remove inline asm for float to int8 conversion
- Refactor reduction utils to add blockReduceMax

---------

Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

varun-sundar-rabindranath requested review from dsikka and bnellnm May 7, 2024 17:49

varun-sundar-rabindranath commented May 7, 2024


dsikka approved these changes May 7, 2024


varun-sundar-rabindranath merged this pull request into dyn-per-token May 13, 2024

varun-sundar-rabindranath deleted the dyn-per-token-varun-cleanup branch May 13, 2024 14:11

bnellnm May 7, 2024