This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Dyn per token varun cleanup #227

Merged
merged 0 commits into dyn-per-token from dyn-per-token-varun-cleanup on May 13, 2024

varun-sundar-rabindranath

Description:

  • Remove the inline assembly used for float-to-int8 conversion
  • Refactor the reduction utilities to add blockReduceMax
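Neither change is shown in full on this page, so the following is only a rough, host-side C++ illustration of both ideas. It assumes that the removed inline assembly performed a round-to-nearest-even, saturating float-to-int8 conversion, which can be expressed portably as std::nearbyint followed by a clamp. It also sketches a block-wide max as a two-level reduction: a butterfly (XOR-shuffle) reduction within each 32-lane warp, then a final reduction over one partial result per warp. The names float_to_int8_rn, warp_reduce_max and block_reduce_max are hypothetical. Warp shuffles and shared memory are simulated with plain arrays, and mapping NaN to 0 is an assumption of this sketch, not something stated in the PR.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical portable float -> int8 conversion (assumed semantics):
// round to nearest, ties to even under the default FE_TONEAREST mode,
// then saturate to [-128, 127]. NaN -> 0 is an assumption of this sketch.
inline int8_t float_to_int8_rn(float x) {
  if (std::isnan(x)) return 0;
  constexpr float kMin = static_cast<float>(std::numeric_limits<int8_t>::min());
  constexpr float kMax = static_cast<float>(std::numeric_limits<int8_t>::max());
  const float rounded = std::nearbyint(x);  // honours the current rounding mode
  return static_cast<int8_t>(std::clamp(rounded, kMin, kMax));
}

constexpr int kWarpSize = 32;

// Simulated butterfly reduction: in each round, every lane combines its value
// with the lane whose index differs by `mask` (what __shfl_xor_sync would
// exchange on a GPU). After log2(32) = 5 rounds every lane holds the warp max.
void warp_reduce_max(std::array<float, kWarpSize>& lanes) {
  for (int mask = kWarpSize / 2; mask > 0; mask >>= 1) {
    std::array<float, kWarpSize> next{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
      next[lane] = std::max(lanes[lane], lanes[lane ^ mask]);
    }
    lanes = next;
  }
}

// Simulated block-wide max: reduce each warp, stage one partial per warp
// (shared memory on a GPU), then reduce the partials with one more warp pass.
// Assumes at most 32 warps, i.e. a block of at most 1024 threads.
float block_reduce_max(const std::vector<float>& thread_vals) {
  const int num_threads = static_cast<int>(thread_vals.size());
  const int num_warps = (num_threads + kWarpSize - 1) / kWarpSize;
  assert(num_warps <= kWarpSize);

  // -inf is the identity element for max, used to pad inactive lanes.
  const float identity = -std::numeric_limits<float>::infinity();
  std::array<float, kWarpSize> partials;
  partials.fill(identity);

  for (int w = 0; w < num_warps; ++w) {
    std::array<float, kWarpSize> lanes;
    lanes.fill(identity);
    for (int lane = 0; lane < kWarpSize; ++lane) {
      const int tid = w * kWarpSize + lane;
      if (tid < num_threads) lanes[lane] = thread_vals[tid];
    }
    warp_reduce_max(lanes);
    partials[w] = lanes[0];  // any lane works; all hold the warp max
  }

  warp_reduce_max(partials);
  return partials[0];
}
```

In a real CUDA kernel, the arrays above would typically be replaced by __shfl_xor_sync for the intra-warp step and a small __shared__ buffer for the per-warp partials. The two-level structure stays the same.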

@@ -20,29 +20,47 @@
#include "cuda_compat.h"

namespace vllm {

Author

Hey @bnellnm, I made some C++ interface changes to this file. Can you please take a look? Thanks.

Member

looks ok to me.

@varun-sundar-rabindranath varun-sundar-rabindranath merged this pull request into dyn-per-token May 13, 2024
@varun-sundar-rabindranath varun-sundar-rabindranath deleted the dyn-per-token-varun-cleanup branch May 13, 2024 14:11
dsikka pushed a commit that referenced this pull request May 14, 2024
Description:
 - Remove inline asm for float to int8 conversion
 - Refactor reduction utils to add blockReduceMax

---------

Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
dsikka pushed a commit that referenced this pull request May 24, 2024
Description:
 - Remove inline asm for float to int8 conversion
 - Refactor reduction utils to add blockReduceMax

---------

Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
3 participants