fix: ExLlama Backend Context Size & Rope Scaling #1311

Merged: 2 commits merged into mudler:master on Nov 21, 2023

Conversation

@ok2sh (Contributor) commented on Nov 21, 2023

Description

This PR fixes the ExLlama backend always using a fixed 2048-entry cache. That limit causes the following error as soon as the context exceeds 2048 tokens: `RuntimeError: start (2009) + length (44) exceeds dimension size (2048)`

This PR also passes the RoPE scaling configuration through to the ExLlama backend.
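A minimal sketch of the idea behind the change (not the actual backend code): when loading the model, override the ExLlamaConfig defaults with the values LocalAI received instead of leaving `max_seq_len` at 2048. The request field names (`ContextSize`, `RopeFreqScale`, `RopeFreqBase`) and the exact mapping onto exllama's `compress_pos_emb` / `alpha_value` are assumptions for illustration.

```python
# Hypothetical sketch of the fix, assuming a gRPC-style LoadModel request with
# ContextSize / RopeFreqScale / RopeFreqBase fields (illustrative names, not
# necessarily LocalAI's actual ones) and exllama's ExLlamaConfig attributes.
from model import ExLlama, ExLlamaCache, ExLlamaConfig


def load_model(request, model_config_path: str, model_weights_path: str):
    config = ExLlamaConfig(model_config_path)   # exllama defaults: max_seq_len=2048,
    config.model_path = model_weights_path      # compress_pos_emb=1.0, alpha_value=1.0

    # Without this override the cache stays at 2048 tokens and longer prompts
    # fail with "start (...) + length (...) exceeds dimension size (2048)".
    if request.ContextSize > 0:
        config.max_seq_len = request.ContextSize

    # Pass RoPE scaling through instead of keeping the library defaults. Which
    # request field maps to linear vs. NTK scaling is assumed here.
    if request.RopeFreqScale:
        config.compress_pos_emb = request.RopeFreqScale
    if request.RopeFreqBase:
        config.alpha_value = request.RopeFreqBase

    model = ExLlama(config)
    cache = ExLlamaCache(model)  # cache is now sized for config.max_seq_len
    return model, cache
```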

Notes for Reviewers

These attributes are set to defaults here and are not loaded from the model's config JSON, so we need to set them from LocalAI's side. It would also be nice if every backend warned about config flags the user defined in the YAML that the backend does not implement.
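A hypothetical sketch of that warning idea; `SUPPORTED_FLAGS`, `warn_unsupported`, and the flag names are made up for illustration and are not existing LocalAI code.

```python
# Hypothetical helper for the suggestion above; SUPPORTED_FLAGS and
# warn_unsupported are illustrative names, not existing LocalAI code.
import logging

SUPPORTED_FLAGS = {"context_size", "rope_freq_base", "rope_freq_scale"}


def warn_unsupported(user_flags: dict, backend: str = "exllama") -> None:
    """Log a warning for every YAML flag the given backend does not implement."""
    for flag in user_flags:
        if flag not in SUPPORTED_FLAGS:
            logging.warning(
                "flag %r in the model YAML is not implemented by the %s backend",
                flag, backend,
            )
```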

Signed commits

  • Yes, I signed my commits.

@mudler (Owner) left a comment

looking good, thanks

@lunamidori5 (Collaborator) left a comment

@mudler why does it keep asking me for a review every time I add you lol

@mudler marked this pull request as ready for review on November 21, 2023 18:26
@mudler merged commit 20d637e into mudler:master on Nov 21, 2023
14 checks passed