fix: ExLlama Backend Context Size & Rope Scaling #1311
Merged
Description
This PR fixes the ExLlama backend's cache being fixed at a size of 2048 tokens, which results in the following error once the 2048-token context is exceeded:
stderr RuntimeError: start (2009) + length (44) exceeds dimension size (2048)
This PR also passes the RoPE scaling config through to the ExLlama backend.
Notes for Reviewers
These attributes are set to their defaults here and are not loaded from the model's config JSON, so we need to set them from LocalAI's side. It would also be nice for all backends to warn about config flags that the user defined in the YAML but that the backend does not implement.
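A minimal sketch of the idea, for illustration only: it assumes exllama's `ExLlamaConfig` attribute names (`max_seq_len`, `compress_pos_emb`, `alpha_value`), and the request fields `ContextSize` and `RopeFreqScale` are placeholders standing in for LocalAI's gRPC model options; the actual backend code may differ.

```python
# Sketch: override ExLlamaConfig defaults from LocalAI's model options instead of
# relying on the library defaults (max_seq_len=2048, compress_pos_emb=1.0).
from model import ExLlama, ExLlamaCache, ExLlamaConfig


def load_model(request, model_config_path, model_weights_path):
    config = ExLlamaConfig(model_config_path)
    config.model_path = model_weights_path

    # Without this the cache stays at the 2048-token default, and prompts that
    # exceed it trigger "start (...) + length (...) exceeds dimension size (2048)".
    if request.ContextSize > 0:
        config.max_seq_len = request.ContextSize

    # Pass RoPE scaling through rather than leaving exllama's default of 1.0.
    if request.RopeFreqScale != 0:
        config.compress_pos_emb = request.RopeFreqScale

    model = ExLlama(config)
    cache = ExLlamaCache(model)  # cache size follows config.max_seq_len
    return model, cache
```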
Signed commits