
Dry sample #1187

Closed
wants to merge 7 commits into from
Conversation

@81549361

Motivation

Looping is an undesirable behavior where the model repeats phrases verbatim that have previously occurred in the input. It affects most models, and is exacerbated by the use of truncation samplers. Chat formats are particularly susceptible due to their regular structure, which models appear to interpret as an invitation to repeat previous messages in whole or in part. Prompting the model to avoid looping has little or no effect.

The traditional weapons for combating looping are the three flavors of repetition penalty built into most loaders (multiplicative, additive, and frequency penalty). But those samplers are blunt instruments that distort the grammar of standard language, which the model has been painstakingly trained to reproduce. I have previously attempted to fix this problem with ggml-org/llama.cpp#5561, which protects the basic structure of language from being penalized, but that is a hacky solution that fails to do the right thing in many cases; and even in their raw form, classical repetition penalties don't reliably prevent looping.

In the past weeks, I have rethought the looping problem from the ground up, and in this PR present the DRY repetition penalty, a mechanism that is able to detect textual looping and steer against it. It is far superior to the existing samplers at preventing verbatim repetition, while having essentially none of their negative effects on language structure. The result is less repetitive and higher quality output.

I have tested this sampler for about 20 hours in chat scenarios so far, and the resulting chats have without question been the highest-quality ones I have ever experienced. Looping in the traditional sense simply does not happen with DRY, and the positive effects of being able to drop the standard repetition penalty are very noticeable.
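For readers unfamiliar with the mechanism, the core idea can be sketched as follows. This is a simplified illustration, not the code from this PR: it scans the context for earlier occurrences of the current suffix and penalizes the token that would extend the longest such repetition, with a penalty growing exponentially in the repetition length. Parameter names (`multiplier`, `base`, `allowed_length`) follow the conventions of the original DRY proposal; real implementations handle additional edge cases such as sequence breakers.

```python
def dry_penalty(logits, context, multiplier=0.8, base=1.75, allowed_length=2):
    """Simplified DRY-style penalty sketch.

    For each token that would extend a verbatim repetition of an earlier
    part of `context`, subtract multiplier * base ** (match_len - allowed_length)
    from its logit once the repetition reaches `allowed_length` tokens.
    """
    n = len(context)
    if n < 2:
        return logits
    # match_len[t] = length of the longest earlier sequence matching the
    # current context suffix, where token t historically came next.
    match_len = {}
    for i in range(n - 1):
        if context[i] != context[-1]:
            continue
        # Walk backwards to measure how long the match with the suffix is.
        length = 1
        while length <= i and context[i - length] == context[-1 - length]:
            length += 1
        t = context[i + 1]  # token that followed this earlier occurrence
        match_len[t] = max(match_len.get(t, 0), length)
    for t, length in match_len.items():
        if length >= allowed_length:
            logits[t] -= multiplier * base ** (length - allowed_length)
    return logits
```

With a context like `[1, 2, 3, 1, 2, 3, 1, 2]`, the suffix `... 1, 2` has occurred twice before, each time followed by token `3`, so the sketch penalizes token `3` and leaves all other logits untouched.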

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@merrymercy
Contributor

@81549361 Hi, thanks for the contribution. However, "Dry sample" is not currently a mainstream method. To keep our current code minimal, we may be unable to accept this PR.

One thing I can think of is to expose a custom logit processor interface, then other users can easily register their external custom logit processor without changing the sglang code.
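One possible shape for such an interface is a simple registry that engine code consults at sampling time. All names below are hypothetical illustrations of the idea, not sglang's actual API:

```python
from typing import Callable, Dict, List, Sequence

# A logit processor maps (generated token ids, logits) -> adjusted logits.
LogitProcessor = Callable[[Sequence[int], List[float]], List[float]]

_REGISTRY: Dict[str, LogitProcessor] = {}

def register_logit_processor(name: str, fn: LogitProcessor) -> None:
    """Let users plug in external processors without changing engine code."""
    _REGISTRY[name] = fn

def apply_logit_processors(names: Sequence[str],
                           token_ids: Sequence[int],
                           logits: List[float]) -> List[float]:
    """Apply the requested processors in order at each sampling step."""
    for name in names:
        logits = _REGISTRY[name](token_ids, logits)
    return logits

# Example: a trivial processor that bans a fixed token id.
register_logit_processor(
    "ban_42",
    lambda ids, lg: [float("-inf") if i == 42 else x for i, x in enumerate(lg)],
)
```

A DRY implementation could then live entirely outside the engine, registered under its own name and selected per request.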

@merrymercy closed this Aug 29, 2024
@zhyncs mentioned this pull request Sep 8, 2024
@supa-thibaud

@merrymercy Yes, it would be helpful, at the very least, to have a way to register our own logit processors.

@p-e-w commented Sep 20, 2024

@merrymercy

However, "Dry sample" is not currently a mainstream method.

Not sure what your definition of mainstream is, but DRY is already supported by:

  1. text-generation-webui
  2. llama.cpp (PR open; maintainer has signaled intention to merge)
  3. mistral.rs
  4. ExLlamaV2
  5. KoboldCpp
  6. SillyTavern

... and probably other projects that I'm not aware of. DRY is also commonly recommended by model authors on their HF model cards.

See the original DRY pull request for more information.

@merrymercy
Contributor

Hi @p-e-w, @81549361, @supa-thibaud

Custom logit processors are supported by #2396. We can implement DRY sampling with a custom logit processor.

We can still accept new sampling methods, and we can merge some built-in custom logit processors into the sglang source code. Contributions are welcome!
