
Allow prompt callback to generate_step #1133

Merged · 4 commits merged into main on Dec 4, 2024
Conversation

awni (Member) commented on Dec 2, 2024:

  • Add prompt callback to generate step
  • Use the callback in cache_prompt (simplifies some code)
  • Refactor generate_step to accept max_tokens.

CC @neilmehta24, the prompt callback takes two arguments (the number of prompt tokens processed and the total number of prompt tokens), since that's more flexible. Let me know if that works for you.
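
A minimal usage sketch of the callback described above. The keyword argument name `prompt_progress_callback` and the model path are assumptions for illustration; the exact names may differ from the merged code:

```python
import mlx.core as mx
from mlx_lm import load
from mlx_lm.utils import generate_step

# Callback receives (prompt tokens processed so far, total prompt tokens).
def report_progress(processed, total):
    print(f"prompt processing: {processed}/{total} tokens")

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")  # example model
prompt = mx.array(tokenizer.encode("Summarize what a KV cache does."))

for token, _logprobs in generate_step(
    prompt,
    model,
    max_tokens=64,                              # accepted directly per this PR
    prompt_progress_callback=report_progress,   # assumed keyword name
):
    print(tokenizer.decode([token.item()]), end="", flush=True)
```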

neilmehta24 (Contributor) replied:

That will work great. Thanks for this feature!

angeloskath (Member) left a review:

Very nice!

I would change the `n < max_tokens` to `n != max_tokens` and document the negative `max_tokens`, and I think it is golden :-)
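
A toy, self-contained sketch (not the repository's code) of why the `n != max_tokens` comparison matters: a negative `max_tokens` such as -1 never equals the counter, so it means "generate until another stop condition fires", whereas `n < max_tokens` would stop immediately:

```python
import itertools

EOS_TOKEN = 0  # illustrative end-of-sequence id

def fake_sampler(step):
    # Stand-in sampler that never emits EOS, so only the loop condition stops us.
    return step + 1

def generate_step(max_tokens=256):
    n = 0
    while n != max_tokens:   # -1 never matches, so generation is unbounded
        token = fake_sampler(n)
        if token == EOS_TOKEN:
            break
        yield token
        n += 1

print(list(generate_step(max_tokens=5)))                        # exactly 5 tokens
print(list(itertools.islice(generate_step(max_tokens=-1), 3)))  # unbounded; take 3
```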

(Review threads on llms/mlx_lm/utils.py — resolved.)
Code context in the diff:

```python
maybe_quantize_kv_cache(
    cache, args.quantized_kv_start, args.kv_group_size, args.kv_bits
)
for _ in generate_step(
```

A reviewer (Member) commented on this code:
This does an unnecessary sampling but I like very much that it removes so much duplicated logic.

awni (Member Author) replied on Dec 3, 2024:

Right, it does the async eval on the first token / log probs. It should be negligible compared to the rest of the computation.

It's nice that it simplifies the logic a bit; mostly I changed this to have an example use of the callback and to verify that it was working correctly.

Another option I considered is to split out a prefill_prompt API that both generate_step and cache_prompt.py can use. Maybe that's better, but so far we don't need it anywhere except here. If we end up needing to do the prompt computation in more places, I will play around with that.
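
For illustration only, a hypothetical sketch of what such a `prefill_prompt` split could look like; the function, its signature, and the chunking step size are assumptions, not part of this PR:

```python
import mlx.core as mx

def prefill_prompt(model, prompt, cache, prefill_step=512, progress_callback=None):
    """Hypothetical helper: run the prompt through the model in chunks to fill
    the KV cache in place, without sampling any tokens."""
    total = prompt.shape[0]
    processed = 0
    while processed < total:
        chunk = prompt[processed : processed + prefill_step][None]  # add batch dim
        logits = model(chunk, cache=cache)   # forward pass populates the cache
        mx.eval(logits)                      # force evaluation of the lazy graph
        processed = min(processed + prefill_step, total)
        if progress_callback is not None:
            progress_callback(processed, total)
```

Both generate_step and cache_prompt.py could then call a helper like this before their own loops, which is the split discussed above.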

awni (Member Author) commented on Dec 3, 2024:

Also closes #1134

awni merged commit 1963df8 into main on Dec 4, 2024 (2 checks passed) and deleted the prompt_callback branch on December 4, 2024 at 00:17.
mokeddembillel pushed a commit to mokeddembillel/mlx-examples referencing this pull request on Dec 16, 2024, with the message:
* allow prompt callback and use in cache_prompt

* nit

* comments

* bump version