Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle empty string case in maybe_trim_space #1055

Merged
merged 2 commits into from
Oct 21, 2024

Conversation

aronson
Copy link
Contributor

@aronson aronson commented Oct 19, 2024

I ran into this issue when testing a locally-generated model based on Qwen.

----------------------------------------
Exception occurred during processing of request from ('[IP REDACTED]', 32822)
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/socketserver.py", line 318, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/socketserver.py", line 349, in process_request
    self.finish_request(request, client_address)
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/socketserver.py", line 362, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/Users/isaac/venv/lib/python3.12/site-packages/mlx_lm/server.py", line 758, in <lambda>
    lambda *args, **kwargs: handler_class(
                            ^^^^^^^^^^^^^^
  File "/Users/isaac/venv/lib/python3.12/site-packages/mlx_lm/server.py", line 200, in __init__
    super().__init__(*args, **kwargs)
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/socketserver.py", line 761, in __init__
    self.handle()
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/http/server.py", line 436, in handle
    self.handle_one_request()
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/http/server.py", line 424, in handle_one_request
    method()
  File "/Users/isaac/venv/lib/python3.12/site-packages/mlx_lm/server.py", line 313, in do_POST
    method(prompt, stop_id_sequences)
  File "/Users/isaac/venv/lib/python3.12/site-packages/mlx_lm/server.py", line 594, in handle_stream
    detokenizer.add_token(token)
  File "/Users/isaac/venv/lib/python3.12/site-packages/mlx_lm/tokenizer_utils.py", line 210, in add_token
    self.text += self._maybe_trim_space(current_text)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/isaac/venv/lib/python3.12/site-packages/mlx_lm/tokenizer_utils.py", line 196, in _maybe_trim_space
    if current_text[0] != " ":
       ~~~~~~~~~~~~^^^
IndexError: string index out of range
----------------------------------------

This covers a gap in the logic and should help future-proof the function's use.

Copy link

@chan4lk chan4lk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this on my local machine (M3 Pro). And working well

@@ -193,7 +193,9 @@ def reset(self):
self.tokens = []

def _maybe_trim_space(self, current_text):
if current_text[0] != " ":
if len(current_text) < 1 or current_text is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the current_text is ever None then taking the len will raise an exception, so I don't think it makes sense to check for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, makes sense and the if condition improvement you made looks good to me.

Copy link
Member

@awni awni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! Will merge as soon as tests clear.

@awni awni merged commit 743763b into ml-explore:main Oct 21, 2024
4 checks passed
@aronson aronson deleted the bugfix/tokenizer_empty_string branch October 21, 2024 03:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants