Fix detokenizer space match for quote #1072

Merged
merged 3 commits into from
Oct 27, 2024

@awni awni commented Oct 27, 2024

I introduced a bug when trying to port clean_up_tokenization_spaces from HF.

Their match actually finds and replaces instances of " ' " (a quote with a space on each side) with "'". That case specifically requires a special post-processing step, which is tedious, so I just removed it since I have never seen it come up. If it does, we can revisit.

For reference, see the transformers matching conditions.
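
To make the failure mode concrete, here is a minimal sketch that applies space-stripping rules as plain string replacements. The rule list follows the cleanup replacements in transformers; the names `strip_spaces`, `BUGGY_RULES`, and `FIXED_RULES`, and the exact form of the over-eager quote rule, are hypothetical and are not the actual mlx-lm code.

```python
# Shared rules: drop the space before punctuation and contraction suffixes.
COMMON_RULES = [
    (" .", "."),
    (" ?", "?"),
    (" !", "!"),
    (" ,", ","),
    (" n't", "n't"),
    (" 'm", "'m"),
    (" 's", "'s"),
    (" 've", "'ve"),
    (" 're", "'re"),
]

# Hypothetical over-eager rule: drops the space before ANY single quote.
BUGGY_RULES = COMMON_RULES + [(" '", "'")]

# Fix in the spirit of this PR: the standalone quote case is simply removed.
FIXED_RULES = COMMON_RULES


def strip_spaces(text: str, rules: list[tuple[str, str]]) -> str:
    """Apply each (old, new) replacement in order."""
    for old, new in rules:
        text = text.replace(old, new)
    return text


line = "import 'package:flutter/material.dart';"
print(strip_spaces(line, BUGGY_RULES))  # import'package:flutter/material.dart';
print(strip_spaces(line, FIXED_RULES))  # import 'package:flutter/material.dart';
```

With the quote case gone, the space before an opening quote survives, at the cost of not collapsing a free-standing " ' ".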

Closes #1073

Comment on lines 5 to 7
import os
os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"

Gets rid of the warning:

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.

@hschaeufler hschaeufler left a comment

The blanks are now there: import 'package:flutter/material.dart'; Thank you for the fix.

from ._version import __version__

os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"

@hschaeufler hschaeufler Oct 27, 2024

Maybe a comment in the code explaining why you set the flag would be nice.
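
One possible shape for such a comment, as a sketch only. The wording, and the stated reason that only tokenizer utilities are needed, are assumptions and not the text that was merged:

```python
import os

# Silence the transformers advisory warning that is logged when none of
# PyTorch, TensorFlow >= 2.0, or Flax is installed. Assumption: only the
# tokenizer utilities are used here, so the warning is just noise.
# This must be set before transformers is first imported.
os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"
```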

@angeloskath angeloskath left a comment

👍

@awni awni merged commit 8fe9539 into main Oct 27, 2024
2 checks passed
@awni awni deleted the fix_space_match branch October 27, 2024 22:06

Successfully merging this pull request may close these issues.

[BUG] missing spaces in response with mlx-lm 0.19.1 and 0.19.2
3 participants