[ML] Fix for Deberta tokenizer when input sequence exceeds 512 tokens #117595

Merged: 4 commits, Nov 26, 2024

maxhniebergall
Member

We were missing the "balanced" case for the NLP tokenizer, which caused exceptions with large inputs. I've also added a test, which I confirmed fails without the fix and produces the same error message as reported.
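
To make the failure mode concrete, here is a minimal sketch of how an unhandled truncation mode can turn an over-long input into an exception. It is not the Elasticsearch code: the class, enum, and method names are hypothetical, the 512-token limit is taken from the PR title, and the sketch assumes that with a single input sequence every truncating mode simply keeps the first tokens up to the limit.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: these names are hypothetical, not Elasticsearch classes.
final class TruncationSketch {

    // Truncation modes; BALANCED is the one the description says was unhandled.
    enum Truncate { FIRST, SECOND, BALANCED, NONE }

    // Variant with no BALANCED branch: an over-long input in BALANCED mode
    // falls through to the default branch and throws.
    static List<Integer> truncateMissingBalanced(List<Integer> tokenIds, int maxLength, Truncate mode) {
        if (tokenIds.size() <= maxLength) {
            return tokenIds; // short inputs never reach the switch
        }
        switch (mode) {
            case FIRST:
            case SECOND:
                return new ArrayList<>(tokenIds.subList(0, maxLength));
            default:
                throw new IllegalStateException(
                    "Input too large: " + tokenIds.size() + " tokens, limit " + maxLength);
        }
    }

    // Variant that handles BALANCED. Assumption: with a single sequence,
    // every truncating mode keeps the first maxLength tokens.
    static List<Integer> truncateWithBalanced(List<Integer> tokenIds, int maxLength, Truncate mode) {
        if (tokenIds.size() <= maxLength) {
            return tokenIds;
        }
        switch (mode) {
            case FIRST:
            case SECOND:
            case BALANCED:
                return new ArrayList<>(tokenIds.subList(0, maxLength));
            default: // NONE: truncation is disabled, so reject the input
                throw new IllegalStateException(
                    "Input too large: " + tokenIds.size() + " tokens, limit " + maxLength);
        }
    }

    public static void main(String[] args) {
        List<Integer> tokenIds = new ArrayList<>();
        for (int i = 0; i < 600; i++) {
            tokenIds.add(i);
        }
        // Prints the length of the truncated sequence.
        System.out.println(truncateWithBalanced(tokenIds, 512, Truncate.BALANCED).size());
    }
}
```

Inputs at or below the limit return before the switch in both variants, which would explain why a gap like this only shows up with large inputs.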

@maxhniebergall added the labels >bug, :ml (Machine learning), auto-backport (Automatically create backport pull requests when merged), v9.0.0, v8.17.0, v8.16.2 on Nov 26, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine added the label Team:ML (Meta label for the ML team) on Nov 26, 2024
@elasticsearchmachine
Collaborator

Hi @maxhniebergall, I've created a changelog YAML for you.

@maxhniebergall maxhniebergall enabled auto-merge (squash) November 26, 2024 22:11
@maxhniebergall maxhniebergall merged commit 433a00c into main Nov 26, 2024
17 checks passed
@maxhniebergall maxhniebergall deleted the debertaTokenizerTruncationFix branch November 26, 2024 23:00
maxhniebergall added a commit to maxhniebergall/elasticsearch that referenced this pull request Nov 26, 2024
…elastic#117595)

* Add test and fix

* Update docs/changelog/117595.yaml

* Remove test which wasn't working
@elasticsearchmachine
Collaborator

💚 Backport successful

Branches: 8.17, 8.16

elasticsearchmachine pushed a commit that referenced this pull request Nov 27, 2024
…#117595) (#117601)

* Add test and fix

* Update docs/changelog/117595.yaml

* Remove test which wasn't working
elasticsearchmachine pushed a commit that referenced this pull request Nov 27, 2024
…#117595) (#117600)

* Add test and fix

* Update docs/changelog/117595.yaml

* Remove test which wasn't working

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Nov 27, 2024
…elastic#117595)

* Add test and fix

* Update docs/changelog/117595.yaml

* Remove test which wasn't working
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
…elastic#117595)

* Add test and fix

* Update docs/changelog/117595.yaml

* Remove test which wasn't working
Labels: auto-backport (Automatically create backport pull requests when merged), >bug, :ml (Machine learning), Team:ML (Meta label for the ML team), v8.16.2, v8.17.0, v9.0.0
3 participants