Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detokenizer #209

Merged
merged 36 commits into from
Jul 4, 2024
Merged

Detokenizer #209

merged 36 commits into from
Jul 4, 2024

Conversation

Nexesenex
Copy link
Owner

jaime-m-p and others added 30 commits June 20, 2024 17:51
Add detokenizer checks
New generator: ascii_lr_strip
New generator: apostrophe
Add more vocabs files
Useful when automating tests:
 - If you don't know in advance the vocab type.
 - Differenciate other loading errors.
Using exit() is throwing random exceptions
UNKNOWN and CONTROL are 'special pieces'.
Remove space after UNKNOWN and CONTROL.
Refactor llama_token_to_piece().
Detokenize special tokens.
Replace errors with '\uFFFD' when detokenizing to 'utf-8'.
More edge cases.
Better detokenization results check.
* ppl : fix n_seq_max for perplexity

* use 1 seq for kl_divergence
* llama : suppress unref var in Windows MSVC

This commit suppresses two warnings that are currently generated for
src/llama.cpp when building on Windows MSVC

```console
C:\llama.cpp\src\llama.cpp(14349,45): warning C4101: 'ex':
unreferenced local variable [C:\llama.cpp\build\src\llama.vcxproj]
C:\llama.cpp\src\llama.cpp(19285,44): warning C4101: 'e':
unreferenced local variable [C:\llama.cpp\build\src\llama.vcxproj]
```

* Update src/llama.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
This commit adds the compile definition `_CRT_SECURE_NO_WARNINGS`
to the root cmake subproject.

The motivation for this is that currently the following warnings are
displayed when compiling the tests and common cmake subprojects:
```console
test-llama-grammar.cpp
C:\llama.cpp\src\.\llama.cpp(1406,77): warning C4996: 'strerror':
This function or variable may be unsafe. Consider using strerror_s
instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See
online help for details.
[C:\llama.cpp\build\tests\test-llama-grammar.vcxproj]
...
```

This compile definition is currently set for the `src` subproject
and this change moves into the root cmake project so that it is applied
to all cmake subprojects.
@Nexesenex Nexesenex merged commit 56c66da into Nexesenex:marstream Jul 4, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants