Compilade/fix mpt pretok #215
Commits on Jun 30, 2024
- db2ffd5 (commit message not shown)
Commits on Jul 7, 2024
- cb4d86c server: Retrieve prompt template in /props (#8337)
  This PR adds the following:
  - Expose the model's Jinja2 prompt template from the model in the /props endpoint.
  - Change the log level from Error to Warning for the warning about template mismatch. The front end stands a better chance of actually executing the Jinja template format correctly; the server is currently just guessing it. Ideally this should have been inside a JSON block that exposes the same key/value pairs as listed during startup in the "llm_load_print_meta" function.
  Additional changes: make the string buffer dynamic, add documentation and better string handling, use the chat_template naming convention, and use an intermediate vector for string assignment.
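  A minimal sketch of how a front end could read the exposed template (not part of the PR; the host/port are assumptions, and the "chat_template" key follows the naming convention mentioned in the commit):

  ```python
  # Hedged sketch: fetch /props from a running llama-server and read the chat template.
  # The base URL is a placeholder; the "chat_template" key name comes from the commit.
  import json
  import urllib.request

  def get_chat_template(base_url: str = "http://localhost:8080") -> str | None:
      """Fetch /props and return the model's chat template, if the server exposes one."""
      with urllib.request.urlopen(f"{base_url}/props") as resp:
          props = json.load(resp)
      return props.get("chat_template")

  if __name__ == "__main__":
      template = get_chat_template()
      print(template if template else "server did not report a chat template")
  ```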
- 210eb9e finetune: Rename an old command name in finetune.sh (#8344)
  This patch replaces the old command "main" with "llama-cli" in finetune.sh. The fixed part is a comment, so it doesn't change the script's behavior.
  Signed-off-by: Masanari Iida <standby24x7@gmail.com>
- b81ba1f finetune: Rename command name in README.md (#8343)
  Rename the old command name "finetune" to "llama-finetune" in README.md.
  Signed-off-by: Masanari Iida <standby24x7@gmail.com>
- d39130a (commit message not shown)
- b504008 (commit message not shown)
- 905942a llama : support glm3 and glm4 (#8031)
  Adds ChatGLM3-6B support (Hugging Face model: https://hf-mirror.com/THUDM/chatglm3-6b). Summary of the changes:
  - Remove .rotary_pos_emb.inv_freq and unused code for the chatglm3 model.
  - Optimize convert-hf-to-gguf.py for chatglm models and add support for glm-4-9b-chat.
  - Fix eos tokens for glm4: set <|endoftext|> as eos and <|user|> as eot.
  - Add preprocessing for chatglm3 and chatglm4; add eos_id_list to llama.cpp, later reverted (reverts commit 3a4d579).
  - Add comments for the glm prefix and suffix, then remove the prefix and suffix and use the normal glm4 chat template.
  - Add rope_ratio and ChatGLMForConditionalGeneration handling, and fix the rope ratio to solve incorrect answers.
  - Use LLM_FFN_SWIGLU in phi3.
  - Modify the general name of the glm model; remove an unused log.
  - Fix chat template bugs, code style, merge conflicts, and Flake8 errors in convert-hf-to-gguf.py (E302: add two blank lines before top-level function definitions; NP100: replace print statements; E303: keep only one blank line between lines of code).
  Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
  Co-authored-by: XingXing Qiao <qiaoxx@dingdao.com>
  Co-authored-by: Umpire2018 <138990495+Umpire2018@users.noreply.github.com>
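  As a hedged usage sketch (not part of the PR): once a GLM checkpoint has been converted to GGUF with convert-hf-to-gguf.py, it can be exercised through the third-party llama-cpp-python bindings; the file name and prompt below are placeholders.

  ```python
  # Hedged sketch using the third-party llama-cpp-python bindings; the GGUF file name
  # and prompt are placeholders, not from the PR.
  from llama_cpp import Llama

  llm = Llama(model_path="glm-4-9b-chat.gguf", n_ctx=4096)
  result = llm.create_chat_completion(
      messages=[{"role": "user", "content": "Hello, who are you?"}],
  )
  print(result["choices"][0]["message"]["content"])
  ```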
- f7cab35 gguf-hash: model-wide and per-tensor hashing using xxhash and sha1 (#8048)
  CLI to hash GGUF files to detect differences on a per-model and per-tensor level. The supported hash types are:
  - `--xxh64`: use the xxhash 64-bit hash mode (default)
  - `--sha1`: use sha1
  - `--uuid`: use uuid
  - `--sha256`: use sha256
  While most POSIX systems already have hash-checking programs like sha256sum, those are designed to check entire files. That is not ideal for our purpose if we want to check the consistency of the tensor data even when the metadata in the GGUF KV store has been updated. This program hashes the GGUF tensor payload on a per-tensor-layer basis in addition to producing an entire-tensor-model hash. The intent is that the whole-model hash can be checked first, and if any inconsistency is detected, the per-tensor hashes can be used to narrow down the specific tensor layer that is inconsistent.
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
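  The tool itself is a C/C++ CLI; purely as an illustration of the per-tensor versus whole-model idea, a rough Python sketch using the gguf-py reader (the reader API usage here is an assumption) could look like this:

  ```python
  # Illustrative sketch only; the actual gguf-hash tool is a C/C++ CLI. This shows the
  # idea of hashing each tensor payload as well as the whole model, independent of the
  # GGUF KV metadata. Assumes the gguf-py package and its GGUFReader API.
  import hashlib
  import sys

  from gguf import GGUFReader  # shipped with llama.cpp's gguf-py package

  def hash_gguf_tensors(path: str) -> None:
      reader = GGUFReader(path)
      model_hash = hashlib.sha256()
      for tensor in reader.tensors:
          data = tensor.data.tobytes()   # raw tensor payload, metadata excluded
          print(f"{hashlib.sha256(data).hexdigest()}  {tensor.name}")
          model_hash.update(data)        # fold every tensor into the model-wide hash
      print(f"{model_hash.hexdigest()}  (all tensors)")

  if __name__ == "__main__":
      hash_gguf_tensors(sys.argv[1])
  ```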
- f1948f1 readme : update bindings list (#8222)
  Adds guile_llama_cpp to the bindings list, plus formatting fixes.
- 4090ea5 ci : add checks for cmake, make and ctest in ci/run.sh (#8200)
  Added checks for cmake, make, and ctest; removed erroneous whitespace.
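  ci/run.sh itself is a shell script; purely as an illustration of the same fail-fast idea, an equivalent check in Python might be:

  ```python
  # Illustration of the fail-fast tool check (ci/run.sh itself is a shell script).
  import shutil
  import sys

  REQUIRED_TOOLS = ("cmake", "make", "ctest")

  missing = [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]
  if missing:
      sys.exit(f"missing required tools: {', '.join(missing)}")
  ```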
- a8db2a9 Update llama-cli documentation (#8315)
  A series of README.md updates: fixed llama-cli/main references and templates in some commands, added chat template sections, and fixed typos in some areas.
- ac0f33c (commit message not shown)
- 3fd62a6 py : type-check all Python scripts with Pyright (#8341)
  - server-tests : use a trailing slash in the openai base_url; strip "chat" from base_url in oai_chat_completions; add more type annotations; treat model metadata as a dict.
  - ci : disable the pip cache in the type-check workflow. The cache is not shared between branches, and at 250 MB it would become quite a big part of the repo's 10 GB cache limit.
  - py : fix new type errors from the master branch.
  - tests : fix test-tokenizer-random.py. Apparently, gcc applies optimisations even when pre-processing, which confuses pycparser.
  - ci : only show warnings and errors in the Python type-check. The "information" level otherwise has entries from 'examples/pydantic_models_to_grammar.py', which could be confusing for someone trying to figure out what failed, considering that these messages can safely be ignored even though they look like errors.
- d5d30b2 (commit message not shown)
- 6b961e3 (commit message not shown)
- 56df1fc (commit message not shown)
- 6e351e0 convert_hf : identify which user-defined tokens are control tokens
  Only used in _set_vocab_gpt2() for now.
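  The commit's actual logic lives in _set_vocab_gpt2() and isn't reproduced here; as a hedged sketch of the general idea, assuming the HF tokenizer's added_tokens_decoder "special" flag is what distinguishes control tokens from other user-defined tokens:

  ```python
  # Hedged sketch of the general idea, not the commit's code: assume that added tokens
  # flagged as "special" by the HF tokenizer are control tokens, and the rest stay
  # user-defined. Requires the transformers and gguf-py packages.
  from gguf.constants import TokenType
  from transformers import AutoTokenizer

  def classify_added_tokens(model_dir: str) -> dict[str, TokenType]:
      tokenizer = AutoTokenizer.from_pretrained(model_dir)
      token_types: dict[str, TokenType] = {}
      for added in tokenizer.added_tokens_decoder.values():
          kind = TokenType.CONTROL if added.special else TokenType.USER_DEFINED
          token_types[added.content] = kind
      return token_types
  ```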
Commits on Jul 8, 2024
- f9d42c5 convert_hf : identify more added control tokens for SPM tokenizers
  This makes Gemma and Gemma-2 tokenize pretty much EVERYTHING correctly, including HTML tags and consecutive spaces, but it unfortunately requires model re-conversion. There seems to be a weird behavior of the HF tokenizer for Gemma, which prefers to use the 16-space token over longer space tokens, while the SentencePiece tokenizer does not do this (the implementation in llama.cpp has the same behavior as SentencePiece).
  - llama : fix wrong pre-tokenization of byte tokens
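  A hedged way to sanity-check a re-converted model on the tricky cases the commit mentions (HTML tags, consecutive spaces) is to compare the HF tokenizer against the converted GGUF. The sketch below uses the third-party llama-cpp-python bindings and made-up paths; note the commit itself says the HF tokenizer for Gemma handles long space runs differently from SentencePiece, so mismatches on those inputs may be expected.

  ```python
  # Hedged sanity-check sketch, not from the PR: compare HF tokenization against the
  # converted GGUF on inputs the commit calls out (HTML tags, consecutive spaces).
  # Model names and paths are placeholders; llama-cpp-python is a third-party binding.
  from llama_cpp import Llama
  from transformers import AutoTokenizer

  hf_tok = AutoTokenizer.from_pretrained("google/gemma-2-9b")
  gguf = Llama(model_path="gemma-2-9b.gguf", vocab_only=True)

  for text in ["<table><td>cell</td></table>", "a    b        c", "  leading spaces"]:
      hf_ids = hf_tok.encode(text, add_special_tokens=False)
      gguf_ids = gguf.tokenize(text.encode("utf-8"), add_bos=False, special=False)
      status = "OK" if hf_ids == gguf_ids else "MISMATCH"
      print(f"{status}: {text!r}\n  hf:   {hf_ids}\n  gguf: {gguf_ids}")
  ```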