
Compilade/fix mpt pretok #215

Merged: 18 commits on Jul 8, 2024

Commits on Jun 30, 2024

  1. Commit db2ffd5

Commits on Jul 7, 2024

  1. server: Retrieve prompt template in /props (#8337)

    * server: Retrieve prompt template in /props
    
    This PR adds the following:
    - Expose the model's Jinja2 prompt template in the /props endpoint.
    - Change the log level from Error to Warning for the template-mismatch warning.
    
    The front end stands a better chance of executing the Jinja template correctly; the server is currently just guessing it.
    
    Ideally this would have been inside a JSON block that exposes the same key/value pairs as listed during startup by the "llm_load_print_meta" function.
    
    * Make string buffer dynamic
    
    * Add doc and better string handling
    
    * Using chat_template naming convention
    
    * Use intermediate vector for string assignment
    bviksoe authored Jul 7, 2024
    Commit cb4d86c
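
With the template exposed in /props, a front end can fetch it instead of guessing. Below is a minimal sketch (not part of the PR), assuming a llama-server instance listening on localhost:8080 and a JSON response containing the chat_template key described above:

```python
# Minimal sketch: read the model's chat template from the server's /props endpoint.
# Assumes a llama-server instance on http://localhost:8080 and that the /props
# response is a JSON object containing a "chat_template" key, as described above.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8080/props") as resp:
    props = json.load(resp)

template = props.get("chat_template", "")
if template:
    print("Model chat template:")
    print(template)
else:
    print("No chat template reported; the front end would have to guess one.")
```
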
  2. finetune: Rename an old command name in finetune.sh (#8344)

    This patch replaces an old command "main" with "llama-cli"
    in finetune.sh.
    The part that was fixed is a comment, so it doesn't change
    the behavior of the script.
    
    Signed-off-by: Masanari Iida <standby24x7@gmail.com>
    standby24x7 authored Jul 7, 2024
    Commit 210eb9e
  3. finetune: Rename command name in README.md (#8343)

    Rename the old command name "finetune" to "llama-finetune"
    in README.md.
    
    Signed-off-by: Masanari Iida <standby24x7@gmail.com>
    standby24x7 authored Jul 7, 2024
    Commit b81ba1f
  4. Commit d39130a
  5. llama : fix n_rot default (#8348)

    ggml-ci
    ggerganov authored Jul 7, 2024
    Commit b504008
  6. llama : support glm3 and glm4 (#8031)

    * add chatglm3-6b model support (Hugging Face model:
     https://hf-mirror.com/THUDM/chatglm3-6b)
    
    Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
    
    * remove .rotary_pos_emb.inv_freq and unused code for the chatglm3 model
    
    Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
    
    * fix lint error
    
    Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
    
    * optimize convert-hf-to-gguf.py for chatglm model
    
    Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
    
    * support glm-4-9b-chat
    
    Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
    
    * fix eos tokens for glm4
    
    * remove unused log
    
    * add preprocessing to chatglm3 and chatglm4
    
    * add eos_id_list to llama.cpp
    
    * fix code style
    
    * fix code style
    
    * fix conflicts
    
    * fix conflicts
    
    * Revert "add eos_id_list to llama.cpp"
    
    This reverts commit 3a4d579.
    
    * set <|endoftext|> as eos and <|user|> as eot
    
    * fix chat template bug
    
    * add comment to glm prefix and suffix
    
    * fix conflicts and add rope_ratio & ChatGLMForConditionalGeneration
    
    * fix chat template bug
    
    * fix codestyle
    
    * fix conflicts
    
    * modified the general name of the glm model
    
    * fix conflicts
    
    * remove prefix and suffix
    
    * use the normal glm4 chat template & use LLM_FFN_SWIGLU in phi3
    
    * fix: resolve Flake8 errors in `convert-hf-to-gguf.py`
    
    - Fix E302 by adding two blank lines before top-level function definitions
    - Replace print statements to fix NP100
    - Fix E303 by ensuring only one blank line between lines of code
    
    * fix rope ratio to solve incorrect answers
    
    * fix per review comments
    
    ---------
    
    Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
    Co-authored-by: XingXing Qiao <qiaoxx@dingdao.com>
    Co-authored-by: Umpire2018 <138990495+Umpire2018@users.noreply.github.com>
    3 people authored Jul 7, 2024
    Commit 905942a
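
As an illustration of the "set <|endoftext|> as eos and <|user|> as eot" choice above, here is a minimal, hypothetical sketch of a generation loop that stops on either token; the model object and its generate_next_token() method are placeholders, not llama.cpp APIs:

```python
# Hypothetical sketch of the eos/eot stopping behavior described above: generation
# halts when either <|endoftext|> (end of sequence) or <|user|> (end of turn)
# is produced. The model object and generate_next_token() are placeholders.
EOS_TOKEN = "<|endoftext|>"  # end-of-sequence token for glm4
EOT_TOKEN = "<|user|>"       # end-of-turn token for glm4


def should_stop(piece: str) -> bool:
    """Return True when the decoded piece signals end of sequence or end of turn."""
    return piece in (EOS_TOKEN, EOT_TOKEN)


def generate(model, prompt: str, max_tokens: int = 256) -> str:
    out: list[str] = []
    for _ in range(max_tokens):
        piece = model.generate_next_token(prompt + "".join(out))  # placeholder API
        if should_stop(piece):
            break
        out.append(piece)
    return "".join(out)
```
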
  7. gguf-hash: model-wide and per-tensor hashing using xxhash and sha1 (#8048)

    CLI to hash GGUF files to detect differences at the model and tensor level
    
    The hash types we support are:
    
    - `--xxh64`: use xxhash 64-bit hash mode (default)
    - `--sha1`: use sha1
    - `--uuid`: use uuid
    - `--sha256`: use sha256
    
    While most POSIX systems already have hash-checking programs like sha256sum,
    those are designed to check entire files. This is not ideal for our purpose,
    which is to check the consistency of the tensor data even when the metadata
    content of the gguf KV store has been updated.
    
    This program hashes the gguf tensor payload on a per-tensor-layer basis in
    addition to producing an entire-model hash. The intent is that the whole-model
    hash can be checked first; if any inconsistency is detected, the per-tensor
    hashes can be used to narrow down the specific tensor layer that differs.
    
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    mofosyne and ggerganov authored Jul 7, 2024
    Commit f7cab35
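
For illustration only, a minimal Python sketch of the per-tensor plus whole-model hashing idea (the actual gguf-hash CLI also supports xxh64, sha1 and uuid modes); it assumes the gguf Python package's GGUFReader exposes a .tensors list whose entries have .name and .data attributes:

```python
# Illustrative sketch only, not the gguf-hash implementation: hash each tensor's
# payload individually and accumulate a whole-model hash, so a mismatch can be
# narrowed down to a specific tensor. Assumes gguf.GGUFReader exposes .tensors
# entries with .name and .data (a numpy array view of the tensor payload).
import hashlib

from gguf import GGUFReader  # pip install gguf


def hash_gguf_tensors(path: str) -> None:
    reader = GGUFReader(path)
    overall = hashlib.sha256()
    for tensor in reader.tensors:
        payload = tensor.data.tobytes()
        print(f"{hashlib.sha256(payload).hexdigest()}  {tensor.name}")
        overall.update(payload)
    print(f"{overall.hexdigest()}  (whole model)")


hash_gguf_tensors("model.gguf")  # placeholder path
```
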
  8. readme : update bindings list (#8222)

    * add guile_llama_cpp to the bindings list
    
    * fix formatting
    
    * fix formatting
    andy-tai authored Jul 7, 2024
    Commit f1948f1
  9. ci : add checks for cmake, make and ctest in ci/run.sh (#8200)

    * Added checks for cmake, make and ctest
    
    * Removed erroneous whitespace
    AlexsCode authored Jul 7, 2024
    Commit 4090ea5
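
The checks themselves live in the bash script ci/run.sh; purely as an illustration of the idea, a Python sketch that fails early when a required tool is missing from PATH:

```python
# Illustration only: the actual checks are written in bash inside ci/run.sh.
# Fail early if a required build tool is not available on PATH.
import shutil
import sys

REQUIRED_TOOLS = ("cmake", "make", "ctest")

missing = [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]
if missing:
    print(f"Missing required tools: {', '.join(missing)}", file=sys.stderr)
    sys.exit(1)
print("All required build tools found.")
```
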
  10. Update llama-cli documentation (#8315)

    * Update README.md
    
    * Update README.md
    
    * Update README.md
    
    fixed llama-cli/main references and templates in some commands, added chat template sections, and fixed typos in some areas
    
    * Update README.md
    
    * Update README.md
    
    * Update README.md
    dspasyuk authored Jul 7, 2024
    Commit a8db2a9
  11. Commit ac0f33c
  12. py : type-check all Python scripts with Pyright (#8341)

    * py : type-check all Python scripts with Pyright
    
    * server-tests : use trailing slash in openai base_url
    
    * server-tests : add more type annotations
    
    * server-tests : strip "chat" from base_url in oai_chat_completions
    
    * server-tests : model metadata is a dict
    
    * ci : disable pip cache in type-check workflow
    
    The cache is not shared between branches, and it's 250MB in size,
    so it would become quite a big part of the 10GB cache limit of the repo.
    
    * py : fix new type errors from master branch
    
    * tests : fix test-tokenizer-random.py
    
    Apparently, gcc applies optimisations even when pre-processing,
    which confuses pycparser.
    
    * ci : only show warnings and errors in python type-check
    
    The "information" level otherwise has entries
    from 'examples/pydantic_models_to_grammar.py',
    which could be confusing for someone trying to figure out what failed,
    considering that these messages can safely be ignored
    even though they look like errors.
    compilade authored Jul 7, 2024
    Commit 3fd62a6
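
As a small, hypothetical illustration of the kind of annotation this type-checking encourages (the function and keys below are made up, not taken from the actual server tests):

```python
# Hypothetical example of the annotations Pyright pushes for in the server tests:
# model metadata is a dict (see the commit note above), so say so explicitly
# instead of leaving the type implicit.
from typing import Any


def get_model_metadata(response: dict[str, Any]) -> dict[str, Any]:
    metadata: dict[str, Any] = response.get("metadata", {})
    return metadata
```
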
  13. Commit d5d30b2
  14. Commit 6b961e3
  15. Commit 56df1fc
  16. convert_hf : identify which user-defined tokens are control tokens

    Only used in _set_vocab_gpt2() for now.
    compilade committed Jul 7, 2024
    Commit 6e351e0
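
A minimal sketch of the classification idea: added tokens whose tokenizer metadata marks them as special become control tokens, the rest stay user-defined. Reading tokenizer.json directly like this is an assumption for illustration, not the actual convert-hf-to-gguf.py / _set_vocab_gpt2() code:

```python
# Sketch only: classify HF "added tokens" as CONTROL when their metadata marks
# them as special, otherwise keep them USER_DEFINED. The file layout below is an
# assumption; the real logic lives in convert-hf-to-gguf.py (_set_vocab_gpt2()).
import json

from gguf import TokenType  # assumed to provide CONTROL and USER_DEFINED members


def classify_added_tokens(tokenizer_json_path: str) -> dict[str, TokenType]:
    with open(tokenizer_json_path, encoding="utf-8") as f:
        tokenizer = json.load(f)
    token_types: dict[str, TokenType] = {}
    for added in tokenizer.get("added_tokens", []):
        token_type = TokenType.CONTROL if added.get("special") else TokenType.USER_DEFINED
        token_types[added["content"]] = token_type
    return token_types
```
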

Commits on Jul 8, 2024

  1. convert_hf : identify more added control tokens for SPM tokenizers

    This makes Gemma and Gemma-2 tokenize pretty much EVERYTHING correctly,
    including HTML tags and consecutive spaces,
    but it unfortunately requires model re-conversion.
    
    There seems to be a weird behavior of the HF tokenizer for Gemma,
    which prefers to use the 16-space token over longer space tokens,
    while the SentencePiece tokenizer does not do this
    (the implementation in llama.cpp has the same behavior as SentencePiece).
    
    * llama : fix wrong pre-tokenization of byte tokens
    compilade committed Jul 8, 2024
    Commit f9d42c5
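
To observe the space-token discrepancy mentioned above, one could compare the HF fast tokenizer with raw SentencePiece on a long run of spaces; the model id and the tokenizer.model path below are placeholders, and access to the Gemma tokenizer files is assumed:

```python
# Illustration of the discrepancy described above: compare how the HF fast
# tokenizer and raw SentencePiece split a long run of spaces. The model id and
# the tokenizer.model path are placeholders; Gemma access is assumed.
import sentencepiece as spm
from transformers import AutoTokenizer

text = "<b>" + " " * 20 + "</b>"

hf_tok = AutoTokenizer.from_pretrained("google/gemma-2-9b")  # placeholder model id
print("HF fast tokenizer:", hf_tok.tokenize(text))

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")  # placeholder path
print("SentencePiece:", sp.encode(text, out_type=str))
```
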