
merge from llama.cpp #33

Merged: 29 commits, Aug 12, 2024

Commits on Aug 8, 2024

  1. make : clean llamafile objects (ggerganov#8923)

    `ggml/src/llamafile/sgemm.o` was not deleted on `make clean`
    DrDub authored Aug 8, 2024 (ebd541a)
  2. 85fca8d
  3. metal : fix struct name (ggml/912)

    ggml-ci
    ggerganov committed Aug 8, 2024 (5b33ea1)
  4. f93d49a
  5. sync : ggml

    ggerganov committed Aug 8, 2024 (e44a561)
  6. 366d486
  7. afd27f0
  8. gguf-py : simplify support for quant types (ggerganov#8838)

    * gguf-py : use classes for quants
    
    * convert_hf : simplify internal quantization type selection
    
    * gguf-py : fix flake8 lint
    
    * gguf-py : fix BF16 numpy view type
    
    * gguf-py : remove LlamaFileTypeMap
    
    Too specific to 'llama.cpp', and would be a maintenance burden
    to keep up to date.
    
    * gguf-py : add generic quantize and dequantize functions
    
    The quant classes no longer need to be known,
    only the target or the source type,
    for 'quantize' and 'dequantize', respectively.
    compilade authored Aug 8, 2024 (3a14e00)
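
    A minimal sketch of the resulting generic API, assuming gguf-py from this
    merge exposes `quantize` and `dequantize` in `gguf.quants` keyed by
    `GGMLQuantizationType` (the shapes and the chosen quant type are
    illustrative):

    ```python
    # Sketch only: round-trip a tensor through Q8_0 with the generic functions.
    import numpy as np
    from gguf.constants import GGMLQuantizationType
    from gguf.quants import dequantize, quantize

    # The last dimension must be a multiple of the type's block size (32 for Q8_0).
    data = np.random.rand(4, 256).astype(np.float32)

    packed = quantize(data, GGMLQuantizationType.Q8_0)        # only the target type is needed
    restored = dequantize(packed, GGMLQuantizationType.Q8_0)  # only the source type is needed

    print(abs(restored - data).max())  # small round-trip error expected for Q8_0
    ```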

Commits on Aug 9, 2024

  1. llama : reduce useless copies when saving session (ggerganov#8916)

    * llama : avoid useless copies in dummy session writer
    
    * llama : avoid double tensor copy when saving session to buffer
    compilade authored Aug 9, 2024 (345a686)
  2. daef3ab
  3. 6f6496b
  4. embedding : add --pooling option to README.md [no ci] (ggerganov#8934)

    This commit adds the `--pooling` option to the README.md file in the
    `examples/embedding` directory.
    
    The motivation for adding this option is that, if the model in use does not
    specify a pooling type, the embedding example currently fails with the
    following error message:
    ```console
    main: error: pooling type NONE not supported
    ```
    
    This commit also updates the name of the executable in the examples
    section.
    danbev authored Aug 9, 2024 (5b2c04f)
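
    With the option documented, a model that lacks pooling metadata can be run
    by passing the type explicitly; a hypothetical invocation (the model path
    is a placeholder):

    ```console
    $ ./llama-embedding -m models/embedding-model.gguf --pooling mean -p "Hello, world"
    ```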
  5. whisper : use vulkan as gpu backend when available (whisper/2302)

    * ggml: use vulkan as gpu backend when available
    
    Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
    
    * whisper: enable using vk as default buffer type
    
    Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
    
    ---------
    
    Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
    mstephenson6 authored and ggerganov committed Aug 9, 2024 (70c0ea3)
  6. sync : ggml

    ggerganov committed Aug 9, 2024 (4305b57)
  7. llava : support MiniCPM-V-2.5 (ggerganov#7599)

    * init
    
    * rename
    
    * add run android for termux in readme
    
    * add android readme
    
    * add instructions in readme
    
    * change name in readme
    
    * Update README.md
    
    * fixed line
    
    * add result in readme
    
    * random pos_embed
    
    * add positions index
    
    * change for ollama
    
    * change for ollama
    
    * better pos_embed in clip
    
    * support ollama
    
    * update CMakeLists.txt
    
    * update CMakeLists.txt
    
    * rename wrapper
    
    * clear code
    
    * replace and organize code
    
    * add link
    
    * sync master
    
    * fix warnings
    
    * fix warnings
    
    * fix bug in bicubic resize when the image needs to be resized smaller
    
    * address review comments
    
    * address review comments
    
    * put all code into llava dir
    
    * fix quality problem in pr code
    
    * change n_layer
    
    * add space in "-1"
    
    * imitate reshape bug of python code
    
    * fix bug in clip
    
    * fix issues for merging
    
    * fix llama-minicpmv-cli in cmake file
    
    * change pr readme
    
    * fix code review
    
    * remove the directory at line 33 of the top-level CMakeLists.txt (not the example one, the main one)
    
    * fix cmakefile
    
    * add warning
    
    * fix KEY_HAS_MINICPMV_PROJ
    
    * remove load_image_size into clip_ctx
    
    * remove the extern "C", MINICPMV_API
    
    * fix uhd code for review comment
    
    * delete minicpmv-wrapper in pr
    
    * remove uhd_image_embed
    
    * Modify 2 notes
    
    * clip : style changes
    
    * del common.h in clip
    
    * fix Type-Check error
    
    * fix Type-Check error
    
    * fix Type-Check error
    
    * fix Type-Check error
    
    * fix makefile error
    
    * fix ubuntu-make error
    
    * try fix clip
    
    * try fix 1
    
    ---------
    
    Co-authored-by: Hongji Zhu <fireyoucan@gmail.com>
    Co-authored-by: harvestingmoon <leewenyeong@gmail.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    4 people authored Aug 9, 2024 (3071c0a)
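
    For reference, the new example binary is invoked like the existing
    llava-cli; a hypothetical run (model, projector, and image paths are
    placeholders):

    ```console
    $ ./llama-minicpmv-cli -m minicpm-v-2.5/model.gguf --mmproj minicpm-v-2.5/mmproj.gguf --image cat.jpg -p "What is in this image?"
    ```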
  8. llama : better replace_all (cont) (ggerganov#8926)

    * llama : better replace_all (cont)
    
    ggml-ci
    
    * code : deduplicate replace_all
    
    ggml-ci
    ggerganov authored Aug 9, 2024 (45a55b9)
  9. 272e3bd
  10. llama : add support for lora adapters in T5 model (ggerganov#8938)

    Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
    fairydreaming and sszymczy authored Aug 9, 2024 (6afd1a9)
  11. Merge commit from fork

    ggerganov authored Aug 9, 2024 (b72942f)

Commits on Aug 10, 2024

  1. gguf-py : fix double call to add_architecture() (ggerganov#8952)

    Signed-off-by: tarilabs <matteo.mortari@gmail.com>
    tarilabs authored Aug 10, 2024 (911b437)
  2. Add support for encoder-only T5 models (ggerganov#8900)

    * gguf-py : add T5ENCODER model architecture
    
    * common : call llama_decode() during warmup only if the model has decoder
    
    * convert-hf : add T5EncoderModel
    
    * llama : add llama_model_has_decoder() API function
    
    * llama : split build_t5() into build_t5_encoder() and build_t5_decoder()
    
    * llama : add support for LLM_ARCH_T5ENCODER
    
    * llama-embedding : add support for LLAMA_POOLING_TYPE_NONE
    
    * llama-embedding : add support for encoder-only models
    
    ---------
    
    Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
    fairydreaming and sszymczy authored Aug 10, 2024 (7c3f55c)
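
    Together with the new LLAMA_POOLING_TYPE_NONE support, an encoder-only T5
    model converted to GGUF can emit per-token embeddings; a hypothetical
    invocation (the model path is a placeholder):

    ```console
    $ ./llama-embedding -m models/t5-encoder.gguf --pooling none -p "translate English to German: hello"
    ```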
  3. llama : default n_swa for phi-3 (ggerganov#8931)

    * default n_swa for phi-3
    
    * fix
    
    * double check swa
    ngxson authored Aug 10, 2024 (7eb2384)
  4. 6e02327

Commits on Aug 11, 2024

  1. Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (ggerganov#8943)
    
    * Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.
    
    - Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
    - ggml_vk_sync_buffer introduces a full pipeline sync, which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Judging from the code, which either launches compute kernels or copies tensors, adding barriers only for shader reads/writes and transfers seems to be sufficient.
    
    * Fix small typo
    
    ---------
    
    Co-authored-by: 0cc4m <picard12@live.de>
    mtavenrath and 0cc4m authored Aug 11, 2024 (7c5bfd5)
  2. llama : check all graph nodes when searching for result_embd_pooled (ggerganov#8956)

    Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
    fairydreaming and sszymczy authored Aug 11, 2024 (33309f6)
  3. update guide (ggerganov#8909)

    Co-authored-by: Neo Zhang <>
    arthw authored Aug 11, 2024 (a21c6fd)
  4. 8cd1bcf
  5. gguf-py : Numpy dequantization for most types (ggerganov#8939)

    * gguf-py : Numpy dequantization for most types
    
    * gguf-py : Numpy dequantization for grid-based i-quants
    compilade authored Aug 11, 2024 (4134999)
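
    This pairs with GGUFReader for inspecting quantized models from Python; a
    minimal sketch, assuming reader tensors expose `data` and `tensor_type` as
    in current gguf-py:

    ```python
    # Sketch only: dequantize a tensor straight out of a GGUF file with NumPy.
    from gguf import GGUFReader
    from gguf.quants import dequantize

    reader = GGUFReader("model.gguf")  # path is a placeholder
    tensor = reader.tensors[0]         # ReaderTensor with name, tensor_type, data, ...
    weights = dequantize(tensor.data, tensor.tensor_type)  # back to float32
    print(tensor.name, weights.shape, weights.dtype)
    ```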

Commits on Aug 12, 2024

  1. Merge pull request #32 from ggerganov/master

    merge upstream
    l3utterfly authored Aug 12, 2024 (32335d5)