merge upstream #36
Commits on Sep 2, 2024
- c6d4cb4
- 9c1ba55: build(nix): Package gguf-py (ggerganov#5664)
  * style: format with nixfmt/rfc101-style
  * build(nix): Package gguf-py
  * build(nix): Refactor to new scope for gguf-py
  * build(nix): Exclude gguf-py from devShells
  * build(nix): Refactor gguf-py derivation to take in exact deps
  * build(nix): Enable pytestCheckHook and pythonImportsCheck for gguf-py
  * build(python): Package python scripts with pyproject.toml
  * chore: Cleanup
  * dev(nix): Break up python/C devShells
  * build(python): Relax pytorch version constraint (Nix has an older version)
  * chore: Move cmake to nativeBuildInputs for devShell
  * fmt: Reconcile formatting with rebase
  * style: nix fmt
  * cleanup: Remove unnecessary __init__.py
  * chore: Suggestions from review: filter out non-source files from the llama-scripts flake derivation, clean up an unused closure, remove the scripts devShell
  * revert: Bad changes
  * dev: Simplify devShells, restore the -extra devShell
  * build(nix): Add pyyaml for gguf-py
  * chore: Remove some unused bindings
  * dev: Add tiktoken to -extra devShells
- b60074f
- 6e7d133: server : refactor multitask handling (ggerganov#9274)
  * server : remove multitask from server_task
  * refactor completions handler
  * fix embeddings
  * use res_ok everywhere
  * small change for handle_slots_action
  * use unordered_set everywhere
  * (try) fix test
  * no more "mutable" lambda
  * Apply suggestions from code review
  * use deque
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
- f771d06
- 048de84
- f148516
- 48baa61
Commits on Sep 3, 2024
- b69a480
- 8962422: llama-bench : add JSONL (NDJSON) output mode (ggerganov#9288)
  * llama-bench : add JSONL (NDJSON) output mode
  * llama-bench : update usage docs
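JSONL (NDJSON) emits one self-contained JSON object per line, so consumers can stream results without parsing the whole file. A minimal sketch of the format in Python (the field names are invented for illustration, not llama-bench's actual schema):

```python
import io
import json

# Hypothetical benchmark records; each becomes one line of output.
records = [
    {"model": "7B", "n_prompt": 512, "t_ms": 1234.5},
    {"model": "7B", "n_prompt": 1024, "t_ms": 2469.0},
]

# Write one JSON object per line (JSONL / NDJSON).
buf = io.StringIO()
for rec in records:
    buf.write(json.dumps(rec) + "\n")

# Consumers parse line by line, never needing the whole file in memory.
parsed = [json.loads(line) for line in buf.getvalue().splitlines()]
print(parsed[1]["t_ms"])  # 2469.0
```

Unlike a single JSON array, the stream stays valid after every completed line, which suits long-running benchmarks.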
- 7605ae7: flake.lock: Update (ggerganov#9261)
  Flake lock file updates:
  * Updated input 'flake-parts': 'github:hercules-ci/flake-parts/8471fe90ad337a8074e957b69ca4d0089218391d?narHash=sha256-XOQkdLafnb/p9ij77byFQjDf5m5QYl9b2REiVClC%2Bx4%3D' (2024-08-01) → 'github:hercules-ci/flake-parts/af510d4a62d071ea13925ce41c95e3dec816c01d?narHash=sha256-ODYRm8zHfLTH3soTFWE452ydPYz2iTvr9T8ftDMUQ3E%3D' (2024-08-30)
  * Updated input 'nixpkgs': 'github:NixOS/nixpkgs/c374d94f1536013ca8e92341b540eba4c22f9c62?narHash=sha256-Z/ELQhrSd7bMzTO8r7NZgi9g5emh%2BaRKoCdaAv5fiO0%3D' (2024-08-21) → 'github:NixOS/nixpkgs/71e91c409d1e654808b2621f28a327acfdad8dc2?narHash=sha256-GnR7/ibgIH1vhoy8cYdmXE6iyZqKqFxQSVkFgosBh6w%3D' (2024-08-28)
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Commits on Sep 4, 2024
- 9379d3c
- 82e3b03: rpc : make RPC servers come first in the device list (ggerganov#9296)
  * rpc : make RPC servers come first in the device list
  * rpc : disable options for non-RPC builds
  * rpc : rpc_count always zero for non-RPC builds
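Moving one class of devices to the front of a list while preserving order within each class is a stable partition. A hedged sketch of that ordering logic (the device records and the is_rpc flag are invented for illustration, not llama.cpp's structures):

```python
# Hypothetical device records; only the ordering idea mirrors the commit.
devices = [
    {"name": "CUDA0", "is_rpc": False},
    {"name": "RPC[host:50052]", "is_rpc": True},
    {"name": "CPU", "is_rpc": False},
    {"name": "RPC[host:50053]", "is_rpc": True},
]

# sorted() is stable, so RPC devices move to the front while each
# group keeps its original relative order.
ordered = sorted(devices, key=lambda d: not d["is_rpc"])
print([d["name"] for d in ordered])
# ['RPC[host:50052]', 'RPC[host:50053]', 'CUDA0', 'CPU']
```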
- c8671ae
- 5910ea9: [SYCL] Fix DMMV dequantization (ggerganov#9279)
  Fixed DMMV dequantization for ncols == GGML_SYCL_DMMV_X
- 581c305: ggml : AVX2 support for Q4_0_8_8 (ggerganov#8713)
  * Add AVX2-based implementations for the quantize_q8_0_4x8, ggml_gemv_q4_0_8x8_q8_0 and ggml_gemm_q4_0_8x8_q8_0 functions
  * Update code to fix issues occurring due to non-alignment of the elements to be processed as a multiple of 16 in MSVC
  * Update comments and indentation
  * Make updates to reduce the number of load instructions
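The quantize/gemv routines named above operate on block-quantized weights: fixed-size groups of values that share one scale. A simplified scalar sketch of Q8_0-style block quantization (block size 32 and int8 storage match the public Q8_0 layout; the helper names are illustrative, and the real code is vectorized C):

```python
def quantize_q8_0_block(values):
    """Quantize one block of 32 floats to int8 plus a per-block scale."""
    assert len(values) == 32
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 0.0
    inv = 1.0 / scale if scale > 0 else 0.0
    # Round each value to the nearest representable int8 step.
    q = [max(-128, min(127, round(v * inv))) for v in values]
    return scale, q

def dequantize_q8_0_block(scale, q):
    return [scale * x for x in q]

block = [i / 10.0 for i in range(-16, 16)]
s, q = quantize_q8_0_block(block)
approx = dequantize_q8_0_block(s, q)
```

The reconstruction error per element is bounded by half the scale step, which is why a larger dynamic range inside a block costs precision.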
Commits on Sep 5, 2024
- bdf314f
- 4db0478
- 1031771
- 32b2ec8
Commits on Sep 6, 2024
- 9bc6db2: ggml-quants : ternary packing for TriLMs and BitNet b1.58 (ggerganov#8151)
  * ggml-quants : 1.625 bpw ternary packing for BitNet 1.58b
  * ggml-quants : faster 1.625 bpw AVX2 vec_dot. Not using a lookup table anymore makes it match q4_0 speed.
  * gguf-py : fix formatting
  * llama : remove spaces on empty line
  * ggml-quants : subtract 1 when back in epi8. This makes the 1.625 bpw type go faster than q4_0. Still not the fastest.
  * ggml-quants : Q2_2 now faster than Q4_K with AVX2
  * ggml-quants : cleanup Q1_3 code formatting
  * ggml-quants : ARM NEON vec_dot for q2_2 and q1_3
  * ggml-quants : use ceiling division when quantizing q1_3
  * convert-hf : simplify BitNet pre-quantization. This still results in the exact same tensor weights and scales, but it reveals some weirdness in the current algorithm.
  * convert-hf : allow converting the weird BitNet 1.3B. Its FFN size is 5460, which is not convenient. The offending tensors are kept in F16, which makes the final model 5.01 bpw.
  * bitnet : replace 1.58b with b1.58, as in the paper
  * ggml-quants : fix build failure on Windows
  * ggml-quants : attempt to fix Arm 32-bit support
  * ggml : add some informative comments in q1_3 vec_dot
  * ggml : add TQ1_0 and TQ2_0 ternary quantization types
  * ggml : even faster TQ2_0
  * ggml : also faster TQ1_0. Same optimization as for TQ2_0 by offsetting the sum instead of the weights. This makes TQ1_0 almost as fast as Q8_0 on AVX2.
  * ggml : fix build issues in certain environments
  * ggml : add NEON vec_dot implementation for TQ1_0 and TQ2_0
  * ggml : avoid directly using vmlal_high_s8, for 32-bit ARM compat. The compiler seems smart enough to use the same instruction even when using vget_high_s8 instead.
  * ggml : remove q1_3 and q2_2. No more 1.625 bpw and 2.000 bpw; now instead using 1.6875 bpw and 2.0625 bpw with TQ1_0 and TQ2_0, respectively.
  * llama : remove the separate scale tensors of BitNet b1.58. They won't be needed, since the remaining ternary quant types have built-in scales.
  * ggml-quants : rename fields of TQ1_0 and TQ2_0 structs for consistency
  * ggml-quants : allow using vdotq_s32 in TQ2_0 vec_dot. Not yet tested on hardware which supports it; it might not work or might not even compile, but it should make the performance better on recent ARM CPUs.
  * ggml-quants : remove comment about possible format change of TQ2_0. Making it slightly more convenient for AVX512 but less convenient for everything else is not worth the trouble.
  * gguf-py : NumPy (de)quantization for TQ1_0 and TQ2_0
  * ggml-quants : use roundf instead of nearest_int for TQ1_0 and TQ2_0. This does not change anything for ternary models, since their values should never end up in halfway cases anyway.
  * convert : allow direct conversion to TQ1_0 and TQ2_0. The token embeddings and output tensors are kept in F16 to allow quantizing them to Q4_K and Q6_K with llama-quantize.
  * llama : handle fallback for TQ1_0 and TQ2_0 with Q4_0. Q4_0 is not completely symmetric (so not lossless for ternary models), but it should be good enough.
  * ggml-quants : allow using ARM dot product instructions for TQ1_0
  * ggml-quants : deduplicate TQ1_0 and TQ2_0 __ARM_FEATURE_DOTPROD support
  * ggml : remove unused ggml_mul special case. It would otherwise conflict with the more general optimization coming with Mamba-2.
  * ggml : handle TQ1_0 and TQ2_0 in dequantization-based operators
  * test-backend-ops : add TQ1_0 and TQ2_0 comments for later. Not yet adding them uncommented, because some backends like SYCL and Metal do not properly handle unknown types in supports_op for GGML_OP_MUL_MAT (and Metal also doesn't handle it with GGML_OP_GET_ROWS). Support for TQ1_0 and TQ2_0 on backends other than CPU will be added in follow-up pull requests.
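The trick that lets ternary weights drop below 2 bits each is that five base-3 digits fit in one byte (3^5 = 243 <= 256), i.e. 8/5 = 1.6 bits per weight before scale overhead. A sketch of that packing idea (this shows the principle only, not TQ1_0's actual block layout, which also carries per-block scales and lands at 1.6875 bpw):

```python
def pack5(trits):
    """Pack 5 ternary digits (-1/0/+1) into one byte via base 3."""
    assert len(trits) == 5 and all(t in (-1, 0, 1) for t in trits)
    n = 0
    for t in reversed(trits):
        n = n * 3 + (t + 1)  # shift each trit from {-1,0,1} to {0,1,2}
    return n  # 0..242, fits in a byte

def unpack5(byte):
    """Recover the 5 ternary digits from one packed byte."""
    out, n = [], byte
    for _ in range(5):
        out.append(n % 3 - 1)
        n //= 3
    return out

trits = [1, -1, 0, 1, 1]
b = pack5(trits)
assert unpack5(b) == trits  # lossless round-trip at 1.6 bits/weight
```

Because unpacking is a few integer ops per digit, a vec_dot kernel can decode on the fly instead of using a lookup table.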
- 8ebe8dd: Improve Vulkan shader build system (ggerganov#9239)
  * Improve the Vulkan shader build system:
    - Add a dependency on vulkan-shaders-gen so shaders are rebuilt when the shader compilation utility changes.
    - Add an option to generate debug info for Vulkan shaders, to provide shader source to Vulkan shader profiling tools.
  * Remove the self dependency, which is not required.
- 4a1411b
- 409dc4f: ggml : fix build break for the vulkan-debug (ggerganov#9265)
  * Windows build: OK.
  * Linux build: OK.
  Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
- 815b1fb: batched-bench : add `--output-format jsonl` option (ggerganov#9293)
  `--output-format` is modeled after `llama-bench`'s options
- 134bc38: llama-bench : log benchmark progress (ggerganov#9287)
  * llama-bench : add optional progress messages
- 9b2c24c: server : simplify state machine for slot (ggerganov#9283)
  * server : simplify state machine for slot
  * add SLOT_STATE_DONE_PROMPT
  * pop_deferred_task
  * add missing notify_one
  * fix passkey test
  * metrics : add n_busy_slots_per_decode
  * fix test step
  * add test
  * maybe fix AddressSanitizer?
  * fix deque?
  * missing lock
  * pop_deferred_task: also notify
  * Update examples/server/server.cpp
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
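Several of the items above (pop_deferred_task, the missing notify_one, the missing lock) are pieces of the classic condition-variable work-queue pattern. A minimal Python sketch of that pattern (the class and method names are illustrative, not the server's actual types):

```python
import threading
from collections import deque

class TaskQueue:
    """A deque guarded by a condition; every push notifies a waiter."""
    def __init__(self):
        self._tasks = deque()
        self._cond = threading.Condition()

    def defer(self, task):
        with self._cond:  # the lock must be held for both mutate and notify
            self._tasks.append(task)
            self._cond.notify()  # forgetting this leaves workers blocked

    def pop_deferred_task(self):
        with self._cond:
            while not self._tasks:  # loop guards against spurious wakeups
                self._cond.wait()
            return self._tasks.popleft()

q = TaskQueue()
q.defer("warmup")
q.defer("decode")
print(q.pop_deferred_task())  # warmup (FIFO order via popleft)
```

The subtle bugs named in the log are exactly the two easy mistakes here: mutating the deque without the lock, and appending without notifying.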
Commits on Sep 7, 2024
- 6c89eb0
- 947538a: ggml : fix missing `cpu_set_t` on emscripten (ggerganov#9336)
  * ggml : fix missing cpu_set_t on emscripten
  * better version
  * bring back android part
- df270ef: llama : refactor sampling v2 (ggerganov#9294)
  * Add `struct llama_sampler` and `struct llama_sampler_i`
  * Add `llama_sampler_` API
  * Add `llama_sampler_chain_` API for chaining multiple samplers
  * Remove `LLAMA_API_INTERNAL`
  * Add `llama_perf_` API and remove old `llama_print_timings` and `llama_reset_timings`
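The idea behind a `llama_sampler_chain_` style API is a pipeline of samplers, each transforming the candidate set produced by the previous one. A hedged Python sketch of that composition pattern (the samplers and data shapes are invented stand-ins, not the real C API):

```python
# Each sampler maps a list of (token, prob) candidates to a new list.
def top_k(k):
    def f(cands):
        return sorted(cands, key=lambda c: -c[1])[:k]
    return f

def renormalize(cands):
    total = sum(p for _, p in cands)
    return [(t, p / total) for t, p in cands]

def chain(*samplers):
    """Compose samplers left to right into one sampler."""
    def f(cands):
        for s in samplers:
            cands = s(cands)
        return cands
    return f

sampler = chain(top_k(2), renormalize)
out = sampler([("a", 0.5), ("b", 0.3), ("c", 0.2)])
print(out)  # roughly [('a', 0.625), ('b', 0.375)]
```

The appeal of the chain is that each stage stays independently testable while the composed object still looks like a single sampler to the caller.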
- e32d081
- 1b9ae51: common : refactor arg parser (ggerganov#9308)
  * (wip) argparser v3
  * migrated
  * add test
  * handle env
  * fix linux build
  * add export-docs example
  * fix build (2)
  * skip build test-arg-parser on windows
  * update server docs
  * bring back missing --alias
  * bring back --n-predict
  * clarify test-arg-parser
  * small correction
  * add comments
  * fix args with 2 values
  * refine example-specific args
  * no more lambda capture (Co-authored-by: slaren@users.noreply.github.com)
  * params.sparams
  * optimize more
  * export-docs --> gen-docs
- e536426
- faf69d4: llama : sanitize invalid tokens (ggerganov#9357)
  * common : do not add null tokens during warmup
  * llama : check that the input tokens are valid
  * tests : fix batch size of bert model
- f12295b
- a5b5d9a
Commits on Sep 8, 2024
- fbb7fcf
- efe6a83: ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934)
  * ggml_cont: fix issue with transposed tensors when one dimension is 1. When using multiple threads, it is not enough to check that the tensors are contiguous for ggml_compute_forward_dup_same_cont to work correctly; the tensors' strides also need to match.
  * Add ggml_cont tests
  * Remove dead code: it isn't possible to reach this code because all these functions are invoked by ggml_compute_forward_dup if and only if src0->type != dst->type
  * Make ggml_compute_forward_dup_same_cont work with contiguous tensors
  Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
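The premise of the fix is that contiguity alone does not make two tensors layout-compatible: a flat buffer copy between tensors whose strides differ pairs up the wrong elements. A small pure-Python sketch of that failure mode (illustrative only, not the ggml code):

```python
# Two tensors over the same flat buffer of 6 elements, but with
# different logical shapes: (2, 3) row-major vs. (3, 2) row-major.
src_shape, dst_shape = (2, 3), (3, 2)
flat = [float(i) for i in range(6)]  # src's contents, row-major

def at(buf, shape, i, j):
    """Row-major indexing: the stride of dim 0 is shape[1]."""
    return buf[i * shape[1] + j]

# A "same contiguous layout" fast path copies the flat buffer verbatim,
# because both tensors pass an is-contiguous check...
dst_flat = list(flat)

# ...but their strides differ, so logical indexing now disagrees:
print(at(flat, src_shape, 1, 0), at(dst_flat, dst_shape, 1, 0))
# src[1][0] is 3.0 while dst[1][0] reads 2.0: same bytes, wrong elements.
```

This is why the commit makes the fast path require matching strides, not just contiguity, before copying raw memory.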
- 51d964a
- d2d3200: cann : add Ascend NPU support (whisper/2336)
  * enable Ascend NPU in src/whisper.cpp
  * sync test-backend-ops with llama.cpp
- ba1cf84
- dbbebca
- 202084d: tests: add gradient tests for all backends (ggml/932)
  * tests: add gradient checking to test-backend-ops
  * remove old comment
  * reorder includes
  * adjust SIN/COS parameters
  * add documentation, use supports_op if possible
- 9cb9260: vulkan: correctly report support for OP_CONT (ggml/946)
  test-backend-ops fails because ggml_cont aborts when invoked with an unsupported type. This commit makes the ggml_cont tests pass.
  Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
- 406c1a3: vulkan: add dryrun support to sin and cos ops (ggml/947)
  sin and cos failed test-backend-ops because they tried to dereference a context pointer that is null on dry runs. This commit prevents that segfault.
  Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
- 60a3107
- 385decb
- a876861
- d11bd3b