Releases · EricLBuehler/mistral.rs
v0.1.25
Summary
- Added AnyMoE
- Added the Starcoder 2 model
- Vision interactive mode
- X-LoRA and LoRA support for Gemma 2
- Optimized local loading speed
- Release using `cargo dist`
MSRV
The MSRV of mistral.rs is 1.75.0 (checked with `cargo msrv`).
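AnyMoE, the headline feature of this release, builds a mixture-of-experts model from an existing base model plus a set of expert layers, training only a small gating layer on a user-supplied dataset (see #476 and the related AnyMoE items below). As a rough conceptual sketch of the gating idea only, not the mistral.rs AnyMoE API or implementation, a softmax gate mixes the outputs of frozen experts:

```rust
// Conceptual sketch of MoE gating: a trainable softmax gate mixes the outputs
// of frozen expert MLPs. Illustrative only; this is not the AnyMoE API.

fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    // Outputs of three frozen "expert" MLPs for one hidden state (toy numbers).
    let expert_outputs = [vec![0.2, -0.1], vec![0.9, 0.4], vec![-0.3, 0.7]];
    // Gating logits produced by the (trainable) gating layer for this token.
    let gate_logits = [0.1_f32, 1.2, -0.5];
    let weights = softmax(&gate_logits);

    // Weighted sum of expert outputs, as in a standard soft MoE layer.
    let mut mixed = vec![0.0_f32; expert_outputs[0].len()];
    for (w, out) in weights.iter().zip(expert_outputs.iter()) {
        for (m, o) in mixed.iter_mut().zip(out.iter()) {
            *m += w * o;
        }
    }
    println!("gate weights: {weights:?}\nmixed output: {mixed:?}");
}
```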
What's Changed
- Add a vision interactive mode by @EricLBuehler in #468
- Add support for X-LoRA/LoRA to Gemma 2 by @EricLBuehler in #499
- Add adapter activation support for Gemma 2 LoRA by @EricLBuehler in #500
- Decrease MSRV to 1.75.0 by @EricLBuehler in #501
- Fix docs with preprocessor config by @EricLBuehler in #502
- Allow for optional `model` field in OpenAI requests by @EricLBuehler in #504 (see the request sketch after this list)
- Support streaming for completion by @EricLBuehler in #508
- Select the best device in examples by @EricLBuehler in #507
- Handle Metal error from bf16 autodtype selection by @EricLBuehler in #509
- Use chat messages for the rust examples and show T/s by @EricLBuehler in #510
- Avoid f64 to f32 cast in phirope for dtype on metal by @EricLBuehler in #512
- Handle errors in rust examples by @EricLBuehler in #513
- AnyMoE: Build an MoE model from anything, quickly by @EricLBuehler in #476
- chore: update paths.rs by @eltociear in #516
- Add AnyMoE support for vision models by @EricLBuehler in #515
- Support images in AnyMoE dataset by @EricLBuehler in #517
- Handle metal case for nonzero and bitwise ops by @EricLBuehler in #518
- Support saving and loading AnyMoE gating layer by @EricLBuehler in #519
- Implement the Starcoder 2 model architecture by @EricLBuehler in #522
- Docs for Starcoder 2 by @EricLBuehler in #524
- Starcoder2 docs by @EricLBuehler in #525
- Export loader for starcoder2 by @EricLBuehler in #526
- Support the new phi3 models by @EricLBuehler in #530
- Source the AnyMoE dataset from a structured JSON file by @EricLBuehler in #531
- Create an AnyMoE loss graph by @EricLBuehler in #532
- Add AnyMoE demo video by @EricLBuehler in #533
- Add judgment for whether model_id is a local path to speed up local model loading by @chenwanqq in #523
- Update docs by @EricLBuehler in #534
- release using cargo-dist by @kranurag7 in #480
- Bump version to 0.1.25 by @EricLBuehler in #535
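Two of the changes above touch the OpenAI-compatible HTTP surface: the `model` field is now optional (#504) and completions can be streamed (#508). Below is a minimal request sketch, assuming a locally running `mistralrs-server`; the port, URL path, and dependency versions are illustrative assumptions, not values taken from these notes. The payload omits `model` entirely and could set `"stream": true` to opt into streaming:

```rust
// Minimal sketch of a chat request to a locally running mistralrs-server.
// Assumed Cargo deps: reqwest = { version = "0.12", features = ["blocking", "json"] },
// serde_json = "1". Port 1234 is an illustrative choice, not a documented default.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        // `model` is optional as of #504; the server falls back to its loaded model.
        "messages": [
            { "role": "user", "content": "Say hello in one sentence." }
        ],
        // Set to true to request incremental chunks instead of one response (#508).
        "stream": false
    });

    let resp = reqwest::blocking::Client::new()
        .post("http://localhost:1234/v1/chat/completions")
        .json(&body)
        .send()?
        .text()?;
    println!("{resp}");
    Ok(())
}
```

With `"stream": true` the server would return incremental chunks rather than a single JSON body, so a streaming client would read the response body progressively instead of calling `.text()` once.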
New Contributors
- @kranurag7 made their first contribution in #480
Full Changelog: v0.1.24...v0.1.25
Download mistralrs-server 0.1.25
| File | Platform | Checksum |
|---|---|---|
| mistralrs-server-aarch64-apple-darwin.tar.xz | Apple Silicon macOS | checksum |
| mistralrs-server-x86_64-apple-darwin.tar.xz | Intel macOS | checksum |
| mistralrs-server-x86_64-unknown-linux-gnu.tar.xz | x64 Linux | checksum |
v0.1.24
Patch release, please update
What's Changed
- Bump version to 0.1.24 by @EricLBuehler in #497
Full Changelog: v0.1.23...v0.1.24
v0.1.23
What's Changed
- Improve and update docs by @EricLBuehler in #477
- Progress bar and logging when loading repeating layers by @EricLBuehler in #479
- Update deps by @EricLBuehler in #483
- Optimize decoding by removing redundant qkv transpose by @EricLBuehler in #487
- Fixes and tweak docs, logging for local loading by @EricLBuehler in #489
- Add the Gemma 2 model by @EricLBuehler in #490
- Update demo video by @EricLBuehler in #491
- Utilize new quantize_onto qtensor api by @EricLBuehler in #492
- Update deps by @EricLBuehler in #493
- Bump version to 0.1.23 by @EricLBuehler in #495
Full Changelog: v0.1.22...v0.1.23
v0.1.22
What's Changed
- Remove erroneously flaky CI test by @EricLBuehler in #466
- NVCC flags support for mistralrs_core build by @EricLBuehler in #469
- Prevent divide by zero in cuda kernel by @joshpopelka20 in #471
- Better cuda build.rs linking of stdc++ by @EricLBuehler in #472
- Remove some unnecessary `&mut`s by @EricLBuehler in #473
- Fix arg order for pdoc by @EricLBuehler in #474
- Bump version to 0.1.22 by @EricLBuehler in #475
Full Changelog: v0.1.21...v0.1.22
v0.1.21
What's Changed
- Expose idefics2 loader by @EricLBuehler in #450
- Try auto dtypes based on compute cap by @EricLBuehler in #453
- Fix dtype error for logit bias by @EricLBuehler in #454
- Fix sequence prompt len for Phi3-V by @EricLBuehler in #455
- Tune threshold for matmul via f16 by @EricLBuehler in #457
- Improve short/long scaling precision for LongRope by @EricLBuehler in #458
- Fix LongRope models position ids calculation by @EricLBuehler in #459
- Update deps by @EricLBuehler in #460
- Improve handling of errors in auto dtype selection by @EricLBuehler in #461
- Add support for cross-gpu device mapping by @EricLBuehler in #462
- Bump version to 0.1.21 by @EricLBuehler in #463
Full Changelog: v0.1.20...v0.1.21
v0.1.20
What's Changed
- Fix with docker images by fixing use of pyo3 by @EricLBuehler in #440
- Update readme with docker info by @EricLBuehler in #441
- Add Cargo.lock file by @EricLBuehler in #442
- Fix causal masks dtype by @EricLBuehler in #443
- Add support for Idefics 2 by @EricLBuehler in #309
- Bump to version 0.1.20 by @EricLBuehler in #449
Full Changelog: v0.1.19...v0.1.20
v0.1.19
What's Changed
- Format readme by @EricLBuehler in #427
- Remove multiple tracing initializations and init outside of mistralrs-core by @EricLBuehler in #428
- Run clippy by @EricLBuehler in #429
- adding reboot functionality by @gregszumel in #378
- Lower memory spike when loading with ISQ on CUDA by @EricLBuehler in #433
- Fix failing docs workflow by @EricLBuehler in #435
- Remove unused line in dockerignore by @EricLBuehler in #436
- Improve `Auto` dtype determination by @EricLBuehler in #438
- Bump version to 0.1.19 by @EricLBuehler in #439
Full Changelog: v0.1.18...v0.1.19
v0.1.18
What's Changed
- Switch to minijinja's pycompat mode by @mitsuhiko in #421
- chore: update speculative.rs by @eltociear in #423
- Bump to new commit of candle with cudarc 0.11.5 by @EricLBuehler in #424
- Use rev key instead of commit to get rid of warning by @EricLBuehler in #425
- Add nonzero and bitwise operators by @chenwanqq in #422
- Fix Python deps, base64 impl, add examples by @EricLBuehler in #426
New Contributors
- @mitsuhiko made their first contribution in #421
- @chenwanqq made their first contribution in #422
Full Changelog: v0.1.17...v0.1.18
v0.1.17
What's Changed
- Add and update template READMEs by @EricLBuehler in #405
- Improve Rust crates docs by @EricLBuehler in #406
- Expose phi3v loader and remove unused deps by @EricLBuehler in #408
- Support GGUF Mixtral format where experts are in one tensor by @EricLBuehler in #355
- Refactor with normal loading metadata for vision models by @EricLBuehler in #409
- Phi 3 vision ISQ support by @EricLBuehler in #410
- Remove causal masks cache by @EricLBuehler in #412
- Fix: use new slice_assign by @EricLBuehler in #415
- Fix Phi-3 GGUF by @EricLBuehler in #414
- Implement gpt2 (BPE) GGUF tokenizer conversion by @EricLBuehler in #397
- Support chat template from GGUF by @EricLBuehler in #416
- Expose API to specify dtype during loading by @EricLBuehler in #417
- Lock candle version to commit by @EricLBuehler in #419
- Bump version to 0.1.17 by @EricLBuehler in #420
Full Changelog: v0.1.16...v0.1.17
v0.1.16
Summary
- Various fixes
- Excellent work on refactoring by @polarathene
- First vision model: Phi 3 vision
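Phi 3 vision is the project's first vision model; the releases above build on it with Idefics 2 support, an interactive vision mode, and base64 image handling. As a sketch of the OpenAI-style multimodal message shape such requests generally use (the exact fields mistral.rs accepts are an assumption here, not spelled out in these notes), an image can be attached alongside text like this:

```rust
// Sketch of an OpenAI-style multimodal chat message mixing an image and text.
// The precise schema accepted by mistral.rs is assumed, not taken from these notes.
use serde_json::json;

fn main() {
    let request = json!({
        "messages": [{
            "role": "user",
            "content": [
                { "type": "image_url",
                  "image_url": { "url": "https://example.com/photo.jpg" } },
                { "type": "text", "text": "What is shown in this image?" }
            ]
        }]
    });
    println!("{}", serde_json::to_string_pretty(&request).unwrap());
}
```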
What's Changed
- Implement the Phi 3 vision model by @EricLBuehler in #351
- Bump version again to 0.1.15 by @EricLBuehler in #390
- Add docs for installing huggingface-cli by @EricLBuehler in #391
- Fix metal loading issue by loading sequentially by @EricLBuehler in #394
- Fix logging in gguf and ggml by @EricLBuehler in #399
- Add fused bias linear layer with cublaslt by @EricLBuehler in #400
- docs: Resolve CI lints on docs by @polarathene in #401
- Refactor: GGUF metadata tokenizer by @polarathene in #389
- Add `Nonzero` layer by @EricLBuehler in #402
- Bump version to 0.1.16 by @EricLBuehler in #404
Full Changelog: v0.1.15...v0.1.16