
AnyMoE: Build an MoE model from anything, quickly #476

Merged
EricLBuehler merged 87 commits on Jul 1, 2024


EricLBuehler
Owner

@EricLBuehler EricLBuehler commented Jun 24, 2024

This PR implements AnyMoE, a method for building a flexible MoE model from any combination of fine-tuned expert models. Please see the paper for reference; note that the current implementation differs from it in a few implementation-level details.

The technique implemented here resembles the gating method used in models such as Cephalo Vision 3x8b beta and Mixtral: a gating layer selects among the experts at the MLP layer. The PR also adds a built-in pretraining routine for the gating layer, powered by Candle's autograd infrastructure, with the option to save the resulting gating-layer weights.
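The gating step described above can be illustrated with a small, framework-free Rust sketch. It is not the AnyMoE or mistral.rs code: the functions `softmax`, `gate_logits`, and `moe_forward`, the toy closure "experts", and the gate weights are all hypothetical. It assumes a dense (soft) mixture in which a linear gate produces one logit per expert, a softmax turns those logits into weights, and the layer output is the weighted sum of every expert's MLP output.

```rust
// Hypothetical sketch of a softly gated mixture of experts (not the mistral.rs implementation).
// Each expert is a stand-in closure for a fine-tuned model's MLP.

type Expert = Box<dyn Fn(&[f32]) -> Vec<f32>>;

/// Numerically stable softmax: subtract the max logit before exponentiating.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&z| (z - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Gate logits z = W x, where W has shape [n_experts][hidden_size].
fn gate_logits(gate: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    gate.iter()
        .map(|row| row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f32>())
        .collect()
}

/// Run every expert on x and combine their outputs using the gate's softmax weights.
fn moe_forward(gate: &[Vec<f32>], experts: &[Expert], x: &[f32]) -> Vec<f32> {
    let probs = softmax(&gate_logits(gate, x));
    let mut out = vec![0.0f32; x.len()];
    for (p, expert) in probs.iter().zip(experts) {
        for (o, y) in out.iter_mut().zip(expert(x)) {
            *o += p * y;
        }
    }
    out
}

fn main() {
    // Two toy experts: one doubles its input, the other negates it.
    let experts: Vec<Expert> = vec![
        Box::new(|x: &[f32]| x.iter().map(|v| 2.0 * v).collect::<Vec<f32>>()),
        Box::new(|x: &[f32]| x.iter().map(|v| -v).collect::<Vec<f32>>()),
    ];
    // Hypothetical gate weights for hidden size 2 and two experts.
    let gate = vec![vec![1.0f32, 0.0], vec![0.0, 1.0]];
    let y = moe_forward(&gate, &experts, &[1.0, 1.0]);
    println!("{:?}", y); // equal gate weights: 0.5 * 2.0 + 0.5 * (-1.0) = 0.5 per element
}
```

A dense mixture like this runs every expert for every token. Sparse routers such as Mixtral's keep only the top-k experts per token (k = 2 in Mixtral) and renormalize their weights to save compute; the sketch makes no claim about which variant AnyMoE uses.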

By letting users build an MoE model quickly from existing expert weights and a small pretraining dataset, AnyMoE lowers the barrier to creating and deploying MoE models tailored to their application.
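Training only the gate on a small dataset can be sketched in the same framework-free style. This is a hypothetical illustration, not the PR's Candle-based pretraining: it assumes each example is labeled with the index of the expert that should handle it, keeps the experts frozen, and minimizes the gate's softmax cross-entropy with plain stochastic gradient descent. The gradient is written out by hand (for logits z = Wx, ∂L/∂z_e = p_e − 1[e = label]), whereas the PR relies on Candle's autograd. The function `train_step`, the toy dataset, the learning rate, and the epoch count are all arbitrary choices for illustration.

```rust
// Hypothetical sketch: train only a linear gate with softmax cross-entropy; experts stay frozen.
// Assumes every training example is labeled with the index of its target expert.

/// Numerically stable softmax.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&z| (z - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Gate logits z = W x, with W of shape [n_experts][hidden_size].
fn gate_logits(gate: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    gate.iter()
        .map(|row| row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f32>())
        .collect()
}

/// One SGD step on L = -ln p[label]; returns the loss before the update.
/// dL/dz_e = p_e - 1[e == label], so dL/dW[e][j] = (p_e - 1[e == label]) * x_j.
fn train_step(gate: &mut [Vec<f32>], x: &[f32], label: usize, lr: f32) -> f32 {
    let probs = softmax(&gate_logits(gate, x));
    let loss = -probs[label].ln();
    for (e, row) in gate.iter_mut().enumerate() {
        let target = if e == label { 1.0 } else { 0.0 };
        let grad_z = probs[e] - target;
        for (w, xi) in row.iter_mut().zip(x) {
            *w -= lr * grad_z * xi;
        }
    }
    loss
}

fn main() {
    // Toy dataset: inputs near [1, 0] go to expert 0, inputs near [0, 1] go to expert 1.
    let data: Vec<(Vec<f32>, usize)> = vec![
        (vec![1.0, 0.1], 0),
        (vec![0.9, 0.0], 0),
        (vec![0.0, 1.0], 1),
        (vec![0.1, 0.8], 1),
    ];
    let mut gate = vec![vec![0.0f32; 2]; 2]; // zero init: uniform routing at the start
    for epoch in 0..100 {
        let mut total = 0.0;
        for (x, label) in &data {
            total += train_step(&mut gate, x, *label, 0.5);
        }
        if epoch % 25 == 0 {
            println!("epoch {epoch}: mean loss {:.4}", total / data.len() as f32);
        }
    }
    // Route an unseen input with the trained gate.
    println!("routing probabilities: {:?}", softmax(&gate_logits(&gate, &[0.95, 0.05])));
}
```

Because only the gate's n_experts × hidden_size weights are updated, each step is cheap compared with fine-tuning the experts themselves, which is one reason a small dataset can be enough for this stage.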

Status

  • AnyMoEPipeline
    • Add most of it
    • Pretraining infrastructure
    • Connect to the rest of mistral.rs
      • Python API
      • CLI
      • Server

@EricLBuehler EricLBuehler added the new feature New feature or request label Jun 24, 2024

github-actions bot commented Jun 24, 2024

Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                    9           21           21            0            0
 Python                 33         1274         1089           37          148
 TOML                   16          445          403            2           40
-------------------------------------------------------------------------------
 Jupyter Notebooks       1            0            0            0            0
 |- Markdown             1           60           30           22            8
 |- Python               1           96           87            1            8
 (Total)                            156          117           23           16
-------------------------------------------------------------------------------
 Markdown               19         1373            0         1028          345
 |- BASH                 5          101           98            0            3
 |- Python               5           98           88            0           10
 |- Rust                 3          151          135            6           10
 (Total)                           1723          321         1034          368
-------------------------------------------------------------------------------
 Rust                  122        38838        35152          695         2991
 |- Markdown            70          682           13          631           38
 (Total)                          39520        35165         1326         3029
===============================================================================
 Total                 203        42427        37059         1762         3606
===============================================================================
  

@EricLBuehler
Owner Author

cargo run --release --features cuda -- -i toml -f toml-selectors/anymoe.toml

@EricLBuehler EricLBuehler mentioned this pull request Jul 1, 2024
5 tasks
@EricLBuehler EricLBuehler merged commit a3c8eaa into master Jul 1, 2024
10 checks passed
@EricLBuehler EricLBuehler deleted the amoe branch July 1, 2024 14:59
@EricLBuehler EricLBuehler mentioned this pull request Jul 1, 2024
11 tasks

1 participant