AnyMoE: Build an MoE model from anything, quickly #476
Merged
Code Metrics Report

```
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                    9           21           21            0            0
 Python                 33         1274         1089           37          148
 TOML                   16          445          403            2           40
-------------------------------------------------------------------------------
 Jupyter Notebooks       1            0            0            0            0
 |- Markdown             1           60           30           22            8
 |- Python               1           96           87            1            8
 (Total)                            156          117           23           16
-------------------------------------------------------------------------------
 Markdown               19         1373            0         1028          345
 |- BASH                 5          101           98            0            3
 |- Python               5           98           88            0           10
 |- Rust                 3          151          135            6           10
 (Total)                           1723          321         1034          368
-------------------------------------------------------------------------------
 Rust                  122        38838        35152          695         2991
 |- Markdown            70          682           13          631           38
 (Total)                          39520        35165         1326         3029
===============================================================================
 Total                 203        42427        37059         1762         3606
===============================================================================
```
```
cargo run --release --features cuda -- -i toml -f toml-selectors/anymoe.toml
```
This PR implements AnyMoE, a method for building a flexible MoE model from any combination of fine-tuned expert models. See the paper for reference, although there are currently a few implementation-level differences.
The technique implemented here is similar to the gating method found in models such as Cephalo Vision 3x8b beta or Mixtral: a gating layer selects experts in the MLP layer. The PR also includes a built-in pretraining strategy for that gating layer, powered by Candle's autograd infrastructure, with the option to save the resulting gating weights. A minimal sketch of the gating idea follows.
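The sketch below is an illustration, not the PR's actual code: `GatedMoeMlp`, the module names, and the use of plain `Linear` layers as stand-ins for the fine-tuned expert MLPs are all assumptions. It shows the core mechanism: a linear gate produces softmax weights over the experts, and the layer output is the gate-weighted sum of the expert outputs (a dense mixture for clarity; a top-k selection would sparsify it).

```rust
use candle_core::{DType, Device, Result, Tensor};
use candle_nn::{linear, ops::softmax, Linear, Module, VarBuilder, VarMap};

/// Hypothetical gated MoE MLP layer: one gating Linear plus N expert MLPs
/// (plain Linear layers here as stand-ins for the fine-tuned experts).
struct GatedMoeMlp {
    gate: Linear,         // hidden_dim -> num_experts
    experts: Vec<Linear>, // each: hidden_dim -> hidden_dim
}

impl GatedMoeMlp {
    fn new(hidden: usize, num_experts: usize, vb: VarBuilder) -> Result<Self> {
        let gate = linear(hidden, num_experts, vb.pp("gate"))?;
        let experts = (0..num_experts)
            .map(|i| linear(hidden, hidden, vb.pp(format!("expert{i}"))))
            .collect::<Result<Vec<_>>>()?;
        Ok(Self { gate, experts })
    }

    fn forward(&self, x: &Tensor) -> Result<Tensor> {
        // Gate scores over experts, softmax-normalized: (batch, num_experts).
        let scores = softmax(&self.gate.forward(x)?, 1)?;
        // Dense mixture: sum over experts of gate_i * expert_i(x).
        let mut out: Option<Tensor> = None;
        for (i, expert) in self.experts.iter().enumerate() {
            let w = scores.narrow(1, i, 1)?; // (batch, 1)
            let y = expert.forward(x)?.broadcast_mul(&w)?;
            out = Some(match out {
                Some(acc) => (acc + y)?,
                None => y,
            });
        }
        out.ok_or_else(|| candle_core::Error::Msg("no experts".into()))
    }
}

fn main() -> Result<()> {
    let device = Device::Cpu;
    let varmap = VarMap::new();
    let vb = VarBuilder::from_varmap(&varmap, DType::F32, &device);
    let layer = GatedMoeMlp::new(16, 3, vb)?;
    let x = Tensor::randn(0f32, 1f32, (2, 16), &device)?;
    let y = layer.forward(&x)?;
    println!("{:?}", y.dims()); // [2, 16]
    Ok(())
}
```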
By letting users assemble an MoE model quickly from existing expert weights and a small pretraining dataset, AnyMoE lowers the barrier to entry for creating and deploying MoE models tailored to a specific application. A toy sketch of the gate-only pretraining idea follows.
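Again, this is an assumption-laden toy rather than the PR's implementation: only the gating layer's parameters are registered in the `VarMap`, so the experts stay frozen, and the gate is trained with Candle's autograd using a cross-entropy loss against per-example expert labels standing in for the small pretraining dataset.

```rust
use candle_core::{DType, Device, Result, Tensor};
use candle_nn::{
    linear, loss::cross_entropy, AdamW, Linear, Module, Optimizer, ParamsAdamW, VarBuilder,
    VarMap,
};

fn main() -> Result<()> {
    let device = Device::Cpu;
    // The VarMap holds only the gate's trainable parameters; the expert
    // weights would live elsewhere and stay frozen during pretraining.
    let varmap = VarMap::new();
    let vb = VarBuilder::from_varmap(&varmap, DType::F32, &device);
    let gate: Linear = linear(16, 3, vb.pp("gate"))?; // hidden=16, 3 experts

    // Toy batch: hidden states plus the expert index each example should
    // route to (both would come from the real pretraining dataset).
    let xs = Tensor::randn(0f32, 1f32, (8, 16), &device)?;
    let targets = Tensor::new(&[0u32, 1, 2, 0, 1, 2, 0, 1], &device)?;

    let mut opt = AdamW::new(varmap.all_vars(), ParamsAdamW::default())?;
    for step in 0..100 {
        let logits = gate.forward(&xs)?;
        let loss = cross_entropy(&logits, &targets)?;
        opt.backward_step(&loss)?; // autograd updates the gate only
        if step % 20 == 0 {
            println!("step {step}: loss {:.4}", loss.to_scalar::<f32>()?);
        }
    }
    Ok(())
}
```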
Status