Merge Models with Non-Standard Architectures (e.g., Multimodal Models) #450

Closed
wants to merge 24 commits into from
e15c24d
Generalise mergekit to work for any architecture: implemented Automat…
ElliotStein Oct 18, 2024
14a72e3
Fix small error in test!
ElliotStein Oct 22, 2024
1571e57
Fixes to auto-architecture
ElliotStein Oct 22, 2024
bcab860
update formatting with black
ElliotStein Oct 22, 2024
0381b32
update formatting with isort
ElliotStein Oct 22, 2024
4d98a39
streamline architecture loading. use json if possible else use automatic
ElliotStein Oct 23, 2024
5833523
formatting
ElliotStein Oct 23, 2024
7b260df
error fix
ElliotStein Oct 23, 2024
19c9da1
Merge branch 'main' into architecture-agnostic
ElliotStein Oct 25, 2024
8bd864a
tidy up
ElliotStein Oct 31, 2024
dcf8c31
missing param in gpt2 config
ElliotStein Oct 31, 2024
5b96b97
Enable autodetection and merging for submodules. If parameter names m…
ElliotStein Nov 4, 2024
9cf8d3c
Merge remote-tracking branch 'origin/main' into architecture-agnostic
ElliotStein Nov 4, 2024
ee730f2
Update architecture.py
ElliotStein Nov 4, 2024
63afb68
Update architecture.py
ElliotStein Nov 4, 2024
d3f0774
Merge branch 'architecture-agnostic' of https://github.com/arcee-ai/m…
ElliotStein Nov 5, 2024
6cc135d
formatting
ElliotStein Nov 5, 2024
d526eb9
formatting
ElliotStein Nov 5, 2024
f081a0b
script to manually add weights from base model after merging submodul…
ElliotStein Nov 19, 2024
84260f0
Address PR feedback: resolve merge architecture error
ElliotStein Dec 2, 2024
eda0db1
Improve robustness and logging in architecture functions
ElliotStein Dec 6, 2024
1fcaca0
Merge remote-tracking branch 'origin/main' into architecture-agnostic
ElliotStein Dec 9, 2024
9cfa073
Full pytest support for VLM merges
ElliotStein Dec 9, 2024
97985c4
refactor get_architecture_info into ArchitectureInfoUtils.get_archite…
ElliotStein Dec 9, 2024
3 changes: 3 additions & 0 deletions mergekit/_data/architectures/gpt2.json
@@ -23,6 +23,9 @@
     "num_layers_config_key": "n_layer",
     "layer_templates": {
         "weights": [
+            {
+                "name": "h.${layer_index}.attn.bias"
+            },
             {
                 "name": "h.${layer_index}.attn.c_attn.weight"
             },
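
For readers unfamiliar with the `${layer_index}` placeholder in the hunk above, the following is a minimal sketch of how such per-layer templates could be expanded into concrete tensor names. It is not mergekit's actual implementation: the function name `expand_layer_weights` and the list `LAYER_WEIGHT_TEMPLATES` are hypothetical, and the sketch assumes only that the layer count is read from the model config under the key named by `num_layers_config_key` (`n_layer` for GPT-2, as shown in the diff).

```python
from string import Template

# Hypothetical, trimmed-down copy of the two templates visible in the diff.
LAYER_WEIGHT_TEMPLATES = [
    "h.${layer_index}.attn.bias",
    "h.${layer_index}.attn.c_attn.weight",
]


def expand_layer_weights(config: dict, num_layers_config_key: str = "n_layer") -> list[str]:
    """Expand every template once per layer (illustrative helper, not a mergekit API)."""
    # The layer count comes from the model config, keyed by num_layers_config_key.
    num_layers = config[num_layers_config_key]
    return [
        Template(template).substitute(layer_index=index)
        for index in range(num_layers)
        for template in LAYER_WEIGHT_TEMPLATES
    ]


# A toy config with two layers yields two names per layer.
names = expand_layer_weights({"n_layer": 2})
print(names)
```

Under this reading, a template missing from the JSON (such as `attn.bias` before this change) would leave the corresponding tensor out of the expanded list for every layer.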