OK. Model merging is now popular in the LLM community, while it has faded in the SD community. Crap.
- Loads of academic papers are recorded on HuggingFace, focusing on LLMs.
- More on techniques used while merging. Definitely not MBW.
- Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning (comparative study)
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities (comparative study)
- Model merging with SVD to tie the Knots (tl;dr: merging LoRAs is OK)
- DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling
- Arcee's MergeKit: A Toolkit for Merging Large Language Models (toolkit)
- Dataless Knowledge Fusion by Merging Weights of Language Models
- Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
- AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models
- Acceleration of Stochastic Approximation by Averaging, a.k.a. "Polyak averaging"
- My merge. Merging / ensembling with uniform (average) weighting is a well-discussed method and a good choice, at least for generalization. Different papers call it "uniform soup" or "isotropic merge"; a minimal sketch follows.
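A minimal sketch of what the uniform soup / average merge means in practice, assuming plain `.ckpt` files whose weights live under a `state_dict` key (the file names and that layout are placeholders; they vary by checkpoint format):

```python
import torch

# Hypothetical input checkpoints; any number of models works the same way.
paths = ["model_a.ckpt", "model_b.ckpt", "model_c.ckpt"]
state_dicts = [torch.load(p, map_location="cpu")["state_dict"] for p in paths]

merged = {}
for key in state_dicts[0]:
    tensors = [sd[key].float() for sd in state_dicts if key in sd]
    merged[key] = torch.stack(tensors).mean(dim=0)  # uniform (1/N) weighting

torch.save({"state_dict": merged}, "uniform_soup.ckpt")
```

Polyak averaging (above) is the same running-average idea, applied to the iterates of a single training run instead of separate checkpoints.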
- SD-Mecha: a merger focused on serialization, extensibility, and efficiency, supporting multiple algorithms.
- Merge with any recognizable pattern: sd-webui-supermerger, and "Elemental Merge" in sd-webui-supermerger (per-key weighting, sketched below).
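This is not supermerger's actual code, just a hedged sketch of the idea behind per-element ("Elemental") merging: pick the interpolation ratio per parameter by matching patterns against the state_dict key names. The patterns and ratios below are illustrative only:

```python
import re
import torch

def elemental_merge(sd_a, sd_b, rules, default_alpha=0.5):
    """Weighted sum per key; alpha comes from the first regex that matches
    the key name. `rules` is a list of (pattern, alpha) pairs."""
    merged = {}
    for key, a in sd_a.items():
        if key not in sd_b:
            merged[key] = a
            continue
        alpha = default_alpha
        for pattern, value in rules:
            if re.search(pattern, key):
                alpha = value
                break
        merged[key] = (1.0 - alpha) * a.float() + alpha * sd_b[key].float()
    return merged

# Illustrative rules: lean cross-attention towards model B, feed-forward towards model A.
rules = [("attn2", 0.8), ("attn1", 0.5), (r"ff\.net", 0.2)]
```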
- Some explanation (how to use, rather than why it works): BlockMergeExplained
- Current meta: merging multiple LoRAs. I don't know the exact procedure, because I have never done either LoRA training or merging myself; the usual math is sketched below for reference.
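For reference only (not taken from any particular UI), the commonly described math for folding LoRAs into a base weight: each LoRA contributes a low-rank update `up @ down`, scaled by its strength and `alpha / rank`. A sketch for a single layer, with illustrative variable names:

```python
import torch

def fold_loras(base_weight, loras):
    """base_weight: [out, in] tensor of the original layer.
    loras: list of (down, up, alpha, strength), where down is [rank, in]
    and up is [out, rank]; the low-rank update is up @ down."""
    w = base_weight.clone().float()
    for down, up, alpha, strength in loras:
        rank = down.shape[0]
        w += strength * (alpha / rank) * (up.float() @ down.float())
    return w
```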
- Great potential: selecting the best merging hyperparameters by reinforcement learning (Medium article, sdweb-auto-MBW). Note: the score metric may not fit everybody, just like what WD / SD / NAI did. ImageReward would be more relatable.
- Boosting is now available as sd-webui-bayesian-merger, and autombw v2 supports both Bayesian boosting and ImageReward. I will include it in my next merge. Currently I am using my own fork to make them work, mainly code updates. Also see the findings focusing on related works, and the general findings. The overall optimization loop is sketched below.
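A hedged sketch of the loop these tools automate: propose MBW-style block weights, merge, generate images, score them, and keep the best candidate. The scorer below is a stand-in so the sketch runs on its own; in practice the scoring step would be ImageReward (or another metric) applied to images generated with the merged model, and the random search would be replaced by Bayesian optimization:

```python
import random

BLOCKS = 25  # e.g. 12 IN + 1 MID + 12 OUT blocks for SD1.x-style MBW

def score_candidate(weights):
    # Placeholder objective so the sketch is self-contained.
    # In practice: merge with these block weights, generate images,
    # then score them with ImageReward / an aesthetic model instead.
    target = [0.5] * BLOCKS
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

best_score, best_weights = float("-inf"), None
for _ in range(200):  # random search; Bayesian optimization needs far fewer trials
    weights = [random.random() for _ in range(BLOCKS)]
    score = score_candidate(weights)
    if score > best_score:
        best_score, best_weights = score, weights

print(best_score, best_weights[:3])
```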
- Circulating in QQ: MBW魔法密录02-14 ("MBW Magic Secret Records 02-14") and 模型基础理论 ("Basic Model Theory")
- Commentary by GhostShell, the author of GhostMix, and the description of GhostMix; cited purely for documentary purposes. Bilibili mirror 1, Bilibili mirror 2
- "rimo_random_mix", which is written in Chinese. Note: BayesianOptimization
- Now "merging" is a sustainable act.
- majicMIX realistic: a better cosplay model (series). The "chained" mix is great, covering nice models from all "dimensions".
- Chilloutmix: a cosplay model. However, there are no cosplayers in the dataset; it just merges "real photo" and "anime" models together.
- AbyssOrangeMix2: realistic anime style, with more focus on muscles and proportions, which most anime models lack. Also a merge of "real photo" and "anime".
- PastelMix: at least there is a clear theme, without owning the dataset.
- Lawlas's yiffymix: there are way too many species to train; the AI will get confused. yiffy-e18 is an example.
- AnythingV3: SOTA for hitting the perfect spot of the market's desire.
- Bayesian Merger, SD-Silicon: a model using auto RL to select merging hyperparameters.
```yaml
targets:
  - index: ["attentions"]
    targets:
      - targets:
          - index: ["attn1"]
```
- "CC" found that there is no clear pattern per model, as some models contribute by "FF", meanwhile some others are "sattn / xattn". Twitter post.
- Cross-Domain Few-Shot Learning with Meta Fine-Tuning. Note: not designed for SD!
- We had a hard time finding related theses / papers. (Moved to the top section.)
- Oh my god, there is some discussion: Robust fine-tuning of zero-shot models
- As stated in 6569e224.md, try to theorize things formally. You may achieve more if a more appropriate mechanism is applied.
- A nice merge: WD1.4 with the SD2.1 TTE. The TTE in WD1.4 is awful; no Astolfo must be a failure. No excuses. (A sketch of this text-encoder swap follows.)
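A hedged sketch of this kind of merge, assuming "TTE" refers to the text encoder: copy every text-encoder key from the donor checkpoint into the target. The `cond_stage_model.` prefix and the `state_dict` layout are assumptions (LDM-style checkpoints) and differ between formats:

```python
import torch

TE_PREFIX = "cond_stage_model."  # assumed text-encoder key prefix (LDM-style)

# Placeholder file names; target keeps its UNet/VAE, donor supplies the text encoder.
target = torch.load("wd14.ckpt", map_location="cpu")["state_dict"]
donor = torch.load("sd21.ckpt", map_location="cpu")["state_dict"]

for key, value in donor.items():
    if key.startswith(TE_PREFIX):
        target[key] = value  # wholesale replacement, no interpolation

torch.save({"state_dict": target}, "wd14_with_sd21_tte.ckpt")
```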
- Where is the bleach? However, there are visualization tools, and there are always people interested. Make sure you know what you're doing.
- The brute-forced result (Layer 7) is not useful for other tasks..., even though it is supported by another popular merged model (AOM2)...
- So colorful...