OK. Model merging is now popular in the LLM community, while it has faded in the SD community. Crap.
- Loads of academic papers are recorded on HuggingFace, focusing on LLMs.
- More on techniques used while merging. Definitely not MBW.
- Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning (comparative study)
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities (comparative study)
- Model merging with SVD to tie the Knots (tl;dr: merging LoRAs is OK)
- DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling
- Arcee's MergeKit: A Toolkit for Merging Large Language Models (toolkit)
- Dataless Knowledge Fusion by Merging Weights of Language Models
- Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
- AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models
- Acceleration of Stochastic Approximation by Averaging, a.k.a. "Polyak averaging"
- My merge. Merging / ensembling with uniform (average) weighting is a well-discussed method and a good choice, at least for generalization. Different papers call it "uniform soup" or "isotropic merge"; a minimal sketch follows.
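A minimal sketch of what the uniform soup / average merge means in practice, assuming plain `.ckpt` files whose weights live under a `state_dict` key (the file names and that layout are placeholders; they vary by checkpoint format):

```python
import torch

# Hypothetical input checkpoints; any number of models works the same way.
paths = ["model_a.ckpt", "model_b.ckpt", "model_c.ckpt"]
state_dicts = [torch.load(p, map_location="cpu")["state_dict"] for p in paths]

merged = {}
for key in state_dicts[0]:
    tensors = [sd[key].float() for sd in state_dicts if key in sd]
    merged[key] = torch.stack(tensors).mean(dim=0)  # uniform (1/N) weighting

torch.save({"state_dict": merged}, "uniform_soup.ckpt")
```

Polyak averaging (above) is the same running-average idea, applied to the iterates of a single training run instead of separate checkpoints.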
- SD-Mecha: a merger focused on serialization, extensibility, and efficiency, supporting multiple algorithms.
- Merge with any recognizable pattern: sd-webui-supermerger, and "Elemental Merge" in sd-webui-supermerger (per-key weighting, sketched below).
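This is not supermerger's actual code, just a hedged sketch of the idea behind per-element ("Elemental") merging: pick the interpolation ratio per parameter by matching patterns against the state_dict key names. The patterns and ratios below are illustrative only:

```python
import re
import torch

def elemental_merge(sd_a, sd_b, rules, default_alpha=0.5):
    """Weighted sum per key; alpha comes from the first regex that matches
    the key name. `rules` is a list of (pattern, alpha) pairs."""
    merged = {}
    for key, a in sd_a.items():
        if key not in sd_b:
            merged[key] = a
            continue
        alpha = default_alpha
        for pattern, value in rules:
            if re.search(pattern, key):
                alpha = value
                break
        merged[key] = (1.0 - alpha) * a.float() + alpha * sd_b[key].float()
    return merged

# Illustrative rules: lean cross-attention towards model B, feed-forward towards model A.
rules = [("attn2", 0.8), ("attn1", 0.5), (r"ff\.net", 0.2)]
```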
- Some explanation (how to use, rather than why it works): BlockMergeExplained
- Current meta: merging multiple LoRAs. I don't know the exact procedure, because I have never done either LoRA training or merging myself; the usual math is sketched below for reference.
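For reference only (not taken from any particular UI), the commonly described math for folding LoRAs into a base weight: each LoRA contributes a low-rank update `up @ down`, scaled by its strength and `alpha / rank`. A sketch for a single layer, with illustrative variable names:

```python
import torch

def fold_loras(base_weight, loras):
    """base_weight: [out, in] tensor of the original layer.
    loras: list of (down, up, alpha, strength), where down is [rank, in]
    and up is [out, rank]; the low-rank update is up @ down."""
    w = base_weight.clone().float()
    for down, up, alpha, strength in loras:
        rank = down.shape[0]
        w += strength * (alpha / rank) * (up.float() @ down.float())
    return w
```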
- Great potential: selecting the best merging hyperparameters by reinforcement learning (Medium article, sdweb-auto-MBW). Note: the score metric may not fit everybody, just like what WD / SD / NAI did. ImageReward would be more relatable.
- Boosting is now available as sd-webui-bayesian-merger, and autombw v2 supports both Bayesian boosting and ImageReward. I will include it in my next merge. Currently I am using my own fork to make them work, mainly code updates. Also see the findings focusing on related works, and the general findings. The overall optimization loop is sketched below.
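A hedged sketch of the loop these tools automate: propose MBW-style block weights, merge, generate images, score them, and keep the best candidate. The scorer below is a stand-in so the sketch runs on its own; in practice the scoring step would be ImageReward (or another metric) applied to images generated with the merged model, and the random search would be replaced by Bayesian optimization:

```python
import random

BLOCKS = 25  # e.g. 12 IN + 1 MID + 12 OUT blocks for SD1.x-style MBW

def score_candidate(weights):
    # Placeholder objective so the sketch is self-contained.
    # In practice: merge with these block weights, generate images,
    # then score them with ImageReward / an aesthetic model instead.
    target = [0.5] * BLOCKS
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

best_score, best_weights = float("-inf"), None
for _ in range(200):  # random search; Bayesian optimization needs far fewer trials
    weights = [random.random() for _ in range(BLOCKS)]
    score = score_candidate(weights)
    if score > best_score:
        best_score, best_weights = score, weights

print(best_score, best_weights[:3])
```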
- Circulating in QQ: MBW魔法密录02-14 ("MBW Magic Secret Records 02-14") and 模型基础理论 ("Basic Model Theory")
- Commentary by GhostShell, the author of GhostMix, and the description of GhostMix; cited purely for documentary purposes. Bilibili mirror 1, Bilibili mirror 2
- "rimo_random_mix", which is written in Chinese. Note: BayesianOptimization
- Now "merging" is a sustainable act.
- majicMIX realistic: a better cosplay model (series). The "chained" mix is great, covering nice models from all "dimensions".
- Chilloutmix: a cosplay model. However, there are no cosplayers in the dataset; it just merges "real photo" and "anime" models together.
- AbyssOrangeMix2: realistic anime style, with more focus on muscles and proportions, which most anime models lack. Also a merge of "real photo" and "anime".
- PastelMix: at least there is a clear theme, without owning the dataset.
- Lawlas's yiffymix: there are way too many species to train; the AI will get confused. yiffy-e18 is an example.
- AnythingV3: SOTA for hitting the perfect spot of the market's desire.
- Bayesian Merger, SD-Silicon: a model using auto RL to select merging hyperparameters.
```yaml
targets:
  - index: ["attentions"]
    targets:
      - targets:
          - index: ["attn1"]
```
- "CC" found that there is no clear pattern per model, as some models contribute by "FF", meanwhile some others are "sattn / xattn". Twitter post.
- Cross-Domain Few-Shot Learning with Meta Fine-Tuning. Note: not designed for SD!
- We had a hard time finding related theses / papers. (Moved to the top section.)
- Oh my god, there is some discussion: Robust fine-tuning of zero-shot models
- As stated in 6569e224.md, try to theorize things formally. You may achieve more if a more appropriate mechanism is applied.
- A nice merge: WD1.4 with the SD2.1 TTE. The TTE in WD1.4 is awful; no Astolfo must be a failure. No excuses. (A sketch of this text-encoder swap follows.)
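A hedged sketch of this kind of merge, assuming "TTE" refers to the text encoder: copy every text-encoder key from the donor checkpoint into the target. The `cond_stage_model.` prefix and the `state_dict` layout are assumptions (LDM-style checkpoints) and differ between formats:

```python
import torch

TE_PREFIX = "cond_stage_model."  # assumed text-encoder key prefix (LDM-style)

# Placeholder file names; target keeps its UNet/VAE, donor supplies the text encoder.
target = torch.load("wd14.ckpt", map_location="cpu")["state_dict"]
donor = torch.load("sd21.ckpt", map_location="cpu")["state_dict"]

for key, value in donor.items():
    if key.startswith(TE_PREFIX):
        target[key] = value  # wholesale replacement, no interpolation

torch.save({"state_dict": target}, "wd14_with_sd21_tte.ckpt")
```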
- Where is the bleach? However, there are visualization tools, and there are always people interested. Make sure you know what you're doing.
- The brute-forced result (Layer 7) is not useful for other tasks..., even though it is supported by another popular merged model (AOM2)...
- So colorful...