Merge Models with Non-Standard Architectures (e.g., Multimodal Models) #450

Open
wants to merge 19 commits into
base: main

ElliotStein

This update extends merge capabilities to models without predefined architecture JSON files, enabling support for models with non-standard architectures such as multimodal models.

Key Changes:

  • Automatic Architecture Info Creation: For models without an architecture JSON specification in /_data/architectures, an ArchitectureInfo class is automatically generated by reading parameter names from the saved model.
    - Efficiency Note: Reading parameter names is significantly faster for models saved in .safetensors format than in .bin, since a .safetensors file lists its tensor names in a small header that can be read without loading the weights.
  • Parameter Organization: Pre/post weights aren’t explicitly separated; instead, parameters are grouped into layers based on the pattern .{integer}. in their names.
  • Architectural Compatibility Requirement: Currently, only models with identical architectures can be merged.
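
To illustrate the first two points, here is a minimal sketch (not the code in this PR) of reading parameter names from a .safetensors header and grouping them by the .{integer}. pattern. The function names read_safetensors_names and group_by_layer are hypothetical, and the grouping key (everything up to and including the first integer segment) is an assumption about how layers might be keyed. It uses only the standard library and relies on the published .safetensors layout: an 8-byte little-endian header length followed by a JSON header.

```python
import json
import re
import struct
from collections import defaultdict

# Matches the first ".<integer>." segment in a parameter name.
LAYER_INDEX = re.compile(r"\.(\d+)\.")


def read_safetensors_names(path):
    """Return tensor names by reading only the JSON header, never the weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional non-tensor entry in the header.
    return [name for name in header if name != "__metadata__"]


def group_by_layer(names):
    """Split names into (non-layer names, {layer_key: [names]}).

    Assumed keying rule: "model.layers.0.attn.weight" -> "model.layers.0".
    """
    layers = defaultdict(list)
    other = []
    for name in names:
        match = LAYER_INDEX.search(name)
        if match:
            layers[name[: match.end(1)]].append(name)
        else:
            other.append(name)
    return other, dict(layers)
```

With names such as embed.weight, layers.0.attn.weight, and layers.1.attn.weight, this sketch leaves embed.weight ungrouped and produces the layer keys layers.0 and layers.1.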

Modifications to Core Functionality:

  • get_architecture_info: Now returns False and emits a warning on failure, rather than raising an error.
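
The calling pattern this enables might look roughly like the sketch below. Everything here is a simplified stand-in: lookup_architecture, KNOWN_ARCHITECTURES, and resolve_architecture are hypothetical names, and the real get_architecture_info does not necessarily share this signature. The point is only that the caller checks for False and falls back to inferring structure from parameter names instead of catching an exception.

```python
import warnings

# Hypothetical stand-in for the predefined architecture JSON specs.
KNOWN_ARCHITECTURES = {"KnownModel": {"layers": "predefined"}}


def lookup_architecture(architecture_name):
    """Return a predefined spec, or warn and return False if none exists."""
    info = KNOWN_ARCHITECTURES.get(architecture_name)
    if info is None:
        warnings.warn(
            f"No predefined architecture for {architecture_name!r}; "
            "falling back to inference from parameter names.",
            stacklevel=2,
        )
        return False
    return info


def resolve_architecture(architecture_name, param_names):
    """Prefer the predefined spec; otherwise infer from parameter names."""
    info = lookup_architecture(architecture_name)
    if info is False:
        return {"source": "inferred", "params": sorted(param_names)}
    return {"source": "predefined", "spec": info}
```

Models with a predefined spec take the first branch unchanged, which matches the statement that existing merges are unaffected.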

Additional Updates:

  • Minor bug fixes to testing and GPT-2 configuration.

This change does not impact functionality for models with predefined architecture JSON specifications (i.e., all currently supported merges).

@ElliotStein ElliotStein changed the title missing param in gpt2 config Enable Support for Merging Models with Non-Standard Architectures (e.g., Multimodal Models) Oct 31, 2024
@ElliotStein ElliotStein changed the title Enable Support for Merging Models with Non-Standard Architectures (e.g., Multimodal Models) Merge Models with Non-Standard Architectures (e.g., Multimodal Models) Oct 31, 2024
ElliotStein and others added 4 commits November 4, 2024 17:20
…atch, or match when a prefix is removed (e.g. vision_block.layer.0 and layer.0), their overlapping layers can now be merged.
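
As an illustration of the prefix-removal matching mentioned in that commit message, here is a rough sketch under stated assumptions; it is not taken from the PR. The helper names suffixes and match_overlapping are hypothetical, and the sketch only handles the case where the first model's names carry the extra prefix.

```python
def suffixes(name):
    """Return the full dotted name, then each shorter suffix.

    "a.b.c" -> ["a.b.c", "b.c", "c"]
    """
    parts = name.split(".")
    return [".".join(parts[i:]) for i in range(len(parts))]


def match_overlapping(names_a, names_b):
    """Map each name in A to a name in B that is identical, or identical
    once leading dotted components are removed from the A name."""
    available = set(names_b)
    matches = {}
    for name in names_a:
        # Longest candidate first, so an exact match wins over a stripped one.
        for candidate in suffixes(name):
            if candidate in available:
                matches[name] = candidate
                break
    return matches
```

For example, vision_block.layer.0.weight in one model would pair with layer.0.weight in the other, while a name with no counterpart is simply left out of the result.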