mergekit_with_sparsify #9561
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9561      +/-   ##
===========================================
- Coverage    53.19%   52.87%   -0.32%
===========================================
  Files          700      716      +16
  Lines       110757   111685     +928
===========================================
+ Hits         58921    59058     +137
- Misses       51836    52627     +791
llm/tools/run_model_weight_merge.py
Outdated
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
Script documentation and API documentation need to be added in the next PR.
paddlenlp/mergekit/merge_config.py
Outdated
dot_threshold: float = field(
    default=0.99, metadata={"help": "Threshold for considering the two vectors as colinear.(Used in slerp)"}
)
scaling: bool = field(default=False, metadata={"help": "Whether to scale the weights."})
It is not very intuitive what these parameters represent. Consider renaming them or explaining them more clearly in the comments.
paddlenlp/mergekit/merge_sparsify.py
Outdated
import numpy as np


class SparsificationMethod:
There is no need to return the mask; just return the sparsified tensor.
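As an illustration of the suggestion above, here is a minimal NumPy sketch of a sparsifier that keeps its mask internal and returns only the sparsified tensor. The function name `magnitude_sparsify` and the `density` parameter are hypothetical and are not taken from the PR; magnitude-based pruning is just one possible sparsification rule.

```python
import numpy as np


def magnitude_sparsify(tensor: np.ndarray, density: float) -> np.ndarray:
    """Keep the `density` fraction of largest-magnitude entries and zero the rest.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must be in [0, 1]")
    k = int(round(density * tensor.size))
    if k == 0:
        return np.zeros_like(tensor)
    if k == tensor.size:
        return tensor.copy()
    flat = np.abs(tensor).ravel()
    # Indices of the k entries with the largest absolute value.
    keep = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(tensor.size, dtype=bool)
    mask[keep] = True
    # The mask stays local; the caller only receives the sparsified tensor.
    return np.where(mask.reshape(tensor.shape), tensor, 0).astype(tensor.dtype)
```

With this shape of API, a merge operator can call the sparsifier and use its result directly, without handling a separate mask.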
paddlenlp/mergekit/merge_linear.py
Outdated
"""
Linear interpolation between two values.
"""
if sparsify_method is not None:
Write it as:
def merge_op(self, v_0, v_1, sparsify_method=None):
    # Sparsify both inputs first when a sparsification method is supplied.
    if sparsify_method is not None:
        v_0 = sparsify_method.sparsify(v_0)
        v_1 = sparsify_method.sparsify(v_1)
    ratio = self.merge_config.linear_ratio
    # Linear interpolation: weight v_0 by (1 - ratio) and v_1 by ratio.
    v_merge = (1 - ratio) * v_0 + ratio * v_1
    return v_merge
paddlenlp/mergekit/merge_slerp.py
Outdated
def __init__(self, merge_config):
    self.merge_config = merge_config

def merge_op(self, v0, v1, eps=float(1e-8), dot_threshold=None, sparsify_method=None):
Can eps and dot_threshold simply be taken from merge_config?
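One way to read the question above is sketched below: a spherical linear interpolation (slerp) whose `eps` and `dot_threshold` come from the config object rather than from `merge_op` arguments. `SlerpConfig`, its `slerp_ratio` and `eps` fields, and the `SlerpMerge` class are hypothetical stand-ins; only `dot_threshold` with a default of 0.99 appears in the PR's config snippet.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SlerpConfig:
    """Hypothetical stand-in for the PR's merge config."""

    slerp_ratio: float = 0.5  # assumed name for the interpolation factor
    dot_threshold: float = 0.99  # colinearity threshold, as in the PR snippet
    eps: float = 1e-8  # assumed to live in the config as well


class SlerpMerge:
    def __init__(self, merge_config):
        self.merge_config = merge_config

    def merge_op(self, v0, v1):
        cfg = self.merge_config
        t = cfg.slerp_ratio
        # Normalize copies of the inputs to measure the angle between them.
        v0_unit = v0 / (np.linalg.norm(v0) + cfg.eps)
        v1_unit = v1 / (np.linalg.norm(v1) + cfg.eps)
        dot = float(np.sum(v0_unit * v1_unit))
        # Nearly colinear vectors: fall back to plain linear interpolation.
        if abs(dot) > cfg.dot_threshold:
            return (1 - t) * v0 + t * v1
        theta = np.arccos(np.clip(dot, -1.0, 1.0))
        sin_theta = np.sin(theta)
        s0 = np.sin((1 - t) * theta) / sin_theta
        s1 = np.sin(t * theta) / sin_theta
        return s0 * v0 + s1 * v1
```

Reading both values from the config keeps the `merge_op` signature the same across merge methods, which appears to be the point of the question.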
paddlenlp/mergekit/merge_linear.py
Outdated
Linear interpolation between two values.
"""
if self.merge_config.sparsify_type is not None:
    sparsify = SparsificationMethod(self.merge_config)
Don't initialize it here.
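A possible reading of this comment is that the sparsifier should be built once, outside `merge_op`, instead of on every call. The sketch below assumes that interpretation. `ThresholdSparsify` is a hypothetical placeholder for the PR's sparsification class, and the config is mocked with `SimpleNamespace`.

```python
from types import SimpleNamespace

import numpy as np


class ThresholdSparsify:
    """Hypothetical sparsifier: zero out entries whose magnitude is below a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def sparsify(self, tensor):
        return np.where(np.abs(tensor) >= self.threshold, tensor, 0.0)


class LinearMerge:
    def __init__(self, merge_config, sparsify_method=None):
        self.merge_config = merge_config
        # Constructed once by the caller and reused for every tensor.
        self.sparsify_method = sparsify_method

    def merge_op(self, v_0, v_1):
        if self.sparsify_method is not None:
            v_0 = self.sparsify_method.sparsify(v_0)
            v_1 = self.sparsify_method.sparsify(v_1)
        ratio = self.merge_config.linear_ratio
        return (1 - ratio) * v_0 + ratio * v_1


# Usage: the config and the sparsifier are created once, then shared.
config = SimpleNamespace(linear_ratio=0.25)
merger = LinearMerge(config, ThresholdSparsify(0.5))
merged = merger.merge_op(np.array([4.0, 0.1]), np.array([0.2, 4.0]))
```

Constructing the sparsifier per call is not wrong in itself, but hoisting it avoids repeated setup when `merge_op` runs once per weight tensor.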
paddlenlp/mergekit/merge_slerp.py
Outdated
"""
if dot_threshold is None:
    dot_threshold = self.merge_config.dot_threshold
if self.merge_config.sparsify_type is not None:
Same as above.
paddlenlp/mergekit/merge_model.py
Outdated

class MergeModel:
    def __init__(self, merge_config):
        self.merge_config = merge_config
Why are there fewer methods here now?
paddlenlp/mergekit/merge_model.py
Outdated

def merge_model(self, model_path0, model_path1, output_path, base_path=None):
    is_safetensor0 = self.check_model_path(model_path0)
    is_safetensor1 = self.check_model_path(model_path1)
base_path also needs to be checked here.
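The request above could be met by running the same validation on every path that is supplied, including the optional `base_path`. The sketch below is a standalone approximation: the weight file names `model.safetensors` and `model_state.pdparams` and the behaviour of `check_model_path` are assumptions, not copied from the PR.

```python
import os

SAFE_WEIGHT_NAME = "model.safetensors"  # assumed file name
PADDLE_WEIGHT_NAME = "model_state.pdparams"  # assumed file name


def check_model_path(model_path):
    """Return True for safetensors weights, False for pdparams, else raise."""
    if os.path.isfile(os.path.join(model_path, SAFE_WEIGHT_NAME)):
        return True
    if os.path.isfile(os.path.join(model_path, PADDLE_WEIGHT_NAME)):
        return False
    raise ValueError(f"No supported weight file found in {model_path}")


def check_all_paths(model_path0, model_path1, base_path=None):
    """Validate both model paths and, when given, the base model path too."""
    paths = [model_path0, model_path1]
    if base_path is not None:
        paths.append(base_path)
    return [check_model_path(path) for path in paths]
```

Checking `base_path` up front means a missing or unsupported base model fails before any weights are loaded.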
paddlenlp/mergekit/merge_model.py
Outdated
    raise ValueError("Weights total_size mismatch. " "Please make sure you load the correct weight file")
if index0["weight_map"].keys() != index1["weight_map"].keys():
    raise ValueError("Weights weight_map mismatch. Please make sure you load the correct weight file")
if self.merge_config.merge_type in {"ties", "dare", "della", "dare_ties", "della_linear"}:
What is the purpose of adding the merge_type check here?
fce8d97 to d534a72
@@ -0,0 +1,35 @@
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
#
Could an example config file be provided here?
Since this is fairly generic, writing a dedicated JSON file for each config does not seem necessary. As with LoRA merge, giving a run command is enough:
python ./tools/merge_weight.py
The corresponding documentation will follow in the next PR.
import numpy as np


class SparsifyMethod:
Can these sparsification methods only run on the CPU?
For now this PR only supports numpy; a version based on paddle tensors will be developed later.
"""
# init weight
weight_list = self.merge_config.weight_list
if self.merge_config.normalize:
Wouldn't it be better to enable this normalize option by default?
In terms of results, is True or False the recommended setting?
Changed to True; True is the recommended setting.
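To make the `normalize` discussion concrete, the sketch below shows one plausible meaning of the flag: rescaling `weight_list` so the weights sum to 1 before a weighted merge. The helper names and the exact normalization rule are assumptions; the PR's own implementation may differ.

```python
import numpy as np


def normalize_weights(weight_list, normalize=True):
    """Return the merge weights, rescaled to sum to 1 when `normalize` is True."""
    weights = np.asarray(weight_list, dtype=np.float64)
    if normalize:
        total = weights.sum()
        if total == 0:
            raise ValueError("weight_list must not sum to zero when normalize=True")
        weights = weights / total
    return weights


def weighted_merge(tensors, weight_list, normalize=True):
    """Weighted sum of same-shaped tensors (hypothetical helper)."""
    weights = normalize_weights(weight_list, normalize)
    merged = np.zeros_like(tensors[0], dtype=np.float64)
    for weight, tensor in zip(weights, tensors):
        merged += weight * tensor
    return merged
```

With normalization on, weights such as `[1, 1]` produce an average rather than a sum, so the merged parameters stay on the same scale as the inputs. That is a likely reason for preferring True as the default.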
    with fast_safe_open(os.path.join(model_path, self.safe_weight_name()), framework="numpy") as f:
        for k in f.keys():
            state_dict[k] = f.get_tensor(k)
elif file_type == "pdparams":
Does storage in TP (tensor-parallel) format need to be considered here? If TP-format storage is not supported, should an error be raised?
Not supported for now; check_model_path checks the model weight storage type first.
LGTM
PR types
New features
PR changes
Others
Description
add merge kit