Why not build the large model via continued pre-training? #6

Open
zt1112 opened this issue Oct 10, 2023 · 1 comment

Comments

@zt1112

zt1112 commented Oct 10, 2023

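Great work! I have two questions that I hope the authors can help answer:

1. Similar domain-specific large models usually go through continued pre-training first and are then fine-tuned (SFT) on an instruction dataset. Why did you choose to use SFT directly here?
2. Is SFT alone sufficient to expand the model's knowledge of the security domain? Have you run any experiments on this? Many thanks.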

@ddzipp
Owner

ddzipp commented Nov 13, 2024

Apologies for the delayed response; some matters in 2023 kept me from replying in a timely manner. Regarding your questions:

Because we lacked experience during the initial exploration, we did not consider data augmentation, which would undoubtedly have enhanced the diversity of the dataset.

Recently, there has also been some work on data augmentation in the cybersecurity field. IBM's CyberPal describes specific details on this topic (unfortunately, the work is closed-source: neither the code nor the dataset was published). We are working on reproducing it and plan to release our dataset and the corresponding code. We hope you will continue to follow our work.
