Why not build the model with continued pre-training? #6

zt1112 · 2023-10-10T03:45:55Z

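Great work. I have two questions I hope the authors can help answer:

1. Similar domain-specific large models usually do continued pre-training first and then SFT on an instruction dataset. Why did you choose to use SFT directly here?
2. Is SFT alone sufficient to expand the model's knowledge of the security domain? Have you run any experiments on this? Many thanks.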

ddzipp · 2024-11-13T16:03:35Z

Apologies for the delayed response; some matters in 2023 kept me from replying in a timely manner. Regarding your questions:

Due to our lack of experience during the initial exploration, we did not consider data augmentation, which would undoubtedly enhance the diversity of the dataset.

Recently, there has been some work on data augmentation in the cybersecurity field as well. IBM’s CyberPal paper gives specific details on this topic (unfortunately, the code and dataset were not published). We are working on reproducing it and plan to release our dataset and the corresponding code. We look forward to your continued interest in our work.
