This repository is forked from https://github.com/xming521/WeClone. The only differences are that it adds support for datasets extracted with WeChatMsg and optimizes the data preprocessing to improve data quality.
To fine-tune the LLM on your chat history, first download MemoTrace. Log in to the WeChat PC client and migrate your WeChat data to it. Then export your chat histories to CSV files and follow the procedure described in the WeClone repository to train the model.
Please note that this repository is for personal use only. Do not copy or share it.
- Migrate your WeChat data to your PC;
- Export your chat history to a .csv file through MemoTrace;
- Create a directory "data/csv/chat" and place the exported .csv file in it;
- Install the required Python packages with `pip install -r requirements.txt`;
- Run `python make_dataset/csv_to_json.py` to convert the dataset to a JSON file;
- Revise `default_prompt` in "src/template.py" to give the model an identity;
- Run `python src/train_sft.py` to train the model (if you are in mainland China, you may want to download the model from ModelScope instead of HuggingFace; to do so, set `model_name_or_path` to `ZhipuAI/chatglm3-6b` in "settings.json" and set the environment variable with `export USE_MODELSCOPE_HUB=1`);
- Run `python src/web_demo.py` to launch the chatbot.
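
The two manual steps above (placing the exported file under "data/csv/chat" and switching to ModelScope) can also be scripted. The following is a minimal sketch, not part of this repository: the helper names `stage_csv`, `set_key`, and `use_modelscope` are hypothetical, and because the layout of "settings.json" is not described here, the sketch assumes only that the file is valid JSON and searches it recursively for the `model_name_or_path` key.

```python
import json
import os
import shutil
from pathlib import Path


def stage_csv(exported_csv, repo_root="."):
    """Copy an exported .csv file into <repo_root>/data/csv/chat; return the new path."""
    target_dir = Path(repo_root) / "data" / "csv" / "chat"
    target_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the path of the file it created inside target_dir.
    return Path(shutil.copy2(exported_csv, target_dir))


def set_key(node, key, value):
    """Overwrite every occurrence of `key` in a parsed JSON tree; return how many were changed."""
    count = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                count += 1
            else:
                count += set_key(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            count += set_key(item, key, value)
    return count


def use_modelscope(settings_path="settings.json", model="ZhipuAI/chatglm3-6b"):
    """Point model_name_or_path at `model` and set USE_MODELSCOPE_HUB=1 for this process."""
    settings_path = Path(settings_path)
    settings = json.loads(settings_path.read_text(encoding="utf-8"))
    changed = set_key(settings, "model_name_or_path", model)
    if changed == 0:
        # Assumption violated: the key is not where (or what) this sketch expects.
        raise KeyError("model_name_or_path not found in " + str(settings_path))
    settings_path.write_text(
        json.dumps(settings, ensure_ascii=False, indent=4), encoding="utf-8"
    )
    # Only affects this Python process and the child processes it starts.
    os.environ["USE_MODELSCOPE_HUB"] = "1"
    return changed
```

Note that `os.environ` changes are not visible to the shell that launched the script, so `use_modelscope()` only helps if training is started from the same Python process (or one of its children). When running `python src/train_sft.py` directly from a terminal, use the `export` command from the step above instead.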