llama_chinese

Llama3 -> Cross-Linguistic Adaptation

Basic Information

Setup Details

  • Accelerator: NVIDIA RTX 4090D × 1
  • Platform: Linux
  • Internet: Enabled

Model and Resources

Additional Information

Important

Deployment-related updates will not be posted here. For detailed deployment updates, please refer to our repository: llama-ops.

  • How to use:

    • Install dependencies such as torch, transformers, modelscope, etc.

    • Prepare:

      $ source ./prepare.sh
    • Run LoRA training:

      $ source ./train.sh

      This step takes several hours, so please be patient; the results are worth the wait.
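The training step fine-tunes a small low-rank adapter rather than all of Llama3's weights, which is why it fits on a single GPU. As a rough, framework-free sketch of the idea (an assumption about what train.sh does internally; the real script works on the actual model with torch):

```python
# Toy illustration of the LoRA idea (not the actual training code).
# LoRA freezes the base weight W (d_out x d_in) and learns a low-rank
# update B @ A, with B (d_out x r) and A (r x d_in), so only
# r * (d_in + d_out) parameters are trained instead of d_in * d_out.

d_in, d_out, r = 8, 8, 2
full_params = d_in * d_out        # parameters in the frozen base matrix
lora_params = r * (d_in + d_out)  # trainable adapter parameters
print(full_params, lora_params)   # prints: 64 32
```

For a real model, d_in and d_out are in the thousands per layer, so the saving is far larger than in this toy.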

    • Merge the trained adapter with Llama3:

      $ source ./merge.sh
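Merging folds the learned low-rank update back into the base weights, so the merged model needs no adapter at inference time. A toy sketch of why this is lossless (an assumption about what merge.sh does; in practice the folding happens on the real Llama3 tensors):

```python
# Toy sketch of merging a LoRA adapter: (W + B @ A) x == W x + (B @ A) x,
# so folding B @ A into W changes nothing about the model's outputs.

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2x2 toy)
B = [[0.5], [0.0]]             # d_out x r, with r = 1
A = [[0.0, 1.0]]               # r x d_in
delta = matmul(B, A)           # low-rank update B @ A
W_merged = add(W, delta)       # merged weight, adapter no longer needed

x = [[2.0], [3.0]]             # a column input vector
y_adapter = add(matmul(W, x), matmul(delta, x))  # W x + (B A) x
y_merged = matmul(W_merged, x)                   # (W + B A) x
print(y_adapter == y_merged)   # prints: True
```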
  • Performance comparison

    • Before LoRA:

      $ python3 inference.py

      Q: 你好,你是谁? (Hello, who are you?)

      A: 😊 Ni Hao! I'm a helpful assistant, nice to meet you! I'm here to assist you with any questions, tasks, or topics you'd like to discuss. I'm a language model trained to understand and respond to human language, so feel free to ask me anything! 💬

    • After LoRA:

      $ python3 inference.py --model_dir Meta-Llama-3-8B-Instruct-zh-10k

      Q: 你好,你是谁? (Hello, who are you?)

      A: 你好!我是一个人工智能助手,我的名字叫做AI助手。 (Hello! I am an AI assistant; my name is AI Assistant.)
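The comparison above switches models via the `--model_dir` flag. inference.py itself is not shown in this README; purely as an illustration, a script like it could select the base or fine-tuned model as follows (the flag name is taken from the command above, the default path is an assumption):

```python
# Hypothetical sketch of inference.py's model selection (not the real script).
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--model_dir", default="Meta-Llama-3-8B-Instruct",
                    help="directory of the base or merged model to load")
# Passing the fine-tuned model directory, as in the command above:
args = parser.parse_args(["--model_dir", "Meta-Llama-3-8B-Instruct-zh-10k"])
print(args.model_dir)  # prints: Meta-Llama-3-8B-Instruct-zh-10k
```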

  • llama.cpp: Quantization

    • Prepare:

      $ source ./quantize_prepare.sh
    • Quantize:

      $ source ./quantize.sh
    • Test:

      $ source ./quantize_test.sh

      Terminate the process with Ctrl+C.
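Quantization trades a little accuracy for a much smaller model file. The scripts above delegate the real work to llama.cpp's tooling; purely to illustrate the underlying idea, here is a toy symmetric 8-bit quantizer (not llama.cpp's actual GGUF scheme):

```python
# Toy symmetric int8 quantization: store weights as small integers plus
# one scale, then reconstruct them with a bounded rounding error.

weights = [0.12, -0.5, 0.33, 0.9, -0.07]
scale = max(abs(w) for w in weights) / 127     # map the largest weight to 127
q = [round(w / scale) for w in weights]        # stored as 8-bit integers
dequant = [v * scale for v in q]               # reconstructed weights
max_err = max(abs(w - d) for w, d in zip(weights, dequant))
print(q)
print(max_err <= scale / 2)                    # error bounded by half a step
```

Real schemes quantize per block and down to 4 bits or fewer, but the trade-off is the same: smaller weights, small rounding error.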

  • llama.cpp: Deployment

    • Deploy

      • Method 1: Command line

        $ source ./deploy_cli.sh

        Similarly, stop the process with Ctrl+C.

      • Method 2: Docker (Untested)

        $ source ./deploy_docker.sh
    • Test

      $ source ./deploy_test.sh
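deploy_test.sh itself is not shown; as a hedged sketch, a test against a running llama.cpp server could build a JSON request like the one below for the server's /completion endpoint (the host, port, and parameter values are assumptions):

```python
# Build (but do not send) a request body for llama.cpp's HTTP server.
import json

payload = {
    "prompt": "你好,你是谁?",   # same question as the comparison above
    "n_predict": 64,             # maximum number of tokens to generate
    "temperature": 0.8,
}
body = json.dumps(payload, ensure_ascii=False)
print(body)

# Sending it requires the server from deploy_cli.sh to be running, e.g.:
#   curl http://127.0.0.1:8080/completion -d "$body"
```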