- Generate a ChatRWKV weight file with `v2/convert_model.py` (in the ChatRWKV repo) and strategy `cuda fp16` (see the sketch after this list).
- Generate a faster-rwkv weight file with `tools/convert_weight.py`. For example, `python3 tools/convert_weight.py RWKV-4-World-CHNtuned-1.5B-v1-20230620-ctx4096-converted-fp16.pth rwkv-4-1.5b-chntuned-fp16.fr`.
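For reference, the first step might look like the following. This is only a sketch: the file names are placeholders, and the `--in`/`--out`/`--strategy` flags follow ChatRWKV's `v2/convert_model.py` interface.

```bash
# Run inside the ChatRWKV repo; file names are examples only.
python3 v2/convert_model.py \
    --in RWKV-4-World-CHNtuned-1.5B-v1-20230620-ctx4096.pth \
    --out RWKV-4-World-CHNtuned-1.5B-v1-20230620-ctx4096-converted-fp16.pth \
    --strategy "cuda fp16"
```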
```bash
mkdir build
cd build
cmake -DFR_ENABLE_CUDA=ON -DCMAKE_BUILD_TYPE=Release -GNinja ..
ninja
```

Run `./chat tokenizer_file_path weight_file_path "cuda fp16"`. For example, `./chat ../tokenizer_model ../rwkv-4-1.5b-chntuned-fp16.fr "cuda fp16"`.
- Generate a ChatRWKV weight file with `v2/convert_model.py` (in the ChatRWKV repo) and strategy `cuda fp32` or `cpu fp32`. Note that although fp32 is used here, the real dtype is determined in the following step.
- Generate a faster-rwkv weight file with `tools/convert_weight.py`.
- Export the ncnn model with `./export_ncnn <input_faster_rwkv_model_path> <output_path_prefix>`. You can download a pre-built `export_ncnn` from Releases if you are a Linux user, or build it yourself. The whole pipeline is sketched after this list.
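Putting the three steps together, the pipeline might look like this. The file names are placeholders, and the `--in`/`--out`/`--strategy` flags again follow ChatRWKV's converter interface.

```bash
# 1. In the ChatRWKV repo: produce an fp32 ChatRWKV weight file (names are examples).
python3 v2/convert_model.py --in model.pth --out model-fp32.pth --strategy "cpu fp32"
# 2. In the faster-rwkv repo: convert it to the faster-rwkv format.
python3 tools/convert_weight.py model-fp32.pth model-fp32.fr
# 3. Export the ncnn model; this writes <prefix>.param, <prefix>.bin and <prefix>.config.
./export_ncnn model-fp32.fr rwkv-4-chntuned-1.5b
```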
Download the pre-built Android AAR library from Releases, or run `aar/build_aar.sh` to build it yourself.

For the path of the Android NDK and the toolchain file, please refer to the Android NDK docs.
```bash
mkdir build
cd build
cmake -DFR_ENABLE_NCNN=ON -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-28 -DANDROID_NDK=xxxx -DCMAKE_TOOLCHAIN_FILE=xxxx -DCMAKE_BUILD_TYPE=Release -GNinja ..
ninja
```
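As a concrete illustration, with the NDK installed under `$HOME/Android/Sdk/ndk` (the path and version below are assumptions; substitute your own installation), the configure step could look like:

```bash
# Assumed NDK location; adjust to your installation.
export ANDROID_NDK=$HOME/Android/Sdk/ndk/25.2.9519653
cmake -DFR_ENABLE_NCNN=ON -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-28 \
      -DANDROID_NDK=$ANDROID_NDK \
      -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
      -DCMAKE_BUILD_TYPE=Release -GNinja ..
```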
- Copy `chat` into the Android phone (using adb or Termux).
- Copy the tokenizer_model and the ncnn models (.param, .bin and .config) into the Android phone (using adb or Termux).
- Run `./chat tokenizer_model ncnn_models_basename "ncnn fp16"` in adb shell or Termux. For example, if the ncnn models are named rwkv-4-chntuned-1.5b.param, rwkv-4-chntuned-1.5b.bin and rwkv-4-chntuned-1.5b.config, the command should be `./chat tokenizer_model rwkv-4-chntuned-1.5b "ncnn fp16"`. An adb-based example is sketched after this list.
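For example, deploying over adb might look like this (the on-device directory is an assumption; `/data/local/tmp` is a common choice for running executables via adb):

```bash
# Push the executable, tokenizer and ncnn model files to the device.
adb push build/chat tokenizer_model rwkv-4-chntuned-1.5b.param rwkv-4-chntuned-1.5b.bin rwkv-4-chntuned-1.5b.config /data/local/tmp/
# Make the binary executable and run it on the device.
adb shell "cd /data/local/tmp && chmod +x chat && ./chat tokenizer_model rwkv-4-chntuned-1.5b 'ncnn fp16'"
```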
- Android System >= 9.0
- RAM >= 4GB (for the 1.5B model)
- No hard requirement for CPU. More powerful = faster.
Run one of the following commands in Termux to download prebuilt executables and models automatically. The download script can resume partially downloaded files, so feel free to Ctrl-C and restart it if the speed is too slow.
Executables, 1.5B CHNtuned int8 model, 1.5B CHNtuned int4 model and 0.1B world int8 model:

```bash
curl -L -s https://raw.githubusercontent.com/daquexian/faster-rwkv/master/download_binaries_and_models_termux.sh | bash -s 3
```

Executables, 1.5B CHNtuned int4 model and 0.1B world int8 model:

```bash
curl -L -s https://raw.githubusercontent.com/daquexian/faster-rwkv/master/download_binaries_and_models_termux.sh | bash -s 2
```

Executables and 0.1B world int8 model:

```bash
curl -L -s https://raw.githubusercontent.com/daquexian/faster-rwkv/master/download_binaries_and_models_termux.sh | bash -s 1
```

Executables only:

```bash
curl -L -s https://raw.githubusercontent.com/daquexian/faster-rwkv/master/download_binaries_and_models_termux.sh | bash -s 0
```
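If the connection is flaky, it may be easier to save the script once and re-run it until everything finishes (same script and arguments as above):

```bash
# Save the download script locally, then re-run it to resume as needed.
curl -L -o download.sh https://raw.githubusercontent.com/daquexian/faster-rwkv/master/download_binaries_and_models_termux.sh
bash download.sh 0   # or 1/2/3, as listed above
```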
- Install the `rwkv2onnx` Python package with `pip install rwkv2onnx`.
- Run `rwkv2onnx <input path> <output path> <ChatRWKV path>`. For example, `rwkv2onnx ~/RWKV-5-World-0.1B-v1-20230803-ctx4096.pth ~/RWKV-5-0.1B.onnx ~/ChatRWKV`.
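To sanity-check the exported file, you can run the `onnx` package's model checker on it (a minimal check, assuming `pip install onnx` and the example output file above in the current directory):

```bash
# Loads the exported model and validates its graph structure.
python3 -c "import onnx; onnx.checker.check_model(onnx.load('RWKV-5-0.1B.onnx'))"
```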
- JNI
- v5 models support (models are published at https://huggingface.co/daquexian/fr-models/tree/main)
- ABC music models support (models are published at https://huggingface.co/daquexian/fr-models/tree/main)
- CI
- ARM NEON int8 (~2x speedup compared to fp16)
- ARM NEON int4 (>2x speedup compared to fp16)
- MIDI music models support
- custom initial state support
- export ONNX
- seq mode
- CUDA
- Others
- Raven models support
- more backends..
- simplify model conversion