We provide a simple HuggingFace-based inference script in inference.py. You are welcome to start from this script, but using it is not mandatory. To reduce the inference time over the whole dataset, you can try batch inference; for a fair comparison, the maximum batch size is limited to 4. We will evaluate your method on both "time per sample" and "total time over all samples." Note that the dataset contains only the instructions, so your speed-up method must produce exactly the same outputs as the current inference.py.
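As a minimal sketch of the batching constraint above, the harness below splits the instructions into chunks of at most 4 and records both total and per-sample wall-clock time. The `generate_batch` callable is a placeholder for your own model call (e.g. a tokenizer + `model.generate` pipeline from inference.py); its name and signature are assumptions, not part of the provided script.

```python
import time

MAX_BATCH_SIZE = 4  # upper limit imposed by the evaluation rules


def batched(items, batch_size=MAX_BATCH_SIZE):
    """Yield successive chunks of at most batch_size items, preserving order."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def run_inference(instructions, generate_batch):
    """Run generate_batch over instructions in batches of <= MAX_BATCH_SIZE.

    generate_batch is a hypothetical user-supplied function that maps a list
    of instruction strings to a list of output strings of the same length.
    Returns (outputs, total_time, time_per_sample).
    """
    outputs = []
    start = time.perf_counter()
    for batch in batched(instructions):
        outputs.extend(generate_batch(batch))
    total = time.perf_counter() - start
    return outputs, total, total / len(instructions)
```

Because outputs are extended in order, any batching scheme wrapped this way keeps the sample order identical to one-by-one inference, which is what the exact-output requirement demands.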
Download the model from [https://pan.baidu.com/s/1ls7baMfok8NtucCKVOyS1Q] (code: 7952)
Download the dataset from [https://pan.baidu.com/s/1ruJAzxJrctRwAp8RLdsUgA] (code: u3k7)
Start your Efficient-Inference journey!