This is a Dora program that uses the qwenvl2 model to drive the car. A camera mounted on the car provides the input to the vision-language model, and the model outputs one of four instructions (forward, backward, turn left, turn right) to drive the car autonomously.
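The step from model output to driving instruction can be sketched as below. This is a minimal illustration, assuming the model replies with plain text containing one of the four instruction words; the `Instruction` enum and `parse_instruction` function are hypothetical names, not part of this project.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical instruction set mirroring the four instructions named above.
enum class Instruction { Forward, Backward, TurnLeft, TurnRight, Unknown };

// Lower-case the model's reply and search it for one of the instruction words.
// Assumption: the reply is free text such as "Turn left" or "forward".
Instruction parse_instruction(std::string reply) {
    std::transform(reply.begin(), reply.end(), reply.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (reply.find("left") != std::string::npos) return Instruction::TurnLeft;
    if (reply.find("right") != std::string::npos) return Instruction::TurnRight;
    if (reply.find("backward") != std::string::npos) return Instruction::Backward;
    if (reply.find("forward") != std::string::npos) return Instruction::Forward;
    // Anything else is reported as Unknown so the caller can decide what to do.
    return Instruction::Unknown;
}
```

Returning `Unknown` for unrecognized replies lets the caller choose a safe reaction, for example stopping the chassis, instead of guessing a direction.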
The mickrobot_chassis node is implemented in C/C++. It reads keyboard input and controls the movement of the chassis:
- w, x, a, d: forward, backward, left, right
- Up, down, left, right (arrow keys): forward, backward, left, right
- s: stop
- Number keys 1, 2, 3: set the speed to 1 m/s, 2 m/s, 3 m/s
==Note:== When using this node, turn on the remote control switch in the upper left corner to set the chassis to automatic driving mode.
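The key bindings above could be mapped to velocity commands roughly as follows. This is a sketch, not the node's actual code: the `VelocityCommand` struct and both function names are hypothetical, the layout w/x/a/d = forward/backward/left/right is assumed, and a positive angular velocity is assumed to mean a left (counter-clockwise) turn.

```cpp
// Hypothetical command type: forward speed in m/s, turn rate in rad/s.
struct VelocityCommand {
    double linear_x;
    double angular_z;
};

// Number keys 1, 2, 3 select a speed of 1, 2, or 3 m/s.
// Returns false and leaves `speed` unchanged for any other key.
bool key_to_speed(char key, double& speed) {
    if (key < '1' || key > '3') return false;
    speed = static_cast<double>(key - '0');
    return true;
}

// Translate a movement key into a velocity command.
// Returns false for keys that are not movement keys.
bool key_to_command(char key, double speed, double turn_rate, VelocityCommand& out) {
    switch (key) {
        case 'w': out = {speed, 0.0}; return true;       // forward
        case 'x': out = {-speed, 0.0}; return true;      // backward
        case 'a': out = {0.0, turn_rate}; return true;   // turn left (assumed positive)
        case 'd': out = {0.0, -turn_rate}; return true;  // turn right
        case 's': out = {0.0, 0.0}; return true;         // stop
        default: return false;
    }
}
```

Arrow keys are left out because they arrive as multi-byte escape sequences on most terminals and need separate handling.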
Grant permissions on the serial port:
sudo chmod 777 /dev/ttyUSB0
Start the mickrobot_chassis node with Dora:
dora start mickrobot_dataflow.yml --name test
dora logs test mickrobot_chassis
The mickrobot_chassis node receives a JSON string stream named CmdVelTwist and reads the following fields from it to control the chassis of the car:
j_cmd_vel["linear"]["x"];
j_cmd_vel["linear"]["y"];
j_cmd_vel["linear"]["z"];
j_cmd_vel["angular"]["x"];
j_cmd_vel["angular"]["y"];
j_cmd_vel["angular"]["z"];
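For a node that produces CmdVelTwist, a message with these six fields could be assembled as in the sketch below. The field names come from the list above; the function name `make_cmd_vel_json` is hypothetical, and it is an assumption that only `linear.x` and `angular.z` are non-zero for this chassis and that the values are plain JSON numbers. The sketch uses only the standard library; a real node would more likely use its JSON library.

```cpp
#include <sstream>
#include <string>

// Build a CmdVelTwist-style JSON string with the linear/angular layout shown above.
// Assumption: unused components are sent as 0.
// Note: the inputs must be finite, since NaN and infinity are not valid JSON numbers.
std::string make_cmd_vel_json(double linear_x, double angular_z) {
    std::ostringstream out;
    out << "{\"linear\":{\"x\":" << linear_x << ",\"y\":0,\"z\":0},"
        << "\"angular\":{\"x\":0,\"y\":0,\"z\":" << angular_z << "}}";
    return out.str();
}
```

Calling `make_cmd_vel_json(0.5, 0.0)` yields a message that asks for 0.5 of forward speed with no rotation.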
At the same time, the mickrobot_chassis node publishes the chassis status (x speed, y speed, and angular velocity) at 100 Hz. The published JSON string data stream is named "Odometry".
// publish Odometry JSON string
// chassis position
j_odom_pub["pose"]["position"]["x"] = position_x;
j_odom_pub["pose"]["position"]["y"] = position_y;
j_odom_pub["pose"]["position"]["z"] = 0;
j_odom_pub["pose"]["orientation"]["x"] = 0;
j_odom_pub["pose"]["orientation"]["y"] = 0;
j_odom_pub["pose"]["orientation"]["z"] = 0;
j_odom_pub["pose"]["orientation"]["w"] = 1;
// chassis speed
j_odom_pub["twist"]["linear"]["x"] = linear_x;
j_odom_pub["twist"]["linear"]["y"] = linear_y;
j_odom_pub["twist"]["linear"]["z"] = 0;
j_odom_pub["twist"]["angular"]["x"] = 0;
j_odom_pub["twist"]["angular"]["y"] = 0;
j_odom_pub["twist"]["angular"]["z"] = linear_w;
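The source does not say how `position_x` and `position_y` are computed. As an illustration only, a consumer of the Odometry stream could track a planar pose by integrating the published velocities at the 100 Hz rate (a time step of 0.01 s). The `Pose2D` struct and `integrate` function are hypothetical, and the velocities are assumed to be expressed in the chassis frame.

```cpp
#include <cmath>

// Hypothetical planar pose: position in meters, heading in radians.
struct Pose2D {
    double x;
    double y;
    double yaw;
};

// One dead-reckoning step: rotate the chassis-frame velocities into the
// world frame using the current heading, then advance the pose by dt seconds.
Pose2D integrate(Pose2D pose, double linear_x, double linear_y, double angular_z, double dt) {
    const double c = std::cos(pose.yaw);
    const double s = std::sin(pose.yaw);
    pose.x += (linear_x * c - linear_y * s) * dt;
    pose.y += (linear_x * s + linear_y * c) * dt;
    pose.yaw += angular_z * dt;
    return pose;
}
```

Simple integration like this accumulates drift over time, so it is only suitable as a short-term estimate.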
First try:
dora up
dora build keyboard_teleop_no_recording.yml
dora start keyboard_teleop_no_recording.yml
# Press Ctrl-C twice to stop
This starts a keyboard teleoperation session without a camera.
# Clone llama factory
cd ..
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd ../dora_mickrobot
dora build qwenvl2_recorder.yml
dora start qwenvl2_recorder.yml
Within the LLaMA-Factory folder:
- Install all dependencies of LLaMA-Factory.
- Modify examples/train_lora/qwen2vl_lora_sft.yaml so that the dataset is the one you want to use:
- dataset: mllm_demo,identity # video: mllm_video_demo
+ dataset: dora_demo_107,dora_demo_108,identity
- You can also choose the 2B model instead of the 7B model with
- model_name_or_path: Qwen/Qwen2-VL-7B-Instruct
+ model_name_or_path: Qwen/Qwen2-VL-2B-Instruct
- Then start finetuning with:
llamafactory-cli train examples/train_lora/qwen2vl_lora_sft.yaml
You can then build and start the qwenvl2 dataflow:
# Clone llama factory
cd ..
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd ../dora_mickrobot
dora build qwenvl2.yml
dora start qwenvl2.yml
- For a different camera device, modify the parameters of the camera node in the xxx.yml file.
- The parameter LLAMA_FACTORY_ROOT_PATH in the yml file must be set to an absolute path.
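The absolute-path requirement can be checked early, before anything tries to load files from that location. The sketch below assumes that LLAMA_FACTORY_ROOT_PATH reaches the node as an environment variable and that the system uses POSIX-style paths; both function names are hypothetical.

```cpp
#include <cstdlib>
#include <string>

// POSIX-style check: an absolute path starts with '/'.
bool is_absolute_posix_path(const std::string& path) {
    return !path.empty() && path[0] == '/';
}

// Read LLAMA_FACTORY_ROOT_PATH from the environment (assumed delivery mechanism).
// Returns an empty string when the variable is unset or not an absolute path.
std::string llama_factory_root() {
    const char* value = std::getenv("LLAMA_FACTORY_ROOT_PATH");
    if (value == nullptr) return "";
    const std::string path(value);
    return is_absolute_posix_path(path) ? path : "";
}
```

A relative value such as `../LLaMA-Factory` is rejected here because it would resolve differently depending on the directory the node is started from.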