Have you ever confirmed controlling one drone with "rpm" using learn.py? #180
Comments
Hi @paehal, I trained stable-baselines3 PPO to hover with just RPMs (in the plus/minus 5% range of the hover value) back in 2020, without yaw control (as it wasn't penalized in the reward). I agree it's a more difficult RL problem, and that's why the base RL aviary class includes simplified action spaces for the 1D and the velocity control cases.

video-10.28.2020_09.45.37.mp4

This was a 4-layer architecture [256, 256, 256, 128; 2 shared, 2 separate for qf and pol], with a 12-dimensional input vector [position, orientation, velocity, angular velocity] mapped to 4 motor velocities (in the ±5% range around the hover RPMs), after 8 hours and ~5M time steps (48 Hz control).
Thanks for the reply and for sharing the video. Glad to hear that rpm control has worked well in the past. I would like to run a study under the same conditions as yours in the latest repository; is that possible? Here is what I am wondering: do I just run "python learn.py" with the action type set to rpm?
yes
no, the action will be a vector of size 4 with the desired RPMs of each motor (in fact, a plus/minus 5% range centered on the hover RPMs)
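For illustration, a minimal sketch of how the RPM action type shows up in the environment; it assumes the current `HoverAviary` constructor still accepts the `obs`/`act` enums and the Gymnasium-style `reset`/`step` API:

```python
from gym_pybullet_drones.envs.HoverAviary import HoverAviary
from gym_pybullet_drones.utils.enums import ObservationType, ActionType

# Sketch only: with ActionType.RPM the agent outputs one normalized value per
# motor, which the environment maps to RPMs in a small band around hover RPM.
env = HoverAviary(obs=ObservationType.KIN, act=ActionType.RPM)
print(env.action_space)  # expected: a Box with 4 values per drone

obs, info = env.reset()
action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)
env.close()
```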
What is mainly different in the current HoverAviary is that the reward is always positive (instead of including negative penalties), it is only based on position (the result above also included a reward component based on the velocity), and the environment does not terminate early if the quadrotor flips or flies out of bounds. It might be necessary to reintroduce some of those details.
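If some of those details were reintroduced, a hypothetical variant of the reward could mix the position term with a velocity term and a penalty for flipping. This is only a sketch: the names `_computeReward`, `_getDroneStateVector`, and `TARGET_POS` are assumed from the repo's aviary classes, and the weights are made up, not tuned.

```python
import numpy as np

# Hypothetical HoverAviary._computeReward() variant: position term plus a
# velocity penalty and a flip penalty (weights are illustrative only).
def _computeReward(self):
    state = self._getDroneStateVector(0)      # pos[0:3], rpy[7:10], vel[10:13]
    pos_err = np.linalg.norm(self.TARGET_POS - state[0:3])
    vel_pen = 0.1 * np.linalg.norm(state[10:13])
    flip_pen = 10.0 if abs(state[7]) > 0.4 or abs(state[8]) > 0.4 else 0.0
    return max(0.0, 2.0 - pos_err**4) - vel_pen - flip_pen
```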
Let me confirm: in the latest repository, does the environment terminate if the quadrotor flips or flies out of bounds?
No, you can add that to the truncation method (FYI, the reward achieved by a "successful" one-dimensional hover is ~470, in ~3 minutes on my machine; I just tried training the 3D hover, as is, for ~30 minutes and it stopped at a reward of ~250).
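For concreteness, a hedged sketch of such a truncation condition, written as an override in the style of the repo's `_computeTruncated()` (the attribute names are assumed from the aviary base classes, and the thresholds are illustrative):

```python
import numpy as np

# Illustrative truncation: end the episode if the drone flies out of bounds,
# tilts past ~23 degrees in roll/pitch, or the episode length is exceeded.
def _computeTruncated(self):
    state = self._getDroneStateVector(0)
    too_far = np.linalg.norm(state[0:3] - self.TARGET_POS) > 2.0
    flipped = abs(state[7]) > 0.4 or abs(state[8]) > 0.4    # roll/pitch in rad
    timeout = self.step_counter / self.PYB_FREQ > self.EPISODE_LEN_SEC
    return too_far or flipped or timeout
```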
Hi @paehal, I added back the truncation condition and trained this in ~10 minutes (this is now the current code in the repository).

RL.mp4
@JacopoPan Related to this, I have a question: how can I load a trained model in a different job and save a video of its performance? Even with --record_video set to True, the video is not being saved. Also, when I tried to load a different trained model with the following settings, targeting a model in a specified folder, an error occurred. Since I'm not familiar with stable-baselines3, I would appreciate it if you could help me identify the cause.
In a previous version, there was something like test_learning.py, which, when executed, allowed me to verify the behavior in a video. |
The current version of the learn.py script runs an evaluation of the trained model right after training.
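For anyone reading along, here is a minimal sketch of reloading a saved policy without retraining. It assumes the model was saved as a `.zip` by SB3 (e.g. a `best_model.zip` written by an `EvalCallback`) and that the `HoverAviary` constructor still takes `gui`/`record` flags; the path is a placeholder.

```python
from stable_baselines3 import PPO
from gym_pybullet_drones.envs.HoverAviary import HoverAviary
from gym_pybullet_drones.utils.enums import ObservationType, ActionType

# Placeholder path: point this at your own results folder.
model = PPO.load("results/your_run_folder/best_model.zip")

# record=True asks the aviary to capture video frames of the rollout.
env = HoverAviary(gui=True, record=True,
                  obs=ObservationType.KIN, act=ActionType.RPM)
obs, info = env.reset()
for _ in range(int(10 * env.CTRL_FREQ)):         # roughly 10 seconds of flight
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```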
Thank you for the quick response. I was able to understand what you were saying by carefully reading the code, and I confirmed that the evaluation runs right after training. Since I wanted to run a pretrained model without retraining it, I made some changes to the code to achieve this. Also, a different question (please let me know if it's better to create a separate issue): I believe that increasing the ctrl_freq generally improves control (e.g., hovering), so I have a few questions about how ctrl_freq is used in the environment.
Ctrl freq is both the frequency at which observations are produced and the frequency at which actions are taken by the environment. The main thing to note is that the observation contains the actions of the last 0.5 seconds, so increasing the ctrl freq will increase the size of the observation space.
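To make that concrete, a small back-of-the-envelope sketch (assuming a 12-value kinematic observation per drone and an action buffer holding `ctrl_freq // 2` past 4-motor commands, i.e. 0.5 seconds' worth, as described above):

```python
# Rough per-drone observation-size estimate: kinematic state plus the buffered
# actions of the last 0.5 s (the 12 and 4 are assumptions for the KIN
# observation and RPM action types).
def obs_dim(ctrl_freq_hz, kin_dim=12, action_dim=4):
    buffer_size = ctrl_freq_hz // 2          # 0.5 s worth of past actions
    return kin_dim + buffer_size * action_dim

print(obs_dim(30))   # 12 + 15 * 4 = 72
print(obs_dim(60))   # 12 + 30 * 4 = 132 -> higher ctrl freq, bigger obs space
```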
Thank you for your reply.
My understanding aligns with this, which is great. Is it also correct to say that this PyBullet step is responsible for the actual physics simulation?
This corresponds to the following part in the code, right? Out of curiosity, where did the idea of using the actions from the last 0.5 seconds as observations come from? Was it from a paper or some other source? Additionally, if I want to change the MLP network model when increasing ctrl_freq, because the action buffer becomes too large, would the following setup be appropriate? Have you had any experience with changing the MLP network structure in a similar situation?
The sim/PyBullet frequency is the actual physics integration frequency, yes. The idea of the action buffer is that the policy might be better guided by knowing what the controller did just before; tying the buffer length to the control frequency makes it depend only on wall-clock time, not on the type of controller (but it might be appropriate to change that, depending on the application). For custom SB3 policies, I can only refer you to the relevant documentation: https://stable-baselines3.readthedocs.io/en/master/guide/custom_policy.html. I used different critic/actor network sizes in past SB3 versions, but the current focus of this repo is having very few dependencies and compatibility with the simplest/most stock versions of them.
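As an illustration of what the linked SB3 page describes, a hedged sketch of changing the actor/critic MLP sizes through `policy_kwargs` (layer sizes are arbitrary; only the `net_arch` keyword itself comes from stable-baselines3, and the `HoverAviary` kwargs are assumed as in the earlier sketches):

```python
from stable_baselines3 import PPO
from gym_pybullet_drones.envs.HoverAviary import HoverAviary
from gym_pybullet_drones.utils.enums import ObservationType, ActionType

env = HoverAviary(obs=ObservationType.KIN, act=ActionType.RPM)

# Separate, larger actor (pi) and critic (vf) networks, e.g. to cope with a
# bigger observation space at a higher ctrl_freq (sizes are illustrative).
model = PPO(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=dict(pi=[256, 256], vf=[256, 256])),
    verbose=1,
)
model.learn(total_timesteps=100_000)
```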
@JacopoPan
@JacopoPan how did you arrive at the ~470 reward value for a "successful" hover?
Hello, it's been a long while.
I haven't touched this repository much lately, but I'm glad to see that there has been a lot of progress.
I have one question, and it is something I had trouble with in a previous version.
Have you ever seen a configuration where the drone can learn its own policy with the "rpm" action type instead of "one_d_rpm"?
I think "rpm" is still more difficult to learn. However, I believe that rpm control is a necessary setting for a drone flying around in 3D space.
Best regards,