manual_control_setpoint split switches into new message manual_control_switches #9712

dagar · 2018-06-18T23:26:20Z

This PR splits the manual_control_setpoint switch positions into a new message, manual_control_switches. With the switches contained in a single message, simple logic was added to require 2 identical consecutive frames before publishing a change (crude debounce). Additionally, the message is only published on change.
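For readers following along, here is a minimal sketch of the two behaviours described above (two identical consecutive frames required, publish only on change). It is not the code from this PR: `SwitchFrame`, its fields, and `SwitchDebounce` are hypothetical names chosen for illustration, and the real message has more fields.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the switch fields of manual_control_switches.
// Field names are assumptions, not the actual message definition.
struct SwitchFrame {
	uint8_t mode_slot{0};
	uint8_t kill_switch{0};
	uint8_t arm_switch{0};

	bool operator==(const SwitchFrame &o) const
	{
		return mode_slot == o.mode_slot
		       && kill_switch == o.kill_switch
		       && arm_switch == o.arm_switch;
	}
};

class SwitchDebounce
{
public:
	// Feed one decoded RC frame. Returns the frame to publish, or nothing.
	std::optional<SwitchFrame> update(const SwitchFrame &frame)
	{
		// Crude debounce: the frame must match the one received just before it.
		const bool stable = _have_previous && (frame == _previous);
		_previous = frame;
		_have_previous = true;

		if (!stable) {
			return std::nullopt;
		}

		// Publish on change only: skip frames equal to the last published one.
		if (_have_published && frame == _published) {
			return std::nullopt;
		}

		_published = frame;
		_have_published = true;
		return frame;
	}

private:
	SwitchFrame _previous{};
	SwitchFrame _published{};
	bool _have_previous{false};
	bool _have_published{false};
};
```

With this sketch a single-frame kill switch blip never reaches consumers, at the cost of one frame of added latency on every genuine switch change.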

This was originally done in response to a crash back in June (#9595), where the test flight team saw the kill switch activate briefly in flight. The log didn't contain the actual switch change.

It's rare, but I still receive reports that can be attributed to blips in RC. Things like mid-air disarming (with RC arm on switch), uncommanded mode change, etc.

Future work

Refactor commander to only evaluate RC switches for mode changes and arming when this uORB message updates.

santiago3dr · 2018-10-09T20:01:00Z

couple flights on the Pixracer (V4)
https://logs.px4.io/plot_app?log=f6cf001b-2bef-496b-bfd0-46cc5586e544
https://logs.px4.io/plot_app?log=0624efc7-4f1c-4f35-a7bb-f2fa543af873

manual flights going through stabilized, altitude and position modes; no issues

bkueng · 2018-10-10T08:02:23Z

With the switches contained in a single message, simple logic was added to require 2 identical consecutive frames before publishing a change (crude debounce).

Instead of (or in addition to) adding layers/logic on top, which may or may not help, I'd like a more structured approach to resolve this. There's no question that we need absolutely reliable RC.

Questions to answer:

are the glitches only a single frame or multiple?
and related: by how much do we lower the probability of things going wrong with such a change here?

Information that should be collected from affected setups:

What Pixhawk is used?
What exact RC receiver model is used, what protocol, and preferably which firmware is on it?
How is the wiring done? A picture helps
Does it happen under specific situations?

What I would like to see is a hardware setup that runs 24/7, checks for glitches and records all raw RC data. I offer to help implement that.

dagar · 2018-10-15T13:50:55Z

Unfortunately all we have to go on is incomplete logs and anecdata. My hunch is many of the blips can ultimately be explained by less than perfect wiring, but I'd certainly like to know for sure.

We could keep most of this PR other than the line requiring 2 consecutive identical switch messages. Then at least we'd have all of the switch changes logged.

dagar · 2018-10-15T16:04:17Z

Crude debounce removed and rebased on master.
@bkueng could you review?

dagar · 2018-10-15T16:30:48Z

Flash limit hit again.

LorenzMeier · 2018-10-15T20:18:40Z

@dagar I’ve come to the conclusion to deprecate v2 at this point? Do a maintenance Release and drop it along with bootloader update instructions for HW that is v3 but with an old bootloader. Time to move on!

bkueng

I'm a bit on the fence here. I see the advantages (commander update only on mode switch, log all RC switches (but we cannot infer that they were due to single RC glitches)), but also disadvantages (generally more system overhead due to splitting into small messages and flight review breakage).

src/modules/sensors/rc_update.cpp

dagar · 2018-10-16T12:45:55Z

I'm a bit on the fence here. I see the advantages (commander update only on mode switch, log all RC switches (but we cannot infer that they were due to single RC glitches)), but also disadvantages (generally more system overhead due to splitting into small messages and flight review breakage).

No we can't infer the switch was due to an RC glitch, but at least we can know it even changed. In crashes like #9595 it's difficult to be completely certain RC is to blame. It's obviously quite likely, but we can't completely rule out other subtle errors in commander because we don't have the data.

Taking a step back I see this as a fundamental split applicable to several areas throughout PX4. We have a mix of continuously changing data (50-100 Hz), and data that rarely changes (perhaps 10 times total). I'd like to deduplicate it as soon as possible to remove the burden from consumers and decrease our logging requirements while actually improving our ability to review flights. I'm all too familiar with the current uORB overhead at various levels, but there are solutions to that as well.

dagar · 2018-10-16T12:57:44Z

@dagar I’ve come to the conclusion to deprecate v2 at this point? Do a maintenance Release and drop it along with bootloader update instructions for HW that is v3 but with an old bootloader. Time to move on!

@LorenzMeier I have one idea for a relatively small push we could make that would reduce a lot of bloat while also simplifying things. Once we eventually burn through that I think considering deprecation is reasonable, but at the moment the amount of duplicated code is kind of egregious regardless of hitting a flash limit.

bkueng

In crashes like #9595 it's difficult to be completely certain RC is to blame. It's obviously quite likely, but we can't completely rule out other subtle errors in commander because we don't have the data.

What about adding a 'state switch reason' to one of the commander outputs?
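A rough sketch of what such a 'state switch reason' could look like. This is only an illustration of the idea: `MainState`, `SwitchReason`, `StateRecord` and `StateMachine` are hypothetical names, not existing commander types.

```cpp
#include <cstdint>

// Hypothetical, reduced set of states and reasons for illustration only.
enum class MainState : uint8_t { Manual, Altitude, Position };
enum class SwitchReason : uint8_t { None, RcModeSwitch, RcKillSwitch, Failsafe, ExternalCommand };

// What would be published (and therefore logged) on every transition.
struct StateRecord {
	MainState state{MainState::Manual};
	SwitchReason reason{SwitchReason::None};
	uint64_t timestamp_us{0};
	uint32_t change_count{0};
};

class StateMachine
{
public:
	// Apply a transition and remember why it happened.
	// Returns true if the state actually changed.
	bool request(MainState new_state, SwitchReason reason, uint64_t now_us)
	{
		if (new_state == _record.state) {
			return false;
		}

		_record.state = new_state;
		_record.reason = reason;
		_record.timestamp_us = now_us;
		++_record.change_count;
		return true;
	}

	const StateRecord &record() const { return _record; }

private:
	StateRecord _record{};
};
```

The record answers "why did the state change" after the fact, but it only covers transitions that go through this one state machine.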

Taking a step back I see this as a fundamental split applicable to several areas throughout PX4. We have a mix of continuously changing data (50-100 Hz), and data that rarely changes (perhaps 10 times total). I'd like to deduplicate it as soon as possible to remove the burden from consumers and decrease our logging requirements while actually improving our ability to review flights. I'm all too familiar with the current uORB overhead at various levels, but there are solutions to that as well.

That is all good, and I raised it now because I want to avoid running into issues first. The topic size reduction is 6 bytes, which is less than 10%, so the effect on the log is marginal.

msg/manual_control_switches.msg

src/modules/vtol_att_control/vtol_att_control_main.h

dagar · 2018-11-09T20:45:54Z

What about adding a 'state switch reason' to one of the commander outputs?

That's perhaps something to think about for the state machines in general, and definitely something we still need to improve for rejections.

That is all good, and I raised it now because I want to avoid running into issues first. The topic size reduction is 6 bytes, which is less than 10%, so the effect on the log is marginal.

The point is to be able to capture 100% of the relevant changes (instead of a 5-10% sampling); the tiny reduction in log size is a distant secondary benefit.
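To make the sampling argument concrete, here is a small sketch that counts how many switch changes are visible in a log that keeps only every Nth frame, compared with one that sees every frame. The function names and the frame sequence are made up for illustration; it assumes the logger subsamples at a fixed stride.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of value changes visible when only every `stride`-th frame is logged.
// stride == 1 means every frame is seen, which is what publish-on-change captures.
std::size_t changes_seen_subsampled(const std::vector<uint8_t> &frames, std::size_t stride)
{
	if (stride == 0) {
		return 0; // invalid stride, nothing is logged
	}

	std::size_t changes = 0;

	for (std::size_t i = stride; i < frames.size(); i += stride) {
		if (frames[i] != frames[i - stride]) {
			++changes;
		}
	}

	return changes;
}

// Number of value changes visible when every change is logged.
std::size_t changes_seen_on_change(const std::vector<uint8_t> &frames)
{
	return changes_seen_subsampled(frames, 1);
}
```

For 100 frames with a single-frame blip at index 37, the on-change count is 2 (there and back), while a stride of 10 (10% sampling) sees 0 changes because the blip falls between logged samples.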

bkueng · 2018-11-12T07:06:46Z

The point is to be able to capture 100% of the relevant changes (instead of a 5-10% sampling); the tiny reduction in log size is a distant secondary benefit.

I generally agree with that, but taking occasional dropouts into account, the sampling is still better.

dagar · 2018-11-12T14:47:32Z

Going back to where we started (#9595 crash) I'm reasonably certain there was a blip in RC that changed the switch interpretations, then immediately back without anything being captured in the log.

I'm proposing this change so that we can potentially capture that behaviour in real testing, then consider adding a simple debounce if appropriate.

Can we move forward with this?

bkueng · 2018-11-12T15:33:24Z

Going back to where we started (#9595 crash) I'm reasonably certain there was a blip in RC that changed the switch interpretations, then immediately back without anything being captured in the log.

The change here has too many side-effects for me, without adding much benefit. This is why I suggested adding the state switching reason to commander instead. Do you think it's not enough?
I still think the better way to debug the original issue would be on the bench, then we can capture all the data we receive. A while ago I had a similar problem (random RC glitches), and I noticed the RC was not plugged in properly, so I think we can actually reproduce similar behavior.

dagar · 2018-11-12T17:28:59Z

Additional commander logging is usually a good thing, but switches are evaluated in multiple different places within commander (main state, arm/disarm, failsafe escape, kill) as well as separately in VTOL for transitions and MC for landing gear.

I get semi-regular reports from people with issues (sometimes fatal) that sound like they might be similar RC glitches, but we don't have the data to know. Reproducing the error on the bench can be out of reach when helping someone remotely or when their vehicle has already been destroyed. Splitting data structures along these lines is easy, allows us to optimize px4 architecture later, and makes it significantly more likely we'll already have the necessary data from normal real world logs when it matters.

bkueng · 2018-11-13T09:46:12Z

The problem with this approach is that it will not allow us to make deeper changes at the RC parsing level.
If it's only about gathering data specifically for such cases, I suggest we limit the changes to do exactly that (i.e. leave the existing messages unchanged). This will allow us to make a better decision once we actually have data.

In general you simply have not convinced me for the changes in this PR. I'll need more/better arguments to be convinced.

dagar · 2018-11-13T14:13:59Z

The problem with this approach is that it will not allow us to make deeper changes at the RC parsing level.

How so? Again all this PR is currently doing is deduplicating data. If you're concerned about the potential next step of debouncing at this level it would be easy and cheap to additionally log the original version without debounce.

The other thing to note is that this isn't a compromise for the sake of logging and debugging. The manual control setpoint and switch positions are different things used by different parts of the system. We're not losing anything by splitting on these boundaries.

If it's only about gathering data specifically for such cases, I suggest we limit the changes to do exactly that (i.e. leave the existing messages unchanged). This will allow us to make a better decision once we actually have data.

I'm not really sure what you're suggesting, we can't log the original message at full rate by default, and we can't necessarily ask people that have crashed to try again. Please try to keep in mind we've still done nothing to address the original issue (#9595).

bkueng · 2018-11-15T13:11:14Z

How so? Again all this PR is currently doing is deduplicating data.

Not the PR, but the approach you are taking. All you can do by logging at that level is adding things like debouncing, whereas we should aim at resolving it at the RC parsing level (I'm not saying that it will be possible for sure).

The other thing to note is that this isn't a compromise for the sake of logging and debugging.

As I wrote above, the current PR is worse in terms of log analysis because of potential dropouts. The situation would already be better if the switches were published at least once a second.
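As a sketch of that last point, publishing on change plus at least once a second could look roughly like this. `SwitchPublisher` and its members are hypothetical, and the switch state is reduced to a single byte for brevity.

```cpp
#include <cstdint>

class SwitchPublisher
{
public:
	// Maximum time between two publications, even without a change.
	static constexpr uint64_t kMaxIntervalUs = 1000000ULL;

	// Returns true if the current switch state should be published now.
	bool should_publish(uint8_t switches, uint64_t now_us)
	{
		const bool changed = !_published_once || (switches != _last_switches);
		const bool stale = _published_once && (now_us - _last_publish_us >= kMaxIntervalUs);

		if (changed || stale) {
			_last_switches = switches;
			_last_publish_us = now_us;
			_published_once = true;
			return true;
		}

		return false;
	}

private:
	uint8_t _last_switches{0};
	uint64_t _last_publish_us{0};
	bool _published_once{false};
};
```

A log dropout then costs at most about one second of switch state instead of leaving the state unknown until the next change.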

The manual control setpoint and switch positions are different things used by different parts of the system.

Ok, let's look at this:

Both are used by:
- mavlink
- commander
- mc_pos_control
only sticks:
- vmount
- mc_acc_control
- fw_att_control
- gnd_att_control
- fw_pos_control
only switches:
- vtol_att_control

So only a single module only requires the switches, and several use both. I see the logical reason for splitting the message, but there's simply no bigger benefit, and it creates churn.

We're not losing anything by splitting on these boundaries.

It's not about losing anything. If we split up the system into too many small messages, we end up with a more complex and bloated solution. Surely we have/had cases where splitting is the obvious thing to do as messages contained too many different things, but each case needs to be evaluated with care.

I'm not really sure what you're suggesting, we can't log the original message at full rate by default, and we can't necessarily ask people that have crashed to try again.

Of course not. My suggestion was to add something to the log in case there is a single-frame switch change (or 2 or 3 for that matter), for example a printf warning.
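A minimal sketch of that suggestion, assuming a single switch value per frame: it warns when the value leaves its stable state and comes back within a few frames. `SwitchGlitchDetector` is a hypothetical name, and `std::printf` stands in for whatever warning mechanism would actually end up in the log.

```cpp
#include <cstdint>
#include <cstdio>

class SwitchGlitchDetector
{
public:
	// max_glitch_frames: longest excursion (in frames) still treated as a glitch.
	explicit SwitchGlitchDetector(unsigned max_glitch_frames) : _max(max_glitch_frames) {}

	// Feed one frame. Returns true if this frame completes a short excursion,
	// i.e. the value left the stable state and returned within _max frames.
	bool update(uint8_t value)
	{
		if (!_initialized) {
			_stable = value;
			_initialized = true;
			return false;
		}

		if (value == _stable) {
			const bool glitch = (_excursion_len > 0) && (_excursion_len <= _max);

			if (glitch) {
				std::printf("WARN: switch changed for %u frame(s), then reverted\n", _excursion_len);
				++_glitch_count;
			}

			_excursion_len = 0;
			return glitch;
		}

		// Value differs from the stable state. Any differing value extends the
		// excursion; a longer one is accepted as a real change.
		++_excursion_len;

		if (_excursion_len > _max) {
			_stable = value;
			_excursion_len = 0;
		}

		return false;
	}

	unsigned glitch_count() const { return _glitch_count; }

private:
	unsigned _max;
	unsigned _excursion_len{0};
	unsigned _glitch_count{0};
	uint8_t _stable{0};
	bool _initialized{false};
};
```

The reported frame count would also help answer whether the glitches are a single frame or several.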

dagar · 2018-11-15T17:15:42Z

We can keep going back and forth spending a disproportionate amount of time on this fairly mundane change, but the bottom line is that we need to set something in motion that will help us get to a real solution later. I'm proposing a small, safe change that will give us some insight to act on when these things occur. If you disagree then please open an alternative or propose a concrete path forward.

Yesterday I was talking to another user that had the kill switch falsely trigger in air briefly. No "glitch" found in input_rc, rc_channels (not logged), or manual_control_setpoint data and RC otherwise "seemed" fine. They disabled the kill switch, lost a little faith in our platform, and kept going. We can't keep ignoring these problems.

dagar · 2020-01-22T04:53:49Z

Rebased on master.

stale · 2020-07-25T04:24:24Z

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.

dagar mentioned this pull request Oct 8, 2018

Bug: commander kill switch require 2 consecutive manual setpoints #9704

Closed

dagar force-pushed the pr-manual_control_setpoint-split branch from 0a6aa5a to 80b926a Compare October 8, 2018 15:59

dagar changed the title ~~[DO NOT MERGE] manual_control_setpoint split~~ manual_control_setpoint split switches into new message manual_control_switches Oct 8, 2018

dagar force-pushed the pr-manual_control_setpoint-split branch 2 times, most recently from 8312fcb to 0c5089d Compare October 8, 2018 16:54

dagar requested a review from a team October 9, 2018 02:32

dagar added the Admin: Enhancement (improvement) 💡 label Oct 9, 2018

dagar requested a review from bkueng October 9, 2018 13:57

dagar force-pushed the pr-manual_control_setpoint-split branch from 0c5089d to e571ff7 Compare October 15, 2018 16:02

bkueng reviewed Oct 16, 2018

src/modules/sensors/rc_update.cpp (outdated)

bkueng reviewed Oct 17, 2018

msg/manual_control_switches.msg (outdated)

src/modules/vtol_att_control/vtol_att_control_main.h (outdated)

dagar force-pushed the pr-manual_control_setpoint-split branch from 4979610 to a233853 Compare November 9, 2018 20:25

dagar force-pushed the pr-manual_control_setpoint-split branch from a233853 to 58ca913 Compare November 13, 2018 15:07

weekly-digest bot mentioned this pull request Mar 3, 2019

Weekly Digest (24 February, 2019 - 3 March, 2019) #11577

Closed

dagar mentioned this pull request May 20, 2019

Simultaneous RC and Virtual Joystick Control #12050

Open

stale bot added the Admin: Wont fix label Jul 10, 2019

PX4 deleted a comment from stale bot Jul 10, 2019

stale bot removed the Admin: Wont fix label Jul 10, 2019

weekly-digest bot mentioned this pull request Jul 14, 2019

Weekly Digest (7 July, 2019 - 14 July, 2019) #12474

Closed

dagar mentioned this pull request Jul 29, 2019

manual_contol_setpoint: fix mode slot numbering #12578

Merged

dagar force-pushed the pr-manual_control_setpoint-split branch 3 times, most recently from 0861ec7 to 225f8ee Compare August 1, 2019 16:36

stale bot added the stale label Oct 30, 2019

dagar modified the milestones: Release v1.9.0, Release v1.11.0 Dec 27, 2019

stale bot removed the stale label Dec 27, 2019

dagar force-pushed the pr-manual_control_setpoint-split branch from 225f8ee to 3d7949c Compare December 27, 2019 19:44

dagar requested a review from MaEtUgR December 27, 2019 19:45

weekly-digest bot mentioned this pull request Dec 29, 2019

Weekly Digest (22 December, 2019 - 29 December, 2019) #13804

Closed

manual_control_setpoint split switches into new manual_control_switches

01d2296

dagar force-pushed the pr-manual_control_setpoint-split branch from 3d7949c to 01d2296 Compare January 22, 2020 02:27

PX4 deleted a comment from stale bot Jan 22, 2020

stale bot added the stale label Apr 23, 2020

PX4 deleted a comment from stale bot Apr 23, 2020

stale bot removed the stale label Apr 23, 2020

weekly-digest bot mentioned this pull request Apr 26, 2020

Weekly Digest (19 April, 2020 - 26 April, 2020) #14761

Closed

stale bot added the stale label Jul 25, 2020

weekly-digest bot mentioned this pull request Jul 26, 2020

Weekly Digest (19 July, 2020 - 26 July, 2020) #15419

Closed

dagar closed this Aug 6, 2020

LorenzMeier deleted the pr-manual_control_setpoint-split branch January 18, 2021 14:45
