arm and disarm causes MATEKH743 to lose connection with radio in INAV 5.1.0 #8409

Closed
vlamgat007 opened this issue Sep 20, 2022 · 134 comments

@vlamgat007

vlamgat007 commented Sep 20, 2022

Current Behavior

I am in the process of upgrading a Matek H743-WING V2 Flight Controller + XF Nano Diversity setup from INAV 3.0 to INAV 5.1 and am running into an issue. When I arm and disarm the system, I lose all radio control to the board (in the Receiver tab the bars stop moving). I need to reboot the board to regain control. It does not seem like any of the other switches cause the same behavior. Firmware for the XF components (RX + VTX) has also been updated.

(FYI....I have successfully completed a similar upgrade for a different plane with a similar setup using Matek F405-WSE Controller + XF Nano and did not experience the same problem)

Steps to Reproduce

  1. Switch on Radio
  2. Power up the plane (plug in battery) (the issue exists with and without battery connected)
  3. Connect to board with PC via USB
  4. Connect to board using INAV 5.1.0
  5. Verify that Receiver is connected (all control surfaces moving and bars moving in Receiver tab)
  6. Wait for the system to be ready to arm
  7. When ready to arm, flip the arm switch
  8. Disarm
  9. At this point the board stops responding to the radio (bars not moving in the Receiver tab and control surfaces on the plane not moving)
  10. Reboot the board to regain control

Additional context: dump all

https://pastebin.com/uVBzcK75


version

INAV/MATEKH743 5.1.0 Aug 19 2022 / 12:34:02 (76f22b2)

GCC-10.2.1 20201103 (release)

  • FC Board name and vendor: Matek H743-WING V2 Flight Controller
  • INAV version string: 5.1.0
@vlamgat007
Author

vlamgat007 commented Sep 21, 2022

Tested Standard and DSHOT150 protocols. Using Spedix IS45 2-6S LiPo DShot BLHeli_S 45A ESC. INAV does not lose Radio comms if I test the motor using Outputs tab. Motor runs as expected.

Also tested without battery connected and the issue still occurs.

Red failsafe indicator does not light up when error happens. It does light up if I do not trigger the error and switch off my radio.

Even though I have no control, data still coming through from plane to INAV LUA screen on radio.

@b14ckyy
Collaborator

b14ckyy commented Sep 21, 2022

what crossfire FW version do you use? There are some older versions with odd behavior like this.

@vlamgat007
Author

Nano Div. RX - v6.19
Micro TX - v6.19
Unify Pro32 HV - v1.15
Agent X - v4.2.0

@b14ckyy
Collaborator

b14ckyy commented Sep 21, 2022

hmmm 6.19 should be stable. did you do a factory reset of your crossfire receiver and module after the last update? If you update from 6.14 or earlier you have to do a factory reset. Maybe try that and then see if the issue still appears.

If it does, I suggest resetting the FC to defaults, setting up just the basics, and seeing if that solves the issue. Save a DIFF ALL before that.

@vlamgat007
Author

I did do a factory reset of the TX but not sure about the RX. I will test this later today and post my results.
Thanks!

@b14ckyy
Collaborator

b14ckyy commented Sep 21, 2022

You should reset the Rx first, through the Crossfire settings LUA, because you will lose binding and doing it last will be more difficult.

@vlamgat007
Author

Thanks, will do.

@vlamgat007
Author

vlamgat007 commented Sep 22, 2022

Ok, I think I have narrowed down the issue. The RX and TX reset did not fix the issue. As suggested I then reset the defaults on the board and started from scratch. I set up only one mode for arming on Channel 5.
Every time I made a config change, I saved it, then tested (arm + disarm) and verified that I still had radio connection. I continued until I hit the "Continuously trim servos on Fixed Wing" setting. As soon as I saved this and went through the test cycle, it froze up, and when I removed the setting the issue disappeared. I tested this multiple times to confirm.

I have not completed the setup, adding the rest of the modes etc. I will wait to hear back before making more changes.

Thanks.

@b14ckyy
Collaborator

b14ckyy commented Sep 22, 2022

this is really strange. can you please post a DIFF of your current config right at this point where enabling or disabling this setting causes the lockup?

@breadoven
Collaborator

breadoven commented Sep 22, 2022

This might have something to do with it; it gets called when autotrim saves to eeprom on disarm:

void writeEEPROM(void)

@b14ckyy
Collaborator

b14ckyy commented Sep 22, 2022

Possible. But why does this block his RC inputs without commanding a failsafe?

@vlamgat007 do you use a switch to arm and disarm or do you disarm with a stick command?

Anyway, I was hoping that @DzikuVx would remove the auto-save for the continuous servo trim as well when he removed it for the bootup gyro calibration. After a one-time trim save with the sticks, I see no point in saving it after every flight with minimal changes. But that's stuff for a different discussion.

@breadoven
Collaborator

breadoven commented Sep 22, 2022

It looks like it suspends the Rx signal for a fixed 1.5s period (SKIP_RC_ON_SUSPEND_PERIOD in rx.c) before resuming the Rx signal regardless of whether or not eeprom has been saved ... which doesn't make sense. I'm guessing this isn't long enough for 5.1 which takes longer to save to eeprom. Perhaps 1.5s is right on the limit for some boards and not others, a timing issue. Although it would make more sense not to have a fixed time period but simply reset the suspend time when eeprom write has finished.
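
To make the difference concrete, here is a minimal C sketch of the two approaches. It is only an illustration under stated assumptions: time is passed in as plain integers, the unit is assumed to be microseconds, and every name except SKIP_RC_ON_SUSPEND_PERIOD (`rxState_t`, `rxSuspendForWrite`, `rxUpdateFixedPeriod`, `rxUpdateOnCompletion`, `rxOverlapsWrite`) is hypothetical and not taken from rx.c. It just shows how a fixed window can expire while a slow write is still running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 1.5 s as quoted in this thread; microseconds are an assumption here. */
#define SKIP_RC_ON_SUSPEND_PERIOD 1500000u

/* Hypothetical state, for illustration only. */
typedef struct {
    bool     suspended;      /* Rx processing is currently skipped        */
    bool     writeInFlight;  /* set while the eeprom write is running     */
    uint32_t resumeAtUs;     /* used only by the fixed-period variant     */
} rxState_t;

/* Both variants start the same way: suspend when the write begins. */
void rxSuspendForWrite(rxState_t *rx, uint32_t nowUs)
{
    assert(rx != NULL);
    rx->suspended = true;
    rx->writeInFlight = true;
    rx->resumeAtUs = nowUs + SKIP_RC_ON_SUSPEND_PERIOD;
}

/* Called by the (simulated) eeprom code once the write has finished. */
void rxWriteFinished(rxState_t *rx)
{
    assert(rx != NULL);
    rx->writeInFlight = false;
}

/* Variant 1: resume after a fixed period, whether or not the write is done. */
bool rxUpdateFixedPeriod(rxState_t *rx, uint32_t nowUs)
{
    /* Signed difference keeps the comparison correct across timer wrap. */
    if (rx->suspended && (int32_t)(nowUs - rx->resumeAtUs) >= 0) {
        rx->suspended = false;
    }
    return !rx->suspended;
}

/* Variant 2: resume only once the write has actually completed. */
bool rxUpdateOnCompletion(rxState_t *rx)
{
    if (rx->suspended && !rx->writeInFlight) {
        rx->suspended = false;
    }
    return !rx->suspended;
}

/* True if Rx processing is running while a write is still in flight. */
bool rxOverlapsWrite(const rxState_t *rx)
{
    return !rx->suspended && rx->writeInFlight;
}
```

With a write that takes longer than 1.5 s, variant 1 ends up with `rxOverlapsWrite()` returning true, while variant 2 never does, at the cost of a suspend time that depends on how long the write takes.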

@breadoven
Collaborator

breadoven commented Sep 22, 2022

Having said that surely if it's writing to eeprom it shouldn't be doing anything else should it, like try to process Rx signals ?

@stronnag ?

@MrD-RC
Collaborator

MrD-RC commented Sep 22, 2022

@vlamgat007 do you use a switch to arm and disarm or do you disarm with a stick command?

There is no stick command to disarm anymore. I think it was even gone in 3.0.

@vlamgat007
Author

Diff all at point of failure: https://pastebin.com/8YEMD5Bg

Please note the Failsafe triggered but I have had multiple times when there was no failsafe but I had no control.

@vlamgat007
Author

@vlamgat007 do you use a switch to arm and disarm or do you disarm with a stick command?

There is no stick command to disarm anymore. I think it was even gone in 3.0.

I use a switch on channel 5 to arm and disarm.

@vlamgat007
Author

diff all with no failsafe: https://pastebin.com/EDA0WcY8

@breadoven
Collaborator

Failsafe is suspended during eeprom save so it doesn't trigger when the Rx is suspended. The fact that Failsafe behaves erratically during this problem might indicate there is an issue with Rx suspend during eeprom save. Be useful to know why the Rx is suspended during eeprom save, not immediately obvious.

Also wondering if this has anything to do with #7128. The only time I had a Config wipe was on a plane that was connected to Configurator and powered from the battery when it happened. The Rx only works on that plane when on battery power not USB.

@vlamgat007
Author

I have tried with battery connected and without (usb only) and had the same result.

@breadoven
Collaborator

Seems the Rx suspend during eeprom write is to do with 3a13edf.

@breadoven
Collaborator

@vlamgat007 Can you try the attached firmware. It's the current master with Rx suspend time increased to 3 seconds. You'll need to use the latest 6.0 Configurator which can be found at http://seyrsnys.myzen.co.uk/inav-configurator-next/.

inav_6.0.0_MATEKH743.hex.zip

@0crap
Contributor

0crap commented Sep 23, 2022

@vlamgat007 If you want to stick with 5.1 you might also want to try my 5.1 build without auto ContinuousServoAutotrim saving.

@vlamgat007
Author

Thank you @breadoven and @0crap! I like having ContinuousServoAutotrim so I will try v6.0.0 first.
I am tied up with work so I will get to this later today.
Thanks!

@0crap
Contributor

0crap commented Sep 23, 2022

@vlamgat007 Try whatever you like, but please be assured that in my 5.1 build ContinuousServoAutotrim is fully functional.
Only it will not auto save (which seems to be your issue) the new values after you land and disarm. You can still save the new values if you wish, by going into the OSD and choose save and reboot.

@vlamgat007
Author

@0crap, thanks for the clarification!

@vlamgat007
Author

Success! I loaded 6.0 and flashed with the 6.0.0 firmware provided by @breadoven. I updated using the last Diff All that I posted and then tested the arm and disarm.

There is a bit of a longer delay before you get control back after flipping the disarm switch but that seems like a minor issue.

I completed the rest of the setup and everything seems to be working as expected.

BIG THANKS to everyone for your support on this!!

Let me know if you have any further special instructions. I take it that I need to keep my Matek F405-WSE Controller on 5.1.0?

@breadoven
Collaborator

@vlamgat007 Just remember the 6.0 firmware is dev standard so could have issues, not that I've had any problems. Other FC boards are fine on 5.1.0 if they work OK.

So it appears the default Rx suspend time setting during eeprom write is probably causing problems but it's not clear why it only seems to happen with Auto Trim on disarm but not with other eeprom writes such as sensor detection and Accelerometer calibration on boot up. I assume nothing odd happened previously with 5.1.0 during accelerometer calibration on boot, completed as normal with the beeper confirmation at the end ?

@b14ckyy
Collaborator

b14ckyy commented Sep 24, 2022

@breadoven maybe because when disarming, the actual eeprom write is done first before the actual disarm happens. So all radio inputs are still expected to be valid during that time. All other eeprom saves happen only in already disarmed state.

@vlamgat007
Author

I just came back from flying the plane after the changes.

Bad news is that it froze up when i disarmed after I landed. Good news is that this happened after it was safely on the ground.

@breadoven
Collaborator

@breadoven maybe because when disarming, the actual eeprom write is done first before the actual disarm happens. So all radio inputs are still expected to be valid during that time. All other eeprom saves happen only in already disarmed state.

It waits until the Arming flag is disarmed before saving. Maybe the problem is Stats also get saved at the same time the Arming flag changes state. 2 eeprom writes in succession. Maybe this needs changing so you only write once on disarm.
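
The "write once on disarm" idea could look roughly like the following C sketch. It is an illustration under assumptions, not the firmware's code: `markAutotrimDirty`, `markStatsDirty`, `onDisarm`, `getEepromWriteCount` and the stub `writeEEPROMStub` are hypothetical names, and the stub only counts calls instead of touching flash.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical dirty flags: each subsystem records that it has something to save. */
static bool autotrimDirty = false;
static bool statsDirty = false;

/* Stand-in for the real save routine; it only counts how often it is called. */
static int eepromWriteCount = 0;

static void writeEEPROMStub(void)
{
    eepromWriteCount++;
}

void markAutotrimDirty(void) { autotrimDirty = true; }
void markStatsDirty(void)    { statsDirty = true; }

/* On disarm, flush all pending changes with a single write
 * instead of one write per subsystem. */
void onDisarm(void)
{
    if (autotrimDirty || statsDirty) {
        writeEEPROMStub();
        autotrimDirty = false;
        statsDirty = false;
    }
    /* After the flush nothing may be left pending. */
    assert(!autotrimDirty && !statsDirty);
}

int getEepromWriteCount(void) { return eepromWriteCount; }
```

With both flags set, a single `onDisarm()` call produces one write, and a second disarm with nothing pending produces none.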

@zvikaf

zvikaf commented Nov 3, 2022

Just to clear things up and check my understanding:
(1) Writing to the flash memory made the TBS (/LRS?) communication hang. Did this phenomenon occur with other radio links?
(2) It seems to happen only on the 743 family of controllers. Did it happen on other families?
TIA

@breadoven
Collaborator

breadoven commented Nov 3, 2022

Makes you wonder what happens if you turn Stats ON ?

@breadoven I set stats = ON and it is freezing up again. This is with the firmware provided by @0crap on the matek h743-wing v2 board.

Weird stuff. I bench tested with the stats = ON setting on my build and no issues. (MATEKH743 V2) That said, it's not using the diff all from vlamgat007, just my own fully configured diff all setup. After a arm and disarm the FC stays responsive. Because autolaunch for fixed wing is always on, the arm command gives the "raise throttle" indication in the OSD. At that point I disarm and do this over and over, no issues. (Stats screen shows up on OSD.)

@0crap Looking back through this stuff I noticed your post which indicated your H743 FC wasn't affected by the eeprom write problem. However, checking the code it seems there is a 10 second minimum arming time before STATS is saved on disarm. Do you think you exceeded this 10s limit when you tested it ?

One possibility here to help debugging is a custom firmware with debugging added to the Rx/Failsafe code to try and work out what the signal is doing when it locks up. However, given this happens when disarmed there won't be any log with the debug info so the debugging output would need to be taken from the Configurator sensors tab or recorded from the OSD (probably not DJI unless the WTFOS hack is used ?).
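
For anyone repeating the bench test, the 10 second minimum arming time mentioned above amounts to a gate like the one in this C sketch. The names (`MIN_ARMED_TIME_FOR_STATS_US`, `shouldSaveStatsOnDisarm`) are made up for illustration and are not the identifiers used in the INAV source; only the 10 second figure comes from this thread, and microsecond timestamps are an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 10 s, taken from the discussion above; the constant name is hypothetical. */
#define MIN_ARMED_TIME_FOR_STATS_US (10u * 1000u * 1000u)

/* Compile-time sanity check on the unit conversion (C11). */
static_assert(MIN_ARMED_TIME_FOR_STATS_US == 10000000u, "10 s in microseconds");

/* Returns true if the flight was long enough for stats to be saved on disarm.
 * Unsigned subtraction gives the right elapsed time even if the
 * microsecond timer wrapped between arming and disarming. */
bool shouldSaveStatsOnDisarm(bool statsEnabled, uint32_t armedAtUs, uint32_t disarmedAtUs)
{
    const uint32_t armedForUs = disarmedAtUs - armedAtUs;
    return statsEnabled && armedForUs >= MIN_ARMED_TIME_FOR_STATS_US;
}
```

A quick arm and disarm of a few seconds would therefore skip the stats write entirely, which is why the length of the armed period matters when comparing test results.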

@breadoven
Collaborator

The attached firmware includes debugging on the Rx and Failsafe side as follows:

0 - are flight channels valid (10 = invalid, 11 = valid)
1 - is Rx signal received (10 = no, 11 = yes)
2 - is Failsafe receiving Rx data (10 = no, 11 = yes)
3 - time in us entering Rx/Failsafe checking function
4 - time delta between Rx update checks (rxUpdateCheck)
5 - position indication in rxUpdateCheck related to Rx frame status
6 - checks if suspendRxSignal (value 10) or resumeRxSignal (value 11) functions run during eeprom save

set debug_mode = ALWAYS and debug values should be shown in the Configurator Sensor tab or available on the OSD if you know how to set it up.
inav_6.0.0_MATEKH743.hex.zip
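
As a reading aid for the list above, this C sketch shows how such 10/11 flags might be written into debug slots. It assumes a plain `int32_t` array as a stand-in for the firmware's debug storage; `debugValues`, `encodeFlag`, `recordRxDebug` and `getDebugValue` are hypothetical names, not the code in the attached build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEBUG_SLOT_COUNT 8

/* Stand-in for the firmware's debug storage (assumption). */
static int32_t debugValues[DEBUG_SLOT_COUNT];

/* Booleans are encoded as 10 (no/invalid) and 11 (yes/valid), as in the list.
 * Presumably this keeps "no" distinguishable from a slot that was never written (0). */
static int32_t encodeFlag(bool value)
{
    return value ? 11 : 10;
}

/* Fill slots 0-4 as described in the list; slots 5 and 6 are left out of this sketch. */
void recordRxDebug(bool channelsValid, bool rxSignalReceived, bool failsafeGetsData,
                   uint32_t nowUs, uint32_t lastCheckUs)
{
    debugValues[0] = encodeFlag(channelsValid);
    debugValues[1] = encodeFlag(rxSignalReceived);
    debugValues[2] = encodeFlag(failsafeGetsData);
    debugValues[3] = (int32_t)nowUs;
    debugValues[4] = (int32_t)(nowUs - lastCheckUs);
}

/* Read back one slot, e.g. for display in a sensors view. */
int32_t getDebugValue(int slot)
{
    assert(slot >= 0 && slot < DEBUG_SLOT_COUNT);
    return debugValues[slot];
}
```

When reading the Sensors tab, a slot stuck at 10 then means the check ran and failed, while 0 means that code path was never reached.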

@vlamgat007
Author

Thank you @breadoven I will load and test this evening.

@0crap
Contributor

0crap commented Nov 4, 2022

@0crap Looking back through this stuff I noticed your post which indicated your H743 FC wasn't affected by the eeprom write problem. However, checking the code it seems there is a 10 second minimum arming time before STATS is saved on disarm. Do you think you exceeded this 10s limit when you tested it ?

Probably not, 10 sec is a long time if you have to wait for it. :-)
So I did a bench retest with my own build.
STATS is set to ON and armed for more than 10 seconds on the bench.
After disarm all stays responsive, no issues. Repeated multiple times.
(If it matters, the slider to always enable autolaunch is ON, but armed is armed, motor was spinning at idle rpm.)
Using a EP1 on ELRS v3.0

All this was tested NOT connected to the INAV Configurator.

@vlamgat007
Author

vlamgat007 commented Nov 5, 2022

@vlamgat007 it does seem like 2 issues have been mixed here. The original issue may be fixed with #8439. Did you manage to try the firmware @breadoven linked to here?

@MrD-RC and @breadoven I tested the firmware quoted above and I consistently trigger failsafe. Battery and usb irrespective. Exact same behavior as originally reported.

I will now reload the debug version from @breadoven

@vlamgat007
Author

@breadoven when I load the debug firmware version with 6.0.0 fp2 I lose connection to the sensors. See the screenshot.
No Sensors

@vlamgat007
Author

@breadoven and @MrD-RC Seems for now I will have to revert to the 5.1 firmware from @0crap from Sep 23 comment:

If you want to stick with 5.1 you might also want to try my 5.1 build without auto ContinuousServoAutotrim saving.

@b14ckyy
Collaborator

b14ckyy commented Nov 5, 2022

@vlamgat007
That's expected. There were changes in the latest INAV master that also need a Configurator update. Get the latest nightly Configurator from here and all will be okay again: http://seyrsnys.myzen.co.uk/inav-configurator-next/

@breadoven
Collaborator

Nobody have any luck using the "debugging" firmware listed above ?

@vlamgat007
Author

vlamgat007 commented Nov 11, 2022

I loaded it with 6.0 and lost connection with all my sensors. I then reverted and did not try again. I have been told to use the latest "nightly version" of configurator which should fix the issue. I have not got around to it. I will see if I can do it today.

@vlamgat007
Author

vlamgat007 commented Nov 12, 2022

set debug_mode = ALWAYS

Nobody have any luck using the "debugging" firmware listed above ?

@breadoven here is an update:

I removed the crossfire TX, RX, and VTX and switched my system to ELRS. I am now running a RMRC 1.3ghz vtx. Just mentioning these changed parameters.

I could not trigger a failsafe. Attached are the debug sensor logs.

Debug 1
Debug 2
Debug 3

@breadoven
Collaborator

Thanks for that @vlamgat007 although not so useful since there was no failsafe so everything behaved as expected. And this would appear to be because you've switched RC gear albeit to ELRS which also uses CRSF ... strange it now seems to be behaving.

I assume there was a disarm or eeprom save where debug 6 changed value @ around 135 s ?

Also what settings did you change switching to ELRS ?

@vitaly-rudenya

From my side, I've tried on a clean setup (after a full chip erase and reflash).
UART 6 was configured as RX.
On the Radio tab: Serial CRSF.
No other configuration was made and no other devices were connected. Only the board itself was connected to the PC through USB.

@OptimusTi
Contributor

Just tested MATEKH743 INAV 5.1 and 6.0 master. No issues with TBS CRSF 6.19.

@mrbigglesw0rth

I have a similar issue, but with a different board: OMNIBUSF4V3
Upon disarm, I hear the ESCs startup beep and see the servos not reacting for ~1.5 sec.
In the Configurator, I see the "MSP round trip" go up to triple digits.
After this short period, everything seems to be fine. No power cycle needed. Just the ESC reboot gives me a very bad vibe.

Workaround: disable continuous servo trim OR change ESC protocol to anything but DSHOT.

As there were no 6.0 firmwares posted in this thread for my board, I could not see if this will be fixed. Is there a way to get nightly firmwares for any board?

@b14ckyy
Collaborator

b14ckyy commented Dec 13, 2022

@mrbigglesw0rth this is normal during the save procedure.
It will be optimized in the next 6.0 release by #8439.

has nothing to do with the bug here.

@vlamgat007
Author

Just an update after installing INAV 6 Horizon Hawk. Same setup as original post but now running ELRS. The system consistently loses RC connection on disarm. If I switch off "Continuously trim servos on Fixed Wing" the connection is not lost and the problem disappears.

@breadoven
Collaborator

Just an update after installing INAV 6 Horizon Hawk. Same setup as original post but now running ELRS. The system consistently loses RC connection on disarm. If I switch off "Continuously trim servos on Fixed Wing" the connection is not lost and the problem disappears.

Would be useful if you could try #8907.

(should be possible to use the firmware shown in the Artifacts section of https://github.com/iNavFlight/inav/actions/runs/4500418426 if you can't compile yourself).

@vlamgat007
Author

Just an update after installing INAV 6 Horizon Hawk. Same setup as original post but now running ELRS. The system consistently loses RC connection on disarm. If I switch off "Continuously trim servos on Fixed Wing" the connection is not lost and the problem disappears.

Would be useful if you could try #8907.

(should be possible to use the firmware shown in the Artifacts section of https://github.com/iNavFlight/inav/actions/runs/4500418426 if you can't compile yourself).

@breadoven thanks. I will test tonight (CST). Can you confirm, I downloaded inav-6.1.0-ci-20230323-76d2769.zip but there was also a zip file in a post today by @DzikuVx inav_6.0.0_MATEKH743.zip

Which one should I use? Thanks!

@breadoven
Collaborator

Either should work but @DzikuVx hex would be best given it should definitely be the right one.

@vlamgat007
Author

Either should work but @DzikuVx hex would be best given it should definitely be the right one.

Thank you @breadoven and @DzikuVx

@vlamgat007
Author

Either should work but @DzikuVx hex would be best given it should definitely be the right one.

Thank you @breadoven and @DzikuVx

@breadoven and @DzikuVx I tested both the following files

  • inav_6.0.0_MATEKH743.hex
  • inav_6.1.0_MATEKH743_ci-20230323-76d2769.hex

The 6.0.0 file did not work, I was able to initiate failsafe consistently. Not every time but probably >50% of the time.
The 6.1.0 file seems to work perfectly, I was not able to initiate failsafe at all.

@0crap
Contributor

0crap commented Apr 5, 2023

@vlamgat007 can you try the hex provided here please?

Originally posted by @DzikuVx in #8905 (comment)

@DzikuVx DzikuVx closed this as completed Apr 21, 2023
@dcan999

dcan999 commented Dec 14, 2023

I am getting this issue on 7.0.0. It doesn't lose RX link every time you arm/disarm with continuous trim enabled, it lets me arm/disarm 4 or 5 times before it loses RX link. With continuous trim off, it never loses link.
