Multi-EKF support #14650

dagar · 2020-04-13T01:56:14Z

WORK IN PROGRESS
Multi-EKF will be a focus of the v1.12 release cycle. I’m opening this early so we can begin to identify and address the architectural changes that will need to be in place to make this viable.

Multi-EKF Background/Motivation

WIP

Most commonly supported PX4 boards currently have 2 or 3 IMUs. These are typically different models, sometimes with wildly different output data rates, and often on the same bus or with no synchronization mechanism. As a result, our sensor hub module can't effectively compare raw sensor data and "vote", so it simply selects the highest priority sensor based on a simple set of fault metrics (timeouts, stale data, etc.) and passes that on to the estimator. This allows the system to continue operating if a sensor is completely lost (a hard fault), but that is actually quite rare in operation. In reality, most problems encountered in typical usage are soft faults that impact navigation, such as high vibration (aliasing, clipping) or erratic sensors that still produce output.

To make effective use of the current suite of heterogeneous IMUs and mitigate the most common real-world pain points, I'm proposing an architecture that runs (at least) one estimator per IMU.

In multi-EKF mode, each ekf2 instance publishes:
- estimator_attitude (vehicle_attitude)
- estimator_local_position (vehicle_local_position)
- estimator_global_position (vehicle_global_position)
A new ekf2_selector module consumes all estimator_* messages, selects a primary (latching), and republishes vehicle_attitude, vehicle_local_position, and vehicle_global_position.
ekf2_selector handles resets and deltas when switching between estimator instances.
At the moment, a switch will only occur if there's a fault while another estimator instance with a better combined test ratio (vel + pos + hgt) is fault free. This is also an opportunity to identify (and store) the best sensor combination.
The sensors hub is now configurable to either provide the single highest priority IMU or publish all:
- new parameter SENS_IMU_MODE to configure IMU output
  - 0: publish all vehicle_imu instances
  - 1: publish only highest priority as primary
On F7 we can start with one estimator instance per IMU. On F4 we continue as is, with multi-EKF user configurable if desired (there's lots of room for optimization on F4).
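
The switching rule described above could be sketched roughly as follows. This is a minimal illustration, not the ekf2_selector code: `EstimatorHealth` and `select_primary` are hypothetical names, and the sketch assumes the fault that triggers a switch is on the current primary.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified health summary for one estimator instance.
struct EstimatorHealth {
    bool  fault;           // true if any filter fault is flagged
    float vel_test_ratio;  // innovation test ratios (lower is better)
    float pos_test_ratio;
    float hgt_test_ratio;

    float combined_test_ratio() const
    {
        return vel_test_ratio + pos_test_ratio + hgt_test_ratio;
    }
};

// Latching selection: keep the current primary unless it is faulted AND a
// fault-free instance with a lower combined test ratio is available.
std::size_t select_primary(const std::vector<EstimatorHealth> &instances, std::size_t current)
{
    if (current >= instances.size() || !instances[current].fault) {
        return current; // healthy primary (or invalid index): stay latched
    }

    std::size_t best = current;
    float best_ratio = instances[current].combined_test_ratio();

    for (std::size_t i = 0; i < instances.size(); ++i) {
        if (i == current || instances[i].fault) {
            continue; // never switch to a faulted instance
        }

        const float ratio = instances[i].combined_test_ratio();

        if (ratio < best_ratio) {
            best = i;
            best_ratio = ratio;
        }
    }

    return best;
}
```

Because the function returns `current` whenever the primary is fault free, a better test ratio on another instance is not enough to trigger a switch on its own, which is the latching behaviour described above.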

TODO:

proper multi-module support (ModuleBase add common base and cleanup #12191)
sensor_combined kill? (migrate to vehicle_imu)
ekf2 replay review (logging all raw data for all instances probably not a good default)
ecl analysis script updated to be aware of multiple instances
factor in initial sensor priorities (or kill this entire concept?)
- the initial primary/preferred IMU should probably be configurable
preflight estimator_status (commander logic updated to only care about primary or should it check all?)
different origins?
- each estimator instance will have a slightly different origin (or worse), does it matter or can we get away with it if resets between instances are carefully handled everywhere?
SITL add multiple sensor instances
sensor metadata should belong to each sensor instance, not EKF2_* parameters
- offsets, rotations, possibly even noise
uORB multi-instance assumptions, we can't be sure all ekf2 publications are the same instance number. Maybe time to rethink uORB with namespaces? estimator_status0 -> estimator/0/status
lockstep implications (ekf2_timestamps)
Do we potentially need/want a separate level calibration per IMU?
per instance parameters (EKF2_MAGBIAS_{X, Y, Z}, accel and gyro noise?)

Future

entire matrix of sensor combinations (more sensors module configuration)
combine onboard AND offboard estimators (UAVCAN v1)
testing ideas, run new and old changes side by side safely

jlecoeur · 2020-04-13T07:40:37Z

Awesome! I like the proposed architecture very much.

Some thoughts on the architecture:
Although more complex, you may consider moving the selected instance to the att_pos_ctl work queue for reduced latency and stack.
Thinking out loud, we may benefit from the EKF architecture (1 time-delayed EKF + 1 output filter) to optimize further. For example run 3 time-delayed EKFs in their own work queue, select one, and run the output filter for the selected instance only, potentially in the att_pos_ctl work queue to allow higher output rate and lower latency.

Regarding ORB instances, would not it be sufficient to add the ability to "--force" an instance number at the publish site?

dagar · 2020-04-13T14:30:41Z

> Although more complex, you may consider moving the selected instance to the att_pos_ctl work queue for reduced latency and stack.

That's worth thinking about. I didn't get into it above, but wq:att_pos_ctrl is higher priority than wq:INSx, and it's scheduled on publications of the current primary EKF (with a backup timeout schedule). I was considering mechanisms for adjusting the priority of those threads dynamically based on EKF instance priority. With only 3 instances (1 per IMU) it's likely already fine, as in many cases those sensors are on the same bus and already spaced out; however, this will start to break down with a larger matrix of instances. Even on something like an H7, where we could afford to run a full matrix of sensor combinations, we'll still want to be careful to keep latency to a minimum on the selected path.

> run the output filter for the selected instance only, potentially in the att_pos_ctl work queue to allow higher output rate and lower latency.

I'd like to explore that, and it's actually what I was trying to get at in #14379 (comment). Instead of independent downsampling (a hard-coded filter update period) within each ecl/EKF backend, we could simply update it with every new IMU sample. Configuring the IMU integration would then amount to configuring the corresponding filter update period. We could also update the attitude separately from the buffered IMU data (pre-integration).
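
As a rough illustration of that idea, the sketch below accumulates raw samples and only emits an integrated sample once a configurable interval has elapsed, so the integration interval directly sets how often a downstream filter would update. The types and the single-axis simplification are assumptions for illustration, not the vehicle_imu or ecl implementation.

```cpp
#include <cstdint>

// Single-axis gyro sample (hypothetical type, for illustration only).
struct GyroSample {
    float    rate_rad_s; // measured angular rate
    uint32_t dt_us;      // time elapsed since the previous sample
};

// Integrated output handed to the filter.
struct IntegratedSample {
    float    delta_angle_rad{0.f};
    uint32_t delta_time_us{0};
};

class ImuIntegrator {
public:
    explicit ImuIntegrator(uint32_t interval_us) : _interval_us(interval_us) {}

    // Accumulate one raw sample. Returns true and fills `out` once the
    // configured interval is reached; the filter would update only then.
    bool put(const GyroSample &sample, IntegratedSample &out)
    {
        _acc.delta_angle_rad += sample.rate_rad_s * (sample.dt_us * 1e-6f);
        _acc.delta_time_us += sample.dt_us;

        if (_acc.delta_time_us < _interval_us) {
            return false;
        }

        out = _acc;
        _acc = IntegratedSample{}; // start the next integration window
        return true;
    }

private:
    uint32_t _interval_us;
    IntegratedSample _acc{};
};
```

With a 4000 us interval and 1 kHz input, every fourth call to `put` returns true, so the filter update period follows from the integrator configuration rather than from a value hard-coded in the backend.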

> Regarding ORB instances, would not it be sufficient to add the ability to "--force" an instance number at the publish site?

The (potential) issue is something else coming in earlier and claiming that instance first. We can be careful to avoid that in practice, but I'd rather the potential hole didn't exist. Each ekf2 instance has up to 15 different publications. It would be a nice simplification if we can trivially map a particular message instance back to each estimator rather than every single message carrying metadata. I think this could be a good excuse to add a bit of hierarchy to uORB, but that alone might be a rabbit hole.

https://github.com/PX4/Firmware/blob/66eacd24bc8ba26f6d22a4596c16040e841d5dc3/src/modules/ekf2/ekf2_main.cpp#L269-L283
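
To make the hierarchy idea concrete, here is a small sketch of how a namespaced topic path such as estimator/0/status could carry the estimator instance in its name, so the instance can be recovered without extra metadata in each message. Both helpers are hypothetical; uORB does not provide them.

```cpp
#include <cstddef>
#include <string>

// Build a hierarchical topic path, e.g. ("estimator", 0, "status") -> "estimator/0/status".
std::string namespaced_topic(const std::string &ns, unsigned instance, const std::string &name)
{
    return ns + "/" + std::to_string(instance) + "/" + name;
}

// Recover the instance number from a path of the form "<ns>/<instance>/<name>".
// Returns false if the path does not follow that layout.
bool instance_from_topic(const std::string &topic, unsigned &instance)
{
    const std::size_t first = topic.find('/');

    if (first == std::string::npos) {
        return false;
    }

    const std::size_t second = topic.find('/', first + 1);

    if (second == std::string::npos || second == first + 1) {
        return false;
    }

    unsigned value = 0;

    for (std::size_t i = first + 1; i < second; ++i) {
        if (topic[i] < '0' || topic[i] > '9') {
            return false; // instance field must be decimal digits only
        }

        value = value * 10 + static_cast<unsigned>(topic[i] - '0');
    }

    instance = value;
    return true;
}
```

Under this scheme every publication from one ekf2 instance shares the same path prefix, which is the "trivially map a message back to its estimator" property discussed above.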

LorenzMeier · 2020-04-14T09:25:38Z

Super nice to see this!

dagar · 2020-05-07T00:39:25Z

The (potential) orb instance ordering issue has been resolved. With some uORB::Publication changes now in master, we advertise all topics at once at Ekf2 construction.
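
A toy model of why advertising everything up front helps, assuming instance numbers are handed out per topic in first-come order. `TopicRegistry` and `advertise_all` are hypothetical stand-ins, not the uORB API.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy registry: each topic hands out instance numbers in first-come order.
class TopicRegistry {
public:
    int advertise(const std::string &topic)
    {
        return _next_instance[topic]++; // first caller gets 0, next gets 1, ...
    }

private:
    std::map<std::string, int> _next_instance;
};

// Advertise every topic of one estimator in a single pass, as a constructor
// might, and return the instance number obtained for each topic.
std::vector<int> advertise_all(TopicRegistry &registry, const std::vector<std::string> &topics)
{
    std::vector<int> instances;

    for (const auto &topic : topics) {
        instances.push_back(registry.advertise(topic));
    }

    return instances;
}

// True if every topic ended up with the same instance number.
bool all_same_instance(const std::vector<int> &instances)
{
    for (int instance : instances) {
        if (instance != instances.front()) {
            return false;
        }
    }

    return true;
}
```

If two estimators are constructed one after the other and each calls `advertise_all`, the first gets instance 0 on every topic and the second gets instance 1. If they instead advertise lazily on first publish, the order can differ per topic and the numbers no longer line up.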

xdwgood · 2020-06-11T01:04:03Z

When performing EKF selection, how do we plan to solve the state jump?

dagar · 2020-06-11T02:23:28Z

> When performing EKF selection, how do we plan to solve the state jump?

The selector handles the jump and we already have the reset mechanisms in place (messages and controllers). This is also used to pass through any actual resets from the active instance.

https://github.com/PX4/Firmware/blob/a4927606ed3799361819ec7d1caaf207fe6a269f/msg/vehicle_attitude.msg#L6

https://github.com/PX4/Firmware/blob/a4927606ed3799361819ec7d1caaf207fe6a269f/msg/vehicle_local_position.msg#L29
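
A simplified sketch of that mechanism for horizontal position: on an instance switch the selector reports the jump as a delta and bumps a reset counter, and a consumer shifts its setpoint by the delta when it sees the counter change. The message struct is a trimmed-down stand-in whose delta and counter fields are modelled on the reset fields in vehicle_local_position; `Selector` and `PositionHold` are hypothetical, and the delta here is approximated as the new sample minus the last published one.

```cpp
#include <cstdint>

// Trimmed-down, hypothetical position message with reset bookkeeping.
struct LocalPosition {
    float   x{0.f};
    float   y{0.f};
    float   delta_xy[2]{0.f, 0.f}; // size of the most recent jump
    uint8_t xy_reset_counter{0};   // incremented on every jump
};

class Selector {
public:
    // Republish the estimate of the given instance. If the instance differs
    // from the previously selected one, report the jump as a reset.
    LocalPosition publish(int instance, float x, float y)
    {
        if (_selected >= 0 && instance != _selected) {
            _out.delta_xy[0] = x - _out.x;
            _out.delta_xy[1] = y - _out.y;
            ++_out.xy_reset_counter;
        }

        _selected = instance;
        _out.x = x;
        _out.y = y;
        return _out;
    }

private:
    int _selected{-1};
    LocalPosition _out{};
};

class PositionHold {
public:
    PositionHold(float x, float y) : _sp_x(x), _sp_y(y) {}

    // Shift the setpoint by the reported delta exactly once per reset.
    void update(const LocalPosition &pos)
    {
        if (pos.xy_reset_counter != _last_reset_counter) {
            _sp_x += pos.delta_xy[0];
            _sp_y += pos.delta_xy[1];
            _last_reset_counter = pos.xy_reset_counter;
        }
    }

    float sp_x() const { return _sp_x; }
    float sp_y() const { return _sp_y; }

private:
    float   _sp_x;
    float   _sp_y;
    uint8_t _last_reset_counter{0};
};
```

Because the setpoint moves by the same amount as the estimate, the controller sees no error step at the moment of the switch. The counter ensures that a delta is applied only once even if the same message is read repeatedly.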

dagar · 2020-10-19T15:49:25Z

We've hit the fmu-v2 flash limit again.

EDIT: #15994 should give us enough.

dagar · 2020-10-21T04:23:35Z

Real test logs thanks to @mcsauder.

https://review.px4.io/plot_app?log=f21e4e44-8d77-4758-b2db-091669e7bf71
https://review.px4.io/plot_app?log=1898560b-867f-4c2a-bd4d-5f5ccc9013a0

In the second log you can see an estimator switch midflight with a large uncommanded change in yaw setpoint.

I don't see a problem with the logic in the estimator selector (the delta isn't wrong). I suspect this is a problem with the existing reset logic in the multicopter attitude controller (mc_att_control) and stabilized mode.

https://github.com/PX4/Firmware/blob/0b74076265edda2ac34cb9c4addcfeea7277d3a2/src/modules/mc_att_control/mc_att_control_main.cpp#L253-L261

dagar · 2020-10-21T13:48:41Z

I've fixed the delta_q_reset in the EKF2 selector. e336dae

priseborough

This is off by default, so it's safe to merge. Items to look at post merge include:

MAVSDK needs to be updated to support the failure of a single sensor instance and auto tests updated.
EKF replay with multiple instances if possible. Single instance appears to work.

dagar · 2020-10-27T13:38:14Z

Thanks Paul.

Next steps from my perspective.

multi-EKF enabled in SITL (lockstep issues to resolve)
MAVSDK tests with specific sensors (failure, inject bias, noise, etc)
purge sensor_combined entirely
ekf2 replay
- multi-ekf support
- working with vehicle_imu
- possibly remove ekf2_timestamps?
multi module support generalize in ModuleBase if possible and remove custom EKF2 handling
extend architecture to support multi-ekf across any type of sensor (barometers, GPS, optical flow, range finder, etc)
mechanisms to adjust priorities across WorkQueues and WorkItems as the primary estimator changes (minimize latency for control path)
flight testing on common boards (2-3 IMUs, 1 internal mag, 1 or more external mags)

mrpollo · 2020-10-28T16:20:33Z

TODO:

Make sure we highlight this feature in the next release notes
Create documentation on how the feature works, and how to configure it

dagar added Admin: Enhancement (improvement) 💡 EKF2 labels Apr 13, 2020

dagar added this to the Release v1.12.0 milestone Apr 13, 2020

dagar self-assigned this Apr 13, 2020

dagar force-pushed the pr-ekf2_selector branch from 0084c28 to a305628 Compare April 13, 2020 03:30

weekly-digest bot mentioned this pull request Apr 19, 2020

Weekly Digest (12 April, 2020 - 19 April, 2020) #14700

Closed

dagar force-pushed the pr-ekf2_selector branch from a305628 to 404bdf3 Compare May 1, 2020 19:17

weekly-digest bot mentioned this pull request May 3, 2020

Weekly Digest (26 April, 2020 - 3 May, 2020) #14814

Closed

dagar force-pushed the pr-ekf2_selector branch 3 times, most recently from 2c0b564 to 4ea93d5 Compare May 7, 2020 00:36

dagar mentioned this pull request May 20, 2020

move IMU integration to sensors/vehicle_imu to fix potential accel/gyro sync issues #14906

Merged

8 tasks

dagar force-pushed the pr-ekf2_selector branch 10 times, most recently from 5a2742e to 20f7d06 Compare June 6, 2020 19:30

dagar mentioned this pull request Jun 10, 2020

sensors: move mag voter to new VehicleMagnetometer WorkItem #14397

Closed

up to 4 accels and gyros are supported now

88d0239

dagar mentioned this pull request Oct 20, 2020

Feature request: Disabling EKF in no global position #15993

Closed

Merge remote-tracking branch 'px4/master' into pr-ekf2_selector

d21e469

ekf2: selector fix delta_q_reset

e336dae

dagar mentioned this pull request Oct 21, 2020

EKF: add fault status bit for clipping PX4/PX4-ECL#917

Merged

dagar added 5 commits October 21, 2020 19:41

SITL enable multi-ekf by default

f690f54

Merge remote-tracking branch 'px4/master' into pr-ekf2_selector

8cff5d5

simulator: only send controls after components finish

19305c5

- add lockstep progress in HIL_SENSOR handling

ekf2 selector only print primary EKF change if unhealthy

40e6cc9

Merge remote-tracking branch 'px4/master' into pr-ekf2_selector

cc9319a

priseborough previously approved these changes Oct 27, 2020

ROMFS: posix rcS disable multi-ekf

d0a7dca

dagar dismissed priseborough’s stale review via d0a7dca October 27, 2020 12:58

dagar added 4 commits October 27, 2020 09:24

uORB revert debug changes

cf71520

logger: reduce default mult-ekf and sensor logging to save memory

00e9254

px4_work_queue: reduce wq:nav_and_controllers stack

eb23306

Jenkins hardware print all estimator attitude and position messages, display free at beginning and end

44c1d81

dagar merged commit 0f411d6 into master Oct 27, 2020

dagar deleted the pr-ekf2_selector branch October 27, 2020 14:56

roman-dvorak mentioned this pull request Jan 15, 2021

Primary EKF changed X (unhealthy) -> X ThunderFly-aerospace/PX4-FlightGear-Bridge#27

Closed

Jaeyoung-Lim mentioned this pull request Mar 16, 2021

Choosing best attitude data ethz-asl/data-driven-dynamics#15

Closed

mrpollo mentioned this pull request Mar 22, 2021

v1.12 Updates: Multi-EKF enabled by default PX4/PX4-user_guide#1135

Closed
