Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-EKF support #14650

Merged
merged 76 commits into from
Oct 27, 2020
Merged

Multi-EKF support #14650

merged 76 commits into from
Oct 27, 2020

Conversation

dagar
Copy link
Member

@dagar dagar commented Apr 13, 2020

WORK IN PROGRESS
Multi-EKF will be a focus of the v1.12 release cycle. I’m opening this early so we can begin to identify and address the architectural changes that will need to be in place to make this viable.

Multi-EKF Background/Motivation

WIP

Currently in PX4 most commonly supported boards have 2 or 3 IMUs that are typically different models, sometimes with wildly different output data rates, and often on the same bus, or with no synchronization mechanism. As a result our sensor hub module isn't able to effectively compare raw sensor data and "vote", so it simply selects a highest priority sensor based on a simple set of fault metrics (timeouts, stale data, etc) and passes that on to the estimator. This allows the system to continue operating if a sensor is completely lost (a hard fault), but this is actually quite rare in operation. In reality most problems encountered in typical usage are soft faults that impact navigation like high vibration (aliasing, clipping) or erratic sensor problems that still produce output.

To effectively utilize the current suite of heterogeneous IMUs to mitigate the most common real world pain points I'm proposing moving to an architecture of running (at least) one estimator per IMU.

Blank Diagram

Screenshot from 2020-04-12 13-38-07

  • in multi-mode each ekf2 instance publishes
    - estimator_attitude (vehicle_attitude)
    - estimator_local_position (vehicle_local_position)
    - estimator_global_position (vehicle_global_position)
  • a new ekf2_selector module consumes all estimator_* messages, selects a primary (latching), and republishes vehicle_attitude, vehicle_local_position, vehicle_global_position
  • ekf2_selector handles resets and deltas when switching between estimator instances
  • At the moment a switch will only occur if there's a fault while another estimator instance with a better combined test ratio (vel + pos + hgt) is fault free. Opportunity to identify (and store) the best sensor combination
  • sensors hub is now configurable to either provide the single highest priority IMU or publish all
    • new parameter SENS_IMU_MODE to configure IMU output
      • 0: publish all vehicle_imu instances
      • 1: publish only highest priority as primary
  • On F7 we can start with one estimator instance per IMU. On F4 continue as is, user configurable if desired (lots of room for optimization on F4).

TODO:

  • proper multi-module support (ModuleBase add common base and cleanup #12191)
  • sensor_combined kill? (migrate to vehicle_imu)
  • ekf2 replay review (logging all raw data for all instances probably not a good default)
  • ecl analysis script updated to be aware of multiple instances
  • factor in initial sensor priorities (or kill this entire concept?)
    • the initial primary/preferred IMU should probably be configurable
  • preflight estimator_status (commander logic updated to only care about primary or should it check all?)
  • different origins?
    • each estimator instance will have a slightly different origin (or worse), does it matter or can we get away with it if resets between instances are carefully handled everywhere?
  • SITL add multiple sensor instances
  • sensor metadata should belong to each sensor instance, not EKF2_* parameters
    • offsets, rotations, possibly even noise
  • uORB multi-instance assumptions, we can't be sure all ekf2 publications are the same instance number. Maybe time to rethink uORB with namespaces? estimator_status0 -> estimator/0/status
  • lockstep implications (ekf2_timestamps)
  • Do we potentially need/want a separate level calibration per IMU?
  • per instance parameters (EKF2_MAGBIAS_{X, Y, Z}, accel and gyro noise?)

Future

  • entire matrix of sensor combinations (more sensors module configuration)
  • combine onboard AND offboard estimators (UAVCAN v1)
  • testing ideas, run new and old changes side by side safely

@jlecoeur
Copy link
Contributor

Awesome! I like the proposed architecture very much.

Some thoughts on the architecture:
Although more complex, you may consider moving the selected instance to the att_pos_ctl work queue for reduced latency and stack.
Thinking out loud, we may benefit from the EKF architecture (1 time-delayed EKF + 1 output filter) to optimize further. For example run 3 time-delayed EKFs in their own work queue, select one, and run the output filter for the selected instance only, potentially in the att_pos_ctl work queue to allow higher output rate and lower latency.

Regarding ORB instances, would not it be sufficient to add the ability to "--force" an instance number at the publish site?

@dagar
Copy link
Member Author

dagar commented Apr 13, 2020

Although more complex, you may consider moving the selected instance to the att_pos_ctl work queue for reduced latency and stack.

That's worth thinking about. I didn't get into it above, but wq:att_pos_ctrl is higher priority than wq:INSx and it's scheduled with publications of the current primary EKF (with backup timeout schedule). I was considering mechanisms for adjusting the priority of those threads dynamically based on EKF instance priority. With only 3 instances (1 per IMU) it's likely already fine as in many cases those sensors are on the same bus and already spaced out, however this will start to break down with a larger matrix instances. Even on something like an H7 where we could afford to run a full matrix of sensor combinations we'll still want to be careful to keep the latency to a minimum on the selected path.

run the output filter for the selected instance only, potentially in the att_pos_ctl work queue to allow higher output rate and lower latency.

I'd like to explore that, and it's actually what I was trying to get at here #14379 (comment). Instead of independent downsampling (hard coded filter update period) within each ecl/EKF backend we could simply update it with every new IMU. Configuring the IMU integration would be configuring the corresponding filter update period. We could update the attitude separately from buffered IMU (pre-integration).

Regarding ORB instances, would not it be sufficient to add the ability to "--force" an instance number at the publish site?

The (potential) issue is something else coming in earlier and claiming that instance first. We can be careful to avoid that in practice, but I'd rather the potential hole didn't exist. Each ekf2 instance has up to 15 different publications. It would be a nice simplification if we can trivially map a particular message instance back to each estimator rather than every single message carrying metadata. I think this could be a good excuse to add a bit of hierarchy to uORB, but that alone might be a rabbit hole.

https://github.com/PX4/Firmware/blob/66eacd24bc8ba26f6d22a4596c16040e841d5dc3/src/modules/ekf2/ekf2_main.cpp#L269-L283

@LorenzMeier
Copy link
Member

Super nice to see this!

@dagar
Copy link
Member Author

dagar commented May 7, 2020

The (potential) orb instance ordering issue has been resolved. With some uORB::Publication changes now in master at Ekf2 construction we advertise all topics at once.

@xdwgood
Copy link
Contributor

xdwgood commented Jun 11, 2020

When performing EKF selection, how do we plan to solve the state jump?

@dagar
Copy link
Member Author

dagar commented Jun 11, 2020

When performing EKF selection, how do we plan to solve the state jump?

The selector handles the jump and we already have the reset mechanisms in place (messages and controllers). This is also used to pass through any actual resets from the active instance.

https://github.com/PX4/Firmware/blob/a4927606ed3799361819ec7d1caaf207fe6a269f/msg/vehicle_attitude.msg#L6

https://github.com/PX4/Firmware/blob/a4927606ed3799361819ec7d1caaf207fe6a269f/msg/vehicle_local_position.msg#L29

@dagar
Copy link
Member Author

dagar commented Oct 19, 2020

We've hit the fmu-v2 flash limit again.

Screenshot from 2020-10-19 11-48-50

EDIT: #15994 should give us enough.

@dagar
Copy link
Member Author

dagar commented Oct 21, 2020

Real test logs thanks to @mcsauder.

https://review.px4.io/plot_app?log=f21e4e44-8d77-4758-b2db-091669e7bf71
https://review.px4.io/plot_app?log=1898560b-867f-4c2a-bd4d-5f5ccc9013a0

In the second log you can see an estimator switch midflight with a large uncommanded change in yaw setpoint.

Screen Shot 2020-10-20 at 11 21 34 PM

I don't see a problem with the logic in the estimator selector (the delta isn't wrong), I suspect this is a problem with the existing reset logic in the muticopter attitude controller (mc_att_control) and stabilized mode.

https://github.com/PX4/Firmware/blob/0b74076265edda2ac34cb9c4addcfeea7277d3a2/src/modules/mc_att_control/mc_att_control_main.cpp#L253-L261

@dagar
Copy link
Member Author

dagar commented Oct 21, 2020

I've fixed the delta_q_reset in the EKF2 selector. e336dae

priseborough
priseborough previously approved these changes Oct 27, 2020
Copy link
Contributor

@priseborough priseborough left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is off by default so safe to merge. Items to look at post merge include:

MAVSDK needs to be updated to support the failure of a single sensor instance and auto tests updated.
EKF replay with multiple instances if possible. Single instance appears to work.

@dagar
Copy link
Member Author

dagar commented Oct 27, 2020

Thanks Paul.

Next steps from my perspective.

  • multi-EKF enabled in SITL (lockstep issues to resolve)
  • MAVSDK tests with specific sensors (failure, inject bias, noise, etc)
  • purge sensor_combined entirely
  • ekf2 replay
    • multi-ekf support
    • working with vehicle_imu
    • possibly remove ekf2_timestamps?
  • multi module support generalize in ModuleBase if possible and remove custom EKF2 handling
  • extend architecture to support multi-ekf across any type of sensor (barometers, GPS, optical flow, range finder, etc)
  • mechanisms to adjust priorities across WorkQueues and WorkItems as the primary estimator changes (minimize latency for control path)
  • flight testing on common boards (2-3 IMUs, 1 internal mag, 1 or more external mags)

@dagar dagar merged commit 0f411d6 into master Oct 27, 2020
@dagar dagar deleted the pr-ekf2_selector branch October 27, 2020 14:56
@mrpollo
Copy link
Contributor

mrpollo commented Oct 28, 2020

TODO:

  • Make sure we highlight this feature on the next release notes
  • Create documentation on how the feature works, and how to configure it

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants