
Odometery calibration #2934

Merged: 11 commits, merged on Jun 2, 2022


jwallace42
Contributor


Basic Info

| Info | Please fill out this column |
| ------ | ----------- |
| Ticket(s) this addresses | None |
| Primary OS tested on | Ubuntu |
| Robotic platform tested on | Gazebo |

Description of contribution in a few bullet points

Added a new behavior tree for odometry calibration.

Description of documentation updates required from your changes

Added description of the odometry behavior tree.


Future work that may be required in bullet points

For Maintainers:

  • Check that any new parameters added are updated in navigation.ros.org
  • Check that any significant change is added to the migration guide
  • Check that any new features OR changes to existing behaviors are reflected in the tuning guide
  • Check that any new functions have Doxygen added
  • Check that any new features have test coverage
  • Check that any new plugins are added to the plugins page
  • If BT Node, Additionally: add to BT's XML index of nodes for groot, BT package's readme table, and BT library lists

@mergify
Contributor

mergify bot commented May 4, 2022

@jwallace42, please properly fill in the PR template in the future. @SteveMacenski, use this instead.

  • Check that any new parameters added are updated in navigation.ros.org
  • Check that any significant change is added to the migration guide
  • Check that any new features OR changes to existing behaviors are reflected in the tuning guide
  • Check that any new functions have Doxygen added
  • Check that any new features have test coverage
  • Check that any new plugins are added to the plugins page
  • If BT Node, Additionally: add to BT's XML index of nodes for groot, BT package's readme table, and BT library lists

@jwallace42 jwallace42 marked this pull request as ready for review May 5, 2022 18:56
@SteveMacenski
Member

You have some linting and test failures

@SteveMacenski
Member

SteveMacenski commented May 6, 2022

/opt/overlay_ws/src/navigation2/nav2_system_tests/src/behavior_tree/test_behavior_tree_node.cpp:213
Expected equality of these values:
  bt_handler->loadBehaviorTree(entry.path().string())
    Which is: false
  true

It looks like your BT is somehow invalid? That test simply tries to load every BT XML in the directory and reports any that fail to load. I don't immediately see anything wrong myself.
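The check described above boils down to a loop over the tree directory. Below is a minimal C++17 sketch of that loop, assuming a loader callback that returns true on success. TreeLoader, listTreeFiles and findUnloadableTrees are hypothetical names for illustration only; the real test calls bt_handler->loadBehaviorTree directly, as the log above shows.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <functional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for the real loader used by the test
// (bt_handler->loadBehaviorTree in the log above).
using TreeLoader = std::function<bool(const std::string & path)>;

// Collect every regular *.xml file in `dir` (directories are skipped),
// sorted so the report is stable regardless of iteration order.
std::vector<fs::path> listTreeFiles(const fs::path & dir)
{
  std::vector<fs::path> files;
  for (const auto & entry : fs::directory_iterator(dir)) {
    if (entry.is_regular_file() && entry.path().extension() == ".xml") {
      files.push_back(entry.path());
    }
  }
  std::sort(files.begin(), files.end());
  return files;
}

// Try each file with `load` and return the ones that failed, so one run
// reports every broken tree instead of stopping at the first failure.
std::vector<std::string> findUnloadableTrees(
  const std::vector<fs::path> & files, const TreeLoader & load)
{
  assert(load && "a loader callback is required");
  std::vector<std::string> failures;
  for (const auto & file : files) {
    if (!load(file.string())) {
      failures.push_back(file.string());
    }
  }
  return failures;
}
```

A check like this only catches loads that return false. A load that blocks and never returns, which is what happens later in this thread, shows up as a test timeout instead.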

@jwallace42
Contributor Author

So I fixed the loading issue, but I ran into another error that is quite confusing.

That test is failing due to a timeout. On further inspection, I found the line that was causing the issue.

https://github.com/jwallace42/navigation2/blob/main/nav2_system_tests/src/behavior_tree/test_behavior_tree_node.cpp#L141.

The test only times out when the DriveOnHeading BT node is in the new XML file.

This is quite odd for the following reasons.

  1. We have no trouble creating the tree when running the full stack.
  2. The test for DriveOnHeading does not time out.

I would expect the test to time out every time the BT tries to load DriveOnHeading. I am not quite sure what is going on; it seems to be something deeper in the behavior tree library.

@SteveMacenski
Member

Is it because it's not in the library's list? https://github.com/jwallace42/navigation2/blob/main/nav2_system_tests/src/behavior_tree/test_behavior_tree_node.cpp#L54-L92

I think that's it: if it doesn't have access to the library, then it can't load. But I'm floored that BT.CPP doesn't throw an error and just hangs instead 🙄

@jwallace42
Contributor Author

I wish that were the issue, but I have it here: https://github.com/ros-planning/navigation2/pull/2934/files.

@SteveMacenski
Member

Huh, that's super strange

@jwallace42
Contributor Author

jwallace42 commented May 10, 2022

I dug into this a bit more.

The new Cancel nodes have the same issue.

I tried digging into the test with gdb but came across another unexpected result.
The test passes with colcon --packages-select nav2_system_tests but fails if I run the executable directly.

>>> [rcutils|error_handling.c:108] rcutils_set_error_state()
This error state is being overwritten:

  'Service type support not from this implementation. Got:
    Handle's typesupport identifier (rosidl_typesupport_cpp) is not supported by this library, at /tmp/binarydeb/ros-rolling-rosidl-typesupport-cpp-1.4.2/src/type_support_dispatch.hpp:111
    Could not load library libnav2_msgs__rosidl_typesupport_introspection_cpp.so: dlopen error: libnav2_msgs__rosidl_typesupport_introspection_cpp.so: cannot open shared object file: No such file or directory, at /tmp/binarydeb/ros-rolling-rcutils-5.0.1/src/shared_library.c:99, at /tmp/binarydeb/ros-rolling-rosidl-typesupport-cpp-1.4.2/src/type_support_dispatch.hpp:76
while fetching it, at /tmp/binarydeb/ros-rolling-rmw-cyclonedds-cpp-1.1.2/src/rmw_node.cpp:4174

@SteveMacenski
Member

SteveMacenski commented May 10, 2022

Neat... I wonder if something about the messages wasn't exported correctly for the drive on heading work. It could be that we missed some export that is causing this, since it's unique to that (and wasn't even introduced in this PR). With that said, I looked and wasn't able to find it.

@jwallace42
Contributor Author

jwallace42 commented May 10, 2022

Yeah, I checked through the CMake and didn't find any discrepancies between DriveOnHeading and the other plugins.

I also see this issue in:

  • CancelWait
  • CancelSpin
  • CancelBackUp
  • CancelDriveOnHeading

@SteveMacenski
Member

SteveMacenski commented May 10, 2022

Is it just the cancel plugins?

@padhupradheep it sounds like we may have missed something on the cancel behavior tree plugins you worked on

That's odd, but it also gives us a data point. It's probably nothing to do with messages then. If Spin works but Spin Cancel doesn't, then it's also probably not on the behavior server side; it's probably in the BT nodes themselves, and only affecting relatively new ones.

@padhupradheep
Member

I was actually planning to jump into the conversation.

I dug deep into this during that PR. #2787 (comment)

For some reason, the node didn’t register at all.

@jwallace42
Contributor Author

jwallace42 commented May 10, 2022

I was just looking at the backtrace, and it led back to:

#0 futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7fffffffa5b0, clockid=<optimized out>, expected=0, futex_word=0x5555557bb74c) at ../sysdeps/nptl/futex-internal.h:320

I didn't really find any useful information out of this.

@jwallace42
Contributor Author

> Is it just the cancel plugins?

Also DriveOnHeading.

@SteveMacenski
Member

I see no reason why it should matter, but none of those have on_tick() overrides or ports in their providedPorts() functions.

@padhupradheep
Member

Hey @jwallace42, if you need some help here, I can give you a hand, provided I know some more details about your last update.

@jwallace42
Contributor Author

@padhupradheep I'm not really sure how to go about solving the issue, so I haven't done anything on it this past week.

If you want to reproduce it, just put one of the troublesome BT nodes in the behavior tree XML and run nav2_system_tests. It should fail due to a timeout.

@padhupradheep
Member

Let me dig into it and get back to you.

@jwallace42
Contributor Author

Cool! Cool! Let me know if you have any questions....

I tried a lot of different things :)

@SteveMacenski
Member

@padhupradheep find anything?

@padhupradheep
Member

Not yet, I need a few more days.

@padhupradheep
Member

Unfortunately, I don't think there is much more I can find here.

@SteveMacenski
Member

What did you try? Just so we can use that as a basis for what doesn't work.

@SteveMacenski
Member

I also filed a ticket: BehaviorTree/BehaviorTree.CPP#390

@SteveMacenski
Member

@jwallace42 please rebase. I doubt it'll fix it, but we updated the Docker images to 22.04.

@padhupradheep
Member

padhupradheep commented May 27, 2022

Oh wait, I see the problematic nodes are being registered after all.

I was running some tests just now. I turned our regular navigate_to_pose BT into a nonsense BT (just for this test) by adding the following changes:

<RecoveryNode number_of_retries="1" name="FollowPath">
    <Sequence name="CancelingControlAndWait">
        <FollowPath path="{path}" controller_id="FollowPath"/>
        <ClearEntireCostmap name="ClearLocalCostmap-Context" service_name="local_costmap/clear_entirely_local_costmap"/>
        <CancelControl name="ControlCancel"/>
        <Wait wait_duration="20"/>
        <CancelWait name="CancelWait"/>
    </Sequence>
    <ClearEntireCostmap name="ClearLocalCostmap-Context" service_name="local_costmap/clear_entirely_local_costmap"/>
</RecoveryNode>

So, what I inferred is that both the CancelControl and CancelWait BT nodes were ticked. Of course, the tests in https://github.com/ros-planning/navigation2/blob/main/nav2_system_tests/src/behavior_tree/test_behavior_tree_node.cpp failed, but there was actually no timeout.

I verified the tick by adding a print to the tick() function in bt_cancel_node.hpp.

@jwallace42 am I missing something needed to reproduce your issue?
@SteveMacenski sorry for the confusion; I just went in the wrong direction last night.

@SteveMacenski
Member

SteveMacenski commented May 27, 2022

Weirder yet, the same symptom has been reported in the other system tests for keepout/speed zones, which just use the normal BT navigator version (not even the same test script): #2962 (comment)

Something seems wrong with BT.CPP or our use of it in CI...

@SteveMacenski
Member

SteveMacenski commented May 27, 2022

My comment in #2962 (comment) explains why we might be seeing it in that particular PR. If the situation is analogous here, it might be worth adding some logging inside the BT action server to see which node is hanging, and specifically where.

Like... err, https://github.com/ros-planning/navigation2/blob/e6639fca2f56ec700810a4eec55de31293d3b890/nav2_system_tests/src/behavior_tree/server_handler.cpp#L35-L44: we haven't added dummy servers for these actions, so the BTs aren't able to load 🤦

It would have taken me much, much longer to find that if it weren't for #2962 (comment) (thank you @wep21!! Your comment helped in another timely place).
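As the diagnosis above suggests, loading these trees appears to block until the matching action servers exist, and in this test nobody started them. The sketch below is a generic C++ illustration of that failure mode, not nav2, rclcpp or BT.CPP code; ServerAvailability and waitFor are hypothetical names. A bounded wait turns "the server never comes up" into a false return the caller can report, whereas a wait with no deadline simply parks the thread. A parked thread like that typically shows up in a backtrace as a glibc futex wait, similar to the one posted earlier.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper (not an rclcpp or BT.CPP type): tracks whether the
// server a BT node depends on has come up yet.
class ServerAvailability
{
public:
  // Called by whoever starts the (dummy) server.
  void markAvailable()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      available_ = true;
    }
    cv_.notify_all();
  }

  // Bounded wait: returns false if the server never appears within `timeout`,
  // so the caller can fail with an error. Calling cv_.wait() with the same
  // predicate and no timeout would block forever if no server is ever started.
  bool waitFor(std::chrono::milliseconds timeout)
  {
    assert(timeout >= std::chrono::milliseconds::zero());
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout, [this] {return available_;});
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool available_ = false;
};
```

A bounded wait like this would not fix the test by itself. The fix identified here is on the test side, adding dummy servers for the new actions in server_handler.cpp; a deadline would at most make the same problem surface as an error rather than a hang.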

@SteveMacenski
Member

Rebase or pull in main; there are some failing tests that concern me, but I believe they've been handled elsewhere.

@SteveMacenski
Member

CI failed in an unusual way; retriggering.

@mergify
Contributor

mergify bot commented Jun 2, 2022

@jwallace42, your PR has failed to build. Please check CI outputs and resolve issues.
You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

@SteveMacenski SteveMacenski merged commit 9bb37c6 into ros-navigation:main Jun 2, 2022
@jwallace42 jwallace42 deleted the odometery_calibration branch June 3, 2022 16:56
redvinaa pushed a commit to EnjoyRobotics/navigation2 that referenced this pull request Jun 30, 2022
* added a BT for driving in a square

* removed extra printouts

* decreased timeout

* do not increment recovery

* code review

* code review

* test and linting fix

* updated plugin list

* fixed BT loading issue

* bump up speed