Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[🛠️] Change Command Scheduler to fix iterator invalidation bugs and get rid of "hacks" in C++ and Java #6593

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

kytpbs
Copy link
Contributor

@kytpbs kytpbs commented May 7, 2024

scheduledCommands.erase() was being called in for(Command* command : scheduledCommands)

this now gets put into a temporary vector (at the size of the set, I do not think all the commands will end in the same for loop but nevertheless it's the size of the set) and erases them after the for loop is done.

Found together with @PeterJohnson.

@kytpbs kytpbs requested a review from a team as a code owner May 7, 2024 00:11
Copy link
Member

@Starlight220 Starlight220 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't the iteration being done via an iterator (and then the removal also being done through the iterator)?

There's already endingCommands, and this adds another toRemove; I feel we're hacking this implementation patch on patch and it's not great.

This diverges from Java.

This changes behavior, as with this isScheduled will return true until the following iteration and not immediately when isFinished returns false. As an example of practical change, proxies will now end the iteration after the proxied command (which tbh might be better than the current situation where it depends on the map iteration order).

Add unit tests based on the code used to find this.

@PeterJohnson
Copy link
Member

PeterJohnson commented May 7, 2024

Wasn't the iteration being done via an iterator (and then the removal also being done through the iterator)?

Sort of. It used a range for, which has an internal iterator. That iterator is invalidated by SmallSet when the current object is removed, which means the for loop next iteration is incrementing from an invalid iterator (UB). Even manually managing the iterators doesn't work with SmallSet, because it can use a vector implementation, which invalidates all iterators, not just the current one. So the only other possible fix would be to use std::set AND manually manage the iterators instead of using range-for.

Java doesn't have this issue--iterator.remove() updates itself in such a way that hasNext() on it is safe for the next loop iteration.

@Starlight220
Copy link
Member

Starlight220 commented May 7, 2024

SmallSet doesn't have an Iterator.remove like Java?

@PeterJohnson
Copy link
Member

No. Neither does std::set.

@kytpbs
Copy link
Contributor Author

kytpbs commented May 7, 2024

/format

Copy link
Contributor

@KangarooKoala KangarooKoala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

However, considering that this certainly does have a change in behavior, Java should be changed to match. (We've had enough issues with differences between Java and C++, so it's best to keep them in sync as much as possible.) I'll note that robotpy's command scheduler avoids this issue by iterating over a (shallow) copy of the map/dictionary, but it still has the scheduling and canceling queues (presumably for compatibility?).

Additionally, I'm nervous about partially delaying ending the command- It feels like at least one person will run into an edge case where a "scheduled" command not being linked to any requirements causes strange and difficult-to-debug behavior.

@kytpbs
Copy link
Contributor Author

kytpbs commented May 16, 2024

The way robot.py did it seems like a better way to mitigate this issue, and SmallSet<Command*>(scheduledCommands) should create a copy of the smallSet, and I don't think the smallSet will be big enough that this will have a performance impact.

What do you guys think?

@KangarooKoala
Copy link
Contributor

I agree that copying scheduledCommands would be the cleaner solution. Some things to remember for whoever implements that:

  • Make the same change to Java (I think iterating over m_scheduledCommands.toArray(new Command[0]) would be the most performant way to copy the set)
  • Add a check to the start of the run loop to skip commands that are no longer scheduled
  • Remove the inRunLoop, toSchedule, toCancelCommands, and toCancelInterruptors member variables

@TheTripleV
Copy link
Member

Whatever behavior is being chosen, if it's going to being a part of the "spec", there should be tests that fail currently and pass afterwards added.

@kytpbs
Copy link
Contributor Author

kytpbs commented May 21, 2024

/format

@kytpbs kytpbs requested a review from PeterJohnson May 21, 2024 22:42
@kytpbs kytpbs changed the title [🛠️] fix "modify set in its loop" bug in C++ Command Scheduler [🛠️] change Command Scheduler to fix iterator invalidation bugs and get rid of "hacks" in C++ and Java May 21, 2024
@kytpbs kytpbs changed the title [🛠️] change Command Scheduler to fix iterator invalidation bugs and get rid of "hacks" in C++ and Java [🛠️] Change Command Scheduler to fix iterator invalidation bugs and get rid of "hacks" in C++ and Java May 21, 2024
@kytpbs kytpbs requested a review from KangarooKoala May 21, 2024 23:27
Copy link
Contributor

@KangarooKoala KangarooKoala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! (For what it's worth)
I'll just note down the behavioral differences for posterity, but I think these are acceptable (even positive) changes:

  • command.isScheduled() is false when command.end(boolean) and the finish or interrupt callbacks are called.
  • When scheduling a command from the run loop, command.initialize() and the schedule callbacks are called immediately, not after the run loop. It remains the case that command.execute() will only be called from the next call to scheduler.run().
  • When cancelling a command from the run loop , command.end(true) and the finish or interrupt callbacks are called immediately, not after the run loop (and the queued scheduled commands are processed). Notably, if a command (call it A) cancels another command (call it B) that was scheduled after A (and therefore will be processed after A), B is not be processed (execute() called and isFinished() checked) in the same run loop.

@kytpbs
Copy link
Contributor Author

kytpbs commented May 22, 2024

/format

@PeterJohnson
Copy link
Member

PeterJohnson commented May 23, 2024

Set.copyOf is memory allocation heavy; this is what the implementation is:

return (Set<E>)Set.of(new HashSet<>(coll).toArray());

So it first creates a HashSet, then creates an array of its elements, then creates a Set from that array.

For how we're using it here, toArray() would be more efficient. As a further optimization we could even avoid allocations most of the time by smartly reusing the array.

@KangarooKoala
Copy link
Contributor

Note that toArray(T[]) will automatically try to reuse the passed array, so a simple way to reduce allocations would be to update the array with m_composedCommandsCopy = m_composedCommands.toArray(m_composedCommandsCopy). (There's some options we could explore such as the initial array size and growing the array by more than is initially required, but I don't know if it would necessarily be worth it)

Copy link
Member

@Starlight220 Starlight220 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs so many tests.
SchedulingRecursionTest passing is a good sign (or a very bad one, if the tests there aren't working).

All behavior changes, edge cases, and so on need to be tested. Rigorously.

@PeterJohnson
Copy link
Member

Note that toArray(T[]) will automatically try to reuse the passed array, so a simple way to reduce allocations would be to update the array with m_composedCommandsCopy = m_composedCommands.toArray(m_composedCommandsCopy). (There's some options we could explore such as the initial array size and growing the array by more than is initially required, but I don't know if it would necessarily be worth it)

Note also toArray() will null-fill the end elements, not shrink the array, so you will need to look for the first null and exit the loop early if doing this.

@kytpbs
Copy link
Contributor Author

kytpbs commented Oct 10, 2024

synced with main with rebase.

Will write new tests and better optimize currently working on it

@kytpbs kytpbs force-pushed the Fix-cpp-commandScheduler branch 2 times, most recently from 4db4256 to 2d54082 Compare October 11, 2024 03:24
Co-authored-by: Joseph Eng <91924258+KangarooKoala@users.noreply.github.com>
@kytpbs
Copy link
Contributor Author

kytpbs commented Oct 11, 2024

fix formatting with rebase + force-push: (I forget to run wpiformat all the time...)

@kytpbs
Copy link
Contributor Author

kytpbs commented Oct 11, 2024

Update the checking for null and isScheduled as discussed in discord with a amend + force-push:

@kytpbs
Copy link
Contributor Author

kytpbs commented Oct 11, 2024

oops did command.isScheduled() instead of isScheduled(command) what a rookie mistake, fixing with force push now

@kytpbs
Copy link
Contributor Author

kytpbs commented Oct 12, 2024

I have added 3 different tests testing weird cases of using the scheduler inside the commands. all 3 fail on the main branch and pass on this branch

What other tests should I add? @Starlight220?

Copy link
Contributor

@KangarooKoala KangarooKoala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open for discussion- Should we move some of these tests to SchedulingRecursionTest?

Co-Authored-By: Joseph Eng <91924258+KangarooKoala@users.noreply.github.com>
@kytpbs
Copy link
Contributor Author

kytpbs commented Oct 12, 2024

@KangarooKoala: Should we move some of these tests to SchedulingRecursionTest?

I say we move test cancelNextCommandTest for sure and also scheduleCommandInCommand since they are both similar to #4259 but keep commandKnowsWhenEndedTest since it feels more like a CommandScheduler test more than a scheduling recursion.
Any objections?

Copy link
Contributor

@KangarooKoala KangarooKoala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say we move test cancelNextCommandTest for sure and also scheduleCommandInCommand since they are both similar to #4259 but keep commandKnowsWhenEndedTest since it feels more like a CommandScheduler test more than a scheduling recursion.
Any objections?

Sounds good!

KangarooKoala

This comment was marked as duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants