When a joint goes in HF, errors like positionMoveRaw: skipping command flood the logger #768

Closed
S-Dafarra opened this issue Oct 18, 2021 · 8 comments · Fixed by #770

S-Dafarra commented Oct 18, 2021

Bug description

When writing simple applications to control the robot, we are used to sending commands to joints without checking the current control mode. See for example https://github.com/robotology/icub-tutorials/blob/23ac25487d5a872030b82b38a28eb44b2cb3bda3/src/motorControlBasic/tutorial_arm.cpp#L129

This is because we usually set the desired control mode only once, during the initialization phase; then, in the control loop, we simply read some measurements and set the joint references.

It may happen, though, that a joint goes into HF (hardware fault) after startup. This happens quite easily with the hands. After that, the logger quickly fills up with messages like:

<ERROR> velocityMoveRaw: skipping command because  BOARD left_arm-eb26-j12_15 (IP 10.0.1.26)   joint  2  is not in VOCAB_CM_VELOCITY mode

Since the top-level application usually does not check the control mode at each control loop iteration, it keeps sending references to the joint, producing errors like the one above. Since the control loop usually runs at 100 Hz, this saturates the logger quite easily, making it impossible to spot other errors.

cc @pattacini @marcoaccame

Steps to reproduce

It should be enough to run the tutorial https://github.com/robotology/icub-tutorials/tree/23ac25487d5a872030b82b38a28eb44b2cb3bda3/src/motorControlBasic on the robot and change the control mode of the controlled joints to something different from Position and Idle.

Expected behavior

The errors

yError() << "positionMoveRaw: skipping command because " << getBoardInfo() << " joint " << j << " is not in VOCAB_CM_POSITION mode";
yError() << "velocityMoveRaw: skipping command because " << getBoardInfo() << " joint " << j << " is not in VOCAB_CM_VELOCITY mode";
yError() << "setReferenceRaw: skipping command because" << getBoardInfo() << " joint " << j << " is not in VOCAB_CM_POSITION_DIRECT mode";

should be limited in time, possibly sending the error only once, and sending it again when the control mode changes.
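As a rough illustration of the "only once, then again when the control mode changes" idea, here is a minimal C++ sketch. `ModeMismatchReporter` and its methods are hypothetical names invented for this example and are not part of the existing code; control modes are represented as plain integers.

```cpp
#include <map>
#include <utility>

// Hypothetical helper (not part of the existing code base): remembers, for
// each (joint, expected mode) pair, the actual mode that was seen when the
// mismatch was last reported.
class ModeMismatchReporter {
public:
    // Returns true when the mismatch should be logged: the first time it is
    // seen, and again whenever the joint's actual mode differs from the one
    // that was last reported.
    bool shouldReport(int joint, int expectedMode, int actualMode) {
        const auto key = std::make_pair(joint, expectedMode);
        const auto it = lastReported_.find(key);
        if (it != lastReported_.end() && it->second == actualMode) {
            return false;  // this exact mismatch was already reported
        }
        lastReported_[key] = actualMode;
        return true;
    }

    // To be called when a command is accepted, so that a later fault on the
    // same joint gets reported again.
    void clear(int joint, int expectedMode) {
        lastReported_.erase(std::make_pair(joint, expectedMode));
    }

private:
    std::map<std::pair<int, int>, int> lastReported_;
};
```

The error would then be printed only when `shouldReport(...)` returns true, which keeps the joint and mode information in the message while avoiding the 100 Hz repetition.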

Example repository

No response

Additional context

No response

@pattacini pattacini changed the title When a joint goes in HF, errors like positionMoveRaw: skipping command flood the logger When a joint goes in HF, errors like positionMoveRaw: skipping command flood the logger Oct 18, 2021

pattacini commented Oct 18, 2021

This is not a bug, but rather a feature request, I'd say.

In my opinion, the high-level SW should be responsible somehow for checking the status of the boards. We do this for example in the Cartesian and Gaze controller.
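A minimal sketch of what such a check could look like on the application side. `moveIfInMode` is a hypothetical helper written for this example; it only assumes that the two objects expose `getControlMode(int, int*)` and `positionMove(int, double)`, mirroring the shape of the corresponding YARP motor interface methods.

```cpp
// Hypothetical helper: sends a position reference only if the joint is
// currently in the expected control mode.
//   ControlMode     must provide: bool getControlMode(int j, int* mode)
//   PositionControl must provide: bool positionMove(int j, double ref)
template <typename ControlMode, typename PositionControl>
bool moveIfInMode(ControlMode& icm, PositionControl& ipos,
                  int joint, int expectedMode, double ref)
{
    int mode = 0;
    if (!icm.getControlMode(joint, &mode)) {
        return false;  // could not read the current mode
    }
    if (mode != expectedMode) {
        return false;  // e.g. the joint went into HF: skip the command
    }
    return ipos.positionMove(joint, ref);
}
```

A `false` return lets the caller decide what to do (stop, retry, or log once) instead of relying on the low-level error messages.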

However, due to some other limitations (see robotology/community#558) that may turn out to be severe in certain conditions, I reckon we could come up with a kind of workaround.

Regarding the workaround strategy below

should be limited in time, possibly sending the error only once, and sending it again when the control mode changes.

I wouldn't yet rule out the option of continuing to trigger the error, but at a much lower frequency. We'd need to ponder all the possibilities.

Stay tuned.


pattacini commented Oct 19, 2021

To illustrate what I have in mind, I've fast-prototyped the handling logic in Stateflow.

[Figure model-1: The model with the input and the outputs]
[Figure model-2: A closer look at the FSM chart composed of 2 parallel subcharts]

Essentially, the FSM takes as input a boolean that is 1 while an error message is being triggered and 0 otherwise. It outputs a boolean with the same meaning, but subjected to a sort of smart down-sampling, plus a counter that tells us how many original errors have been raised in a given temporal window.

The handler is composed of two parallel charts:

  • EvalOccurence is devoted to evaluating the number of errors triggered in a given temporal window (default = 1 sec).
  • ErrorHandler is the actual handler implementing the logic below:
    • If the occurrence of the errors is above a threshold, then it triggers the output only once per second (i.e., the same lapse as above).
    • Otherwise, output = input.

We have only 2 params:

  • The temporal window, used both to evaluate the frequency of the input errors and to carry out down-sampling at the output stage.
  • The threshold on the number of errors detected in the window, above which down-sampling is triggered (default = 5).

The output of the handler is twofold:

  • A boolean that tells when to print the message.
  • An integer that accounts for the number of errors that occurred since the last print (this info can be used to populate the message).

Below is a typical outcome: the output follows the input only within the initial window, which serves to evaluate the occurrence of the errors. After that, the output gets triggered only in single instances.

[Figure: graph of the input and the down-sampled output]

The FSM can obviously be used to generate code.
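For readers without Stateflow, the logic described above can be approximated by hand. The following C++ sketch is a simplified reading of the description, not the generated code nor the actual model: it assumes a fixed-rate `step()` call, expresses the window in steps rather than seconds, and all names (`EventDownsampler`, `DownsamplerOutput`) are invented for the example.

```cpp
#include <cassert>

// Output of one step: whether to print, and how many errors
// occurred since the last print (including the current one).
struct DownsamplerOutput {
    bool print;
    int count;
};

// Simplified, hand-written approximation of the two parallel charts.
class EventDownsampler {
public:
    // windowSteps: length of the evaluation window, in calls to step().
    // threshold:   errors per window above which down-sampling kicks in.
    EventDownsampler(int windowSteps, int threshold)
        : windowSteps_(windowSteps), threshold_(threshold) {
        assert(windowSteps > 0 && threshold >= 0);
    }

    DownsamplerOutput step(bool in) {
        if (in) {
            ++errorsInWindow_;
            ++sinceLastPrint_;
        }
        ++stepInWindow_;
        const bool windowEnd = (stepInWindow_ == windowSteps_);

        // Pass-through while not down-sampling; otherwise at most one
        // print per window, and only if something happened.
        const bool print = downsample_ ? (windowEnd && sinceLastPrint_ > 0)
                                       : in;
        const DownsamplerOutput out{print, sinceLastPrint_};
        if (print) {
            sinceLastPrint_ = 0;
        }

        // Role of EvalOccurence: re-evaluate the error rate per window.
        if (windowEnd) {
            downsample_ = errorsInWindow_ > threshold_;
            errorsInWindow_ = 0;
            stepInWindow_ = 0;
        }
        return out;
    }

    bool downsampling() const { return downsample_; }

private:
    int windowSteps_;
    int threshold_;
    int stepInWindow_ = 0;
    int errorsInWindow_ = 0;
    int sinceLastPrint_ = 0;
    bool downsample_ = false;
};
```

With a 10 ms step, `EventDownsampler(100, 5)` would correspond to the default 1 s window and threshold of 5 mentioned above.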


You can play with the model: error_downsampler.zip.

@pattacini pattacini self-assigned this Oct 19, 2021
S-Dafarra commented

That's pretty cool! At the moment a different error is thrown by each joint. We had cases in which a MAIS board shut down, and then all 9 joints of a hand went into HF. In that case, those lines produced 10000 errors in about 10 seconds. I guess the mechanism you described considers each joint separately, right?


pattacini commented Oct 20, 2021

At the moment the FSM is agnostic wrt other info like the joint number and the initial control mode as it receives only a boolean accounting for the occurrence of the input errors.

Along this line, we may then apply this algorithm as a function to each individual type of error (per joint and per mode). The reduction would still be very significant, although we may require too much memory just for that.

Alternatively, we may let the printouts show up initially with their info (i.e., joint number and control modes) and then print only a cumulative agnostic error message. In this case, we need only one instance of the handler, I guess.
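To make the memory trade-off of the first option more concrete, here is a small C++ sketch of a per-error registry. It is only an illustration under assumptions: `HandlerState` stands in for whatever state one handler instance needs, and `PerErrorHandlers` is a hypothetical name. Instances are created lazily, so memory grows only with the (joint, mode) pairs that actually raise errors.

```cpp
#include <cstddef>
#include <map>
#include <utility>

// Placeholder for the state of one handler instance (assumed fields).
struct HandlerState {
    int errorsInWindow = 0;
    int sinceLastPrint = 0;
    bool downsample = false;
};

// Hypothetical registry: one handler state per (joint, expected mode),
// allocated on first use.
class PerErrorHandlers {
public:
    HandlerState& get(int joint, int mode) {
        return states_[std::make_pair(joint, mode)];
    }

    // Number of handler instances currently allocated.
    std::size_t instances() const { return states_.size(); }

private:
    std::map<std::pair<int, int>, HandlerState> states_;
};
```

The second option corresponds to keeping a single `HandlerState` for the whole board, at the cost of losing the per-joint detail once down-sampling starts.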


pattacini commented Oct 20, 2021

🟢 Just pushed the model to event-downsampler.

This way, we may use the internal state DOWNSAMPLE as in the following pseudocode:

IN = false;

if (error_1) {
  IN = true;
  if (!FSM.DOWNSAMPLE) {
    yError() << "positionMoveRaw: skipping command because " << getBoardInfo() << " joint " << j << " is not in VOCAB_CM_POSITION mode";
  }
}

if (error_2) {
  IN = true;
  if (!FSM.DOWNSAMPLE) {
    yError() << "velocityMoveRaw: skipping command because " << getBoardInfo() << " joint " << j << " is not in VOCAB_CM_VELOCITY mode";
  }
}

// ...

FSM_step(IN);

if (FSM.DOWNSAMPLE && FSM.OUT) {
  yError() << "Skipping the requested command as the board is not in the correct control mode. Detected" << FSM.CNT << "errors on aggregate since the last message";
}


pattacini commented Oct 25, 2021

I looked a bit deeper into the code and found out that we'd need a timer running at a reasonable rate (typically 10 ms) while collecting possible asynchronous input events. Previously, the assumption was that we could run at the fastest rate, which no longer holds.

Therefore, I've refactored the model as per https://github.com/icub-tech-iit/matlab-tools/tree/master/event-downsampler, where now we have essentially a counter as input in place of a boolean.
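One way to picture the counter-based input: asynchronous error events only increment a counter, and the periodic timer drains it at each tick and feeds the count to the down-sampler. The C++ sketch below is an assumption about how this could be wired, not the code of the refactored model; `EventCounter`, `notify` and `drain` are hypothetical names.

```cpp
#include <atomic>

// Hypothetical collector for asynchronous error events.
class EventCounter {
public:
    // Called wherever an error would have been printed; may be called
    // from a thread other than the timer's.
    void notify() { pending_.fetch_add(1, std::memory_order_relaxed); }

    // Called by the periodic timer (e.g. every 10 ms): returns the number
    // of events collected since the previous call and resets the counter.
    int drain() { return pending_.exchange(0, std::memory_order_relaxed); }

private:
    std::atomic<int> pending_{0};
};
```

At each timer tick, the value returned by `drain()` would take the place of the boolean input used in the earlier version.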

pattacini commented

Done in #770.
