-
Notifications
You must be signed in to change notification settings - Fork 296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Fix Priority1 check in announce timeout #635
RFC: Fix Priority1 check in announce timeout #635
Conversation
// Nothing to do | ||
if( clock->getPriority1() != 255 ) | ||
if( clock->getPriority1() == 255 ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic here is that if priority1 == 255, we are never going to become GM so we don't care if the SyncAnnounceTimeout occurred?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is called when the timer for timeouts based on reception of SYNC/ANNOUNCE frames goes off. This path is here to allow us to consider taking over being grand master if the current one goes away, but if we are not capable of doing so, there's no point in trying to. It might still be a good idea to have a status log saying "Hey, our GM apparently went away but I can't take over because priority1 is 255."
Timestamp system_time; | ||
Timestamp device_time; | ||
if( getPortState() == PTP_MASTER ) | ||
return true; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This says that we are already the GM, so we don't care that the SYNC announce timed out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we are already GM, we should be sending SYNC and ANNOUNCE, but I think we'll only receive them if someone thinks they're higher priority and wants to take over. I guess there might be a leftover timer from a recent change in status that could get us here? Probably good to guard against the situation even if unlikely.
GPTP_LOG_STATUS( | ||
"*** %s Timeout Expired - Becoming Master", | ||
e == ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES ? "Announce" : | ||
"Sync" ); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can get to this line here due to an announce timeout, there seems to be something not quite right. The GM still has to send out announce msgs, correct? And we code above that exits this function if this port is the GM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timeout to send SYNC and ANNOUNCE is a separate timer, so we should only get here if we're not currently master but we're capable of being one. I believe the timers related to this event handler are set when we receive SYNC and ANNOUNCE from the current GM.
Thanks for this update @christopher-s-hall ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the noise and delay on review, but I had to think about it a bit before it made sense because I haven't been in this part of the code (or specs) lately.
I now think that it's generally doing the right thing, although I think we probably ought to run it through the test suite soon to make sure we're not broken in a subtle way.
I was wondering what the expected behavior is in the following scenario: A slave-only (P1 = 255) device comes online on a network which currently does not have a 802.1as GM. It enters PTP_SLAVE mode since it is 255, and the GM clock ID is initialized to all zeroes. After the announce_receipt_timeout, it will enter this code path and just return true. Secondly, in a subsequent scenario, the slave only device is now synchronizing to an 802.1as GM. So, the port_state is PTP_SLAVE and the GM Clock ID is set to that of the 802.1as GM. |
I was a bit uncertain about things still, and the questions @vvacharya asked prompted me to dig deeper into 802.1AS for answers. I still think this patch is an improvement over the previous state and probably fixes a serious bug, but there may still be some issues to clean up later. Technically, the GM clock ID should never be all 0's, and the port role should be MASTER_PORT if it hasn't received an ANNOUNCE. However, if it is not GM-capable, the spec says it should not set the gmPresent flag, which gates it from sending SYNCs as a master would, and tells it to get time information from its LocalClock. This is based on 10.3.12.1.4, the updtRolesTree() function of the PortRoleSelection state machine. It runs unconditionally after initialization and again if "reselect" is set in the PortAnnounceInformation state machine (10.3.11), which happens upon entry to the "Aged" state, which happens as soon as the port is enabled, has gPTP enabled, and becomes asCapable. Once we get better GM info than our own (i.e. any info from a GM-capable system) we start the AnnounceReceiptTimeout timer. If it goes off after that point, or if we get a SyncReceiptTimeout while gmPresent is true, then we transition back to "Aged" which should run updtRolesTree() again, which should set our role back to MASTER_PORT and clear gmPresent since we are not GM-capable. I don't think that's precisely what we were doing before, nor what we're doing now. The effect was close enough to pass the certification tests before, but I'm not sure about now. We should definitely give them another run whenever someone who has access gets a chance. There could be behavioral differences around the info we put in Announce messages before we hear from a real GM, especially once multi-port behavior is fully implemented. To really follow the spec, we'll need something like the "gmPresent" flag, and we'll have to switch to MASTER_PORT when we get timeouts regardless of our GM-capable status. It doesn't make much difference when we've only got one port and we'll never be GM, though. Another thing that's odd is that we're resetting the sync and announce receipt timers when they go off. That's not in the spec at all; they only get reset when you receive a valid SYNC or ANNOUNCE, respectively. Not sure what the purpose is, but there may be some reason. |
@christopher-s-hall - any comments? I think we merge this pull request as is and create a GitHub issue of the last two comments. Any disagreements with that plan? Any comments from anyone else? |
@andrew-elder I agree. Apart from the issues that @pinealservo points out, the code is a bit confusing (e.g. the annouce/sync receipt timers continue to run while in master mode even though no action is taken) overall. |
Thanks @christopher-s-hall. |
No description provided.