Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make zeroconf availablity drive the decision to poll or not #243

Open
bdraco opened this issue Nov 6, 2022 · 10 comments
Open

Make zeroconf availablity drive the decision to poll or not #243

bdraco opened this issue Nov 6, 2022 · 10 comments

Comments

@bdraco
Copy link
Collaborator

bdraco commented Nov 6, 2022

    Maybe the zeroconf recording dropping drives unavailability?

Originally posted by @Jc2k in #237 (comment)

This would allow us to get rid of the polling and only make unavailable when the PTR expires.


I think the implementation is:

Polling only if device is not present in zeroconf

@bdraco bdraco changed the title Make the zeroconf recording dropping drive unavailability Make zeroconf availablity drive the decision to poll or not Nov 6, 2022
@Jc2k
Copy link
Owner

Jc2k commented Nov 6, 2022

If a crash + recovery happens quickly (before records expire) would we detect that connection might be broken?

@bdraco
Copy link
Collaborator Author

bdraco commented Nov 6, 2022

Maybe a is_advertising property on the pairing

BLE -> -- we have a description -- effectively always true if we have ever seen the device
IP -> -- zeroconf record is present ---
COAP -> -- zeroconf record is present ---

@bdraco
Copy link
Collaborator Author

bdraco commented Nov 6, 2022

If a crash + recovery happens quickly (before records expire) would we detect that connection might be broken?

If it hard crashed the zeroconf record wouldn't get withdrawn (requires an announce with a TTL of 0) and we can't tell the difference between the accessory sending an unsolicited broadcast at startup and a normal response so we wouldn't know its crashed from zeroconf alone

@bdraco
Copy link
Collaborator Author

bdraco commented Nov 6, 2022

If it does drop and come back we will get a TCP RST though so we know we need to poll if the connections gets reset by peer

@Jc2k
Copy link
Owner

Jc2k commented Nov 6, 2022

I'm not sure we'll always get an RST.

If connection is established and idle and there's nothing like NAT or keep alives, and then there is a hard crash, the device will just vanish without sending one.

@bdraco
Copy link
Collaborator Author

bdraco commented Nov 6, 2022

When the device reconnects to the network aiohomekit's stale connection will keep trying to talk to the device, and when the device's TCP stack sees it trying to talk over a connection that isn't established its going to send back an RST because aiohomekit's connection isn't known.... at least if it implements TCP/IP correctly 🙃

@bdraco
Copy link
Collaborator Author

bdraco commented Nov 6, 2022

We would also want to poll again for any connection drop, the RST case is just the obvious case of the accessory rebooting

@bdraco
Copy link
Collaborator Author

bdraco commented Nov 6, 2022

If connection is established and idle and there's nothing like NAT or keep alives, and then there is a hard crash, the device will just vanish without sending one.

If its a hard crash and its not coming back it would wait for the zeroconf record to expire before it would poll, fail, and mark it unavailable.

@bdraco
Copy link
Collaborator Author

bdraco commented Nov 6, 2022

Its always going to be a trade off though.

The other option is to change the poll interval to be higher when the zeroconf record is present so we reduce the traffic at the cost of waiting a bit longer to be unavailable.

I tend to thing traffic reduction is more important than quick unavailability reporting given how frequently we get wifi and network related issues but not sold either way.

@Jc2k
Copy link
Owner

Jc2k commented Nov 7, 2022

I don't disagree getting rid of polling would be better.

We could back off the poll interval and see if anyone notices.

Something else to think about... The docs we have access to AND the open ADK mandates that s# be 1. Yet most of my certified IP devices care to disagree:

Name: Hive Bridge (QYQ-313)
State Number (s#): 91
...
Name: Philips hue - 482544
State Number (s#): 183
...
Name: Elements F3C8
State Number (s#): 8
...
Name: eufy HomeBase2-0CEB
State Number (s#): 615
...
Name: Aqara-Hub-E1-91C7
State Number (s#): 915

The ones that seem to be at 1 are an Arlo baby camera and an Eve Extend.

I found some logs on the ADK issue tracker (issue 43) that suggest MFI devices might republish their zeroconf record when an event happens and no one is connected. The log strings don't exist in the open ADK though (I think someone might have raised a ticket in the wrong place - i.e. in a public tracker instead of a private tracker).

I also found this in the HTTP event code path of the esp HomeKit sdk:

    /* If no controller was connected and no disconnected event was sent,
     * reannaounce mDNS. That will increment state number as required
     * by HAP Spec R15.
     */
    if (!ctrl_connected && !hap_priv.disconnected_event_sent) {
        hap_mdns_announce(false);
        hap_priv.disconnected_event_sent = true;
    }

(Via https://github.com/espressif/esp-homekit-sdk/blob/eade505ade6c148056044307ce7b779bbc8e218b/components/homekit/esp_hap_core/src/esp_hap_ip_services.c#L1484-L1491)

Perhaps we should track the s# for changes? As if s# is only incremented (as suggested in above code) when no one is connected, it might mean we think we are connected but aren't?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants