[chip-tool] Can't unpair then re-pair with same ID #31582

Open
bdlr2 opened this issue Jan 22, 2024 · 13 comments

Comments

@bdlr2

bdlr2 commented Jan 22, 2024

Reproduction steps / Feature

When I unpair a device from the Matter fabric with a given node ID (chip-tool pairing unpair 0x1234) and then try to re-pair the same device with the old ID (which is supposed to be forgotten) using chip-tool pairing pair code-thread 0x1234, commissioning over BLE succeeds. However, when chip-tool then tries to reach the device over Thread, it connects to the IP address the device had before the unpair procedure, not the new one. The problem can't come from the end device, as it gets factory-reset after the unpair.
Am I doing something wrong?

Using chip-tool v2023.12.07 (pre-compiled by Nordic) on a Raspberry Pi, end device is an nRF5340-DK running Zephyr Matter samples.

Platform

raspi

Platform Version(s)

No response

Type

Manually tested with SDK

(Optional) If manually tested please explain why this is only manually tested

No response

Anything else?

No response

@bzbarsky-apple
Contributor

It sounds like your issue is that DNS-SD caches things, in general. When a device is unpaired it should send packets to clear its existing advertisements from the caches, but it sounds like your specific device is not doing that.

@Damian-Nordic do we send goodbye packets for Thread devices that would clear out the records on the SRP server?

@Damian-Nordic
Contributor

When we remove a fabric that is not the last fabric, then the corresponding operational service is unregistered from the border router (SRP server) and the BR sends goodbye packets on the backbone network.

But if the last fabric is removed and we drop the network credentials then the BR will keep the stale records for the next 2 hours (by default). I guess the solution for now is to disable the extended discovery so that the commissionable node service is removed once the device is commissioned. In the meantime, we can add removal of services on the factory reset like Silabs recently did: #31606. This will delay the factory reset up to a few seconds but will help in most cases.
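
To illustrate the lease behaviour described above, here is a minimal sketch of a record store with a 2-hour lease. It is a toy model, not OpenThread or border router code: the class and method names (`LeaseTable`, `register`, `unregister`, `lookup`) and the address `fd00::1` are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class SrpRecord:
    address: str        # IPv6 address registered for the service
    expires_at: float   # time (seconds) at which the lease runs out


class LeaseTable:
    """Toy model of a border router's SRP record store (hypothetical)."""

    def __init__(self, lease_seconds: float = 2 * 60 * 60):
        self.lease_seconds = lease_seconds
        self.records = {}  # service instance name -> SrpRecord

    def register(self, name: str, address: str, now: float) -> None:
        self.records[name] = SrpRecord(address, now + self.lease_seconds)

    def unregister(self, name: str) -> None:
        # Explicit removal, e.g. done by the device before a factory reset.
        self.records.pop(name, None)

    def lookup(self, name: str, now: float):
        record = self.records.get(name)
        if record is None or now >= record.expires_at:
            return None
        return record.address


table = LeaseTable()
table.register("ABC-001._matter._tcp", "fd00::1", now=0)

# Factory reset without unregistering: the old address is still served.
stale = table.lookup("ABC-001._matter._tcp", now=600)

# Once the lease has run out, the record is no longer returned.
expired = table.lookup("ABC-001._matter._tcp", now=2 * 60 * 60 + 1)

# With explicit removal, the record disappears immediately.
table.register("ABC-001._matter._tcp", "fd00::1", now=0)
table.unregister("ABC-001._matter._tcp")
removed = table.lookup("ABC-001._matter._tcp", now=600)

print(stale, expired, removed)
```

In this model, removing services on factory reset corresponds to calling `unregister` before the device drops its network credentials, so a commissioner never sees the stale address.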

@bzbarsky-apple
Contributor

@Damian-Nordic Extended discovery is not the issue here; it's the operational lookup that fails... Apparently the new advertisements for the new node with the same ID don't clear out the old advertisements that have a stale IP?

@bzbarsky-apple
Contributor

Or possibly the new SRP registration happens after the client has already looked up things and found the stale data....

@Damian-Nordic
Contributor

@bzbarsky-apple Are you sure given this:

it pairs over BLE but at the moment chip-tool tries to reach it over Thread

I understood that @bdlr2 was unhappy that the commissioner reached the device over IP for commissioning. Which, by the way, shouldn't be a big issue but disabling the extended discovery should help with that.

@bdlr2
Author

bdlr2 commented Jan 26, 2024

@Damian-Nordic the problem is not that the commissioner tries to reach the device over IP, the problem is that it tries to reach it over a stale IP route. I think that the commissioner or the device should do what is needed to make the network forget the link between its identifier and the corresponding IP address, as it gets un-paired from the Matter fabric.

@Damian-Nordic
Contributor

Damian-Nordic commented Jan 26, 2024

@bdlr2 Well, I don't think I am saying anything different from you: at this point the device shouldn't have any commissionable node service registered, so if the commissioner reaches it over IP, it must be because of stale data on the border router, right? I suggested possible solutions above, and it will be fixed before the next nRF Connect SDK release.

@bdlr2
Author

bdlr2 commented Jan 26, 2024

@Damian-Nordic I think so, yes. Alright, if it's fixed in the nRF Connect SDK unpair/factory reset procedure, that's fine!

@bzbarsky-apple
Contributor

@Damian-Nordic Well, extended discovery bits with CM=0 should not affect discovery over BLE at all... So if they are, something is very broken somewhere.

@Damian-Nordic
Contributor

@bzbarsky-apple Ah, sorry, you're right.
@bdlr2 It seems I misunderstood the issue. If this is about operational discovery, then it's not a real-life issue, as commissioners other than chip-tool will typically not reuse the same node ID when commissioning the device a second time. This can still be improved, but for now I encourage you to change the node ID that you pass to the pair command after each unpair or factory reset.
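
One way to follow the suggested workaround is to let a small helper hand out a node ID that has not been used before. The sketch below is only an illustration: the helper name `next_node_id`, the JSON state file, and the starting value `0x1234` are assumptions, not part of chip-tool.

```python
import json
import tempfile
from pathlib import Path


def next_node_id(state_file: Path, first_id: int = 0x1234) -> int:
    """Return a node ID that this state file has not handed out before."""
    if state_file.exists():
        node_id = json.loads(state_file.read_text())["last_node_id"] + 1
    else:
        node_id = first_id
    # Remember the ID so the next call (even in a new process) skips it.
    state_file.write_text(json.dumps({"last_node_id": node_id}))
    return node_id


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "node_ids.json"  # hypothetical state file location
    first = next_node_id(state)
    second = next_node_id(state)

print(f"chip-tool pairing unpair {first:#x}")
print(f"next pairing should use node ID {second:#x}")
```

In real use the state file would live somewhere persistent rather than in a temporary directory, so that IDs are not repeated across sessions.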

@Yinxq

Yinxq commented Feb 27, 2024

Hi, @Damian-Nordic, I am having the same problem. I found that deleting chip_tool_config.alpha.ini under /tmp solves the problem, which means I can pair with the same node ID. But I don't really understand why.

@Damian-Nordic
Contributor

@Yinxq You can read up on how the Service Registration Protocol (SRP) works, but in short: if one device claims ownership of a given service, say "ABC-001._matter._tcp", and another device then tries to register the same service, the SRP server (typically the border router) rejects the request.

When you remove the last fabric on a given device, the device must remove all information provisioned using Matter, which includes the OpenThread persistent data. The OT persistent data contains the SRP keypair that is used as the device's identity and as proof that it owns a given service. Hence, when the device (now holding a newly generated keypair) tries to register "ABC-001._matter._tcp" again, the SRP server thinks the service is already reserved by another device.

When you remove /tmp/chip*, chip-tool generates a new fabric ID, so on the next commissioning attempt the device will try to register a different name, such as "DEF-001._matter._tcp", which will be accepted.
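
The ownership rule explained above can be sketched as a toy model. This is not SRP server code: the class `SrpServer`, the helper `instance_name`, and the key strings are assumptions invented for the example, and a real server validates cryptographic signatures rather than comparing strings.

```python
class SrpServer:
    """Toy model of first-come, first-served naming on an SRP server."""

    def __init__(self):
        self.owners = {}  # service instance name -> key of the owner

    def register(self, name: str, public_key: str) -> bool:
        # The first key to claim a name owns it; other keys are rejected.
        owner = self.owners.setdefault(name, public_key)
        return owner == public_key


def instance_name(fabric: str, node: str) -> str:
    return f"{fabric}-{node}._matter._tcp"


server = SrpServer()

# First commissioning: the device registers with its original key.
first = server.register(instance_name("ABC", "001"), "key-before-reset")

# Factory reset wipes the key; the same fabric and node ID are reused.
second = server.register(instance_name("ABC", "001"), "key-after-reset")

# chip-tool state deleted: new fabric ID, hence a different instance name.
third = server.register(instance_name("DEF", "001"), "key-after-reset")

print(first, second, third)  # True False True
```

The second registration fails only because the name is still held by the old key; once the stale record expires or is removed, the same name could be claimed again.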

I hoped that #32215 would help in the way that the device would unregister its service before completing the factory reset. Have you tested with that change?

@Yinxq

Yinxq commented Feb 28, 2024

@Damian-Nordic Thanks a lot for your quick and detailed answer! I will try the change in #32215.
