Cannot pair anything to my zigbee network #18426

Closed
quarcko opened this issue Jul 25, 2023 · 14 comments
Labels
problem Something isn't working stale Stale issues

Comments

@quarcko

quarcko commented Jul 25, 2023

What happened?

I have a probably pretty standard network of 64 devices, with about 12 routers scattered evenly around the house.
Routers are Aqara (Xiaomi) switches and power outlets. All other devices are a mix of Aqara, Tuya, Lidl, IKEA.
Everything worked really well for some time (more than a year). Ah yes, my coordinator is a zzh (CC2652R stick);
it is flashed to the latest firmware and sits on a USB extension cable, away from WiFi antennas and other stuff.

So let's start with a small issue that I have: over a year I usually have 3-4 full-day power outages.
During that time all my routers are powered off for a few hours or sometimes a full day.
My HA and other server stuff sits on a powerful UPS that is able to run them for the whole power outage.

After power is restored I see that many "end devices" have skipped the routers and become directly connected to the coordinator.
And even as time passes they do not "re-route" via routers. If I do a network scan now,
most of my end devices are "siblings" of the coordinator with linkquality "0" and have no other connections to anything else,
but THEY ALL WORK FINE :) except...

A single Xiaomi temperature sensor fell off the network recently, and here the real problem began: I cannot pair it back!
I tried multiple ways: restarting z2m, unplugging the coordinator and restarting at the same time.
I tried pairing with only specific routers as well - no luck.
I also started to think that maybe this end device is broken, so I opened a fresh temperature sensor that had never been paired.
Also nothing! It does not pair.

The zigbee log, even set to debug with "herdsman" debug also enabled, shows no trace of any attempt to pair.
Now I'm stuck with a setup that is unable to pair ANYTHING at all; the network map is crap, but otherwise it is a healthy, fast
working network :)

Any thoughts?
Can a zigbee network "heal" like zwave does? Can we somehow force this "healing"?
Maybe the coordinator is confused after all those power outages and now all its "direct slots" are taken, so it doesn't accept new devices?
Can we force re-routing through routers?
Can the coordinator somehow be "soft" reset without the need to re-pair everything?

And lastly, if all the answers are negative: how do I perform a full reset of the stick, if that's my last resort?

But still I think there should be a more convenient way of healing the network and forcing "re-routing" via routers...

What did you expect to happen?

No response

How to reproduce it (minimal and precise)

No response

Zigbee2MQTT version

1.32.1

Adapter firmware version

20230507

Adapter

zig-a-zig-ah!

Debug log

No response

@quarcko quarcko added the problem Something isn't working label Jul 25, 2023
@perolse

perolse commented Jul 27, 2023

Seems to be the same problem: I cannot pair anything in my system either. I have a SONOFF Zigbee 3.0 USB Dongle Plus-P.

@francisp2

Stop Zigbee2MQTT for 30 minutes, but leave your routers on. That should heal your map.

@polsup2

polsup2 commented Aug 1, 2023

Exactly the same problem with reconnecting Xiaomi WSDCGQ11LM temperature sensors. Also after a power outage.

At first I suspected a hardware malfunction. I opened a new sensor from the box. It can't join either.

Coordinator: ZigStar LAN
Z2m: 1.32.1

In addition to rebooting z2m and HA, I rolled back the coordinator firmware from 20230507 to 20221226 and erased NVRAM.
I changed the batteries in the sensors and tested them with a multimeter: 3.1 V, all OK.

No effect. I still can't connect the WSDCGQ11LM.

I didn't try rolling back z2m itself.

@quarcko
Author

quarcko commented Aug 7, 2023

So I've tried the suggestion to stop Z2M (with the coordinator unplugged) for 30-60 minutes.
After turning it back on - yeah, the map is definitely better, as devices rerouted.

But I have lost 2 more devices from the network: an Aqara motion sensor and a Tuya remote.
And now I cannot pair them back.

What else happened:
I bought 2 new devices for testing:
NOUS power socket A1Z - which is a router, and IT PAIRED OK immediately.
NOUS Remote Knob - which is a battery end device - no luck pairing it, nothing happens.

So it seems routers do pair (or at least one paired OK),
but no luck with end devices from 3 different manufacturers.

HELP!

@quarcko
Author

quarcko commented Aug 7, 2023

What will happen if I remove
data/coordinator_backup.json
and restart Z2M?

@quarcko
Author

quarcko commented Aug 9, 2023

So after some research I found my problem via this problem report:
#10339

I used the script provided in the answer here:
#10339 (comment)

And I found that 20 routers were not present inside "coordinator_backup".
So I went and removed and re-paired all of them.
After doing so I did the same for about 10 end devices missing from the list.
And voila! I am able to pair into my network again, and everything is working well!
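
For readers who want to see what such a comparison involves, here is a minimal sketch. It is not the script from #10339. It assumes that `data/database.db` holds one JSON object per line with `type` and `ieeeAddr` fields, and that `data/coordinator_backup.json` has a `devices` list whose entries carry an `ieee_address` without the `0x` prefix. Those field names are assumptions and may differ between versions; the function names are made up for illustration.

```python
import json


def normalize_ieee(addr):
    """Lower-case an IEEE address and drop an optional 0x prefix."""
    addr = addr.strip().lower()
    return addr[2:] if addr.startswith("0x") else addr


def database_devices(database_text):
    """Map IEEE address -> device type for every non-coordinator entry.

    Assumes one JSON object per line (hypothetical layout, see lead-in).
    """
    found = {}
    for line in database_text.splitlines():
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Skip the coordinator itself and entries without an address.
        if entry.get("type") == "Coordinator" or "ieeeAddr" not in entry:
            continue
        found[normalize_ieee(entry["ieeeAddr"])] = entry.get("type", "Unknown")
    return found


def backup_devices(backup_text):
    """Return the set of IEEE addresses listed in the backup (assumed layout)."""
    backup = json.loads(backup_text)
    return {normalize_ieee(d["ieee_address"]) for d in backup.get("devices", [])}


def missing_from_backup(database_text, backup_text):
    """Devices known to the database but absent from the coordinator backup."""
    known = backup_devices(backup_text)
    return {
        ieee: kind
        for ieee, kind in database_devices(database_text).items()
        if ieee not in known
    }
```

Under those assumptions, calling `missing_from_backup(open("data/database.db").read(), open("data/coordinator_backup.json").read())` would list the devices that are candidates for removing and re-pairing.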

So here are my thoughts:
@Koenkk - Could you add this check to zigbee2mqtt internally, so that the UI can flag
devices that are potentially not correctly paired? It would then be so easy to just go through
and repair a network! A very useful tool; please consider adding it to your code and having the UI signal a problem.

Other thoughts:

  1. I migrated from CC2531 to CC2652R without re-pairing, as advertised by the user manual,
    and I assume this might be the problem: the new stick probably cannot get the necessary information
    for building "coordinator_backup" without re-pairing all your devices. Please consider checking this assumption
    and correcting the manual, so it does not "lie" that re-pairing is not needed and you don't end up with this half-working network.

  2. Maybe my occasional power outages make the coordinator "lose" connections with routers?
    Then "coordinator_backup" becomes incomplete after some time and I end up in this situation.
    I will test this assumption myself during the next power outage.

In any case, incorporating this "compare-check" mechanism and flagging potentially incorrectly paired devices
would be an enormous help for all users out there!

Also, in case the problem arises because of power outages:
consider a different mechanism for creating the "coordinator_backup.json" file.
As I understand it, right now you just download the data from the stick and save it to the file, overwriting what was there.
And if the stick had an incorrect routing table with lost devices, you end up with an incorrect backup.

Maybe it's a better way to do it - only append new devices to "coordinator_backup" and do not remove
old ones? Maybe except in situations where the user manually requested a device to leave?
And also build this backup on each PAIR / UNPAIR event, so it keeps track of added and removed devices.

Issue resolved, but I am not closing it - @Koenkk please react, comment and close when you feel it necessary.

Thank you!

@Koenkk
Owner

Koenkk commented Aug 9, 2023

@quarcko great that you solved the issue. I'm indeed aware of this issue: for some unknown reason, devices go missing from the backup, and if you then re-flash, this problem occurs. I also experience this issue in my prod network.

Maybe it's better way to do it - only append "coordinator_backup" with new devices and do not remove
old ones?

This is actually a very smart solution which I hadn't thought of yet. Basically we shouldn't remove any devices which are still in data/database.db (what do you think, @castorw?).
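
The rule stated above (never drop a device from the backup while it is still in the database) could look roughly like this. It is only a sketch of the idea, not Zigbee2MQTT's implementation; the function name and the `ieee_address` key are assumptions made for illustration.

```python
def merge_backup_devices(previous, fresh, database_ieees):
    """Return the device list to write to the backup on disk.

    previous:       device dicts from the backup already on disk
    fresh:          device dicts just read from the coordinator
    database_ieees: set of addresses still present in the database
    Each device dict is assumed to have an "ieee_address" key.
    """
    # Start from what the coordinator reports now; fresh data wins.
    merged = {device["ieee_address"]: device for device in fresh}
    for device in previous:
        ieee = device["ieee_address"]
        # Keep an old entry only if the coordinator lost it
        # while the database still knows the device.
        if ieee not in merged and ieee in database_ieees:
            merged[ieee] = device
    return list(merged.values())
```

A device that was removed on purpose is no longer in the database, so it drops out of the backup on the next merge, which covers the "user manually requested device to leave" case.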

I will look into this (to add both the check and backup fix), let's leave the issue open.

@cweilguny

I have the exact same issue, but without a solution yet. I'm on "1.32.2 commit: 1ec1e57".

During the last few days I rebuilt my homelab setup, so some of the components were powered down for a couple of minutes, sometimes for 1-2 hours. All worked fine, until one Aqara WSDCGQ11LM temperature sensor stopped sending values three days ago. So I deleted it, but couldn't re-pair it. A new one was already on the way as I also needed one for another location; this one couldn't pair either. Changed the battery multiple times in both sensors, no success. Had Z2M down for 45 minutes, no change. By using the script from #10339 (comment) I found one router device not correctly paired, a Philips Hue light bulb. I removed it from Z2M and re-paired it. That worked, but the Aqara sensors still don't pair. I remember those devices always being a bit difficult. In the past they at least started the interview but then failed to finish it, even when I kept pushing the button during the interview. Now the interview doesn't even start.

@Koenkk
Owner

Koenkk commented Aug 14, 2023

The coordinator check is now supported directly from Zigbee2MQTT: #18599

I will continue further on the backup side (don't remove devices from backup which are still in the data/database.db file)

Changes will be available in the dev branch in a few hours from now. (https://www.zigbee2mqtt.io/advanced/more/switch-to-dev-branch.html)

@cweilguny

cweilguny commented Aug 14, 2023

Two SONOFF SNZB-02D arrived today, no luck with them. Nothing happens when I permit join on all devices and start pairing mode on the SONOFF. As the OP could resolve it via rejoining missing routers, and that didn't change anything for me, I guess it's another issue in my case.

@quarcko
Author

quarcko commented Aug 17, 2023

Firstly, thanks @Koenkk for implementing this in z2m; let's hope it will be a nice addition that helps us keep the network healthy.
@cweilguny - Check the thread that I linked in my reply above; there might also be an issue with EXT PANID.

Now about creating "coordinator_backup" - I'm not really an expert, but I think this flow would be nice, if it's possible:

  1. Once every 24H (maybe around 1AM) download the backup from the coordinator and
    A) Run the new procedure of storing it without removing devices if they are still in database.db
    B) If the "current" downloaded backup HAS missing devices, BUT they are present in the backup on disk,
    flag those devices as "warning" - coordinator data incomplete, but can be restored from the saved backup
    C) If the "current" downloaded backup is missing a device AND the backup on disk is also missing this device,
    flag it as "error" - completely missing, and the device has to be removed and "re-paired".

Maybe procedure "B" can be done silently if the hardware allows it:
after compiling the new backup and realizing that the file on disk is "more" complete than the one on the chip,
just silently upload it back to the device? Whether that would require a hardware reset or a z2m reboot I don't know,
but if it's possible then there is no need to annoy users with the "warning" level.
And we are left with just those "erroneous" devices that are missing from both the "on device" backup and the "on disk" backup.
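
The warning/error rules in steps B and C above can be sketched as a small classification function. This only illustrates the proposed flow; the function name, the status strings, and the idea of passing plain sets of IEEE addresses are assumptions, not anything Zigbee2MQTT provides.

```python
def classify_devices(database_ieees, on_chip_ieees, on_disk_ieees):
    """Assign a status to every device known to the database.

    database_ieees: addresses of devices in the database
    on_chip_ieees:  addresses in the backup just downloaded from the coordinator
    on_disk_ieees:  addresses in the backup file already saved on disk
    """
    status = {}
    for ieee in sorted(database_ieees):
        if ieee in on_chip_ieees:
            # The coordinator still knows the device.
            status[ieee] = "ok"
        elif ieee in on_disk_ieees:
            # Step B: missing on the chip, recoverable from the saved backup.
            status[ieee] = "warning"
        else:
            # Step C: missing everywhere, the device must be re-paired.
            status[ieee] = "error"
    return status
```

If the silent restore described above turns out to be possible, only the "error" entries would ever need to be shown to the user.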

just my 2 c.

@Koenkk
Owner

Koenkk commented Aug 17, 2023

@quarcko we currently don't know when devices go missing: on re-flash, or also without it. With this check in place now, this can be investigated further.

@cweilguny

cweilguny commented Aug 26, 2023

Still digging in the dark with this. According to the python script, there are no missing routers. It only shows 5 devices "knwown by the controller but not present on your Zigbee network". When looking for a way to remove them from my controller (I guess the script means "coordinator"?), I read the NVRAM with the ZigStar GW Multi tool and maybe found something about the extended_pan_id that @quarcko mentioned: in my coordinator_backup.json, coordinator_ieee and extended_pan_id are both "00124b002590e43c". In the NVRAM json of my coordinator there are three keys, "EXTADDR", "EXTENDED_PAN_ID" and "APS_USE_EXT_PANID", that all show the value "3ce49025004b1200". Shouldn't the values there and in coordinator_backup.json be the same? Could this be the reason that pairing doesn't work?

EDIT: Oh, both values are the same, but in reverse order of hex pairs o_0.
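
The reversal is easy to confirm with a few lines of Python. This is just a check of the two values quoted above; the helper name is made up, and reading the difference as a byte-order (endianness) difference between the two files is a plausible guess rather than something confirmed in this thread.

```python
def reverse_hex_bytes(value):
    """Reverse the byte order of a hex string, e.g. '0a0b' -> '0b0a'."""
    return bytes.fromhex(value)[::-1].hex()


backup_value = "00124b002590e43c"  # from coordinator_backup.json
nvram_value = "3ce49025004b1200"   # from the NVRAM dump

print(reverse_hex_bytes(backup_value) == nvram_value)  # prints True
```

So the two files hold the same 8-byte value, only written in opposite byte order.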

@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

@github-actions github-actions bot added the stale Stale issues label Sep 26, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 4, 2023
6 participants