NRZ-2019-127 broke my sensor #580

Closed
skibbipl opened this issue Dec 3, 2019 · 33 comments

Comments

@skibbipl

skibbipl commented Dec 3, 2019

Since the latest version, my sensor has stopped working. I can see on the router (Mikrotik) that it connects to my Wi-Fi network, but I can neither open the setup page nor does the sensor send any data out. What is strange is that since the upgrade, the sensor no longer gets an ARP entry on the router.
I downgraded to NRZ-2018-123B and everything is working again. Any tips on how to debug this issue?

@dirkmueller
Collaborator

Do you see an open Wi-Fi access point named airrohr-xxxxxx (where xxxxxx is some number)?

@dirkmueller
Collaborator

I would really appreciate the serial output of the sensor with the new firmware. You can capture it with screen on macOS or Linux, or with the Arduino IDE's serial monitor in general.

@dirkmueller
Collaborator

dirkmueller commented Dec 4, 2019

Did you have the same issue with 2019-125-B1 as well?

@dirkmueller
Collaborator

dirkmueller commented Dec 4, 2019

You can download and flash 125 (the previous stable release, online between October 31st and December 2nd) from here: https://www.madavi.de/sensor/update/data/previous/NRZ-2019-125-B1/

@skibbipl
Author

skibbipl commented Dec 4, 2019

This is what I get from the Arduino serial port monitor:

22:25:55.487 -> (garbage)Airrohr: NRZ-2019-127-1
22:25:55.659 -> mounting FS...
22:25:55.762 -> opened config file...
22:25:55.762 -> parsed json...
22:25:55.797 -> output debug text to displays...
22:25:56.003 -> Connecting to <redacted_my_wifi_network>
22:25:56.481 -> ...................
22:26:06.243 -> WiFi connected, IP is: 192.168.1.27
22:26:06.243 -> Starting Webserver... 192.168.1.27
22:26:06.276 -> 
22:26:06.276 -> ChipId: 9925909
22:26:06.276 -> Start reading SDS011 version date
22:26:07.632 -> End reading SDS011 version date
22:26:07.667 -> Read SDS...: 18-11-16(ee74)
22:26:07.767 -> Stopping SDS011...
22:26:07.767 -> Read BMP280/BME280...
22:26:07.802 -> Trying BMP280/BME280 sensor on 76 ... found
22:26:07.836 -> Send to :
22:26:07.870 -> sensor.community
22:26:07.870 -> Madavi.de
22:26:07.904 -> custom API
22:26:07.904 -> ----
22:26:07.904 -> Auto-Update active...
22:26:07.939 -> validate request auth...
22:26:07.939 -> ws: root ...

And then it dies :(
Update: downgraded to NRZ-2019-125-B1 and everything is fine again.

@dirkmueller
Collaborator

Since you seem to have a setup for flashing manually, can you check whether manually flashing this firmware works? Please make sure to temporarily turn off auto-update in the config before doing that; otherwise it will OTA-update back to the previous version on boot.

https://firmware.sensor.community/airrohr/beta/latest_en.bin

There are also other language versions, just pick one that you want.

There is a bug report against the Arduino core saying that Wi-Fi does not work after an OTA update, while it does work after flashing manually via serial.

@dirkmueller
Collaborator

dirkmueller commented Dec 4, 2019

Also, I see the message "validate request auth", which means you have set a password for the web UI. Are you sure there isn't an authentication dialog hidden in some other browser tab, waiting for input?

Can you share some details about the user/password you have set? Length, or maybe other special things?
I tried this with the user "admin" and the password "rfWSYs82pzVZrHKrfWSYs82pzVZrHKrfWSYs82pzVZrHK" (a random string, just long), and that seems to work.

@skibbipl
Author

skibbipl commented Dec 5, 2019

Regarding web auth: yes, I set a simple 8-character password with one special character from the set !@#$%^&*()-=_+. For just a second after rebooting I'm able to enter the web UI, but then it stops working.
I'll try to manually flash the latest version this evening and report back.

@dirkmueller
Collaborator

So, just to be clear: the web authentication succeeds and the web page loads successfully?

@dirkmueller
Collaborator

https://github.com/opendata-stuttgart/sensors-software/blob/master/airrohr-firmware/Readme.md#ben%C3%B6tigte-software-in-klammern-getestete-version-und-die-art-der-lizenz

When you configure Arduino with the ESP8266 integration (you need to add the URL to the Boards Manager), you'll have a tool called esptool.py at this location: %arduinoinstalldir%/Arduino15/./packages/esp8266/hardware/esp8266/2.6.2/tools/esptool/esptool.py

It can be used to flash the firmware with:

esptool.py --chip auto --port $port --baud 460800 write_flash -fm dio 0x00000 $firmware.bin
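If you end up flashing repeatedly, the same command can be scripted. Below is a minimal Python sketch that assembles the esptool.py invocation above and runs it via subprocess. The esptool path, serial port and firmware file are placeholders you have to supply (assumptions, standing in for $port and $firmware.bin above), and only the flags already shown in the command are used.

```python
import subprocess
import sys
from pathlib import Path


def build_flash_command(esptool: Path, port: str, firmware: Path,
                        baud: int = 460800) -> list[str]:
    """Assemble the esptool.py write_flash call from this thread.

    Flags mirror the command above: auto-detect the chip, flash in
    DIO mode, and write the image starting at offset 0x00000.
    """
    return [
        sys.executable, str(esptool),
        "--chip", "auto",
        "--port", port,
        "--baud", str(baud),
        "write_flash", "-fm", "dio",
        "0x00000", str(firmware),
    ]


def flash(esptool: Path, port: str, firmware: Path) -> None:
    """Run esptool.py; raises CalledProcessError if flashing fails."""
    cmd = build_flash_command(esptool, port, firmware)
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Arguments: path to esptool.py, serial port, firmware .bin file.
    flash(Path(sys.argv[1]), sys.argv[2], Path(sys.argv[3]))
```

Usage (hypothetical script name, port and file): python flash_airrohr.py /path/to/esptool.py /dev/ttyUSB0 latest_en.bin. On Windows the port would be something like COM3. Remember to turn off auto-update in the config first, as noted above.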

@dirkmueller
Collaborator

Also, can you please disable the SDS011 sensor in the configuration menu of a firmware version that works, and then update to the newer version (for example, by enabling both auto-update and "use beta channel" at the same time)? Please also set the debug level to 5 and capture the serial debug output again; maybe this gives a clue about where it gets stuck.

If it is stable without the SDS011, we have a first hint of where to look.

@skibbipl
Author

skibbipl commented Dec 5, 2019

the web authentication succeeds and the webpage is loading successfully?

Only for a second, and then everything dies; serial debugging also stops.
I will try this upgrade with the SDS011 disabled this evening.

@skibbipl
Author

skibbipl commented Dec 6, 2019

I disabled the SDS011 and enabled auto-update. It still dies:

12:53:38.867 -> ⸮⸮Found firmware MD5: 40cec42ccbac05f147e486d1e4822704
12:53:39.072 -> 
12:53:44.886 -> Moving Firmware image to old.
12:53:47.090 -> Finished successfully.. Rebooting!
12:53:47.634 -> ?�)⸮Lr⸮D⸮(⸮⸮Airrohr: NRZ-2019-128-B2
12:53:53.633 -> mounting FS...
12:53:53.736 -> opened config file...
12:53:53.736 -> parsed json...
12:53:53.771 -> Rewriting old config from: NRZ-2019-125-B1
12:53:53.804 -> Saving config...
12:53:53.976 -> Config written successfully.
12:53:53.976 -> output debug text to displays...
12:53:54.216 -> Connecting to <REDACTED>
12:53:54.664 -> ...................
12:54:04.430 -> WiFi connected, IP is: 192.168.1.27
12:54:04.465 -> Starting Webserver... 192.168.1.27
12:54:04.500 -> 
12:54:04.500 -> ChipId: 9925909
12:54:04.500 -> Read BMP280/BME280...
12:54:04.534 -> Trying BMP280/BME280 sensor on 76 ... found
12:54:04.604 -> Send to :
12:54:04.604 -> sensor.community
12:54:04.604 -> Madavi.de
12:54:04.638 -> custom API
12:54:04.638 -> ----
12:54:04.638 -> Auto-Update active...

Same behavior with the BME280 disabled.

@ricki-z
Member

ricki-z commented Dec 6, 2019

You are sending to a "custom API". Does this API use HTTPS? If so, is the certificate key 2048 bits or smaller?

@skibbipl
Author

skibbipl commented Dec 6, 2019

Yes, I use HTTPS with a standard Let's Encrypt RSA certificate; the key is 2048 bits according to Firefox.

@dirkmueller
Collaborator

Well, that should not kick in until there's an actual measurement.

Is it actually trying to do a measurement cycle after the first 2 minutes or so of uptime if you don't try to access the web UI?

@dirkmueller
Collaborator

see esp8266/Arduino#6886

@mika

mika commented Dec 7, 2019

I'm also affected (I also have a Mikrotik router in my setup and send data to a local InfluxDB). Downgrading to NRZ-2018-123B from https://www.madavi.de/sensor/update/data/previous/NRZ-2018-123B/ worked for me (I also tried 2019-125-B1, but as far as I can see I have trouble with that one too).

@dirkmueller
Collaborator

The interesting aspect here is that NRZ-2018-123B and 2019-125-B1 use the same Arduino core version (2.4.2) with the same Wi-Fi stack. Can you please try the ARP nping with those versions and tell us which one works, which one doesn't (and how well)?

@dirkmueller
Copy link
Collaborator

@skibbipl
Author

skibbipl commented Dec 8, 2019

I tried both firmware versions with no luck. After booting, it dies the same way as version 127.

@dirkmueller
Collaborator

Thanks. Can you try https://static.dmllr.de/airrohr/beta/builds-2019-126-B4/ ?

@dirkmueller
Collaborator

Also, I'm not really sure I understand "it dies" correctly. Is it merely not responding to HTTP requests, or is it also not sending data?

Can you load the build from https://static.dmllr.de/airrohr/beta/builds-128-B1-debug-alive/ and paste the last few dozen lines of serial console output from when it "dies"? It has Wi-Fi debugging enabled and also prints a message multiple times a second.

@skibbipl
Author

NRZ-2019-126-B4 works fine. By "dies" I mean that after the initial boot, the COM port debug output shows nothing more. The working version keeps printing the following messages:

18:58:13.166 -> Start reading SDS011
18:58:13.166 -> End reading SDS011
18:58:14.156 -> Start reading SDS011
18:58:14.156 -> End reading SDS011
18:58:15.178 -> Start reading SDS011
18:58:15.178 -> End reading SDS011

Version 128 produces the log below (repeated all the time). It seems that after 4 minutes (19:08 to 19:12) it started working OK!

19:08:07.737 -> pm open,type:0 0
19:08:07.944 -> ⸮⸮⸮b⸮L⸮⸮D⸮(?
19:08:08.115 -> SDK:2.2.2-dev(38a443e)/Core:2.6.2=20602000/lwIP:STABLE-2_1_2_RELEASE/glue:1.2-16-ge23a07e/BearSSL:89454af
19:08:08.217 -> Airrohr: NRZ-2019-127-1
19:08:08.251 -> mounting FS...
19:08:08.251 -> scandone
19:08:08.251 -> state: 0 -> 2 (b0)
19:08:08.286 -> state: 2 -> 3 (0)
19:08:08.319 -> state: 3 -> 5 (10)
19:08:08.319 -> add 0
19:08:08.319 -> aid 1
19:08:08.353 -> cnt 
19:08:08.353 -> opened config file...
19:08:08.353 -> 
19:08:08.387 -> connected with <redacted>, channel 9
19:08:08.421 -> dhcp client start...
19:08:08.421 -> wifi evt: 0
19:08:08.455 -> ip:192.168.1.27,mask:255.255.255.0,gw:192.168.1.1
19:08:08.490 -> wifi evt: 3
19:08:08.525 -> parsed json...
19:08:08.525 -> output debug text to displays...
19:08:08.560 -> state: 5 -> 0 (0)
19:08:08.595 -> rm 0
19:08:08.595 -> wifi evt: 1
19:08:08.595 -> STA disconnect: 8
19:08:08.630 -> del if0
19:08:08.630 -> usl
19:08:08.630 -> mode : null
19:08:08.664 -> wifi evt: 8
19:08:08.664 -> sleep disable
19:08:08.732 -> mode : sta(12:34:56:78:90:ab)
19:08:08.732 -> add if0
19:08:08.767 -> Connecting to <redacted>
19:08:08.800 -> wifi evt: 8
19:08:09.207 -> .....scandone
19:08:12.475 -> state: 0 -> 2 (b0)
19:08:12.475 -> .state: 2 -> 3 (0)
19:08:12.510 -> state: 3 -> 5 (10)
19:08:12.510 -> add 0
19:08:12.510 -> aid 1
19:08:12.545 -> cnt 
19:08:12.545 -> 
19:08:12.545 -> connected with <redacted>, channel 9
19:08:12.578 -> dhcp client start...
19:08:12.611 -> wifi evt: 0
19:08:12.611 -> ip:192.168.1.27,mask:255.255.255.0,gw:192.168.1.1
19:08:12.680 -> wifi evt: 3
19:08:12.680 -> .
19:08:12.680 -> WiFi connected, IP is: 192.168.1.27
19:08:12.680 -> Starting Webserver... 192.168.1.27
19:08:12.714 -> 
19:08:12.749 -> ChipId: 9925909
19:08:12.749 -> Start reading SDS011 version date
19:08:14.085 -> End reading SDS011 version date
19:08:14.119 -> Read SDS...: 18-11-16(ee74)
19:08:14.221 -> Stopping SDS011...
19:08:14.221 -> Read BMP280/BME280...
19:08:14.254 -> Trying BMP280/BME280 sensor on 76 ... found
19:08:14.288 -> Send to :
19:08:14.321 -> sensor.community
19:08:14.321 -> Madavi.de
19:08:14.321 -> custom API
19:08:14.321 -> ----
19:08:22.465 -> pm open,type:0 0
19:08:22.670 -> ?�⸮F⸮()⸮DHf⸮
19:08:22.876 -> SDK:2.2.2-dev(38a443e)/Core:2.6.2=20602000/lwIP:STABLE-2_1_2_RELEASE/glue:1.2-16-ge23a07e/BearSSL:89454af
19:08:22.946 -> Airrohr: NRZ-2019-127-1
19:08:22.981 -> mounting FS...
19:08:22.981 -> scandone
19:08:22.981 -> state: 0 -> 2 (b0)
19:08:23.016 -> state: 2 -> 3 (0)
19:08:23.051 -> state: 3 -> 5 (10)
19:08:23.051 -> add 0
19:08:23.051 -> aid 1
19:08:23.086 -> cnt 
19:08:23.086 -> opened config file...
19:08:23.086 -> 
19:08:23.120 -> connected with <redacted>, channel 9
19:08:23.155 -> dhcp client start...
19:08:23.155 -> wifi evt: 0
19:08:23.190 -> ip:192.168.1.27,mask:255.255.255.0,gw:192.168.1.1
19:08:23.225 -> wifi evt: 3
19:08:23.260 -> parsed json...
19:08:23.260 -> output debug text to displays...
19:08:23.294 -> state: 5 -> 0 (0)
19:08:23.329 -> rm 0
19:08:23.329 -> wifi evt: 1
19:08:23.329 -> STA disconnect: 8
19:08:23.362 -> del if0
19:08:23.362 -> usl
19:08:23.362 -> mode : null
19:08:23.397 -> wifi evt: 8
19:08:23.397 -> sleep disable
19:08:23.466 -> mode : sta(12:34:56:78:90:ab)
19:08:23.466 -> add if0
19:08:23.500 -> Connecting to <redacted>
19:08:23.533 -> wifi evt: 8
19:08:23.941 -> .....scandone
19:08:27.206 -> state: 0 -> 2 (b0)
19:08:27.206 -> .state: 2 -> 3 (0)
19:08:27.240 -> state: 3 -> 5 (10)
19:08:27.240 -> add 0
19:08:27.240 -> aid 1
19:08:27.274 -> cnt 
19:08:27.274 -> 
19:08:27.274 -> connected with <redacted>, channel 9
19:08:27.309 -> dhcp client start...
19:08:27.343 -> wifi evt: 0
19:08:27.343 -> ip:192.168.1.27,mask:255.255.255.0,gw:192.168.1.1
19:08:27.377 -> wifi evt: 3
19:08:27.377 -> .
19:08:27.377 -> WiFi connected, IP is: 192.168.1.27
19:08:27.410 -> Starting Webserver... 192.168.1.27
19:08:27.444 -> 
19:08:27.479 -> ChipId: 9925909
19:08:27.479 -> Start reading SDS011 version date
19:08:28.812 -> End reading SDS011 version date
19:08:28.847 -> Read SDS...: 18-11-16(ee74)
19:08:28.947 -> Stopping SDS011...
19:08:28.947 -> Read BMP280/BME280...
19:08:28.981 -> Trying BMP280/BME280 sensor on 76 ... found
19:08:29.015 -> Send to :
19:08:29.049 -> sensor.community
19:08:29.049 -> Madavi.de
19:08:29.083 -> custom API
19:08:29.083 -> ----
<cut>
19:12:24.551 -> SNTP sync finished: Tue Dec 10 18:12:12 2019
19:12:24.586 -> 
19:12:24.620 -> Alive! at 6413
19:12:24.620 -> 33240
19:12:24.655 -> Alive! at 6481
19:12:24.655 -> 33240
19:12:24.725 -> Alive! at 6551
19:12:24.725 -> 33240

@dirkmueller
Collaborator

OK, this is great news. It means NTP simply doesn't work in your Wi-Fi setup. We can fix that.

@dirkmueller
Collaborator

@skibbipl please test if https://static.dmllr.de/airrohr/beta/builds-NRZ-2019-128-B3/ resolves that issue.

@skibbipl
Author

Looks good:

23:28:49.284 -> output debug text to displays...
23:28:49.515 -> Connecting to <redacted>
23:28:49.957 -> .......
23:28:53.711 -> WiFi connected, IP is: 192.168.1.27
23:28:53.746 -> Starting Webserver... 192.168.1.27
23:28:53.781 -> 
23:28:53.781 -> ChipId: 9925909
23:28:53.781 -> Start reading SDS011 version date
23:28:55.105 -> End reading SDS011 version date
23:28:55.139 -> Read SDS...: 18-11-16(ee74)
23:28:55.273 -> Stopping SDS011...
23:28:55.273 -> Read BMP280/BME280...
23:28:55.306 -> Trying BMP280/BME280 sensor on 76 ... found
23:28:55.340 -> Send to :
23:28:55.374 -> sensor.community
23:28:55.374 -> Madavi.de
23:28:55.374 -> custom API
23:28:55.374 -> ----
23:28:56.366 -> Start reading SDS011
23:28:56.366 -> End reading SDS011

Also, since you mentioned NTP, I found the following info in the Mikrotik SNTP client:

Last Bad Packet From | 192.168.1.27
Last Bad Packet | 04:32:55 ago
Last Bad Packet Reason | server-ip-mismatch

@dirkmueller
Collaborator

dirkmueller commented Dec 10, 2019

Thanks. Googling that error message leads to: https://de.scribd.com/document/78877210/NTP-Server-Local-Mikrotik

Could that be the reason why you're having issues with NTP?

The problem is that without NTP we cannot really validate SSL certificates (needed for secure data sending as well as secure OTA updates), because we have no valid system time. So it is sort of important to get a solution for this :/
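To check from a laptop on the same (IoT) Wi-Fi segment whether NTP over UDP port 123 gets out to the internet at all, a minimal SNTP query in Python may help. This is a diagnostic sketch, not the firmware's code: the example server name pool.ntp.org is an assumption (the thread does not say which servers are hardcoded), and the packet layout follows the standard SNTP format (48-byte header, server transmit timestamp at byte offset 40, seconds counted from 1900).

```python
import socket
import struct
import sys
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_TO_UNIX = 2_208_988_800


def build_sntp_request() -> bytes:
    """48-byte SNTP client request: LI=0, VN=3, Mode=3 (client) -> 0x1B."""
    return bytes([0x1B]) + bytes(47)


def parse_transmit_time(packet: bytes) -> float:
    """Return the server's transmit timestamp as Unix time in seconds."""
    if len(packet) < 48:
        raise ValueError("SNTP reply too short")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_TO_UNIX + fraction / 2**32


def query_ntp(server: str, timeout: float = 3.0) -> float:
    """Send one SNTP request to UDP port 123 and return the server time.

    A timeout here (socket.timeout) means no reply came back, which
    would match the symptom discussed in this thread.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (server, 123))
        reply, _addr = sock.recvfrom(1024)
    return parse_transmit_time(reply)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python sntp_check.py pool.ntp.org (server name is only an example)
    print(time.ctime(query_ntp(sys.argv[1])))
```

If this times out on the IoT network but works on the main LAN, the router's firewall/NAT rules for NTP are the likely suspect.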

dirkmueller added a commit to dirkmueller/sensors-software that referenced this issue Dec 10, 2019
…gart#580)

When NTP is not (yet) completing, we need to still
process the rest of the loop() otherwise watchdog might
kill us or the webserver is not responding.
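
The idea in that commit message can be modelled in a few lines. This is a Python sketch, not the firmware's Arduino C++, and every name in it (LoopState, ntp_synced, handle_webserver, feed_watchdog, send_measurements) is hypothetical: instead of blocking until NTP completes, each loop() pass checks whether valid time is available and, if not, skips only the time-dependent work while still servicing the web server and the watchdog.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopState:
    """Hypothetical stand-in for the firmware's global state."""
    ntp_synced: Callable[[], bool]
    events: List[str] = field(default_factory=list)

    def feed_watchdog(self) -> None:
        self.events.append("watchdog")

    def handle_webserver(self) -> None:
        self.events.append("web")

    def send_measurements(self) -> None:
        # Needs a valid clock: TLS certificate checks compare against system time.
        self.events.append("send")


def loop_once(state: LoopState) -> None:
    """One pass of a non-blocking loop(): never wait for NTP in place."""
    if state.ntp_synced():
        state.send_measurements()
    # These run on every pass, whether or not NTP has completed yet.
    state.handle_webserver()
    state.feed_watchdog()
```

For example, if NTP only succeeds on the third pass, the web server and watchdog are still serviced on all three passes, and data is sent only once time is known.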
@skibbipl
Author

I have two NTP servers (both on Raspberry Pis) in my local LAN; however, for IoT devices I use a dedicated Wi-Fi hotspot with access to the LAN blocked. Perhaps my Mikrotik advertises the NTP servers in the LAN, but the sensor cannot reach them and therefore gets confused?

@dirkmueller
Collaborator

No, we use hardcoded NTP servers on the internet. It seems that by default these routers come with firewall/NAT rules that are intended to handle NTP on the router itself, but accidentally also apply to LAN packets headed for NTP servers, which causes them to be dropped. That's how I read the description above.

Anyway, thanks a ton for your help in chasing this down!

@mika

mika commented Dec 13, 2019

@dirkmueller sorry for the delay on my side, but I wasn't in front of the device anymore back then and was busy with other stuff. Great debugging and impressive turnaround time for the fix, thanks! 👍

BTW, the issue is closed, but I see the related fix neither in https://github.com/opendata-stuttgart/sensors-software nor as a pending PR here. What's the suggested procedure for getting this fix (other than using https://static.dmllr.de/airrohr/beta/builds-NRZ-2019-128-B3/)?

@dirkmueller
Collaborator

The best solution is to get NTP packet routing working in your setup. Without the correct time, anything the sensor does, including updating itself, is insecure because it cannot validate certificates.

We need to wait for the vacation season to end to get a new beta published. This will not go out as a stable release this year because we have another issue that needs to be fixed before we can do a new rollout.

@mika

mika commented Dec 13, 2019

Hm, that's interesting; I'm not aware of any NTP-related problems with other clients in my network. Hmm.

Ah OK, thanks for the clarification, I was just wondering whether anything had been forgotten. :)


4 participants