[WD reboot] Reconnect to WiFi AP cause WD reboot #6266

Closed
5 of 6 tasks
TD-er opened this issue Jul 6, 2019 · 3 comments
Labels
waiting for feedback Waiting on additional info. If it's not received, the issue may be closed.

Comments

@TD-er
Contributor

TD-er commented Jul 6, 2019

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: [ESP-12|other]
  • Core Version: [core 2.5.2 and core 2.6.0 latest]
  • Development Env: [Platformio|]
  • Operating System: [Windows|Ubuntu|]

Settings in IDE

  • Module: [Generic ESP8266 Module|]
  • Flash Mode: [dio]
  • Flash Size: [4MB]
  • lwip Variant: [v2 Lower Memory|Higher Bandwidth]
  • Reset Method: [nodemcu]
  • Flash Frequency: [40Mhz]
  • CPU Frequency: [80Mhz]
  • Upload Using: [SERIAL]
  • Upload Speed: [115200] (serial upload only)

Problem Description

This has also been discussed in other issues like here, but since it appears to be something different, and to keep the information in a single place, I thought it would be better to open a separate issue for it.

The problem is that, for over a year now, I have been getting lots of reports of WD reboots which are really hard to reproduce.
I have now found a way to reproduce it on my nodes using the disconnect feature of my (MikroTik) AP and narrowed it down to somewhere in the code used to (re)connect to WiFi.

There may still be several issues at play here, and this will surely not be the only reason for WD reboots, but I believe it is responsible for lots of them.

What I found is that the WD reboots happen right at the moment the WiFi connection transitions from "connected" to "got IP".
The same transition also happens when a static IP is being used.
This can happen at the first connect attempt or at any later reconnect, but it doesn't always happen.

The standard WiFi connection cycle can be seen as several stages:

  • Connect to AP
  • Authenticate connection => When successful, event "connected" is fired.
  • Setup network configuration => When successful, event "got IP" is fired.

Every now and then the last step halts the system for some reason. Not even the loop() function is called then, hence the WD reboot.
In all the situations where the WD reboot occurs, I do not see the last event ("got IP") happening.
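
To make the stall between "connected" and "got IP" visible in logs, the stages can be tracked explicitly. The following is only a minimal sketch, not code from ESPEasy or the core: `WifiStage` and `StageTracker` are hypothetical names, and the time values are assumed to come from millis(). On the ESP8266 core the `enter()` calls would presumably sit in handlers registered with WiFi.onStationModeConnected, WiFi.onStationModeGotIP and WiFi.onStationModeDisconnected.

```cpp
#include <cstdint>

// Stages of the station connection cycle as listed above.
enum class WifiStage : uint8_t { Disconnected, Connecting, Connected, GotIP };

// Hypothetical helper: remembers the current stage and when it was entered.
struct StageTracker {
  WifiStage stage = WifiStage::Disconnected;
  uint32_t enteredAtMs = 0;

  // Call from the matching event handler with the current millis() value.
  void enter(WifiStage next, uint32_t nowMs) {
    stage = next;
    enteredAtMs = nowMs;
  }

  // Unsigned subtraction keeps this correct across a millis() rollover.
  uint32_t timeInStage(uint32_t nowMs) const { return nowMs - enteredAtMs; }

  // True while "connected" has fired but "got IP" has not arrived in time.
  bool waitingForIpTooLong(uint32_t nowMs, uint32_t timeoutMs) const {
    return stage == WifiStage::Connected && timeInStage(nowMs) > timeoutMs;
  }
};
```

Since loop() is not called while the system hangs, a check like waitingForIpTooLong() can only catch slow transitions, not the hang itself. For the hang, the recorded stage would have to be stored somewhere that survives the reboot so it can be logged on the next start.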

To trigger it, I force a WiFi disconnect from my AP (a MikroTik) for the node I'm testing.
Sometimes the disconnect is not even seen by the ESP. No disconnected event is fired and the connection just continues as if nothing has happened.
It may also just disconnect, after which the ESP performs a new reconnect and continues to work.
But every now and then the reconnect process leads to a WD reboot.

A WiFi disconnect is something very normal.
It happens, for example, if the WiFi AP changes channels, or if too many errors have been reported (which can also be in the transmission of another client).

A lot of circumstances reported by users can also be explained by a WiFi reconnect happening:

  • Bad/poor WiFi reception leads to more frequent WD reboots.
  • Nodes with lots of network traffic are more likely to see WD reboots (e.g. nodes running MQTT also receive messages from others).
  • Automatic WiFi channel selection on the AP increases WD reboots.
  • Web page response is terribly slow right before a WD reboot (missing packets => AP may request reconnect).

What I've already tried:

  • Running network setup through WiFi events
  • DHCP/static IP (the latter seems to be more unstable, more on that later)
  • Using a while loop after calling begin(..) that tracks WiFi.status() and does nothing other than logging output and calling delay(100)
  • Turning WiFi STA off as soon as a disconnect event is received, and waiting 100 msec after turning STA mode on again before doing anything.
  • Disabling the use of the extra 4k RAM
  • Core 2.5.2 & stage with SDK222x and SDK222y (reports go back to core 2.4.x)
  • Testing if loop() is called, but also running my own while loop waiting.
  • Setting automatic reconnect enabled and disabled.
  • Testing other LWIP2 options (high bandwidth, low memory, etc.). LWIP 1.4 not yet tested.
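
The wait loop mentioned in the list above could look roughly like the sketch below. It is not the code that was actually tested: `waitForConnection` and its parameters are hypothetical, and the status check and the sleep are passed in as callables so the loop can also be exercised off the device.

```cpp
#include <cassert>

// Polls `isConnected` every `stepMs` until it returns true or `timeoutMs`
// has been spent waiting. `sleepMs` stands in for Arduino's delay().
template <typename IsConnectedFn, typename SleepFn>
bool waitForConnection(IsConnectedFn isConnected, SleepFn sleepMs,
                       unsigned long timeoutMs, unsigned long stepMs = 100) {
  assert(stepMs > 0);  // A zero step would never advance the waited time.
  unsigned long waitedMs = 0;
  while (!isConnected()) {
    if (waitedMs >= timeoutMs) {
      return false;  // Give up so the caller can log and retry.
    }
    sleepMs(stepMs);
    waitedMs += stepMs;
  }
  return true;
}
```

On the device the two callables would presumably be `[] { return WiFi.status() == WL_CONNECTED; }` and `[](unsigned long ms) { delay(ms); }`. The timeout is an addition for illustration, so a failed attempt returns to the caller instead of looping forever.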

Another thing, which may or may not be related:
I noticed that explicitly setting the IP config to all zeroes (IP, gateway, DNS....) almost always gives these WD reboots.
This happens even at the first connect attempt, which should be successful. It succeeded only once in N attempts, with N sometimes reaching 100.
When not setting the IP config but just calling the usual functions, the first attempt was almost always successful (9/10 at least).
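
Given that observation, one possible workaround is to skip the explicit IP configuration whenever no static address has been entered. The sketch below is an assumption about how such a guard might look, not ESPEasy's implementation: `Ipv4`, `StaticIpSettings` and `shouldApplyStaticConfig` are hypothetical names, and DNS is left out for brevity. It does not explain the reboot; it only avoids the code path that appeared unstable.

```cpp
#include <array>
#include <cstdint>

using Ipv4 = std::array<uint8_t, 4>;

// True when every octet is zero, i.e. no address has been entered.
inline bool isUnset(const Ipv4& addr) {
  for (uint8_t octet : addr) {
    if (octet != 0) return false;
  }
  return true;
}

// Hypothetical settings record; the field names are illustrative only.
struct StaticIpSettings {
  Ipv4 ip{};
  Ipv4 gateway{};
  Ipv4 subnet{};
};

// Only hand a static configuration to the WiFi stack when all three
// addresses are filled in; otherwise leave the configuration untouched.
inline bool shouldApplyStaticConfig(const StaticIpSettings& s) {
  return !isUnset(s.ip) && !isUnset(s.gateway) && !isUnset(s.subnet);
}
```

The caller would then invoke its static IP setup only when shouldApplyStaticConfig() returns true, and otherwise just call the usual connect functions.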

MCVE Sketch

I will try to make a simple sketch for this

Debug Messages

Debug messages go here
@TD-er
Contributor Author

TD-er commented Jul 28, 2019

Just as a reference to myself (and anyone interested ;) )
I am still testing, but I followed these steps noted here: #6172 (comment)
So far the unit is still working, even after kicking it off the AP more than 10 times.

In short, the code handling this in ESPeasy:

void setWifiMode(WiFiMode_t wifimode) {
  const WiFiMode_t cur_mode = WiFi.getMode();

  if (cur_mode == wifimode) {
    return;
  }
  if (wifimode != WIFI_OFF) {
    #ifdef ESP8266
    // See: https://github.com/esp8266/Arduino/issues/6172#issuecomment-500457407
    WiFi.forceSleepWake(); // Make sure WiFi is really active.
    #endif // ifdef ESP8266
    delay(100);
  }

  switch (wifimode) {
    case WIFI_OFF:
      addLog(LOG_LEVEL_INFO, F("WIFI : Switch off WiFi"));
      break;
    case WIFI_STA:
      addLog(LOG_LEVEL_INFO, F("WIFI : Set WiFi to STA"));
      break;
    case WIFI_AP:
      addLog(LOG_LEVEL_INFO, F("WIFI : Set WiFi to AP"));
      break;
    case WIFI_AP_STA:
      addLog(LOG_LEVEL_INFO, F("WIFI : Set WiFi to AP+STA"));
      break;
    default:
      addLog(LOG_LEVEL_INFO, F("WIFI : Unknown mode"));
      break;
  }

  if (!WiFi.mode(wifimode)) {
    addLog(LOG_LEVEL_INFO, F("WIFI : Cannot set mode!!!!!"));
  }
  if (wifimode == WIFI_OFF) {
    delay(1000);
    WiFi.forceSleepBegin(); // Put the WiFi modem to sleep.
    delay(1);               // Give the sleep request a moment to take effect.
  } else {
    setupStaticIPconfig();
    delay(30); // Must allow for some time to init.
  }
}

@devyte
Collaborator

devyte commented Nov 9, 2019

@TD-er a lot has changed since this was reported, including fixes that are relevant. Please retest with 2.6.0.

devyte added the "waiting for feedback" label Nov 9, 2019
@TD-er
Contributor Author

TD-er commented Nov 9, 2019

Yep, WiFi stability has improved a lot recently.
The most descriptive example is this chart showing the uptime (in minutes) of my weather station

[image: chart of the weather station's uptime in minutes]

So I guess this particular issue can be closed by now :)
