Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WiFi cannot connect until a power cycle. #5527

Closed
6 tasks done
Swedish-Coder opened this issue Dec 19, 2018 · 52 comments
Closed
6 tasks done

WiFi cannot connect until a power cycle. #5527

Swedish-Coder opened this issue Dec 19, 2018 · 52 comments
Labels
component: network type: bug waiting for feedback Waiting on additional info. If it's not received, the issue may be closed.
Milestone

Comments

@Swedish-Coder
Copy link

Swedish-Coder commented Dec 19, 2018

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

-Hardware: ESP8266
-Core Version: 2.4.2
-Development Env: Visual Micro
-Operating System: Windows

Settings in IDE

-Module: Generic ESP8266 Module
-Flash Mode: qio
-Flash Size: 2MB (256k Spiffs)
-lwip Variant: v1.4
-Reset Method: ck
-Flash Frequency: 40Mhz
-CPU Frequency: 80Mhz
-Upload Using: Serial
-Upload Speed: 921600

Problem Description

Using the reconnect from below works fine 99% of the time. Running this test on ~50 ESP8266. Reconnect is called every 30seconds.
After some undefined time (ranging from 1h to 10days..) the Wifi will not be able to reconnect.

The Status stays like this until the device is reset (hardware reboot), have tried letting the devices stay in this mode for days, but they never recover. Rebooting the AP does not help.
The signal strength is also not the issue (ran a test with AP in front of ~42 ESP8266 with ~2m away from devices, they all had ~ -65 dba).

Have tried different disconnect / reconnects, all resulting in the same stale state. Currently trying
WiFi.disconnect(true); forceSleepBegin(35000); //…do other tasks for 25ms WiFi.forceSleepWake(); WiFi.persistent(false); WiFi.begin(ssid.c_str(), psk.c_str());

This sometimes causes crashes on the WiFi.begin(..) line.

Is there a way to “properly” do a reconnect on WiFi, or a way to reset the WiFi to “hardware boot” status (delete all DHCP information, known AP, etc.)?

MCVE Sketch

void Reconnect(){
WiFi.disconnect(true);
WiFi.mode(WIFI_STA);
			
While(WiFi.status() != WL_CONNECTED){
//do other things
}
Serial.println(String(F("[WiFi] Connect now... heap: ")) + String(ESP.getFreeHeap()) + String(F(", ts: ")) + String(millis()));
		
//… load ssid & pass from spiffs
WiFi.begin(ssid.c_str(), psk.c_str());
}
@kherNI
Copy link

kherNI commented Dec 20, 2018

I'm glad I'm not the only one that has seen this with 2.4.2. I have 2 devices that are installed a long way away and after anywhere from 1-10 days, like you mention, they will not reconnect to the WiFi AP. In this application they are going through a cellular modem and they seem to lose communication fairly regularly (multiple times a day) even though the signal strengths have been pretty good. I was thinking that maybe it had something to do with the changes with lwIP. I haven't been able to try and diagnose them due to their location. In the meantime I rolled back to 2.4.0 (I've had serious problems with 2.4.1) and the reliability seems to have improved. I've also implemented some logic that if after a certain period of time (i.e. 5 minutes) of not being able to reconnect the MCU is restarted. I was hoping to see if things improve with 2.5.0 when it comes out of Beta.

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 21, 2018

I was hoping to see if things improve with 2.5.0 when it comes out of Beta.

We need to reproduce to have a chance to fix. Fixes will be made to 2.5.0-beta2, not to 2.4.x.
Please try it, and enable debug on Serial to possibly get firmware messages that would help understanding.

I've been running this sample for several hours with core-2.5.0-beta2 with no issue so far.

#include <ESP8266WiFi.h>

#ifndef STASSID
#define STASSID "your-ssid"
#define STAPSK  "your-password"
#endif

const char* ssid     = STASSID;
const char* password = STAPSK;

const char* host = "10.0.1.253";
const uint16_t port = 80;

void connect ()
{
  Serial.printf("wake = %d\n", WiFi.forceSleepWake());
  WiFi.mode(WIFI_STA);
  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("");
  Serial.println("WiFi connected");
  Serial.println("IP address: ");
  Serial.println(WiFi.localIP());
}

void disconnect ()
{
  WiFi.disconnect(true);
  Serial.printf("sleep 1us = %d\n", WiFi.forceSleepBegin(1));
}

void setup() {
  Serial.begin(115200);
  Serial.println();
  Serial.print("Connecting to ");
  Serial.println(ssid);
  WiFi.persistent(false);
}

void loop() {
  connect();
  Serial.print("connecting to ");
  Serial.print(host);
  Serial.print(':');
  Serial.println(port);

  WiFiClient client;
  if (!client.connect(host, port)) {
    Serial.println("connection failed");
    delay(5000);
    return;
  }

  Serial.println("sending data to server");
  if (client.connected()) {
    client.println("hello from ESP8266");
  }

  unsigned long timeout = millis();
  while (client.available() == 0) {
    if (millis() - timeout > 5000) {
      Serial.println(">>> Client Timeout !");
      client.stop();
      return;
    }
  }

  Serial.println("receiving from remote server");
  while (client.available()) {
    char ch = static_cast<char>(client.read());
    Serial.print(ch);
  }

  Serial.println();
  Serial.println("closing connection");
  client.stop();

  disconnect();
}

@d-a-v d-a-v added the waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. label Dec 21, 2018
@Swedish-Coder
Copy link
Author

Swedish-Coder commented Jan 8, 2019

An update on the progress of tracking down this bug in the new Version:

-Prepared a new Version using 2.5.0-Beta2 and lwlp v2.
-Ran a test on 10 Devices yesterday.

We seem to have some other issues from the switch from 2.4..2 to 2.5.0-Beta 2, the heap is way lower and it seems that we got a memory leak from switching as the heap is in a steady decline, causing a reboot ~ every 2hours.
Keep that in mind for the results..

4 Devices did not make it through the night. All of them giving up ~50-60min after the last boot (after which the end up in a state where they cannot establish a connection).
These keep on trying to connect every 10 seconds, but have never been able to establish the connection again.

Restarted 2 of the 4 Devices made them able to reconnect on the first try. Leaving the other 2 to keep on trying to connect for now to see if they recover.

The other 6 are still running and reconnect successfully every 10 seconds. (but are also rebooting a lot)

The next steps will be to find and at least reduce the memory leak so that we can conduct a more meaningful test on this. Will increase reconnect timer to 15 seconds to give the chip more time to reconnect. Will also model the disconnect / reconnect after the example d-a-v posted (using force sleep / wake), but will also wait 1.2 seconds between disconnect and reconnect. Since all but one of these devices are not connected via Serial, will also implement that the last "failed wifi" status will be saved to the spiffs for more information. Ill post my findings here soon...

OK will have 4 running with the new version which is a bit more stable and using the changed mentioned above. Will update if the issue happens on any of the 4 Devices.

I will also run 6 devices on Version 2.4.0 with the changes above and reconnect every 15seconds to see if the issue occurs on this version as well.

Update

The issue also happened once on the devices running 2.4.0 and lwIp 1.4.

@TD-er
Copy link
Contributor

TD-er commented Jan 17, 2019

At ESPeasy we're getting similar reports on this for many months now.
My plan for a next test was to make a call to set the wifi mode to off and on again to see if that will fix this state.

@d-a-v
Copy link
Collaborator

d-a-v commented Jan 17, 2019

The sketch I posted above had been working for two full weeks (I was away and had let it run).

In order to help debugging, we need to be able to understand where the cause is.
Ideally, a packet capture facility running on the access point would help to check wether the esp is connecting to wifi (radio, layer1) then sends dhcp requests (or any other IP packet with fixed address, layer2) on the network. A dd/openwrt-like AP is nice because one can run tcpdump -ni <intf> host 192.168.xxx on it.

@TD-er
Copy link
Contributor

TD-er commented Jan 17, 2019

@d-a-v It seems there are several issues at hand which overlap in observed symptoms.
A quick glance at the ESPeasy WiFi related issues shows it has been reported for quite some time now.

This one ESPeasy - DeepSleep Problem to send Data has some nice Wireshark dumps/screenshots.

But like I said, it seems to be more than one issue, or at least there is not one single remedy (apart from reboot) to solve it. Some reported fixes (e.g. reboot AP or disconnect ESP from within AP management tools) do not seem to work for others.

One of our users is now testing if this suggested fix will work (in his setup): #2186 (comment)

Also this closed issue (WiFi Reconnect only after power cycle #2235) does seem to be related to the one we're discussing right now.

@d-a-v
Copy link
Collaborator

d-a-v commented Jan 17, 2019

Please do your tests and reports using the latest "git" version of this core.
Remember that espressif radio/layer1 firmware has been updated since core-2.4.2 (last summer).

@TD-er
Copy link
Contributor

TD-er commented Jan 17, 2019

You mean the 'stage' branch?

@d-a-v
Copy link
Collaborator

d-a-v commented Jan 17, 2019

We have only one branch. So yes.
Or release 2.5.0-beta2

@Swedish-Coder
Copy link
Author

The error has now occured since we have implemeted a bit more exception logging.
Ammongs other things we now save the last WiFi dc / reconnect Event. The Last WiFi Connection error that we received was "WIFI_DISCONNECT_REASON_AUTH_EXPIRE"

This on 2.4.2 and LwIP 1.4.

@TD-er
Copy link
Contributor

TD-er commented Jan 17, 2019

WIFI_DISCONNECT_REASON_AUTH_EXPIRE is a normal disconnect reason and it may occur quite often (depending on the settings in de AP)

@Swedish-Coder
Copy link
Author

Another short update
We are now logging the last 10 Disconnect / Reconnect Events. The Devices are Auto rebooted after 20 hours with no Wifi.

We have this running on ~50 Devices, with another ~400 comming online soon.
The Issue has occured 3 times since then...So it does not occur often. However it happens faster the more we disconnect / reconnect.

Here are the Logs

LastWiFiDisconnectReason=8,2,8,201,2,201,2,8,2,8,;TotalFoundWiFis=7;RSSI=66%(-67)
LastWiFiDisconnectReason=2,8,2,8,204,2,201,2,8,2,;TotalFoundWiFis=4;RSSI=64%(-68)
LastWiFiDisconnectReason=2,201,2,8,2,8,2,8,2,201,;TotalFoundWiFis=7;RSSI=66%(-67)

The 8 (ASSOC_LEAVE) is mostly triggered when we try to reconnect (disconnect is called)
We are seeing a lot of reason 2 (WIFI_DISCONNECT_REASON_AUTH_EXPIRE) which I agree is a normal reason, but not for so many consecutive tries....

@Swedish-Coder
Copy link
Author

Short update on what we are currently examining....
We have modified the Code to recreate the WiFi class when we reconnect.

WiFi.~ESP8266WiFiClass();
WiFi = ESP8266WiFiClass();

We have been running this test for the last 10 days on 40+ Devices and the issue has never happened since. This on 1.4.2 and LwIp 1.4.

@TD-er
Copy link
Contributor

TD-er commented Feb 5, 2019

@Swedish-Coder
That's a nice one to try also.

I'm looking at those classes.
This class inherits from a lot of other classes, but it has no destructor defined (thus also not a virtual one)

class ESP8266WiFiClass : public ESP8266WiFiGenericClass, public ESP8266WiFiSTAClass, public ESP8266WiFiScanClass, public ESP8266WiFiAPClass {

That should be no issue, if the classes it inherits from do have a virtual destructor defined.
But ESP8266WiFiGeneric also has none (also no members, not really a problem I guess)
Same for ESP8266WiFiSTA , ESP8266WiFiScan and ESP8266WiFiAPClass.

All these classes have static variables defined, but maybe these are only reset in the constructor?

@openfsguruh
Copy link

Hi all, I like to reopen or reconfirm the issue and can hopefully give relevant information!
I am using Arduino 1.8.5 on Windows 10 using now ESP8266 2.5.0 (released one no beta)
I can still see this reconnect issue on some boards but not on all. This is really wired because all ESP8266 are the same Wemos D1 mini and I buyed them at the same date (around Jan 2018)
ESP8266 2.4.0 didn't show me this reconnect issue, I got it since 2.5.0.
Some ESP reconnects after the reset switch fine, others don't. The crazy thing is WiFi.status() reports a connected state but it is not working.
After playing a while around I found this persistent option is playing for me a key role.
Usually (and think most of you) have WiFi.persisten(false) in their sketches. It was for me the same. When I set for test reasons WiFi.persistent(true), I can reproduce the reconnect issue on all(!) ESP8266.
So this allowed me to analyse it in more detail on a other working ESP reproducing the problem I have only on some of them with peristent off.

I have seen the reconnect was reportet internally really fast but later on was not working. So when I do two cycles in the startup I got a workaround for this reconnection issue also for my bad ESPs.

So in setup I do following:
WiFi.persistent( false );
WiFi.setAutoConnect(true);
WiFi.mode(WIFI_STA);
WiFi.config(ip,gateway,subnet,channel, bssid);
WiFi.setSleepMode(WIFI_LIGHT_SLEEP);
delay(100);
WiFi.begin(ssid, password);
WiFi.disconnect(); // first cycle which may be broken
delay(200);
WiFi.begin(ssid, password);
// now WiFi.status() != WL_CONNECTED cycle as usual

This slows down the startup cycle but solved the reset issue for me. It seems there is something stored in flash from older ESP versions, which I can't reset and seems is still influencing WiFi startup even when not explicitly stated.
lwIP variant: v2 lower Memory
VTables: flash
Exceptions: disabled

@AndreKR
Copy link

AndreKR commented May 4, 2019

There is one thing I noticed when I enabled WIFI debugging, maybe it is of interest.

In my case I was waking up from deep sleep and I had auto-connect on (the default). When I was looking at the log, I saw the order of events was:

  • Connect to the WiFi with the credentials stored in flash
  • dhcp client start...
  • Before DHCP could finish, my code called WiFi.begin() again, which disconnected the WiFi and started connecting again
  • Connect to the WiFi

Now I noticed there was no second "dhcp client start..." in the log.

I guessed that maybe this was the reason that the WiFi connection didn't work sometimes.

I now disable the auto-connect in setup() and handle the WiFi connection myself:

  if (WiFi.SSID() != "") {
    println(F("WiFi credentials set in flash, wiping them"));
    WiFi.disconnect();
  }
  if (WiFi.getAutoConnect()) {
    println(F("Disabling auto-connect"));
    WiFi.setAutoConnect(false);
  }
  WiFi.persistent(false);
  WiFi.mode(WIFI_STA);
  WiFi.begin(...);

And suddenly the connection works every time.

@d-a-v d-a-v added this to the 2.6.0 milestone May 6, 2019
@d-a-v d-a-v added component: network type: bug and removed waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. labels May 6, 2019
d-a-v added a commit to d-a-v/Arduino that referenced this issue May 6, 2019
this is a lighter workaround for esp8266#5527
and may require better understanding of the issue
@TD-er
Copy link
Contributor

TD-er commented Jul 4, 2019

I have not tested that function directly, but I do get the WL_DISCONNECTED a lot when it fails (and thus WD reboots) to perform a (re)connect.
So indirect it is very likely I get that return value since there is no other enum value mapping to that result.

@TD-er
Copy link
Contributor

TD-er commented Jul 4, 2019

Just to give some debugging context here:

7516 : WIFI : Disconnected: WiFi.status() = 0
7620 : WIFI : Connecting MikroTik attempt #2
7723 : WIFI : WiFi.status() = WL_DISCONNECTED  SSID: MikroTik
7841 : WIFI : Disconnected: WiFi.status() = 6
8049 : WIFI : Disconnected: WiFi.status() = 6
8254 : WIFI : Disconnected: WiFi.status() = 6
8461 : WIFI : Disconnected: WiFi.status() = 6
8678 : WIFI : Disconnected: WiFi.status() = 6
8883 : WIFI : Disconnected: WiFi.status() = 6
9087 : WIFI : Disconnected: WiFi.status() = 6
9338 : WIFI : Disconnected: WiFi.status() = 6
9542 : WIFI : Disconnected: WiFi.status() = 6
9748 : WIFI : Disconnected: WiFi.status() = 6
9970 : WIFI : Disconnected: WiFi.status() = 6
10174 : WIFI : Disconnected: WiFi.status() = 6
10379 : 
 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v8b899c12
~ld
�U55 : 

INIT : Booting version:  (ESP82xx Core 2_5_2, NONOS SDK 2.2.2-dev(c0eb301), LWIP: 2.1.2 PUYA support)
55 : INIT : Free RAM:33072

And at the time where the connect if successful:

10111 : WIFI : Disconnected: WiFi.status() = 6
10360 : WIFI : Disconnected: WiFi.status() = 6
11405 : WIFI : Entering processConnect()
11507 : WIFI : Connected! AP: MikroTik (B8:69:F4:9F:21:FA) Ch: 1 Duration: 3770 ms
11509 : EVENT: WiFi#ChangedAccesspoint
11536 : backgroundtasks: WiFi connected

The last line is also triggered when the WiFi.status() reports WL_CONNECTED
The "WIFI : Disconnected...." lines are printed from within the loop().
"WIFI : Entering ..." is when an event has been received and needs to be processed (boolean flag checked in loop() )

And some debug output when there was a connection and I kicked it from the AP:

21116 : WIFI : Entering processDisconnect()
21217 : EVENT: WiFi#Disconnected
21235 : WIFI : Disconnected! Reason: '(202) Auth fail' Connected for 9666 ms
21236 : WIFI : Disconnected: WiFi.status() = 0
21341 : WIFI : Disconnected: WiFi.status() = 0
21447 : WIFI : Disconnected: WiFi.status() = 0
21551 : WIFI : Disconnected: WiFi.status() = 0
21665 : MQTT : Connection lost, state: Connection lost
21666 : EVENT: MQTT#Disconnected
21686 : WIFI : Disconnected: WiFi.status() = 0
21790 : WIFI : Connecting MikroTik attempt #0
21895 : WIFI : WiFi.status() = WL_DISCONNECTED  SSID: MikroTik

 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v8b899c12
~ld
�U54 :

INIT : Booting version:  (ESP82xx Core 2_5_2, NONOS SDK 2.2.2-dev(c0eb301), LWIP: 2.1.2 PUYA support)
55 : INIT : Free RAM:33072

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 4, 2019

There are phy updates in #6257, can you try with it ?

@TD-er
Copy link
Contributor

TD-er commented Jul 4, 2019

In PlatformIO I must then use this define?
PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22y
And of course somehow making sure your PR branch is checked out on the computer where I'm building....

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 4, 2019

PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22y

Yes, but I didn't try it myself

@TD-er
Copy link
Contributor

TD-er commented Jul 4, 2019

Since I'm doing this every now and then, I have a feature request....
Can it be made a lot easier for platformIO users to just add some label somewhere to fetch a PR branch?
Every time I have to fetch some PR version to test some Arduino code it takes me >1 hour to make a build and sometimes a next build is just not referring to the same branch anymore which takes a lot of time to debug.

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 4, 2019

What kind of label ?
Whenever I need to test a PR I have this script called PR:

#!/bin/sh
test -z "$2" && { echo "give # name"; exit 1; }
set -x
git checkout master
git branch -D pr-$1-$2
git fetch origin pull/$1/head:pr-$1-$2 && git checkout pr-$1-$2
git branch | grep pr-$1-$2

which gives:

$ PR 6257 my-sdk222y-testing-branch
[...]
Switched to branch 'pr-6257-my-sdk222y-testing-branch'
[...]

(this script works with any repository on github)

@TD-er
Copy link
Contributor

TD-er commented Jul 4, 2019

I did those steps in the terminal by hand and the result is that PlatformIO is detecting it is another core lib and thus renames the folder and start fetching the one it thinks it needs.

So it renames C:\Users\gijs\.platformio\packages\framework-arduinoespressif8266 into C:\Users\gijs\.platformio\packages\framework-arduinoespressif8266@2.5.0-dev

No idea where it gets this 2.5.0-dev from, since it states 2.6.0-dev in the file platform.txt.

Edit: It is in package.json.
Changed the version in there to match the one it thinks it is building against and renamed the folders.
Now it does have the newer core lib, since it is not able to build due to compiler errors.

Edit2:
It is building now, I forgot to run the submodule init call.

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 4, 2019

Maybe @ivankravets could help with the platformio process ?

@TD-er
Copy link
Contributor

TD-er commented Jul 4, 2019

OK, now it has been built with the ..SDK22y flag
And it is running
ESP82xx Core 27327617, NONOS SDK 2.2.2-dev(38a443e), LWIP: 2.1.2 PUYA support

So far I have not been able to get it to crash after a disconnect.
But it also seems to ignore/miss the WiFi disconnects I send from the MikroTik AP.

Edit:
Sadly it still will crash occasionally on a reconnect

187456 : EVENT: WiFi#Disconnected
187495 : WIFI : Disconnected! Reason: '(202) Auth fail' Connected for 4513 ms
187496 : WIFI : Disconnected: WiFi.status() = 0
187602 : WIFI : Disconnected: WiFi.status() = 0
187706 : WIFI : Disconnected: WiFi.status() = 0
187811 : WIFI : Disconnected: WiFi.status() = 0
187915 : WIFI : Disconnected: WiFi.status() = 0
188019 : WIFI : Connecting MikroTik attempt #0
188123 : WIFI : WiFi.status() = WL_DISCONNECTED  SSID: MikroTik

 ets Jan  8 2013,rst cause:4, boot mode:(3,7)

wdt reset
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v27327617
~ld
�U69 : 

INIT : Booting version:  (ESP82xx Core 27327617, NONOS SDK 2.2.2-dev(38a443e), LWIP: 2.1.2 PUYA support)
70 : INIT : Free RAM:32800
71 : INIT : Warm boot #26 Last Task: Const Interval timer, id: 3 - Restart Reason: Hardware Watchdog
74 : FS   : Mounting...

But good news is it is a lot less often

@TD-er
Copy link
Contributor

TD-er commented Jul 4, 2019

Just an observation here.
I tried to disconnect it a few times and noticed nothing on the ESP side.
So no disconnect events were received.
The unit is still reachable from the network etc. but the reported WiFi.status() is 4,
That's WL_CONNECT_FAILED right?

N.B. automatic reconnect is enabled.

Edit:
Never mind, it is echo'ing an internal value I use, not the output of the WiFi.status().

@TD-er
Copy link
Contributor

TD-er commented Jul 4, 2019

Something fishy is going on here and not just regarding this new SDK version.
I've also seen it happen a lot yesterday, now that I realize it.

Problem is that one build seems to be able to connect to the AP just fine and a next one is just not able to connect to the AP. Only once in 10's or maybe 100 attempts (and thus crash/reboots)

Yesterday I saw it when changing delay values to get a feeling for their impact, but now I saw it again.
I just replaced some simple value in a log string.

if (loglevelActiveFor(LOG_LEVEL_INFO)) {
    String log;
    log.reserve(60);
    log = F("WD   : Uptime ");
    log += wdcounter / 2;
    log += F(" ConnectFailures ");
    log += connectionFailures;
    log += F(" FreeMem ");
    log += FreeMem();
    log += F(" WiFiStatus ");
    log += wifiStatus;
    addLog(LOG_LEVEL_INFO, log);
  }

And then replacing wifiStatus into WiFi.status().

With just this single change the resulting executable was not able to connect at all.
Not even with a clean and rebuild.
Changed it back, did a clean and a rebuild and it connects just fine.

This is also behavior we're seeing for a long time in ESPeasy builds.
Some people report issues about not being able to connect while for others it is working just fine.

This does "smell" like some buffer allocation or initialization which is not right.
So either at runtime some buffers are allocated in a different order or there is some compile time effect in play.
It just looks like we're addressing something that's either not 0-terminated or just a byte short in allocated space and then try to find the end of a string or something.
Also I have no idea if it is in my code or the core library, but these things do happen every now and then.

Is there some compiler flag possible to force any allocated array to be initialized with 0 (or some predefined value) ?

Edit:
I also noticed the 'failing' builds (crashing at the initial wifi connect the same way it crashes "normally" when processing a disconnect) were also not running I2C communications well.
In the logs I noticed a lot of errors of a plugin not able to read some registers from a connected I2C device.
With the proper build I2C is also working fine.

@ivankravets
Copy link
Collaborator

Do we have any problems with PlatformIO?

@TD-er
Copy link
Contributor

TD-er commented Jul 4, 2019

@ivankravets Well I am not 100% sure. :)
At least I have some issues with fetching a specific core lib version using PIO.
The use case here is to mark somehow a specific PR or branch from this repository and build a version against it. (which is more of a feature request I guess)

About whether or not we have a problem with PIO...
A while ago (about a year ago) you mentioned something regarding linker issues which may be related to something in PIO's build chain.
I wonder if it is possible I am seeing the same again with my builds since they act different between builds if something completely unrelated has changed. (e.g. WiFi connection issues and/or I2C issues when changing a log text somewhere else)

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 4, 2019

@TD-er

Is there some compiler flag possible to force any allocated array to be initialized with 0 (or some predefined value) ?

It is the default. A special attribute is needed to not initialize globals or allocated variables (to 0).

This does "smell" like some buffer allocation or initialization which is not right.

In doubt, can you run your tests with disable_extra4k_at_link_time() to be called somewhere in your app (in setup() it is fine).

@ivankravets
Copy link
Collaborator

@TD-er

A while ago (about a year ago) you mentioned something regarding linker issues which may be related to something in PIO's build chain.

If I remember correctly, we now use the same build workflow as Arduino IDE, even the same linker scripts. You can take a look at https://github.com/esp8266/Arduino/blob/master/tools/platformio-build.py

Or you mean when a library has extern declarations?

@TD-er
Copy link
Contributor

TD-er commented Jul 4, 2019

Or you mean when a library has extern declarations?

As far as I know, we don't use those in this project, so that should not be an issue here.

@d-a-v That one was already present in my code, wrapped in #ifdef ESP8266_DISABLE_EXTRA4K
So I will now test some builds with it.
For now it doesn't seem to make any difference.
A while ago I've already cleaned up the stack allocated stuff in ESPeasy, which is put on the sys stack.
So it should not really be an issue anymore, but it's worth a try I guess.

N.B. for my tests I went back to the core 2.5.2 SDK 2.2.1 It does seem a bit faster to me in rendering web pages, but maybe that's too early to tell right now. (SDK 2.2.1 vs. your PR version)

@d-a-v
Copy link
Collaborator

d-a-v commented Aug 29, 2019

What's the status of this issue ?
If you still have problem, would you mind to try #6356 which is an attempt to "uniformize" the WiFi sleep-off-on sequence so to have a common base to debug ?

(edit: #6356 is merged)

@d-a-v
Copy link
Collaborator

d-a-v commented Sep 11, 2019

@TD-er @Swedish-Coder Is this issue still relevant in git master ?

@d-a-v d-a-v added the waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. label Sep 11, 2019
@TD-er
Copy link
Contributor

TD-er commented Sep 11, 2019

I'm not entirely sure if it is still relevant.
These "cannot reconnect" issues often just happened after a long time, which makes it really hard to reproduce.

I have now 2 nodes running with the patch you mentioned (previous version of it).
For example:

Unit Number: 75
Local Time: 2019-09-12 00:08:06
Uptime: 2 days 12 hours 19 minutes
Connected: 8h06m
Last Disconnect Reason: (201) No AP found
Number Reconnects: 7

So it seems it was very well capable of reconnecting (some of these reconnects were me kicking the node from the AP, only during the first hour after boot)
But that's just too few nodes to tell if it is now fixed.

@d-a-v d-a-v modified the milestones: 2.6.0, 3.0.0 Sep 11, 2019
@devyte
Copy link
Collaborator

devyte commented Nov 9, 2019

The pseudo modes are merged.
The original issue reported y @Swedish-Coder is old. A whole lot has changed since it was reported, including several relevant fixes.
Closing.
If you think the problem is still relevant, please open a new issue and follow the template instructions.

@devyte devyte closed this as completed Nov 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: network type: bug waiting for feedback Waiting on additional info. If it's not received, the issue may be closed.
Projects
None yet
Development

No branches or pull requests

9 participants