Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTPClient: Split "HTTPC_ERROR_READ_TIMEOUT" up into more detailed error codes #8302

Open
M-Reimer opened this issue Sep 5, 2021 · 20 comments

Comments

@M-Reimer
Copy link
Contributor

M-Reimer commented Sep 5, 2021

I'm currently facing an issue with some "IoT device web server" which at some point seems to be a bit off with how they implemented the HTTP standard. After some runtime in some situation I end up getting "HTTPC_ERROR_READ_TIMEOUT" all the time. Sometimes this "revives" somehow just to reappear later.

The problem is that this error code could actually mean different things. Several checks return this code. We have it at the following locations in the GIT revision I've checked (with GIT revision in URLs to make them "stable"):

  1. case Stream::Report::TimedOut: return HTTPC_ERROR_READ_TIMEOUT;
  2. return returnError(HTTPC_ERROR_READ_TIMEOUT);
  3. return returnError(HTTPC_ERROR_READ_TIMEOUT);
  4. return HTTPC_ERROR_READ_TIMEOUT;

Where only the last one actually is based on actual measurement of time. The first one may also be timeout related (didn't dig that deep). Point 2 and 3 can also be "malformed data" which would be interesting to know so I can check what my server does wrong here. Probably it would be even possible to make the HTTPClient more robust once I know what is wrong on the server side. The access works every time with a "regular browser" so I guess the HTTP implementation is more forgiving there.

@joysfera
Copy link

joysfera commented Dec 3, 2021

FWIW, after upgrading from v2.7.4 Arduino core to v3.0.2 on my ESP8266 WebServer my ESP8266 HTTPClients started getting this very HTTPC_ERROR_READ_TIMEOUT" after some runtime (hours). Sometimes this revives for a moment but then falls back to error -11 (timeout). The access to the webserver works every time with a "regular browser" (just like for you).

I emphasize that this used to work fine for years when the server was built with pre-v3.x Arduino cores so it seems to be related to the server side but I can't figure out what exactly is wrong and where.
Restoring v2.7.x webserver fixes the issue. Clients might be v2.7.x based or v3.0.2, both seem to work OK.
Very hard to debug as I haven't been able to reproduce it in my lab, it only happens far away in the field.

@d-a-v d-a-v self-assigned this Dec 4, 2021
@d-a-v d-a-v added this to the 3.1 milestone Dec 4, 2021
@d-a-v
Copy link
Collaborator

d-a-v commented Dec 4, 2021

If you have any idea on how to reproduce your issues, that will help.

@joysfera
Copy link

joysfera commented Dec 4, 2021

Seems like the key is to have at least three or four "simultaneous" clients (I've got four clients - each fetching data from the server every five seconds). Sometimes it takes several hours to reproduce, sometimes it happens almost instantly - probably when two or more of the clients' requests clash? I don't know, just guessing.

FYI, I described my problem in detail here: https://www.esp8266.com/viewtopic.php?f=32&t=23296

I haven't created my own github issue because I don't have a short code to reproduce it. I just noticed that this github issue describes what I get (HTTP TIMEOUT) so I thought I'd mention my story - perhaps with more detailed error reporting from HTTPClient I'd guess more quickly what's up with the WebServer? Or maybe not.

@ozzcet
Copy link

ozzcet commented Dec 4, 2021

Actually , I have similar issue. I have build kind of push button that connect to server through out http protocol. Sometime I have gotten -11 some time get 200 who knows why ? Weird... This caused push button more than one time to get 200...

@kirurui
Copy link

kirurui commented Dec 4, 2021 via email

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 4, 2021

There was a recent related update, currently in master branch: see #8216
Not saying this will solve this bug, but worth giving a try.
Can you give a try with master branch ? (Arduino users can try the snapshot release v0.0.1).

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 5, 2021

I am currently running PostServer and PostClient.

On client, delay between requests is removed, and an error counter is added and increased when retcode is not 200.

Setup:

  • Server on esp8266
  • one client on another esp8266
  • another client on host:
cd test/hosts
make list
make D=1 ../../libraries/ESP8266HTTPClient/examples/PostHttpClient/PostHttpClient

I'll tell you just before I die burning in the 2.4G flow, or if a counter happens to be greater than 0.
If any of you has a publishable app to test, the issue will be narrowed faster.
You also may enable core and library debug messages so there is more to show.

@joysfera
Copy link

joysfera commented Dec 5, 2021

Isn't the PostClient re-using existing connection? That could mask what I experienced with HTTP 1.0.
Also, I'd insert a delay(500) to the PostServer's loop() to simulate that the server node has also some other work to do.

I'll arrange a similar test with 3+ clients when I get some spare time. I just invested half a day to return to core v2.7.4 and surprise surprise - SPIFFS.check() was causing a fatal crash and a boot loop there. Took me several hours to debug it.

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 5, 2021

I have updated my test to use HTTP1.0.
I have now 3 host clients, 1 esp http1.0 client and 1 esp server with 500ms loop.
I'll let it run for some hours. Then if I still have no error, I'll remove the 500ms.

@joysfera
Copy link

joysfera commented Dec 5, 2021

Do you use the v3.0.2 release or master branch? Just asking if it got fixed in the master by any chance.
BTW, we kinda hijacked @M-Reimer issue, that's not fair. Though I am very very glad you @d-a-v pay so much attention to my issue, thank you!

@joysfera
Copy link

joysfera commented Dec 5, 2021

@d-a-v one more note: my server application also keeps a MQTT connection alive (probably consuming at least one network socket all the time) and does a HTTP POST to another server every minute (which requires another socket). That limited amount of sockets might help to cause the issue. On the plus side my clients are not hammering the ESP8266 server that often - only once in 5 seconds.

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 5, 2021

There is no limit for number of client sockets apart from free memory.
Do you follow your free memory on both client and server ?
Do you follow the service time, what's its greatest value ?
Are you using some kind of Stream to feed the webclient (= not only sending/receiving strings) ?

These questions are general, and may also help @M-Reimer @kirurui @ozzcet debugging.

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 5, 2021

Do you use the v3.0.2 release or master branch? Just asking if it got fixed in the master by any chance.

I am currently using the master branch.

@joysfera
Copy link

joysfera commented Dec 5, 2021

In my app I watch the free memory on both client and server (about 25+ kB).
As for the service time - when all ESP8266 clients were getting -1 or -11 errors the ESP8266WebServer's handler was not even getting their requests, so service time was 0. Normally it was 2 milliseconds.
My server builds a JSON String manually ( reply += "...") (the total length is say 500 bytes) and then sends it at once.

    server.sendHeader("Access-Control-Allow-Origin", " *");
    server.sendHeader("Cache-Control", " no-cache, must-revalidate");
    server.send(200, "application/json", reply);

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 5, 2021

was not even getting their requests,

Ah this is a valuable information.
Can you modify this constant to something bigger (either by modifying the file or by externally overriding it if you don't use the Arduino IDE with which you can't) ?

Hint: this parameter is not the maximum number of clients. It's the automatic door of the shop. It stays closed when there are 5 customers waiting behind the door, but opens when there are less. Impatient customers outside may give up after their timeout. This is a DOS (free memory) protection.

@joysfera
Copy link

joysfera commented Dec 5, 2021

I can modify that constant easily but please note that in this particular use case I don't want any backlog to grow there. Given that clients are timeouting quickly by themselves (within 1500ms - I could even short that period under 1000 ms) it would be ideal if the webserver was throwing away any older request it couldn't handle in time as soon as a newer request arrives. At least that's my theory for as stable client-server setup as possible.

I admit that I don't know what's happening on the server when the client times out. Does the server know about it or does it still keep the old abandoned request in a queue that keeps filling up because the client is resending its request every few seconds?

Perhaps the key question is how quickly server times out when sending a reply to a client that no longer listens because he has timed out a moment ago?

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 5, 2021

This parameter is not managed by the server. It's TCP.
It's very like when an incoming customer is blocked outside of the shop. The door is closed. Can't see what's inside. Exactly as if there is no server on the port. For the client which reaches timeout, it's like if there is no server.
When backlog was not enabled, memory could be filled by too many incoming connections.
So increasing it now is still better than when it was not enabled.

The other option is to increase client timeout.

A further option would be for http client to distinguish between real timeout (server sending incomplete answer) or can't connect to the port (what's discussed above).

The server don't see the clients which have timeouted before the connection is established (they didn't come through the dooor).

Perhaps the key question is how quickly server times out when sending a reply to a client that no longer listens because he has timed out a moment ago?

and

it would be ideal if the webserver was throwing away any older request it couldn't handle in time as soon as a newer request arrives

In master branch, the webserver will close quicker when a client is misbehaving but another is waiting (this is a recent improvement).

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 6, 2021

I forgot to tell that the "no features" tastes of lwIP do not have tcp-backlog enabled.

@Buddha6669
Copy link

Buddha6669 commented Jan 5, 2022

I had the same problem hapends around midnight (request every 12 min at time +30s), i add an exception in my code, maybe the server idk.

@d-a-v d-a-v removed this from the 3.1 milestone Dec 16, 2022
@Alnahrawy
Copy link

For me increasing the HTTPClinet timeout solved this problem:


    HTTPClient http;
    http.setTimeout(15000);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants