Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect set timeout in WiFiClientSecureCtx #8889

Open
heshamameen opened this issue Mar 13, 2023 · 13 comments · May be fixed by #8899
Open

Respect set timeout in WiFiClientSecureCtx #8889

heshamameen opened this issue Mar 13, 2023 · 13 comments · May be fixed by #8899

Comments

@heshamameen
Copy link

Could you please make a method to actually set timeout in WiFiClientSecureCtx?

You are hard coding the timeout, which overrides the setTimeout method, so in this case setTimeout has no effect.

It sometimes overrides with 5000 ms, and for the first TLS and SSL handshakes, it's 15000 ms, which is too much in so many cases, especially when the esp is connected to a poor network.The ESP goes blind for 15 seconds with every secure request.

You could instead implement a method that checks if the user set a timeout so it respects it and uses it instead of hard-coded ones, If the user didn't set the timeout, you can use your hard-coded timeouts.

@d-a-v d-a-v linked a pull request Mar 28, 2023 that will close this issue
@d-a-v
Copy link
Collaborator

d-a-v commented Mar 28, 2023

@heshamameen Can you try with #8899 ?

@d-a-v
Copy link
Collaborator

d-a-v commented Mar 30, 2023

(updated)
handshake or regular timeout can be set like in the proposed updated example

https.setTimeout(4000); // or: client->setTimeout(4000);
client->setHandshakeTimeout(10000);

@heshamameen
Copy link
Author

Hi @d-a-v and @mcspr,
Thank you for your response and effort on this issue.

I reviewed your edits and tried them, and overall, they work great and solve the problem of setting the handshake timeout, but I have two comments.

The first one is the new debugging message you added to print time out when starting run_until
DEBUG_BSSL("_run_until starts, timeout=%lu\n", _updateStreamTimeout());
It prints too much, especially when you have an open for reuse httpclient so I had to comment it to test with debugging on.

The second is the reset you have implemented 
loopTimeout.reset();
I don't think it's necessary to reset the timeout inside the run_until loop for two reasons:

  1. When the esp is connected to a poor network, there is a large possibility that the esp will get stuck in the run_until loop. Let's say the user chose a 5-second timeout and the receiving data rate is slow on a poor connection. What will happen is the run_until loop will execute, and it can count 4 seconds, receive a chunk, reset the timeout to start counting 5 seconds from the beginning (this is 9 seconds), and so on with every data chunk received, so it can go forever in this loop.

While I was trying, I ended up spending 17 seconds on a 5-second timeout request because of the reset.

So I decided to comment reset lines and try again, and it worked almost as I expected. It took almost the timeout in a lot of cases, but other cases took the timeout + a random amount of time spent inside the run_until For example, I set the timeout to 5 seconds, and the loop took 6800 ms with all the resets commented, so with a time-sensitive application, this is disturbing.

  1. The user actually chose this timeout to timeout on, especially for time-sensitive applications; even if the stream was running and receiving data, let the user calibrate this timeout to what he sees fit, and if he chose a small timeout that is not enough for reading the packet, he can try with a bigger one, or he can leave your default timeout.

@d-a-v
Copy link
Collaborator

d-a-v commented Apr 3, 2023

These are fair considerations.
About the debug line, it will be removed.

The Arduino definition of stream timeout is given there. As you can read, it is the maximum time allowed between reading two chars (or the maximum inter-arrival time). This is how ssl-client has been changed in the handshake part, by adding the .reset() as you noticed.

The primary goal of this timeout is to help detect and stop broken connections. Wifi-client-secure has been changed to this end so we don't need to wait for 15s a connection which breaks after 6 seconds.

What you describe is a connection which is too slow for your needs but which will eventually come to a successful end.

I'm interested to know which application needs, on esp8266, a fast SSL connection which is useless when not fast enough, out of curiosity. There are by the way several ways to accelerate ssl connections if you've not tried them yet.

We internally (in fact I internally justified with the above reasons why this PR) came to this proposal:

  • This amended timeout management in wifi-client-secure
  • An overall timeout - let's call it duration barrier to avoid confusion - in http-client which would stop the connection if result doesn't come quickly enough (which has yet to be amended in code). But httpclient::connect() (= ssl handshake) would be out of this overall duration barrier.
  • Removal of current wifi-client-ssl handshake timeout (only existing in this PR anyway)

So this would not solve the handshake issue that you describe @heshamameen.

Maybe @mcspr we can keep the handshake timeout. It would have to be renamed to something else (setHandshakeDurationBarrier()?) that would kill the connection when user's requirements are not met. This timeout would be infinity by default.

@mcspr
Copy link
Collaborator

mcspr commented Apr 4, 2023

I think you also got me confused of your intention there, don't hold my initial agreement against me pls :>

When lower timeouts are used, it does indeed appear to work, but... It is excessive, since socket timeout is already there for individual read() / write() operations. The idea is to catch hanged connections as noted up above - most of the time it would be better to fail early and give control to the app. We deadlock everything otherwise.

This timeout would be infinity by default.

Would be really bad.

setHandshakeDurationBarrier

'Timeout' would mean the same thing as in other contexts - after start of some operation, wait up until this amount of time or abort everything. Duration barrier (thinking of time fence?) sounds pretty weird.

Handshake can also be deferred to first read() / write(), if you worry about the connect() blocking all of a sudden. Then the same logic would apply as the other timeout

@heshamameen
Copy link
Author

Hi @d-a-v ,

I understand what you say about the internal timeout of the Arduino stream, and of course it makes sense to have a timeout for not receiving anything for a time period to dismiss a broken connection, but I think what you suggested at the end (setHandshakeDurationBarrier()) is very helpful if implemented because you would give the user the ability to set a time limit for the SSL handshake.

This, along with the overall timeout of the httpclient would be a great tool to timeout SSL requests throughout its life time.

As for what kind of application requires this kind of time sensitivity,

For example, I work on a smart wifi switch, and due to cost limitations, I use esp8266, not esp32, so I have to do a lot of operations on a single core, such as listing to an http server, communicating with other microcontrollers that report states that are in sync with esp8266, sending SSL requests to a database to sync states, and a lot more, all at the same time on a single core.

So I have to time everything perfectly to avoid being out of sync, or if the user sends a request to the server and finds the server is not responding, SSL requests are blocking requests, so I have to set a time limit for the request, even if it was sending data, to continue the rest of the communications I have to make through out the loop, and if the request fails, I can schedule it to run another time, and so on.

In such cases, I think @mcspr The duration barrier is not weird at all; it's just meant to be a timing tool to make sure that everything runs under user control and not outside of it.

@mcspr
Copy link
Collaborator

mcspr commented Apr 4, 2023

'wierd' is about the words used, not the timeout / duration / deadline thing itself :)

We can't distinguish WiFiClient from Secure in HTTPClient, hence it makes most sense to make this configurable from WiFiClient side.
Timeout for operation varies between states. When not yet connected, we want total time of operation to be limited b/c it does not make sense to keep not working connection around. After that, polled portion of the timeout is questionable; just keeping the network timeout there can be a bit easier to reason about (without two overlapping timeouts)

@d-a-v
Copy link
Collaborator

d-a-v commented Apr 4, 2023

@heshamameen

FWIW, about your background jobs, ticker and recurrent scheduled functions are still called while esp is busy within slow functions like ssl handshake.

@d-a-v
Copy link
Collaborator

d-a-v commented Apr 5, 2023

@heshamameen
There is an update within #8899.
Can you try it and particularly the http-client's new setWallTime(ms) and see if your constraints can be honored with it ?
Usage changes can be seen in the https client example

https.setWallTime(10000); // do not exceed 10s while getting data
client->setWallTime(20000); // do not exceed 20s during handshake
.

@heshamameen
Copy link
Author

First, thanks for your continued support.

I have read and tested all your added changes for the _wallTime addition.

It's a great idea to introduce a timer that terminates the request no matter if we receive data or not while maintaining the timeout parameters to help detect bad and broken connections and interrupt them.

When I tried the _wallTime change, it didn't work at all. The only parameter that was effective was the timeout, because you check the wall time expiration in the _wait_for_handshake function after calling the _run_until, which is blocking and iterating till the handshake succeeded or the timeout occurred, so it wasn't working because the timer expires when we are stuck in the _run_until, and the checker happens after we leave _run_until with success or with a timeout.

So I don't think of this _wallTime for the handshake only; it can control the time of the whole request by simply checking on its expiration in _run_until after the timeout checker, so it will be checked during all the communications in read, write, flush, and handshake and dismiss the request as soon as this time hits then return a high priority error to dismiss the request, and of course this wall time should in this case be forever unless the user changes it.

Also, I don't think we need this _wallTime inside the HTTPClient, as the library is just a wrapper for Client and Client Secure. The old timeout parameter was sufficient, especially after you propagated it to the client and the ancestors, and the _wallTime time in Client will be sufficient to return an error to HTTPClient to terminate the request.

@heshamameen
Copy link
Author

I have tried to put the walltime checker inside run until it worked great, but I had some issues with the http stream. I had to set the wall time to zero to make the stream work.

I also tried your nightly build version, and I found that the httpClient resue is not working properly. Usually I do the first patch request to Firebase, for example, in 1250 ms and reuse the connection so that from the second patch it will be 250 ms, but that's not happening in your 0.0.2 version. Every time the request takes 1250 ms as if it's not reused, and the weird thing is that the handshake is not happening for the second time, but it takes 1000 ms in this line [HTTP-Client] connect: already connected, reusing connection
Debugging output:
the first request before reusing:


17:57:08.803 -> [HTTP-Client][sendRequest] type: 'PATCH' redirCount: 0
17:57:08.803 -> [hostByName] request IP for: xxxxxxxxxxxxxx
17:57:08.850 -> [hostByName] Host: xxxxxxxxxxxxxxxxxx
17:57:08.850 -> :ur 1
17:57:08.850 -> :dsrcv 0
17:57:08.850 -> :del
17:57:08.850 -> :ref 1
17:57:08.897 -> BSSL:_connectSSL: start connection
17:57:08.897 -> 
17:57:08.897 -> _iobuf_in:       0x40108474
17:57:08.897 -> _iobuf_out:      0x401087ec
17:57:08.897 -> _iobuf_in_size:  837
17:57:08.897 -> _iobuf_out_size: 2133
17:57:08.897 -> :wr 135 0
17:57:08.897 -> :wrc 135 135 0
17:57:08.943 -> :ack 135
17:57:08.943 -> :rn 1412
17:57:08.943 -> :rd 5, 1412, 0
17:57:08.943 -> :rdi 1412, 5
17:57:08.943 -> :rd 85, 1412, 5
17:57:08.943 -> :rdi 1407, 85
17:57:08.943 -> :rch 1412, 1412
17:57:08.943 -> :rch 2824, 1412
17:57:08.943 -> :rch 4236, 117
17:57:08.943 -> :rd 5, 4353, 90
17:57:08.943 -> :rdi 1322, 5
17:57:08.943 -> :rd 832, 4353, 95
17:57:08.943 -> :rdi 1317, 832
17:57:08.991 -> :rd 832, 4353, 927
17:57:08.991 -> :rdi 485, 485
17:57:08.991 -> :c 485, 1412, 4353
17:57:08.991 -> :rdi 1412, 347
17:57:08.991 -> :rd 832, 2941, 347
17:57:08.991 -> :rdi 1065, 832
17:57:09.131 -> :rd 832, 2941, 1179
17:57:09.131 -> :rdi 233, 233
17:57:09.131 -> :c 233, 1412, 2941
17:57:09.131 -> :rdi 1412, 599
17:57:09.551 -> :rd 832, 1529, 599
17:57:09.551 -> :rdi 813, 813
17:57:09.551 -> :c 813, 1412, 1529
17:57:09.551 -> :rdi 117, 19
17:57:09.551 -> :rd 89, 117, 19
17:57:09.551 -> :rdi 98, 89
17:57:09.551 -> :rd 5, 117, 108
17:57:09.551 -> :rdi 9, 5
17:57:09.551 -> :rd 4, 117, 113
17:57:09.551 -> :rdi 4, 4
17:57:09.551 -> :c0 4, 117
17:57:09.645 -> :wr 267 0
17:57:09.645 -> :wrc 267 267 0
17:57:09.691 -> :wr 6 0
17:57:09.691 -> :wrc 6 6 0
17:57:09.691 -> :wr 69 0
17:57:09.691 -> :wrc 69 69 0
17:57:09.738 -> :ack 267
17:57:09.738 -> :ack 75
17:57:09.785 -> :rn 75
17:57:09.785 -> :rd 5, 75, 0
17:57:09.785 -> :rdi 75, 5
17:57:09.785 -> :rd 1, 75, 5
17:57:09.785 -> :rdi 70, 1
17:57:09.785 -> :rd 5, 75, 6
17:57:09.785 -> :rdi 69, 5
17:57:09.785 -> :rd 64, 75, 11
17:57:09.785 -> :rdi 64, 64
17:57:09.785 -> :c0 64, 75
17:57:09.785 -> BSSL:Connected!
17:57:09.785 -> [HTTP-Client] connected to xxxxxxxxxxxxxxx
17:57:09.785 -> [HTTP-Client] sending request header
17:57:09.785 -> -----
17:57:09.785 -> PATCH xxxxxxxxxxxxxxxxxxxx HTTP/1.1
17:57:09.785 -> Host: xxxxxxxxxxxxx
17:57:09.785 -> User-Agent: ESP8266HTTPClient
17:57:09.785 -> Accept-Encoding: identity;q=1,chunked;q=0.1,*;q=0
17:57:09.785 -> Connection: keep-alive
17:57:09.785 -> Content-Length: 94
17:57:09.785 -> 
17:57:09.785 -> -----
17:57:09.785 -> :wr 565 0
17:57:09.785 -> :wrc 565 565 0
17:57:09.785 -> BSSL:_run_until: Timeout
17:57:09.831 -> :ack 565
17:57:09.831 -> :wr 565 0
17:57:09.831 -> :wrc 565 565 0
17:57:09.878 -> :ack 565
17:57:09.878 -> :wr 277 0
17:57:09.878 -> :wrc 277 277 0
17:57:09.924 -> :ack 277
17:57:09.924 -> :wr 149 0
17:57:09.924 -> :wrc 149 149 0
17:57:09.971 -> :ack 149
17:57:10.018 -> :rn 437
17:57:10.018 -> :rd 5, 437, 0
17:57:10.018 -> :rdi 437, 5
17:57:10.018 -> :rd 432, 437, 5
17:57:10.018 -> :rdi 432, 432
17:57:10.018 -> :c0 432, 437
17:57:10.018 -> [HTTP-Client][handleHeaderResponse] RX: 'HTTP/1.1 200 OK
'
17:57:10.018 -> [HTTP-Client][handleHeaderResponse] RX: 'Server: nginx
'
17:57:10.018 -> [HTTP-Client][handleHeaderResponse] RX: 'Date: Sun, 09 Apr 2023 15:57:10 GMT
'
17:57:10.018 -> [HTTP-Client][handleHeaderResponse] RX: 'Content-Type: application/json; charset=utf-8
'
17:57:10.018 -> [HTTP-Client][handleHeaderResponse] RX: 'Content-Length: 94
'
17:57:10.018 -> [HTTP-Client][handleHeaderResponse] RX: 'Connection: keep-alive
'
17:57:10.018 -> [HTTP-Client][handleHeaderResponse] RX: 'Access-Control-Allow-Origin: *
'
17:57:10.018 -> [HTTP-Client][handleHeaderResponse] RX: 'Cache-Control: no-cache
'
17:57:10.018 -> [HTTP-Client][handleHeaderResponse] RX: 'Strict-Transport-Security: max-age=31556926; includeSubDomains; preload
'
17:57:10.018 -> [HTTP-Client][handleHeaderResponse] RX: '
'
17:57:10.018 -> [HTTP-Client][handleHeaderResponse] code: 200
17:57:10.018 -> [HTTP-Client][handleHeaderResponse] size: 94
17:57:10.018 -> [Req] -> PUT request sent
17:57:10.018 -> [Req] -> Time : 1246
17:57:10.018 -> [Req] -> HttpCode : 200
17:57:10.018 -> [HTTP-Client][end] tcp kept open for reuse

the second request after reusing output is:

17:57:10.205 -> [HTTP-Client][sendRequest] type: 'PATCH' redirCount: 0
17:57:10.205 -> [HTTP-Client] connect: already connected, reusing connection
17:57:11.184 -> [HTTP-Client] sending request header
17:57:11.184 -> -----
17:57:11.184 -> PATCH xxxxxxxxxxxxxxxxxxxxxxxxxxx HTTP/1.1
17:57:11.231 -> Host: xxxxxxxxxxxxxxxxxxxxxxxxxxx
17:57:11.231 -> User-Agent: ESP8266HTTPClient
17:57:11.231 -> Accept-Encoding: identity;q=1,chunked;q=0.1,*;q=0
17:57:11.231 -> Connection: keep-alive
17:57:11.231 -> Content-Length: 94
17:57:11.231 -> 
17:57:11.231 -> -----
17:57:11.231 -> :wr 565 0
17:57:11.231 -> :wrc 565 565 0
17:57:11.278 -> :ack 565
17:57:11.278 -> :wr 565 0
17:57:11.278 -> :wrc 565 565 0
17:57:11.325 -> :ack 565
17:57:11.325 -> :wr 277 0
17:57:11.325 -> :wrc 277 277 0
17:57:11.371 -> :ack 277
17:57:11.371 -> :wr 149 0
17:57:11.371 -> :wrc 149 149 0
17:57:11.371 -> BSSL:_run_until: Timeout
17:57:11.418 -> :ack 149
17:57:11.418 -> :rn 437
17:57:11.418 -> :rd 5, 437, 0
17:57:11.418 -> :rdi 437, 5
17:57:11.418 -> :rd 432, 437, 5
17:57:11.418 -> :rdi 432, 432
17:57:11.418 -> :c0 432, 437
17:57:11.418 -> [HTTP-Client][handleHeaderResponse] RX: 'HTTP/1.1 200 OK
'
17:57:11.418 -> [HTTP-Client][handleHeaderResponse] RX: 'Server: nginx
'
17:57:11.418 -> [HTTP-Client][handleHeaderResponse] RX: 'Date: Sun, 09 Apr 2023 15:57:11 GMT
'
17:57:11.464 -> [HTTP-Client][handleHeaderResponse] RX: 'Content-Type: application/json; charset=utf-8
'
17:57:11.464 -> [HTTP-Client][handleHeaderResponse] RX: 'Content-Length: 94
'
17:57:11.464 -> [HTTP-Client][handleHeaderResponse] RX: 'Connection: keep-alive
'
17:57:11.464 -> [HTTP-Client][handleHeaderResponse] RX: 'Access-Control-Allow-Origin: *
'
17:57:11.464 -> [HTTP-Client][handleHeaderResponse] RX: 'Cache-Control: no-cache
'
17:57:11.464 -> [HTTP-Client][handleHeaderResponse] RX: 'Strict-Transport-Security: max-age=31556926; includeSubDomains; preload
'
17:57:11.464 -> [HTTP-Client][handleHeaderResponse] RX: '
'
17:57:11.464 -> [HTTP-Client][handleHeaderResponse] code: 200
17:57:11.464 -> [HTTP-Client][handleHeaderResponse] size: 94
17:57:11.464 -> [Req] -> PUT request sent
17:57:11.464 -> [Req] -> Time : 1267
17:57:11.464 -> [Req] -> HttpCode : 200
17:57:11.464 -> [HTTP-Client][end] tcp kept open for reuse

So I went and tried with the latest release 3.1.2 and it worked perfectly

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 18, 2023

@mcspr

This timeout would be infinity by default.

Would be really bad.

Only the handshake walltime would be infinity by default. Leaving it to infinity would be equivalent to the current implementation which relies on regular timeout only.

@heshamameen

[...] because you check the wall time expiration in the _wait_for_handshake function after calling the _run_until, which is blocking and iterating till the handshake succeeded or the timeout occurred [...]

If this PR is still relevant, I can go on with it for you to test.

@heshamameen
Copy link
Author

@d-a-v Of course it is still relevant and it would be great if you investigated more and corrected the wall time-out not working issue.
I have a workaround in the _run_until function to make the wall time-out work for me correctly with this peace of code.

if (_wallTime && !_handshake_done && target == BR_SSL_SENDAPP) { DEBUG_BSSL("_run_until handshake too long\n"); return -1; }

So if you can please investigate more and make it work professionally and integrate it in the master branch to be rolled out in the next update to enable us to use it without editing the repo by hand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants