Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aiohttp TLS websocket fails for continously sending packages on ESP32 #851

Open
wohltat opened this issue May 2, 2024 · 11 comments
Open

Comments

@wohltat
Copy link

wohltat commented May 2, 2024

@Carglglz
I have an application i want to send TLS / SSL packages over websockets using aiohttp on an ESP32. The problem is that the websockets fail after a short while when using packages that have a little bigger size around 2kB to 4kB.

Here is a simple test script:

import aiohttp
import asyncio
import gc

# allocate and prepare fixed block of data to be sent
b = bytearray(10_000)
for k in range(100):
    p = k*100
    b[p:p] = b'X'*96 + f'{k:3}' + '\n'
mv = memoryview(b)

URL = "wss://somewebsocketserver/echo"
sslctx = False
if URL.startswith("wss:"):
    try:
        import ssl
        
        sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        sslctx.verify_mode = ssl.CERT_NONE
    except Exception:
        pass

async def ws_receive(ws : ClientWebSocketResponse):
    try:
        async for msg in ws:
            if msg.type != aiohttp.WSMsgType.TEXT:
                print('type:', msg.type, repr(msg.data))
    except TypeError as e:
        print('ws_receive', repr(e))
            

async def ws_test_echo(session):
    n = 1
    
    while True:
        ws_receive_task = None
        async with session.ws_connect(URL, ssl=sslctx) as ws:
            ws_receive_task = asyncio.create_task(ws_receive(ws))
            try:
                while True:
                    gc.collect()
                    print('-------------------', n, '------------------------')
                    await ws.send_str(b[0:100*n].decode())
                    n += 1

                    await asyncio.sleep_ms(100)

            except KeyboardInterrupt:
                pass

            finally:
                await ws.close()


async def main():
    async with aiohttp.ClientSession() as session:
        print('session')
        await ws_test_echo(session)


if __name__ == "__main__":
    asyncio.run(main())

remote

I'll get the following exception after a while. Just dealing with the exception is not satisfying since it takes a while and the applicaiton is blocked in that time.

Task exception wasn't retrieved
future: <Task> coro= <generator object 'ws_receive' at 3ffec850>
Traceback (most recent call last):
  File "asyncio/core.py", line 1, in run_until_complete
  File "<stdin>", line 29, in ws_receive
  File "/lib/aiohttp/aiohttp_ws.py", line 226, in __anext__
  File "/lib/aiohttp/aiohttp_ws.py", line 171, in receive
  File "/lib/aiohttp/aiohttp_ws.py", line 198, in _read_frame
  File "asyncio/stream.py", line 1, in read
OSError: -113
Traceback (most recent call last):
  File "<stdin>", line 66, in <module>
  File "asyncio/core.py", line 1, in run
  File "asyncio/core.py", line 1, in run_until_complete
  File "asyncio/core.py", line 1, in run_until_complete
  File "<stdin>", line 62, in main
  File "<stdin>", line 56, in ws_test_echo
  File "/lib/aiohttp/aiohttp_ws.py", line 233, in close
  File "/lib/aiohttp/aiohttp_ws.py", line 194, in close
  File "/lib/aiohttp/aiohttp_ws.py", line 189, in send
  File "asyncio/stream.py", line 1, in drain
OSError: -113

(OSError: 113 = ECONNABORTED)

Noteworthy is that there are always some packages that micropyhon thinks are sent already but don't reach the other side.
I also do not receive a websocket close package.
This was tested on a remote Websocket server that i don't control.

local

When i try it on a local server, i see different problems.

Traceback (most recent call last):
  File "<stdin>", line 64, in <module>
  File "asyncio/core.py", line 1, in run
  File "asyncio/core.py", line 1, in run_until_complete
  File "asyncio/core.py", line 1, in run_until_complete
  File "<stdin>", line 60, in main
  File "<stdin>", line 45, in ws_test_echo
  File "/lib/aiohttp/aiohttp_ws.py", line 239, in send_str
  File "/lib/aiohttp/aiohttp_ws.py", line 187, in send
  File "asyncio/stream.py", line 1, in write
OSError: -104

(OSError: 104 = ECONNRESET)

The servers seems to not receive the complete message / only a corrupted message and closes the connection:

...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 37
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 38
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 39
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
b'\xe9\x82\xc3+\xe9\x82\xbbG\x81\xd0\xc3+\xe9\x82\xc3+\xe9\x82\xc3+\xe9\x82\xc3+\xe9\x82\xc3+\xe9\x82\xc3+\xe9\x82\xc3+\xe9\x82\xc3+\xe9\x82\xc3'

Unhandled exception in client_connected_cb
transport: <asyncio.sslproto._SSLProtocolTransport object at 0x7f18b780ecf0>
Traceback (most recent call last):
  File "/home/username/Documents/projects/rfid/websockets_python/microdot/examples/tls/microdot/microdot.py", line 1224, in serve
    await self.handle_request(reader, writer)
  File "/home/username/Documents/projects/rfid/websockets_python/microdot/examples/tls/microdot/microdot.py", line 1338, in handle_request
    await writer.aclose()
  File "/home/username/Documents/projects/rfid/websockets_python/microdot/examples/tls/microdot/microdot.py", line 1218, in aclose
    await self.wait_closed()
  File "/usr/lib/python3.11/asyncio/streams.py", line 364, in wait_closed
    await self._protocol._get_close_waiter(self)
  File "/usr/lib/python3.11/asyncio/sslproto.py", line 648, in _do_shutdown
    self._sslobj.unwrap()
  File "/usr/lib/python3.11/ssl.py", line 983, in unwrap
    return self._sslobj.shutdown()
           ^^^^^^^^^^^^^^^^^^^^^^^
ssl.SSLError: [SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2706)

The closing of the websocket is also not handled properly by the micropython application / aiohttp. The close package is received but the connection is not closed automatically.

Another problem is that even if i deal with the exceptions and reconnect automatically, my application will eventually crash because of a run out of memory.
This is really problematic to me since i only send messages of size 2k or so.
I guess this is because of the not so memory economical design of aiohttp. For sending and receiving there is always a lot of new memory allocation involved.
The use of preallocation and memoryview seems reasonable here.

This is the local python websocketserver code:

import ssl
import sys
from microdot import Microdot
from microdot.websocket import with_websocket

app = Microdot()

html = '''<!DOCTYPE html>
<html>
    <head>
        <title>Microdot Example Page</title>
        <meta charset="UTF-8">
    </head>
    <body>
        <div>
            <h1>Microdot Example Page</h1>
            <p>Hello from Microdot!</p>
            <p><a href="/shutdown">Click to shutdown the server</a></p>
        </div>
    </body>
</html>
'''


@app.route('/')
async def hello(request):
    return html, 200, {'Content-Type': 'text/html'}

@app.route('/echo')
@with_websocket
async def echo(request, ws):
    while True:
        data = await ws.receive()
        print(data)
        await ws.send(data)

@app.route('/shutdown')
async def shutdown(request):
    request.app.shutdown()
    return 'The server is shutting down...'


ext = 'der' if sys.implementation.name == 'micropython' else 'pem'
folder_name = "/home/username/Documents/projects/rfid/websockets_python/"
ssl_cert = "cert.pem"
ssl_key = "key.pem"

sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
sslctx.load_cert_chain(ssl_cert, ssl_key)
app.run(port=4443, debug=True, ssl=sslctx)

MicroPython v1.23.0-preview.344.gb1ac266bb.dirty on 2024-04-29; Generic ESP32 module with ESP32

@Carglglz
Copy link
Contributor

Carglglz commented May 4, 2024

@wohltat your test does not do what you think is doing e.g.

# allocate and prepare fixed block of data to be sent
b = bytearray(10_000)
for k in range(100):
    p = k*100
    b[p:p] = b'X'*96 + f'{k:3}' + '\n'
mv = memoryview(b)

if you add

print(len(b), b[-100:])

You will see that, 1) it has a length of 20_000 and 2) there are invalid UTF bytes.

Also here

...
                    await ws.send_str(b[0:100*n].decode())
                    n += 1
...

this is sending an infinite growing string, so I'm not sure what you want to test with this...

anyway here is a proper test (working on unix and esp32 ports)

import aiohttp
import asyncio
import sys
import time

# allocate and prepare fixed block of data to be sent
b = bytearray(10_000)
for k in range(100):
    p = k * 100
    b[p : p + 100] = b"X" * 96 + f"{k:3}" + "\n"
mv = memoryview(b)

# print(len(b), b[-100:])
try:
    size = int(sys.argv.pop(-1))
except Exception:
    size = 2000
# sys.exit()

URL = "wss://<URL>:8448/echo" 
sslctx = False
if URL.startswith("wss:"):
    try:
        import ssl

        sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        sslctx.verify_mode = ssl.CERT_NONE
    except Exception:
        pass


async def ws_receive(ws: ClientWebSocketResponse):
    try:
        async for msg in ws:
            if msg.type != aiohttp.WSMsgType.TEXT:
                print("type:", msg.type, repr(msg.data))

    except (TypeError, OSError) as e:
        print("ws_receive", repr(e))


async def ws_test_echo(session):
    n = 0
    l = 0
    m = 0
    ws_receive_task = None
    async with session.ws_connect(URL, ssl=sslctx) as ws:
        ws_receive_task = asyncio.create_task(ws_receive(ws))
        try:
            while True:
                t0 = time.ticks_ms()
                while (n * 100) + size < len(b):
                    await ws.send_str(b[n * 100 : (n * 100) + size].decode())
                    dt = time.ticks_diff(time.ticks_ms(), t0) / 1e3
                    print(
                        "-------------------",
                        f"packet # {m} |",
                        f"total: {l/1000:.1f} KB |",
                        f"{(size/1000)/dt:.1f} KB/s |",
                        f"{((size*8)/dt)/1e3:.1f} kbps",
                        "------------------------",
                        end="\r",
                    )
                    n += 1
                    m += 1

                    l += size
                    t0 = time.ticks_ms()

                    await asyncio.sleep_ms(100)
                n = 0

        except KeyboardInterrupt:
            pass

        finally:
            await ws.close()


async def main():
    async with aiohttp.ClientSession() as session:
        print(f"session test @ packet size: {size/1000:.1f} KB")
        await ws_test_echo(session)


def run():
    asyncio.run(main())


if __name__ == "__main__":
    run()

and server

import ssl
from microdot import Microdot
from microdot.websocket import with_websocket

app = Microdot()

html = """<!DOCTYPE html>
<html>
    <head>
        <title>Microdot Example Page</title>
        <meta charset="UTF-8">
    </head>
    <body>
        <div>
            <h1>Microdot Example Page</h1>
            <p>Hello from Microdot!</p>
            <p><a href="/shutdown">Click to shutdown the server</a></p>
        </div>
    </body>
</html>
"""


@app.route("/")
async def hello(request):
    return html, 200, {"Content-Type": "text/html"}


@app.route("/echo")
@with_websocket
async def echo(request, ws):
    t0 = time.ticks_ms()
    l = 0
    m = 0
    while True:
        data = await ws.receive()
        size = len(data)
        l += size
        dt = time.ticks_diff(time.ticks_ms(), t0) / 1e3
        print(
            "-------------------",
            f"packet # {m} |",
            f"total: {l/1000:.1f} KB |",
            f"{(size/1000)/dt:.1f} KB/s |",
            f"{((size*8)/dt)/1e3:.1f} kbps",
            "------------------------",
            end="\r",
        )
        m += 1
        t0 = time.ticks_ms()

        # print(data[-4:].replace("\n", ""), end="\r")
        await ws.send(data)
       


@app.route("/shutdown")
async def shutdown(request):
    request.app.shutdown()
    return "The server is shutting down..."


ssl_cert = "cert.pem"
ssl_key = "key.pem"

sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
sslctx.load_cert_chain(ssl_cert, ssl_key)
app.run(port=8448, debug=True, ssl=sslctx)

client

$  micropython ws_echo_test.py 2000
session test @ packet size: 2.0 KB
------------------- packet # 49 | total: 98.0 KB | 19.4 KB/s | 155.3 kbps -------------------------

server

 $ micropython ws_server_test.py
Starting async server on 0.0.0.0:8448...
------------------- packet # 570 | total: 1142.0 KB | 19.2 KB/s | 153.8 kbps ------------------------

@wohltat
Copy link
Author

wohltat commented May 5, 2024

Hi, thanks for quick your answer.

Your are right with the size of the bytearray, but the first 10_000 bytes stay the same so i can't see how this influences the test.
I tried it again a i still have the same problems.

this is sending an infinite growing string, so I'm not sure what you want to test with this...

This was my intention. I wanted to test with varying and increasing packet sizes. Your test has the same size of 2kb for every packet which was never a problem on the local server and is ocasionally failing on the remote server (that i don't know exactly what it is doing).
If i set the size = 5000, i can reliably make it fail on the local server right away:

Traceback (most recent call last):
  File "<stdin>", line 93, in <module>
  File "<stdin>", line 89, in run
  File "asyncio/core.py", line 1, in run
  File "asyncio/core.py", line 1, in run_until_complete
  File "asyncio/core.py", line 1, in run_until_complete
  File "<stdin>", line 85, in main
  File "<stdin>", line 55, in ws_test_echo
  File "/lib/aiohttp/aiohttp_ws.py", line 239, in send_str
  File "/lib/aiohttp/aiohttp_ws.py", line 189, in send
  File "asyncio/stream.py", line 1, in drain
OSError: -104

With increasing packet sizes the point of failure is almost always at (or very close to) 4k for the local server. Strangely the connection to the remote server fails rather randomly between 2k and 8k. But i don't know what kind of server that is.

The other problem that may be connected is that there is a memory problem, that may be caused by memery fragmentation i guess. I have no other explanation since there should be enough memory available. When i use a little memory in between the packets than the following memory error might occur. Not always though, often the connections breaks because of the before mentioned errors:

gc.mem_free()=82864 gc.mem_alloc()=34192
------------------- 36 ------------------------
gc.mem_free()=82832 gc.mem_alloc()=34224
------------------- 37 ------------------------
gc.mem_free()=82800 gc.mem_alloc()=34256
------------------- 38 ------------------------
Traceback (most recent call last):
  File "<stdin>", line 66, in <module>
  File "asyncio/core.py", line 1, in run
  File "asyncio/core.py", line 1, in run_until_complete
  File "asyncio/core.py", line 1, in run_until_complete
  File "<stdin>", line 62, in main
  File "<stdin>", line 48, in ws_test_echo
  File "/lib/aiohttp/aiohttp_ws.py", line 239, in send_str
  File "/lib/aiohttp/aiohttp_ws.py", line 184, in send
  File "/lib/aiohttp/aiohttp_ws.py", line 137, in _encode_websocket_frame
MemoryError: memory allocation failed, allocating 3809 bytes

I used the following client code on the ESP32 (The server ran on my pc under linux). The list l is to cause increase memory fragmentation:

import aiohttp
import asyncio
import gc

# allocate and prepare fixed block of data to be sent
b = bytearray(10_000)
for k in range(100):
    p = k*100
    b[p:p+100] = b'X'*96 + f'{k:3}' + '\n'
mv = memoryview(b)
print(len(b), b[-100:])

l = []

URL = "wss://somewebsocketserver"
sslctx = False
if URL.startswith("wss:"):
    try:
        import ssl
        
        sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        sslctx.verify_mode = ssl.CERT_NONE
    except Exception:
        pass

async def ws_receive(ws : ClientWebSocketResponse):
    try:
        async for msg in ws:
            if msg.type != aiohttp.WSMsgType.TEXT:
                print('type:', msg.type, repr(msg.data))
    except TypeError as e:
        print('ws_receive', repr(e))

async def ws_test_echo(session):
    n = 1
    
    while True:
        ws_receive_task = None
        async with session.ws_connect(URL, ssl=sslctx) as ws:
            ws_receive_task = asyncio.create_task(ws_receive(ws))
            try:
                while True:
                    gc.collect()
                    print(f'{gc.mem_free()=} {gc.mem_alloc()=}')
                    print('-------------------', n, '------------------------')
                    await ws.send_str(b[0:100*n].decode())
                    n += 1
                    l.append('abcdefghijkl' + str(n))
                    await asyncio.sleep_ms(300)

            except KeyboardInterrupt:
                pass

            finally:
                await ws.close()


async def main():
    async with aiohttp.ClientSession() as session:
        await ws_test_echo(session)


if __name__ == "__main__":
    asyncio.run(main())

@Carglglz
Copy link
Contributor

Carglglz commented May 6, 2024

Your are right with the size of the bytearray, but the first 10_000 bytes stay the same so i can't see how this influences the test.

Because at some point it was sending non-UTF characters so it looked like data corruption.

This was my intention. I wanted to test with varying and increasing packet sizes. Your test has the same size of 2kb for every packet which was never a problem on the local server and is ocasionally failing on the remote server (that i don't know exactly what it is doing).
If i set the size = 5000, i can reliably make it fail on the local server right away:

Ok I see, I'm testing this in unix port at both sides, with increased sizes and at some point it "breaks" .i.e at 8000-10000 for wss and > 16000 for plain ws (Continuation frames not supported). I'm not sure if this has anything to do with the aiohttp_ws implementation or if the "bug" is at lower level, asyncio streams maybe?
This may have something to do with partial writes 🤔 ... I may need to do further testing with CPython, and also with plain asyncio streams and see if I can reproduce this...

@wohltat
Copy link
Author

wohltat commented May 7, 2024

I tried it again with a python-websockets server described in #853 (comment).

The program also freezes and i get the following on the server side:

...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 24
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 25
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 26
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 27

% sending keepalive ping
> PING 04 4b f2 e3 [binary, 4 bytes]
= connection is CLOSED
! failing connection with code 1006
connection handler failed
Traceback (most recent call last):
  File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 963, in transfer_data
    message = await self.read_message()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 1033, in read_message
    frame = await self.read_data_frame(max_size=self.max_size)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 1108, in read_data_frame
    frame = await self.read_frame(max_size)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 1165, in read_frame
    frame = await Frame.read(
            ^^^^^^^^^^^^^^^^^
  File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/framing.py", line 68, in read
    data = await reader(2)
           ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/streams.py", line 750, in readexactly
    await self._wait_for_data('readexactly')
  File "/usr/lib/python3.11/asyncio/streams.py", line 543, in _wait_for_data
    await self._waiter
  File "/usr/lib/python3.11/asyncio/selector_events.py", line 974, in _read_ready__get_buffer
    nbytes = self._sock.recv_into(buf)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/server.py", line 236, in handler
    await self.ws_handler(self)
  File "/home/username/tls-test/server_secure.py", line 16, in hello
    name = await websocket.recv()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 568, in recv
    await self.ensure_open()
  File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 939, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: no close frame received or sent
connection closed

or also this

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 35
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 36

< CLOSE 1005 (no status received [internal]) [0 bytes]
= connection is CLOSING
> CLOSE 1005 (no status received [internal]) [0 bytes]
x closing TCP connection
= connection is CLOSED
connection handler failed
Traceback (most recent call last):
  File "/home/boogieman/Documents/projects/rfid/websockets_python/microdot/examples/tls/venv/lib/python3.11/site-packages/websockets/legacy/server.py", line 236, in handler
    await self.ws_handler(self)
  File "/home/boogieman/Documents/projects/rfid/websockets_python/microdot/examples/tls/server_secure.py", line 16, in hello
    name = await websocket.recv()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boogieman/Documents/projects/rfid/websockets_python/microdot/examples/tls/venv/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 568, in recv
    await self.ensure_open()
  File "/home/boogieman/Documents/projects/rfid/websockets_python/microdot/examples/tls/venv/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 948, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedOK: received 1005 (no status received [internal]); then sent 1005 (no status received [internal])
connection closed

@Carglglz
Copy link
Contributor

Carglglz commented May 7, 2024

The websockets implementation is based on https://github.com/danni/uwebsockets ,so it may be interesting to test that and see if it breaks too. Using Wireshark may help 🤔, and something like iperf3 for websockets would be nice.... also there is micropython/micropython#12819, and https://github.com/orgs/micropython/discussions/14427#discussioncomment-9337471

[EDIT] this may help too esp.osdebug(True, esp.LOG_INFO) and I'm seeing
this

I (35117) wifi:bcn_timeout,ap_probe_send_start
I (37617) wifi:ap_probe_send over, resett wifi status to disassoc
I (37617) wifi:state: run -> init (c800)
I (37617) wifi:pm stop, total sleep time: 4746559 us / 30202076 us

I (37627) wifi:<ba-del>idx
I (37627) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (37637) wifi: STA_DISCONNECTED, reason:200:beacon timeout
Traceback (most recent call last):
  File "lv_utils.py", line 148, in task_handler
  FI (37647) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
ile "lv_mpdiI (37657) wifi:state: init -> auth (b0)
I (37657) wifi:state: auth -> assoc (0)
I (37667) wifi:state: assoc -> run (10)

  File "board_config.py", line 146, in blit
  File "board_config.py", line 106, in send_data
  File "protocol.py", line 185, in recv
  File "protocol.py", line 92, in read_frame
OSError: [Errno 113] ECONNABORTED

So it looks like a wifi beacon timeout...

@wohltat
Copy link
Author

wohltat commented May 7, 2024

I took a closer look in wireshark and found something i didn't expect.
Although i'm only sending from the esp32 to the pc server via websockets, there are there are a lot of TCP packets that i don't know what they are doing.

The TLS packets with increasing size are probably the ones with the payload. As one expect they increase by 100 byte every time.
The other packets are:
54 byte -> Client (to 192.168.178.64)
1494 byte -> Server (to 192.168.178.58)
54 byte -> Client
1494 byte -> Server
54 byte -> Client
then the next TLS Packet

image

When the connection breaks there are those TCP packets missing. This causes OSError 113 (= ECONNABORTED).
image

As i understand it OSError is not from micropython but the Espressif IDF. Is this maybe some kind of connection-alive-checking mechanism that could be turned of?
When watching the websockets communication with a client running locally on my pc there are not those additional TCP-packets.
I also find them rather large with around 3kb "useless" data transfer for potentially just a single byte of payload.

@Carglglz
Copy link
Contributor

Carglglz commented May 8, 2024

As i understand it OSError is not from micropython but the Espressif IDF. Is this maybe some kind of connection-alive-checking mechanism that could be turned of?

I've tried to increase the beacon timeout but it only takes longer to throw the error after the socket becomes unresponsive, it seems like the whole network stack stops...I've just tested the solution at micropython/micropython#12819 but it still stops at some point...same with enabling debug/verbose mode, it stops too...well I think this may be important to read it carefully... https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-guides/wifi.html#wi-fi-buffer-usage

@Carglglz
Copy link
Contributor

Carglglz commented May 9, 2024

@wohltat my conclusion is that after micropython/micropython#13219 and micropython/micropython#12141 (see e.g. micropython/micropython#12141 (comment))
this has become more relevant

Why Buffer Configuration Is Important
In order to get a high-performance system, consider the memory usage/configuration carefully for the following reasons:

  • the available memory in ESP32 is limited.
  • currently, the default type of buffer in LwIP and Wi-Fi drivers is "dynamic", which means that both the LwIP and Wi-Fi share memory with the application. Programmers should always keep this in mind; otherwise, they will face a memory issue, such as "running out of heap memory".
  • it is very dangerous to run out of heap memory, as this will cause ESP32 an "undefined behavior". Thus, enough heap memory should be reserved for the application, so that it never runs out of it.
    the Wi-Fi throughput heavily depends on memory-related configurations, such as the TCP window size and Wi-Fi RX/TX dynamic buffer number.
  • the peak heap memory that the ESP32 LwIP/Wi-Fi may consume depends on a number of factors, such as the maximum TCP/UDP connections that the application may have.
  • the total memory that the application requires is also an important factor when considering memory configuration.

Due to these reasons, there is not a good-for-all application configuration. Rather, it is recommended to consider memory configurations separately for every different application.

see also
https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-guides/lwip.html#lwip-performance

possible solutions:

@wohltat
Copy link
Author

wohltat commented May 20, 2024

I've spend some time researching. So far i don't have a complete solution but some insights:

  • It helps to change some options to get some more RAM. This helped to send significant longer messages before the connection broke

    • reduce wifi static rx buffer, CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM=4
    • disabled ipv6 support
  • i also tried debug output (CONFIG_LWIP_DEBUG=y) and i found that in mbedtls there is an unexpected error:
    'DBG:/IDF/components/mbedtls/mbedtls/library/ssl_msg.c:2145: ssl->f_send() returned -26752 (-0x6880)'
    0x6880 SSL - Connection requires a write call
    (Post Request with SSL doesn’t work (mbedtls error) godotengine/godot#59216)
    Could the have something to do with missing backpressure as mentioned in python-websockets
    This would make sense since everytime the error occurs, the client send message N and the server just received message N-2 or less. So that would mean that the messages are not send and the send-buffer is filling up. But i'm not quite sure about that theory, what do you think?

  • i found compilation times very slow. Every time only one option is changed the whole idf has to be recompiled which takes like 3-4 minutes on my computer. Is there a way to accelerate this process?

  • I tried to remove bluetooth from the firmware to free up some more memory but i always get error messages? I could not find anything about how to correctly remove bt. I'm not quite sure but i think i remember that you said something about removing bt before.

@wohltat
Copy link
Author

wohltat commented May 20, 2024

I also tried compiling with IDF v5.2, which is supported according to the micropython documentation.
That turn out to not work at all so far.

  1. IDF v5.2 had a IRAM memory shortage. I tried to set some flags that were recommended to free IRAM space.
  2. after flashing i got MBEDTLS_ERR_MPI_ALLOC_FAILED when running my application. Then i followed the steps to optimize memory for Mbed TLS
  3. This didn't make it work either and i got the following error message:
    Guru Meditation Error: Core 1 panic'ed (StoreProhibited). Exception was unhandled.

@Carglglz
Copy link
Contributor

It helps to change some options to get some more RAM. This helped to send significant longer messages before the connection broke

Yes the problem is basically this:

  • MicroPython allocates memory --> at some point it needs more memory which takes from idf heap (this memory now cannot be given back to idf) --> idf tries to allocate wifi buffers but now there is no memory --> undefined behaviour i.e. errors or stale until network beacon timeout.

Solution is preallocate what you really need in MicroPython and make sure it does not allocate over a certain size leaving enough memory for idf heap/wifi buffers. Also be aware that slicing e.g. b[0:100*n] allocates memory so in your example this is going to lead to memory fragmentation and then the above mentioned "overallocation"

i found compilation times very slow. Every time only one option is changed the whole idf has to be recompiled which takes like 3-4 minutes on my computer. Is there a way to accelerate this process?

yes but it will depend on the number of core/threads of your computer use -j <N> option with the make command,

$ man make

 -j [jobs], --jobs[=jobs]
            Specifies the number of jobs (commands) to run simultaneously.   If  there  is  more
            than one -j option, the last one is effective.  If the -j option is given without an
            argument, make will not limit the number of jobs that can run simultaneously.

I also tried compiling with IDF v5.2, which is supported according to the micropython documentation.
That turn out to not work at all so far.

IDF v5.1.2 it's the latest I've tested which seems to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants