-
-
Notifications
You must be signed in to change notification settings - Fork 512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MessageOutput: thread-safety and dynamic memory #1212
Conversation
Did you make a longer check regarding the memory fragmentation? Keep in mind, we only have ~110kb of memory available from which a huge portion will be required when rendering the prometheus output and the livedata json output. Memory fragmentation was the main reason that no dynamic memory is used. |
I really don't expect this to be a problem even on our memory-constrained ESP32 devices. The buffers aren't that big (lines of log output), so they can occupy gaps in the memory layout, and more importantly, they are freed as soon as possible and they are indeed all freed routinely as long as the main loop is running. Moreover, I took care that each buffer is not copied uneccesarily (it is moved to I hate it but have to admit that your point is of course totally valid. This change must not break existing functionality. Let me point out though, that being unable to have "nice things" like this working solution to the problem of mangled logs because some other component insists on allocating an insanely huge block of consecutive memory (relative to the whole memory) is frustrating. In the long run, data should be processed in reasonable chunks instead. I hope that's achievable. Since I do want prometheus integration myself in the near future, I might look at this myself. Having said that, I ran a test like this:
I let this run for one and a half hour and observed no Also, this change is running for ~20 hours on my productive board, with no signs of issues. However, that board runs OpenDTU-OnBattery, so it's not perfectly comparable. Still, I did not cause any issues by querying the prometheus API on that node. What do you think? Was 1 hour long enough (with the intervals so short)? Have I missed something that I should have run in parallel? Is this setup worth documenting for future stress tests? |
The newly added commit fixes a problem that existed before, but has been made worse by the thread-safety proposed with this pull request. Also before, very long lines that were printed through MessageOutput would be chopped off at around 256 Bytes (a guess, I did not check) because of a limited buffer in HardwareSerial (or the hardware, I don't know for sure). However, nobody probably observed this problem since usually no single call to print something would result in a line longer than that limit. Multiple print statements give the HardwareSerial and the actual hardware enough time to make room for the next characters. |
b1aacad
to
a044e83
Compare
I noticed again that the web console is missing messages in logs provided by a user of the OpenDTU-OnBattery fork, see the last paragraph in hoylabs#393 (comment). That's not your problem until you notice missing messages in web console output of your project's users as well, I understand. However, before I create a pull request against said fork, hoping that @helgeerbe would be willing to merge it, I wanted to ask you @tbnobody if I can do anything else to convince you that this proposed change is a valuable addition? |
I tested the PR today. The memory usage was enormous. At some times it consumed 20kb more RAM then without this PR. The response of the system info api was also a little bit laggy. The flash consumption was at about 50kb if I remember correctly. |
Thanks for considering and testing the change! Would you mind elaborating on how you benchmarked the memory consumption, so I can try to optimize this change in this regard? What's the URL of the system info API you said was lagging ( What device are you testing against? Mine is an ESP32-S3 with two cores, maybe I need to use something else. Are we considering single-core ESP32? +50k flash consumption really surprises me. I did not check it, to be honest. Maybe it's simple to fix by using different containers. I will have a look at this as well. |
I am using the normal esp32 without any extension. So it's dual core. And I've just doublechecked the flash consumption.... It's 1530937 vs 1535157 so ~4,2kb (instead of the 50kb mentioned above) |
SetupI have an ESP32-WROOM-32U that I used for testing in more detail to address your concers. I build for the This is how I inspected the heap usage:
This is how I tested the latency in the system info page:
This neglects any delays and any lagging that is caused by the client when executing the web interface. This especially includes checking for a new firmware version online, which can take more or less time depending on various factors. Flash consumptionThanks for checking the flash usage again. It's the same for me. Are you okay with spending ~4.2k of flash on this improvement? We can carve out ~1.7k if we switched to a Interesting problemI noticed that the web console output and the serial output contains less lines per time while doing any of the two tests and the inverter is not polled successfully while the tests are running. This probably means that processing the HTTP response of ResultsHeap usageHeap usage when the change is not included is between 124120 and 132708 Bytes. By chance, I managed to capture a moment where the firmware without the change used more heap memory as with the change included. It seemed very odd to me that this feature would cause 20k of heap memory being hogged at all and especially for a long enough period that one would actually see it in the system info page. I understand that memory is a scarce resource and that you have been burned before, but realistically, this change will only use memory in the region of a few kilobytes of memory at the very worst. That could happen if a lot of software components with their own LatencyWithout the change:
With the change:
I conclude that your observation regarding the lagging in the system info page might have something to do with the client-side processing of the page in the browser. It seems that this change has no influence on the time it takes to prepare the information required to render the system info page. That being said, preparing the JSON response for that page might be in need of improvement. 444ms of exclusive processing time is quite a lot. It's probably that much because coming the memory to calculate the heap usage is quite intense. Next StepsWhat else should be look at to make sure this change does not cause issues elsewhere? |
these adjustments guarantee unmangled log messages and more importantly, guarantee that all messages from a particular component are printed to the web console, which most people use to copy messages from when reporting issues. * use dynamic memory to allow handling of arbitrary message lenghts. * keep a message buffer for every task so no task ever mangles the message of another task. * every complete line is written to the serial console and moved to a line buffer for sending them through the websocket. * the websocket is always fed complete lines. * make sure to feed only as many lines as possible to the websocket handler, so that no lines are dropped. * lock all MessageOutput state against concurrent access. * respect HardwareSerial buffer size: the MessageOutput class buffers whole lines of output printed by any task in order to avoid mangling of text. that means we hand over full lines to the HardwareSerial instance, which might be too much in one call to write(buffer, size). we now check the return value of write(buffer, size) and call the function again with the part of the message that could not yet be written by HardwareSerial.
a044e83
to
601b34e
Compare
I would like to maintain that this would be a useful update to the MessageOutput class. The change is shipping for months now in the downstream project OpenDTU-OnBattery without issues. Today I re-introduced this change to the downstream project in hoylabs#567 as its owner reverted the change when merging from this upstream project recently (TaskScheduler introduction). For that reason I am coming back here and updated the change to be compatible with TaskScheduler. |
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns. |
Very early on I found issues with the serial output of OpenDTU, i.e., garbled messages, where one message would be interrupted with another message. Also, some output seemed to be missing. There was at least discussion about this, see #1130. Very recently, a change was implemented in 43cba67, which is a big step forward, but does not solve all issues:
Print.h
and implemented inPrint.cpp
, especiallyprintln(String&)
, still are "vurnerable" to scheduling. Example: Scheduling can happen after printing the contents of a line, but before printing the\r\n
.println()
, but are stitched together with multipleprint()
s. Those can still be garbled beyond recognition due to scheduling.MessageOutput::write(uint8_t)
because the lock is not protecting the buffer position check.My proposal with this MR is as follows: