Memory leak in 1.6 with Lora32 RTL_433 Acurite 5 in 1 #1693
Comments
Did you have 1.5.1 before, or is it a new installation? |
1.5.1 did not have this problem. On Jun 20, 2023, at 6:57 PM, Florian ***@***.***> wrote:
Did you have 1.5.1 before, or is it a new installation?
If it is an update, was the behavior different with 1.5.1?
|
I stand corrected. v1.5.1 also goes offline and stops updating the rtl_433 topics, as well as the SYS: Uptime and SYS: FreeMemory topics |
Could you detail the RF devices that are being decoded? That could help spot an issue with a particular decoder. |
Normally receiving from 2 AmbientWeather F007TH sensors, an Acurite-5n1, and 2 Oregon-THGR810 sensors. After letting it run for a few more hours, I'm also getting signals from a number of nearby sensors, including Acurite-986, Acurite-609TXC, Acurite-Atlas, Interlogix-Security, Generic-Motion, Oregon-CM180i, Skylink_HA-434TL_motion, Acurite-606TX, and Springfield-Soil |
Hi, I've installed the latest firmware, 1.6.0, on my lilygo_rtl-433 and it freezes too, after five minutes (it reboots and loses WiFi). I don't know if it's a memory leak or the chacon/dio protocols. In fact, my chacon products are still not recognized. |
@rknobbe Can you try disabling the Home Assistant discovery feature in case it is triggering the issue ? https://docs.openmqttgateway.com/use/gateway.html#auto-discovery |
@rknobbe I wish that the discovery setting was the issue, as determining which signal is triggering the leak is going to be hard. Is it possible, through a process of elimination, to determine which signal is triggering the leak? |
The linux version of rtl_433 has commandline options to select and deselect which parsers to include. Does the OMG version have a similar feature? I haven't seen it in the docs. Otherwise I can selectively turn off some of my sensors. However there are way more stray signals (from neighbors' sensors) than intentional ones, and I can't disable those. |
By the way, I've set up this automation in HomeAssistant to detect low memory condition and restart the gateway automatically. It's holding up well now.
|
Unfortunately we don't have a feature to enable or disable particular parsers/decoders |
I really only care about the Acurite 5n1 sensor (and it's the only one of mine that I'd prefer not to disconnect since it's on my roof). Let me see if I can get the gateway and library to compile with just that sensor enabled. |
Take a look in the code base for the directive |
I see MY_DEVICES in environments.ini, but it's not clear how I specify which DEVICES I want to be compiled in rtl_433_esp (or how to compile rtl_433_esp in VSCode, for that matter). Regardless, I've enabled MY_DEVICES and reflashed with default_envs=lilygo-rtl_433. Will let you know how the memory looks in the morning. |
Sorry, I should have mentioned that MY_DEVICES is a directive for the rtl_433_ESP library. It is used in a couple of places in the library to allow testing of a subset of decoders. In the code you would need to specify which decoders. To use a custom version of rtl_433_ESP, do this:
1 - git clone the rtl_433_ESP library from GitHub
The symlink needs to point to where you cloned rtl_433_ESP
3 - Then in your environment switch
|
@NorthernMan54 - thanks for the hint on how to get rtl_433_ESP listening to only specific device types. This morning I recompiled v1.6.0 with only support for the Acurite decoders. I've been running most of the day, and the console plus MQTT-Explorer both give me confidence I'm only hearing my Acurite 5-n-1. However the memory trend doesn't look encouraging. |
I turned on MEMORY_DEBUG and watched for a while. I do eventually see what looks like a loss. Here is a capture of about 1800 lines. Look at the section that starts at line 683. |
Another trace with only Acurite 5n1 messages |
A temporary dip in heap is expected, as the OMG WebUI and Display cache messages for display. But the cache has a max depth, so it doesn't leak heap memory. I'm starting a long-running test with the latest build in an attempt to recreate this. I also have an Acurite device that uses the same decoder:
N: Send on /RTL_433toMQTT/Acurite-Tower/B/2043 msg {"model":"Acurite-Tower","id":2043,"channel":"B","battery_ok":1,"temperature_C":17.4,"humidity":77,"mic":"CHECKSUM","protocol":"Acurite 592TXR Temp/Humidity, 592TX Temp, 5n1 Weather Station, 6045 Lightning, 899 Rain, 3N1, Atlas","rssi":-69,"duration":120001} |
So weird! I rebuilt again last night, eliminating the stray Acurite 686's that are in the neighborhood. This build only has the 5n1 decoder and a WH51 that doesn't seem to get picked up. Same slope of memory loss. |
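For comparing builds by "slope of memory loss" with numbers rather than by eye, here is a minimal sketch that fits a least-squares line through (uptime, free heap) samples. The function names `heap_slope` and `seconds_to_exhaustion` are made up for illustration and are not part of OMG or rtl_433_ESP; the samples would come from wherever you log the SYS uptime and free memory values.

```c
#include <stddef.h>

/* Least-squares slope of free heap (bytes) against uptime (seconds).
   Hypothetical helper, not an OMG or rtl_433_ESP function.
   Returns bytes per second; a negative value means heap is shrinking.
   Returns 0.0 for fewer than two samples or when all uptimes are equal. */
static double heap_slope(const double *uptime_s, const double *free_bytes, size_t n) {
    if (n < 2) return 0.0;

    double sum_t = 0.0, sum_m = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum_t += uptime_s[i];
        sum_m += free_bytes[i];
    }
    double mean_t = sum_t / (double)n;
    double mean_m = sum_m / (double)n;

    double cov = 0.0, var = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dt = uptime_s[i] - mean_t;
        cov += dt * (free_bytes[i] - mean_m);
        var += dt * dt;
    }
    if (var == 0.0) return 0.0;
    return cov / var;
}

/* Seconds until free heap would reach zero at the given slope,
   or -1.0 when the heap is not shrinking. */
static double seconds_to_exhaustion(double current_free, double slope) {
    if (slope >= 0.0) return -1.0;
    return current_free / -slope;
}
```

Two builds that leak for the same reason should give roughly the same slope per received message, so dividing the slope by the message rate may be more telling than the raw bytes-per-second figure.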
Isn't the WH51 using FSK encoding? If so, you won't be able to receive it at the same time as you are receiving OOK signals. I was looking at your RTLCnt, and am wondering if you have significantly more messages coming in compared to my setup, and if this is triggering the leak. Doing some math
Does 3 a second make sense to you? |
RTLCnt is incremented for each signal received and passed to the signal decoder. I'm thinking we are getting closer to the cause of the leak; this high rate of signals may be causing a race condition. If you enable the directive |
I'll recompile and upload tonight. I'm encouraged |
I'm not seeing a huge volume of unparsed messages. Here are a few examples
|
I'm seeing a much slower pace of RTLCnt with this new build too, and I don't understand why. Maybe whatever was blasting me this afternoon has turned off. Here are the first 5 minutes or so from the WebUI console. You'll notice there is about a 2:1 ratio of unrecognized frames to Acurite 5n1 frames. After about 1000 seconds RTLCnt is up to about 245. |
I'm not sure what to make of the data I'm seeing. Not many "undecoded signal" messages in the terminal trace, but other than the count in MQTT_Explorer I don't know how to count them. After about 4000 seconds I have about 200 messages from my 5n1 weather station and RTLCnt is over 6000. I also captured the "pulses" value from the undecoded_signal topic, but I don't know if that will help you. |
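To put the counts quoted in this thread on the same footing as the "3 a second" question, a small sketch of the arithmetic. The helper names are hypothetical, and the inputs are the approximate figures from the comments above, not exact measurements.

```c
/* Average signals per second over an observation window.
   Hypothetical helper; returns 0.0 for an empty window. */
static double per_second(unsigned long count, unsigned long seconds) {
    return seconds == 0 ? 0.0 : (double)count / (double)seconds;
}

/* Share of `part` in `whole`, as a percentage; 0.0 when whole is zero. */
static double percent_of(unsigned long part, unsigned long whole) {
    return whole == 0 ? 0.0 : 100.0 * (double)part / (double)whole;
}

/* Approximate figures quoted in the thread:
     per_second(245, 1000)   about 0.245 signals/s (first capture)
     per_second(6000, 4000)  about 1.5 signals/s   (later capture)
     percent_of(200, 6000)   about 3.3% of counted signals were 5n1 messages */
```

For comparison, a sustained 3 signals per second would put RTLCnt near 12000 after 4000 seconds, so the later capture is running at roughly half that rate.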
In an attempt to minimize the number of variables and components involved with this, does it make sense to try a build of just the rtl_433_ESP example receiver ? To ensure that the leak is within rtl_433_ESP and not within OMG. With the OOK_Receiver example, you would need to set the appropriate compiler directives etc. I was also looking at the logic around memory usage, and each received signal allocates some heap for storing the signal for processing, passes it to the decoder logic, then the decoder logic frees it after processing. I did not see any obvious leaks. |
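The allocate, decode, free hand-off described above can be sketched as follows. This is not the rtl_433_ESP code: `signal_buf`, `signal_alloc`, `decode_and_release`, and the "8 pulses" rule are all invented for illustration. The point is only that the decoder side owns the buffer and must release it on every path, including signals it cannot decode.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter so leaks are visible without a heap profiler. */
static int live_buffers = 0;

/* Hypothetical stand-in for a captured signal. */
typedef struct {
    size_t num_pulses;
    int *pulse_us; /* pulse widths in microseconds */
} signal_buf;

/* Receiver side: copy the captured pulses into a heap buffer. */
static signal_buf *signal_alloc(const int *pulses, size_t n) {
    if (n == 0) return NULL;
    signal_buf *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->pulse_us = malloc(n * sizeof *s->pulse_us);
    if (!s->pulse_us) {
        free(s);
        return NULL;
    }
    memcpy(s->pulse_us, pulses, n * sizeof *s->pulse_us);
    s->num_pulses = n;
    live_buffers++;
    return s;
}

static void signal_free(signal_buf *s) {
    if (!s) return;
    free(s->pulse_us);
    free(s);
    live_buffers--;
}

/* Decoder side: takes ownership of `s` and frees it whether or not the
   signal decodes. The ">= 8 pulses" test is a placeholder for real decoding. */
static int decode_and_release(signal_buf *s) {
    int decoded = (s != NULL && s->num_pulses >= 8) ? 1 : 0;
    signal_free(s);
    return decoded;
}
```

If `live_buffers` stays at zero between signals under a scheme like this, the leak is more likely in something produced during decoding or output than in the signal buffer itself.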
Has there been any resolution to this? I have the same issue. |
Sorry for the bump. My neighbor has an Acurite 5-in-1 that is within range of half of my house. If I move my device (ESP32/CC1101, RTL_433 receiver running) within range of the 5-in-1, while still running, the memory leak starts. When I move the device out of range again, while still running, the memory leak stops. If a single RTL_433 decoder is responsible for the leak the Acurite 5-in-1 is a likely culprit. |
Thanks for this it helps, now the next step is to analyze the corresponding decoder in RTL_433 project to identify any memory leak |
I do not have the ability to move mine outside of the range, but I also have a neighbor with a 5-in-1 and am also experiencing this memory leak. |
I'll rebuild one of my gateways this weekend with the 5-in-1 removed and compare with one that has it enabled. |
@rknobbe An easy way to disable a device decoder is to add the disabled flag - FYI - https://github.com/NorthernMan54/rtl_433_ESP/blob/3fea1cf678212ea5ef70e38f625bc8505f73bb28/src/rtl_433/devices/mebus.c#L90 I spent some time yesterday doing a code review of the Acurite device decoder and nothing immediately jumped out at me to say this is it. But I'm still reviewing. |
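As a rough picture of what a disabled flag does, here is a simplified sketch. The `decoder_entry` struct, the table, and `collect_enabled` are stand-ins written for this example; the real decoder descriptor in rtl_433 has many more fields, and only the idea of a per-decoder `disabled` member is taken from the linked file.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for a decoder descriptor. */
typedef struct {
    const char *name;
    unsigned disabled; /* 0 = registered by default, nonzero = skipped */
} decoder_entry;

/* Illustrative table; the names are placeholders, not real decoder names. */
static const decoder_entry example_decoders[] = {
    { .name = "decoder_a", .disabled = 0 },
    { .name = "decoder_b", .disabled = 1 }, /* switched off for the test build */
    { .name = "decoder_c", .disabled = 0 },
};

/* Copy pointers to the enabled entries into `out` (which must have room
   for n pointers) and return how many were kept. */
static size_t collect_enabled(const decoder_entry *all, size_t n,
                              const decoder_entry **out) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (all[i].disabled == 0) {
            out[kept++] = &all[i];
        }
    }
    return kept;
}
```

Flipping one flag per build and comparing memory trends is one way to run the process of elimination discussed earlier, at the cost of a reflash per candidate decoder.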
I have encountered this as well. We thought that the radio device was bad (flashed with lilygo-rtl_433), but when I swapped it out with a known stable device the same crash was encountered. |
I'm trying to use a JTAG debugger to examine this problem further, but I'm having trouble following these Espressif instructions for host-based heap tracing. I can't find how to set the configuration options in the first few steps:
Is there a guide available for how to use the esp-idf menuconfig within platformio? |
@ianmtaylor1 Sorry I have no experience with the JTAG Debugger, I just use print statements in code ( I know that's primitive, but it works most times. ) With the Acurite 5 in 1, I do have 5 n 1 devices in my setup, and I do not have a memory leak in my setup. |
I think the question here is how to use menuconfig with the Arduino framework. This may be helpful |
Hello, define DEVICES \
I also run it in combination with BT, but the issue is still there with BT disabled. |
I have a neighbor with a lot of interlogix devices that show up too. I think I've seen one or two Oregon scientific devices. |
@kjetilsn Any thoughts on what within the Oregon Scientific device decoder is triggering this ? |
@NorthernMan54 |
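Identifying memory leaks can sometimes be tricky. My approach is to first review the code for anything that looks suspicious, then go from there. The Oregon decoder is a bit complex, so identifying where to look first is advantageous, but your two devices appear to use the function oregon_scientific_v3_decode, so starting there makes sense: https://github.com/NorthernMan54/rtl_433_ESP/blob/3fea1cf678212ea5ef70e38f625bc8505f73bb28/src/rtl_433/devices/oregon_scientific.c#L598 Looking at the function, this line looks like a possibility: unsigned char msg[EXPECTED_NUM_BYTES] = {0}; https://github.com/NorthernMan54/rtl_433_ESP/blob/3fea1cf678212ea5ef70e38f625bc8505f73bb28/src/rtl_433/devices/oregon_scientific.c#L612 It is allocating 44 bytes on every invocation, and I'm not sure whether it is cleaned up. You could try moving it outside the function, so it only gets invoked once, i.e. before the function starts. |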
I'm guessing valgrind won't work in this environment?
…On Wed, Dec 13, 2023, 5:10 PM Northern Man ***@***.***> wrote:
Identifying memory leaks sometimes can be tricky, my approach is to first
review the code for anything that looks suspicious, then go from there.
The Oregon decoder is a bit complex, so identifying where to look first is
advantageous, but your two devices appear to use this function
oregon_scientific_v3_decode so starting there makes sense
https://github.com/NorthernMan54/rtl_433_ESP/blob/3fea1cf678212ea5ef70e38f625bc8505f73bb28/src/rtl_433/devices/oregon_scientific.c#L598
Looking at the function, this line looks like a possibility unsigned char
msg[EXPECTED_NUM_BYTES] = {0};
https://github.com/NorthernMan54/rtl_433_ESP/blob/3fea1cf678212ea5ef70e38f625bc8505f73bb28/src/rtl_433/devices/oregon_scientific.c#L612
It is allocating 44 Bytes every invocation. And I'm not sure if it is
cleaned up or not
You could try just moving it outside the function, so it only gets invoked
once. ie before the function starts
|
I've found a memory leak, but I don't know for sure if it's the only memory leak. At least, my problems appear to have gone away. The leak is here, in the function responsible for converting units of received signals. While converting inches to millimeters, we get the following section:

else if ((d->type == DATA_DOUBLE) &&
         (str_endswith(d->key, "_in") || str_endswith(d->key, "_inch"))) {
    d->value.v_dbl = inch2mm(d->value.v_dbl);
    char* new_label =
        str_replace(str_replace(d->key, "_inch", "_in"), "_in", "_mm");
    free(d->key);
    d->key = new_label;
    char* new_format_label = str_replace(d->format, "in", "mm");
    free(d->format);
    d->format = new_format_label;
}

The function Changing this line appears to have fixed the leak, at least for me.

When I enabled MEMORY_DEBUG in rtl_433_ESP, I noticed the leak was only happening on type 49 messages (wind speed, wind direction, rainfall) and not type 56 (wind speed, temperature, humidity) messages. This explains why it was so hard to pin down a specific device, because it was happening during decoding/output but not in any specific decoder.

@kjetilsn does your Oregon Scientific device measure rainfall? @NorthernMan54 are you converting units to SI or no? If not, it could explain why you aren't experiencing this with your 5-in-1.

Like I said, I don't know if this is the only leak, but this is definitely a leak. I'm happy to submit a pull request. |
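For readers wondering how a single line like the nested str_replace call can leak, here is a self-contained sketch. It assumes that str_replace returns a newly allocated string that the caller must free, which the free(d->key) in the quoted code suggests. `replace_all` below is a stand-in written for this example, not the rtl_433 implementation, and the counting allocator exists only to make the leak observable. The "fixed" variant shows one possible shape of a fix, not necessarily the change that was merged.

```c
#include <stdlib.h>
#include <string.h>

/* Count live allocations so a leak shows up without a heap profiler. */
static int live_allocs = 0;

static void *counted_malloc(size_t n) {
    void *p = malloc(n);
    if (p) live_allocs++;
    return p;
}

static void counted_free(void *p) {
    if (p) {
        live_allocs--;
        free(p);
    }
}

/* Hypothetical stand-in for a str_replace-style helper: returns a NEW heap
   string with every `from` replaced by `to`. The caller owns the result.
   Returns NULL on allocation failure or if `from` is empty. */
static char *replace_all(const char *s, const char *from, const char *to) {
    size_t from_len = strlen(from), to_len = strlen(to);
    if (from_len == 0) return NULL;

    size_t count = 0;
    for (const char *p = s; (p = strstr(p, from)) != NULL; p += from_len)
        count++;

    size_t out_len = strlen(s) + count * to_len - count * from_len;
    char *out = counted_malloc(out_len + 1);
    if (!out) return NULL;

    char *w = out;
    const char *p = s, *hit;
    while ((hit = strstr(p, from)) != NULL) {
        size_t chunk = (size_t)(hit - p);
        memcpy(w, p, chunk);
        w += chunk;
        memcpy(w, to, to_len);
        w += to_len;
        p = hit + from_len;
    }
    strcpy(w, p); /* copy the tail, including the terminator */
    return out;
}

/* Leaky shape: the inner result has no name, so nothing can free it. */
static char *to_mm_key_leaky(const char *key) {
    return replace_all(replace_all(key, "_inch", "_in"), "_in", "_mm");
}

/* One possible fix: keep the intermediate in a variable and free it. */
static char *to_mm_key_fixed(const char *key) {
    char *tmp = replace_all(key, "_inch", "_in");
    if (!tmp) return NULL;
    char *out = replace_all(tmp, "_in", "_mm");
    counted_free(tmp);
    return out;
}
```

Both variants return the same key, so the output looks correct either way. The leaky one simply loses one small string per converted field, which fits a slow decline that only appears on messages carrying an inch-based value such as rainfall.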
With the library, I thought I had hard-coded it into metric mode, which is why everyone is seeing the issue. As for my devices, I have another sensor that leverages the same Acurite device decoders, but it does not measure rainfall, so that is why I never noticed this. I will get this released into rtl_433_ESP over the next few days. It will also need an update on the OMG side, so stay tuned. |
Thank you @NorthernMan54 for the advice on debugging. I did not get around to testing it, though. |
FYI - The same fix was implemented in rtl_433, well done @ianmtaylor1 |
Describe the bug
Memory slowly decreasing over the course of the day. Eventually becomes unresponsive.
To Reproduce
Installed lilygo_rtl-433 from the web installer. Configured to connect to my MQTT server. Plotting memory and uptime in HomeAssistant.
Expected behavior
Stable memory usage.
Screenshots
See attached