Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No proper check of memory allocation when using DynamicJsonDocument results in sometimes temporary unexpected behavior during runtime #1615

Closed
helgeerbe opened this issue Jan 4, 2024 · 24 comments
Labels
bug Something isn't working

Comments

@helgeerbe
Copy link
Contributor

helgeerbe commented Jan 4, 2024

What happened?

I observed this error on openDTU-onBattery, but it should be fixed in the upstream project. So I open the issue here.

DynamicJsonDocumentis used to create datasets for the Web-API, MQTT messaging and to backup config data.

Problem is, that DynamicJsonDocument doesn't throw an exception, if memory allocation fails. This results in strange and unexpected (sometimes temporary) behavior of openDTU during runtime.

  • Liveview api returns null
  • Mqtt doesn't send messages
  • Saved config files are corrupted

Since this error is not handled, there is no logging of this resource problem.

To Reproduce Bug

Extend INV_MAX_COUNTsignificantly and you will never see live data, nor you will get any logging.

The try block std::bad_alloc catches errors from serializeJason(), when there is not enough memory for DynamicJsonDocument but not enough memory for serialization.

Expected Behavior

At least I would expect some logging for live view and mqtt that there is not enough memory.
Config files should never be saved when they are corrupted.

Install Method

Self-Compiled

What git-hash/version of OpenDTU?

7946dfc

Relevant log/trace output

No response

Anything else?

After each call of DynamicJsonDocument()there should be a test with DynamicJsonDocument::capacity != 0.

During my tests I used:

  • ESP.getFreeHeap()
  • ESP.getMaxAllocHeap()

While both functions tells me that there is enough memory DynamicJsonDocument::capacity() returns 0. This may be due to my particular Esp32, as I suspect that the flash memory is no longer working properly. I have to desolder it and replace it with a new Esp32.

@helgeerbe helgeerbe added the bug Something isn't working label Jan 4, 2024
@helgeerbe helgeerbe changed the title No proper check of memory allocation when using DynamicJasonDocument results in sometimes temporyry behavior during runtime No proper check of memory allocation when using DynamicJaonDocument results in sometimes temporyry behavior during runtime Jan 4, 2024
@helgeerbe helgeerbe changed the title No proper check of memory allocation when using DynamicJaonDocument results in sometimes temporyry behavior during runtime No proper check of memory allocation when using DynamicJsonDocument results in sometimes temporary unexpected behavior during runtime Jan 4, 2024
@tbnobody
Copy link
Owner

tbnobody commented Jan 4, 2024

Thanks for the hint!

This may be due to my particular Esp32, as I suspect that the flash memory is no longer working properly.

I don't think so. The objects etc. are allocated in the RAM not in the flash memory. But I could imagine that there is not enough free memory in one piece. (Keyword memory fragmentation)

Extend INV_MAX_COUNTsignificantly and you will never see live data, nor you will get any logging.

Will implement some logging, but in the end, it seems that in a long term, the live data response has to be split into different requests or websocket responses (e.g. just send one inverter).

Regarding Configuration handling, I have already implemented the check. Will be released soon.

@helgeerbe
Copy link
Contributor Author

Another observation:

void WebApiWsLiveClass::loop()
{
    .
    .
    .


    // Update on every inverter change or at least after 10 seconds
    if (millis() - _lastWsPublish > (10 * 1000) || (maxTimeStamp != _newestInverterTimestamp)) {

        try {
            std::lock_guard<std::mutex> lock(_mutex);
            MessageOutput.printf("Calling /api/livedata/status FreeHeap = %d, MaxAllocHeap = %d, MinFreeHeap = %d.\r\n", ESP.getFreeHeap(), ESP.getMaxAllocHeap(), ESP.getMinFreeHeap());
            DynamicJsonDocument root(4200 * INV_MAX_COUNT); // TODO(helge) check if this calculation is correct
            JsonVariant var = root;

            MessageOutput.printf("Calling /api/livedata/status jsonDocument capacity = %d.\r\n", root.capacity());
            generateJsonResponse(var);
            MessageOutput.printf("Calling /api/livedata/status JsonDocumentSize = %d (allocated %d).\r\n", root.memoryUsage(), 4200 * INV_MAX_COUNT);
                
            String buffer;
            if (buffer) {
                serializeJson(root, buffer);

                if (Configuration.get().Security.AllowReadonly) {
                    _ws.setAuthentication("", "");
                } else {
                    _ws.setAuthentication(AUTH_USERNAME, Configuration.get().Security.Password);
                }

                _ws.textAll(buffer);
            }

        } catch (const std::bad_alloc& bad_alloc) {
            MessageOutput.printf("Calling /api/livedata/status has temporarily run out of resources. Reason: \"%s\".\r\n", bad_alloc.what());
        } catch (const std::exception& exc) {
            MessageOutput.printf("Unknown exception in /api/livedata/status. Reason: \"%s\".\r\n", exc.what());
        }

        _lastWsPublish = millis();
    }
}

Shows in the log:

13:27:07.110 > Websocket: [/livedata][1] connect
13:27:07.167 > Websocket: [/vedirectlivedata][1] connect
13:27:07.179 > Calling /api/livedata/status FreeHeap = 98200, MaxAllocHeap = 57332, MinFreeHeap = 60268.
13:27:07.188 > Calling /api/livedata/status jsonDocument capacity = 21000.
13:27:07.201 > Calling /api/livedata/status JsonDocumentSize = 4275 (allocated 21000).

Now I enable TLS in MQTT

13:34:59.572 > Websocket: [/livedata][22] connect
13:34:59.591 > Calling /api/livedata/status FreeHeap = 55276, MaxAllocHeap = 32756, MinFreeHeap = 34796.
13:34:59.616 > Calling /api/livedata/status jsonDocument capacity = 0.
13:34:59.643 > Calling /api/livedata/status JsonDocumentSize = 0 (allocated 21000).

@tbnobody
Copy link
Owner

tbnobody commented Jan 4, 2024

Yes, it's clear that TLS requries a lot of memory... As mentioned before, on the long term the repsonse has to be changed.

@helgeerbe
Copy link
Contributor Author

I know, that there is heap fragmentation. I dump MaxAllocHeap which shows the largest memory block, which could be allocated. The size of the JsonDocument is significant smaller than that block. But I agree, logging some warnings would be good and then split the live data in pieces, or stream it in smaller chunks over the websocket.

@helgeerbe
Copy link
Contributor Author

This may be due to my particular Esp32, as I suspect that the flash memory is no longer working properly.

I don't think so. The objects etc. are allocated in the RAM not in the flash memory. But I could imagine that there is not enough free memory in one piece. (Keyword memory fragmentation)

Sorry my fault, I talked about the RAM, too. My Esp32 shows some strange hardware behavior, so I believe it is going to die soon.

@tbnobody
Copy link
Owner

tbnobody commented Jan 21, 2024

Made several experiments this evening to reduce the heap usage on the /api/livedata/status endpoint as well as the /livedata websocket.
Reduced the memory usage from 4096 * INV_MAX_COUNT to 4096.

The api/livedata/status endpoint just contains the following:

{
    "inverters": [
        {
            "serial": "116480148266",
            "name": "HMS-2000-01",
            "order": 0,
            "data_age": 405,
            "poll_enabled": true,
            "reachable": false,
            "producing": false,
            "limit_relative": 0,
            "limit_absolute": -1
        },
        {
            "serial": "116480145267",
            "name": "teste 1",
            "order": 1,
            "data_age": 405,
            "poll_enabled": false,
            "reachable": false,
            "producing": false,
            "limit_relative": 0,
            "limit_absolute": -1
        },
        {
            "serial": "116480145268",
            "name": "test 2",
            "order": 2,
            "data_age": 405,
            "poll_enabled": false,
            "reachable": false,
            "producing": false,
            "limit_relative": 0,
            "limit_absolute": -1
        }
    ],
    "total": {
        "Power": {
            "v": 0,
            "u": "W",
            "d": 1
        },
        "YieldDay": {
            "v": 0,
            "u": "Wh",
            "d": 0
        },
        "YieldTotal": {
            "v": 0,
            "u": "kWh",
            "d": 3
        }
    },
    "hints": {
        "time_sync": false,
        "radio_problem": false,
        "default_password": true
    }
}

This is enough to generate the basic structure of the live data page. The real data are coming from the websocket. Not at once but each inverter after another
e.g.

{
  "inverters": [
    {
      "serial": "116480148266",
      "name": "HMS-2000-01",
      "order": 0,
      "data_age": 550,
      "poll_enabled": true,
      "reachable": false,
      "producing": false,
      "limit_relative": 0,
      "limit_absolute": -1,
      "AC": {
        "0": {
          "Power": {
            "v": 0,
            "u": "W",
            "d": 1
          },
          "Voltage": {
            "v": 0,
            "u": "V",
            "d": 1
          },
          "Current": {
            "v": 0,
            "u": "A",
            "d": 2
          },
          "Power DC": {
            "v": 0,
            "u": "W",
            "d": 1
          },
          "YieldDay": {
            "v": 0,
            "u": "Wh",
            "d": 0
          },
          "YieldTotal": {
            "v": 0,
            "u": "kWh",
            "d": 3
          },
          "Frequency": {
            "v": 0,
            "u": "Hz",
            "d": 2
          },
          "PowerFactor": {
            "v": 0,
            "u": "",
            "d": 3
          },
          "ReactivePower": {
            "v": 0,
            "u": "var",
            "d": 1
          },
          "Efficiency": {
            "v": 0,
            "u": "%",
            "d": 3
          }
        }
      },
      "DC": {
        "0": {
          "name": {
            "u": "Panel 1"
          },
          "Power": {
            "v": 0,
            "u": "W",
            "d": 1
          },
          "Voltage": {
            "v": 0,
            "u": "V",
            "d": 1
          },
          "Current": {
            "v": 0,
            "u": "A",
            "d": 2
          },
          "YieldDay": {
            "v": 0,
            "u": "Wh",
            "d": 0
          },
          "YieldTotal": {
            "v": 0,
            "u": "kWh",
            "d": 3
          },
          "Irradiation": {
            "v": 0,
            "u": "%",
            "d": 3,
            "max": 400
          }
        },
        "1": {
          "name": {
            "u": ""
          },
          "Power": {
            "v": 0,
            "u": "W",
            "d": 1
          },
          "Voltage": {
            "v": 0,
            "u": "V",
            "d": 1
          },
          "Current": {
            "v": 0,
            "u": "A",
            "d": 2
          },
          "YieldDay": {
            "v": 0,
            "u": "Wh",
            "d": 0
          },
          "YieldTotal": {
            "v": 0,
            "u": "kWh",
            "d": 3
          }
        },
        "2": {
          "name": {
            "u": ""
          },
          "Power": {
            "v": 0,
            "u": "W",
            "d": 1
          },
          "Voltage": {
            "v": 0,
            "u": "V",
            "d": 1
          },
          "Current": {
            "v": 0,
            "u": "A",
            "d": 2
          },
          "YieldDay": {
            "v": 0,
            "u": "Wh",
            "d": 0
          },
          "YieldTotal": {
            "v": 0,
            "u": "kWh",
            "d": 3
          }
        },
        "3": {
          "name": {
            "u": ""
          },
          "Power": {
            "v": 0,
            "u": "W",
            "d": 1
          },
          "Voltage": {
            "v": 0,
            "u": "V",
            "d": 1
          },
          "Current": {
            "v": 0,
            "u": "A",
            "d": 2
          },
          "YieldDay": {
            "v": 0,
            "u": "Wh",
            "d": 0
          },
          "YieldTotal": {
            "v": 0,
            "u": "kWh",
            "d": 3
          }
        }
      },
      "INV": {
        "0": {
          "Temperature": {
            "v": 0,
            "u": "°C",
            "d": 1
          }
        }
      },
      "events": 0
    }
  ],
  "total": {
    "Power": {
      "v": 0,
      "u": "W",
      "d": 1
    },
    "YieldDay": {
      "v": 0,
      "u": "Wh",
      "d": 0
    },
    "YieldTotal": {
      "v": 0,
      "u": "kWh",
      "d": 3
    }
  },
  "hints": {
    "time_sync": false,
    "radio_problem": false,
    "default_password": true
  }
}

The different websocket events are merged in the WebUI. I will also extend the /api/livedata/status endpoint with a GET parameter to fetch data for a specific inverter.

e.g. /api/livedata/status?inv=116480148266

{
    "inverters": [
        {
            "serial": "116480148266",
            "name": "HMS-2000-01",
            "order": 0,
            "data_age": 15,
            "poll_enabled": true,
            "reachable": true,
            "producing": false,
            "limit_relative": 0,
            "limit_absolute": -1,
            "AC": {
                "0": {
                    "Power": {
                        "v": 0,
                        "u": "W",
                        "d": 1
                    },
                    "Voltage": {
                        "v": 0,
                        "u": "V",
                        "d": 1
                    },
                    "Current": {
                        "v": 0,
                        "u": "A",
                        "d": 2
                    },
                    "Power DC": {
                        "v": 0,
                        "u": "W",
                        "d": 1
                    },
                    "YieldDay": {
                        "v": 0,
                        "u": "Wh",
                        "d": 0
                    },
                    "YieldTotal": {
                        "v": 0,
                        "u": "kWh",
                        "d": 3
                    },
                    "Frequency": {
                        "v": 0,
                        "u": "Hz",
                        "d": 2
                    },
                    "PowerFactor": {
                        "v": 0,
                        "u": "",
                        "d": 3
                    },
                    "ReactivePower": {
                        "v": 0,
                        "u": "var",
                        "d": 1
                    },
                    "Efficiency": {
                        "v": 0,
                        "u": "%",
                        "d": 3
                    }
                }
            },
            "DC": {
                "0": {
                    "name": {
                        "u": "Panel 1"
                    },
                    "Power": {
                        "v": 0,
                        "u": "W",
                        "d": 1
                    },
                    "Voltage": {
                        "v": 0,
                        "u": "V",
                        "d": 1
                    },
                    "Current": {
                        "v": 0,
                        "u": "A",
                        "d": 2
                    },
                    "YieldDay": {
                        "v": 0,
                        "u": "Wh",
                        "d": 0
                    },
                    "YieldTotal": {
                        "v": 0,
                        "u": "kWh",
                        "d": 3
                    },
                    "Irradiation": {
                        "v": 0,
                        "u": "%",
                        "d": 3,
                        "max": 400
                    }
                },
                "1": {
                    "name": {
                        "u": ""
                    },
                    "Power": {
                        "v": 0,
                        "u": "W",
                        "d": 1
                    },
                    "Voltage": {
                        "v": 0,
                        "u": "V",
                        "d": 1
                    },
                    "Current": {
                        "v": 0,
                        "u": "A",
                        "d": 2
                    },
                    "YieldDay": {
                        "v": 0,
                        "u": "Wh",
                        "d": 0
                    },
                    "YieldTotal": {
                        "v": 0,
                        "u": "kWh",
                        "d": 3
                    }
                },
                "2": {
                    "name": {
                        "u": ""
                    },
                    "Power": {
                        "v": 0,
                        "u": "W",
                        "d": 1
                    },
                    "Voltage": {
                        "v": 0,
                        "u": "V",
                        "d": 1
                    },
                    "Current": {
                        "v": 0,
                        "u": "A",
                        "d": 2
                    },
                    "YieldDay": {
                        "v": 0,
                        "u": "Wh",
                        "d": 0
                    },
                    "YieldTotal": {
                        "v": 0,
                        "u": "kWh",
                        "d": 3
                    }
                },
                "3": {
                    "name": {
                        "u": ""
                    },
                    "Power": {
                        "v": 0,
                        "u": "W",
                        "d": 1
                    },
                    "Voltage": {
                        "v": 0,
                        "u": "V",
                        "d": 1
                    },
                    "Current": {
                        "v": 0,
                        "u": "A",
                        "d": 2
                    },
                    "YieldDay": {
                        "v": 0,
                        "u": "Wh",
                        "d": 0
                    },
                    "YieldTotal": {
                        "v": 0,
                        "u": "kWh",
                        "d": 3
                    }
                }
            },
            "INV": {
                "0": {
                    "Temperature": {
                        "v": 0,
                        "u": "°C",
                        "d": 1
                    }
                }
            },
            "events": 0
        }
    ],
    "total": {
        "Power": {
            "v": 0,
            "u": "W",
            "d": 1
        },
        "YieldDay": {
            "v": 0,
            "u": "Wh",
            "d": 0
        },
        "YieldTotal": {
            "v": 0,
            "u": "kWh",
            "d": 3
        }
    },
    "hints": {
        "time_sync": false,
        "radio_problem": false,
        "default_password": true
    }
}

To get a overview list of all inverters just call /api/livedata/status and to get details for the inverter call /api/livedata/status?inv=<serial>

@MetaChuh
Copy link

note to other testers: version v24.1.21 doesn't have /api/livedata/status?inv= implemented yet, and will still output the old /api/livedata/status instead of a single inverter.

@MetaChuh
Copy link

@tbnobody
is there already any public binary where i can test the new /api/livedata/status?inv=<inverter_id> ?
version v24.1.26 still shows full inverter data, same as /api/livedata/status, regardless of if ?inv= is requested

thx and greetings

@tbnobody
Copy link
Owner

Not yet.... Most likley I will include it in the next version. I'm anxious as it will break the setup for users who are using the web api (regardless the fact that I wrote it on multiple locations that the API can change). But I think the advantage of a more stable system is worth it.

@tbnobody
Copy link
Owner

If you like to test it....

opendtu-generic_esp32.zip

@MetaChuh
Copy link

thx 🙏
first test of v24.1.26-3 looks good on the api side 👍
i really like that the /api/livedata/status overview still contains e.g. reachable, producing, data_age and limit_relative.
so i don't even have to poll the corresponding inverter, if it's either not producing/reachable, or if i just need the current limit, or data age. 👍

public note: web live view does not show inverters yet, but it's imho irrelevant for first api testing.

I'm anxious as it will break the setup for users who are using the web api

hopefully a little anxiety relief:
this almost same api change was not much of a problem at the ahoy project, when /api/record/live was replaced with /api/inverter/list and /api/inverter/id/<inverter_id>
most probable reason: the legacy endpoints were still available for over a year in parallel, until they finally were removed.

additional thought: in this current case, it's a bit more tricky, as the endpoint name here is still called status.
the only thing i can immediately come up with, if you'd like to run legacy status in parallel for a while, is to make it configurable als legacy [on/off] for full legacy, or new short output ...

but thats imho more work than it would be, to use another parallel endpoint name for the list, e.g. /api/livedata/inverter and /api/livedata/inverter?inv=<serial>

thx and whatever it will be, it's fine with me 👍

@tbnobody
Copy link
Owner

Sorry I included the wrong version of the WebApp. (You maybe have to manually clear the browser cache or reload with CTRL+F5 now as the git hash is not updated in this test version.

opendtu-generic_esp32.zip

@Sn0w3y
Copy link

Sn0w3y commented Jan 28, 2024

Am already testing it with openEMS now !
Had the Problem right now, that the old Version of the API forced the Device to Reboot or Crash sometimes..

Greetings

@MetaChuh
Copy link

usage info regarding the next api and webapp changes:
the next gen opendtu webapp uses websocket, instead of api calls, to display inverter values.

in case you have an opendtu behind a http reverse proxy, e.g. for data protection,
you need to set following custom passthrough headers at your proxy (nginx example):

proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;

you can test out, whether your proxy or forwarding supports websocket with following curl command:

curl --include --no-buffer --http1.1 --header "Connection: Upgrade" --header "Upgrade: websocket" --header "Sec-WebSocket-Key: randomkey" --header "Sec-WebSocket-Version: 13" https://user:password@yoururl:yourport/livedata

same with line breaks for easier reading:

curl --include --no-buffer --http1.1 --header "Connection: Upgrade" --header "Upgrade: websocket" \
--header "Sec-WebSocket-Key: randomkey" --header "Sec-WebSocket-Version: 13" \
https://user:password@yoururl:yourport/livedata

(replace user, password, yoururl and yourport, with your corresponding data)

this will make a call to ws:// or encrypted to wss://user:password@yoururl:yourport/livedata and dump the response stream to your terminal, until you press CTRL+C

sideinfo: if using https you have to request explicit --http1.1 as the http2 default will bypass the $http_upgrade variable

@Sn0w3y
Copy link

Sn0w3y commented Jan 30, 2024

removing the HTTP API is not a straight forward process and will result in a non functionable Module in openEMS :(
we only need the AC Data (Voltage, Power, Current) i please you to remain this values

@Sn0w3y
Copy link

Sn0w3y commented Jan 30, 2024

Sorry I included the wrong version of the WebApp. (You maybe have to manually clear the browser cache or reload with CTRL+F5 now as the git hash is not updated in this test version.

opendtu-generic_esp32.zip

this works well for me. now. please tell me the workflow as in the Discussion of the API development there is nothing mentioned about a websocket ?

@tbnobody
Copy link
Owner

tbnobody commented Jan 30, 2024

the next gen opendtu webapp uses websocket, instead of api calls, to display inverter values.

That's not true. The Websocket was also used in the previous versions for updating the values. With the new version you can also get all the data using the GET request. You just have to perform it per inverter. You will not get all inverters at once. But on the other hand you are able to get a list of all available inverters. Then you have to loop over all inverter serial numbers and perform a request for each serial.

And yes, I am aware that this change will break some other integrations which rely on this json output. But we can either stop further development and leave it as it is or we can change it, and therefor also the different connectors e.g. for openEMS.

I also mentioned on every Documentation of the WebAPI that it is mainly for the WebUI and may change in a way that it suits best for the WebUI.

@Sn0w3y
Copy link

Sn0w3y commented Jan 30, 2024

So basically the Endpoint /api/inverter/id/<inverter_id> will remain the same as i already implemented it now in a dev version using your dev version:

https://github.com/tbnobody/OpenDTU/files/14075558/opendtu-generic_esp32.zip

Greetz

@tbnobody
Copy link
Owner

Not sure if we understand us correctly...

I never had the intention to remove the WebAPI. Just the result and the parameters are different. (As mentioned here)

@MetaChuh
Copy link

The Websocket was also used in the previous versions for updating the values

never mind, just posted a prophylactic how to, if others might have their dtu behind reverse proxies, as this was needed to get the inverter details visible outside of home, as it was working with the current and past versions.
(maybe better not to have it posted, until someone might need it 🙏)

@Sn0w3y i think you've confused the api projects /api/inverter/id/<inverter_id> is an ahoy dtu call.
opendtu is /api/livedata/status?inv=116480148266 for this test version

@tbnobody
Copy link
Owner

@MetaChuh if you have a full vhost as an example I would add it in the howto section in the documentation (apache vs. nginx does not matter)

@tbnobody
Copy link
Owner

I think this issue can be closed. Memory checks are (moreless) implemented and also the memory consumption is much less since the updated livedata API.

@MetaChuh
Copy link

I think this issue can be closed.

yes thx.
was a mixed topic deviating mess anyways.
(reflectively seen: at least 1/2 of it was messed up by me)

Copy link

github-actions bot commented Apr 1, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants