Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Websocket #136

Open
wants to merge 35 commits into
base: main
Choose a base branch
from
Open

Websocket #136

wants to merge 35 commits into from

Conversation

CurlyMoo
Copy link

No description provided.

@McMagellan
Copy link

Feedback Test Egyras#564 Signature Alpha 5f4a763. (not the debug version)

Loaded Rule: CoPilot3303.txt with 23 rules and several SetCurves commands and modified in length.

The web server is running stable. There was no longer any freezing after opening the UI again.
The ruleset starts safely even after reboot and power down.

The rules are running correctly as far as I can tell without the console information.
The HP followed the rules correctly.
MQTT runs stable.

Known deficits: Console window only runs for about 30 seconds.
The variable information is missing.

Further development of the rules is not possible, which is why I used my current ruleset here.

For further testing I need specific information about which version should be tested on what.

I can also provide the debug log if required.

@CurlyMoo
Copy link
Author

Does it still run stable?

@McMagellan
Copy link

Yes, it's stable.

Attached is the debug log file from the last 30 hours.
I test 3x Power off and 2 Times Reboot from Browser --> OK.
Connection establishment to the web server approx. 50x.
No MQTT- connection errors.
No more problems when starting the ruleset.
There were 666x Sendlist "queue is full" messages .
My timer rule triggert 1 minute was executed 1744 times. --> OK

The message: New webserver client: was there 1053 times. ??
In reality, I only re-established a connection to the web server a maximum of 50 times during this time.
I noticed that if one or two computers are simply connected to the Web-IF, the connection is broken down and re-established in a short cycle. There's nothing wrong with it on the front end. There was never a frozen page.

The monitoring of the So and 1Wire port is visible correctly.
From Rules there are only the set commands visible.
24aputty.log

@CurlyMoo
Copy link
Author

So the only bug in this version (besides the missing rules console) is the websocket messages that freeze after 30 sec?

@McMagellan
Copy link

There were no frozen windows. In my opinion the WS is now running stable.

The only thing I'm concerned about is the large number of new and closing web server messages that can never come from my test.
Here is an example. In an hour between 15:49 and 16:49 there was a new connection to the WS of 192.168.178.86 108 times without it being served, I was working in the garden.

I can't say whether these calls are triggered by the PC.

The 1 hour is in the attached log file.
DebugLog2.txt

@CurlyMoo
Copy link
Author

CurlyMoo commented Aug 25, 2024

Please notice that all the value tables are refreshed by an javascript Ajax call:

 function refreshTable(tableName){
   switch(tableName) {
     case 'Heatpump':
       loadContent('heishavalues', '/tablerefresh', function(){});
       break;
     case 'S0':
       loadContent('s0values', '/tablerefresh?s0', function(){});
       break;
     case 'Opentherm':
       loadContent('openthermvalues', '/tablerefresh?opentherm', function(){});
       break;
     case 'Dallas':
       loadContent('dallasvalues', '/tablerefresh?1wire', function()
         {
           var dallas_elements = document.getElementsByClassName("dallas_alias");
           for (var i = 0; i < dallas_elements.length; i++) {
               dallas_elements[i].addEventListener('blur', dallasAliasEdit, false);
           }
         });
       break;
     default:
       break;
   }
  clearTimeout(timeout);
  timeout=setTimeout(refreshTable, 30000, tableName);
}

These calls are just regular webserver requests. And as you can see in the logs, almost all of the calls happen in a 30 second interval.

@IgorYbema Do you have time to check if you can reproduce the websocket connection failing after 30 seconds issue?

@McMagellan
Copy link

I would like to give another hint.
The situation with the many WS calls could have arisen as follows.

  1. Establish a new connection with Heishamon.
  2. Switch to the console window.
  3. Notice that the window is frozen after 30 seconds.
  4. Maintain this state for hours.

Since the console window currently only works to a limited extent, it may not be an error at all and would no longer occur if the console was running correctly again.

If the rules information is re-integrated into the console window, I would have 2 suggestions for improvement. I'm a pensioner and grateful for everything that makes things clearer.

  1. For better clarity, it would be good to insert a blank line after the last global variable entry.

  2. In old Heishamon versions, the variables that were changed in the timer were listed at the end of the list. The new version uses a fixed order in the list with numbering. It would be great if the last changed variables were marked with an asterisk. e.g.

old:
0 #Clock = 0
new:
0* #Clock = 0

If it's not too much effort, it would be even better if trend information was displayed for numerical variables. e.g.:

New value greater than old value:
0+ #Clock = 0

New value smaller than old value:
0- #Clock = 0

For changed string contents:
0* #Clock = Upshift

@CurlyMoo
Copy link
Author

Since the console window currently only works to a limited extent, it may not be an error at all and would no longer occur if the console was running correctly again.

The websockets shouldn't disconnect in 30 seconds so lets first fix that.

@IgorYbema
Copy link
Owner

Let me take a look at that indeed

@CurlyMoo
Copy link
Author

There was indeed a bug in the WS heartbeat. The last build should fix that.

@IgorYbema
Copy link
Owner

If you mean that that would fix the console 30 sec bug? It made it worse> it not stops after about 10 secs

@CurlyMoo
Copy link
Author

What does whireshark say about the PING/PONG of the websocket connection. The webserver should now send a PING each 3 seconds that should immediatly responded to by a PONG from the client.

@IgorYbema
Copy link
Owner

PING/PONG works fine. According to chrome the websocket is closed due to wrong data received:
WebSocket connection to 'ws://heishamon.local/ws' failed: Could not decode a text frame as UTF-8.
Trying to figure on in wireshark if I can see that invalid data

@CurlyMoo
Copy link
Author

I'm wondering why the change does change that failure speed if it's not the PING/PONG that's the issue.

@IgorYbema
Copy link
Owner

No and I believe the 30 secs is just a average. If you have more console output the possibility of an early disconnect is bigger

@IgorYbema
Copy link
Owner

Just took more than 2 minutes to get the error

@IgorYbema
Copy link
Owner

There is indeed some garbage in the data. Probably some memory overflow.
image

@IgorYbema
Copy link
Owner

Regarding the websocket closing unexpected it seems it also generates invalid opcodes right at that time. These upcodes are not defined, also not in the webserver code. So they must be generated due to some memory overflow condition. I'll try to dive into the websocket send routine soon.

image

@IgorYbema
Copy link
Owner

Also testing the code on the ESP32 and it runs fine as it seems. Been testing for a few minutes but no console stop (websocket disconnect). Could be because of more memory maybe?

@CurlyMoo
Copy link
Author

With more memory overflows indeed occur less

@IgorYbema
Copy link
Owner

Ah yes indeed. Or the effect is lower as the memory which is overflowed isn't in use by another critical proces and probably even NULL-ed.

@CurlyMoo
Copy link
Author

Exactly

@CurlyMoo
Copy link
Author

The webserver in my repo is constantly tested for memory issues with valgrind. That doesn't report anything, so i'm curious where the error could be.

@stumbaumr
Copy link

Maybe it is a good idea to separate my commits for improving the actions into a separate PR?

@CurlyMoo
Copy link
Author

CurlyMoo commented Sep 9, 2024

Will fix those when this PR is done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants