Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: command response processing issues #40

Merged
merged 6 commits into from
Apr 25, 2023
Merged

Conversation

stevenh
Copy link
Contributor

@stevenh stevenh commented Feb 24, 2023

Fix command response processing issues introduced by notification handling.

This includes:

  • Random hangs due to writes to channels with no readers.
  • Potential race when reusing client response buffer.
  • Close of notify channel while handler was still running.
  • Eliminated race on IsConnected check in ExecCmd.

Also:

  • Use just \n for keep-alive to reduce data sent across the wire.
  • Wrap more errors to improve error reporting.
  • Remove io.(Reader|Writer) wrapping as that breaks scanner io.EOF handling.

Fix command response processing issues introduced by notification
handling.

This includes:
* Random hangs due to writes to channels with no readers.
* Potential race when reusing client response buffer.
* Close of notify channel while handler was still running.
* Eliminated race on IsConnected check in ExecCmd.

Also:
* Use just \n for keep-alive to reduce data sent across the wire.
* Wrap more errors to improve error reporting.
* Remove io.(Reader|Writer) wrapping as that breaks scanner io.EOF
  handling.
Increase the golangci-lint timeout on Windows as it constantly takes
longer than Linux and Mac leading to timeout failures.
Copy link
Contributor

@HalloTschuess HalloTschuess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all thank you for your commitment!

I actually had something similar in mind, but I couldn't really wrap my head around it.

All goroutines seem to end now.
Running the tests there are still random failures every few hundred tries, which I still think might have something to do with the mock server. So I wouldn't really bother with it in this pull request.

There is another thing I have noticed, which is probably not scope of this pull request:
There is no way to find out if the connection got closed without executing a command or using c.IsConnected(). c.notify is only closed inside of c.Close() which has to be called manually.
This leads to indefinitely blocking when just listening to events, without sending anything.
From what I can tell this has been like this before as well. Exposing c.done might be a good way for an easy fix.

client.go Outdated Show resolved Hide resolved
Move notify close so its done as soon as the sender exits.

Revert keep alive change as '\n' isn't enough to prevent connection
timeout.
Document that notification channel will be closed.
@stevenh
Copy link
Contributor Author

stevenh commented Feb 26, 2023

First of all thank you for your commitment!

I actually had something similar in mind, but I couldn't really wrap my head around it.

All goroutines seem to end now. Running the tests there are still random failures every few hundred tries, which I still think might have something to do with the mock server. So I wouldn't really bother with it in this pull request.

Could you post the details of this, as I've not been able reproduce even with 10k runs here.

There is another thing I have noticed, which is probably not scope of this pull request: There is no way to find out if the connection got closed without executing a command or using c.IsConnected(). c.notify is only closed inside of c.Close() which has to be called manually. This leads to indefinitely blocking when just listening to events, without sending anything. From what I can tell this has been like this before as well. Exposing c.done might be a good way for an easy fix.

I've moved the close of c.notify into the messageHandler defer so it will now be closed as soon as a disconnect is detected. This will allow any notification consumers to know when no more notifications will be sent.

client.go Outdated Show resolved Hide resolved
@HalloTschuess
Copy link
Contributor

Could you post the details of this, as I've not been able reproduce even with 10k runs here.

I tried it again on a docker container running on my home docker server and even after one hour (~3.5k runs) there was not a single failure.

Maybe what I observe is some weird Windwos/WSL issue on my end, where the mockserver is not reachable sometimes.

In this instance for example it seems like the connection could not be established, as after connecting there immediately is an EOF

=== RUN   TestCmdsBasicSSH
    basic_cmds_test.go:34: 
                Error Trace:    go-ts3/basic_cmds_test.go:34
                Error:          Received unexpected error:
                                client: header: unexpected EOF
                Test:           TestCmdsBasicSSH

stevenh and others added 2 commits February 27, 2023 14:56
Fix a comment typo for responseErrTimeout

Co-authored-by: HalloTschuess <hallo.ich.f@gmail.com>
Remove wrap on Write, Read and Close methods in mock servers to ensure
we don't trigger unexpected behaviour such as io.EOF check failures.
Copy link
Contributor

@HalloTschuess HalloTschuess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@P4sca1
Copy link

P4sca1 commented Apr 13, 2023

I tried your branch and it seems to have fixed all the issues I had with connections being freezed.
Any chance that the changes can get merged into main?

@Mattzi
Copy link

Mattzi commented Apr 24, 2023

Can confirm this fixing all the errors. Also LGTM

Any chance this will be merged? @stevenh or @HalloTschuess

@stevenh
Copy link
Contributor Author

stevenh commented Apr 24, 2023

@danmrichards / @lwaddicor can we get this merged please

@lwaddicor lwaddicor merged commit 01bb4ee into master Apr 25, 2023
@lwaddicor lwaddicor deleted the fix/cmd-processing branch April 25, 2023 00:19
@lwaddicor
Copy link
Collaborator

Thanks for this, approved and merged. @stevenh do you feel a v1.1.1 or v1.2.0 is most appropriate here?

@stevenh
Copy link
Contributor Author

stevenh commented Apr 25, 2023

Thanks Lewis, as its a backwards compatible bug fix https://semver.org/ says its a patch so v1.1.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants