Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client renego refactoring #2598

Open
wants to merge 11 commits into
base: 3.2
Choose a base branch
from

Conversation

Tazmaniac
Copy link

Describe your changes

To correctly fix #2590 and potentially other reals corner cases, a complete refactoring of the test is needed.
It remove all the calls to openssl s_client other than the one in the big events driven testing loop.
The result is more robust, faster and cleaner.
It fixes a bad interaction bug with the --openssl-timeout option too.

As a side note, Akamai started to implement a mitigation (before renegotiation could only be enabled or disabled) . It close the connection after a completely unpredictable number of renegotiation (up to more than 30 seen) from the client point of view. So it could lead to potentially "false positive" (well it depend of your point of view...).
I will try later to reduce this noise, but this is not a regression.

What is your pull request about?

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Typo fix
  • Documentation update
  • Update of other files

If it's a code change please check the boxes which are applicable

  • For the main program: My edits contain no tabs and the indentation is five spaces
  • I've read CONTRIBUTING.md and Coding_Convention.md
  • I have tested this fix against >=2 hosts and I couldn't spot a problem
  • I have tested this new feature against >=2 hosts which show this feature and >=2 host which does not (in order to avoid side effects) . I couldn't spot a problem
  • For the new feature I have made corresponding changes to the documentation and / or to help()
  • If it's a bigger change: I added myself to CREDITS.md (alphabetical order) and the change to CHANGELOG.md

All cases could be handled by the single openssl s_client invocation
loop:
- dispatch and adjust comments to not loose them
- remove the first s_client invocation: stuck connections are allready
  handled by the main loop
- remove the second s_client invocation: normal case and server closed
  connections are allready handled by the main loop. The loop take care
  of the race between server connection close and s_client terminating
  too by doing another loop run, not closing STDIN.
- special non HTTP case equivalent to ssl_reneg_attempts=2
- specialcase only the HTTP result printing to not change the output

- openssl-timeout option clashe badly with the main loop logic:
  Introduce $OPENSSL_NOTIMEOUT
In the wait loop, I was relying on a 1s sleep to eliminate a possible
late zero return value server close on the last attempt.
- do globaly one more harmless "for" iteration
  and remove the sleep 1 for faster and more robust result
- correct the non HTTP case iteration value
- adjust the timeout to the conservative 6s in the non HTTP case,
  for HTTP case it become 33s
- improve comments
- Recover the "not vulnerable" case (no mitigation) printing, cosmetic
  fix.
- With the removing of all s_client invocation other than the main loop
  one, fix the init of the ERRFILE and TMPFILE: no need to append, no
  need to remove, inconditionally zap the content before the loop.
@Tazmaniac
Copy link
Author

I don't understand the CI failed tests. The "broken pipe" error from subshells should not be propagated and affect the main shell script return code and does not in "normal" run. Does the CI test modifying the shell 'pipefail' 'errtrace' or 'errexit' options ?

@drwetter
Copy link
Owner

Haven´t really looked at it . Just re-ran the test once. I also don´t know what's going on.

I would recommend to checkout and run the script from the "home dir"

@drwetter
Copy link
Owner

Is #2597 obsolete?

@Tazmaniac
Copy link
Author

Yes #2598 is the complete fix. #2597 was a quick minimal workaroud for the reported defect.

@Tazmaniac
Copy link
Author

Locally, all CI falling tests are OK:

prove -V
TAP::Harness v3.43 and Perl v5.34.0

prove -v t/10_baseline_ipv4_http.t
t/10_baseline_ipv4_http.t ..

Baseline unit test IPv4 against "google.com"
ok 1 - via sockets, terminal output
ok 2 - via sockets JSON output
ok 3 - via OpenSSL
ok 4 - via OpenSSL JSON output
1..4
ok
All tests successful.
Files=1, Tests=4, 121 wallclock secs ( 0.02 usr  0.00 sys + 48.35 cusr 15.54 csys = 63.91 CPU)
Result: PASS

prove -v t/21_baseline_starttls.t
t/21_baseline_starttls.t ..

STARTTLS SMTP unit test via sockets --> smtp-relay.gmail.com:587 ...
ok 1 -

STARTTLS SMTP unit tests via OpenSSL --> smtp-relay.gmail.com:587 ...
ok 2 -

STARTTLS POP3 unit tests via sockets --> pop.gmx.net:110 ...
ok 3 -

STARTTLS POP3 unit tests via OpenSSL --> pop.gmx.net:110 ...
ok 4 -

STARTTLS IMAP unit tests via sockets --> imap.gmx.net:143 ...
ok 5 -

STARTTLS IMAP unit tests via OpenSSL --> imap.gmx.net:143 ...
ok 6 -

STARTTLS MANAGE(SIEVE) unit tests via sockets --> mail.tigertech.net:4190 ...
ok 7 -

STARTTLS XMPP unit tests via sockets --> jabber.org:5222 ...
ok 8 -

STARTTLS FTP unit tests via sockets --> ldap.uni-rostock.de:21 ...
ok 9 -

STARTTLS LDAP unit tests via sockets --> db.debian.org:389 ...
ok 10 -

STARTTLS LDAP unit tests via OpenSSL --> db.debian.org:389 ...
ok 11 -
1..11
ok
All tests successful.
Files=1, Tests=11, 1146 wallclock secs ( 0.01 usr  0.01 sys + 234.82 cusr 78.76 csys = 313.60 CPU)
Result: PASS

@drwetter
Copy link
Owner

drwetter commented Dec 6, 2024

Sorry for the delayed response. Stuck in a project and other things shed their light ahead.

It also works for me on my computers. However if you look more closely I am afraid the problem is not only the CI check. Please have a look @ https://github.com/drwetter/testssl.sh/actions/runs/11722187436/job/32988029778?pr=2598 and search for 17204. Maybe that line is exhausting the resources of the CI containers.

Also when this is addressed we would need to test this against vulnerable hosts. Probably an old openssl version could help started with s_server -www.

@Tazmaniac
Copy link
Author

No, the broken pipes are perfectly "normal" but should not be reported as I said previously : #2598 (comment)
They where not previously visible as the code was probably not reached during CI tests.
Something is messing up with signals / bash config like 'set -e'
Localy with standard Ubuntu 22.04: prove -V TAP::Harness v3.43 and Perl v5.34.0 => no problems
The CI is using perl 2.26 why such old version ? Why not the standard Ubuntu 22.04 one ?

Run prove -v
prove -v
shell: /usr/bin/bash -e {0}
env:
....
This bash -e is suspicious for me.
The difficulty is to reproduce locally to find what is messing up with default bash pipe and subshell signaling in the CI environment.
Will try to debug and fix this next week, first by inhibiting the "-e" behavior around my code.

@drwetter
Copy link
Owner

drwetter commented Dec 6, 2024

My point being: the broken pipes are not in the one test which failed, they are in a lot of tests. We don't check for those errors yet. As this makes the CI runs useless I cannot merge this as it is now.

You can submit a PR upgrading the CI test to Ubuntu 22.04 but it also needs to run under normal conditions for the use on 20.04 (haven´t check yet).

I haven´t understood yet why that message occurs. My remark wrt "exhausting the resources" was just a guess. I would feel more comfortable if someone can confirm this.

@Tazmaniac
Copy link
Author

The '-e'/'set -e' bash modifier is for me responsible for the breakage. I extensively use sub-shell and execution pipes in my code, with expected pipe break.
I will work on it on monday.

@drwetter
Copy link
Owner

drwetter commented Dec 6, 2024

I will work on it on monday.

👍

testssl.sh worked as expected.
Under the hood, broken pipes are expected as part of the fast loop exit
strategy that relies as little as possible on timeout detection.
But under the CI, testssl.sh output is garbled by the subshells stderr
outputs, catched for some reason by 'prove -v'.
Simply redirecting the stderr output of the offending command to
/dev/null fixes the problem.
@Tazmaniac
Copy link
Author

Ok CI tests breakage fixed.

@drwetter
Copy link
Owner

Thanks @Tazmaniac !

I am a little bit swamped with things. Just had a short look: What is OPENSSL_NOTIMEOUT? It is not explained why you set another global. If that would make sense it would need to be set + explained in the first ~100 lines)

@Tazmaniac
Copy link
Author

It was introduced in the commit dab177f.
The testssl.sh ssl-timeout option interfere badly with the renego test which has it own timeout management.
So before potentially overloading the OPENSLL variable with
OPENSSL="$TIMEOUT_CMD $OPENSSL_TIMEOUT $OPENSSL"
save the "before the overload" variant in OPENSSL_NOTIMEOUT.
The assignment is at the right place. I could add a comment just before.

Current version is affected by the bad interaction/bug too.

@Tazmaniac
Copy link
Author

Tazmaniac commented Dec 12, 2024

Would this be ok for you ?

diff --git a/testssl.sh b/testssl.sh
index 7983aae..62315e0 100755
--- a/testssl.sh
+++ b/testssl.sh
@@ -20456,6 +20456,9 @@ find_openssl_binary() {
           fi
      fi

+     #Some tests have their own timeout management which could badly interact with TIMEOUT_CMD command.
+     #Save the value of OPENSSL in a _NOTIMEOUT variant before possibly prepending $TIMEOUTCMD to it.
+     #Used by do_renego test for example.
      OPENSSL_NOTIMEOUT=$OPENSSL

      if ! "$do_mass_testing"; then

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Secure Client-Initiated Renegotiation Testing Yields False Vulnerability
3 participants