Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite sslproto.py #176

Merged
merged 1 commit into from
Sep 18, 2018
Merged

Rewrite sslproto.py #176

merged 1 commit into from
Sep 18, 2018

Conversation

fantix
Copy link
Member

@fantix fantix commented Jul 2, 2018

State transition:

State transition

Recv/send calling chain:

Recv/send calling chain

Reworked flow control:

Reworked flow control

SSLProtocol internals (shutdown detail):

SSLProtocol

SSLObject methods internals:

SSLObject

Refs #158, this is a WIP yet, please kindly wait until ready for review.

TODOs:

  • Pass cpython tests
  • Pass uvloop tests
  • Test SSL over SSL
  • Correct shutdown (graceful close) data flush and cover with test
  • Test renegotiation (with pyOpenSSL)
  • Port Trio tests
  • Maybe extract a cdef class for both performance and readibility
  • Add test for https://bugs.python.org/issue30698
  • https://bugs.python.org/issue29406
  • https://bugs.python.org/issue25292
  • Fix eof_received() implementation
  • Backport to cpython and pass tests
  • Cythonize transport
  • Extract high/low water mark
  • Make sure memory management is good
  • Check sendfile support

@fantix
Copy link
Member Author

fantix commented Jul 2, 2018

The failing test is unstable, checking now.

Unexpected calls to loop.call_exception_handler():
...
 {'exception': RuntimeError('no python buffer is allocated in on_read; nread=-4095',),
  'message': 'Fatal error on transport UnixTransport',
  'protocol': <uvloop.loop.SSLProtocol object at 0x7f1749650470>,
  'transport': <UnixTransport closed=True reading=False 0x7f1749643a08>}]
FAIL

======================================================================
FAIL: test_create_unix_connection_ssl_1 (tests.test_unix.Test_UV_UnixSSL)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/decentfox/uvloop/uvloop/_testbase.py", line 102, in tearDown
    self.fail('unexpected calls to loop.call_exception_handler()')
AssertionError: unexpected calls to loop.call_exception_handler()

According to libuv docs, the following read_cb might be called with nread=UV_EOF with no buffer allocated.

cdef void __uv_stream_buffered_on_read(uv.uv_stream_t* stream,
ssize_t nread,
const uv.uv_buf_t* buf) with gil:
if __ensure_handle_data(<uv.uv_handle_t*>stream,
"UVStream buffered read callback") == 0:
return
cdef:
UVStream sc = <UVStream>stream.data
Loop loop = sc._loop
Py_buffer* pybuf = &sc._read_pybuf
if nread == uv.UV_ENOBUFS:
sc._fatal_error(
RuntimeError(
'unhandled error (or an empty buffer) in get_buffer()'),
False)
return
try:
if not sc._read_pybuf_acquired:
raise RuntimeError(
f'no python buffer is allocated in on_read; nread={nread}')
if nread == 0:
# From libuv docs:
# nread might be 0, which does not indicate an error or EOF.
# This is equivalent to EAGAIN or EWOULDBLOCK under read(2).
return
if __uv_stream_on_read_common(sc, loop, nread):
return
if UVLOOP_DEBUG:
loop._debug_stream_read_cb_total += 1
sc._protocol_buffer_updated(nread)
except BaseException as exc:
if UVLOOP_DEBUG:
loop._debug_stream_read_cb_errors_total += 1
sc._fatal_error(exc, False)
finally:
sc._read_pybuf_acquired = 0
PyBuffer_Release(pybuf)

However, this code didn't handle UV_EOF before above error got raised.

@1st1
Copy link
Member

1st1 commented Jul 3, 2018

According to libuv docs, the following read_cb might be called with nread=UV_EOF with no buffer allocated.

Ah, that makes sense. Thanks for fixing that; can you make a fix in a separate PR?

What's the projected timeline for this PR? I assume you are planning to add more SSL tests, right? Or is it ready for a review now?

@fantix
Copy link
Member Author

fantix commented Jul 3, 2018

Ah, that makes sense. Thanks for fixing that; can you make a fix in a separate PR?

Oh sure thing, will do.

What's the projected timeline for this PR? I assume you are planning to add more SSL tests, right? Or is it ready for a review now?

Yes, I was planning to add more tests (renegotiation for example), and fix some obvious issues and make improvements during the testing. Let's do the review tomorrow anyway, shall we? 😃

@1st1
Copy link
Member

1st1 commented Jul 3, 2018

Anytime this week works.

@fantix fantix force-pushed the t158_rewrite_sslproto branch 2 times, most recently from 663268f to c178c6b Compare July 5, 2018 08:15
@fantix fantix changed the title [WIP] Rewrite sslproto.py Rewrite sslproto.py Jul 5, 2018
@fantix fantix force-pushed the t158_rewrite_sslproto branch from 01be886 to 321ab23 Compare July 5, 2018 13:17
@1st1 1st1 requested review from 1st1 and elprans and removed request for 1st1 July 5, 2018 15:36
@@ -9,6 +9,7 @@
import threading
import weakref

from OpenSSL import SSL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe from OpenSSL import SSL as open_ssl? :) To make it obvious where OpenSSL is really used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

# hack reader and writer to call start_tls()
transport = writer._transport
writer._transport = None
reader._transport = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, you're brave! :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL was rushing to avoid manual protocol code - perhaps it's better be replaced by protocol so that the test can be robust enough even if asyncio changes its implementation in the future.

break
elif self._state == _SHUTDOWN:
self._do_read()
self._do_shutdown()
Copy link
Member Author

@fantix fantix Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_do_read() should be removed in _SHUTDOWN, because a peer should start to discard incoming app data once it sends the close_notify alert.

@1st1
Copy link
Member

1st1 commented Jul 6, 2018

In short: I like the new code, it's easy to follow. Great job! Don't have any major comments. The state transitions seem fine, the overall logic is OK too.

I'd continue iterating on the codebase and adding tests (your current ToDo is great too!)

At some point when we think it's ready we'll commit the Python implementation (to have it in the repo) and start to Cythonize it to make it slightly faster. The pure-Python implementation will go straight to asyncio 3.8.

@fantix fantix force-pushed the t158_rewrite_sslproto branch 2 times, most recently from eb7387d to ff4e5d8 Compare July 10, 2018 11:13
@fantix
Copy link
Member Author

fantix commented Jul 11, 2018

I'm adding a new state FLUSHING before SHUTDOWN (please see updated diagram at top), for cases that:

  • transport.close() is called, but there is still data in _write_backlog
  • close_notify is received, but there is still data in _write_backlog

Usually APP data will transiently stay in _write_backlog and immediately be encrypted into SSL data in outgoing buffer. But when the other peer starts an renegotiation, SSLObject will stop the encryption work and wait for renegotiation to finish. So both cases do exist.

While in FLUSHING state:

  • No further data will be fed to APP protocol - it'll be simply discarded
  • Writing to the transport has no effect - data is discarded and warning will be raised after a few attempts, similar to the behavior of TCPTransport
  • The SSLProtocol keeps trying to flush the data in _write_backlog, and moves to SHUTDOWN state once done
  • The newly-added ssl_shutdown_timeout also covers this state - that means the countdown starts when it enters FLUSHING state

@fantix
Copy link
Member Author

fantix commented Jul 12, 2018

(Took 1.5 hours to correctly write a FLUSHING test first, but only half an hour to actually implement the FLUSHING state 😃)

@fantix
Copy link
Member Author

fantix commented Jul 17, 2018

Working on eof_received() case when peer closes the writing channel only, we still flush all pending data if any if possible, send close_notify without waiting for peer's close_notify and close the transport.

A question here - do we want to treat peer's close_notify as a half-close, calling APP's eof_received()? Theoretically it is still possible to send data as far as we don't send close_notify. (Reading this paper, it seems important to let APP protocol know that whether close_notify is received or not. So I think no matter half-close should be supported or not, eof_received() should get called first. Please see new diagram above.)


Update: it is now implemented as follows: treating TLS as transport details, close_notify is always considered as EOF for upper APP protocols, while EOF from lower transport level (TCP for example) is considered as a brutal disconnection, thus not causing APP-level EOF.

fantix added a commit to fantix/cpython that referenced this pull request Jul 18, 2018
@fantix fantix force-pushed the t158_rewrite_sslproto branch 2 times, most recently from b757099 to 5b06936 Compare July 18, 2018 09:42
@fantix
Copy link
Member Author

fantix commented Jul 27, 2018

Spent some time to make it run faster in uvloop - about 20% ~ 40% faster than original uvloop in 1~100 KiB echo benchmark. Here're the cProfiled diagrams for time spent on each module:

time

And here's the SSL benchmark running modified vmbench on a 4-core 2.5GHz cloud server, all benchmarked servers are running in python:3.7 (Debian 9) docker containers (full benchmark data here):

benchmark

Where:

  • ssl raw is pure OpenSSL 1.1.0f echo encryption and decryption without network I/O written in C++ based on openssl-bench initially referenced by Elvis, it is the theoretical computing limit.
  • golang (1.7.4) implemented its own TLS transport, well optimized it is assumed, thus it is considered as the computing limit + network I/O.
  • ssl uvloop protocol is using the new sslproto.pyx with regular EchoProtocol inherited from asyncio.Protocol.
  • ssl uvloop buffered is similar, but uses EchoBufferedProtocol which has less memory copies, but additional calls added more overhead.
  • ssl orig uvloop is original uvloop with asyncio.Protocol.
  • nodejs is on version 4.8.2.
  • ssl asyncio protocol is vanilla asyncio with asyncio.Protocol.
  • gevent is on version 1.3.5 (with greenlet 0.4.13).
  • ssl threaded is copied from Curio.
  • twisted is on version 18.7.0.

The new sslproto.pyx is performing slightly better than Golang at 100KiB, but MemoryBIO added much memory copy overhead which is significant for smaller payloads, still it is 10% ~ 20% faster than original uvloop on throughtput, and 60%~80% faster than cpython.

@fantix fantix force-pushed the t158_rewrite_sslproto branch 4 times, most recently from 611741a to 72ddbe0 Compare July 30, 2018 03:31
Copy link
Member

@1st1 1st1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work, I honestly don't have anything aside nitpicking ;)

@fantix
Copy link
Member Author

fantix commented Aug 2, 2018

Thanks a lot for reviewing in late night! ❤️

@1st1
Copy link
Member

1st1 commented Aug 2, 2018

I'm testing this in debug mode just to make sure we haven't made some silly mistakes. I see that the process memory grows aggressively to some point and then flats out; this might be Python's famous memory fragmentation issue. But just in case, I'd like you to see if you can reproduce this locally.

Instructions:

  • build uvloop with make debug;
  • run python examples/bench/echoserver.py --proto --uvloop --ssl; you should see some debug info printed on the screen and dynamically updated;
  • run python examples/bench/echoclient.py --workers 6 --msize 10241 --num 10000 --ssl and watch how the Process memory metric increases.
  • repeat the last step a few times; then increase the payload size; repeat again.

So the process' memory usage grows with time. I'm 99% sure that this is just memory fragmentation. But maybe we also need to ensure that it's contained; for instance, it might be a good idea to cap max buffer size when we realloc. In general, I'd appreciate if you take another quick glance over sslproto.pyx to make sure we're not leaking memory.

Other than that, the PR is ready :)

@fantix
Copy link
Member Author

fantix commented Aug 2, 2018

Totally reasonable. Will do, and I'll see if I can find some more tools or ways to check possible memory issues.

class EchoBufferedProtocol(asyncio.BufferedProtocol):
def connection_made(self, transport):
self.transport = transport
self.buffer = bytearray(256 * 1024)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self._buffer = bytearray(256 * 1024)
self.buffer = memoryview(self._buffer)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha! Been there, should have commented - to make sure the echo is true, the buffer needs to be copied, so that next read doesn’t change previous outgoing data which is still pending. With memoryview and bytes again, the benchmark gets slightly slower, therefore it ended up like this now. Thanks anyway! I’ll add comment.

Copy link
Member

@1st1 1st1 Aug 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right :)

We'll need to add a new API to asyncio 3.8 to tell us when a write() has completed so that we can re-use the buffer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s interesting!

@1st1
Copy link
Member

1st1 commented Aug 2, 2018

Check sendfile support

We don't support the new sendfile in uvloop yet.

@1st1
Copy link
Member

1st1 commented Aug 2, 2018

And here's the SSL benchmark running modified vmbench on a 4-core 2.5GHz cloud server, all benchmarked servers are running in python:3.7 (Debian 9) docker containers (full benchmark data here):

BTW, we should merge your additions back to the vmbench project.

@fantix fantix force-pushed the t158_rewrite_sslproto branch 3 times, most recently from 8a0c44c to 26f36ac Compare August 16, 2018 08:19
@fantix
Copy link
Member Author

fantix commented Aug 21, 2018

I could reproduce the memory pattern. The memory increase seems to be more relevant to new SSL connection, rather than repeated echo send and recv. So I tested with python examples/bench/echoclient.py --workers 6 --msize 10241 --num 10 --ssl --times 10000, and memory usage of the server stabilized at around 30MB. Increasing the number of concurrent workers to 12, the server memory usage ended up at around 40MB.

I've also reviewed the code, and made a few small changes. (I'll create a separate PR for the vmbench changes)

If okay, I'll rebase again and squash all commits.

@1st1
Copy link
Member

1st1 commented Aug 21, 2018

If okay, I'll rebase again and squash all commits.

go for it!

@1st1
Copy link
Member

1st1 commented Sep 4, 2018

@elprans can you take a quick look at this before I merge?

uvloop/includes/flowcontrol.pxd Outdated Show resolved Hide resolved
uvloop/sslproto.pyx Outdated Show resolved Hide resolved
@fantix fantix force-pushed the t158_rewrite_sslproto branch from 5c44111 to c5168a4 Compare September 5, 2018 02:21
@1st1
Copy link
Member

1st1 commented Sep 6, 2018

@fantix hey, could you please take a look at the failed CI?

@fantix
Copy link
Member Author

fantix commented Sep 7, 2018

Sure thing! Sorry been out of office, will check today.

@fantix fantix force-pushed the t158_rewrite_sslproto branch from c5168a4 to d7f7f17 Compare September 7, 2018 09:07
@fantix
Copy link
Member Author

fantix commented Sep 7, 2018

Seems to be an unstable test, a retest passed. Tried but didn't reproduce locally, I'll try to find clues in the failing log.

@1st1
Copy link
Member

1st1 commented Sep 10, 2018

@asvetlov could you also take a look? This is something that we'll use pretty much as is in asyncio in 3.8.

@1st1 1st1 merged commit 9cba749 into MagicStack:master Sep 18, 2018
@1st1
Copy link
Member

1st1 commented Sep 18, 2018

Thank you so much, @fantix! 💯🎉🚀

@fantix
Copy link
Member Author

fantix commented Sep 18, 2018

Thanks a lot @1st1 😃🍻

@asvetlov
Copy link
Contributor

Huge improvement! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants