Graceful termination #49

Closed. Wants to merge 55 commits.

Commits (55)
6fe8f2a
Patch h2 and support graceful termination
stephenc-pace Nov 24, 2022
5623ff8
lint
stephenc-pace Nov 24, 2022
33fcaa9
Fix bugs
stephenc-pace Nov 24, 2022
d50dae9
Questionable updates to patch
stephenc-pace Nov 24, 2022
134a242
Ensure we flush out streams on termination
stephenc-pace Nov 25, 2022
e8f7c1e
add logger for h2
stephenc-pace Nov 25, 2022
7f245d8
lint and delete misleading tests
stephenc-pace Nov 25, 2022
a74febd
git ignore mac and ide files
stephenc-pace Nov 25, 2022
ee7dfca
delete
stephenc-pace Nov 25, 2022
e3c134b
no noop
stephenc-pace Nov 25, 2022
22f77e9
debug
stephenc-pace Nov 25, 2022
f488f80
remove test no longer valid
stephenc-pace Nov 25, 2022
9c96656
Merge remote-tracking branch 'origin/master' into graceful-termination
stephenc-pace Nov 25, 2022
a57fcd4
check timeout
stephenc-pace Nov 28, 2022
dab2f2f
add logging
stephenc-pace Nov 28, 2022
000ebae
add debug line
stephenc-pace Nov 28, 2022
dd5db42
just run failing test
stephenc-pace Nov 28, 2022
7d15214
more debug
stephenc-pace Nov 28, 2022
292e810
more debug
stephenc-pace Nov 28, 2022
e1e0607
more debugging and probably broken it
stephenc-pace Nov 28, 2022
8d41825
messed up debug line
stephenc-pace Nov 28, 2022
f402cae
lint debug line
stephenc-pace Nov 28, 2022
82747eb
possibly found bug
stephenc-pace Nov 28, 2022
ccc63c8
revert test changes
stephenc-pace Nov 28, 2022
63aa5ed
test test
stephenc-pace Nov 28, 2022
7675ce4
run with more logging again
stephenc-pace Nov 29, 2022
3cfcf53
add missing header
stephenc-pace Nov 29, 2022
ae1db25
nope, try send headers?
stephenc-pace Nov 29, 2022
02c2bd1
fluke?
stephenc-pace Nov 29, 2022
9d14c81
remove comment
stephenc-pace Nov 30, 2022
1f91758
restore tests. whoops
stephenc-pace Nov 30, 2022
51fdb69
handle error codes better
stephenc-pace Dec 1, 2022
fc2941d
add note to docstring regarding h2 shortcomings
stephenc-pace Jan 10, 2023
c2cebd6
comment out logic to see what happens to tests
stephenc-pace Jan 10, 2023
2f4acc4
pin GHA ubuntu to 20.04 as latest does not have py3.6 available
stephenc-pace Jan 10, 2023
b5fffd4
tox update deprecated whitelist config for allowlist
stephenc-pace Jan 10, 2023
abbef1d
static
stephenc-pace Jan 10, 2023
cc726dd
restore flush
stephenc-pace Jan 19, 2023
8cb69b5
stop processing earlier if connection terminated
stephenc-pace Jan 23, 2023
3fa6487
clean up spawned threads on container stop
stephenc-pace Jan 23, 2023
bc6a6f8
static
stephenc-pace Jan 23, 2023
3e79db1
Add a short timeout to connection termination
stephenc-pace Jan 23, 2023
3591087
move thread killing to stop
stephenc-pace Jan 23, 2023
e3a56c3
kill threads before connections
stephenc-pace Jan 24, 2023
68eccf4
remove uneccessary thread cleanup
stephenc-pace Jan 25, 2023
35eec8d
better attempt at ensuring stream is exhausted
stephenc-pace Jan 25, 2023
b3986d7
move stream end logic to stream class
stephenc-pace Jan 26, 2023
9ae76cc
simplify stream exhausted check and add comment
stephenc-pace Jan 30, 2023
c59efe7
lin
stephenc-pace Jan 30, 2023
b6e6922
update action steps
stephenc-pace Feb 17, 2023
9411b3c
Add thread cleanup back
stephenc-pace Feb 17, 2023
71187ad
lint
stephenc-pace Feb 17, 2023
50f8be4
revert stream exhaustion fix
stephenc-pace Mar 10, 2023
0e536d7
expose max concurrent streams
stephenc-pace Mar 10, 2023
aa8bdce
Avoid sending headers or trying to recreate streams that have been cl…
stephenc-pace Mar 29, 2023
14 changes: 7 additions & 7 deletions .github/workflows/ci.yaml
@@ -15,10 +15,10 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: checkout
uses: actions/checkout@v2
uses: actions/checkout@v3

- name: setup python
uses: actions/setup-python@v2
uses: actions/setup-python@v4
with:
python-version: 3.9

@@ -33,7 +33,7 @@ jobs:
TOXENV: static

test:
runs-on: ubuntu-latest
runs-on: ubuntu-20.04
strategy:
matrix:
python-version:
@@ -44,10 +44,10 @@

steps:
- name: checkout
uses: actions/checkout@v2
uses: actions/checkout@v3

- name: setup python
uses: actions/setup-python@v2
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}

@@ -74,10 +74,10 @@

steps:
- name: checkout
uses: actions/checkout@v2
uses: actions/checkout@v3

- name: setup python
uses: actions/setup-python@v2
uses: actions/setup-python@v4
with:
python-version: 3.9

4 changes: 3 additions & 1 deletion .gitignore
@@ -3,4 +3,6 @@ __pycache__
.tox
*_pb2.py
*_pb2_grpc.py
.coverage
.coverage
.DS_Store
.idea
5 changes: 5 additions & 0 deletions nameko_grpc/__init__.py
@@ -0,0 +1,5 @@
# -*- coding: utf-8 -*-
from nameko_grpc.h2_patch import patch_h2_transitions


patch_h2_transitions()
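
The two lines above run at package import time, so the h2 patch is applied before any connection object exists. A minimal, hypothetical sketch of the import ordering this implies (what patch_h2_transitions changes inside h2 is not shown in this PR, so the comments below are assumptions):

# Hedged sketch: the patch is applied as a side effect of importing the package,
# so import nameko_grpc before constructing any H2Connection elsewhere.
import nameko_grpc  # noqa: F401  (runs patch_h2_transitions() on import)

from h2.config import H2Configuration
from h2.connection import H2Connection

# Connections created from here on use whatever transitions the patch installed.
conn = H2Connection(config=H2Configuration(client_side=False))
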
17 changes: 13 additions & 4 deletions nameko_grpc/channel.py
@@ -111,14 +111,19 @@ class ServerConnectionPool:
Just accepts new connections and allows them to run until close.
"""

def __init__(self, host, port, ssl, spawn_thread, handle_request):
def __init__(
self, host, port, ssl, spawn_thread, handle_request, max_concurrent_streams=100
):
self.host = host
self.port = port
self.ssl = ssl
self.spawn_thread = spawn_thread
self.handle_request = handle_request
self.max_concurrent_streams = max_concurrent_streams

self.connections = queue.Queue()
self.is_accepting = False
Comment (Collaborator Author): These were missing from the init. There are a few more cases of this, but not fixing everything.

self.listening_socket = None

def listen(self):
sock = eventlet.listen((self.host, self.port))
@@ -139,7 +144,9 @@ def run(self):
sock, _ = self.listening_socket.accept()
sock.settimeout(60) # XXX needed and/or correct value?

connection = ServerConnectionManager(sock, self.handle_request)
connection = ServerConnectionManager(
sock, self.handle_request, self.max_concurrent_streams
)
self.connections.put(weakref.ref(connection))
self.spawn_thread(
target=connection.run_forever, name=f"grpc server connection [{sock}]"
@@ -165,9 +172,11 @@
class ServerChannel:
"""Simple server channel encapsulating incoming connection management."""

def __init__(self, host, port, ssl, spawn_thread, handle_request):
def __init__(
self, host, port, ssl, spawn_thread, handle_request, max_concurrent_streams=100
):
self.conn_pool = ServerConnectionPool(
host, port, ssl, spawn_thread, handle_request
host, port, ssl, spawn_thread, handle_request, max_concurrent_streams
)

def start(self):
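
For illustration, a minimal sketch of constructing a ServerChannel with the new max_concurrent_streams parameter. The spawn_thread and handle_request callables are hypothetical stand-ins for what the nameko-grpc server entrypoint would normally supply, and passing ssl=None for a plaintext listener is an assumption:

# Hedged sketch: wiring the new max_concurrent_streams knob through ServerChannel.
from threading import Thread

from nameko_grpc.channel import ServerChannel


def spawn_thread(target, name=None):
    # hypothetical stand-in: run each connection's event loop in a daemon thread
    Thread(target=target, name=name, daemon=True).start()


def handle_request(request_stream, response_stream):
    # hypothetical stand-in for the method dispatch done by the entrypoint
    pass


channel = ServerChannel(
    host="0.0.0.0",
    port=50051,
    ssl=None,  # assumption: no TLS
    spawn_thread=spawn_thread,
    handle_request=handle_request,
    max_concurrent_streams=200,  # advertised via the connection's local settings
)
channel.start()
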
112 changes: 98 additions & 14 deletions nameko_grpc/connection.py
@@ -1,5 +1,6 @@
# -*- coding: utf-8 -*-
import itertools
import logging
import select
import sys
from collections import deque
@@ -8,7 +9,7 @@
from threading import Event

from grpc import StatusCode
from h2.config import H2Configuration
from h2.config import DummyLogger, H2Configuration
from h2.connection import H2Connection
from h2.errors import ErrorCodes
from h2.events import (
@@ -37,9 +38,30 @@
log = getLogger(__name__)


class H2Logger(DummyLogger):
Comment (Collaborator Author): No logger interface is available, so subclassing the dummy logger as a stand-in.

"""
Provide logger to H2 matching required interface
"""

def __init__(self, logger: logging.Logger):
super().__init__()
self.logger = logger

def debug(self, *vargs, **kwargs):
self.logger.debug(*vargs, **kwargs)

def trace(self, *vargs, **kwargs):
# log level below debug
self.logger.log(5, *vargs, **kwargs)


SELECT_TIMEOUT = 0.01


class ConnectionTerminatingError(Exception):
pass


class ConnectionManager:
"""
Base class for managing a single GRPC HTTP/2 connection.
@@ -49,21 +71,24 @@ class ConnectionManager:
by subclasses.
"""

def __init__(self, sock, client_side):
def __init__(self, sock, client_side, max_concurrent_streams=100):
self.sock = sock

config = H2Configuration(client_side=client_side)
h2_logger = H2Logger(log.getChild("h2"))
config = H2Configuration(client_side=client_side, logger=h2_logger)
self.conn = H2Connection(config=config)
self.conn.local_settings.max_concurrent_streams = max_concurrent_streams

self.receive_streams = {}
self.send_streams = {}

self.run = True
self.stopped = Event()
self.terminating = False

@property
def alive(self):
return not self.stopped.is_set()
return not self.stopped.is_set() and not self.terminating

@contextmanager
def cleanup_on_exit(self):
@@ -83,23 +108,25 @@ def cleanup_on_exit(self):
if send_stream.closed:
continue # stream.close() is idempotent but this prevents the log
log.info(
f"Terminating send stream {send_stream}"
f"Terminating send stream {send_stream.stream_id}"
Comment (Collaborator Author): Stream IDs are much more useful than the object address.

f"{f' with error {error}' if error else ''}."
)
send_stream.close(error)
for receive_stream in self.receive_streams.values():
if receive_stream.closed:
continue # stream.close() is idempotent but this prevents the log
log.info(
f"Terminating receive stream {receive_stream}"
f"Terminating receive stream {receive_stream.stream_id}"
f"{f' with error {error}' if error else ''}."
)
receive_stream.close(error)
self.sock.close()
self.stopped.set()
log.debug(f"connection terminated {self}")

def run_forever(self):
"""Event loop."""
log.debug(f"connection initiated {self}")
self.conn.initiate_connection()

with self.cleanup_on_exit():
Expand All @@ -108,6 +135,9 @@ def run_forever(self):

self.on_iteration()

if not self.run:
break

self.sock.sendall(self.conn.data_to_send())
ready = select.select([self.sock], [], [], SELECT_TIMEOUT)
if not ready[0]:
@@ -142,8 +172,10 @@ def run_forever(self):
self.connection_terminated(event)

def stop(self):
self.run = False
self.stopped.wait()
self.conn.close_connection()
Comment (Collaborator Author): Sends a GOAWAY to the server/client.

Comment (Member): Nice

self.terminating = True
log.debug("waiting for connection to terminate (Timeout 5s)")
self.stopped.wait(5)

def on_iteration(self):
"""Called on every iteration of the event loop.
@@ -155,6 +187,16 @@ def on_iteration(self):
self.send_headers(stream_id)
self.send_data(stream_id)

if self.terminating:
send_streams_closed = all(
stream.exhausted for stream in self.send_streams.values()
)
receive_streams_closed = all(
stream.exhausted for stream in self.receive_streams.values()
)
if send_streams_closed and receive_streams_closed:
self.run = False

def request_received(self, event):
"""Called when a request is received on a stream.

@@ -199,6 +241,7 @@ def window_updated(self, event):
Any data waiting to be sent on the stream may fit in the window now.
"""
log.debug("window updated, stream %s", event.stream_id)
self.send_headers(event.stream_id)
Comment (Collaborator Author): Need to send headers before data, if they exist.

Comment (Member): Was this a pre-existing bug that you happened to identify as part of this work? If so, how did you identify it?

Comment (Collaborator Author): Arguably an existing bug that came up through existing tests after changing the stop method to shut down with a GOAWAY frame first.

self.send_data(event.stream_id)

def stream_ended(self, event):
@@ -210,16 +253,22 @@ def stream_ended(self, event):
receive_stream = self.receive_streams.pop(event.stream_id, None)
if receive_stream:
receive_stream.close()
# send_stream = self.send_streams.pop(event.stream_id, None)
# if send_stream:
# send_stream.close()

def stream_reset(self, event):
"""Called when an incoming stream is reset.

Close any `ReceiveStream` that was opened for this stream.
Close any `ReceiveStream` or `SendStream` that was opened for this stream.
"""
log.debug("stream reset, stream %s", event.stream_id)
receive_stream = self.receive_streams.pop(event.stream_id, None)
if receive_stream:
receive_stream.close()
send_stream = self.send_streams.pop(event.stream_id, None)
if send_stream:
send_stream.close()

def settings_changed(self, event):
log.debug("settings changed")
@@ -235,9 +284,22 @@ def trailers_received(self, event):

receive_stream.trailers.set(*event.headers, from_wire=True)

def connection_terminated(self, event):
log.debug("connection terminated")
self.run = False
def connection_terminated(self, event: ConnectionTerminated):
"""H2 signals a connection terminated event after receiving a GOAWAY frame

If no error has occurred, flag termination and initiate a graceful termination
allowing existing streams to finish sending/receiving.

If an error has occurred then close down immediately.
"""
log.debug(f"received GOAWAY with error code {event.error_code}")
if event.error_code not in (ErrorCodes.NO_ERROR, ErrorCodes.ENHANCE_YOUR_CALM):
log.debug("connection terminating immediately")
self.terminating = True
self.run = False
else:
log.debug("connection terminating")
self.terminating = True

def send_headers(self, stream_id, immediate=False):
"""Attempt to send any headers on a stream.
@@ -269,6 +331,14 @@ def send_data(self, stream_id):
# has been completely sent
return

# When a stream is closed, a STREAM_END item or ERROR is placed in the queue.
# If we never read from the stream again, these are not surfaced, and the
# stream is never exhausted.
# Because we shortcut sending data if headers haven't been set yet, we need
# to manually flush the queue, surfacing the end/error, and ensuring the
# queue exhausts (and we can terminate).
send_stream.flush_queue_to_buffer()
Comment (Member): Nice find. Can you add a regression test case for this?

Comment (Collaborator Author): Will have a look. It came up through existing tests, so I'll work out which ones cover it.

Comment (Collaborator Author): Hmm, not easily. I added it because the tests were timing out (hanging) without it, but removing it now they still appear to pass. I think it still makes sense to have it, but it's tricky to test specifically.


if not send_stream.headers_sent:
# don't attempt to send any data until the headers have been sent
return
@@ -331,7 +401,19 @@ def send_request(self, request_headers):
over the response.

Invocations are queued and sent on the next iteration of the event loop.

Raises ConnectionTerminatingError if connection is terminating. Check
connection.is_alive() before initiating send_request.

Note:
We are handling termination and raising TerminatingError here as the
underlying library H2 doesn't do this. If H2 ever begins handling graceful
shutdowns, this logic will need altering.
"""
if self.terminating:
raise ConnectionTerminatingError(
"Connection is terminating. No new streams can be initiated"
)
stream_id = next(self.counter)

request_stream = SendStream(stream_id)
@@ -427,8 +509,10 @@ class ServerConnectionManager(ConnectionManager):
Extends the base `ConnectionManager` to handle incoming GRPC requests.
"""

def __init__(self, sock, handle_request):
super().__init__(sock, client_side=False)
def __init__(self, sock, handle_request, max_concurrent_streams=100):
super().__init__(
sock, client_side=False, max_concurrent_streams=max_concurrent_streams
)
self.handle_request = handle_request

def request_received(self, event):
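
The H2Logger added above forwards h2's trace output at numeric level 5, one step below DEBUG. A short sketch of surfacing that output with the standard logging module; the logger name assumes log = getLogger(__name__) in nameko_grpc/connection.py, so the child created by log.getChild("h2") is nameko_grpc.connection.h2:

# Hedged sketch: enable the sub-DEBUG "trace" records emitted through H2Logger.
import logging

logging.addLevelName(5, "TRACE")
logging.basicConfig(level=5)  # handlers must let level-5 records through
logging.getLogger("nameko_grpc.connection.h2").setLevel(5)
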
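To make the new send_request contract concrete, a minimal, hypothetical sketch of guarding against a terminating connection on the client side. The get_connection factory and its force_new flag are illustrative stand-ins, not part of this PR; the liveness check uses the alive property defined in the diff above:

# Hedged sketch: avoid opening new streams on a terminating connection, and
# handle the race where a GOAWAY arrives between the check and the send.
from nameko_grpc.connection import ConnectionTerminatingError


def send_on_live_connection(get_connection, request_headers):
    # get_connection is a hypothetical factory returning a client-side
    # connection manager exposing `alive` and `send_request`.
    connection = get_connection()
    if not connection.alive:
        connection = get_connection(force_new=True)
    try:
        return connection.send_request(request_headers)
    except ConnectionTerminatingError:
        # termination began after the check; no new streams may be created
        # on this connection, so retry once on a fresh one
        return get_connection(force_new=True).send_request(request_headers)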