
ostree pull hangs in low bandwidth environments (libcurl/http2?) #1362

Open
jlebon opened this issue Dec 5, 2017 · 4 comments
@jlebon (Member)

jlebon commented Dec 5, 2017

Using the libcurl backend on a machine with low bandwidth (< 500 kB/s here), it's fairly easy to get ostree to hang during a pull operation. I've reproduced this in two separate environments, and the issue is present in git master, the latest release, and the previous release. Disabling pipelining works around it.

Once it apparently hangs, top shows CPU usage pegged at 100%. Using strace and G_MESSAGES_DEBUG=all shows that it's not actually waiting on a specific resource. In fact, data does come in, but it seems like only one of the eight open sockets is attended to, while the POLLIN events for the other sockets are completely ignored yet continuously polled.

...
poll([{fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}, {fd=18, events=POLLIN}, {fd=19, events=POLLIN}, {fd=20, events=POLLIN}, {fd=21, events=POLLIN}, {fd=22, events=POLLIN}], 9, 0) = 7 ([{fd=8, revents=POLLIN}, {fd=16, revents=POLLIN}, {fd=17, revents=POLLIN}, {fd=18, revents=POLLIN}, {fd=20, revents=POLLIN}, {fd=21, revents=POLLIN}, {fd=22, revents=POLLIN}])
poll([{fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}, {fd=18, events=POLLIN}, {fd=19, events=POLLIN}, {fd=20, events=POLLIN}, {fd=21, events=POLLIN}, {fd=22, events=POLLIN}], 9, 0) = 7 ([{fd=8, revents=POLLIN}, {fd=16, revents=POLLIN}, {fd=17, revents=POLLIN}, {fd=18, revents=POLLIN}, {fd=20, revents=POLLIN}, {fd=21, revents=POLLIN}, {fd=22, revents=POLLIN}])
(the same poll call repeats indefinitely with the same result)
...

Notice how poll indicates that seven of the sockets have data ready to be read. In other traces I've seen, the one socket that gets serviced is the one that is not ready, which of course fails with EAGAIN. Running with:

diff --git a/src/libostree/ostree-fetcher-curl.c b/src/libostree/ostree-fetcher-curl.c
index 58835529..93314d10 100644
--- a/src/libostree/ostree-fetcher-curl.c
+++ b/src/libostree/ostree-fetcher-curl.c
@@ -455,6 +455,7 @@ event_cb (int fd, GIOCondition condition, gpointer data)
     (condition & G_IO_IN ? CURL_CSELECT_IN : 0) |
     (condition & G_IO_OUT ? CURL_CSELECT_OUT : 0);

+  g_debug ("event %d on socket %d!", action, fd);
   (void)curl_multi_socket_action (fetcher->multi, fd, action, &fetcher->curl_running);
   check_multi_info (fetcher);
   if (fetcher->curl_running > 0)

does show that we're passing on those notifications to libcurl.

@cgwalters cgwalters added the bug label Dec 5, 2017
@cgwalters (Member)

Ah, hmm. Yes, I've seen this too, even with a fast connection, but I hadn't diagnosed it this far.

cgwalters added a commit to cgwalters/ostree that referenced this issue Dec 6, 2017
This seems to work around
ostreedev#1362

Though I'm not entirely sure why yet. But at least with this it'll be easier for
people to work around things locally.
@cgwalters (Member)

I don't quite understand why this suddenly exploded 💣 😢... possibly it's related to the server or libcurl version?

Anyway, one workaround appears to be #1368
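For reference, the workaround amounts to disabling HTTP/2 for the affected remote. A sketch using the `http2=false` remote option mentioned later in this thread (the remote name `faw27` is just an example; verify the option against your ostree version's ostree.repo-config documentation):

```shell
# Set http2=false when adding a remote
ostree remote add --set=http2=false faw27 https://dl.fedoraproject.org/ostree/27/

# Or disable it for an already-configured remote
ostree config set 'remote "faw27".http2' false
```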

rh-atomic-bot pushed a commit that referenced this issue Dec 7, 2017
This seems to work around
#1362

Though I'm not entirely sure why yet. But at least with this it'll be easier for
people to work around things locally.

Closes: #1368
Approved by: jlebon
@cgwalters cgwalters changed the title ostree pull hangs in low bandwidth environments ostree pull hangs in low bandwidth environments (libcurl/http2?) Dec 14, 2017
@miabbott
Collaborator

miabbott commented Jan 2, 2018

Starting to see reports of this in the wild: https://twitter.com/Erstejahre/status/947795020597227521

LorbusChris pushed a commit to LorbusChris/ostree-spec that referenced this issue Oct 23, 2018
I'm beating Colin to the punch here and making use of the new
`--disable-http2`. What pushed me over the edge was the fact that I
literally could not upgrade my Atomic Workstation with a simple
`rpm-ostree upgrade` without hitting:

ostreedev/ostree#1362

I originally thought this only occurred in low-bandwidth environments,
when in fact it still happens even after upgrading my Internet bundle
and can actually be reproduced very easily in environments where
bandwidth is not an issue. E.g. this currently hangs in the latest
Fedora 27 Atomic Host provisioned in the cloud:

    ostree remote add --no-gpg-verify faw27 https://dl.fedoraproject.org/ostree/27/
    ostree pull faw27 fedora/27/x86_64/workstation

Until we investigate deeper, let's just play it safe and disable HTTP2.
This should also fix the HTTP2 framing layer people have been hitting.

Let's get this in to confirm it doesn't break the rdgo streams, and then
we can backport the patch for f27 to make sure it gets into the next
TWR?
@vtolstov

vtolstov commented Dec 4, 2019

I hit this issue twice today even with http2=false; I had to run rpm-ostree upgrade three times.
