fix webui log hanging bug #1914
While debugging this issue I ended up cleaning up a bunch more logging things; everything is just a little bit nicer now (flatfs was importing its own separate instance of go-log, which was causing weirdness).
When doing
go get doesn't solve it, thoughts? |
@diasdavid my bad, tired brain forgot to vendor correctly. |
thank you @whyrusleeping :) so, 1st thing I noticed were these logs when I opened the webui:
08:08:02.161 ERROR flatfs: too many open files, retrying in %dms0 flatfs.go:119
08:08:02.161 ERROR flatfs: too many open files, retrying in %dms100 flatfs.go:119
08:08:02.263 ERROR flatfs: too many open files, retrying in %dms200 flatfs.go:119
08:08:03.562 ERROR core/serve: Path Resolve error: no link named "fontawesome-webfont.woff2" under QmWjympWW8hpP5Jgfu66ZrqTMykUBjGbFQF2XgfChau5NZ gateway_handler.go:479
08:08:03.782 ERROR flatfs: too many open files, retrying in %dms0 flatfs.go:119
08:08:03.782 ERROR flatfs: too many open files, retrying in %dms100 flatfs.go:119
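The literal `%dms0` and `%dms100` in those log lines look like a format string that was never expanded. One plausible explanation, an assumption that nothing in this thread confirms, is that the message went through a print-style call rather than a printf-style one. The sketch below reproduces the effect with the standard `fmt` package only; the helper names `viaPrint` and `viaPrintf` are hypothetical and are not taken from flatfs or go-log.

```go
package main

import "fmt"

// viaPrint concatenates its operands the way a print-style call does,
// so the %d verb in msg is left in the output untouched.
func viaPrint(msg string, ms int) string {
	return fmt.Sprint(msg, ms)
}

// viaPrintf treats msg as a format string and substitutes ms for %d.
func viaPrintf(msg string, ms int) string {
	return fmt.Sprintf(msg, ms)
}

func main() {
	msg := "too many open files, retrying in %dms"
	fmt.Println(viaPrint(msg, 100))  // too many open files, retrying in %dms100
	fmt.Println(viaPrintf(msg, 100)) // too many open files, retrying in 100ms
}
```

If that is what is happening, the message is only cosmetically wrong; the retry delay itself is unaffected.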
Been testing with js-ipfs-api latest and very happy with the results, because it works again :) (0.3.8 made the response parser go nuts) |
@diasdavid yeah, I notice that on OS X as well. It's that darn 256 fd limit. Nothing bad should actually be happening though; you'll see an uglier error if it does.
@whyrusleeping the new webui with logging disabled is:
case <-cn.CloseNotify():
case <-ctx.Done():
}
cancel()
is it okay to cancel a context twice?
perfectly fine :)
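For reference, a minimal sketch of why the double cancel is safe. It uses the standard library `context` package (an assumption; the PR may import a different context package), where calls to a `CancelFunc` after the first one do nothing. The function name `doubleCancelIsSafe` is hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// doubleCancelIsSafe cancels the same context twice and reports whether the
// context ends up in the ordinary cancelled state without panicking.
func doubleCancelIsSafe() bool {
	ctx, cancel := context.WithCancel(context.Background())

	cancel() // first call cancels ctx and releases its resources
	cancel() // second call is a no-op

	<-ctx.Done() // already closed, so this returns immediately
	return ctx.Err() == context.Canceled
}

func main() {
	fmt.Println(doubleCancelIsSafe())
}
```

This is why an unconditional `cancel()` after the `select` is harmless even when `ctx.Done()` was the branch that fired.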
looks like CircleCI was trying to run fuse tests? that's weird
License: MIT Signed-off-by: Jeromy <jeromyj@gmail.com>
@whyrusleeping this LGTM. that one test failed many times. hangs, or this:
@whyrusleeping worried these test failures (hangs!) are related to this PR. can't get things to pass, and even on the off chance it does now, I don't have confidence in it. I'll give it a few hours for travis to shake up, then rerun everything a few times
I'm seeing similar tests failing (the stdin pipe one) on other PRs. While that's not a good thing, I'm slightly more confident that it's not this PR's fault
@jbenet clean tests (minus the attached stdin one that i'm seeing elsewhere as well) |
the stdin failure appears to just be caused by the daemon taking a little longer to start up now (not sure why that is). Locally, it takes four or five seconds to start up the daemon. The test is set to fail after ten seconds, so a particularly slow and stressed travis CI machine will likely hit this with some frequency
Actually, thinking about it a bit more: that test bootstraps to our bootstrap nodes, and they have been really slow lately (@lgierth mentioned they haven't been rebooted in a while). These failures are very likely a result of bootstrapping taking way longer than usual.
until we are satisfied with ipfs's reliability over long periods we should continue periodically rebooting the nodes. it's a service people depend on.
we should get to the bottom of these tests failing before merging anything else. either it's a problem that snuck in, or it's a service problem that needs to be fixed.
ok looks good now |
@jbenet in the future, could you just say "RFM" or something and let me rebase and merge it in so we can get a straight line of history? |
Sure. Feel free to fix it and force push to master — On Sun, Nov 8, 2015 at 11:55 PM, Jeromy Johnson notifications@github.com
Just drop the heads here for reference — On Mon, Nov 9, 2015 at 12:00 AM, Juan Batiz-Benet juan@benet.ai wrote:
Can just do it for future PRs. force pushing to master is very scary for many reasons other than just "i might screw something up"
The webui can currently cause the entire daemon to deadlock because the writer set up when it queries the log endpoints is never closed (and will hang on writes). This causes all calls to any log function to hang, which subsequently deadlocks the daemon.
I believe this is the root cause of 1896, and after this PR, I am unable to repro by messing around via the webui.
Interestingly, I was never able to get a repro via curl... only in-browser.
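The failure mode described above can be illustrated with a small standalone sketch. This is not the code from this PR: an `io.Pipe` with no reader stands in for a log writer whose consumer has gone away, and the function name `writeUnblocksOnClose` is hypothetical. It shows a write blocking indefinitely, then being released once the writer is closed in response to a cancelled context.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"time"
)

// writeUnblocksOnClose reports whether a write stuck on an unread pipe
// returns once the pipe is closed in response to context cancellation.
func writeUnblocksOnClose() bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Nobody ever reads from the pipe, so writes to pw block.
	_, pw := io.Pipe()

	done := make(chan error, 1)
	go func() {
		_, err := pw.Write([]byte("log event\n"))
		done <- err
	}()

	// Close the writer as soon as the context is cancelled.
	go func() {
		<-ctx.Done()
		pw.Close()
	}()

	// The write should still be blocked at this point.
	select {
	case <-done:
		return false
	case <-time.After(50 * time.Millisecond):
	}

	cancel() // simulate the client disconnecting

	select {
	case err := <-done:
		return err == io.ErrClosedPipe
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println(writeUnblocksOnClose())
}
```

In a real logger, a blocked write like the first one typically happens while other log calls are waiting on it, which is how a single stalled consumer can take the whole process down with it.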