Bug: Caddy freezes when syslog server not available #4083

Closed
sopleb opened this issue Mar 23, 2021 · 4 comments · Fixed by #4172
Labels
bug 🐞 Something isn't working
Milestone

Comments

@sopleb

sopleb commented Mar 23, 2021

Hello,

I had a power outage over the weekend and noticed an issue with one of my sites. When I looked into it, Caddy was locking up whenever I browsed the one domain that uses a syslog output.

somewebsite.com {
        reverse_proxy host:port
        log {
                output net syslog-server:port
        }
}

That is the config being used. Is this behavior expected, or am I breaking something?

@simaotwx
Contributor

I'm not super familiar with the Caddy codebase but here's what I'm seeing:

In https://github.com/caddyserver/caddy/blob/master/modules/logging/netwriter.go#L78 it attempts to connect to the server, and that blocking connection attempt is what causes Caddy to lock up.
Called from https://github.com/caddyserver/caddy/blob/master/logging.go#L656
Which is called from https://github.com/caddyserver/caddy/blob/master/logging.go#L147
Which is called from https://github.com/caddyserver/caddy/blob/master/logging.go#L102
Which in turn is called from https://github.com/caddyserver/caddy/blob/master/caddy.go#L360

I'm not sure how to solve this. We could set up logging in a goroutine and use the default logger until then, but that could cause some messages to get lost. Or we could use a context deadline, but that wouldn't solve the downtime issue.
Maybe a combination of both could be the solution: wait a bit and, if nothing happens, use the default logger until the connection succeeds.

I think the others can give more insight into this.

@mholt
Member

mholt commented Apr 5, 2021

I think the best solution would be to not configure Caddy to use a log server that isn't available?

@v-rosa
Contributor

v-rosa commented Apr 10, 2021

@mholt consider the case where the syslog server is available initially but suddenly becomes unreachable. This problem will persist and impact the Caddy server, right?

@mholt
Member

mholt commented Apr 12, 2021

I haven't had much time to spend on this, but I pushed a potential patch in #4172. Could you all please try it out? Configure a dial timeout and it should unblock the server so it can keep working, though it might still be slow once in a while. Let me know the details of your experience with it and we can smooth out any rough edges.

@mholt mholt added the bug 🐞 Something isn't working label Apr 12, 2021
@mholt mholt added this to the v2.4.0 milestone Apr 12, 2021
@mholt mholt linked a pull request Apr 29, 2021 that will close this issue
@mholt mholt modified the milestones: v2.4.0, v2.4.1 Apr 30, 2021
mholt added a commit that referenced this issue May 19, 2021
* logging: Implement dial timeout for net writer (fix #4083)

* Limit how often redials are attempted

This should cause dial blocking to occur only once every 10 seconds at most, but it also means the logger connection might be down for up to 10 seconds after it comes back online; oh well. We shouldn't block for DialTimeout at every single log emission.

* Clarify offline behavior
4 participants