-
-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Still happens: Dynamic Backends can only be added to warm VCLs #117
Comments
@delthas apologies for having lost track of this issue. If you have time, could you please (re)confirm that it still happens with latest Varnish-Cache? In particular, I would like to exclude the possibility of this being related to varnishcache/varnish-cache#4108, for which the fix went in on Monday. |
@nigoroll I can confirm this still happens with Varnish 7.5.0. Varnish 7.5.0 crashed here this morning with:
Panic.show in varnishadm reported:
The log of the varnish docker container contained more or less the same information:
|
Could anyone who can reproduce this issue please run varnishd with this little patch and report the Panic? diff --git a/bin/varnishd/cache/cache_vrt_vcl.c b/bin/varnishd/cache/cache_vrt_vcl.c
index 82e6e10a5..a17db5726 100644
--- a/bin/varnishd/cache/cache_vrt_vcl.c
+++ b/bin/varnishd/cache/cache_vrt_vcl.c
@@ -249,7 +249,7 @@ VRT_AddDirector(VRT_CTX, const struct vdi_methods *m, void *priv,
return (vcldir_surplus(vdir));
if (!temp->is_warm && temp != VCL_TEMP_INIT)
- WRONG("Dynamic Backends can only be added to warm VCLs");
+ WRONG(temp->name);
return (vdir->dir);
} |
I am trying hard to reproduce this with current code (varnishcache/varnish-cache@7292f92 and vmod_dynamic e1b9307), no success. Test vcl which I am using vcl 4.1;
import dynamic;
import directors;
backend proforma none;
sub vcl_init {
new r1 = dynamic.resolver();
r1.set_resolution_type(RECURSING); # tried with and without
new d = dynamic.director(
resolver = r1.use(),
ttl_from = dns
);
new rr = directors.round_robin();
rr.add_backend(d.backend("www.uplex.de"));
}
sub vcl_recv {
return (synth(200));
}
sub vcl_synth {
set resp.body = "" + rr.backend().resolve() + " " +
d.backend("uplex.de").resolve() + """
""";
return (deliver);
} start:
vcl load loop:
curl loop (in a second terminal):
I just don't get the panic, also not with this addition to deliberately delay resolution: diff --git a/src/dyn_resolver_getdns.c b/src/dyn_resolver_getdns.c
index c674f1b..f97b761 100644
--- a/src/dyn_resolver_getdns.c
+++ b/src/dyn_resolver_getdns.c
@@ -32,6 +32,7 @@
//#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
+#include <unistd.h>
#ifdef __FreeBSD__
#include <arpa/inet.h>
@@ -203,6 +204,7 @@ getdns_lookup(struct VPFX(dynamic_resolver) *r,
ret = getdns_address_sync(c->context, node, NULL, &state->response);
errchk(ret);
+ usleep(1000 * 1000);
return (getdns_common_lookup_check(state));
}
So, can anyone please help me with a reproducer? |
FTR: The container build is https://github.com/varnish/docker-varnish/blob/master/fresh/debian/Dockerfile#L9C26-L9C66 which uses 5dc09f5, which is a minor change on top of 47bfcf5 So if this was root-caused in dynamic, it should still happen. |
So it turns out, that - still - we can deadlock raching between VCL events and director deletions (see varnishcache/varnish-cache#4140). This commit was originally written to fix #108, but it now enables us to take a different route again: Do not pthread_join() lookup threads, but rather detach them and make sure _they_ return the reference before terminating. The remainder of the commit message is an edit in light of the new insights: Our lookup threads may continue for a bit after a VCL_COLD event has been received: dom_event() sets dom->status to DYNAMIC_ST_DONE while the lookup thread may be blocking in the resolver or updating a domain. So, lookup threads might try to create backends even after the VCL has gone cold, in which case VRT_AddDirector() would panic with "Dynamic Backends can only be added to warm VCLs" - _unless_ a reference is present on the VCL to make the VCL state change go through VCL_TEMP_COOLING, in which case VRT_AddDirector() just returns NULL. vmod_dynamic took a VCL reference since 99912bc back in May 2019, but before the merge of varnishcache/varnish-cache#4037 it did not work correctly, because the VCL reference was returned before all threads had terminated. The naive idea to solve this would be to take/release a VCL reference in each lookup thread (or when starting/stopping it), but references can only be taken from the CLI context (ASSERT_CLI() in VRT_VCL_Prevent_Discard). Also, to support use of the .resolve() type method from vcl_init{}, we needed to depart from the original concept of running resolver threads only for warm vcls. We get out of this situation by using a reference on top of the VCL reference, which we can take even before we have the VCL reference. The VCL reference is returned once all internal "referef"s are returned. Ref #108 Probably fixes #117 also
Oops, I did not mean to close this issue before confirmation. (too much github magic, closing issues across projects is, hm, interesting) |
I'm getting the same assert again, but in a slightly different way (even with the fix).
The stack is still:
I'm stressing Varnish + libvmod-dynamic with a lot of VCL reloads. Everything usually works fine for a while (sometimes hours), until Varnish starts to loop on this error, every ~15s to ~1 minute. So, no issues for hours and then once I get this panic, it repeats every ~15s to ~1min: child crashes, child restarts, child crashes, ...
Here's an excerpt of the full logs:
It continued crashing in a loop indefinitely for multiple days. I managed to attach to the process and the VCL was indeed COOL at the time of the panic.
Originally posted by @delthas in #108 (comment)
The text was updated successfully, but these errors were encountered: