fix(providers): Sanitize nuts properties from configuration #221

Merged

darkweak
Owner

@darkweak darkweak added the bug Something isn't working label Jun 17, 2022
@darkweak darkweak self-assigned this Jun 17, 2022
@darkweak darkweak force-pushed the fix/cache/providers/nuts/sanitize-properties-from-configuration branch from 2644c8d to f60c05d Compare June 18, 2022 00:42
@darkweak darkweak force-pushed the fix/cache/providers/nuts/sanitize-properties-from-configuration branch from 426d87e to abeadcb Compare June 18, 2022 11:50
@darkweak darkweak force-pushed the fix/cache/providers/nuts/sanitize-properties-from-configuration branch from 536c4cd to c37739f Compare June 18, 2022 12:17
@darkweak
Owner Author

@mattvb91 can you try with the flags --with github.com/darkweak/souin/plugins/caddy@c37739fb65697ca15d3c872491bf448ced8c2469 --with github.com/darkweak/souin@c37739fb65697ca15d3c872491bf448ced8c2469?

@mattvb91

mattvb91 commented Jun 18, 2022

@darkweak I can now see the keys successfully purged from /__cache/souin/surrogate_keys; however, the pages are still cached and now show up in the /__cache/souin list.

EDIT: also, the Surrogate-Key header doesn't seem to reach the client anymore. It looks like it is removed in transit once the key has been seen?

@darkweak
Owner Author

darkweak commented Jun 18, 2022

What's your configuration file? What requests are you sending? That's really weird; the E2E tests are updated to ensure everything works as expected.

@mattvb91

So the order is as follows. Restart Caddy, so that both /__cache/souin/surrogate_keys and /__cache/souin return an empty list.

curl http://random-c9vwf2ke.localhost/first-product

This request goes to a Next.js container, which in turn calls an upstream Laravel API. So I end up with a frontend HTML response to cache and an API JSON response to cache:

/__cache/souin/surrogate_keys

{
"STALE_product-5f65fd28-bd1c-418e-b0df-2b63f45260df": ",GET-random-c9vwf2ke.localhost-/api/shop/get/first-product,STALE_GET-random-c9vwf2ke.localhost-/first-product{-VARY-}Accept-Encoding:",
"product-5f65fd28-bd1c-418e-b0df-2b63f45260df": ",GET-random-c9vwf2ke.localhost-/api/shop/get/first-product,GET-random-c9vwf2ke.localhost-/first-product{-VARY-}Accept-Encoding:"
}
/__cache/souin
[
"GET-random-c9vwf2ke.localhost-/first-product{-VARY-}Accept-Encoding:",
"STALE_GET-random-c9vwf2ke.localhost-/api/shop/get/first-product",
"GET-random-c9vwf2ke.localhost-/api/shop/get/first-product",
"STALE_GET-random-c9vwf2ke.localhost-/first-product{-VARY-}Accept-Encoding:"
]

Both the API and frontend responses have been tagged with the product-5f65fd28-bd1c-418e-b0df-2b63f45260df surrogate key as can be seen in the surrogate_keys response.
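
For readers following along, here is a minimal sketch of how the surrogate_keys response above can be read. It assumes only what the output shows: each value is a comma-joined string of cache keys with a leading comma. The function name parse_surrogate_index is hypothetical and is not part of Souin.

```python
import json

# Sample copied from the /__cache/souin/surrogate_keys response above.
SAMPLE = """
{
  "STALE_product-5f65fd28-bd1c-418e-b0df-2b63f45260df": ",GET-random-c9vwf2ke.localhost-/api/shop/get/first-product,STALE_GET-random-c9vwf2ke.localhost-/first-product{-VARY-}Accept-Encoding:",
  "product-5f65fd28-bd1c-418e-b0df-2b63f45260df": ",GET-random-c9vwf2ke.localhost-/api/shop/get/first-product,GET-random-c9vwf2ke.localhost-/first-product{-VARY-}Accept-Encoding:"
}
"""


def parse_surrogate_index(body: str) -> dict:
    """Map each surrogate key to the list of cache keys stored under it."""
    raw = json.loads(body)
    # Values start with a comma, so splitting yields an empty first piece; drop it.
    return {sk: [k for k in joined.split(",") if k] for sk, joined in raw.items()}


index = parse_surrogate_index(SAMPLE)
for surrogate_key, cache_keys in index.items():
    print(surrogate_key)
    for cache_key in cache_keys:
        print("   ", cache_key)
```

Note that a plain split on commas only works here because none of these cache keys contains a comma.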

curl --location --request PURGE 'http://localhost/__cache/souin' \
--header 'Surrogate-Key: product-5f65fd28-bd1c-418e-b0df-2b63f45260df' \
--header 'Host: localhost'

Now the API response still has a STALE_GET entry:

/__cache/souin
[
"STALE_GET-random-c9vwf2ke.localhost-/api/shop/get/first-product",
]

While the surrogate keys list is empty:

/__cache/souin/surrogate_keys
{
}

Looking at it, this may be caused by {-VARY-}Accept-Encoding: being appended to the cache key, which then ends up being a different key?

Config:

"cache":{
   "api":{
      "basepath":"/__cache",
      "souin":{
         "basepath":"/souin",
         "enable":true
      }
   },
   "cache_keys":{
      "/_next\\/static/":{
         "disable_host":true
      }
   },
   "cdn":{
      "dynamic":true,
      "strategy":"hard"
   },
   "distributed":true,
   "log_level":"debug",
   "nuts":{
      "path":"/tmp/nuts-souin"
   },
   "stale":"0s",
   "ttl":"3600s"
}

@darkweak
Owner Author

@mattvb91 I'm not able to reproduce this, so I'll try to fix it blindly. Can you try with the flags --with github.com/darkweak/souin/plugins/caddy@17c29c8107cf566a89931a114fdf1137f354336b --with github.com/darkweak/souin@17c29c8107cf566a89931a114fdf1137f354336b please?

@mattvb91

No luck with that; the entry still gets stuck in the list:

/__cache/souin

"GET-random-c9vwf2ke.localhost-/first-product{-VARY-}Accept-Encoding:gzip, deflate, br",

It doesn't get added back into the surrogate key list at that point either. I'm assuming the cache entry doesn't hold the Surrogate-Key header anymore (it used to be sent but went missing in one of the last releases), so there is nothing left in the list to flush by surrogate key, yet that cache entry is still being served.

Is there any way to disable the {-VARY-}Accept-Encoding:gzip, deflate, br from going into the cache key?

@darkweak
Owner Author

In my tests I didn't use a cache key containing a comma followed by a space; that's why it worked on my side. My bad.
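
A minimal sketch of why a comma inside a cache key can break a comma-joined index like the one returned by /__cache/souin/surrogate_keys. This illustrates the failure mode only and is not Souin's actual code: the helper names are hypothetical, and percent-encoding is just one possible remedy (the encoded entry that appears later in this thread hints at something similar, but that is an inference).

```python
from urllib.parse import quote, unquote

# Cache key copied from the report above; note the ", " inside the header value.
cache_key = "GET-random-c9vwf2ke.localhost-/first-product{-VARY-}Accept-Encoding:gzip, deflate, br"


def join_naive(keys):
    """Join keys with commas, with a leading comma as in the API output."""
    return "," + ",".join(keys)


def split_naive(joined):
    """Split on every comma, including commas that belong to a key."""
    return [k for k in joined.split(",") if k]


def join_encoded(keys):
    """Percent-encode each key first so it can no longer contain a comma."""
    return "," + ",".join(quote(k, safe="") for k in keys)


def split_encoded(joined):
    """Split on commas, then decode each key back to its original form."""
    return [unquote(k) for k in joined.split(",") if k]


naive = split_naive(join_naive([cache_key]))
encoded = split_encoded(join_encoded([cache_key]))

print(naive)    # fragments of the key; the original key is not among them
print(encoded)  # the original key, intact
```

With the naive variant, a purge would look up fragments such as " deflate" that do not exist in the cache, so the real entry would never be deleted.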

@darkweak
Owner Author

darkweak commented Jun 18, 2022

This commit should fix this bug.

> Is there any way to disable the {-VARY-}Accept-Encoding:gzip, deflate, br from going into the cache key?

No, because if the Vary value is not part of the cache key, you'll probably serve a wrong response, since the response depends on those headers (e.g. language).
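
To illustrate the point, here is a minimal sketch of a key builder that mirrors the key shapes listed in this thread (METHOD-HOST-PATH followed by {-VARY-} and the header values). The function is hypothetical and is not Souin's implementation; in particular, the ';' separator between several Vary headers is an assumption. It shows that two requests differing only in Accept-Encoding map to separate entries, so a gzip-encoded body is not served to a client that did not ask for it.

```python
def cache_key(method, host, path, vary, request_headers):
    """Build a key shaped like the ones listed by /__cache/souin in this thread."""
    key = f"{method}-{host}-{path}"
    if vary:
        # Assumption: several Vary headers are joined with ';'.
        parts = [f"{name}:{request_headers.get(name, '')}" for name in vary]
        key += "{-VARY-}" + ";".join(parts)
    return key


host = "random-5a9k4ose.localhost"

# A client that sends no Accept-Encoding header.
plain = cache_key("GET", host, "/", ["Accept-Encoding"], {})

# A client that sends Accept-Encoding: gzip.
gzipped = cache_key("GET", host, "/", ["Accept-Encoding"], {"Accept-Encoding": "gzip"})

print(plain)
print(gzipped)
print(plain != gzipped)
```

Dropping the Vary part would collapse both requests onto one key, and whichever variant was stored first would be returned to every client.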

@mattvb91

Sorry, I deleted my last comment from 1 minute ago; I had accidentally disabled 'distributed' => true to test something else.

This looks like it is working correctly at first glance! It is clearing both my frontend and API responses now! This is great.

@darkweak
Owner Author

Is it really working? 😅

@mattvb91

Yup, at least with requests that have 1 surrogate key per page.

I will test tomorrow with list pages that have multiple surrogate keys, to check that it flushes list pages too. But yeah, this definitely works nicely for individual pages now. Great stuff 👍

@mattvb91

So it looks like it's 90% there. If I run my whole test suite, including list pages that need to get flushed, it fails when there is a large number of surrogate keys in the list.

However, if I restart Caddy to clear the cache and run only the individual test that fails, which results in only a small number of surrogate keys, then the test passes.

So at the moment I can't give a concrete example, because I need to dig a bit more to find out exactly what's happening. My initial suspicion is that once there is a large number of entries, some kind of corruption occurs in the surrogate key list or the cache entries, and it isn't able to cleanly flush the specified resource.

@mattvb91

@darkweak not sure if this is related (I'm still looking into whether there is a certain order in which it fails), but why do I get different response headers when using curl compared to what the browser receives?

curl --head http://adw-sdrubieu.localhost

HTTP/1.1 200 OK
Cache-Status: Souin; fwd=uri-miss
Content-Length: 17293
Content-Type: text/html; charset=utf-8
Date: Sun, 19 Jun 2022 12:06:06 GMT
Etag: "438d-J+ToNOLWBws9Qd0hwDPyuodqNLk"
Surrogate-Key: adw-sdrubieu.localhost, product-e077c673-4c4a-4856-b673-1c1b91e77dbf, category-74e2d799-cce4-4423-b52a-8141ae1333fc
Vary: Accept-Encoding

Same request in Chrome DevTools:

http://adw-sdrubieu.localhost

HTTP/1.1 200 OK
Age: 208
Cache-Status: Souin; hit; ttl=3392
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
Date: Sun, 19 Jun 2022 12:03:09 GMT
Etag: "438d-J+ToNOLWBws9Qd0hwDPyuodqNLk"
Vary: Accept-Encoding
Transfer-Encoding: chunked

Is that because it adds keys depending on request headers coming in from the browser?

@darkweak
Owner Author

The browser sends a request like curl --location --request GET '{YOUR_URL}' --head

@mattvb91

mattvb91 commented Jun 19, 2022

Thanks! I'm still a bit lost as to why I'm getting different results. I feel like it should give me back the same cache entry; I must be missing something obvious?

curl --head --location --request GET 'http://random-5a9k4ose.localhost/'

HTTP/1.1 200 OK
Cache-Status: Souin; fwd=uri-miss; stored
Content-Type: text/html; charset=utf-8
Date: Sun, 19 Jun 2022 14:04:33 GMT
Etag: "43bf-6bgCw8ERYf7UEoKH526rNffgVhY"
Surrogate-Key: random-5a9k4ose.localhost, product-cf9fa866-e38e-4a3a-8bf8-34d86d68c56a, category-e087fcfc-171e-4ba1-af2b-4c1db5986804
Vary: Accept-Encoding
Transfer-Encoding: chunked

Browser:

Request URL: http://random-5a9k4ose.localhost/

Age: 1137
Cache-Status: Souin; hit; ttl=2463
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
Date: Sun, 19 Jun 2022 13:46:02 GMT
Etag: "43bf-6bgCw8ERYf7UEoKH526rNffgVhY"
Transfer-Encoding: chunked
Vary: Accept-Encoding

Same ETag, but the content being served in both instances is actually old and hasn't been cleared.

EDIT: under /__cache/souin I have 2 entries for the same page, differing only in the Accept-Encoding value:

"GET-random-5a9k4ose.localhost-/{-VARY-}Accept-Encoding:gzip",
"GET-random-5a9k4ose.localhost-/{-VARY-}Accept-Encoding:",

However, on /__cache/souin/surrogate_keys I only have one:

"random-5a9k4ose.localhost": ",GET-random-5a9k4ose.localhost-%2F%7B-VARY-%7DAccept-Encoding%3A"

So it looks like one of them is getting lost in the cache and is no longer tracked by the surrogate keys.
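
A minimal diagnostic sketch for spotting this kind of mismatch: given the list from /__cache/souin and the mapping from /__cache/souin/surrogate_keys, it reports cache keys that no surrogate key references. It assumes the index values are comma-joined and percent-encoded, as in the output above; the function name untracked is hypothetical.

```python
from urllib.parse import unquote


def untracked(cache_keys, surrogate_index):
    """Return the cache keys that no surrogate key points to."""
    tracked = set()
    for joined in surrogate_index.values():
        # Drop the empty piece from the leading comma, then decode each key.
        tracked.update(unquote(k) for k in joined.split(",") if k)
    return [k for k in cache_keys if k not in tracked]


# Data copied from the two API responses above.
cache_keys = [
    "GET-random-5a9k4ose.localhost-/{-VARY-}Accept-Encoding:gzip",
    "GET-random-5a9k4ose.localhost-/{-VARY-}Accept-Encoding:",
]
surrogate_index = {
    "random-5a9k4ose.localhost": ",GET-random-5a9k4ose.localhost-%2F%7B-VARY-%7DAccept-Encoding%3A",
}

orphans = untracked(cache_keys, surrogate_index)
print(orphans)  # ['GET-random-5a9k4ose.localhost-/{-VARY-}Accept-Encoding:gzip']
```

Here the gzip variant, which is the one a browser requests, is the entry that a purge by surrogate key would not reach.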

@darkweak darkweak force-pushed the fix/cache/providers/nuts/sanitize-properties-from-configuration branch 2 times, most recently from 8c05948 to e445378 Compare June 20, 2022 00:38
@darkweak darkweak force-pushed the fix/cache/providers/nuts/sanitize-properties-from-configuration branch from e445378 to 89ddf86 Compare June 20, 2022 00:55
@darkweak darkweak merged commit d1b8dd7 into master Jun 22, 2022
@darkweak darkweak deleted the fix/cache/providers/nuts/sanitize-properties-from-configuration branch June 22, 2022 07:47
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

[question] supported caching providers in main branch
3 participants