Skip to content
This repository has been archived by the owner on Jan 6, 2023. It is now read-only.

Cache Issue #1476

Open
philleepflorence opened this issue Nov 25, 2019 · 33 comments · May be fixed by #1991
Open

Cache Issue #1476

philleepflorence opened this issue Nov 25, 2019 · 33 comments · May be fixed by #1991
Labels
api bug Something isn't working

Comments

@philleepflorence
Copy link
Contributor

Bug Report

There seems to be an issue with the cache. Currently, the API cannot differentiate between external API calls which should be cached and API calls from /admin which may not need to be cached especially while actively updating collections. Sometimes when a user is not authenticated and a cache of a core collection is created, e.g. directus_settings, it prevents the user from signing back in throwing an error - Cannot read property 'api' of undefined.
Turning off cache in the config or purging the cache seems to fix the error.

Steps to Reproduce

It is hard to produce steps to recreate!
Seems this error happened when signed in the app, while an external API request was made that created a new cache, and when refreshing the page, got the error.
It also happened when navigating to the login page on a new tab after a cache has been made.
It only happens when directus_* collections have been updated.

Expected Behavior

Only cache external API requests.
The admin app should always load from the DB to reflect actual DB data.
It would be nice if there were a query parameter to force fetching from the DB even if a cache exists (useful for refresh functionality).

Actual Behavior

Requests from the Admin app are being cached and sometimes can break if an update has been made but not reflected in the cache.

Technical Details

  • Device: Mac Desktop
  • OS: MacOS 10.14.6
  • Web Server: Apache - MAMP 5.5
  • PHP Version: 7.2.21 - MAMP 5.5
  • Database: MySQL 5.7.26 - MAMP 5.5
  • Directus Release Version: 8.0.0-rc.2
  • Install Method: cloned master branch
@rijkvanzanten rijkvanzanten transferred this issue from directus/directus Nov 25, 2019
@benhaynes benhaynes added the bug Something isn't working label Nov 26, 2019
@binal-7span
Copy link
Contributor

@philleepflorence - APP is using the API itself. As per the current code structure, we are not able to differentiate whether the API is calling from APP or calling standalone.

We can achieve this with an extra query param cache. But instead of bug, this should be considered as an enhancement. Thoughts @benhaynes?

@philleepflorence
Copy link
Contributor Author

@BJGajjar - it seems currently in V8, I am able to see the requests coming from {{same domain}}/admin and also the ones coming via Postman so I can differentiate. I tested a hook to delete the cache if the request comes from the Admin App and that helped with the Project Name error.

But the main issue is that caches are made from Postman and it somehow is affecting the Admin App sometimes if I remove the hook.

I do agree that a cache query param would be cool - force-reload is a nice feature.
Another suggestion could be the Admin App passing a token so if the URLs and Domains don't match there could still be a way to differentiate.

@hufon
Copy link

hufon commented Nov 26, 2019

I've this problem...

@hufon
Copy link

hufon commented Nov 26, 2019

The major problem, is security, because when you call /api when authenticated (with cookie) the answer is not the same without authentication...

With cache enabled, it seems that the cache doesn't differentiate authenticated and non authenticated calls.

@hufon
Copy link

hufon commented Nov 29, 2019

  • Enable cache

  • Do an authenticated call to http://xxx/_/

  • You'll get something like :
    {"data":{"api":{"version":"8.0.0-rc.2","requires2FA":false,"database":"mysql","project_logo":null,"project_color":"light-blue-300","project_name":"api","default_locale":"fr-FR","project_foreground":null,"project_background":null,"telemetry":false},"server":{"max_upload_size":209715200,"general":{"php_version":"7.1.30","php_api":"apache2handler"}}}}

  • Now test with unauthenticated (without cookies) call

You should get something like that
{"data":{"api":{"requires2FA":false,"project_logo":null,"project_color":"light-blue-300","project_name":"api","default_locale":"fr-FR","project_foreground":null,"project_background":null,"telemetry":false}},"public":true}

And in fact you get (with cache enabled)
{"data":{"api":{"version":"8.0.0-rc.2","requires2FA":false,"database":"mysql","project_logo":null,"project_color":"light-blue-300","project_name":"api","default_locale":"fr-FR","project_foreground":null,"project_background":null,"telemetry":false},"server":{"max_upload_size":209715200,"general":{"php_version":"7.1.30","php_api":"apache2handler"}}}}

Which is the authenticated response...

@hufon
Copy link

hufon commented Nov 29, 2019

in fact we should only cache unauthenticated calls... Authenticated calls are very hard to cache, because of access rights...

@hufon
Copy link

hufon commented Nov 29, 2019

Maybe is the same problem directus/app#2251

no, your problem is more like #1473

@everyx
Copy link
Contributor

everyx commented Nov 29, 2019

no, your problem is more like #1473

Yes, just a mistake. :)

@bminer
Copy link

bminer commented Dec 6, 2019

We still need to cache authenticated calls, but perhaps they should be tied to the logged in user somehow?

@philleepflorence
Copy link
Contributor Author

We still need to cache authenticated calls, but perhaps they should be tied to the logged in user somehow?

@bminer Do you mean authenticated calls via the API or from the App?

@rijkvanzanten
Copy link
Member

It sounds like the safest short-term improvement / fix here would be to say that caching only works for public information, while we figure out the ideal strategy of caching authenticated data as well.

@bminer
Copy link

bminer commented Dec 11, 2019

@rijkvanzanten - I agree that caching really only works for public info at the moment. The main problem is that our Directus deployment simply won't function without caching. The database is slammed with thousands of queries to confirm database schema for each simple request. Without caching, our app would not function at all. Any noteworthy workarounds?

@bminer
Copy link

bminer commented Dec 11, 2019

@philleepflorence - I meant authenticated calls either via the API or from the App. They both hit the same API, right?

@philleepflorence
Copy link
Contributor Author

@bminer you are correct but now we can differentiate calls from the admin app vs straight API.

Maybe adding a query parameter for caching would help.

I agree caching is really important so as not to overburden the DB server.

@nachogarcia
Copy link

I agree for caching only public info for now. We have the issue mentioned above and is causing us a lot of pain (manually flushing the cache each minutes).

Also, on this note, are API errors cached? Or if a malicious user does heavy requests that result in an error (e.g. many fields with relations) can overload the api/db ?

@rijkvanzanten
Copy link
Member

@bminer @philleepflorence @nachogarcia How should all of this ideally work?

  • Cache all public GET requests (most heavily used)

  • Invalidate cache of specific item on POST / PATCH / DELETE

  • (?) Don't cache anything when authenticated

  • (?) Cache same as normal when authenticated, but:

    • Don't cache at all when coming from the admin app*

* I don't think we should have a proprietary flow for just the Directus app; instead we should consider a skip-cache flag or something of that nature.

@nachogarcia
Copy link

At the moment we have many network errors when updating certain items in our CMS - we're investigating the reason, but we think that has relation with the cache - request size (our DB has many relations to store component hierarchy).

On that note, ideally it would be great if the cache would work per role or user since the admin would work quite faster, and shouldn't cause issues since it would be invalidated on update/delete.

I don't know how the graphql works now, but some time ago you needed authentication - so there would be also that.

@bminer
Copy link

bminer commented Dec 12, 2019

@rijkvanzanten - I think you're on the right track, but I also think that caching authenticated requests needs to be implemented also. The cached item should only be a "hit" if the logged in user matches the original request. Thoughts? Too complicated?

@rijkvanzanten
Copy link
Member

rijkvanzanten commented Dec 12, 2019

On that note, ideally it would be great if the cache would work per role or user since the admin would work quite faster, and shouldn't cause issues since it would be invalidated on update/delete.

+1 for configuring this on the role level @nachogarcia

@bminer I think we should be fine as long as cache invalidation on update / delete works properly. However........ what happens when the user that requested something was deleted / suspended in the meantime? That should then also invalidate the cache I suppose. Otherwise you might block someones access, but this person is still able to retrieve the data from cache

@bminer
Copy link

bminer commented Dec 12, 2019

Agreed. Deleted data should invalidate all relevant cache entries, including authenticated ones, which could mean invalidating user-specific (or role-specific) cache entries.

One exception to the role-specific case might be the /me endpoint, which returns information about the currently logged in user. This should (probably) never be cached.

@rijkvanzanten
Copy link
Member

I suppose we'd have to cache authenticated endpoints as the combination of path - token - user, which should still allow /me to be cached, as it would be cached when using a specific token / user to fetch it 🤔

@philleepflorence
Copy link
Contributor Author

Sorry I missed out on the responses. I was in transit.

I suppose a solution is to cache only when a cache parameter is passed for public and authenticated requests.

Also a cache delete parameter to flush cache for the associated collection. .

I tried this on a local API (via hooks overriding the Directus cache).

The only issue I had was I removed all related collection cache to the collection in question. So the first subsequent request was a bit slower but then the next ones went faster as normal.

@dwene
Copy link
Contributor

dwene commented Dec 18, 2019

Hey everyone. I'll confess I don't understand all of the user-role permissions issues with caching (I'm running directus for all public content). But here's my two cents on how I hope caching would work:

It sounds like the safest short-term improvement / fix here would be to say that caching only works for public information, while we figure out the ideal strategy of caching authenticated data as well.

This works great for me.

I suppose we'd have to cache authenticated endpoints as the combination of path - token - user, which should still allow /me to be cached, as it would be cached when using a specific token / user to fetch it 🤔

This seems pretty safe. I assume it would work for public users as path-null-null. But if you have token do you need user?

Also you might warn people that their cache could grow waaay larger than before. Data x # of sessions.

I suppose a solution is to cache only when a cache parameter is passed for public and authenticated requests.
Also a cache delete parameter to flush cache for the associated collection. .

For those who need to run Directus with a cache to keep their database from going down, this seems risky. It exposes a way for someone to continually flush the cache and bring down the site. But that would essentially be a DDOS attack so, maybe that's outside the purview of what Directus should prevent.

Just my 2 cents :)

@rijkvanzanten
Copy link
Member

rijkvanzanten commented Dec 18, 2019

But if you have token do you need user?

Good point! We don't need the user if we already have the token. Just gotta make sure we extract the token correctly as it can come from multiple sources (header / query param / cookie)

@nachogarcia
Copy link

Another thing we might want to do regarding cache implementation is having a directus prefix in cache keys.

It could be useful for listing, deleting, etc. all directus keys in a shared redis cluster.

@lluishi93
Copy link

lluishi93 commented Jan 31, 2020

@rijkvanzanten Is this issue going to be fixed/addressed for the next release?

@rijkvanzanten
Copy link
Member

Not the next release unfortunately, but hopefully soon. Our main focus right now is stability and usability, and making cache actually function is very high on that list.

@jcbihanictous
Copy link

Hi @rijkvanzanten ,
Any update on this?
I'm working with @nachogarcia and it's a real issue for our contributors, especially those working weekends or out of office hours.

@WoLfulus
Copy link
Contributor

We still have no answer to this, but I'm looking at the options we have right now.

@Bedotech
Copy link
Contributor

With Directus 8 the cache is always disabled right? because i can't get it working, in our use case we use GraphQL and we cant upgrade because this.

@rijkvanzanten
Copy link
Member

The cache is disabled by default, but can be enabled in the config file of the API

@everyx
Copy link
Contributor

everyx commented May 23, 2020

@jcbihanictous
Copy link

@WoLfulus do we have a solution to this yet?
It's a really big problem for us right now...

hairyjim added a commit to hairyjim/api that referenced this issue Oct 10, 2020
@hairyjim hairyjim linked a pull request Oct 10, 2020 that will close this issue
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.