🎨 Reduces response time of catalog/services listing entrypoint #5273
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5273 +/- ##
========================================
+ Coverage 87.2% 88.7% +1.4%
========================================
Files 1308 1213 -95
Lines 53550 49548 -4002
Branches 1170 1024 -146
========================================
- Hits 46749 43960 -2789
+ Misses 6552 5367 -1185
+ Partials 249 221 -28
Flags with carried forward coverage won't be shown.
What do these changes do?
This PR analyses and provides a temporary solution for incident #5267, which reports very long delays in the GET /catalog/services entrypoint. In addition, the front-end can at times spawn up to 17 parallel calls to this entrypoint (probably due to some retry mechanism).

We are well aware that the original design does not scale with the number of services. A proper resolution of this issue therefore requires a redesign that incorporates at least pagination and lighter item objects with bounded fields.
For the moment we decided to go with a strategic TTL cache that reduces the overhead of post-processing every service item in the webserver. Note that the webserver does not just forward the service object provided by the catalog service: it also computes and adds some extra information (e.g. units) to it. Profiling reveals that replace_service_inputs incurs a considerable computation overhead, so the recent increase in the number of services caused the large delays reported in #5267. These are the benchmark results and profiler output before the cache was added


and this is after
Finally here we can see several calls to the entrypoint. After the first, subsequent calls take much less time
Regarding the empty lists, I also noticed that these come directly in the response of the catalog service (and not from the webserver). This is probably due to an issue with the caches on the multiple replicas there. Nonetheless, this point has not been addressed here.
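To illustrate the caching strategy: the PR uses cachetools' TTLCache for sync functions, so entries are recomputed only after they expire. The sketch below reimplements that idea with the standard library only; the decorator, the service key, and the body of replace_service_inputs are all hypothetical stand-ins for the real webserver code.

```python
import time
from functools import wraps


def ttl_cache(ttl: float, maxsize: int = 128):
    """Minimal stand-in for cachetools.TTLCache + @cached (sketch only)."""

    def decorator(fn):
        store = {}  # key -> (expiry timestamp, cached value)

        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit and hit[0] > now:
                return hit[1]  # still fresh: skip the expensive call
            value = fn(*args)
            if len(store) >= maxsize:
                store.pop(next(iter(store)))  # naive eviction of oldest entry
            store[args] = (now + ttl, value)
            return value

        return wrapper

    return decorator


calls = []  # records how often the expensive body actually runs


@ttl_cache(ttl=60.0)  # assumed TTL; the real value lives in the PR's code
def replace_service_inputs(service_key):
    # Stand-in for the expensive post-processing (e.g. attaching units).
    calls.append(service_key)
    return service_key.upper()


replace_service_inputs("simcore/services/comp/solver")  # computed
replace_service_inputs("simcore/services/comp/solver")  # served from cache
```

With the cache in place, the second call within the TTL window returns immediately, which is why subsequent calls to the entrypoint are much faster in the profiles above.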
Details
- cachetools to cache sync functions (we have aiocache to cache async functions)
- pytest-benchmark to benchmark replace_service_inputs
- msgpack, mainly to remove the log warning message of some libraries (e.g. aiocache) that use it as a default if it is available

Related issue/s
How to test
Driving test
services/web/server/tests/unit/isolated/test_catalog_models.py
Dev Checklist
DevOps