
Content Delivery API deadlocks when simultaneously requesting items referencing each other #15827

Closed
diger74 opened this issue Mar 4, 2024 · 3 comments · Fixed by #16023


diger74 commented Mar 4, 2024

Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)

12.3.7

Bug summary

When multiple items in the CMS tree reference each other, for example an Author content type with a SimilarAuthors multinode treepicker property and the following content:

  1. Author_1, with SimilarAuthors: Author_2, Author_3
  2. Author_2, with SimilarAuthors: Author_1, Author_3
  3. Author_3, with SimilarAuthors: Author_1, Author_2

When 2 or more of these items are requested through the Content Delivery API simultaneously, it causes deadlocks in Umbraco. These requests (and all further requests to these items) hang indefinitely until the memory cache is reloaded or the application is restarted. Requests for other items work just fine; only the colliding items hang.

This is only reproducible when the requested items are not yet in the Content Delivery API cache; they can already be in nuCache, though.

Specifics

We are implementing a website using Umbraco 12 Content Delivery API with NextJS 14.

This website contains a few thousand products, many of which have Related Products (multinode treepicker) populated. Very often these related products form clusters of 3-5 products that are all specified as related to each other.

From the description of this bug you may think it occurs extremely rarely, because it is hard to imagine two Content Delivery API requests (with an average execution time of about 50ms) colliding at almost the same moment. In our case, however, this happens very often during a NextJS static site build, when the framework requests many pages at once. With the number of pages we have in Umbraco CMS, and the fact that related products are populated very often, our NextJS builds crash 80% of the time with requests to Umbraco hanging in deadlocks.
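
For context, this is roughly what each product page does at build time. A minimal sketch assuming the NextJS App Router, with an illustrative route, slug list and environment variable rather than our real project code:

// Hypothetical app/products/[slug]/page.js (simplified illustration, not the real project code)
const BASE = process.env.UMBRACO_BASE_URL; // e.g. https://localhost:44354

export async function generateStaticParams() {
  // In the real project the slug list comes from Umbraco; hard-coded here for brevity.
  // During `next build`, NextJS renders many of these pages concurrently.
  return [{ slug: 'product-a' }, { slug: 'product-b' }, { slug: 'product-c' }];
}

export default async function ProductPage({ params }) {
  // Each page build issues its own Delivery API request, so products that
  // reference each other end up being requested at almost the same moment.
  const res = await fetch(
    `${BASE}/umbraco/delivery/api/v1/content/item/en/${params.slug}/?expand=all`,
    { headers: { 'Accept-Language': 'en-US' } }
  );
  const product = await res.json();
  return <pre>{JSON.stringify(product, null, 2)}</pre>;
}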

Steps to reproduce

Please find a vanilla test solution prepared on Umbraco 12.3.7:
TestApiLanguages.zip

Within the CMS content tree it contains:

  1. Author 1 (en), with SimilarAuthors: Author 2 (en), Author 3 (en)
  2. Author 2 (en), with SimilarAuthors: Author 1 (en), Author 3 (en)
  3. Author 3 (en), with SimilarAuthors: Author 1 (en), Author 2 (en)

API requests to query this content:

  1. https://localhost:44354/umbraco/delivery/api/v1/content/item/en/author-1-en/?expand=all ['Accept-Language': 'en-US']
  2. https://localhost:44354/umbraco/delivery/api/v1/content/item/en/author-2-en/?expand=all ['Accept-Language': 'en-US']
  3. https://localhost:44354/umbraco/delivery/api/v1/content/item/en/author-3-en/?expand=all ['Accept-Language': 'en-US']

To reproduce this issue, you need to send the requests above at almost exactly the same time; this is practically impossible to do manually through Postman or a similar tool. Below is a script for the k6 load-testing framework that sends these requests randomly across 100 concurrent virtual users, which has a pretty high probability of reproducing the issue. You need to install k6 to run the script, or implement something similar with any other tool.

import http from 'k6/http';
import { check } from 'k6';

export let options = {
  vus: 100, // 100 virtual users
  duration: '10s', // Duration to run the test, adjust as needed
};

export default function () {
  const requests = [
    {
      url: 'https://localhost:44354/umbraco/delivery/api/v1/content/item/en/author-1-en/?expand=all',
      headers: { 'Accept-Language': 'en-US' },
    },
    {
      url: 'https://localhost:44354/umbraco/delivery/api/v1/content/item/en/author-2-en/?expand=all',
      headers: { 'Accept-Language': 'en-US' },
    },
    {
      url: 'https://localhost:44354/umbraco/delivery/api/v1/content/item/en/author-3-en/?expand=all',
      headers: { 'Accept-Language': 'en-US' },
    },
  ];

  // Select a random request from the array
  const randomRequest = requests[Math.floor(Math.random() * requests.length)];
  
  // Perform the request
  let response = http.get(randomRequest.url, { headers: randomRequest.headers });
  
  // Check the response
  check(response, {
    [`status was 200 (${randomRequest.headers['Accept-Language']})`]: (r) => r.status === 200,
  });
}
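
To run it, save the script as e.g. loadtest.js and execute k6 run loadtest.js; the vus and duration options at the top control how much concurrency is generated. Since the local site uses a self-signed HTTPS certificate, you may also need to disable TLS verification (for example via k6's insecureSkipTLSVerify option) in a test environment.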

Test case 1 (no deadlocks on a warmed-up cache):

  1. Start Umbraco
  2. Request API URLs 1-3 manually, one at a time, to warm up the Content Delivery API cache (a small script that does this sequentially is sketched after this list)
  3. Run the k6 script above to generate concurrent load
  4. Request API URLs 1-3 again, manually, one at a time, to make sure they still return results
  5. Restart Umbraco (works 100% of the time) OR reload the memory cache (works ~70% of the time) to reset the Content Delivery API cache before moving on to test case 2
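
For convenience, the warm-up from step 2 can also be scripted. A minimal sketch, assuming Node 18+ (run as an .mjs file) and TLS verification relaxed for the local self-signed certificate, in a test environment only:

// warmup.mjs - request the three items strictly one after another (what step 2 does manually).
// Example run: NODE_TLS_REJECT_UNAUTHORIZED=0 node warmup.mjs  (test environments only)
const BASE = 'https://localhost:44354';
const urls = [
  `${BASE}/umbraco/delivery/api/v1/content/item/en/author-1-en/?expand=all`,
  `${BASE}/umbraco/delivery/api/v1/content/item/en/author-2-en/?expand=all`,
  `${BASE}/umbraco/delivery/api/v1/content/item/en/author-3-en/?expand=all`,
];

for (const url of urls) {
  // Awaiting each request before starting the next keeps them strictly sequential,
  // so the Content Delivery API cache is populated without any overlapping requests.
  const res = await fetch(url, { headers: { 'Accept-Language': 'en-US' } });
  console.log(res.status, url);
}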

Test case 2 (deadlocks):

  1. Start Umbraco
  2. DO NOT request API URLs 1-3 manually beforehand, so the Content Delivery API cache stays cold
  3. Run the k6 script above to generate concurrent load
  4. Request API URLs 1-3 manually, one at a time, to confirm that these requests now deadlock

Expected result / actual result

Test case 1:
You should NOT see any deadlocks with a warmed-up Content Delivery API cache.
The k6 run should show solid execution stats; on my machine there were ~40k successful calls to the Content Delivery API in 10 seconds, all returning content:
[screenshot: k6 run summary with ~40k successful requests]

Test case 2:
You should now experience deadlocks. The k6 run should show something like the following, with almost no completed requests, because they are all hanging waiting for a response from Umbraco:
[screenshot: k6 run summary with requests hanging]
Then, after running the script, if you request any of these 3 URLs via Postman or a similar tool, you should see it load indefinitely:
[screenshot: request loading indefinitely in Postman]


github-actions bot commented Mar 4, 2024

Hi there @diger74!

Firstly, a big thank you for raising this issue. Every piece of feedback we receive helps us to make Umbraco better.

We really appreciate your patience while we wait for our team to have a look at this but we wanted to let you know that we see this and share with you the plan for what comes next.

  • We'll assess whether this issue relates to something that has already been fixed in a later version of the release that it has been raised for.
  • If it's a bug, is it related to a release that we are actively supporting or is it related to a release that's in the end-of-life or security-only phase?
  • We'll replicate the issue to ensure that the problem is as described.
  • We'll decide whether the behavior is an issue or if the behavior is intended.

We wish we could work with everyone directly and assess your issue immediately but we're in the fortunate position of having lots of contributions to work with and only a few humans who are able to do it. We are making progress though and in the meantime, we will keep you in the loop and let you know when we have any questions.

Thanks, from your friendly Umbraco GitHub bot 🤖 🙂


diger74 commented Mar 19, 2024

Hi team, is there any progress on prioritizing and fixing this bug?
Our project goes live tomorrow, and this issue is a slowly ticking bomb for us: you don't know when it will explode, but it will at some random point. We've noticed API responses hanging forever (and therefore the website pages that need that API data hang forever as well), and it happens randomly but regularly, as I mentioned in the description of the original ticket. Clearing the memory cache helps, but only until the next time it hangs. This is extremely annoying to explain to the client.

I've done some investigation and I think I've found what causes the deadlock.

Please have a look at these 2 lines of code: https://github.com/umbraco/Umbraco-CMS/blob/contrib/src/Umbraco.Core/DeliveryApi/PropertyRenderer.cs#L14-L17

Both methods set a lock during their execution.

Now, imagine 2 Content Delivery API requests (for pages that are set as related to each other through a content picker property) hitting Umbraco at almost the same time; this is what happens:

  1. Request_1, querying Page_1: Umbraco starts execution and tries to assemble the response from the published cache
  2. It iterates property by property, and when it reaches a reference property (content picker) it executes L14 from the link above, property.HasValue(), which takes the lock for resolving that property of Page_1
  3. It then continues with property.GetDeliveryApiValue(expanding), which tries to get Page_2 from the published cache and starts taking locks on Page_2's properties one by one
  4. If, while property.GetDeliveryApiValue(expanding) is executing, Request_2 arrives for Page_2, the same locks are taken from the other side, and now Page_2 is waiting for Page_1's properties to be resolved
  5. The end result is a deadlock. All new requests to Page_1 and Page_2 will hang forever! (Okay, in Azure they will time out with HTTP 500 after 6 minutes; locally they will hang until you clear the memory cache, which also clears the locks.) A minimal sketch of this lock-ordering pattern is shown after this list.
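
To make the pattern explicit, below is a minimal sketch of the same lock-ordering deadlock modelled with promise-based locks in Node; it only illustrates the mechanism and is not Umbraco's actual C# implementation:

// deadlock-sketch.mjs - two "pages" whose rendering takes its own lock first and then
// the related page's lock. Two concurrent renders in opposite order deadlock.
class Mutex {
  constructor() { this._tail = Promise.resolve(); }
  lock() {
    let release;
    const next = new Promise((resolve) => { release = resolve; });
    const acquired = this._tail.then(() => release); // resolves with a release() once the lock is free
    this._tail = next;
    return acquired;
  }
}

const locks = { page1: new Mutex(), page2: new Mutex() };

async function render(page, relatedPage) {
  const releaseOwn = await locks[page].lock();             // like property.HasValue() taking the lock
  await new Promise((r) => setTimeout(r, 10));             // some work while holding the lock
  const releaseRelated = await locks[relatedPage].lock();  // like expanding the related item
  console.log(`${page} rendered`);
  releaseRelated();
  releaseOwn();
}

// Request_1 and Request_2 arrive at almost the same time: each ends up holding its own
// page's lock while waiting forever for the other's, so neither "rendered" line is printed.
await Promise.all([render('page1', 'page2'), render('page2', 'page1')]);
console.log('never reached');

In this sketch, either taking the locks in one globally consistent order or not holding the first lock while resolving the related item breaks the cycle; removing the lock (as I tried below) has the same effect.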

Locally, I managed to avoid the deadlocks by simply removing the lock here in the Property class:
https://github.com/umbraco/Umbraco-CMS/blob/contrib/src/Umbraco.PublishedCache.NuCache/Property.cs#L324

I've done a few tests and haven't noticed any regression side effects. It wasn't clear from the code what exactly these locks are defending against.

Also, this Property implementation is an internal class, so I can't easily swap in my own implementation without going down the full custom Umbraco build path.

I am desperate. The project is going live, the client is not particularly happy about pages randomly hanging, and I can't do anything about it.

Help Me Obi-Wan Kenobi, You're My Only Hope.

Regards,
Dmitry

@Migaroez
Contributor

Closing this because of #15928 being merged in.
