Using continuation token with new QueryIterator results in high RU & latency. #1282

Closed
stephenwilson11 opened this issue Mar 17, 2020 · 17 comments
Labels
bug Something isn't working QUERY

Comments

@stephenwilson11

Describe the bug
When you receive a continuation token from a client and build a new iterator to get the next page of results, the RU cost and latency spike. Attached are code snippets that repro this issue in the V3 SDK, and also in V2, where the issue is not present. Am I misusing the V3 SDK in some way, or is there an issue here?

The output is as follows:

V3 SDK:
Fetched 5 documents costing 10.79 RU.
Fetched 5 documents costing 596.46 RU.

V2 SDK:
Fetched 5 documents costing 5.48 RU.
Fetched 5 documents costing 5.34 RU.

To Reproduce
I have attached the source used to produce the V2 and V3 results above. The collection they queried contained a partition with 600k small documents. In case it's a factor, I scaled my database up to 20,000 RUs to insert the 600k documents and then back down to 1200 RUs (the minimum for the number of collections I have), resulting in two logical partitions with 600 RUs on each.
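
The attached source is not reproduced in this thread. As a rough illustration only, the pattern described above (a fresh iterator built for every page from a client-supplied continuation token) might look like the following with the V3 SDK. The class name, method name, query text, and field names (`fieldA`, `fieldB`) are assumptions made for this sketch; the issue only states that the query has an ORDER BY on two fields and targets a single partition key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class PagedQuerySketch
{
    // Fetches a single page. Pass null as continuationToken for the first page,
    // then pass back the token returned by the previous call.
    public static async Task<(IReadOnlyList<dynamic> Items, string ContinuationToken)> FetchPageAsync(
        Container container, string partitionKeyValue, string continuationToken)
    {
        // Hypothetical query text: the real query is only in the attached files.
        QueryDefinition query = new QueryDefinition(
            "SELECT * FROM c ORDER BY c.fieldA, c.fieldB");

        // A new iterator is created per page and resumed from the client's token.
        FeedIterator<dynamic> iterator = container.GetItemQueryIterator<dynamic>(
            query,
            continuationToken,
            new QueryRequestOptions
            {
                MaxItemCount = 5,
                PartitionKey = new PartitionKey(partitionKeyValue),
            });

        FeedResponse<dynamic> page = await iterator.ReadNextAsync();

        // RequestCharge is the RU cost reported for this ReadNextAsync call.
        Console.WriteLine($"Fetched {page.Count} documents costing {page.RequestCharge:F2} RU.");

        return (page.ToList(), page.ContinuationToken);
    }
}
```

Calling `FetchPageAsync` twice, first with `null` and then with the returned token, corresponds to the two "Fetched 5 documents" lines in the output above.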

Expected behavior
Performance and cost comparable to V2.

Actual behavior
Excessive RU cost and high latency.

Environment summary
SDK Version: Microsoft.Azure.Cosmos 3.6.0
OS Version: Windows

Additional context
I have included the full output from the sample applications, which also includes diagnostics info.

V2-SampleSource.cs.txt
V2-SDK-SampleOutput.txt
V3-SampleSource.cs.txt
V3-SDK-SampleOutput.txt

@j82w
Contributor

j82w commented Mar 17, 2020

@bchong95 can you please take a look?

@stephenwilson11 can you try adding the MaxBufferedItemCount and MaxConcurrency settings on both versions? Based on the diagnostics, v3 is doing 2 page reads for a single query ReadNextAsync call. The first page cost 5.39 RUs, and the second page cost 5.53 RUs. The second page is buffered and is never read with the current code.

new QueryRequestOptions
{
    MaxItemCount = 5,
    PartitionKey = new PartitionKey("d72edfb58d064e09b30445592730f412"),
    ResponseContinuationTokenLimitInKb = 1,
    MaxBufferedItemCount = 5,
    MaxConcurrency = 1,
});

@stephenwilson11
Author

stephenwilson11 commented Mar 17, 2020

Hi, thanks for getting back to me so quickly.

I have added MaxBufferedItemCount and MaxConcurrency and rerun the V3 example; this had no visible effect (output attached).

I believe this is due to the query having an order by as you commented here: #990

I did comment on that thread about this, as it is odd that you say this is required, yet V2 and Data Explorer in the Azure console manage to work without it (even using order by).

Anyway, I feel the issue reported is something else entirely. If I were simply dealing with double pages being returned, then my RU cost would be consistently around 10 RUs, not just under 600 RUs.

Further info that may or may not be helpful is that I have a composite index on the two fields used in the order by.

V3-SDK-SampleOutput-WithMaxBufferedAndMaxConcurrency.txt

@j82w
Contributor

j82w commented Mar 17, 2020

Both pages are from the same partition, so there shouldn't be any reason to pull the second page. I don't have any ideas on what would cause the RUs to increase like that. @bchong95 is the query expert and will do the investigation.

@stephenwilson11
Author

Yeah, that makes sense... if you could look into that also, I would appreciate it :) It's adding unnecessary load.

@j82w
Contributor

j82w commented Mar 17, 2020

What version of the v2 SDK are you using?

@stephenwilson11
Author

stephenwilson11 commented Mar 17, 2020

Microsoft.Azure.DocumentDB.Core 2.10.1

@j82w
Contributor

j82w commented Mar 18, 2020

Just to give you an update: this is actively being investigated. The query team has confirmed that there is a bug with it pulling the extra page. They are still root-causing the RU difference.

@stephenwilson11
Author

Great, thanks for the update.

@j82w
Contributor

j82w commented Mar 20, 2020

@stephenwilson11 there is a PR #1289 to fix the large RU increase. Please look at the PR description for more details.

@stephenwilson11
Author

That's great thank you. Are you able to estimate how long it will be for this to make it into a release package?

@j82w
Contributor

j82w commented Mar 20, 2020

Hopefully early next week.

I attached an unofficial release that is not signed. I generated the nuget locally based on the branch with the fix.
Microsoft.Azure.Cosmos.3.7.0.zip

@stephenwilson11
Author

@j82w Thank you. Did a fix go in for the other bug with it pulling the extra page?

@j82w
Contributor

j82w commented Mar 31, 2020

3.7.0 was officially released, and hopefully the extra page will get fixed with #1319.

@j82w j82w added bug Something isn't working and removed needs-investigation labels Mar 31, 2020
@stephenwilson11
Author

@j82w I have re-run the test with the new release and can confirm this issue is resolved, with each query now costing ~10 RU (due to the double page issue #1319).

Do I close the issue or leave that to you?

@j82w j82w closed this as completed Mar 31, 2020
@bartelink
Contributor

bartelink commented Apr 1, 2020

@stephenwilson11 What were the final RU costs - were they equivalent to V2 (i.e. do you now see 10 documents in one response with <= the sum of the original individual costs)?
(If not, can you explain why what you observed is correct / acceptable from your perspective?)

@stephenwilson11
Author

@bartelink

Cost per page fetch:
V2 (2.10.1): ~5 RU
V3 (3.6.0): ~600 RU
V3 (3.7.0): ~10 RU

The fix for this bug addresses the massive spike in RUs seen in the 3.6.0 release, but it does not address the double page fetching bug #1319, which is why the RU cost is currently ~10.

Once that bug is also addressed and released I expect the RU cost to match V2 at ~5.

@bartelink
Contributor

Thanks for the quick and detailed response; that makes sense

Background: I'm seeing a prototypical query (17k docs, all tiny, total meaningful payload approx .6MB, total partition size ~2MB) showing the following costs atm:

V2: paged in 1000s, 18 responses RC 973.41
(I need to do messy cherry-picking to run a V2 test without paging, but it's safe to assume it's going to be less)
V3.7.1-preview: paged in 100000s, 1 response RC 1001.02
V3.7.1-preview: paged in 1000s, 18 responses RC 1055.03
V4-preview3: paged in 100000s, 1 response RC 1001.02
V4-preview3: paged in 1000s, 18 responses RC 1055.03

I'll pitch in on validating the fix for #1319 when the time comes, in the hope that resolves the issue (which is an adoption blocker from our perspective)...
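
For anyone wanting to produce comparable numbers, totals like the "N responses RC x" figures above can be gathered by draining an iterator and summing the per-page request charge. This is only a sketch under stated assumptions: the class and method names are made up for illustration, the query text is whatever the caller supplies, and it is not the code used for the measurements quoted above.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class QueryCostSketch
{
    // Drains the whole query with the given page size and returns the summed RU charge.
    public static async Task<double> MeasureTotalChargeAsync(
        Container container, string queryText, int maxItemCount)
    {
        FeedIterator<dynamic> iterator = container.GetItemQueryIterator<dynamic>(
            new QueryDefinition(queryText),
            continuationToken: null,
            requestOptions: new QueryRequestOptions { MaxItemCount = maxItemCount });

        double totalCharge = 0;
        int responses = 0;

        // Each ReadNextAsync call returns one page along with its own RequestCharge.
        while (iterator.HasMoreResults)
        {
            FeedResponse<dynamic> page = await iterator.ReadNextAsync();
            totalCharge += page.RequestCharge;
            responses++;
        }

        Console.WriteLine($"paged in {maxItemCount}s, {responses} responses RC {totalCharge:F2}");
        return totalCharge;
    }
}
```

Running it once with a small `maxItemCount` and once with a very large one gives the paged and unpaged totals for the same query, which is the comparison made in the list above.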
