Using continuation token with new QueryIterator results in high RU & latency. #1282

stephenwilson11 · 2020-03-17T09:23:28Z

Describe the bug
When you receive a continuation token from a client and build a new iterator to get the next page of results, the RU cost and latency spike. Attached are code snippets that repro this issue in the V3 SDK, and also in V2, where the issue is not present. Am I misusing the V3 SDK in some way, or is there an issue here?

The output is as follows:

V3 SDK:
Fetched 5 documents costing 10.79 RU.
Fetched 5 documents costing 596.46 RU.

V2 SDK:
Fetched 5 documents costing 5.48 RU.
Fetched 5 documents costing 5.34 RU.

To Reproduce
I have attached the source used to produce the V2 and V3 results above. The collection they queried against contained a partition with 600k small documents. In case it's a factor, I scaled my database up to 20,000 RUs to insert the 600k documents and then back down to 1200 RUs (the minimum for the number of collections I have), resulting in two logical partitions with 600 RUs on each.

Expected behavior
Performance and cost comparable to V2.

Actual behavior
Excessive RU cost and high latency.

Environment summary
SDK Version: Microsoft.Azure.Cosmos 3.6.0
OS Version: Windows

Additional context
I have included the full output from the sample applications, which also includes diagnostics info.

V2-SampleSource.cs.txt
V2-SDK-SampleOutput.txt
V3-SampleSource.cs.txt
V3-SDK-SampleOutput.txt

j82w · 2020-03-17T11:45:16Z

@bchong95 can you please take a look?

@stephenwilson11 can you try adding the MaxBufferedItemCount and MaxConcurrency settings on both versions? Based on the diagnostics, v3 is doing 2 page reads for a single query ReadNextAsync call. The first page cost 5.39 RUs, and the second page cost 5.53 RUs. The second page is buffered and is never read with the current code.

new QueryRequestOptions
{
    MaxItemCount = 5,
    PartitionKey = new PartitionKey("d72edfb58d064e09b30445592730f412"),
    ResponseContinuationTokenLimitInKb = 1,
    MaxBufferedItemCount = 5,
    MaxConcurrency = 1,
});
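
For readers without the attachments, here is a minimal sketch of the paging pattern under discussion, with the options above plugged in. It is not the attached sample source: the class and method names and the query text are hypothetical (the thread only says the query uses an ORDER BY on two fields), and it assumes the Microsoft.Azure.Cosmos V3 package and an existing Container.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class ContinuationPagingSketch
{
    // Fetches a single page. Pass null as the token for the first page.
    // Returns the continuation token for the next page (null when done).
    public static async Task<string> FetchPageAsync(
        Container container, string partitionKeyValue, string continuationToken)
    {
        // Hypothetical query text; field names are placeholders.
        var query = new QueryDefinition(
            "SELECT * FROM c ORDER BY c.fieldA, c.fieldB");

        var options = new QueryRequestOptions
        {
            MaxItemCount = 5,
            PartitionKey = new PartitionKey(partitionKeyValue),
            ResponseContinuationTokenLimitInKb = 1,
            MaxBufferedItemCount = 5,
            MaxConcurrency = 1,
        };

        // A new iterator is built for every page, seeded with the token
        // that came back from the previous page.
        FeedIterator<dynamic> iterator =
            container.GetItemQueryIterator<dynamic>(query, continuationToken, options);

        FeedResponse<dynamic> page = await iterator.ReadNextAsync();
        Console.WriteLine(
            $"Fetched {page.Count} documents costing {page.RequestCharge:F2} RU.");

        return page.ContinuationToken;
    }

    // Two consecutive page fetches, each through a fresh iterator.
    public static async Task RunAsync(Container container, string partitionKeyValue)
    {
        string token = await FetchPageAsync(container, partitionKeyValue, null);
        if (token != null)
        {
            await FetchPageAsync(container, partitionKeyValue, token);
        }
    }
}
```

The second call in RunAsync is the one whose cost spikes in the report; comparing the two printed charges is enough to see whether a given SDK version is affected.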

stephenwilson11 · 2020-03-17T12:05:36Z

Hi, thanks for getting back to me so quickly.

I have added MaxBufferedItemCount and MaxConcurrency and rerun the V3 example, which has not had any visible effect (output attached).

I believe this is due to the query having an order by as you commented here: #990

I did comment on that thread about this, as it is odd that you say this is required, yet V2 and the Data Explorer in the Azure console manage to work without it (even when using order by).

Anyway, I feel the issue reported is something else entirely. If I were simply dealing with double pages being returned, my RUs would consistently be around 10 RU, not just under 600 RU.

Further info that may or may not be helpful is that I have a composite index on the two fields used in the order by.

V3-SDK-SampleOutput-WithMaxBufferedAndMaxConcurrency.txt

j82w · 2020-03-17T12:59:04Z

Both pages are from the same partition, so there shouldn't be any reason to pull the second page. I don't have any ideas on what would cause the RUs to increase like that. @bchong95 is the query expert and will do the investigation.

stephenwilson11 · 2020-03-17T13:57:12Z

Yeah, that makes sense... if you could look into that also I would appreciate it :) It's adding unnecessary load.

j82w · 2020-03-17T14:06:32Z

What version of the v2 SDK are you using?

stephenwilson11 · 2020-03-17T14:13:19Z

Microsoft.Azure.DocumentDB.Core 2.10.1

j82w · 2020-03-18T11:19:47Z

Just to give you an update: this is actively being investigated. The query team has confirmed that there is a bug with it pulling the extra page. They are still root-causing the RU difference.

stephenwilson11 · 2020-03-18T11:21:26Z

Great, thanks for the update.

j82w · 2020-03-20T11:49:48Z

@stephenwilson11 there is a PR #1289 to fix the large RU increase. Please look at the PR description for more details.

stephenwilson11 · 2020-03-20T11:58:31Z

That's great, thank you. Are you able to estimate how long it will be before this makes it into a release package?

j82w · 2020-03-20T15:27:37Z

Hopefully early next week.

I attached an unofficial release that is not signed. I generated the nuget locally based on the branch with the fix.
Microsoft.Azure.Cosmos.3.7.0.zip

stephenwilson11 · 2020-03-20T15:37:54Z

@j82w Thank you. Did a fix go in for the other bug with it pulling the extra page?

j82w · 2020-03-31T13:38:06Z

3.7.0 was officially released, and hopefully the extra page will get fixed with #1319.

stephenwilson11 · 2020-03-31T13:51:48Z

@j82w I have re-run the test with the new release and can confirm this issue is resolved, with each query now costing ~10 RU (due to the double page issue #1319).

Do I close the issue or leave that to you?

bartelink · 2020-04-01T09:29:07Z

@stephenwilson11 What were the final RU costs - were they equivalent to V2 (i.e. do you now see 10 documents in one response with <= the sum of the original individual costs)?
(If not, can you explain why what you observed is correct / acceptable from your perspective?)

stephenwilson11 · 2020-04-01T09:38:48Z

@bartelink

Cost per page fetch:
V2 (2.10.1): ~5 RU
V3 (3.6.0): ~600 RU
V3 (3.7.0): ~10 RU

The fix for this bug addresses the massive spike in RUs seen in the 3.6.0 release, but does not address the double page fetching bug #1319, which results in the RU cost of ~10 currently seen.

Once that bug is also addressed and released I expect the RU cost to match V2 at ~5.

bartelink · 2020-04-01T09:54:11Z

Thanks for the quick and detailed response; that makes sense

Background: I'm seeing a prototypical query (17k docs, all tiny, total meaningful payload approx .6MB, total partition size ~2MB) showing the following costs atm:

V2: paged in 1000s, 18 responses RC 973.41
(I need to do messy cherry-picking to run a V2 test without paging, but it's safe to assume it's going to be less)
V3.7.1-preview: paged in 100000s, 1 response RC 1001.02
V3.7.1-preview: paged in 1000s, 18 responses RC 1055.03
V4-preview3: paged in 100000s, 1 response RC 1001.02
V4-preview3: paged in 1000s, 18 responses RC 1055.03

I'll pitch in on validating the fix for #1319 when the time comes, in the hope that resolves the issue (which is an adoption blocker from our perspective)...

j82w added needs-investigation QUERY labels Mar 17, 2020

j82w mentioned this issue Mar 30, 2020

Query: Add logic to execute some queries as Passthrough when possible #1319

Merged

j82w added bug Something isn't working and removed needs-investigation labels Mar 31, 2020

j82w closed this as completed Mar 31, 2020
