
FeedIterator unexpectedly pre-fetches next page #990

Closed
justinbhopper opened this issue Nov 13, 2019 · 22 comments

@justinbhopper

Describe the bug
The FeedIterator<T> fetches a page of results before it is requested. A single call to ReadNextAsync() causes two POSTs to the Cosmos API, rather than one.

To Reproduce
Here is a sample application that illustrates the problem. Note ReadNextAsync() is only called once, but the console logs two API calls.
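A minimal sketch of the repro, assuming a "FeedTest" database with a "Test" container (names taken from the console trace below; the connection string and item shape are placeholders, not the linked sample):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

class Program
{
    static async Task Main()
    {
        using var client = new CosmosClient("<connection-string>");
        Container container = client.GetDatabase("FeedTest").GetContainer("Test");

        // Ask for pages of 5 items.
        FeedIterator<dynamic> iterator = container.GetItemQueryIterator<dynamic>(
            "SELECT * FROM c",
            requestOptions: new QueryRequestOptions { MaxItemCount = 5 });

        // A single ReadNextAsync() call should issue one query POST,
        // but the console trace shows two.
        FeedResponse<dynamic> page = await iterator.ReadNextAsync();
        Console.WriteLine($"Received {page.Count} items");
    }
}
```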

Expected behavior
FeedIterator should only fetch pages upon request.

Actual behavior
FeedIterator seems to be fetching a second page of results on its own. If this is by design, then I think the design is flawed: it assumes callers want more than the number of items they requested.

Environment summary
SDK Version: 3.4.1
OS Version: Windows 10

Additional context
Here is a sample of the console's output. Note that two POSTs occur, despite only having called ReadNextAsync() once. Also note that the contents of the second POST show the next set of results (items 5-9).

POST dbs/FeedTest/colls/Test
{"_rid":"dIVbALcqJUk=","Documents":[
  {"r":{"id":"3cc75122-20d1-4d14-ba41-c1f328d2e9f3","Index":0,"Partition":"test","_rid":"dIVbALcqJUkBAAAAAAAAAA==","_self":"dbs\/dIVbAA==\/colls\/dIVbALcqJUk=\/docs\/dIVbALcqJUkBAAAAAAAAAA==\/","_etag":"\"00000000-0000-0000-99da-8ff2d1b101d5\"","_attachments":"attachments\/","_ts":1573619207}},
  {"r":{"id":"015aa151-203a-4b85-9c7b-bd99ade327e6","Index":1,"Partition":"test","_rid":"dIVbALcqJUkCAAAAAAAAAA==","_self":"dbs\/dIVbAA==\/colls\/dIVbALcqJUk=\/docs\/dIVbALcqJUkCAAAAAAAAAA==\/","_etag":"\"00000000-0000-0000-99da-8ff4ecf101d5\"","_attachments":"attachments\/","_ts":1573619207}},
  {"r":{"id":"b74259ec-ce80-4881-931c-39355e95e23a","Index":2,"Partition":"test","_rid":"dIVbALcqJUkDAAAAAAAAAA==","_self":"dbs\/dIVbAA==\/colls\/dIVbALcqJUk=\/docs\/dIVbALcqJUkDAAAAAAAAAA==\/","_etag":"\"00000000-0000-0000-99da-8ff6345a01d5\"","_attachments":"attachments\/","_ts":1573619207}},
  {"r":{"id":"b2d53634-923c-4a54-a9f2-09ff3a4e1c97","Index":3,"Partition":"test","_rid":"dIVbALcqJUkEAAAAAAAAAA==","_self":"dbs\/dIVbAA==\/colls\/dIVbALcqJUk=\/docs\/dIVbALcqJUkEAAAAAAAAAA==\/","_etag":"\"00000000-0000-0000-99da-8ff699df01d5\"","_attachments":"attachments\/","_ts":1573619207}},
  {"r":{"id":"9555e4d7-e75f-47e2-83d3-84c1b85757bf","Index":4,"Partition":"test","_rid":"dIVbALcqJUkFAAAAAAAAAA==","_self":"dbs\/dIVbAA==\/colls\/dIVbALcqJUk=\/docs\/dIVbALcqJUkFAAAAAAAAAA==\/","_etag":"\"00000000-0000-0000-99da-8ff6dd4901d5\"","_attachments":"attachments\/","_ts":1573619207}}
],"_count":5}

POST dbs/FeedTest/colls/Test
{"_rid":"dIVbALcqJUk=","Documents":[
  {"r":{"id":"1b51b134-ef3c-4f61-b42b-486391bc2e7e","Index":5,"Partition":"test","_rid":"dIVbALcqJUkGAAAAAAAAAA==","_self":"dbs\/dIVbAA==\/colls\/dIVbALcqJUk=\/docs\/dIVbALcqJUkGAAAAAAAAAA==\/","_etag":"\"00000000-0000-0000-99da-8ff7320701d5\"","_attachments":"attachments\/","_ts":1573619207}},
  {"r":{"id":"a4d7a76d-683d-46ec-94c1-c820a3572fdf","Index":6,"Partition":"test","_rid":"dIVbALcqJUkHAAAAAAAAAA==","_self":"dbs\/dIVbAA==\/colls\/dIVbALcqJUk=\/docs\/dIVbALcqJUkHAAAAAAAAAA==\/","_etag":"\"00000000-0000-0000-99da-8ff77f7201d5\"","_attachments":"attachments\/","_ts":1573619207}},
  {"r":{"id":"189306b0-aae2-4b4f-9ff8-e0b566943fe4","Index":7,"Partition":"test","_rid":"dIVbALcqJUkIAAAAAAAAAA==","_self":"dbs\/dIVbAA==\/colls\/dIVbALcqJUk=\/docs\/dIVbALcqJUkIAAAAAAAAAA==\/","_etag":"\"00000000-0000-0000-99da-8ff7cf4e01d5\"","_attachments":"attachments\/","_ts":1573619207}},
  {"r":{"id":"348cf465-de43-420e-9c30-11116dc4a755","Index":8,"Partition":"test","_rid":"dIVbALcqJUkJAAAAAAAAAA==","_self":"dbs\/dIVbAA==\/colls\/dIVbALcqJUk=\/docs\/dIVbALcqJUkJAAAAAAAAAA==\/","_etag":"\"00000000-0000-0000-99da-8ff81b2101d5\"","_attachments":"attachments\/","_ts":1573619207}},
  {"r":{"id":"297d734e-ba44-409b-8c0c-95c44aa13983","Index":9,"Partition":"test","_rid":"dIVbALcqJUkKAAAAAAAAAA==","_self":"dbs\/dIVbAA==\/colls\/dIVbALcqJUk=\/docs\/dIVbALcqJUkKAAAAAAAAAA==\/","_etag":"\"00000000-0000-0000-99da-8ff8684101d5\"","_attachments":"attachments\/","_ts":1573619207}}
],"_count":5}
@j82w
Contributor

j82w commented Nov 13, 2019

This is an optimization in the query code. The query code will buffer pages in parallel in the background to reduce the latency to the caller. You can set the MaxBufferedItemCount to 0 and the MaxConcurrency to 0 in the QueryRequestOptions to prevent this.
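A sketch of that workaround (the container variable and item type are illustrative; setup is omitted):

```csharp
// Disable background page buffering and parallel partition reads.
var options = new QueryRequestOptions
{
    MaxItemCount = 5,         // page size per ReadNextAsync()
    MaxBufferedItemCount = 0, // do not buffer extra pages in the background
    MaxConcurrency = 0        // do not query partitions in parallel
};

FeedIterator<MyItem> iterator = container.GetItemQueryIterator<MyItem>(
    "SELECT * FROM c",
    requestOptions: options);
```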

@j82w j82w closed this as completed Nov 13, 2019
@bartelink
Contributor

bartelink commented Nov 13, 2019

I am observing a not dissimilar change in behavior in the 2.2 SDK in recent times...

I have 4 items and ask for a max of 2 items, but get a response with 4 items (versus expecting to only get them when I ask for the next page of results).

Is there a recent change in server behavior, or (as I assume is the case) is this entirely a client SDK thing?

UPDATE: Confirming that with 2.0 and 2.2 I am seeing a failing test that had not failed before in the test suite in https://github.com/jet/equinox
Assertion condition failing:

 [ResponseBackward; ResponseBackward; QueryBackward; Append] = [ResponseBackward; QueryBackward; Append]

^ this shows I get one response instead of two

I cannot 100% confirm that this (presumably server-side) change started affecting V2 clients at the same time, but I think I sanity-checked it at the time.

Bottom line is that I don't have a way, with either the v2 or v3 SDK, to get the behavior I'm seeking (I have not analyzed the request flow, but it seems pretty clear to me that it's not likely to be that sort of problem in my case, whereas for the OP it evidently is).

@justinbhopper
Author

@j82w I have set MaxBufferedItemCount to 0 and MaxConcurrency to 0, but the output remains the same. Please see my updated sample application to confirm I did this correctly.

Can we please reopen this issue?

@bartelink
Contributor

bartelink commented Nov 13, 2019

Note I have also been observing this (slightly different) behavior on the V3 SDK for some time (and, as noted above, am also seeing evidence of the same behavior change against the V2 SDK). (NB I am not saying the conditions are the same, but I am definitely interested in resolving that issue too; who knows, it might clarify matters with respect to this one as well.)

@j82w j82w reopened this Nov 13, 2019
@j82w
Contributor

j82w commented Nov 13, 2019

@bchong95 is this by design?

@j82w
Contributor

j82w commented Nov 13, 2019

This seems to be a bug in the way query is buffering the pages. Assigned to @bchong95 for the fix.

@j82w
Contributor

j82w commented Nov 13, 2019

The bug is in CosmosParallelItemQueryExecutionContext. The last iteration of the for loop causes it to move next, which loads another page. There are some other higher-priority bugs being worked on currently, so there is no ETA on when this will get fixed.

@justinbhopper
Author

Not to question your team's priorities, but I want to note that this is causing all of our queries to double in RU cost, because every query we make requests twice as many documents as asked for.

This may very well explain many of the reports in other issues here where SDK v3 has worse performance than v2.

@j82w
Contributor

j82w commented Nov 13, 2019

This bug is a priority to get fixed. There is currently another bug where query is completely broken and is blocking a customer, so it is taking priority. This bug, where an extra page is loaded, is bad, but it doesn't impact users who drain the entire query. This should get prioritized next. If you want the fix faster, please feel free to submit a PR.

@justinbhopper
Author

I apologize for my wording - I was trying to add impact information without suggesting that this bug's priority be increased, and the wording I chose didn't achieve that. Of course performance issues would not be prioritized above functionality issues.

@j82w
Contributor

j82w commented Nov 13, 2019

No need to apologize. I should have been clearer that we have a functional bug that is taking priority, and that this will get prioritized next. I'm just trying to be as transparent as possible about the reasoning behind the prioritization of fixes. We really appreciate the detailed bug report with the sample. We'll try to get the issue fixed as soon as we can.

@bchong95
Contributor

Resolved with #999

@j82w j82w closed this as completed Nov 20, 2019
@justinbhopper
Author

@j82w Can we please reopen this ticket? #999 fixes the issue unless the query contains an ORDER BY.

If the query contains an ORDER BY, the prefetching occurs even if MaxBufferedItemCount and MaxConcurrency are set to 0.

Updated my example to illustrate this:
https://github.com/justinbhopper/cosmos-feed-issue/blob/master/Program.cs#L44

@j82w
Contributor

j82w commented Dec 9, 2019

This is by design and required for ORDER BY queries to give correct results. ORDER BY queries do the following in the SDK: the SDK has to go to every partition to get an ordered list, then do a k-way merge on the results, then take the top N from that merged list and return it as the result. Each partition only knows about content in its own partition, so there is no way to know which partition has the Nth item in the container for the ORDER BY.
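The k-way merge described above can be illustrated with a small sketch (this is not SDK code, just the idea): each partition yields its own sorted stream, and the client repeatedly takes the global minimum until it has N items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class KWayMergeSketch
{
    static IEnumerable<int> TopN(IEnumerable<IEnumerable<int>> sortedPartitions, int n)
    {
        // Seed an ordered set with the head of each partition's stream.
        // Tuples are (value, partition, index), so entries are unique.
        var queue = new SortedSet<(int value, int partition, int index)>();
        var streams = sortedPartitions.Select(p => p.ToArray()).ToArray();
        for (int p = 0; p < streams.Length; p++)
            if (streams[p].Length > 0)
                queue.Add((streams[p][0], p, 0));

        // Repeatedly emit the global minimum and advance that partition,
        // which is why every partition must be consulted.
        while (n-- > 0 && queue.Count > 0)
        {
            var (value, p, i) = queue.Min;
            queue.Remove(queue.Min);
            yield return value;
            if (i + 1 < streams[p].Length)
                queue.Add((streams[p][i + 1], p, i + 1));
        }
    }

    static void Main()
    {
        var partitions = new[] { new[] { 1, 4, 7 }, new[] { 2, 5, 8 }, new[] { 3, 6, 9 } };
        Console.WriteLine(string.Join(", ", TopN(partitions, 4))); // 1, 2, 3, 4
    }
}
```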

@stephenwilson11

stephenwilson11 commented Feb 26, 2020

@j82w, sorry, I do not follow your reasoning for why this is required.

What is the second request resolving? From the above, and from inspecting the responses and continuation token returned from the SDK, it appears to be done to resolve the 'rid' and ORDER BY values of the next item for use in the continuation token?

If so, could this not be done more efficiently by the SDK/Cosmos requesting 'MaxItemCount' + 1 items, so the required info can be provided in a single request? Nothing else from the second request appears to be used.

Sorry if I simply do not get the issue here; I'm pretty new to using Cosmos, but it does appear to be pretty wasteful and costly as is.

@vit-svoboda

Why does the v3 SDK insist on treating all queries as cross-partition even though the partition key is provided? From what I understand, cross-partition queries are to be avoided for performance reasons.

@j82w
Contributor

j82w commented Mar 20, 2020

@bchong95 can you explain?

@j82w j82w reopened this Mar 20, 2020
@bartelink
Contributor

bartelink commented Mar 20, 2020

(For the record, I'm also sitting on a fully ported branch which is doing pretty innocuous stuff, but for unexplained reasons we're seeing ~10% additional RU cost and a lack of the anticipated perf benefits - i.e. what I previously outlined on this thread appears to be panning out as indicative of over-reading, beyond just what one might expect solely from a low MaxItems count not being honored. I'll supply a more detailed analysis in due course.)

eta: 3.7.0-preview2 does not fix the issue, and neither does the custom build in the other issue

@bchong95
Contributor

For ORDER BY queries you can now set a partition key to avoid this behavior:

#1319
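A sketch of that mitigation: scoping an ORDER BY query to a single partition key value, so no cross-partition merge is needed. The container variable, item type, and partition key value are illustrative.

```csharp
// Scope the query to one logical partition; the SDK can then page
// normally instead of merging results from every partition.
var options = new QueryRequestOptions
{
    PartitionKey = new PartitionKey("test"),
    MaxItemCount = 5
};

FeedIterator<MyItem> iterator = container.GetItemQueryIterator<MyItem>(
    "SELECT * FROM c ORDER BY c.Index",
    requestOptions: options);
```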

@bartelink
Contributor

> eta: 3.7.0-preview2 does not fix the issue, and neither does the custom build in the other issue

🎉 3.9.0-preview fixes my problem - reducing RU consumption back to the levels of V2 (I'm using preview-3 now, which is also healthy).

Any rough ETA on when we'll see a 4.x preview with this fix included?

@j82w
Contributor

j82w commented May 18, 2020

@ealsur is there any estimate on when this fix will get ported to v4?

@ealsur
Member

ealsur commented May 18, 2020

@j82w No, the V4 focus at this point is the public surface for the review board.
