-
Notifications
You must be signed in to change notification settings - Fork 499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FeedIterator unexpectedly pre-fetches next page #990
Comments
This is an optimization in the query code. The query code will buffer pages in parallel in the background to reduce the latency to the caller. You can set the MaxBufferedItemCount to 0 and the MaxConcurrency to 0 in the QueryRequestOptions to prevent this. |
I am observing a not dissimilar change in behavior in the 2.2 sdk in recent times... I have 4 items and ask for max 2 items but get a response with 4 items (versus expecting to only get them when I ask for next page of results) Is there a recent change in server behavior, or (as I assume is the case), is this absolutely 100% a client sdk thing? UPDATE: Confirming that with 2.0 and 2.2 I am seeing a failing test that's not failed before in the test suite in https://github.com/jet/equinox
^ this shows I get one response instead of two I cannot 100% confirm that this (presumably server behavior) server change started affecting V2 clients at the same time, but I think I went and sanity checked at the time. Bottom line is that I don't have a way with either the v2 or v3 SDKs to get behavior I'm seeking (have not analyzed the request flow but it seems pretty clear to me that it's not likely to be that sort of a problem in my case, whereas the OP evidently is) |
@j82w I have set MaxBufferedItemCount to 0 and MaxConcurrency to 0, but the output remains the same. Please see my updated sample application to confirm I did this correctly. Can we please reopen this issue? |
Note I have also been observing this (slightly different) behavior on the V3 SDK for some time (and, see above am also seeing evidence of the same behavior change against the V2 SDK). (NB not saying the conditions are the same, but definitely interested in resolving that issue too; who knows it might clarify matters wrt this too) |
@bchong95 is this by design? |
This seems to be a bug in the way query is buffering the pages. Assigned to @bchong95 for the fix. |
The bug is in the CosmosParallelItemQueryExecutionContext. The last iteration on the for loop causes it to move next which causes it to load another page. There is some other higher priority bugs being worked on currently so there is no eta when this will get fixed. |
Not to question your team's priorities, but I want to note that this is causing all of our queries to double in RU cost because every query we make is requesting twice as many documents as was asked for. This may very well explain many of the reports in other issues here where SDK v3 has worse performance than v2. |
This bug is a priority to get fixed. There is currently another bug where query is completely broken and is blocking a customer so it is taking priority. This bug where an extra page load is bad, but it doesn't impact any user draining the entire query. This should get prioritized next. If you want the fix faster please feel free to submit a PR. |
I apologize for my wording - I was trying to add impact info without seeming to suggest that this bug's priority be increased. The wording I chose didn't achieve that. Of course performance issues would not be prioritized higher than functionality issues. |
No need to apologize. I should have been more clear that we had functional bug that is taking priority, and that this will get prioritized next. I'm just trying to be as transparent as possible with the reasoning behind the prioritization of fixes. We really appreciate the detailed bug report with the sample. We'll try to get the issue fixed as soon as we can. |
Resolved with #999 |
@j82w Can we please reopen this ticket? #999 fixes the issue unless queries contain an ORDER BY. If the query contains an ORDER BY, the prefetching occurs even if MaxBufferedItemsCount and MaxConcurrency are set to 0. Updated my example to illustrate this: |
This is by design and required for order by queries to give the correct results. Order by queries do the the following in the SDK. The SDK has to go to every partition to get an ordered list, then do a k-way merge on the results, then it takes the top N from that merged list and returns it as the result. Each partition only knows about content in it's own partition so there is no way to know which partition has the Nth item in the container for the order by. |
@j82w, Sorry I do not follow your reason for why this is required. What is the second request resolving? From the above and inspecting the responses & continuation token returned from the sdk it appears to be done to resolve the 'rid' and order by values of the next item for use in the continuation token? If so could this not be done more efficiently by the sdk/cosmos requesting 'MaxItemCount' + 1 so the required info can be provided in the single request. Nothing else from the second request appears to be used?? Sorry if I simply do not get the issue here, I’m pretty new to using Cosmos but it does appear to be pretty wasteful & costly as is. |
Why does the v3 SKD insist on treating all queries as cross-partition ones even though the partition key is provided? From what I understand, cross-partition queries are to be avoided for performance reasons. |
@bchong95 can you explain? |
(For the record I'm also sitting on a fully ported branch which is doing pretty innocuous stuff but for unexplained reasons we're seeing a ~10% additional RU cost and lack of anticipated perf benefits - i.e. the stuff I previously outlined on this thread appears to be panning out as being indicative of over-reading beyond just what one might expect solely from a low MaxItems count not being honored. I'll supply a more detailed analysis in due course) eta: |
for ORDER BY queries you can now set a partition key to avoid this behavior: |
🎉 Any rough eta on when we'll see a 4.x preview with this fix included? |
@ealsur is there any estimate on when this fix will get ported to v4? |
@j82w No, V4 focus at this point is public surface for review board. |
Describe the bug
The
FeedIterator<T>
fetches a page of results before it is requested. A single call ofReadNextAsync()
causes two POSTs to the cosmos API, rather one.To Reproduce
Here is a sample application that illustrates the problem. Note
ReadNextAsync()
is only called once, but the console logs two API calls.Expected behavior
FeedIterator should only fetch pages upon request.
Actual behavior
FeedIterator seems to be calling for a second page of results on its own. If this is by design, then I think it is flawed, as it makes a huge assumption that we care to get more than the number of items requested.
Environment summary
SDK Version: 3.4.1
OS Version: Windows 10
Additional context
Here is a sample of the console's output. Note that two POSTs occur, despite only having called
ReadNextAsync()
once. Also note that the contents of the second POST show the next set of results (items 5-9).The text was updated successfully, but these errors were encountered: