
Implement new streaming APIs for the System.Net.Http.Json extensions #89258

Merged: 14 commits merged into dotnet:main from streaming-json-http-apis, Jul 24, 2023

Conversation

IEvangelist
Member

Implements the proposed/approved API described in #87577.

  • Adds two new partial class impls:
    • HttpClientJsonExtensions.Get.AsyncEnumerable.cs
    • HttpContentJsonExtensions.AsyncEnumerable.cs
  • Adds unit tests
  • Adds APIs to ref

Fixes #87577, @eiriktsarpalis
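For context, a minimal consumption sketch of the new surface. The method name follows the proposal approved in #87577; the URL and the `Book` model type are made up for illustration.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;

using var client = new HttpClient();

// Each element is yielded as soon as it has been deserialized from the
// response stream, instead of waiting for the whole payload to download.
await foreach (Book? book in client.GetFromJsonAsAsyncEnumerable<Book>("https://example.com/books"))
{
    Console.WriteLine(book?.Title);
}

// Hypothetical model type for illustration.
record Book(int Id, string Title);
```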

@dotnet-issue-labeler

Note regarding the new-api-needs-documentation label:

This serves as a reminder: when your PR modifies a ref *.cs file and adds or modifies public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments, so the PR reviewers can sign off on that change.

@ghost ghost assigned IEvangelist Jul 20, 2023
@ghost

ghost commented Jul 20, 2023

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.


Author: IEvangelist
Assignees: -
Labels:

area-System.Text.Json, new-api-needs-documentation

Milestone: -

@ghost

ghost commented Jul 20, 2023

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.


Author: IEvangelist
Assignees: IEvangelist
Labels:

area-System.Net.Http, area-System.Text.Json, new-api-needs-documentation

Milestone: -

…lientJsonExtensionsTests.cs

Co-authored-by: Eirik Tsarpalis <eirik.tsarpalis@gmail.com>
@MihaZupan
Member

HttpClient has two main limits that apply to such requests: Timeout and MaxResponseContentBufferSize.

With the existing GetFromJsonAsync helpers, Timeout applies to both the initial request and the content transfer.
This PR applies the Timeout only to the initial request, while the content streaming through the IAsyncEnumerable does not have a timeout.
I think this is in line with what the caller would expect from this API, so 👍.

But if the Timeout only applies to the initial request, you don't have to deal with the Timeout or CancellationTokenSource manually -- HttpClient will take care of that for you. This part of the logic can be simplified away.
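A sketch of the resulting semantics, assuming the method name approved in #87577 (the URL and the `StatusUpdate` model type are hypothetical): `Timeout` bounds obtaining the response, while the caller bounds the long-lived streaming phase with their own token.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading;

// Timeout applies while obtaining the response headers; once streaming
// begins, only the caller's token below bounds the transfer.
using var client = new HttpClient { Timeout = TimeSpan.FromSeconds(10) };
using var cts = new CancellationTokenSource(TimeSpan.FromHours(1));

await foreach (StatusUpdate? update in client.GetFromJsonAsAsyncEnumerable<StatusUpdate>(
    "https://example.com/updates", cts.Token))
{
    Console.WriteLine(update?.Message);
}

// Hypothetical model type for illustration.
record StatusUpdate(string Message);
```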


The second limit is MaxResponseContentBufferSize.

With GetFromJsonAsync this limits how much data we will read and try to deserialize.
But the current state of this PR enforces the limit only if the Content-Length header is present. For an API like this, which implies that data will be generated over a longer period of time, that header will be absent for most requests.

How I would personally expect such an API to behave is one of:
a) The limit is enforced on a per-element basis. That is, it limits how much data is buffered in memory at any given point.
b) The limit is enforced on the entire transfer (all elements).
c) The limit does not apply at all.

I strongly prefer (a), as it gives the user the option to set meaningful limits while still having the ability to stream lots of data over a longer period of time.
Implementation-wise, this can be as simple as wrapping the content stream in the existing LengthLimitReadStream helper, and resetting the count after every yield.
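To make the wrap-and-reset idea concrete, here is a minimal, self-contained stand-in for that helper. `LengthLimitedStream` is an illustrative approximation, not the runtime's internal `LengthLimitReadStream`; the exception type and message are also assumptions.

```csharp
using System;
using System.IO;
using System.Net.Http;

// Illustrative stand-in for the internal LengthLimitReadStream helper.
// Counts bytes read and throws once the running count exceeds the limit;
// calling ResetCount() after every yielded element turns the cap into a
// per-element limit rather than a whole-transfer limit.
public sealed class LengthLimitedStream : Stream
{
    private readonly Stream _inner;
    private readonly int _limit;
    private int _count;

    public LengthLimitedStream(Stream inner, int limit) => (_inner, _limit) = (inner, limit);

    // Invoked after each element is deserialized to restart the count.
    public void ResetCount() => _count = 0;

    public override int Read(byte[] buffer, int offset, int length)
    {
        int read = _inner.Read(buffer, offset, length);
        _count += read;
        if (_count > _limit)
            throw new HttpRequestException($"Buffer limit of {_limit} bytes exceeded.");
        return read;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```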

@IEvangelist IEvangelist requested a review from layomia July 21, 2023 18:23
Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>
IEvangelist and others added 3 commits July 24, 2023 07:56
@eiriktsarpalis
Member

a) The limit is enforced on a per-element basis. That is it limits how much data was buffered in memory at any given point.

I don't think such an enforcement is necessary. Because of how async deserialization works in STJ, deserializing a single element does not necessitate buffering all the relevant data in memory. I would strongly recommend going for option (b) since that would be consistent with what would happen if you attempted to deserialize the same data using a regular GetFromJson<T[]> method call.

@IEvangelist
Member Author

a) The limit is enforced on a per-element basis. That is it limits how much data was buffered in memory at any given point.

I don't think such an enforcement is necessary. Because of how async deserialization works in STJ, deserializing a single element does not necessitate buffering all the relevant data in memory. I would strongly recommend going for option (b) since that would be consistent with what would happen if you attempted to deserialize the same data using a regular GetFromJson<T[]> method call.

Addressed in c99e708

@MihaZupan
Member

deserializing a single element does not necessitate buffering all the relevant data in memory

But presumably, it does consume memory on the same order of magnitude to store the newly allocated object?

option (b) since that would be consistent with what would happen if you attempted to deserialize the same data using a regular GetFromJson<T[]> method call.

I might be misunderstanding the intended use of this API, but is it not meant exactly for cases where the server doesn't have all the data available immediately and you want to consume elements as they are being produced? For example, I could connect to a server and stream state updates for the next day, potentially going over the limit. Just setting the limit really high is not the same, as I don't want any single update to OOM.

@eiriktsarpalis
Member

cc @jeffhandley @ericstj for your consideration since this is adding new functionality for RC.

@eiriktsarpalis
Member

deserializing a single element does not necessitate buffering all the relevant data in memory

But presumably, it does consume memory on the same order of magnitude to store the newly allocated object?

It does store a state machine containing partially hydrated objects, but that wouldn't contribute overheads substantially larger than the final deserialized result.

I might be misunderstanding the intended use of this API, but is it not meant exactly for cases where the server doesn't have all the data available immediately, and you want to consume elements as they are being produced?

That's correct, I meant to say that it should be consistent with the GetFromJson method insofar as what the effect of MaxResponseContentBufferSize should be.

@MihaZupan
Member

@IEvangelist mind reverting c99e708 in that case and only removing the ResetCount part, not the LengthLimitReadStream entirely?

@IEvangelist
Member Author

@IEvangelist mind reverting c99e708 in that case and only removing the ResetCount part, not the LengthLimitReadStream entirely?

So, the content stream will flow into the new LengthLimitReadStream and the resulting stream is then passed to the JsonSerializer, right? Are you good with that @eiriktsarpalis?

@eiriktsarpalis
Member

Yes, the method should build the same stream in the same way as the existing core method:

static async Task<TValue?> Core(
    HttpClient client,
    Task<HttpResponseMessage> responseTask,
    bool usingResponseHeadersRead,
    CancellationTokenSource? linkedCTS,
    Func<Stream, TJsonOptions, CancellationToken, ValueTask<TValue?>> deserializeMethod,
    TJsonOptions jsonOptions,
    CancellationToken cancellationToken)
{
    try
    {
        using HttpResponseMessage response = await responseTask.ConfigureAwait(false);
        response.EnsureSuccessStatusCode();

        Debug.Assert(client.MaxResponseContentBufferSize is > 0 and <= int.MaxValue);
        int contentLengthLimit = (int)client.MaxResponseContentBufferSize;

        if (response.Content.Headers.ContentLength is long contentLength && contentLength > contentLengthLimit)
        {
            LengthLimitReadStream.ThrowExceededBufferLimit(contentLengthLimit);
        }

        try
        {
            using Stream contentStream = await HttpContentJsonExtensions.GetContentStreamAsync(response.Content, linkedCTS?.Token ?? cancellationToken).ConfigureAwait(false);

            // If ResponseHeadersRead wasn't used, HttpClient will have already buffered the whole response upfront. No need to check the limit again.
            Stream readStream = usingResponseHeadersRead
                ? new LengthLimitReadStream(contentStream, (int)client.MaxResponseContentBufferSize)
                : contentStream;

            return await deserializeMethod(readStream, jsonOptions, linkedCTS?.Token ?? cancellationToken).ConfigureAwait(false);

It might make sense to extract some of that logic in a common helper, if possible.

@ericstj
Member

ericstj commented Jul 24, 2023

@jeffhandley @ericstj for your consideration since this is adding new functionality for RC.

Thanks for the heads up - we discussed this one and are OK with the additions given the low risk / need for feedback on this API.

@eiriktsarpalis eiriktsarpalis merged commit 08e3814 into dotnet:main Jul 24, 2023
103 checks passed
@IEvangelist IEvangelist deleted the streaming-json-http-apis branch July 24, 2023 16:52
@karelz karelz added this to the 8.0.0 milestone Aug 1, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Aug 31, 2023
Successfully merging this pull request may close these issues.

[API Proposal]: Streaming APIs for the System.Net.Http.Json extensions