Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JsonElement.ValueSpan #77666

Closed
Tracked by #77020
krwq opened this issue Oct 31, 2022 · 26 comments · Fixed by #104595
Closed
Tracked by #77020

Add JsonElement.ValueSpan #77666

krwq opened this issue Oct 31, 2022 · 26 comments · Fixed by #104595
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json in-pr There is an active PR which will close this issue when it is merged partner-impact This issue impacts a partner who needs to be kept updated
Milestone

Comments

@krwq
Copy link
Member

krwq commented Oct 31, 2022

Updated API Proposal

Following conversation with @annelo-msft we identified the key use case is offering a non-allocating variant of the JsonElement.GetRawText method. As such we should probably go ahead with the JsonMarshal design:

namespace System.Runtime.InteropServices;

public partial class JsonMarshal
{
    public static bool TryGetRawValue(JsonElement element, out ReadOnlySpan<byte> rawValue);
}

Furthermore:

  1. The raw value should always be a valid JSON value, therefore strings/properties must always include quotes.
  2. Composite values preserve whitespace and comments contained in the original JSON document.
@krwq krwq added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Text.Json labels Oct 31, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 31, 2022
@ghost
Copy link

ghost commented Oct 31, 2022

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

Currently Utf8JsonReader exposes ValueSpan. It would be useful to have the same member on JsonElement.

cc: @eiriktsarpalis @KrzysztofCwalina

Author: krwq
Assignees: -
Labels:

api-suggestion, area-System.Text.Json

Milestone: -

@krwq krwq added the partner-impact This issue impacts a partner who needs to be kept updated label Oct 31, 2022
@layomia layomia added this to the 8.0.0 milestone Nov 1, 2022
@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Nov 1, 2022
@mwadams
Copy link
Contributor

mwadams commented Nov 10, 2022

While I like this suggestion in principle, there has been much discussion elsewhere about the level of abstraction expected from JsonElement, and whether this should be providing an encoded or decoded span, and (optionally) efficient conversion to ReadOnlySpan<char>.

There is a proposal in #74028 which addresses these issues for the JsonValueKind.String case (which is, in general, the only time you actually need this API).

@eiriktsarpalis
Copy link
Member

One concern I have with this proposal is that similar to Utf8JsonReader.ValueSpan this would not be accounted for escaped JSON strings (per section 7 of RFC 8259). Users would need to account for this and implement their own unescaping routines. Bare minimum, this should be accompanied with a JsonElement.ValueIsEscaped flag and possibly a CopyString() method that copies the unescaped string to a buffer provided by the caller.

@eiriktsarpalis
Copy link
Member

FWIW this was considered in the past and rejected, cf. #30207 (comment) cc @bartonjs

@bartonjs
Copy link
Member

A CopyString method, either escaping or not (maybe CopyString (unescaping) and CopyRawString (only transcoding, when needed)?), would be OK. The Span property is inappropriate, because the lifetime isn't obvious.

ReadOnlySpan<byte> propValRaw = element.ValueSpan;
DoSomeStuff();
DoMoreStuff(propValRaw);

If DoSomeStuff disposes the JsonDocument that produced the element, then that span is probably pointing to the middle of a buffer that was returned to the array pool. If that array got re-rented, it's now garbage data. So, as a property this fails the guideline

AVOID returning a Span<T> or ReadOnlySpan<T> unless the lifetime of the returned Span is very clear.

The "I'll copy it to your buffer for you" method is more clear, and someone who needs the hyper-performance of inspecting it just has to drop to using the reader directly.

@bartonjs
Copy link
Member

Also, for what it's worth, I still think that anything that removes the possibility of UTF-8-or-UTF-16(-or-other) from JsonDocument/JsonElement is bad... which also favors the copy method over the property.

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Nov 29, 2022

If DoSomeStuff disposes the JsonDocument that produced the element, then that span is probably pointing to the middle of a buffer that was returned to the array pool. If that array got re-rented, it's now garbage data. So, as a property this fails the guideline

Is this by virtue of the fact that JsonDocument is IDisposable? Because one could argue that a similar pattern is possible with Utf8JsonReader.ValueSpan if users return the buffer before the span gets consumed.

cc @KrzysztofCwalina who had mentioned in the past that a copying approach would not meet their requirements.

@bartonjs
Copy link
Member

Utf8JsonReader.ValueSpan produces slices of a span (or sequence) you gave it. It's your buffer to corrupt.

JsonElement's theoretical ValueSpan is over a buffer that is either from the array pool, or gcnew, but you can't know at the call site.

The easy rule on spans is: Don't return a span that you weren't given. The relevant span is a detail from JsonDocument, not a thing JsonDocument was given, so it shouldn't be returned. The closest analogue here is List<T>, but we decided it can't return the span either... without going through a weird side path (CollectionsMarshal). List's isn't as bad as here because the array pool isn't involved.

Maybe there's an extremely convincing argument out there somewhere, but from a general framework approach (this point being about security/reliability) the property approach is a no-go. I don't see a path for JsonElement here other than a copy method; and for the scenario the answer is to use Utf8JsonReader.

@KrzysztofCwalina
Copy link
Member

Could we expose this as TryGetValueSpan that would succeed only if the data is in the original buffer passed to Parse?

@bartonjs
Copy link
Member

A TryGetValueSpan that only works for JsonDocument.Parse(ReadOnlyMemory<byte>, JsonDocumentOptions) does alleviate my ownership concerns. I still think it's important for this type to not box itself out for supporting both UTF-8 and UTF-16 (note we didn't call it Utf8JsonDocument/Utf8JsonElement), so ideally we'd come up with a name that could later be extended to a UTF-16-based document. public bool TryGetValueSpanUtf8(out ReadOnlySpan<byte> valueSpan) being a not-very-aesthetically pleasing strawman.

@bartonjs
Copy link
Member

Although.... since it's an out, not a return, they could just use normal overloading... so maybe TryGetValueSpan is a fine name.

@eiriktsarpalis
Copy link
Member

Not sure I like having a try method whose success also depends on how the object got constructed -- it is an intentional property encapsulated by the object and many components might not be able to control it. If we do end up doing it like that it should have a name that discourages general-purpose usage.

@KrzysztofCwalina
Copy link
Member

KrzysztofCwalina commented Nov 30, 2022

Stream.Seek (and many other APIs in .NET) success depends on how the stream got created.

@eiriktsarpalis
Copy link
Member

I recently had to implement this internally for #103733. Here's what I ended up with:

namespace System.Text.Json;

public partial struct JsonElement
{
+    public bool ValueIsEscaped { get; }
+    public ReadOnlySpan<byte> ValueSpan { get; }
}

public partial struct JsonProperty
{
     public string Name { get; }

+    public bool NameIsEscaped { get; }
+    public ReadOnlySpan<byte> NameSpan { get; }
}

While it doesn't address unescaping directly, this follows the API as exposed by Utf8JsonReader and unblocks a number of common scenaria.

@eiriktsarpalis eiriktsarpalis added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Jun 20, 2024
@eiriktsarpalis eiriktsarpalis modified the milestones: Future, 9.0.0 Jun 25, 2024
@eiriktsarpalis eiriktsarpalis added the blocking Marks issues that we want to fast track in order to unblock other important work label Jun 25, 2024
@bartonjs
Copy link
Member

bartonjs commented Jun 25, 2024

Video

  • JsonDocument/JsonElement don't expose their underlying encoding, so we shouldn't add a property that only works assuming UTF-8.
  • Exposing the value as a property is also problematic as it's difficult to relate the value safety of the property against the arraypool vs non-arraypool backing of JsonDocument

After a lot of discussion that invented a JsonMarshal type, or adding new overloads of NameEquals/ValueEquals, we ended up deciding to not decide anything right now; we need a better understanding of the driving concerns.

namespace System.Text.Json
{
    public partial struct JsonElement
    {
        public bool ValueEquals(JsonElement stringElement);
    }

    public partial struct JsonProperty
    {
        public bool NameEquals(JsonProperty other);
    }
}

namespace System.Runtime.InteropServices
{
    // Very raw thoughts here, take with very large grains of salt.
    public partial class JsonMarshal
    {
        public static bool TryGetRawValue(JsonElement element, out ReadOnlySpan<byte> rawValue);

        public static bool TryGetStringValueSpan(JsonElement element, out ReadOnlySpan<byte> utf8ValueSpan);
        public static bool TryGetStringValueSpan(JsonElement element, out ReadOnlySpan<char> valueSpan);

        public static bool TryGetNameSpan(JsonProperty property, out ReadOnlySpan<byte> utf8ValueSpan);
        public static bool TryGetNameSpan(JsonProperty property, out ReadOnlySpan<char> valueSpan);
    }
}

@bartonjs bartonjs added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jun 25, 2024
@eiriktsarpalis
Copy link
Member

@KrzysztofCwalina @annelo-msft one question that came up while reviewing this is whether there is a need to extract the raw span in cases where the JsonElement isn't a number or string.

@bartonjs
Copy link
Member

I'm either amused, or dismayed, that during the discussion we ended up saying almost word for word everything previously mentioned in this issue. (Including me starting with "don't return the span" and then later remembering "oh, right, and the theoretical UTF-16 future expansion")

@bartonjs
Copy link
Member

Having walked through everything again, I'm happy with public static bool JsonMarshal::TryGetRawValue(JsonElement, out ReadOnlySpan<byte>) that returns true if (and only if) the JsonDocument was created by JsonDocument.Parse(ReadOnlyMemory<byte>, JsonDocumentOptions) (or maybe also the one that takes a ref Utf8JsonReader).

@eiriktsarpalis
Copy link
Member

that returns true if (and only if) the JsonDocument was created by JsonDocument.Parse(ReadOnlyMemory, JsonDocumentOptions) (or maybe also the one that takes a ref Utf8JsonReader).

If it only returns true in safe contexts, why hide it away in a marshal class? I think this would both be a source of usability problems and also not unblock valid uses cases that consume disposable documents.

@bartonjs
Copy link
Member

Fair, it could also live on JsonElement as the Try. I guess JsonMarshal was just in my head.

@KrzysztofCwalina
Copy link
Member

@eiriktsarpalis, yeah, we often need the whole raw JSON of a complex object or array of values

@eiriktsarpalis
Copy link
Member

Is escaped JSON, comments or erratic indentation something that would be concerning in that case?

@eiriktsarpalis
Copy link
Member

And to follow up on that question, returning the raw JSON would typically involve including quotes in string values. Is that desirable?

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Jul 8, 2024

Following conversation with @annelo-msft we identified the key use case is offering a non-allocating variant of the JsonElement.GetRawText method. As such we should probably go ahead with the JsonMarshal design:

namespace System.Runtime.InteropServices;

public partial class JsonMarshal
{
    public static bool TryGetRawValue(JsonElement element, out ReadOnlySpan<byte> rawValue);
}

Furthermore:

  1. The raw value should always be a valid JSON value, therefore strings/properties must always include quotes.
  2. Composite values preserve whitespace and comments contained in the original JSON document.

@eiriktsarpalis eiriktsarpalis added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-needs-work API needs work before it is approved, it is NOT ready for implementation labels Jul 8, 2024
@eiriktsarpalis eiriktsarpalis self-assigned this Jul 8, 2024
@bartonjs
Copy link
Member

bartonjs commented Jul 8, 2024

@eiriktsarpalis Please either put the proposal in the top post, or, at minimum, update the top post link to point to the item. Especially in "crunch time" full agenda meetings, we will start skipping anything (pushing it to the bottom of the day agenda) that is not in one of those two states.

@dotnet-policy-service dotnet-policy-service bot added in-pr There is an active PR which will close this issue when it is merged labels Jul 9, 2024
@bartonjs
Copy link
Member

bartonjs commented Jul 9, 2024

Video

  • We changed from a Try to a non-Try, and as a consequence put "Utf8" in the method name.
namespace System.Runtime.InteropServices;

public partial class JsonMarshal
{
    public static ReadOnlySpan<byte> GetRawUtf8Value(JsonElement element);
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jul 9, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Aug 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json in-pr There is an active PR which will close this issue when it is merged partner-impact This issue impacts a partner who needs to be kept updated
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants