JVM integration with InputStream and OutputStream #1569

Merged
sandwwraith merged 7 commits into dev from jvm-streams-integration
Sep 3, 2021

Conversation

sandwwraith
Member

No description provided.

@sandwwraith sandwwraith requested review from qwwdfsad and shanshin June 24, 2021 14:33
@sandwwraith sandwwraith mentioned this pull request Jun 28, 2021
Collaborator

@qwwdfsad qwwdfsad left a comment

I haven't looked deeply into decoding yet, but there are already enough actionable points here.


internal class JsonToWriterStringBuilder(private val writer: Writer) : JsonStringBuilder(
// maybe this can also be taken from the pool, but currently initial char array size there is 128, which is too low.
CharArray(BATCH_SIZE)
Collaborator

This seems to be a pretty large allocation (e.g. it always misses the TLAB). Could you please make sure it doesn't dominate the serialization of small objects?

If it does, it's worth either reducing its size or pooling a few instances.
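
A minimal sketch of the pooling option, for illustration only. The object name, the `BATCH_SIZE` value, and the pool limit are assumptions and are not taken from this PR or from the library's existing pool:

```kotlin
// Hypothetical sketch: names and sizes below are illustrative assumptions.
object CharArrayPoolSketch {
    const val BATCH_SIZE = 16 * 1024      // assumed buffer size
    private const val MAX_POOLED = 4      // keep at most a few instances alive
    private val pool = ArrayDeque<CharArray>()

    // Reuse a pooled array when one is available, otherwise allocate a new one.
    fun take(): CharArray =
        synchronized(this) { pool.removeLastOrNull() } ?: CharArray(BATCH_SIZE)

    // Return an array to the pool; arrays of an unexpected size or beyond the limit are dropped.
    fun release(array: CharArray) {
        if (array.size != BATCH_SIZE) return
        synchronized(this) {
            if (pool.size < MAX_POOLED) pool.addLast(array)
        }
    }
}
```

With this shape, a serializer would call `take()` before writing and `release()` in a `finally` block, so small objects do not pay for a fresh large array on every call.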

@chris-hatton

chris-hatton commented Jul 1, 2021

Question regarding the behaviour of this change:
Will decodeFromStream behave so that I can call it serially (repeatedly, non-overlapping) against an open stream, each time consuming only as much of the stream as is necessary to form a complete object?

For example, given this stream carrying two distinct JSON objects:

{"someKey":"someValue1"}{"someKey":"someValue2"}

Could I call decodeFromStream() twice, to get both objects?
This is the characteristic I am looking for, to be able to read a Flow<T> from a long-lived HTTP response stream.
Thanks for your efforts @sandwwraith 🙏 This is a hotly awaited improvement.

@BenWoodworth
Contributor

I'm curious, was using Reader/Writers considered (instead of Streams)? That would've been my first thought for JSON/String formats, and avoids dealing with character encodings.

@sandwwraith
Member Author

@BenWoodworth Reader/Writer is used internally. The API provides methods with Input/OutputStreams because that is more versatile and allows implementing charset-specific parsers in the future.

@chris-hatton Yes, I think we can do it. I'll add a test for it.
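
To make the requested behaviour concrete, here is a stdlib-only sketch that reads one top-level JSON object at a time from a shared `Reader` and stops right after the closing brace. `readNextJsonObject` is a hypothetical helper written for this illustration; it is not `decodeFromStream` and says nothing about how this PR implements decoding:

```kotlin
import java.io.EOFException
import java.io.Reader

/**
 * Hypothetical helper: returns the text of the next top-level JSON object,
 * or null if the input ends before another object starts.
 * It reads one char at a time, so nothing after the closing brace is consumed.
 */
fun readNextJsonObject(reader: Reader): String? {
    var c = reader.read()
    while (c != -1 && c.toChar().isWhitespace()) c = reader.read()
    if (c == -1) return null
    require(c.toChar() == '{') { "Expected '{' but got '${c.toChar()}'" }

    val sb = StringBuilder()
    var depth = 0
    var inString = false
    var escaped = false
    while (c != -1) {
        val ch = c.toChar()
        sb.append(ch)
        when {
            escaped -> escaped = false            // char after a backslash is taken literally
            inString -> when (ch) {
                '\\' -> escaped = true
                '"' -> inString = false
                else -> Unit
            }
            ch == '"' -> inString = true
            ch == '{' -> depth++
            ch == '}' -> if (--depth == 0) return sb.toString()
        }
        c = reader.read()
    }
    throw EOFException("Unexpected end of input inside a JSON object")
}
```

For the two-object example above, wrapping the stream once in an `InputStreamReader` with `Charsets.UTF_8` and calling the helper twice on that same reader would yield each object in turn, and a third call would return null. The reader has to be reused across calls, because `InputStreamReader` may buffer bytes beyond the object it has just returned.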

@sandwwraith sandwwraith force-pushed the jvm-streams-integration branch from 8678e12 to 7807f6d Compare July 19, 2021 16:48
@qwwdfsad qwwdfsad self-requested a review July 29, 2021 16:57
Collaborator

@qwwdfsad qwwdfsad left a comment

I've dug through the profile and it seems that the JVM cannot properly optimize string access through the CharSequence interface, especially in small hot methods. definitelyNotEof is also a heavy hitter for such functions.

I've tried to tweak it here and there, but it's quite hard to ensure all the invariants with the existing limitations.

I'd suggest you do the following:

Extract a base JsonLexer whose only state is currentPosition, plus utility functions for slow paths: skipElement, the various fail functions, and maybe boolean/number consumption (the char sequence is then an input parameter of the function). Everything else is copy-pasted between the streaming and string implementations.

At that point, the performance model is quite clear and the expected degradation should be insignificant (educated guess: 2-4%).
Then you can start commonizing (handling via the CharSequence interface in the base JSON lexer) the parts of parsing where the compiler is smart enough to optimize everything away.

I expect that the biggest offenders (things you cannot commonize) will be just a few functions that were written in a compact and polished manner: skipWhitespaces, tryConsumeComma, canConsumeValue and peekNextToken. Everything else will probably work well via CharSequence, and the amount of duplicated code we have to maintain will be quite isolated.
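
A rough sketch of the suggested split, assuming a layout like the one described above. All class names here are hypothetical and the bodies are simplified; only `currentPosition`, `skipWhitespaces` and the idea of shared `fail` helpers come from the comment:

```kotlin
// Hypothetical sketch of the proposed structure, not the PR's actual classes.
abstract class BaseLexerSketch {
    // The only state kept in the base class.
    var currentPosition = 0
        protected set

    // Slow path: shared, receives the source as a parameter.
    protected fun fail(message: String, source: CharSequence): Nothing =
        throw IllegalStateException("$message at offset $currentPosition in '$source'")

    // Hot path: deliberately duplicated per implementation, typed to its own source.
    abstract fun skipWhitespaces(): Int
}

class StringLexerSketch(private val source: String) : BaseLexerSketch() {
    override fun skipWhitespaces(): Int {
        var pos = currentPosition
        while (pos < source.length) {
            val c = source[pos]
            if (c == ' ' || c == '\n' || c == '\r' || c == '\t') pos++ else break
        }
        currentPosition = pos
        return pos
    }

    fun consumeExpected(expected: Char) {
        val pos = skipWhitespaces()
        if (pos >= source.length || source[pos] != expected) fail("Expected '$expected'", source)
        currentPosition = pos + 1
    }
}

// Copy of the hot loop for a char buffer, as a streaming lexer might hold one.
class BufferLexerSketch(private val buffer: CharArray, private val length: Int) : BaseLexerSketch() {
    override fun skipWhitespaces(): Int {
        var pos = currentPosition
        while (pos < length) {
            val c = buffer[pos]
            if (c == ' ' || c == '\n' || c == '\r' || c == '\t') pos++ else break
        }
        currentPosition = pos
        return pos
    }
}
```

The duplication in `skipWhitespaces` is the point of the sketch: each copy works on a concrete type, so the hot loop does not go through an interface call.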

@sandwwraith sandwwraith force-pushed the jvm-streams-integration branch from 365ac9c to cea326e Compare August 23, 2021 15:48
@sandwwraith sandwwraith requested a review from qwwdfsad August 24, 2021 14:44
Collaborator

@qwwdfsad qwwdfsad left a comment

Good to go 🚀

Please don't forget to file issues for future improvements -- UTF-8 parsing and multishot streams

import java.io.*

/**
* Serializes the [value] with [serializer] into a [stream] using JSON format and UTF-8 encoding..
Collaborator

Redundant dot

return oldSize
}

private fun dumpAndReset(sz: Int = size) {
Collaborator

[nit] it's just flush :)
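
For context, a small sketch of the buffer-then-write pattern that a method like this typically serves, with the method named `flush` as suggested. The class name and default batch size are assumptions, and the body is a guess at the general shape rather than the PR's code:

```kotlin
import java.io.Writer

// Hypothetical sketch: accumulates chars in a fixed array and writes them out in batches.
class BatchingWriterBufferSketch(private val writer: Writer, batchSize: Int = 16 * 1024) {
    private val array = CharArray(batchSize)
    private var size = 0

    fun append(text: CharSequence) {
        for (ch in text) {
            if (size == array.size) flush()   // buffer is full: write it out first
            array[size++] = ch
        }
    }

    // Writes the buffered chars to the underlying writer and resets the buffer.
    fun flush() {
        writer.write(array, 0, size)
        size = 0
    }
}
```

Callers need one final `flush()` after the last `append`, otherwise the tail of the output stays in the buffer.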

@sandwwraith sandwwraith force-pushed the jvm-streams-integration branch from ad9f0e5 to bd8c491 Compare September 3, 2021 13:33
@sandwwraith sandwwraith merged commit c0c60a6 into dev Sep 3, 2021
@sandwwraith
Member Author

#1662

@sandwwraith sandwwraith deleted the jvm-streams-integration branch September 6, 2021 11:50
@slavonnet

The String serializer works fine on big JSON. With the stream version I get random EOF exceptions. I simply replaced the String encoder with the Stream encoder in the converter factory. Small JSON is OK. The buffer seems to be rewritten by the gzip write-all.

7 participants