Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-4443] Pandaproxy: Avoid large allocations whilst serializing JSON #20827

Merged
merged 15 commits into from
Jul 8, 2024

Conversation

BenPope
Copy link
Member

@BenPope BenPope commented Jul 3, 2024

Replace use of ::json::StringStream with a custom stream based on iobuf to avoid large contiguous allocations.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

Improvements

  • HTTP Proxy: Avoid large allocations during JSON serde of requests and responses
  • Schema Registry: Avoid large allocations during JSON serde of requests and responses

BenPope added 3 commits July 3, 2024 17:55
Some structures serialize quite large, so in order to avoid
oversize allocs, it's necessary to use a buffer type that is not
based onn contiguous memmory.

Explicitly instantiate all overloads with StringBuffer.

Fix the test due to the way name lookup works.

Signed-off-by: Ben Pope <ben@redpanda.com>
Signed-off-by: Ben Pope <ben@redpanda.com>
Signed-off-by: Ben Pope <ben@redpanda.com>
@BenPope BenPope added area/pandaproxy REST interface for Kafka API area/schema-registry Schema Registry service within Redpanda labels Jul 3, 2024
@BenPope BenPope requested a review from a team July 3, 2024 17:14
@BenPope BenPope self-assigned this Jul 3, 2024
@BenPope BenPope requested review from michael-redpanda and removed request for a team July 3, 2024 17:14
@BenPope BenPope requested review from andijcr and pgellert July 3, 2024 17:15
@BenPope BenPope force-pushed the schema_registry/scattered_buffer branch 2 times, most recently from 101edc0 to 9896a8c Compare July 4, 2024 10:57
Copy link
Contributor

@pgellert pgellert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏👏👏 very nice change - I left some questions/suggestions but in general it looks good

src/v/pandaproxy/rest/handlers.cc Show resolved Hide resolved
src/v/pandaproxy/schema_registry/test/client_utils.h Outdated Show resolved Hide resolved
src/v/json/chunked_input_stream.h Outdated Show resolved Hide resolved
src/v/json/chunked_input_stream.h Outdated Show resolved Hide resolved
src/v/pandaproxy/json/rjson_util.h Outdated Show resolved Hide resolved
src/v/pandaproxy/rest/handlers.cc Show resolved Hide resolved
src/v/pandaproxy/rest/handlers.cc Show resolved Hide resolved
src/v/container/include/container/json.h Show resolved Hide resolved
src/v/pandaproxy/json/rjson_util.h Outdated Show resolved Hide resolved
src/v/pandaproxy/schema_registry/handlers.cc Show resolved Hide resolved
BenPope added 6 commits July 4, 2024 14:26
Namespace changed due to lookup rules for templates.

Signed-off-by: Ben Pope <ben@redpanda.com>
Signed-off-by: Ben Pope <ben@redpanda.com>
Signed-off-by: Ben Pope <ben@redpanda.com>
Signed-off-by: Ben Pope <ben@redpanda.com>
Signed-off-by: Ben Pope <ben@redpanda.com>
This reverts commit 313d208.

Some edits required to avoid temporaries..
andijcr
andijcr previously approved these changes Jul 5, 2024
Copy link
Contributor

@andijcr andijcr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@BenPope BenPope force-pushed the schema_registry/scattered_buffer branch from 9896a8c to 48c43c2 Compare July 5, 2024 12:25
@BenPope
Copy link
Member Author

BenPope commented Jul 5, 2024

Changes in [force-push]

  • Address review comments

@BenPope BenPope requested review from andijcr and pgellert July 5, 2024 12:26
pgellert
pgellert previously approved these changes Jul 5, 2024
andijcr
andijcr previously approved these changes Jul 5, 2024
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Jul 5, 2024

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51148#01908333-f923-4e7a-8065-d8a34d3e8741:
pandatriage cache was not found

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51148#01908333-f8ad-4ce1-971b-7a15bd98bbac:
pandatriage cache was not found

BenPope added 4 commits July 8, 2024 13:17
Signed-off-by: Ben Pope <ben@redpanda.com>
Signed-off-by: Ben Pope <ben@redpanda.com>
Signed-off-by: Ben Pope <ben@redpanda.com>
It shouldn't be used in the general case.

Signed-off-by: Ben Pope <ben@redpanda.com>
@BenPope BenPope dismissed stale reviews from andijcr and pgellert via ac7c7dd July 8, 2024 12:18
@BenPope BenPope force-pushed the schema_registry/scattered_buffer branch from 48c43c2 to ac7c7dd Compare July 8, 2024 12:18
@BenPope
Copy link
Member Author

BenPope commented Jul 8, 2024

Changes in [force-push]

  • Reimplement chunked_input_stream using iobuf_streambuf. This is about 10x faster than the previous implementation which allocated during peek. It's a little slower than string for small buffers, and a little quicker for large buffers.
test                         iterations      median         mad         min         max      allocs       tasks        inst
json_parse_test_0001.string      740716   690.921ns     3.708ns   683.878ns   694.629ns       5.000       0.000      3654.0
json_parse_test_1024.string        6449   135.488us   416.420ns   134.588us   136.066us    2061.000       0.000   2540324.6
json_parse_test_1mil.string           4   209.154ms     7.283ms   201.871ms   226.468ms 2097175.000       0.000 2598852221.9
json_parse_test_0001.iobuf       646573   882.709ns     2.742ns   875.019ns   925.758ns       5.000       0.000      6691.0
json_parse_test_1024.iobuf         4196   214.378us     1.317us   212.868us   218.049us    2061.000       0.000   4375206.7
json_parse_test_1mil.iobuf            5   193.112ms     4.999ms   179.562ms   198.110ms 2097175.000       0.000 2598851887.3

@BenPope BenPope requested review from pgellert and andijcr July 8, 2024 12:24
Copy link
Contributor

@pgellert pgellert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

Copy link
Contributor

@michael-redpanda michael-redpanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow, this is great

@vbotbuildovich
Copy link
Collaborator

@BenPope BenPope merged commit 9c82122 into redpanda-data:dev Jul 8, 2024
18 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v24.1.x

@vbotbuildovich
Copy link
Collaborator

/backport v23.3.x

@vbotbuildovich
Copy link
Collaborator

Failed to create a backport PR to v23.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-20827-v23.3.x-956 remotes/upstream/v23.3.x
git cherry-pick -x 7cea48d60091e79ff8e85a0f2f7aacd07b30eef2 ebc4c1347ef8cb9955dd671fb69a7bfbe103a34c 9fcb311283019d0dd359cb89dfd99a7edf289438 a455d2374ec799c14a627989df09ff320ddf0447 5d90218887aec7e665626a8ba1ed5deb40ab3a21 d74ec2aef846f895d0ef5f6e5b05c08120baeb14 f17567fe898c37960896ef303e123ef4c43bcd17 744b0f01067395b993a48005387dae8d356c9f25 5f3236c0ead0f8e9c439e957e04bc342f0f136de c676f6e8b7294019cd4d517dd93a6c4002473c4f 240c682d9fa655d3b19525260d31f5326fb48f54 8560e0ab3653ea3235f170043741a213dca1a66b 6795578e7cd2d171b9733e4da95e829dc0476105 c93475720e00301e2c192bd2a44eac8b9a564c6e ac7c7dd8f2a7177e88a886753c687f46e84c3f1b

Workflow run logs.

@vbotbuildovich
Copy link
Collaborator

Failed to create a backport PR to v24.1.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-20827-v24.1.x-558 remotes/upstream/v24.1.x
git cherry-pick -x 7cea48d60091e79ff8e85a0f2f7aacd07b30eef2 ebc4c1347ef8cb9955dd671fb69a7bfbe103a34c 9fcb311283019d0dd359cb89dfd99a7edf289438 a455d2374ec799c14a627989df09ff320ddf0447 5d90218887aec7e665626a8ba1ed5deb40ab3a21 d74ec2aef846f895d0ef5f6e5b05c08120baeb14 f17567fe898c37960896ef303e123ef4c43bcd17 744b0f01067395b993a48005387dae8d356c9f25 5f3236c0ead0f8e9c439e957e04bc342f0f136de c676f6e8b7294019cd4d517dd93a6c4002473c4f 240c682d9fa655d3b19525260d31f5326fb48f54 8560e0ab3653ea3235f170043741a213dca1a66b 6795578e7cd2d171b9733e4da95e829dc0476105 c93475720e00301e2c192bd2a44eac8b9a564c6e ac7c7dd8f2a7177e88a886753c687f46e84c3f1b

Workflow run logs.

//! Get the length of string in Ch in the string buffer.
size_t GetLength() const { return _impl.size_bytes() / sizeof(Ch); }

void Reserve(size_t s) { _impl.reserve(s); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenPope what are the semantics of Reserve? is it like std::vector::reserve? If so, I don't think you want iobuf:reserve because it returns an iobuf::placeholder and increases the size (not the capacity) of the iobuf.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenPope what are the semantics of Reserve? is it like std::vector::reserve? If so, I don't think you want iobuf:reserve because it returns an iobuf::placeholder and increases the size (not the capacity) of the iobuf.

Good spot, thanks.

BenPope added a commit to BenPope/redpanda that referenced this pull request Jul 10, 2024
From post-commit review: redpanda-data#20827 (comment)

Signed-off-by: Ben Pope <ben@redpanda.com>
BenPope added a commit to BenPope/redpanda that referenced this pull request Jul 16, 2024
From post-commit review: redpanda-data#20827 (comment)

Signed-off-by: Ben Pope <ben@redpanda.com>
BenPope added a commit to BenPope/redpanda that referenced this pull request Jul 16, 2024
From post-commit review: redpanda-data#20827 (comment)

Signed-off-by: Ben Pope <ben@redpanda.com>
BenPope added a commit to BenPope/redpanda that referenced this pull request Jul 17, 2024
From post-commit review: redpanda-data#20827 (comment)

Signed-off-by: Ben Pope <ben@redpanda.com>
(cherry picked from commit d5a6516)
BenPope added a commit to BenPope/redpanda that referenced this pull request Jul 22, 2024
From post-commit review: redpanda-data#20827 (comment)

Signed-off-by: Ben Pope <ben@redpanda.com>
(cherry picked from commit d5a6516)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/pandaproxy REST interface for Kafka API area/redpanda area/schema-registry Schema Registry service within Redpanda
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants