
http requests fail with timeout on large block (json.hpp issue?) #677

Closed
greg7mdp opened this issue Jan 31, 2023 · 27 comments · Fixed by #696

@greg7mdp
Contributor

$ curl -X POST -d '{"block_num_or_id": 291953778}' http://localhost:8888/v1/chain/get_block
{"code":500,"message":"Internal Service Error","error":{"code":2,"name":"timeout_exception","what":"deadline 2023-01-31T22:14:07.564 exceeded by 914656us ","details":[{"message":"deadline 2023-01-31T22:14:07.564 exceeded by 914656us ","file":"json.hpp","line_number":63,"method":"operator()"}]}}#

Also seen in the eosq block explorer.

The message mentions json.hpp:63, which is inside a to_string() function returning a string. Maybe we have an issue where creating the json output takes excessively long for large trees?
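For what it's worth, the client-side timing can be captured with curl's standard -w option (same request as above; port and block number as in this report):

$ curl -s -o /dev/null -w 'total: %{time_total}s\n' -X POST -d '{"block_num_or_id": 291953778}' http://localhost:8888/v1/chain/get_block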

@matthewdarwin

This works fine in nodeos 3.1, so this is a regression.

@matthewdarwin

matthewdarwin commented Feb 1, 2023

http-max-response-time-ms = 2500 works in 3.2

@matthewdarwin

Seems to be a deliberate but poorly documented change in #96

@greg7mdp
Contributor Author

greg7mdp commented Feb 1, 2023

Seems to be a deliberate but poorly documented change in #96

Probably. Although the PR #96 doesn't list get_block as one of the affected APIs?

@matthewdarwin

I don't know why we would want to have limits on get_block. If the block is stored and requested we want to send it.

heifner added the discussion label and removed the triage, actionable, and bug labels Feb 1, 2023
@greg7mdp
Contributor Author

greg7mdp commented Feb 1, 2023

My (limited) understanding is that if the node is used for block production, we may not want it failing to do so because its main thread is busy serving long http requests. If the node is used mainly for serving data requests, yes probably these limits should be relaxed.

@matthewdarwin

matthewdarwin commented Feb 1, 2023

Block production nodes should never be serving general http requests anyway.

@heifner
Member

heifner commented Feb 1, 2023

I don't know why we would want to have limits on get_block. If the block is stored and requested we want to send it.

So just set a very high limit on the API node. Or do you want to avoid long-running get_table calls?

@bhazzard

bhazzard commented Feb 3, 2023

In an ENF backlog refinement session, we discussed this issue at length, but didn’t come to a clear consensus on the best path forward.

We did all agree that the current behavior is undesirable and that more comprehensive thought is required around API architecture and where serialization should be handled. For now, we will simply communicate a workaround.

The workaround:
Ensure that max-http-response-time is configured to be greater than max-abi-serialization-time to prevent timeouts on GetBlock. To keep a lower maximum allowed http response time for endpoints other than GetBlock, run multiple nodeos instances with the required configurations, and use a proxy to direct to them as desired.

A more complete statement with context:
API endpoints often require ABI serialization. For large blocks requested with the GetBlock endpoint, this serialization can take quite a long time. To minimize the chance of a request that requires long-running serialization failing because it exceeds the maximum allowed http response time, our suggested workaround is to ensure that max-http-response-time is configured to be greater than max-abi-serialization-time.

If a node operator would like to keep a lower maximum allowed http response time for endpoints that don't have this problem, one option is to use a proxy to direct GetBlock requests to a dedicated nodeos instance with a higher max-http-response-time, and all other requests to a nodeos instance with your preferred max-http-response-time.
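As a sketch of that setup (values and ports are illustrative only; the option names are the nodeos settings referred to in this thread, i.e. http-max-response-time-ms and abi-serializer-max-time-ms):

# config.ini for the dedicated get_block instance
# (http-max-response-time-ms kept above the serializer limit)
http-server-address = 127.0.0.1:8889
abi-serializer-max-time-ms = 2000
http-max-response-time-ms = 5000

# config.ini for the general-purpose API instance
http-server-address = 127.0.0.1:8888
http-max-response-time-ms = 500

# A reverse proxy (nginx, haproxy, ...) then routes /v1/chain/get_block to :8889
# and every other request to :8888.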

@heifner
Member

heifner commented Feb 6, 2023

Time for the abi serializer to_variant vs the conversion to json for EOS mainnet block 291953778 on my i9-12900K

to_variant calls: 1, total: 884855us, avg: 884855us
to json    calls: 1, total:  48551us, avg:  48551us

@greg7mdp
Contributor Author

greg7mdp commented Feb 6, 2023

Well, I guess it shows that the issue, if there is one, is in to_variant!

@heifner
Member

heifner commented Feb 6, 2023

Well, I guess it shows that the issue, if there is one, is in to_variant!

Yeah, that is as expected. Although that to json is rather terrible as well.

@heifner
Member

heifner commented Feb 7, 2023

For reference the same block without exploding it via abi_serializer (what a /v1/chain/get_raw_block would be):

info  2023-02-07T00:40:21.876 nodeos    stopwatch.hpp:41              report               ] get_block calls: 1, total: 1460us, avg: 1460us
info  2023-02-07T00:40:21.878 http-0    stopwatch.hpp:41              report               ] convert to variant calls: 1, total: 1692us, avg: 1692us
info  2023-02-07T00:40:21.883 http-1    stopwatch.hpp:41              report               ] to json calls: 1, total: 4546us, avg: 4546us
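(A request to such an endpoint would look just like the get_block call above, e.g. the following; illustrative only, since the endpoint did not exist in nodeos at the time of this measurement:)

$ curl -X POST -d '{"block_num_or_id": 291953778}' http://localhost:8888/v1/chain/get_raw_block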

@heifner
Member

heifner commented Feb 7, 2023

Move most of the abi_serialization off the main thread:

info  2023-02-07T01:12:34.801 nodeos    stopwatch.hpp:41              report               ] get_block calls: 1, total: 1817us, avg: 1817us
info  2023-02-07T01:12:34.802 nodeos    stopwatch.hpp:41              report               ] gather abis calls: 1, total: 742us, avg: 742us
info  2023-02-07T01:12:34.987 http-0    stopwatch.hpp:41              report               ] abi_serializer calls: 1, total: 184769us, avg: 184769us
info  2023-02-07T01:12:35.036 http-1    stopwatch.hpp:41              report               ] to json calls: 1, total: 46043us, avg: 46043us
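For readers following the thread, a minimal sketch of the general pattern being described, not the actual PR code: the main thread only fetches the block and gathers the ABIs, and the expensive serialization plus JSON conversion is posted to the http plugin's thread pool. All type and function names below are hypothetical placeholders.

#include <boost/asio/post.hpp>
#include <boost/asio/thread_pool.hpp>
#include <cstdint>
#include <functional>
#include <iostream>
#include <memory>
#include <string>

// Hypothetical stand-ins for the real nodeos types; only the shape of the flow matters here.
struct signed_block { uint32_t block_num = 0; };
struct abi_cache {};   // per-request account -> abi_serializer map gathered up front

std::shared_ptr<signed_block> fetch_block(uint32_t n) { return std::make_shared<signed_block>(signed_block{n}); }
std::shared_ptr<abi_cache> gather_abis(const signed_block&) { return std::make_shared<abi_cache>(); }  // ~0.7ms in the numbers above
std::string explode_to_json(const signed_block& b, const abi_cache&) {                                 // ~185ms + ~46ms in the numbers above
   return "{\"block_num\":" + std::to_string(b.block_num) + "}";
}

// Runs on the main thread; only the cheap lookups stay here, the heavy serialization is
// posted to the http thread pool and the response is sent from there.
void handle_get_block(uint32_t block_num, boost::asio::thread_pool& http_pool,
                      std::function<void(std::string)> send_response) {
   auto block = fetch_block(block_num);
   auto abis  = gather_abis(*block);
   boost::asio::post(http_pool, [block, abis, send = std::move(send_response)]() {
      send(explode_to_json(*block, *abis));
   });
}

int main() {
   boost::asio::thread_pool http_pool(2);
   handle_get_block(291953778, http_pool, [](std::string json) { std::cout << json << "\n"; });
   http_pool.join();   // wait for the posted work to finish
}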

@greg7mdp
Contributor Author

greg7mdp commented Feb 7, 2023

Wow, did that speed up the abi serialization a lot? Above you reported that to_variant took 884855us.

@heifner
Member

heifner commented Feb 7, 2023

Wow, did that speed up the abi serialization a lot? Above you reported that to_variant took 884855us.

Yes. This uses an abi cache so that an abi is only pulled out of chainbase and an abi_serializer created once per account. In the existing implementation that is done over and over again each time the account is referenced in the block.

@greg7mdp
Contributor Author

greg7mdp commented Feb 7, 2023

OK, that makes more sense, looks like a great speedup, over 4x faster. So the speedup is because of the cache, not because it is done on a different thread.

I'll have to check out the cache you implemented. Is it an LRU cache? And does it need to support concurrent access from multiple threads?

@heifner
Member

heifner commented Feb 7, 2023

OK, that makes more sense, looks like a great speedup, over 4x faster. So the speedup is because of the cache, not because it is done on a different thread.

I believe so, yes.

I'll have to check out the cache you implemented. Is it an LRU cache? And does it need to support concurrent access from multiple threads?

Nothing that complicated is used. The cache only lives for the life of the get_block call. Since the gather abis step which creates the cache takes less than 1ms in this example, I think that is fine as a final solution.
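A minimal sketch of that kind of per-call cache (hypothetical names; the real implementation builds it from the ABIs stored in chainbase and discards it once the get_block response is done):

#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for eosio::chain::account_name and abi_serializer.
using account_name = std::string;
struct abi_serializer {};   // expensive to construct from an account's ABI

// Built once per get_block call ("gather abis" / "create cache" in the timings above),
// then consulted during to_variant instead of rebuilding a serializer every time the
// same account shows up in the block.
class per_call_abi_cache {
public:
   void add(const account_name& account, std::shared_ptr<abi_serializer> ser) {
      cache_[account] = std::move(ser);
   }
   std::shared_ptr<abi_serializer> find(const account_name& account) const {
      auto it = cache_.find(account);
      return it == cache_.end() ? nullptr : it->second;   // nullptr: no ABI gathered for this account
   }
private:
   std::map<account_name, std::shared_ptr<abi_serializer>> cache_;
};

Because the map is thrown away with the request, there is no invalidation problem when an account later changes its ABI, and presumably no locking is needed even though the serialization itself runs on an http thread: the cache is built on the main thread and only read afterwards.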

heifner added a commit that referenced this issue Feb 7, 2023
… issue with invalid abi should not throw exception.
heifner added several more commits that referenced this issue Feb 7–8, 2023
@heifner
Member

heifner commented Feb 8, 2023

Should we consider this closed by #696?

@greg7mdp
Contributor Author

greg7mdp commented Feb 8, 2023

Should we consider this closed by #696?

Yes, I think so, but maybe @matthewdarwin thinks otherwise? With your cache the request is significantly faster, so that should lessen the problem.
Also, I was wondering if you checked whether avoiding the extra string copies did speed up the abi_serializer part at all?

@heifner
Member

heifner commented Feb 8, 2023

Also, I was wondering if you checked whether avoiding the extra string copies did speed up the abi_serializer part at all?

Yes, on my machine that cut it in half for the test block.

@greg7mdp
Contributor Author

greg7mdp commented Feb 8, 2023

Wow, that's great. So for that block, the total request processing time is about 8x faster than before your changes?

@heifner
Member

heifner commented Feb 8, 2023

The abi_serializer creation was only a small part of that.

info  2023-02-07T01:12:34.802 nodeos    stopwatch.hpp:41              report               ] gather abis calls: 1, total: 742us, avg: 742us

Now that is ~350us on my machine.

@greg7mdp
Contributor Author

greg7mdp commented Feb 8, 2023

To get a good idea of the progress made in the PR, I think it would be very helpful if you provided the updated times for the following two measurements:

to_variant calls: 1, total: 884855us, avg: 884855us
to json    calls: 1, total:  48551us, avg:  48551us

@greg7mdp
Contributor Author

greg7mdp commented Feb 8, 2023

The abi_serializer creation was only a small part of that.

This must mean that the cache saves us a lot of duplicate abi_serializer creations. It also means that having a persistent cache, as we discussed, probably does not provide significant benefits, since creating the cache on demand, even for this big block, costs little (~350us).

@heifner
Member

heifner commented Feb 8, 2023

Current PR (34baed9) perf:

info  2023-02-08T16:33:11.344 nodeos    stopwatch.hpp:50              report               ] get_block calls: 1, total: 1351us, avg: 1351us
info  2023-02-08T16:33:11.345 nodeos    stopwatch.hpp:50              report               ] create cache calls: 1, total: 340us, avg: 340us
info  2023-02-08T16:33:11.533 http-1    stopwatch.hpp:50              report               ] to variant calls: 1, total: 188137us, avg: 188137us
info  2023-02-08T16:33:11.577 http-0    stopwatch.hpp:50              report               ] to json calls: 1, total: 40403us, avg: 40403us

The earlier to_variant measurement would have included both create cache and to variant.

Nothing in the PR should make any difference in to json, so any time diff there is not real.

The exciting thing about this change is how much less time is spent on the main thread.
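Putting those numbers next to the Feb 6 baseline gives a rough sense of the gain (figures from this thread; the 340us create cache is negligible either way):

to variant: 884855us -> 188137us   (~4.7x)
to json:     48551us ->  40403us   (~1.2x, noise per the note above)
total:      933406us -> 228540us   (~4.1x)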

heifner self-assigned this Feb 8, 2023
heifner moved this from Todo to In Progress in Team Backlog Feb 8, 2023
heifner moved this from In Progress to Awaiting Review in Team Backlog Feb 8, 2023
@greg7mdp
Contributor Author

greg7mdp commented Feb 8, 2023

Great, so we are 5x faster on to_variant, 4x faster overall (including to_json). And almost all off the main thread. That is a great PR!
