fix: abort and headers already sent errors for the rest api #5722

Merged
merged 8 commits into from
Jul 14, 2023

Conversation

nazarhussain
Contributor

Motivation

Gracefully shut down the connection.

Description

When the validator is loading and waiting for genesis, which can take a long time, pressing Ctrl+C shows the following error.

Error: aborted

With this change we wait for the validator to finish loading and, if the user aborts during that time, we skip that error.
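The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `abortableWait` and `startOrStop` are hypothetical names, and the timer merely stands in for a slow startup step such as waiting for genesis. Only the standard `AbortController`/`AbortSignal` API is assumed.

```typescript
// Hypothetical stand-in for a slow startup step such as waiting for genesis.
// It rejects with Error("aborted") as soon as the signal fires.
function abortableWait(signal: AbortSignal, ms: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("aborted"));
      return;
    }
    const timer = setTimeout(() => resolve(), ms);
    signal.addEventListener(
      "abort",
      () => {
        clearTimeout(timer);
        reject(new Error("aborted"));
      },
      {once: true}
    );
  });
}

// Treat a failure as a quiet stop only when shutdown was actually requested;
// any other startup error is rethrown unchanged.
async function startOrStop(signal: AbortSignal, waitMs: number): Promise<"started" | "stopped"> {
  try {
    await abortableWait(signal, waitMs);
    return "started";
  } catch (err) {
    if (signal.aborted) return "stopped";
    throw err;
  }
}
```

In a CLI, the controller's `abort()` would presumably be called from the Ctrl+C handler. Checking `signal.aborted` rather than the error message keeps unrelated startup failures visible.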

Closes #5706

Steps to test or reproduce

  • Run all tests

@nazarhussain nazarhussain requested a review from a team as a code owner June 30, 2023 13:45
@nazarhussain nazarhussain self-assigned this Jun 30, 2023
@nazarhussain
Contributor Author

To reproduce the error, run the following command on the unstable branch, and then try it on this branch.

./lodestar dev --reset --startValidators=0..7 --rest.port=3400

After running this command, wait a few seconds and press Ctrl+C.

@github-actions
Contributor

github-actions bot commented Jun 30, 2023

Performance Report

✔️ no performance regression detected

Full benchmark results
Benchmark suite Current: e73506f Previous: fbe9beb Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 501.13 us/op 571.63 us/op 0.88
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 46.949 us/op 48.257 us/op 0.97
BLS verify - blst-native 1.1993 ms/op 1.2476 ms/op 0.96
BLS verifyMultipleSignatures 3 - blst-native 2.4363 ms/op 2.5390 ms/op 0.96
BLS verifyMultipleSignatures 8 - blst-native 5.2446 ms/op 5.4619 ms/op 0.96
BLS verifyMultipleSignatures 32 - blst-native 19.109 ms/op 19.862 ms/op 0.96
BLS aggregatePubkeys 32 - blst-native 24.742 us/op 26.620 us/op 0.93
BLS aggregatePubkeys 128 - blst-native 99.133 us/op 104.87 us/op 0.95
getAttestationsForBlock 53.818 ms/op 60.158 ms/op 0.89
isKnown best case - 1 super set check 254.00 ns/op 258.00 ns/op 0.98
isKnown normal case - 2 super set checks 249.00 ns/op 254.00 ns/op 0.98
isKnown worse case - 16 super set checks 255.00 ns/op 259.00 ns/op 0.98
CheckpointStateCache - add get delete 5.1210 us/op 6.0770 us/op 0.84
validate gossip signedAggregateAndProof - struct 2.7430 ms/op 2.9249 ms/op 0.94
validate gossip attestation - struct 1.3006 ms/op 1.4152 ms/op 0.92
pickEth1Vote - no votes 1.2684 ms/op 1.3916 ms/op 0.91
pickEth1Vote - max votes 10.442 ms/op 11.988 ms/op 0.87
pickEth1Vote - Eth1Data hashTreeRoot value x2048 8.9111 ms/op 9.9432 ms/op 0.90
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 14.685 ms/op 17.308 ms/op 0.85
pickEth1Vote - Eth1Data fastSerialize value x2048 622.78 us/op 812.93 us/op 0.77
pickEth1Vote - Eth1Data fastSerialize tree x2048 5.9956 ms/op 9.5941 ms/op 0.62
bytes32 toHexString 507.00 ns/op 688.00 ns/op 0.74
bytes32 Buffer.toString(hex) 338.00 ns/op 444.00 ns/op 0.76
bytes32 Buffer.toString(hex) from Uint8Array 539.00 ns/op 634.00 ns/op 0.85
bytes32 Buffer.toString(hex) + 0x 359.00 ns/op 412.00 ns/op 0.87
Object access 1 prop 0.16700 ns/op 0.20300 ns/op 0.82
Map access 1 prop 0.15600 ns/op 0.18000 ns/op 0.87
Object get x1000 6.5030 ns/op 6.7360 ns/op 0.97
Map get x1000 0.54300 ns/op 0.68700 ns/op 0.79
Object set x1000 51.503 ns/op 62.805 ns/op 0.82
Map set x1000 43.676 ns/op 54.677 ns/op 0.80
Return object 10000 times 0.23570 ns/op 0.24960 ns/op 0.94
Throw Error 10000 times 4.2163 us/op 4.2881 us/op 0.98
fastMsgIdFn sha256 / 200 bytes 3.3740 us/op 3.6660 us/op 0.92
fastMsgIdFn h32 xxhash / 200 bytes 268.00 ns/op 316.00 ns/op 0.85
fastMsgIdFn h64 xxhash / 200 bytes 384.00 ns/op 436.00 ns/op 0.88
fastMsgIdFn sha256 / 1000 bytes 11.212 us/op 12.205 us/op 0.92
fastMsgIdFn h32 xxhash / 1000 bytes 399.00 ns/op 476.00 ns/op 0.84
fastMsgIdFn h64 xxhash / 1000 bytes 448.00 ns/op 553.00 ns/op 0.81
fastMsgIdFn sha256 / 10000 bytes 100.76 us/op 106.21 us/op 0.95
fastMsgIdFn h32 xxhash / 10000 bytes 1.8790 us/op 2.0090 us/op 0.94
fastMsgIdFn h64 xxhash / 10000 bytes 1.3280 us/op 1.4940 us/op 0.89
enrSubnets - fastDeserialize 64 bits 1.2530 us/op 1.6430 us/op 0.76
enrSubnets - ssz BitVector 64 bits 462.00 ns/op 609.00 ns/op 0.76
enrSubnets - fastDeserialize 4 bits 167.00 ns/op 212.00 ns/op 0.79
enrSubnets - ssz BitVector 4 bits 469.00 ns/op 592.00 ns/op 0.79
prioritizePeers score -10:0 att 32-0.1 sync 2-0 101.16 us/op 110.67 us/op 0.91
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 130.46 us/op 151.15 us/op 0.86
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 161.33 us/op 183.87 us/op 0.88
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 293.08 us/op 344.26 us/op 0.85
prioritizePeers score 0:0 att 64-1 sync 4-1 360.49 us/op 439.47 us/op 0.82
array of 16000 items push then shift 1.6028 us/op 1.7629 us/op 0.91
LinkedList of 16000 items push then shift 8.7680 ns/op 9.2410 ns/op 0.95
array of 16000 items push then pop 90.704 ns/op 117.84 ns/op 0.77
LinkedList of 16000 items push then pop 8.4020 ns/op 9.1980 ns/op 0.91
array of 24000 items push then shift 2.3236 us/op 2.4496 us/op 0.95
LinkedList of 24000 items push then shift 8.7670 ns/op 10.850 ns/op 0.81
array of 24000 items push then pop 85.961 ns/op 89.995 ns/op 0.96
LinkedList of 24000 items push then pop 8.4180 ns/op 9.3080 ns/op 0.90
intersect bitArray bitLen 8 12.954 ns/op 13.689 ns/op 0.95
intersect array and set length 8 73.609 ns/op 91.974 ns/op 0.80
intersect bitArray bitLen 128 43.065 ns/op 46.120 ns/op 0.93
intersect array and set length 128 1.0097 us/op 1.2560 us/op 0.80
Buffer.concat 32 items 2.5720 us/op 3.0220 us/op 0.85
Uint8Array.set 32 items 2.8530 us/op 2.4170 us/op 1.18
transfer serialized Status (84 B) 2.0640 us/op 2.2020 us/op 0.94
copy serialized Status (84 B) 1.7460 us/op 1.8000 us/op 0.97
transfer serialized SignedVoluntaryExit (112 B) 2.1450 us/op 2.2360 us/op 0.96
copy serialized SignedVoluntaryExit (112 B) 1.6780 us/op 1.8830 us/op 0.89
transfer serialized ProposerSlashing (416 B) 2.3660 us/op 2.4400 us/op 0.97
copy serialized ProposerSlashing (416 B) 2.2450 us/op 2.8640 us/op 0.78
transfer serialized Attestation (485 B) 2.9470 us/op 3.3610 us/op 0.88
copy serialized Attestation (485 B) 3.2310 us/op 2.8680 us/op 1.13
transfer serialized AttesterSlashing (33232 B) 2.5490 us/op 2.7360 us/op 0.93
copy serialized AttesterSlashing (33232 B) 5.1050 us/op 9.5560 us/op 0.53
transfer serialized Small SignedBeaconBlock (128000 B) 3.0310 us/op 3.1330 us/op 0.97
copy serialized Small SignedBeaconBlock (128000 B) 13.887 us/op 24.875 us/op 0.56
transfer serialized Avg SignedBeaconBlock (200000 B) 3.1230 us/op 3.6370 us/op 0.86
copy serialized Avg SignedBeaconBlock (200000 B) 50.412 us/op 36.402 us/op 1.38
transfer serialized BlobsSidecar (524380 B) 3.2900 us/op 4.0620 us/op 0.81
copy serialized BlobsSidecar (524380 B) 82.434 us/op 206.48 us/op 0.40
transfer serialized Big SignedBeaconBlock (1000000 B) 3.2570 us/op 4.3000 us/op 0.76
copy serialized Big SignedBeaconBlock (1000000 B) 234.90 us/op 383.32 us/op 0.61
pass gossip attestations to forkchoice per slot 2.6429 ms/op 2.8956 ms/op 0.91
forkChoice updateHead vc 100000 bc 64 eq 0 2.0396 ms/op 2.2986 ms/op 0.89
forkChoice updateHead vc 600000 bc 64 eq 0 11.090 ms/op 13.427 ms/op 0.83
forkChoice updateHead vc 1000000 bc 64 eq 0 21.144 ms/op 25.271 ms/op 0.84
forkChoice updateHead vc 600000 bc 320 eq 0 16.453 ms/op 18.044 ms/op 0.91
forkChoice updateHead vc 600000 bc 1200 eq 0 81.815 ms/op 92.221 ms/op 0.89
forkChoice updateHead vc 600000 bc 64 eq 1000 20.861 ms/op 22.144 ms/op 0.94
forkChoice updateHead vc 600000 bc 64 eq 10000 23.679 ms/op 23.963 ms/op 0.99
forkChoice updateHead vc 600000 bc 64 eq 300000 30.911 ms/op 40.440 ms/op 0.76
computeDeltas 3.8969 ms/op 3.1311 ms/op 1.24
computeProposerBoostScoreFromBalances 1.7698 ms/op 1.8330 ms/op 0.97
altair processAttestation - 250000 vs - 7PWei normalcase 2.1029 ms/op 2.5801 ms/op 0.82
altair processAttestation - 250000 vs - 7PWei worstcase 3.2712 ms/op 5.2014 ms/op 0.63
altair processAttestation - setStatus - 1/6 committees join 137.83 us/op 148.02 us/op 0.93
altair processAttestation - setStatus - 1/3 committees join 270.67 us/op 281.92 us/op 0.96
altair processAttestation - setStatus - 1/2 committees join 365.42 us/op 375.55 us/op 0.97
altair processAttestation - setStatus - 2/3 committees join 455.28 us/op 506.61 us/op 0.90
altair processAttestation - setStatus - 4/5 committees join 642.95 us/op 674.98 us/op 0.95
altair processAttestation - setStatus - 100% committees join 730.93 us/op 803.21 us/op 0.91
altair processBlock - 250000 vs - 7PWei normalcase 18.136 ms/op 18.653 ms/op 0.97
altair processBlock - 250000 vs - 7PWei normalcase hashState 24.711 ms/op 27.448 ms/op 0.90
altair processBlock - 250000 vs - 7PWei worstcase 49.578 ms/op 54.209 ms/op 0.91
altair processBlock - 250000 vs - 7PWei worstcase hashState 66.241 ms/op 70.489 ms/op 0.94
phase0 processBlock - 250000 vs - 7PWei normalcase 2.0525 ms/op 2.2575 ms/op 0.91
phase0 processBlock - 250000 vs - 7PWei worstcase 26.827 ms/op 30.753 ms/op 0.87
altair processEth1Data - 250000 vs - 7PWei normalcase 470.94 us/op 568.34 us/op 0.83
getExpectedWithdrawals 250000 eb:1,eth1:1,we:0,wn:0,smpl:15 6.8770 us/op 8.6160 us/op 0.80
getExpectedWithdrawals 250000 eb:0.95,eth1:0.1,we:0.05,wn:0,smpl:219 19.493 us/op 28.309 us/op 0.69
getExpectedWithdrawals 250000 eb:0.95,eth1:0.3,we:0.05,wn:0,smpl:42 8.5920 us/op 12.623 us/op 0.68
getExpectedWithdrawals 250000 eb:0.95,eth1:0.7,we:0.05,wn:0,smpl:18 6.4000 us/op 10.083 us/op 0.63
getExpectedWithdrawals 250000 eb:0.1,eth1:0.1,we:0,wn:0,smpl:1020 74.385 us/op 108.90 us/op 0.68
getExpectedWithdrawals 250000 eb:0.03,eth1:0.03,we:0,wn:0,smpl:11777 613.90 us/op 744.90 us/op 0.82
getExpectedWithdrawals 250000 eb:0.01,eth1:0.01,we:0,wn:0,smpl:16384 861.97 us/op 942.63 us/op 0.91
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,smpl:16384 816.44 us/op 976.21 us/op 0.84
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,nocache,smpl:16384 2.2469 ms/op 3.0204 ms/op 0.74
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,smpl:16384 1.4703 ms/op 1.5954 ms/op 0.92
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,nocache,smpl:16384 3.7486 ms/op 8.8747 ms/op 0.42
Tree 40 250000 create 296.18 ms/op 498.50 ms/op 0.59
Tree 40 250000 get(125000) 184.11 ns/op 202.58 ns/op 0.91
Tree 40 250000 set(125000) 926.20 ns/op 1.0665 us/op 0.87
Tree 40 250000 toArray() 17.861 ms/op 22.305 ms/op 0.80
Tree 40 250000 iterate all - toArray() + loop 17.657 ms/op 22.511 ms/op 0.78
Tree 40 250000 iterate all - get(i) 68.871 ms/op 77.996 ms/op 0.88
MutableVector 250000 create 10.371 ms/op 13.565 ms/op 0.76
MutableVector 250000 get(125000) 6.2440 ns/op 6.9050 ns/op 0.90
MutableVector 250000 set(125000) 256.11 ns/op 300.33 ns/op 0.85
MutableVector 250000 toArray() 2.8246 ms/op 4.4380 ms/op 0.64
MutableVector 250000 iterate all - toArray() + loop 2.9361 ms/op 4.4368 ms/op 0.66
MutableVector 250000 iterate all - get(i) 1.4828 ms/op 1.6281 ms/op 0.91
Array 250000 create 2.6716 ms/op 4.3702 ms/op 0.61
Array 250000 clone - spread 1.3024 ms/op 1.6003 ms/op 0.81
Array 250000 get(125000) 0.61800 ns/op 1.2440 ns/op 0.50
Array 250000 set(125000) 0.69000 ns/op 1.1830 ns/op 0.58
Array 250000 iterate all - loop 80.746 us/op 90.221 us/op 0.89
effectiveBalanceIncrements clone Uint8Array 300000 31.705 us/op 36.354 us/op 0.87
effectiveBalanceIncrements clone MutableVector 300000 416.00 ns/op 361.00 ns/op 1.15
effectiveBalanceIncrements rw all Uint8Array 300000 163.80 us/op 172.47 us/op 0.95
effectiveBalanceIncrements rw all MutableVector 300000 83.259 ms/op 91.229 ms/op 0.91
phase0 afterProcessEpoch - 250000 vs - 7PWei 109.84 ms/op 117.82 ms/op 0.93
phase0 beforeProcessEpoch - 250000 vs - 7PWei 31.666 ms/op 42.162 ms/op 0.75
altair processEpoch - mainnet_e81889 343.94 ms/op 344.08 ms/op 1.00
mainnet_e81889 - altair beforeProcessEpoch 64.839 ms/op 51.321 ms/op 1.26
mainnet_e81889 - altair processJustificationAndFinalization 18.164 us/op 19.988 us/op 0.91
mainnet_e81889 - altair processInactivityUpdates 6.0568 ms/op 6.2892 ms/op 0.96
mainnet_e81889 - altair processRewardsAndPenalties 67.449 ms/op 66.302 ms/op 1.02
mainnet_e81889 - altair processRegistryUpdates 2.7730 us/op 3.1890 us/op 0.87
mainnet_e81889 - altair processSlashings 459.00 ns/op 482.00 ns/op 0.95
mainnet_e81889 - altair processEth1DataReset 501.00 ns/op 547.00 ns/op 0.92
mainnet_e81889 - altair processEffectiveBalanceUpdates 1.2127 ms/op 1.2692 ms/op 0.96
mainnet_e81889 - altair processSlashingsReset 3.9080 us/op 5.2670 us/op 0.74
mainnet_e81889 - altair processRandaoMixesReset 4.6130 us/op 6.8580 us/op 0.67
mainnet_e81889 - altair processHistoricalRootsUpdate 700.00 ns/op 873.00 ns/op 0.80
mainnet_e81889 - altair processParticipationFlagUpdates 2.6860 us/op 2.4810 us/op 1.08
mainnet_e81889 - altair processSyncCommitteeUpdates 511.00 ns/op 462.00 ns/op 1.11
mainnet_e81889 - altair afterProcessEpoch 122.20 ms/op 121.85 ms/op 1.00
phase0 processEpoch - mainnet_e58758 373.24 ms/op 358.65 ms/op 1.04
mainnet_e58758 - phase0 beforeProcessEpoch 132.77 ms/op 138.21 ms/op 0.96
mainnet_e58758 - phase0 processJustificationAndFinalization 18.998 us/op 17.329 us/op 1.10
mainnet_e58758 - phase0 processRewardsAndPenalties 59.272 ms/op 59.644 ms/op 0.99
mainnet_e58758 - phase0 processRegistryUpdates 8.0220 us/op 9.1170 us/op 0.88
mainnet_e58758 - phase0 processSlashings 543.00 ns/op 665.00 ns/op 0.82
mainnet_e58758 - phase0 processEth1DataReset 544.00 ns/op 549.00 ns/op 0.99
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 1.0561 ms/op 1.0546 ms/op 1.00
mainnet_e58758 - phase0 processSlashingsReset 4.6630 us/op 4.2930 us/op 1.09
mainnet_e58758 - phase0 processRandaoMixesReset 4.6120 us/op 5.0020 us/op 0.92
mainnet_e58758 - phase0 processHistoricalRootsUpdate 830.00 ns/op 766.00 ns/op 1.08
mainnet_e58758 - phase0 processParticipationRecordUpdates 4.3890 us/op 4.4610 us/op 0.98
mainnet_e58758 - phase0 afterProcessEpoch 97.587 ms/op 101.53 ms/op 0.96
phase0 processEffectiveBalanceUpdates - 250000 normalcase 1.2421 ms/op 1.2714 ms/op 0.98
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 1.5207 ms/op 1.7047 ms/op 0.89
altair processInactivityUpdates - 250000 normalcase 18.715 ms/op 26.320 ms/op 0.71
altair processInactivityUpdates - 250000 worstcase 24.544 ms/op 27.595 ms/op 0.89
phase0 processRegistryUpdates - 250000 normalcase 7.5930 us/op 6.9620 us/op 1.09
phase0 processRegistryUpdates - 250000 badcase_full_deposits 280.81 us/op 273.16 us/op 1.03
phase0 processRegistryUpdates - 250000 worstcase 0.5 114.95 ms/op 126.75 ms/op 0.91
altair processRewardsAndPenalties - 250000 normalcase 71.835 ms/op 67.735 ms/op 1.06
altair processRewardsAndPenalties - 250000 worstcase 73.089 ms/op 71.755 ms/op 1.02
phase0 getAttestationDeltas - 250000 normalcase 7.8566 ms/op 6.9021 ms/op 1.14
phase0 getAttestationDeltas - 250000 worstcase 6.9136 ms/op 7.1005 ms/op 0.97
phase0 processSlashings - 250000 worstcase 3.3967 ms/op 3.7314 ms/op 0.91
altair processSyncCommitteeUpdates - 250000 177.69 ms/op 193.56 ms/op 0.92
BeaconState.hashTreeRoot - No change 321.00 ns/op 371.00 ns/op 0.87
BeaconState.hashTreeRoot - 1 full validator 54.240 us/op 55.355 us/op 0.98
BeaconState.hashTreeRoot - 32 full validator 546.87 us/op 549.40 us/op 1.00
BeaconState.hashTreeRoot - 512 full validator 5.4313 ms/op 5.4654 ms/op 0.99
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 65.395 us/op 63.821 us/op 1.02
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 924.42 us/op 910.72 us/op 1.02
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 12.475 ms/op 12.950 ms/op 0.96
BeaconState.hashTreeRoot - 1 balances 50.168 us/op 51.100 us/op 0.98
BeaconState.hashTreeRoot - 32 balances 470.52 us/op 497.99 us/op 0.94
BeaconState.hashTreeRoot - 512 balances 5.0120 ms/op 4.7530 ms/op 1.05
BeaconState.hashTreeRoot - 250000 balances 76.109 ms/op 78.766 ms/op 0.97
aggregationBits - 2048 els - zipIndexesInBitList 21.120 us/op 16.782 us/op 1.26
regular array get 100000 times 47.728 us/op 46.338 us/op 1.03
wrappedArray get 100000 times 44.126 us/op 49.849 us/op 0.89
arrayWithProxy get 100000 times 15.636 ms/op 16.756 ms/op 0.93
ssz.Root.equals 607.00 ns/op 658.00 ns/op 0.92
byteArrayEquals 600.00 ns/op 628.00 ns/op 0.96
shuffle list - 16384 els 7.1117 ms/op 7.9433 ms/op 0.90
shuffle list - 250000 els 105.77 ms/op 108.27 ms/op 0.98
processSlot - 1 slots 10.046 us/op 9.9160 us/op 1.01
processSlot - 32 slots 1.4237 ms/op 1.5143 ms/op 0.94
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 35.916 ms/op 37.547 ms/op 0.96
getCommitteeAssignments - req 1 vs - 250000 vc 2.9764 ms/op 3.0665 ms/op 0.97
getCommitteeAssignments - req 100 vs - 250000 vc 4.2416 ms/op 4.2931 ms/op 0.99
getCommitteeAssignments - req 1000 vs - 250000 vc 4.7123 ms/op 5.7759 ms/op 0.82
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 4.8400 ns/op 5.5600 ns/op 0.87
state getBlockRootAtSlot - 250000 vs - 7PWei 1.0233 us/op 665.32 ns/op 1.54
computeProposers - vc 250000 11.395 ms/op 10.933 ms/op 1.04
computeEpochShuffling - vc 250000 105.23 ms/op 104.86 ms/op 1.00
getNextSyncCommittee - vc 250000 178.31 ms/op 196.35 ms/op 0.91
computeSigningRoot for AttestationData 14.277 us/op 14.122 us/op 1.01
hash AttestationData serialized data then Buffer.toString(base64) 2.5534 us/op 2.6096 us/op 0.98
toHexString serialized data 1.5795 us/op 1.2140 us/op 1.30
Buffer.toString(base64) 387.06 ns/op 322.86 ns/op 1.20

by benchmarkbot/action

Member

@nflaig nflaig left a comment

The errors mentioned in #5706 occur solely on the beacon node side and are not addressed by this PR. The unclean disconnect might be fixable in the validator client, but it happens at any point during shutdown, not just at startup.

I am not sure it is a good idea to swallow aborted errors during initialization of the validator client (or beacon node), as in that case they do make sense. Specifically, in the "waiting for genesis..." case it seems fine to me to print an abort error if the user aborts (Ctrl+C).

@nazarhussain
Contributor Author

nazarhussain commented Jun 30, 2023

@nflaig Please suggest the steps which you used to produce the issue mentioned in #5706. The only way I was able to reproduce it was with the steps mentioned above, and I fixed it accordingly.

Regarding your other comment: when the user presses Ctrl+C, we show the message

Stopping gracefully, use Ctrl+C again to force process exit

After that, it is not useful to also show an abort error.

@nflaig
Member

nflaig commented Jun 30, 2023

Please suggest the steps which you used to produce the issue mentioned in #5706

It can be reproduced by running a beacon node

./lodestar beacon \
    --dataDir /home/devops/goerli/data/beacon \
    --rest \
    --rest.address "0.0.0.0" \
    --rest.namespace '*' \
    --metrics \
    --execution.urls http://localhost:8551 \
    --jwt-secret /home/devops/goerli/data/jwtsecret \
    --logLevel info \
    --network goerli

and a validator client

./lodestar validator \
    --dataDir /home/devops/goerli/data/validator \
    --beaconNodes http://localhost:9596 \
    --metrics \
    --logLevel info \
    --network goerli

and then shutting down the validator client (Ctrl+C), which produces this log on the beacon node

Jun-30 17:00:44.746[rest]            error: Req req-1 eventstream error  aborted
Error: aborted
    at connResetException (node:internal/errors:720:14)
    at abortIncoming (node:_http_server:771:17)
    at socketOnClose (node:_http_server:765:3)
    at Socket.emit (node:events:523:35)
    at TCP.<anonymous> (node:net:334:12)

The other issue, Error: Cannot set headers after they are sent to the client, is hard to reproduce because it depends on the timing of the validator client shutdown.

@nazarhussain
Contributor Author

@nflaig Thanks for sharing the snippet. Will check and fix the issue.

@nazarhussain nazarhussain marked this pull request as draft July 4, 2023 17:23
@nazarhussain nazarhussain marked this pull request as ready for review July 13, 2023 16:09
@nazarhussain nazarhussain requested a review from nflaig July 13, 2023 16:09
Member

@nflaig nflaig left a comment

Error: Cannot set headers after they are sent to the client is not yet addressed, so we can't close the issue with this PR in its current state.

packages/cli/src/cmds/validator/handler.ts (outdated, resolved)
packages/api/src/beacon/server/events.ts (outdated, resolved)
@@ -49,7 +50,12 @@ export function getRoutes(config: ChainForkConfig, api: ServerApi<Api>): ServerR
 // The client may disconnect and we need to clean the subscriptions.
 req.raw.once("close", () => resolve());
 req.raw.once("end", () => resolve());
-req.raw.once("error", (err) => reject(err));
+req.raw.once("error", (err) => {
+  if ("code" in err && (err as unknown as {code: string}).code === "ECONNRESET") {
Member

I am not sure this is ideal; we might swallow cases where the validator client crashed or disconnected for reasons other than a shutdown.

I was really hoping this could be addressed on the validator client side: it should disconnect gracefully rather than cause an ECONNRESET error on the beacon node.

Contributor Author

We are not swallowing anything; we are trying to log more relevant information. If the client crashes internally, the request is still aborted from the point of view of the AbortController on the client, so the server side can't tell the two scenarios apart.

Member

I haven't looked at the code specifically, but shouldn't the validator client be able to close the connection more gracefully if it receives an abort signal?

@@ -36,6 +37,9 @@ export function getRoutes(config: ChainForkConfig, api: ServerApi<Api>): ServerR
 await new Promise<void>((resolve, reject) => {
   void api.eventstream(req.query.topics, controller.signal, (event) => {
     try {
+      // If the request is already aborted, we don't need to send any more events.
+      if (req.raw.destroyed) return;
Contributor Author

@nazarhussain nazarhussain Jul 14, 2023

This check tries to avoid the headers already sent scenario, though that edge case is very tricky to reproduce. I tried a lot of tricks but was not able to reproduce it locally.

Member

Are you sure this handles all cases where this can happen? I was thinking of maybe checking res.raw.headersSent before setting headers, but I would also have to look into whether that is the correct condition to check.

Contributor Author

The headers already sent error occurs when you try to write to a response whose related request has already completed. That completion can happen because the request was aborted, or because the server already finished writing the response and closed the stream with res.end().

I can't see the latter case happening in our code, as we write the response from only one place. So the only case left in my mind is the first one, and this condition covers it.

If you can find a way to reproduce headers already sent scenario manually, please share then we can test.
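The guard discussed in this thread can be sketched in isolation as follows. This is an illustrative sketch rather than the code in events.ts: `RawRequestLike`, `RawResponseLike`, `formatSseFrame` and `writeEventIfOpen` are hypothetical names, and the two interfaces model only the members used here (`destroyed` on the raw request, `write` on the raw response) so the example runs without an HTTP server.

```typescript
// Structural stand-ins for the raw Node.js request/response objects.
// Only the members used by this sketch are modelled.
interface RawRequestLike {
  readonly destroyed: boolean;
}

interface RawResponseLike {
  write(chunk: string): boolean;
}

/** Serialize one server-sent event frame. */
function formatSseFrame(type: string, data: unknown): string {
  return `event: ${type}\ndata: ${JSON.stringify(data)}\n\n`;
}

/**
 * Write an event only while the underlying request is still alive.
 * Returns true if the frame was written and false if it was dropped.
 */
function writeEventIfOpen(
  req: RawRequestLike,
  res: RawResponseLike,
  type: string,
  data: unknown
): boolean {
  // If the request is already aborted, there is no need to send more events.
  if (req.destroyed) return false;
  res.write(formatSseFrame(type, data));
  return true;
}
```

Returning a boolean is a choice made for this sketch only; it makes the dropped-event path easy to observe in a test.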

Member

@nflaig nflaig Jul 14, 2023

Based on the stack trace in the issue, the error is not caused by us setting headers but something fastify does.

It looks like res.writeHead here; it might be an issue in fastify or related to us not closing correctly.

But res.raw.write looks like a good candidate that could be causing this. If we are sure this does not cause unwanted side effects the change looks good.

If you can find a way to reproduce headers already sent scenario manually

I just saw this once or twice on my end over a period of several months and it was reported by a user once. Seems to be highly timing specific, I tried to reproduce it consistently but no luck so far

Contributor Author

I don't see any side effects for this change.

Member

@nflaig nflaig Jul 20, 2023

@nazarhussain I was going through logs from sim tests and it looks like the error still happens there.

[node-1-cl-lodestar] [8460]: Eph 6/0 2.220[node-1-cl-lodestar] error: uncaughtException: Cannot set headers after they are sent to the client
Error: Cannot set headers after they are sent to the client
    at new NodeError (node:internal/errors:405:5)
    at ServerResponse.setHeader (node:_http_outgoing:648:11)
    at ServerResponse.writeHead (node:_http_server:381:21)
    at onSendEnd (/home/runner/actions-runner/_work/lodestar/lodestar/node_modules/fastify/lib/reply.js:597:7)
    at onSendHook (/home/runner/actions-runner/_work/lodestar/lodestar/node_modules/fastify/lib/reply.js:530:5)
    at fallbackErrorHandler (/home/runner/actions-runner/_work/lodestar/lodestar/node_modules/fastify/lib/error-handler.js:127:3)
    at handleError (/home/runner/actions-runner/_work/lodestar/lodestar/node_modules/fastify/lib/error-handler.js:61:5)
    at onErrorHook (/home/runner/actions-runner/_work/lodestar/lodestar/node_modules/fastify/lib/reply.js:743:5)
    at Reply.send (/home/runner/actions-runner/_work/lodestar/lodestar/node_modules/fastify/lib/reply.js:133:5)
    at defaultErrorHandler (/home/runner/actions-runner/_work/lodestar/lodestar/node_modules/fastify/lib/error-handler.js:92:9) Cannot set headers after they are sent to the client
Error: Cannot set headers after they are sent to the client

See node-1-cl-lodestar.log from this sim tests run. This error seems to happen quite frequently in sim tests.

I would suggest we revert this change here as it does not solve the issue. It might also be a bug in fastify and not directly solvable on our end.

Edit: I created an issue to keep track of this

@@ -169,10 +171,15 @@ export async function validatorHandler(args: IValidatorCliArgs & GlobalArgs): Pr
 distributed: args.distributed,
 },
 metrics
-);
+).catch((err) => {
Member

As mentioned by @tuyennhv in #5511 (comment), we shouldn't combine await and catch.

Contributor Author

I used the try/catch pattern earlier, as suggested by @tuyennhv. You said that wrapping it does not feel good, so the alternative is the following approach, since we prefer not to use let in our code.
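For readers following the style discussion, the two options being weighed look roughly like this. It is a sketch under assumptions: `initializeValidator` is a hypothetical stand-in for the real initialization call, and returning `null` on failure is chosen only to keep the example short.

```typescript
// Hypothetical stand-in for the validator initialization call.
async function initializeValidator(shouldFail: boolean): Promise<string> {
  if (shouldFail) throw new Error("aborted");
  return "validator";
}

// Option A: try/catch. The result has to be declared with `let` outside the
// block so that it is still in scope after the block ends.
async function startWithTryCatch(shouldFail: boolean): Promise<string | null> {
  let validator: string | null = null;
  try {
    validator = await initializeValidator(shouldFail);
  } catch {
    // Startup was interrupted; leave `validator` as null.
  }
  return validator;
}

// Option B: `await` combined with `.catch`. The binding stays `const`, at the
// cost of mixing two error-handling styles in one expression.
async function startWithAwaitCatch(shouldFail: boolean): Promise<string | null> {
  const validator = await initializeValidator(shouldFail).catch(() => null);
  return validator;
}
```

Both variants behave the same for the caller; the difference is purely about which style the codebase prefers.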

Member

Yeah, all of the proposed solutions look sub-optimal

packages/cli/src/cmds/validator/handler.ts (outdated, resolved)
@@ -49,7 +53,12 @@ export function getRoutes(config: ChainForkConfig, api: ServerApi<Api>): ServerR
 // The client may disconnect and we need to clean the subscriptions.
 req.raw.once("close", () => resolve());
 req.raw.once("end", () => resolve());
-req.raw.once("error", (err) => reject(err));
+req.raw.once("error", (err) => {
+  if ((err as unknown as {code: string}).code === "ECONNRESET") {
Member

@nflaig nflaig Jul 14, 2023

I reviewed our code and did some research on how EventSource behaves. It looks like we are closing this correctly on the client by calling .close().

It is invoking both the close and error event on the server, this is a bit strange to me but it looks like that's how EventSource works, it doesn't gracefully close for some reason.

Another thing is that usually this would be handled by fastify, which also just ignores errors with code ECONNRESET, see https://github.com/fastify/fastify/blob/cc347d7c0b4266097b61b126158b797878668353/fastify.js#L667C23-L667C33

But since the event API is not handled by fastify, we need to do it ourselves, and applying the same pattern sounds good to me.

We might even consider just resolving here instead of rejecting with the abort error, but I guess if this error is not logged because it is ignored upstream, this is also fine.
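The ECONNRESET check being discussed can be sketched as a small classifier. The names `isConnectionReset`, `classifyRequestError` and the `Outcome` type are hypothetical and exist only for illustration; the only detail taken from the diff is the comparison of the error's `code` property against "ECONNRESET".

```typescript
// Errors raised by Node.js for a reset connection carry a string `code`.
type ErrorWithCode = Error & {code?: unknown};

/** True when the value is an Error carrying the connection-reset code. */
function isConnectionReset(err: unknown): boolean {
  return err instanceof Error && (err as ErrorWithCode).code === "ECONNRESET";
}

// Hypothetical result type: lets the caller decide how to log or reject.
type Outcome =
  | {kind: "client-aborted"; cause: Error}
  | {kind: "failure"; cause: Error};

/** Separate "the client went away" from every other request error. */
function classifyRequestError(err: Error): Outcome {
  return isConnectionReset(err)
    ? {kind: "client-aborted", cause: err}
    : {kind: "failure", cause: err};
}
```

A handler could still reject in both cases, as proposed in this thread, and use the classification only to pick a more relevant error for the layer above.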

Contributor Author

I feel we should abort here on ECONNRESET and let the higher-level stack handle that error, as it currently does. Once we have a way to know whether the reset was an actual abort or a graceful close by the client, we can decide to resolve in the latter scenario.

Member

Yeah, sounds good to me 👍

Member

It is invoking both the close and error event on the server, this is a bit strange to me but it looks like that's how EventSource works, it doesn't gracefully close for some reason.

This might be related to

Member

This is also similar to the issue reported here

packages/cli/src/cmds/validator/handler.ts (outdated, resolved)
@nazarhussain nazarhussain requested a review from nflaig July 14, 2023 11:08
Member

@nflaig nflaig left a comment

Looks good to me 👍

Let's also update the PR description to reflect the changes implemented

@nazarhussain nazarhussain changed the title fix: validator client unclean disconnect on shutdown fix: abort and headers already sent errors for the reset api Jul 14, 2023
@nazarhussain nazarhussain changed the title fix: abort and headers already sent errors for the reset api fix: abort and headers already sent errors for the rest api Jul 14, 2023
@nazarhussain nazarhussain merged commit 96820d7 into unstable Jul 14, 2023
@wemeetagain
Member

🎉 This PR is included in v1.10.0 🎉

Development

Successfully merging this pull request may close these issues.

Validator client unclean disconnect on shutdown
3 participants