tests(StoragePruner): Robusta Testing #2782

Closed
Tracked by #2748
distractedm1nd opened this issue Oct 2, 2023 · 4 comments
Labels: area:storage, kind:testing



distractedm1nd commented Oct 2, 2023

The StoragePruner needs to be tested using Robusta testnets.

The current robusta network being used is pruning-v9 on tag v0.11.0-alpha.9.
Using Robusta, many bugs have been fixed so far, including various races and panics. I will keep updating this issue with a log of what I change and do. As of right now, I am committing fixes and updates to the timestamp-pruning-metrics branch.

If anybody external wants to run a node, the custom network variable is
CELESTIA_CUSTOM=pruning-v9:076359F00AA7A56361C723059376C0FEF0D1209954FFD5CD65EADE18FDB3FC90:/ip4/151.115.12.221/tcp/2121/p2p/12D3KooW9yNjW9MWQbKyoxTeZW2N88HHq7RMz4Ucx5EKosuSWM6u
and the core IP is 151.115.15.49.


distractedm1nd commented Oct 2, 2023

Most recent issue: I noticed that bridge nodes are experiencing many stateless resets in shrexeds. I added the byte count to the log for more info; here are some examples (a sketch of the logging change follows them):

2023-10-02T11:48:06.141+0200	WARN	shrex/eds	shrexeds/server.go:140	server: writing ods to stream	{"peer": "12D3KooWPkB62oBjhUkcPDYPJAd5xkhgnsatGvWJMdSrX7zu5wuD", "hash": "100D9084DA905FF5729D88AB2261CBDD419EEC5A4E704B66290C943B102A68BF", "err": "writing ODS bytes: 476601 bytes written, received a stateless reset with token 2c51064bde9a46273c4096f6f771d114"}
2023-10-02T11:48:06.150+0200	WARN	shrex/eds	shrexeds/server.go:140	server: writing ods to stream	{"peer": "12D3KooWNsFhX7NFKcGUqboptr4URTd9Y8ajpm8WYgTBztLSC4xq", "hash": "100D9084DA905FF5729D88AB2261CBDD419EEC5A4E704B66290C943B102A68BF", "err": "writing ODS bytes: 664971 bytes written, received a stateless reset with token 54a17307f61b52e7cea31b390b268228"}
2023-10-02T11:48:06.175+0200	WARN	shrex/eds	shrexeds/server.go:140	server: writing ods to stream	{"peer": "12D3KooWMqhVxL2tnGWc9MMXkSXqWM3ZWj7WuhTDriJQxw6oJKh3", "hash": "100D9084DA905FF5729D88AB2261CBDD419EEC5A4E704B66290C943B102A68BF", "err": "writing ODS bytes: 483460 bytes written, received a stateless reset with token 7310daa063f654a6235f886c8294e67a"}
2023-10-02T11:49:45.631+0200	WARN	shrex/eds	shrexeds/server.go:140	server: writing ods to stream	{"peer": "12D3KooWSXhDedsNTbacBSCZPH2CtVtYBL23VHbWLTNWBQdqoZsw", "hash": "938AECA9133B5B02F310F1CD210AD5DA30FFDD0E06073778A17EEB54E5CA5660", "err": "writing ODS bytes: 233495 bytes written, received a stateless reset with token 8233d5ee405dcbae06d9672453925768"}
2023-10-02T11:49:47.736+0200	WARN	shrex/eds	shrexeds/server.go:140	server: writing ods to stream	{"peer": "12D3KooWSXhDedsNTbacBSCZPH2CtVtYBL23VHbWLTNWBQdqoZsw", "hash": "938AECA9133B5B02F310F1CD210AD5DA30FFDD0E06073778A17EEB54E5CA5660", "err": "writing ODS bytes: 595416 bytes written, received a stateless reset with token 1ad0b85468c45a234feb544c51859f3b"}
2023-10-02T11:49:48.416+0200	WARN	shrex/eds	shrexeds/server.go:140	server: writing ods to stream	{"peer": "12D3KooWDr92PrFG1kKRJJn2VPK7uZB9TAgBv76oB3gti1sw3yEp", "hash": "938AECA9133B5B02F310F1CD210AD5DA30FFDD0E06073778A17EEB54E5CA5660", "err": "writing ODS bytes: 620030 bytes written, received a stateless reset with token ea381de8746b94a6150c0c6efb75e356"}
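For context, the byte count in these warnings comes from wrapping the transport error with the number of bytes already copied to the stream. A minimal sketch of the pattern, with an illustrative helper name rather than the actual shrexeds server code:

```go
package shrexeds // illustrative package name, not the real file

import (
	"fmt"
	"io"
)

// writeODS copies the ODS to the stream and, on failure, wraps the transport
// error with how many bytes were already written, which is what produces
// messages like "writing ODS bytes: 476601 bytes written, received a
// stateless reset with token ...".
func writeODS(stream io.Writer, ods io.Reader) error {
	n, err := io.Copy(stream, ods)
	if err != nil {
		return fmt.Errorf("writing ODS bytes: %d bytes written, %w", n, err)
	}
	return nil
}
```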

Observations:

  • When it happens, all streams to that peer are closed at once, across multiple datahashes that are not yet finished. They are clearly in the middle of writing the ODS to the client when the reset arrives.

Other protocols are also receiving this error:

2023-10-02T13:06:23.210+0200	ERROR	header/p2p	p2p/server.go:146	server: writing header to stream	{"err": "received a stateless reset with token 1c738ea2ffe62f0c8c61a5ff00265b2f"}
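These resets come from the QUIC layer underneath libp2p, which is why every protocol sharing the connection (shrex/eds, header/p2p) fails at the same moment: a stateless reset tears down the whole connection, not a single stream. A small triage helper I'd use when sifting through these errors, assuming the typed error from quic-go survives wrapping (this is not celestia-node code):

```go
package main

import (
	"errors"
	"fmt"

	"github.com/quic-go/quic-go"
)

// isStatelessReset reports whether err was caused by a QUIC stateless reset.
// It assumes the libp2p QUIC transport keeps quic-go's typed error in the
// chain; if the error has been flattened to a string, match on the
// "received a stateless reset" message instead.
func isStatelessReset(err error) bool {
	var reset *quic.StatelessResetError
	return errors.As(err, &reset)
}

func main() {
	err := fmt.Errorf("writing ODS bytes: %w", &quic.StatelessResetError{})
	fmt.Println(isStatelessReset(err)) // true
}
```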

distractedm1nd added the area:storage and kind:testing labels and removed the needs:triage label on Oct 2, 2023

distractedm1nd commented Oct 2, 2023

Even more interesting shrex logs:

2023-10-02T09:59:58.277Z	WARN	shrex/eds	shrexeds/server.go:140	server: writing ods to stream	{"peer": "12D3KooWMqhVxL2tnGWc9MMXkSXqWM3ZWj7WuhTDriJQxw6oJKh3", "hash": "36EE87D61E78D6D439D829EFE026D91F4A03430377B86A93A4606E348846CA1F", "err": "writing ODS bytes: 65536 bytes written, INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.21.32:2121: sendmsg: invalid argument"}
2023-10-02T09:59:58.277Z	WARN	shrex/eds	shrexeds/server.go:140	server: writing ods to stream	{"peer": "12D3KooWMqhVxL2tnGWc9MMXkSXqWM3ZWj7WuhTDriJQxw6oJKh3", "hash": "9E545C277F2467494951E2372348025A91090AF0FE3CE674D5F53CEE4E005674", "err": "writing ODS bytes: 98304 bytes written, INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.21.32:2121: sendmsg: invalid argument"}
2023-10-02T09:59:58.288Z	WARN	shrex/eds	shrexeds/server.go:124	server: writing status to stream	{"peer": "12D3KooWPkB62oBjhUkcPDYPJAd5xkhgnsatGvWJMdSrX7zu5wuD", "hash": "9E545C277F2467494951E2372348025A91090AF0FE3CE674D5F53CEE4E005674", "err": "INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.2.12:2121: sendmsg: invalid argument"}
2023-10-02T09:59:58.291Z	WARN	shrex/eds	shrexeds/server.go:140	server: writing ods to stream	{"peer": "12D3KooWPkB62oBjhUkcPDYPJAd5xkhgnsatGvWJMdSrX7zu5wuD", "hash": "9E545C277F2467494951E2372348025A91090AF0FE3CE674D5F53CEE4E005674", "err": "writing ODS bytes: 0 bytes written, stream reset"}

Suddenly, around the target epoch rate, sampling started struggling hard:
(screenshots: sampling metrics)

Client side:

2023-10-02T10:05:50.287Z	DEBUG	share/getters	getters/shrex.go:177	eds: request failed	{"hash": "C91F56D6E062D9AF64755CC87EF68F5F41DF88121AB7E12AEB0B82159C351718", "peer": "12D3KooWMqhVxL2tnGWc9MMXkSXqWM3ZWj7WuhTDriJQxw6oJKh3", "attempt": 2, "err": "share: data not found", "finished (s)": 0.001943993}
2023-10-02T10:05:50.287Z	DEBUG	share/getters	getters/shrex.go:177	eds: request failed	{"hash": "825317CFD6EF177FF78C936EBC4BC422A700BEA0FB95D5BD56BB17FDD0F66012", "peer": "12D3KooWMqhVxL2tnGWc9MMXkSXqWM3ZWj7WuhTDriJQxw6oJKh3", "attempt": 147, "err": "share: data not found", "finished (s)": 0.001336518}
2023-10-02T10:05:50.288Z	DEBUG	share/getters	getters/shrex.go:177	eds: request failed	{"hash": "6FD808ABDC9588006F9021A017D705BFA0706147B0C76A786324BC378AEDA135", "peer": "12D3KooWMqhVxL2tnGWc9MMXkSXqWM3ZWj7WuhTDriJQxw6oJKh3", "attempt": 115, "err": "share: data not found", "finished (s)": 0.00146089}
2023-10-02T10:05:50.306Z	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooW9yNjW9MWQbKyoxTeZW2N88HHq7RMz4Ucx5EKosuSWM6u", "hash": "825317CFD6EF177FF78C936EBC4BC422A700BEA0FB95D5BD56BB17FDD0F66012", "err": "failed to read eds from ods bytes: share: reading next car entry: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument"}
2023-10-02T10:05:50.306Z	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooW9yNjW9MWQbKyoxTeZW2N88HHq7RMz4Ucx5EKosuSWM6u", "hash": "C91F56D6E062D9AF64755CC87EF68F5F41DF88121AB7E12AEB0B82159C351718", "err": "failed to read eds from ods bytes: share: reading next car entry: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument"}
2023-10-02T10:05:50.306Z	DEBUG	share/getters	getters/shrex.go:177	eds: request failed	{"hash": "825317CFD6EF177FF78C936EBC4BC422A700BEA0FB95D5BD56BB17FDD0F66012", "peer": "12D3KooW9yNjW9MWQbKyoxTeZW2N88HHq7RMz4Ucx5EKosuSWM6u", "attempt": 148, "err": "failed to read eds from ods bytes: share: reading next car entry: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument", "finished (s)": 0.012976854}
2023-10-02T10:05:50.306Z	DEBUG	share/getters	getters/shrex.go:177	eds: request failed	{"hash": "C91F56D6E062D9AF64755CC87EF68F5F41DF88121AB7E12AEB0B82159C351718", "peer": "12D3KooW9yNjW9MWQbKyoxTeZW2N88HHq7RMz4Ucx5EKosuSWM6u", "attempt": 3, "err": "failed to read eds from ods bytes: share: reading next car entry: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument", "finished (s)": 0.012984088}
2023-10-02T10:05:50.306Z	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooW9yNjW9MWQbKyoxTeZW2N88HHq7RMz4Ucx5EKosuSWM6u", "hash": "6FD808ABDC9588006F9021A017D705BFA0706147B0C76A786324BC378AEDA135", "err": "failed to read eds from ods bytes: share: reading next car entry: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument"}
2023-10-02T10:05:50.307Z	DEBUG	share/getters	getters/shrex.go:177	eds: request failed	{"hash": "6FD808ABDC9588006F9021A017D705BFA0706147B0C76A786324BC378AEDA135", "peer": "12D3KooW9yNjW9MWQbKyoxTeZW2N88HHq7RMz4Ucx5EKosuSWM6u", "attempt": 116, "err": "failed to read eds from ods bytes: share: reading next car entry: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument", "finished (s)": 0.013173852}
2023-10-02T10:05:51.286Z	DEBUG	share/getters	getters/shrex.go:177	eds: request failed	{"hash": "C91F56D6E062D9AF64755CC87EF68F5F41DF88121AB7E12AEB0B82159C351718", "peer": "12D3KooWMqhVxL2tnGWc9MMXkSXqWM3ZWj7WuhTDriJQxw6oJKh3", "attempt": 4, "err": "share: data not found", "finished (s)": 0.001661565}
2023-10-02T10:05:51.288Z	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWDr92PrFG1kKRJJn2VPK7uZB9TAgBv76oB3gti1sw3yEp", "hash": "C91F56D6E062D9AF64755CC87EF68F5F41DF88121AB7E12AEB0B82159C351718", "err": "failed to read status from stream: received a stateless reset with token 66c886a79a8b42e80fbfba9adc8d0a17"}
2023-10-02T10:05:51.288Z	DEBUG	share/getters	getters/shrex.go:177	eds: request failed	{"hash": "C91F56D6E062D9AF64755CC87EF68F5F41DF88121AB7E12AEB0B82159C351718", "peer": "12D3KooWDr92PrFG1kKRJJn2VPK7uZB9TAgBv76oB3gti1sw3yEp", "attempt": 5, "err": "failed to read status from stream: received a stateless reset with token 66c886a79a8b42e80fbfba9adc8d0a17", "finished (s)": 0.001947751}
2023-10-02T10:05:51.288Z	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWDr92PrFG1kKRJJn2VPK7uZB9TAgBv76oB3gti1sw3yEp", "hash": "6FD808ABDC9588006F9021A017D705BFA0706147B0C76A786324BC378AEDA135", "err": "failed to read status from stream: received a stateless reset with token 66c886a79a8b42e80fbfba9adc8d0a17"}
2023-10-02T10:05:51.288Z	DEBUG	share/getters	getters/shrex.go:177	eds: request failed	{"hash": "6FD808ABDC9588006F9021A017D705BFA0706147B0C76A786324BC378AEDA135", "peer": "12D3KooWDr92PrFG1kKRJJn2VPK7uZB9TAgBv76oB3gti1sw3yEp", "attempt": 117, "err": "failed to read status from stream: received a stateless reset with token 66c886a79a8b42e80fbfba9adc8d0a17", "finished (s)": 0.004159945}
2023-10-02T10:05:51.288Z	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWDr92PrFG1kKRJJn2VPK7uZB9TAgBv76oB3gti1sw3yEp", "hash": "825317CFD6EF177FF78C936EBC4BC422A700BEA0FB95D5BD56BB17FDD0F66012", "err": "failed to read status from stream: received a stateless reset with token 66c886a79a8b42e80fbfba9adc8d0a17"}
2023-10-02T10:05:51.289Z	DEBUG	share/getters	getters/shrex.go:177	eds: request failed	{"hash": "825317CFD6EF177FF78C936EBC4BC422A700BEA0FB95D5BD56BB17FDD0F66012", "peer": "12D3KooWDr92PrFG1kKRJJn2VPK7uZB9TAgBv76oB3gti1sw3yEp", "attempt": 149, "err": "failed to read status from stream: received a stateless reset with token 66c886a79a8b42e80fbfba9adc8d0a17", "finished (s)": 0.004480504}
2023-10-02T10:05:51.292Z	DEBUG	share/getters	getters/shrex.go:177	eds: request failed	{"hash": "825317CFD6EF177FF78C936EBC4BC422A700BEA0FB95D5BD56BB17FDD0F66012", "peer": "12D3KooWMqhVxL2tnGWc9MMXkSXqWM3ZWj7WuhTDriJQxw6oJKh3", "attempt": 150, "err": "share: data not found", "finished (s)": 0.001472723}
2023-10-02T10:05:51.292Z	DEBUG	share/getters	getters/shrex.go:177	eds: request failed	{"hash": "C91F56D6E062D9AF64755CC87EF68F5F41DF88121AB7E12AEB0B82159C351718", "peer": "12D3KooWMqhVxL2tnGWc9MMXkSXqWM3ZWj7WuhTDriJQxw6oJKh3", "attempt": 6, "err": "share: data not found", "finished (s)": 0.00152459}
2023-10-02T10:05:51.293Z	DEBUG	share/getters	getters/shrex.go:177	eds: request failed	{"hash": "6FD808ABDC9588006F9021A017D705BFA0706147B0C76A786324BC378AEDA135", "peer": "12D3KooWMqhVxL2tnGWc9MMXkSXqWM3ZWj7WuhTDriJQxw6oJKh3", "attempt": 118, "err": "share: data not found", "finished (s)": 0.001646998}
2023-10-02T10:05:51.308Z	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooW9yNjW9MWQbKyoxTeZW2N88HHq7RMz4Ucx5EKosuSWM6u", "hash": "C91F56D6E062D9AF64755CC87EF68F5F41DF88121AB7E12AEB0B82159C351718", "err": "failed to read eds from ods bytes: share: reading next car entry: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument"}
2023-10-02T10:05:51.308Z	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooW9yNjW9MWQbKyoxTeZW2N88HHq7RMz4Ucx5EKosuSWM6u", "hash": "825317CFD6EF177FF78C936EBC4BC422A700BEA0FB95D5BD56BB17FDD0F66012", "err": "failed to read eds from ods bytes: share: reading next car entry: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument"}
2023-10-02T10:05:51.308Z	DEBUG	share/getters	getters/shrex.go:177	eds: request failed	{"hash": "C91F56D6E062D9AF64755CC87EF68F5F41DF88121AB7E12AEB0B82159C351718", "peer": "12D3KooW9yNjW9MWQbKyoxTeZW2N88HHq7RMz4Ucx5EKosuSWM6u", "attempt": 7, "err": "failed to read eds from ods bytes: share: reading next car entry: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument", "finished (s)": 0.013529646}
2023-10-02T10:05:51.308Z	DEBUG	share/getters	getters/shrex.go:177	eds: request failed	{"hash": "825317CFD6EF177FF78C936EBC4BC422A700BEA0FB95D5BD56BB17FDD0F66012", "peer": "12D3KooW9yNjW9MWQbKyoxTeZW2N88HHq7RMz4Ucx5EKosuSWM6u", "attempt": 151, "err": "failed to read eds from ods bytes: share: reading next car entry: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument", "finished (s)": 0.013803829}
2023-10-02T10:05:51.308Z	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooW9yNjW9MWQbKyoxTeZW2N88HHq7RMz4Ucx5EKosuSWM6u", "hash": "6FD808ABDC9588006F9021A017D705BFA0706147B0C76A786324BC378AEDA135", "err": "failed to read eds from ods bytes: share: reading next car entry: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument"}
2023-10-02T10:05:51.308Z	DEBUG	share/getters	getters/shrex.go:177	eds: request failed	{"hash": "6FD808ABDC9588006F9021A017D705BFA0706147B0C76A786324BC378AEDA135", "peer": "12D3KooW9yNjW9MWQbKyoxTeZW2N88HHq7RMz4Ucx5EKosuSWM6u", "attempt": 119, "err": "failed to read eds from ods bytes: share: reading next car entry: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument", "finished (s)": 0.014074233}
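Reading the client side: the getter rotates peers and retries roughly once per second (hence attempt counters climbing past 150), and two distinct failure modes are mixed together: "share: data not found" (the peer answered but no longer serves that EDS, which is expected on a pruning network) and transport failures (stateless resets, sendmsg errors) that say nothing about whether the data exists. A rough, purely illustrative sketch of that retry shape, not the actual getters code:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNotFound stands in for the "share: data not found" response: the peer
// replied, but no longer stores that EDS (plausible once pruning kicks in).
var errNotFound = errors.New("share: data not found")

// getEDSWithRetry mirrors the behaviour visible in the logs: try a peer, and
// on failure wait about a second and move to the next one, distinguishing
// "not found" (the data is gone on that peer) from transport errors (the
// connection, not the data, is the problem).
func getEDSWithRetry(ctx context.Context, peers []func() error) error {
	var err error
	for attempt, get := range peers {
		if err = get(); err == nil {
			return nil
		}
		if errors.Is(err, errNotFound) {
			fmt.Printf("attempt %d: peer does not have the data, trying next peer\n", attempt+1)
		} else {
			fmt.Printf("attempt %d: transport failure: %v\n", attempt+1, err)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second):
		}
	}
	return err
}

func main() {
	peers := []func() error{
		func() error { return errNotFound },
		func() error { return nil },
	}
	fmt.Println(getEDSWithRetry(context.Background(), peers)) // <nil>
}
```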

distractedm1nd commented

Another observation: the daser got stuck on old headers for some reason. Why it didn't see that these were outside of the recency window and ignore them, I have no idea yet. But something interesting is clear in the metrics: full nodes are unable to keep up with the network head:

(screenshot: full nodes falling behind the network head)
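If the daser is retrying headers that are already outside the recency window, the direction I'd expect the fix to take is a guard along these lines: skip sampling headers older than the sampling window, since their data may legitimately be pruned network-wide. The constant and function below are assumptions for illustration, not the actual celestia-node API:

```go
package main

import (
	"fmt"
	"time"
)

// samplingWindow is an assumed recency window for illustration; the real
// value lives in celestia-node's pruning/availability configuration.
const samplingWindow = 30 * 24 * time.Hour

// withinSamplingWindow reports whether a header is recent enough that its
// data should still be retrievable and therefore worth sampling or retrying.
func withinSamplingWindow(headerTime time.Time) bool {
	return time.Since(headerTime) <= samplingWindow
}

func main() {
	old := time.Now().Add(-45 * 24 * time.Hour)
	fmt.Println(withinSamplingWindow(old)) // false: skip instead of retrying forever
}
```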


ramin commented Nov 27, 2023

Let's close this @distractedm1nd

ramin closed this as completed on Dec 18, 2023