Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Spurious issue with subgraph deployments #4724

Closed
1 of 3 tasks
paymog opened this issue Jun 27, 2023 · 6 comments · Fixed by #4812
Closed
1 of 3 tasks

[Bug] Spurious issue with subgraph deployments #4724

paymog opened this issue Jun 27, 2023 · 6 comments · Fixed by #4812
Assignees
Labels
bug Something isn't working

Comments

@paymog
Copy link

paymog commented Jun 27, 2023

Bug report

We have an end to end test suite which does many subgraph deployments. We're find that our tests spuriously fail with subgraph validation error: [the specified block must exist on the Ethereum network]. In our test suite, we always deploy subgraphs with against mainnet with start blocks in the range of 0-100.

We can't reliably reproduce the issue but do find it happens often on different tests. Our API calls the subgraph_deploy on graph node and this is the full message that's returned: subgraph validation error: [the specified block must exist on the Ethereum network].

We're running graph node v0.31.0.

Relevant log output

No response

IPFS hash

No response

Subgraph name or link to explorer

No response

Some information to help us out

  • Tick this box if this bug is caused by a regression found in the latest release.
  • Tick this box if this bug is specific to the hosted service.
  • I have searched the issue tracker to make sure this issue is not a duplicate.

OS information

None

@paymog paymog added the bug Something isn't working label Jun 27, 2023
@paymog
Copy link
Author

paymog commented Jul 7, 2023

Found some relevant logs from our e2e tests:

e2e-tests-graph-node-1   | Jul 07 07:50:18.580 INFO Received subgraph_create request, params: SubgraphCreateParams { name: SubgraphName("cljsa19l3001jc6joh17r1gq4") }, component: JsonRpcServer
e2e-tests-graph-node-1   | Jul 07 07:50:18.588 DEBG Created subgraph, subgraph_name: cljsa19l3001jc6joh17r1gq4, component: SubgraphRegistrar
e2e-tests-graph-node-1   | Jul 07 07:50:18.590 INFO Received subgraph_deploy request, params: SubgraphDeployParams { name: SubgraphName("cljsa19l3001jc6joh17r1gq4"), ipfs_hash: DeploymentHash("QmX2wBD6mB5CcqXkp9XP1HfF9LrdKwFSA5eddXVj8sYSTB"), node_id: None, debug_fork: None, history_blocks: None }, component: JsonRpcServer
e2e-tests-graph-node-1   | Jul 07 07:50:19.164 ERRO subgraph_deploy failed, params: SubgraphDeployParams { name: SubgraphName("cljsa19l3001jc6joh17r1gq4"), ipfs_hash: DeploymentHash("QmX2wBD6mB5CcqXkp9XP1HfF9LrdKwFSA5eddXVj8sYSTB"), node_id: None, debug_fork: None, history_blocks: None }, error: ManifestValidationError([BlockNotFound("100")]), component: JsonRpcServer

Which is very unusual because block 100 definitely exists on mainnet which is the network that this subgraph is being deployed against. We also see that this test often succeeds.

@azf20
Copy link
Contributor

azf20 commented Aug 9, 2023

hey @paymog could you possibly run with TRACE logging in case there is more info on where this is failing?

@paymog
Copy link
Author

paymog commented Aug 9, 2023

hey @paymog could you possibly run with TRACE logging in case there is more info on where this is failing?

Totally, how do I do that? I searched for TRACE in MD files and don't see mentions of an env var which controls that.

EDIT: nvm found it! Here are some logs which seem relevant

e2e-tests-graph-node-1   | Aug 09 12:16:12.988 INFO Received subgraph_create request, params: SubgraphCreateParams { name: SubgraphName("cll3p2c3d0026pgjo6j139txr") }, component: JsonRpcServer
e2e-tests-graph-node-1   | Aug 09 12:16:12.990 DEBG Created subgraph, subgraph_name: cll3p2c3d0026pgjo6j139txr, component: SubgraphRegistrar
e2e-tests-graph-node-1   | Aug 09 12:16:12.993 INFO Received subgraph_deploy request, params: SubgraphDeployParams { name: SubgraphName("cll3p2c3d0026pgjo6j139txr"), ipfs_hash: DeploymentHash("QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m"), node_id: None, debug_fork: None, history_blocks: None }, component: JsonRpcServer
e2e-tests-graph-node-1   | Aug 09 12:16:12.993 TRCE IPFS cache miss, hash: QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m, sgd: 0, subgraph_id: QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m, component: SubgraphRegistrar
e2e-tests-graph-node-1   | Aug 09 12:16:12.993 TRCE Run with retry: IPFS stat, sgd: 0, subgraph_id: QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m, component: SubgraphRegistrar
e2e-tests-graph-node-1   | Aug 09 12:16:12.995 TRCE Run with retry: ipfs.cat, sgd: 0, subgraph_id: QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m, component: SubgraphRegistrar
e2e-tests-graph-node-1   | Aug 09 12:16:12.997 TRCE IPFS cache miss, hash: Qmbdxh2z2mwmwxzG27XRXjSm9rXZkvHTqsfeNuJEWMeQpb, sgd: 0, subgraph_id: QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m, component: SubgraphRegistrar
e2e-tests-graph-node-1   | Aug 09 12:16:12.997 TRCE Run with retry: IPFS stat, sgd: 0, subgraph_id: QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m, component: SubgraphRegistrar
e2e-tests-graph-node-1   | Aug 09 12:16:12.997 TRCE IPFS cache miss, hash: QmVPdBm7yEA5rDQgFJF6a7gwu851Gghnja1fShwCvwyhKB, sgd: 0, subgraph_id: QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m, component: SubgraphRegistrar
e2e-tests-graph-node-1   | Aug 09 12:16:12.997 TRCE Run with retry: IPFS stat, sgd: 0, subgraph_id: QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m, component: SubgraphRegistrar
e2e-tests-graph-node-1   | Aug 09 12:16:12.997 TRCE IPFS cache miss, hash: QmfWs6Jr41aAtEyiYgDn5GCbHMtnBEtN8DFJvKCL1x8zdw, sgd: 0, subgraph_id: QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m, component: SubgraphRegistrar
e2e-tests-graph-node-1   | Aug 09 12:16:12.997 TRCE Run with retry: IPFS stat, sgd: 0, subgraph_id: QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m, component: SubgraphRegistrar
e2e-tests-graph-node-1   | Aug 09 12:16:12.998 TRCE Run with retry: ipfs.cat, sgd: 0, subgraph_id: QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m, component: SubgraphRegistrar
e2e-tests-graph-node-1   | Aug 09 12:16:13.001 TRCE Run with retry: ipfs.cat, sgd: 0, subgraph_id: QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m, component: SubgraphRegistrar
e2e-tests-graph-node-1   | Aug 09 12:16:13.008 TRCE Run with retry: ipfs.cat, sgd: 0, subgraph_id: QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m, component: SubgraphRegistrar
e2e-tests-graph-node-1   | Aug 09 12:16:13.010 DEBG Connecting to firehose to retrieve block for number 17826813, sgd: 0, subgraph_id: QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m, component: SubgraphRegistrar
e2e-tests-graph-node-1   | Aug 09 12:16:13.055 DEBG Retrieving block(s) from firehose, sgd: 0, subgraph_id: QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m, component: SubgraphRegistrar
e2e-tests-graph-node-1   | Aug 09 12:16:13.055 ERRO subgraph_deploy failed, params: SubgraphDeployParams { name: SubgraphName("cll3p2c3d0026pgjo6j139txr"), ipfs_hash: DeploymentHash("QmVcAor2zRu4e5swombWEML2gGghq2LyG6x5mkVEf9nz1m"), node_id: None, debug_fork: None, history_blocks: None }, error: ManifestValidationError([BlockNotFound("17826814")]), component: JsonRpcServer

Maybe this is uncovering some reliability issues with firehose?

@sduchesneau
Copy link
Contributor

sduchesneau commented Aug 11, 2023

The error from firehose is eaten up (never displayed in the logs or back to the user) and there is no retry mechanism.

Err(e) => return Err(anyhow::format_err!("firehose error {}", e)),

Error could be one of:

  1. network connection issue
  2. TooManyRequests (backoff mechanism when getting a burst of connections)
  3. server shutting down between the request and the response (ex: due to autoscaling)
  4. other...

IMHO, the IPFS call is made with a retry, I don't see why the "get_block" from firehose shouldn't also be done with a retry mechanism.

@sduchesneau
Copy link
Contributor

This is where the (legitimate?) error gets replaced by "Block Not Found":

SubgraphManifestValidationError::BlockNotFound(min_start_block.to_string()),

The fetch block should be 1) retried (2 or 3 times is more than enough) and the error should bubble up (ex: connection error...) instead of being replaced by a BlockNotFound.

@azf20
Copy link
Contributor

azf20 commented Aug 14, 2023

good catch @sduchesneau!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants