Add a way to move all subgraphs out of a shard #4371
Comments
We're outgrowing our database and would like to start sharding. Is there a way to add a new shard to the graph node and also configure it to create all new subgraphs in the new shard? |
I actually talked about sharding in the indexer office hours a few weeks ago (starts at ~12:00 in the recording). Be warned, though, that if you copy deployments across shards, the indexer agent will get stuck because of this issue |
@lutter just did my first deployment migration between shards and it went well! However, it's not clear how to move a deployment back to the original shard. Any chance you have some tips you can share? I'd assume that trying to EDIT: maybe the following is correct?
|
I just tried copying another subgraph several times and I keep getting errors like the following (each line is a different attempt):
However, when I query the database, I can see these blocks in the cache:
The command which produced the above errors is
I also tried the following with the same effect
Not sure if this is relevant but this subgraph is running against mainnet while the subgraph from the above comment which I moved successfully was running against matic. |
Interestingly, it seems the copy command calls this function which uses the
Now I'm wondering how the first subgraph was successfully copied... EDIT: maybe the |
If I run
and if I check the
...oh,
|
What could be causing the graph node to not populate the block cache for mainnet? It seems the cache stops at block ~1500
|
It seems there are large gaps in my block cache:
|
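For anyone wanting to check their own cache for gaps like this, here is a minimal sketch. It assumes the cached block numbers have already been exported into a plain list (how they are fetched from the database is not shown here), and `find_gaps` is a hypothetical helper, not part of graph-node or graphman.

```python
def find_gaps(block_numbers):
    """Return (first_missing, last_missing) ranges between cached blocks.

    `block_numbers` is assumed to be an iterable of integers exported
    from the block cache; duplicates and ordering do not matter.
    """
    gaps = []
    ordered = sorted(set(block_numbers))
    # Compare each cached block with the next one; a difference larger
    # than 1 means the blocks in between are missing from the cache.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


if __name__ == "__main__":
    for first, last in find_gaps([1, 2, 3, 7, 8, 12]):
        print(f"missing blocks {first}..{last}")
```

Blocks below the lowest cached number or above the highest are not reported, since the list alone does not say where the cache is supposed to start or end.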
I decided to try another subgraph which was caught up to the tip of mainnet where I have a complete (ish) block cache. I ran the command to create a copy successfully
I then checked the secondary shard to see if the subgraph is there and indeed it is.
However, it seems it's not syncing. That's when I realized this copy operation is effectively a graft based on the following:
I've been waiting a while for this graft to complete and I'm not seeing any progress in the graph node logs. I'm also not seeing any data in the new subgraph
And when I check the running queries in postgres I don't see anything in the secondary shard
Nor do I see any relevant queries in the primary shard where the subgraph is being copied from
|
Just found the following:
The subgraph is ~8gigs from the
Is it expected that the graft would still be at 0% after ~30 minutes? EDIT: it's now been over an hour and the graft is still at 0% EDIT 2: it's now the next day and still at 0%. I wonder if this is because I assigned the copy to the same indexer as the source? |
I tried to cancel the copy process by running the following:
and it seems to have removed the original deployment since I can't create a new copy:
nor find info on the deployment
EDIT: I manually redeployed the same subgraph and things are good now. I was a bit surprised that dropping |
Maybe this issue is because of #4394 and the fact that I'm still running 0.29.0? Does the copy operation invoke |
There are situations where it is desirable to move all subgraphs from one shard into another shard (e.g., to consolidate shards). It's currently possible to do that, but pretty laborious: for each subgraph in the shard that should be emptied, it is necessary to run the following commands manually:
It would be much better if `graph-node` could perform these steps without much user interaction. We should change `graphman copy create` to accept two new flags:

- `--activate`: activate the copy when it has caught up with the chain head. For that, `SyncStore::deployment_synced` will call `primary::Connection::activate`
- `--replace`: when the copy has synced, do what `--activate` does but also unassign the source of the copy so that it will eventually be deleted

With that, a shard can be emptied by running `graphman copy create --replace sgd<src> <dst shard> <index node>` for each subgraph in that shard. It should be possible to issue those commands all at once, `graph-node` should work through that list on its own and appropriately limit load in case there is a huge number of deployments to move, though that needs to be tried out and tested carefully.
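
As a rough illustration of the "for each subgraph in that shard" step, the command lines could be generated from a list of deployments. This is only a sketch: `--replace` is the flag proposed in this issue and may not exist in your graphman version, the helper `replace_commands` and the example shard and node names are hypothetical, and any other arguments graphman needs (such as its configuration file) are left out.

```python
import shlex


def replace_commands(deployments, dst_shard, index_node):
    """Build one proposed `graphman copy create --replace` line per deployment.

    `deployments` is assumed to be a list of deployment namespaces such as
    "sgd42". The argument order mirrors the form used in this issue:
    graphman copy create --replace sgd<src> <dst shard> <index node>
    """
    commands = []
    for sgd in deployments:
        argv = ["graphman", "copy", "create", "--replace",
                sgd, dst_shard, index_node]
        # shlex.join quotes arguments only where the shell requires it.
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Hypothetical names, purely for illustration.
    for line in replace_commands(["sgd1", "sgd2"], "shard_b", "index_node_0"):
        print(line)
```

The sketch only prints the commands rather than running them, so the list can be reviewed (and the load considered) before anything is issued.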