Archive node loses peers (stop syncing blocks) #10724
Comments
@APshenkin if you specify a cache_size, Parity will try to use all of it as cache; that is expected. Some memory will also be used beyond that during normal operation. Things I am concerned about - |
Yes, I agree, but it is strange that it gets stuck each time the cache fills (this has already happened several times today). Feel free to ask any other questions. I will post more info here if this occurs again |
@joshua-mir So this occurred again
And in this case Parity wasn't using all the memory |
Also maybe it will help somehow: We use |
that's an extremely heavy rpc method - it could very well be one of the causes of the problem here if you see it taking a while to return a response. |
No, it completes very fast. E.g. once the node has synced after a restart, it completes ~150 successful requests per minute |
Just to warn you, 500gb will not be sufficient for an archive node (once you are in sync) - you will need upwards of 2tb. You don't want to hit a full disk either because it's a known issue that full disks can cause database corruption. |
Our node is already synced up 😄 parity folder size 2.6Tb |
Ah, you increased by 500gb, not to 500gb, misunderstood that 😅 |
Yes, sorry for that 🤷‍♂️ |
So 9 hours without stopping blocks import! |
We've got more than 1TB free and I've seen parity getting stuck this morning. Doesn't seem like the free space has any influence on the issue. |
Agreed. Today we rebooted our node and noticed that, after syncing blocks, it lost all peers and got stuck again ( |
Related to #10626. I am having the exact same issue - 800GB free disk on 2 x 2TB SSD in raid0. |
Hi! I have much the same issue with my archive node, but after some time the peers come back and Parity prints empty lines instead of the status message. Here is the log, the issue starts around This happens 1-2 times a week. |
Today the node stopped about 10 times. I have added a reserved peer that points to my other Parity node on the same server, so it has "survived" many peer-drop periods. But now it has stopped syncing:
Only a restart helps. If the devs need any trace, I can help. |
@iFA88 @APshenkin can you please reproduce the issue with this trace enabled: -l sync=trace (add this to your Parity's startup parameters)? Thanks! |
Yes, I ran now with that parameter. When i use the following parameters UPDATE |
I will try to manage it. |
About 24 hour log: https://www.fusionsolutions.io/doc/paritylog.tar.gz (1.2gb) |
@iFA88 Thanks for the log! As far as I can see, the sync process in it proceeds without getting stuck:
|
I have now written a script that gets the current block from my nodes every minute, example: |
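The script itself is not shown in the thread. A minimal sketch of this kind of per-minute check might look like the following, assuming the node exposes the standard `eth_blockNumber` JSON-RPC method over HTTP. The endpoint URL, the `window` size, and all function names are hypothetical and not taken from the thread.

```python
import json
import time
import urllib.request

# Assumption: the node serves HTTP JSON-RPC on this address.
NODE_URL = "http://127.0.0.1:8545"


def rpc_payload(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body as bytes."""
    body = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params or [],
        "id": request_id,
    }
    return json.dumps(body).encode("utf-8")


def parse_quantity(hex_value):
    """Decode a JSON-RPC hex quantity such as '0x10' into an int."""
    return int(hex_value, 16)


def fetch_block_number(url=NODE_URL, timeout=10):
    """Ask the node for its best block number via eth_blockNumber."""
    request = urllib.request.Request(
        url,
        data=rpc_payload("eth_blockNumber"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_quantity(json.load(response)["result"])


def is_stalled(samples, window=5):
    """Return True when the last `window` samples all report the same block."""
    if len(samples) < window:
        return False
    recent = samples[-window:]
    return all(height == recent[0] for height in recent)


def monitor(url=NODE_URL, interval=60, window=5):
    """Poll every `interval` seconds and flag a stall (hypothetical helper)."""
    samples = []
    while True:
        samples.append(fetch_block_number(url))
        samples = samples[-window:]  # keep only what the stall check needs
        status = "STALLED" if is_stalled(samples, window) else "ok"
        print(time.strftime("%Y-%m-%d %H:%M:%S"), samples[-1], status)
        time.sleep(interval)
```

Calling `monitor()` starts the loop. With the defaults above, a node is flagged only after five identical readings in a row, which is meant to avoid false alarms from short gaps between blocks.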
Hey, I think I have reproduced a little sister of the issue: |
@iFA88 can you please help with testing the next patch? This time I'd like to apply the patch to your full node, the binary is here: https://gitlab.parity.io/parity/parity-ethereum/-/jobs/185203/artifacts/download Please keep the previously patched version for the archive node (if possible) and run both nodes with the same trace parameters as before (sync for archive, usual logging for full). Besides the patch itself, this build adds one more log message, which may spam your full node's log a little. I hope that's OK for you. |
So something went very wrong.. Logs: Result: |
@iFA88 Thanks again! Your help is invaluable! Is it possible to launch a short log session with both nodes patched and sync trace enabled on both as well? Just run them both and wait 1-2 mins |
@iFA88 are you saying that the archive node quits? Can you please explain in more detail? Does it panic? What does it look like? From the logs it seems that the node was simply shut down after several seconds :-( |
I will try to recover my Parity database. Maybe that happened because I can only shut down Parity with SIGKILL.. UPDATE: |
:-( I would also recommend checking your disk and memory to make sure that they were not the reason for the corruption (fsck and memtest). |
Still syncing.. It is on block #5805037 now.. At least it is going without issue :) |
Hey @iFA88 ! Has the sync completed? |
@grbIzl Still syncing.. best block is 6743002 now. |
It is good that the node has cached many blocks, because the archive node drops my full node a lot. The database is now 1.7T, so about 15% of the full size is still missing.. about 6-10 days. |
hey @APshenkin , i am facing the exact same problem with parity. |
I got a critical error during syncing that I cannot handle. Update: Update 2:
|
It's very sad to hear :-( It seems that you hit a bug in the RocksDB implementation, facebook/rocksdb#3509. We see it sometimes, but are still not able to address it. I'm thinking about what I can do to help with a resolution, but frankly there are not many options available now (and none of them are quick) |
Greetings! My archive node is still syncing, but now the same issue has happened on my full node: https://www.fusionsolutions.io/doc/ethlog.tar.gz The issue has started on This is the first time that this has happened with my full node. |
Yes, I see the node getting stuck in the logs, but I cannot say what caused it without the corresponding log targets enabled. For the sake of simplicity, I suggest tracking this problem with the full node separately and creating another issue if it repeats. |
Sadly it is very hard to reproduce with the full node. I think the problem is the same for both. Once my archive node finishes syncing, I can enable trace output. I need at least 30 days for that.. :( |
My archive node has finished syncing ( |
Ok. For a start, can you please confirm which exact versions run on both of your nodes (full and archive)? Some time ago I made patches for both of them, did you apply these patches? If not, please don't for now. I'll first try to figure out whether it's possible to help you with a backup. |
I run |
@iFA88 would it be ok for you to use our backup (made and backed up on a Parity archive node)? We use 2.5.8 there, but the diff doesn't affect the db |
If I understand correctly, you want to download my archive node db and use it locally? |
Actually vice versa. If anything happens on your archive node, you may download a backup from us and apply it on your node. |
Alright, please, let's talk on Gitter. |
Machine spec: 8 CPUs, 64GB RAM, Parity cache-size = 32GB
We have launched an archive node and completed its sync. Today we updated the node from 2.4.5 to 2.4.6 and ran some software that fetches block information.
It's working, but peer connections are unstable and after some time the node stops syncing blocks (it just displays Syncing 0.00 0.00 blk/s )
Below are examples of logs showing the peer count jumping up and down, and the last blocks (reversed order)
peers info
stuck on block
We also have a regular full node, and it has not faced this issue (it just continues syncing normally)
After a restart, the archive node continues syncing
Is it a bug? Or maybe we can do something to prevent restarting every two hours?
Thanks in advance
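The two symptoms in this report (peers dropping, and block import stopping) can be told apart from outside the node. The following is a minimal sketch, assuming the standard `net_peerCount` and `eth_blockNumber` JSON-RPC methods are reachable over HTTP. The endpoint URL, the `min_peers` threshold, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: the node serves HTTP JSON-RPC on this address.
NODE_URL = "http://127.0.0.1:8545"


def call_rpc(method, url=NODE_URL, timeout=10):
    """Call a parameterless JSON-RPC method and decode its hex result."""
    body = json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": [], "id": 1}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return int(json.load(response)["result"], 16)


def node_health(peer_count, block_now, block_before, min_peers=1):
    """Classify one observation as 'no-peers', 'stalled' or 'ok'."""
    if peer_count < min_peers:
        return "no-peers"
    if block_now <= block_before:
        return "stalled"
    return "ok"


def check(block_before, url=NODE_URL):
    """Take one reading and return (status, current block number)."""
    peers = call_rpc("net_peerCount", url)
    block_now = call_rpc("eth_blockNumber", url)
    return node_health(peers, block_now, block_before), block_now
```

Running `check` periodically and logging the status would show whether a stall is preceded by the peer count falling to zero, or whether the node stops importing while still connected.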