Constant replaying of blockchain #390

Closed
khelle opened this issue Sep 4, 2017 · 13 comments

@khelle

khelle commented Sep 4, 2017

Whenever I start my witness node it ALWAYS detects the unclean shutdown and replays the blockchain from zero. What's wrong with the implementation?

I use this command to run it:

/home/bts/node/programs/witness_node/witness_node --data-dir=/home/bts/.bts/trusted_node --rpc-endpoint=0.0.0.0:8090

It then always starts with the following:

2201957ms th_a       witness.cpp:88                plugin_initialize    ] witness plugin:  plugin_initialize() begin
2201969ms th_a       witness.cpp:99                plugin_initialize    ] Public Key: [...]
2201970ms th_a       witness.cpp:117               plugin_initialize    ] witness plugin:  plugin_initialize() end
2202032ms th_a       application.cpp:441           startup              ] Replaying blockchain due to: unclean shutdown detected
2202032ms th_a       application.cpp:330           operator()           ] Initializing database...
2215261ms th_a       db_management.cpp:51          reindex              ] reindexing blockchain
2215261ms th_a       db_management.cpp:104         wipe                 ] Wiping database

If I close the witness node by hitting CTRL+C (exactly once, then wait), it detects the unclean shutdown the next time I boot it up. If I let it synchronize to the full extent first, this is what happens:

98.5628%   18180000 of 18445096   
Killed

It goes to 98% every time, and is being killed randomly on that stage, every time. Then, if I start it: "unclean shutdown detected". Why does it not back up the blockchain at all? Is there an option to remove this "unclean detection"? What can I do? The witness node is unusable for me right now.

I use Ubuntu 16.04 for that.

@pmconrad
Contributor

pmconrad commented Sep 4, 2017

You may have two object_database directories. Remove them both. (One in data-dir and one in data-dir/blockchain I think.)

Will be fixed with #339 .

@khelle
Author

khelle commented Sep 4, 2017

This is run by a system service that ensures the working directory is /home/bts/.bts/trusted_node, which is the same path I passed to --data-dir. Do I need to do anything more?

@abitmore
Member

abitmore commented Sep 4, 2017

> It goes to 98% every time, and is being killed randomly on that stage, every time.

@khelle it's likely that you don't have enough RAM. See #378

@khelle
Author

khelle commented Sep 4, 2017

I have 64 GB RAM on the dedicated server that runs this node exclusively. It's also a new and clean OS installation. Is that not enough?

@abitmore
Member

abitmore commented Sep 4, 2017

64 GB seems OK, but I'm not sure. Anyway, you can monitor your server while replaying to find out whether it's a RAM issue.
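
One way to follow that suggestion on Linux is to sample the kernel's per-process memory counters while the replay runs. The sketch below is an illustration, not something posted in the thread: the helper name `peak_rss_kb` is hypothetical, and it assumes a Linux `/proc/<pid>/status` file containing a `VmHWM` (peak resident set size) line.

```shell
# Hypothetical helper (not part of witness_node): print the peak resident
# set size, in kB, recorded in a Linux /proc/<pid>/status file.
# Argument: path to the status file.
peak_rss_kb() {
    awk '/^VmHWM:/ { print $2 }' "$1"
}

# Example use, assuming exactly one witness_node process is running:
#   pid=$(pidof witness_node)
#   while [ -r "/proc/$pid/status" ]; do
#       echo "$(date +%T) peak RSS: $(peak_rss_kb "/proc/$pid/status") kB"
#       sleep 10
#   done
```

The loop ends when the process disappears, so the last line printed is the closest available reading to the moment the node exited or was killed.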

@khelle
Author

khelle commented Sep 4, 2017

According to the supervisors, it never hit the RAM limit and stayed far from it during the process. Even so, many things CAN go wrong on high-load servers. Does this mean that whenever anything goes wrong, I will need to resync from zero?

@abitmore
Member

abitmore commented Sep 4, 2017

Check your system logs. "Killed" means the process was killed by another process or by the OS, not by witness_node itself. Replaying does require a lot of resources, and some VPS providers run daemons that kill resource-hungry processes at random. You said you're using a dedicated server, so I'm not sure that applies, but it's still possible that such a daemon is running on your server. Ask your hosting provider.
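
A minimal sketch of such a log check on Linux, assuming the kill came from the kernel's out-of-memory killer (only one possible cause; a kill by another process would not appear here). The helper name `find_oom_kills` is hypothetical, and the patterns match the usual kernel message wording, which can vary between kernel versions.

```shell
# Hypothetical helper: keep only kernel-log lines that look like
# OOM-killer activity. Reads log text on stdin.
find_oom_kills() {
    grep -i -E 'out of memory|oom-killer|killed process'
}

# Example use (reading the kernel log may require root):
#   dmesg -T | find_oom_kills
#   journalctl -k | find_oom_kills
#   find_oom_kills < /var/log/kern.log
```

If a line naming witness_node shows up with a timestamp matching the "Killed" message, memory exhaustion is the likely cause; if nothing matches, the kill came from somewhere else.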

@khelle
Author

khelle commented Sep 4, 2017

I have nothing like that. I install the OS manually on each server. This also does not answer why, after hitting CTRL+C exactly once and then waiting, I need to replay everything from zero. I am administering around 40 nodes at the moment, and the number is still growing, but BTS is the only one giving me so many issues. It's insane. I also see that similar issues were reported almost a year ago and were closed due to age without being fixed. What should I do now? Can the BTS even be stopped for maintenance without need to replay? How?

EDIT:
I tried deploying this on another server with a different Linux OS and have the same problem. Am I maybe missing some configuration step?

@pmconrad
Contributor

pmconrad commented Sep 4, 2017

> Can the BTS even be stopped for maintenance without need to replay? How?

Sending SIGTERM during normal operation/sync will make the witness_node write its internal database to the blockchain/object_database directory, from which it will be read again on restart. This should work fine except under certain circumstances, like a crash of the OS or a SIGKILL. If that happens (indicated by the "unclean shutdown" message), stop the node, remove the object_database directories I mentioned above, and restart the node with the option --replay-blockchain.
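
The stop and recovery steps described above could be scripted roughly as follows. This is a sketch under assumptions, not an official procedure: the function names `stop_node` and `reset_object_database` are hypothetical, and the two directory locations are the ones suggested earlier in this thread (data-dir and data-dir/blockchain), which the commenter was not certain about.

```shell
# Hypothetical helper: ask a running node to shut down cleanly.
# Sends SIGTERM and waits; it never escalates to SIGKILL, because a
# SIGKILL is one of the cases that leads to "unclean shutdown".
# Assumes the node is not a child process of this shell.
stop_node() {
    pid=$1
    kill -TERM "$pid" || return 1
    # kill -0 sends no signal; it only tests whether the process exists.
    while kill -0 "$pid" 2>/dev/null; do
        sleep 1
    done
}

# Hypothetical helper: after an unclean shutdown, remove both candidate
# object_database directories under the given data dir.
reset_object_database() {
    data_dir=$1
    [ -n "$data_dir" ] || return 1   # refuse to run with an empty path
    rm -rf "$data_dir/object_database" \
           "$data_dir/blockchain/object_database"
}

# Example use, with the paths from the original report:
#   stop_node "$(pidof witness_node)"
#   reset_object_database /home/bts/.bts/trusted_node
#   /home/bts/node/programs/witness_node/witness_node \
#       --data-dir=/home/bts/.bts/trusted_node \
#       --rpc-endpoint=0.0.0.0:8090 --replay-blockchain
```

Only the object_database directories are removed; everything else under the data dir is left in place so that the replay can use the locally stored blocks.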

The "Killed" message is a separate problem that triggers the first. Like abit said, this indicates that something kills the node from the outside. Typically the reason is running out of memory. To rule that out, use the option --max-ops-per-account=1000 for example, see https://github.com/bitshares/bitshares-core/wiki/Memory-reduction-for-nodes .
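
For illustration, the start command from the original report with that option added might look like the sketch below. The wrapper name `start_node_low_mem` and the variables `NODE_BIN` and `DATA_DIR` are hypothetical; the paths and the value 1000 are taken from this thread.

```shell
# Paths as given in the original report.
NODE_BIN=/home/bts/node/programs/witness_node/witness_node
DATA_DIR=/home/bts/.bts/trusted_node

# Hypothetical wrapper: start the node with a cap on stored operations
# per account. Extra arguments are passed through unchanged.
start_node_low_mem() {
    "$NODE_BIN" --data-dir="$DATA_DIR" --rpc-endpoint=0.0.0.0:8090 \
        --max-ops-per-account=1000 "$@"
}

# Example use:
#   start_node_low_mem --replay-blockchain
```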

@oxarbitrage
Member

We need to get back to #339 and merge it. Even if there are cases where a replay is still needed, the pull request makes some improvements in this area.

@abitmore
Member

abitmore commented Sep 4, 2017

Currently a full node requires 40 GB of RAM, and the number is increasing; I remember it was 30 GB a few months back. If you need to run witness_node and delayed_node on the same machine, 80 GB of RAM is required. @khelle is likely running nodes for an exchange, so the best approach would be to combine the --track-account and --partial-operations options.

@khelle
Author

khelle commented Sep 6, 2017

Is there any documentation for --track-account and --partial-operations options?

@oxarbitrage
Member
