Constant replaying of blockchain #390

khelle · 2017-09-04T16:42:44Z

Whenever I start my witness node it ALWAYS detects the unclean shutdown and replays the blockchain from zero. What's wrong with the implementation?

I use this command to run it:

/home/bts/node/programs/witness_node/witness_node --data-dir=/home/bts/.bts/trusted_node --rpc-endpoint=0.0.0.0:8090

It then always starts with the following:

2201957ms th_a       witness.cpp:88                plugin_initialize    ] witness plugin:  plugin_initialize() begin
2201969ms th_a       witness.cpp:99                plugin_initialize    ] Public Key: [...]
2201970ms th_a       witness.cpp:117               plugin_initialize    ] witness plugin:  plugin_initialize() end
2202032ms th_a       application.cpp:441           startup              ] Replaying blockchain due to: unclean shutdown detected
2202032ms th_a       application.cpp:330           operator()           ] Initializing database...
2215261ms th_a       db_management.cpp:51          reindex              ] reindexing blockchain
2215261ms th_a       db_management.cpp:104         wipe                 ] Wiping database

If I close the witness node by hitting CTRL+C (exactly once, then wait), the next time I boot it up, it detects the unclean shutdown. If I let it synchronize to the full extent first, this is what happens:

98.5628%   18180000 of 18445096   
Killed

It goes to 98% every time, and is being killed randomly on that stage, every time. Then, if I start it again: "unclean shutdown detected". Why does it not back up the blockchain at all? Is there an option to disable this unclean-shutdown detection? What can I do? The witness node is unusable for me right now.

I use Ubuntu 16.04 for that.


pmconrad · 2017-09-04T16:55:55Z

You may have two object_database directories. Remove them both. (One in data-dir and one in data-dir/blockchain I think.)

Will be fixed with #339.

khelle · 2017-09-04T17:00:31Z

This is run by a system service that ensures the working directory is /home/bts/.bts/trusted_node, which is the same path I passed to --data-dir. Do I need to do anything more?

abitmore · 2017-09-04T17:01:04Z

> It goes to 98% every time, and is being killed randomly on that stage, every time.

@khelle it's likely that you don't have enough RAM. See #378.

khelle · 2017-09-04T17:02:04Z

I have 64 GB of RAM on the dedicated server that runs this node exclusively. It's also a new and clean OS installation. Is that not enough?

abitmore · 2017-09-04T17:07:49Z

64 GB seems OK, but I'm not sure. Anyway, you can monitor your server while replaying to find out whether it's a RAM issue.
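One way to do that monitoring, as a minimal sketch: sample the node's resident memory at a fixed interval while it replays. This assumes a Linux host with /proc mounted; the helper names `rss_kb` and `watch_rss` are made up for illustration and are not part of witness_node.

```shell
#!/bin/sh
# Print the resident set size (in kB) of a process, read from /proc (Linux only).
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Print a timestamped RSS sample every N seconds until the process exits.
watch_rss() {
    pid=$1
    interval=${2:-10}
    while [ -d "/proc/$pid" ]; do
        printf '%s %s kB\n' "$(date '+%H:%M:%S')" "$(rss_kb "$pid")"
        sleep "$interval"
    done
}

# Usage (assumes a single running witness_node process):
#   watch_rss "$(pidof witness_node)" 10 | tee replay-memory.log
```

If the last samples before "Killed" are close to the machine's total RAM, memory exhaustion is the likely cause; if they are far below it, something else is terminating the process.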

khelle · 2017-09-04T17:15:02Z

It never hit the RAM limit and stayed far below it during the process, according to the supervisors. Even so, many things can go wrong on high-load servers. Does this mean that whenever anything goes wrong, I will need to resync from zero?

abitmore · 2017-09-04T17:20:15Z

Check your system logs. "Killed" means the process was terminated by another process or by the OS, not by witness_node itself. Replaying does require a lot of resources, and some VPS providers run daemons that kill resource-hungry processes at random. You said you're using a dedicated server, so I'm not sure that applies, but it's still possible that such a daemon is running on your server. Ask your hosting provider.
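As a rough sketch of that log check: on Linux, the kernel's out-of-memory killer usually leaves messages in the kernel log, so filtering for them shows whether the OS itself killed the node. The `find_oom_kills` helper name is hypothetical, and the exact message wording varies between kernel versions, so the pattern below is an approximation.

```shell
#!/bin/sh
# Filter kernel log text on stdin for lines the Linux OOM killer typically emits.
find_oom_kills() {
    grep -i -E 'out of memory|oom-killer|killed process'
}

# Usage (reading the kernel log may require root):
#   dmesg | find_oom_kills
#   journalctl -k | find_oom_kills
```

No matching lines does not prove the kernel is innocent (the log may have rotated), but a match naming witness_node is strong evidence of a memory problem.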

khelle · 2017-09-04T17:35:44Z

I have nothing like that. I do a manual OS installation for each server. This also does not answer the question of why, after hitting CTRL+C exactly once and then waiting, I need to replay everything from zero. I am administering around 40 nodes at the moment, and the number is still growing, but BTS is the only one giving me so many issues. It's insane. I also see there were similar issues almost a year ago that were not fixed but closed due to their age and ignored. What should I do now? Can the BTS even be stopped for maintenance without need to replay? How?

EDIT:
I tried deploying this on another server with another Linux OS and have the same problem. Am I perhaps missing a configuration step?

pmconrad · 2017-09-04T20:30:03Z

> Can the BTS even be stopped for maintenance without need to replay? How?

Sending SIGTERM during normal operation/sync will make the witness_node write its internal database to the blockchain/object_database directory, from which it will be read again on restart. This should work fine except under certain circumstances, like a crash of the OS or a SIGKILL. If that happens (indicated by the "unclean shutdown" message), stop the node, remove the object_database directories I mentioned above, and restart the node with the option --replay-blockchain.
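The procedure above could be scripted roughly as follows. This is a sketch under assumptions: the two object_database locations are the ones guessed at earlier in this thread (data-dir and data-dir/blockchain), the helper names `stop_node` and `clean_object_dbs` are hypothetical, and the timeout value is arbitrary.

```shell
#!/bin/sh
# Ask the node to shut down cleanly with SIGTERM and wait for it to exit.
# Never escalates to SIGKILL, since that causes the unclean shutdown.
stop_node() {
    pid=$1
    timeout=${2:-600}   # seconds to wait; arbitrary default
    kill -TERM "$pid" || return 1
    while kill -0 "$pid" 2>/dev/null; do
        [ "$timeout" -le 0 ] && return 1
        sleep 1
        timeout=$((timeout - 1))
    done
}

# After an unclean shutdown: remove both candidate object_database directories.
clean_object_dbs() {
    for dir in "$1/object_database" "$1/blockchain/object_database"; do
        if [ -d "$dir" ]; then
            echo "removing $dir"
            rm -rf -- "$dir"
        fi
    done
}

# Usage, with the paths from the original report:
#   stop_node "$(pidof witness_node)"
#   clean_object_dbs /home/bts/.bts/trusted_node
#   /home/bts/node/programs/witness_node/witness_node \
#       --data-dir=/home/bts/.bts/trusted_node --replay-blockchain
```

If the node runs under a service manager, check that the manager's stop timeout is long enough for the database to be written; a manager that follows SIGTERM with SIGKILL too early would produce the same "unclean shutdown" symptom.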

The "Killed" message is a separate problem that triggers the first. Like abit said, this indicates that something kills the node from the outside. Typically the reason is running out of memory. To rule that out, use the option --max-ops-per-account=1000 for example, see https://github.com/bitshares/bitshares-core/wiki/Memory-reduction-for-nodes .
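For illustration, the startup command from the original report with that option appended might be assembled like this. The value 1000 is only the example given above, not a tuned recommendation, and the `node_cmd` helper is hypothetical; it prints the command line rather than running it.

```shell
#!/bin/sh
# Paths and endpoint taken from the original report.
NODE=/home/bts/node/programs/witness_node/witness_node
DATA_DIR=/home/bts/.bts/trusted_node

# Print the witness_node command line with a per-account operation history limit.
node_cmd() {
    printf '%s --data-dir=%s --rpc-endpoint=0.0.0.0:8090 --max-ops-per-account=%s\n' \
        "$NODE" "$DATA_DIR" "${1:-1000}"
}

node_cmd 1000
```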

oxarbitrage · 2017-09-04T20:51:04Z

We need to get back to #339 and merge it. Even if there are still cases where a replay is needed, the pull request brings some improvements in this area.

abitmore · 2017-09-04T23:35:28Z

Currently a full node requires 40 GB of RAM, and the number is increasing; I remember it was 30 GB a few months back. If you need to run witness_node and delayed_node on the same machine, 80 GB of RAM is required. @khelle is likely running nodes for an exchange, so the best approach would be to combine the --track-account and --partial-operations options.

khelle · 2017-09-06T09:42:14Z

Is there any documentation for the --track-account and --partial-operations options?

oxarbitrage · 2017-09-06T11:44:43Z

https://github.com/bitshares/bitshares-core/wiki/Memory-reduction-for-nodes

oxarbitrage closed this as completed Sep 18, 2017
