Optimize on-disk deterministic masternode storage to reduce size of evodb #3017

Merged: 18 commits merged on Jul 9, 2019

codablock

This PR reduces the size of the evodb LevelDB instance from more than 700 MB to roughly 180 MB and should also greatly reduce its future growth. The main optimizations are listed below; see the individual commits for details.

  1. We store a diff of the MN list for every block, relative to the previous block. The old version of CDeterministicMNListDiff stored a full copy of the MN state (CDeterministicMNState) for each changed MN. So even if just a single field changed (e.g. nPoSePenalty, which might change in every block), the whole state had to be stored, which takes about 230 bytes per changed MN. The new version stores only the individual fields that changed for each MN. This is implemented in CDeterministicMNStateDiff and reduces the size per changed MN to just a few bytes.
  2. Every 576 blocks we store a full snapshot of the current MN list, which used to take a few MB per snapshot. This PR reduces that by no longer serializing mnUniquePropertyMap, which can simply be reconstructed from the MN list. A snapshot still requires around 1.7 MB, but that is a lot better than before. Future optimizations (in another PR) might reduce this even further by deleting old snapshots that are no longer needed (MN lists can always be reconstructed from much older snapshots).
  3. Track an internalId for each MN and use it as the key in CDeterministicMNListDiff instead of the 32-byte proTxHash. This saves another ~30 bytes per changed MN in a diff.
  4. Remove blockHash, prevBlockHash and nHeight from CDeterministicMNListDiff. These are already known when a diff is loaded from evodb and processed. This change required some refactoring of the deterministic MN and quorum handling code to use CBlockIndex* in many places where the block hash and/or height were used before.
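Item 1 can be illustrated with a minimal sketch of a field-level state diff. It assumes a heavily trimmed state with four fields; the names `MNState`, `MNStateDiff`, the `Field_*` flags and the fixed-width byte layout are illustrative assumptions, not the actual `CDeterministicMNStateDiff` code, which uses Dash's own serialization framework. A bitmask records which fields changed, and only those fields are written after it:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, heavily trimmed MN state; the real CDeterministicMNState
// carries many more fields (keys, addresses, payout scripts, ...).
struct MNState {
    int32_t nRegisteredHeight{-1};
    int32_t nPoSePenalty{0};
    int32_t nPoSeBanHeight{-1};
    std::string payoutAddress;
};

// One bit per field that may appear in a diff (names are illustrative).
enum : uint32_t {
    Field_nRegisteredHeight = 1u << 0,
    Field_nPoSePenalty      = 1u << 1,
    Field_nPoSeBanHeight    = 1u << 2,
    Field_payoutAddress     = 1u << 3,
};

struct MNStateDiff {
    uint32_t fields{0}; // bitmask of fields that changed
    MNState state;      // only members flagged in `fields` are meaningful

    // Record every field that differs between the old state a and the new state b.
    MNStateDiff(const MNState& a, const MNState& b) {
        if (a.nRegisteredHeight != b.nRegisteredHeight) { fields |= Field_nRegisteredHeight; state.nRegisteredHeight = b.nRegisteredHeight; }
        if (a.nPoSePenalty != b.nPoSePenalty) { fields |= Field_nPoSePenalty; state.nPoSePenalty = b.nPoSePenalty; }
        if (a.nPoSeBanHeight != b.nPoSeBanHeight) { fields |= Field_nPoSeBanHeight; state.nPoSeBanHeight = b.nPoSeBanHeight; }
        if (a.payoutAddress != b.payoutAddress) { fields |= Field_payoutAddress; state.payoutAddress = b.payoutAddress; }
    }

    // Overwrite only the flagged fields of `target`.
    void ApplyToState(MNState& target) const {
        if (fields & Field_nRegisteredHeight) target.nRegisteredHeight = state.nRegisteredHeight;
        if (fields & Field_nPoSePenalty) target.nPoSePenalty = state.nPoSePenalty;
        if (fields & Field_nPoSeBanHeight) target.nPoSeBanHeight = state.nPoSeBanHeight;
        if (fields & Field_payoutAddress) target.payoutAddress = state.payoutAddress;
    }

    // Write the mask, then only the changed fields (fixed-width little-endian
    // ints and a one-byte length prefix for the string, for simplicity).
    std::vector<uint8_t> Serialize() const {
        std::vector<uint8_t> out;
        auto putU32 = [&out](uint32_t v) {
            for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
        };
        putU32(fields);
        if (fields & Field_nRegisteredHeight) putU32(static_cast<uint32_t>(state.nRegisteredHeight));
        if (fields & Field_nPoSePenalty) putU32(static_cast<uint32_t>(state.nPoSePenalty));
        if (fields & Field_nPoSeBanHeight) putU32(static_cast<uint32_t>(state.nPoSeBanHeight));
        if (fields & Field_payoutAddress) {
            assert(state.payoutAddress.size() < 256);
            out.push_back(static_cast<uint8_t>(state.payoutAddress.size()));
            out.insert(out.end(), state.payoutAddress.begin(), state.payoutAddress.end());
        }
        return out;
    }
};
```

With only nPoSePenalty changed, this sketch writes 8 bytes (the 4-byte mask plus one 4-byte field) instead of a full copy of the state.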

This PR also includes upgrade code that runs the first time Dash Core is started with an old evodb present. It goes through all the old diffs, converts them to the new format, and rewrites the snapshots at the same time.
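The conversion can be pictured roughly as follows. This sketch assumes a trimmed two-field state and `std::string` stand-ins for proTxHash, and it handles only updated MNs (added and removed MNs are omitted); all type and function names are hypothetical and this is not the actual upgrade code from this PR. Each old diff is replayed on top of the previous list, which supplies both the previous state needed to compute the field diff and the internalId used as the new key:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical two-field MN state, trimmed for illustration.
struct MNState {
    int32_t nPoSePenalty{0};
    int32_t nPoSeBanHeight{-1};
};

// Old on-disk format (sketch): full state per changed MN, keyed by proTxHash.
struct OldListDiff {
    std::map<std::string, MNState> updatedMNs; // proTxHash -> complete new state
};

// New on-disk format (sketch): changed fields only, keyed by internalId.
struct FieldDiff {
    uint32_t fields{0}; // bit 0: nPoSePenalty, bit 1: nPoSeBanHeight
    MNState state;      // only flagged members are meaningful
};
struct NewListDiff {
    std::map<uint64_t, FieldDiff> updatedMNs; // internalId -> field diff
};

// Previous-block MN list: provides each MN's internalId and previous state.
struct MNEntry {
    uint64_t internalId;
    MNState state;
};
using MNList = std::map<std::string, MNEntry>; // proTxHash -> entry

// Convert one old diff against the previous list, then advance that list so
// the next old diff can be converted against it.
NewListDiff ConvertDiff(const OldListDiff& oldDiff, MNList& list) {
    NewListDiff out;
    for (const auto& [proTxHash, newState] : oldDiff.updatedMNs) {
        MNEntry& entry = list.at(proTxHash); // must exist in the previous list
        FieldDiff fd;
        if (entry.state.nPoSePenalty != newState.nPoSePenalty) {
            fd.fields |= 1u;
            fd.state.nPoSePenalty = newState.nPoSePenalty;
        }
        if (entry.state.nPoSeBanHeight != newState.nPoSeBanHeight) {
            fd.fields |= 2u;
            fd.state.nPoSeBanHeight = newState.nPoSeBanHeight;
        }
        if (fd.fields != 0) out.updatedMNs.emplace(entry.internalId, fd);
        entry.state = newState; // replay the old diff
    }
    return out;
}
```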

codablock added 7 commits July 7, 2019 10:24
This allows us to directly use READWRITE() on scripts and removes the need
for the ugly cast to CScriptBase. This commit also changes all Dash specific
uses of CScript to not use the cast.
This allows compacting the whole DB in one go.
This introduces CDeterministicMNStateDiff, so that only the fields which
actually changed need to be stored on disk.
This map can be rebuilt by simply using AddMN for each deserialized MN.
…rnodes

The "internalId" is simply the number of MNs registered so far when the
new MN is added. It is deterministic and stays the same forever.
This reduces the used size on-disk.
1. Avoid full compare if dmn or state pointers match in BuildDiff
2. Use std::move when adding diff to listDiff in GetListForBlock
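The first speedup relies on MN states being shared immutable objects, so an unchanged MN usually points at the same object in consecutive lists. A sketch under that assumption, with hypothetical trimmed types (`MNState`, `MNList`, `ListDiff`) and a `fullCompares` counter added only to make the shortcut observable; additions and removals are omitted:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, trimmed state. States are immutable and shared between
// consecutive lists, so an unchanged MN usually keeps the same pointer.
struct MNState {
    int32_t nPoSePenalty{0};
    int32_t nPoSeBanHeight{-1};
    bool operator==(const MNState& o) const {
        return nPoSePenalty == o.nPoSePenalty && nPoSeBanHeight == o.nPoSeBanHeight;
    }
};
using MNStateCPtr = std::shared_ptr<const MNState>;
using MNList = std::map<uint64_t, MNStateCPtr>; // internalId -> state

struct ListDiff {
    std::vector<std::pair<uint64_t, MNStateCPtr>> updatedMNs;
};

// Collect MNs whose state changed between `from` and `to`.
ListDiff BuildDiff(const MNList& from, const MNList& to, int* fullCompares = nullptr) {
    ListDiff diff;
    for (const auto& [id, toState] : to) {
        auto it = from.find(id);
        if (it == from.end()) continue; // additions are not handled in this sketch
        // Cheap pointer check first: identical pointers imply identical state,
        // so the field-by-field comparison can be skipped.
        if (it->second == toState) continue;
        if (fullCompares) ++*fullCompares;
        if (!(*it->second == *toState)) diff.updatedMNs.emplace_back(id, toState);
    }
    return diff;
}
```

The second speedup is the usual C++ idiom: once a per-block diff has been built, handing it to the collecting container with `std::move` transfers its internal buffers instead of copying every entry.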
codablock added this to the 14.1 milestone Jul 7, 2019
codablock added 7 commits July 7, 2019 18:34
This allows us to switch CDeterministicMNManager::GetListForBlock to work
with CBlockIndex.
…dex*

Instead of requiring a block hash. This allows us to remove blockHash and
prevBlockHash from CDeterministicMNListDiff without the use of cs_main
locks in GetListForBlock.
…ng()

The deterministic MN manager is not fully initialized yet at the time this
is called, which results in an empty list being returned every time.
Reuse the "best block" logic to figure out if an upgrade is needed. Also
use it to ensure that older nodes are unable to start after the upgrade
was performed.
codablock force-pushed the pr_evodb_reduce_size branch from a1ae17d to a7af16a Compare July 7, 2019 16:34
UdjinM6 commented Jul 8, 2019

Crashes for me on the first shutdown after the DB upgrade with:

Exception: type=std::__1::ios_base::failure, what="obj and buf not initialized: unspecified iostream_category error"
   0#: (0x108D203A9) serialize.h:288    - ReadCompactSize<CDataStream>
   1#: (0x108D2C28A) serialize.h:826    - Unserialize_impl<CDataStream, 28, unsigned char>
   2#: (0x108D2C1DF) script.h:658       - SerializationOp<CDataStream, CSerActionUnserialize>
   3#: (0x108D2BDA7) deterministicmns.h - CDeterministicMNState<CDataStream>
   4#: (0x108D2B812) type_traits:4506   - SerializationOp<CDataStream, CSerActionUnserialize>
   5#: (0x108D2B51A) memory:4329        - make_shared<const deserialize_type &, CDataStream &>
   6#: (0x108D2B034) memory:4225        - Unserialize<CDataStream>
   7#: (0x108D26B43) string:1406        - SerializationOp<CDataStream, CSerActionUnserialize>
   8#: (0x108D240FD) string:1507        - Read
   9#: (0x108CC9627) flat-database.h    - Dump
  10#: (0x108CC55DD) memory:2139        - PrepareShutdown
  11#: (0x108CD1FAF) init.cpp:366       - Shutdown
  12#: (0x108B03C4B) dash.cpp           - shutdown
  13#: (0x109C9D0E2) <unknown-file>     - ???

Looks like we need to bump CGovernanceManager::SERIALIZATION_VERSION_STRING (adfb188) and fix the way it is checked (596c0e1).
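Checking the version string before deserializing anything else means data written in an older format is rejected up front instead of being misparsed. A sketch of that pattern, assuming a hypothetical `GovernanceCache` class with a placeholder payload, a made-up version string and a simple length-prefixed layout; the real `CGovernanceManager` serialization differs. It also calls `Clear()` before deserializing, so a rejected load leaves an empty cache:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Length-prefixed string helpers (host byte order, for illustration only).
static void WriteString(std::ostream& os, const std::string& s) {
    uint32_t len = static_cast<uint32_t>(s.size());
    os.write(reinterpret_cast<const char*>(&len), sizeof(len));
    os.write(s.data(), static_cast<std::streamsize>(s.size()));
}

static bool ReadString(std::istream& is, std::string& s) {
    uint32_t len = 0;
    if (!is.read(reinterpret_cast<char*>(&len), sizeof(len)) || len > 1024) return false;
    s.resize(len);
    return len == 0 || static_cast<bool>(is.read(&s[0], len));
}

// Hypothetical stand-in for a cache such as CGovernanceManager.
class GovernanceCache {
public:
    static constexpr const char* SERIALIZATION_VERSION_STRING = "GovernanceCache-Version-2";

    std::vector<int32_t> objects; // placeholder payload

    void Clear() { objects.clear(); }

    void Serialize(std::ostream& os) const {
        WriteString(os, SERIALIZATION_VERSION_STRING); // version marker always comes first
        uint32_t n = static_cast<uint32_t>(objects.size());
        os.write(reinterpret_cast<const char*>(&n), sizeof(n));
        for (int32_t v : objects) os.write(reinterpret_cast<const char*>(&v), sizeof(v));
    }

    // Returns false and leaves the cache empty if the data is missing,
    // truncated, or was written with a different version string.
    bool Unserialize(std::istream& is) {
        Clear(); // start from a known-empty state
        std::string version;
        if (!ReadString(is, version) || version != SERIALIZATION_VERSION_STRING) {
            return false; // bail out before interpreting anything else
        }
        uint32_t n = 0;
        if (!is.read(reinterpret_cast<char*>(&n), sizeof(n))) return false;
        for (uint32_t i = 0; i < n; ++i) {
            int32_t v = 0;
            if (!is.read(reinterpret_cast<char*>(&v), sizeof(v))) { Clear(); return false; }
            objects.push_back(v);
        }
        return true;
    }
};
```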

codablock (author)

@UdjinM6 Applied the 2 commits from you and also added another one (57e7cb7)

UdjinM6 commented Jul 8, 2019

Yep, testing again and it seems to be working as expected now 👍

UdjinM6 left a comment

slightly tested ACK 👍

UdjinM6 merged commit 7a440d6 into dashpay:develop Jul 9, 2019
codablock deleted the pr_evodb_reduce_size branch August 7, 2019 09:56
codablock (author)

Added the backport-candidate-14.0.x label because a few MN operators are currently having issues with oversized evodb folders.

codablock added a commit to codablock/dash that referenced this pull request Aug 7, 2019
…vodb (dashpay#3017)

* Implement CompactFull() in CDBWrapper

This allows compacting the whole DB in one go.

* Implement more compact version of CDeterministicMNListDiff

This introduces CDeterministicMNStateDiff, so that only the fields which
actually changed need to be stored on disk.

* Avoid writing mnUniquePropertyMap to disk when storing snapshots

This map can be rebuilt by simply using AddMN for each deserialized MN.

* Implement Serialize/Unserialize in CScript

This allows us to directly use READWRITE() on scripts and removes the need
for the ugly cast to CScriptBase. This commit also changes all Dash specific
uses of CScript to not use the cast.

* Keep track of registration counts and introduce internalID for masternodes

The "internalId" is simply the number of MNs registered so far when the
new MN is added. It is deterministic and stays the same forever.

* Use internalId as keys in MN list diffs

This reduces the used size on-disk.

* Two simple speedups in MN list diff handling

1. Avoid full compare if dmn or state pointers match in BuildDiff
2. Use std::move when adding diff to listDiff in GetListForBlock

* Implement upgrade code for old CDeterministicMNListDiff format to new format

* Track tipIndex instead of tipHeight/tipBlockHash

* Store and pass around CBlockIndex* instead of block hash and height

This allows us to switch CDeterministicMNManager::GetListForBlock to work
with CBlockIndex.

* Refactor CDeterministicMNManager::GetListForBlock to require CBlockIndex*

Instead of requiring a block hash. This allows us to remove blockHash and
prevBlockHash from CDeterministicMNListDiff without the use of cs_main
locks in GetListForBlock.

* Remove prevBlockHash, blockHash and nHeight from CDeterministicMNListDiff

* Remove access to deterministicMNManager in CMasternodeMetaMan::ToString()

The deterministic MN manager is not fully initialized yet at the time this
is called, which results in an empty list being returned every time.

* Better logic to determine if an upgrade is needed

Reuse the "best block" logic to figure out if an upgrade is needed. Also
use it to ensure that older nodes are unable to start after the upgrade
was performed.

* Return null block hash if it was requested with getmnlistdiff

* bump CGovernanceManager::SERIALIZATION_VERSION_STRING

* Check SERIALIZATION_VERSION_STRING before deserializing anything else

* Invoke Clear() before deserializing just to be sure
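The snapshot change in this commit (no longer writing mnUniquePropertyMap to disk) works because the map is derived data. A sketch of the idea, assuming a hypothetical `MNList` whose MNs carry just two unique properties encoded as prefixed string keys (the real map is keyed differently): a snapshot persists only the MNs, and loading it replays `AddMN`, which rebuilds the map and re-applies the uniqueness checks:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, trimmed MN record holding only properties that must be unique.
struct MN {
    uint64_t internalId;
    std::string collateralOutpoint;
    std::string service; // "ip:port"
};

class MNList {
public:
    // Adds an MN and indexes its unique properties, rejecting duplicates.
    void AddMN(const MN& mn) {
        const std::vector<std::string> keys = UniqueKeys(mn);
        for (const std::string& key : keys) {
            if (uniquePropertyMap.count(key) != 0) {
                throw std::runtime_error("duplicate unique property: " + key);
            }
        }
        for (const std::string& key : keys) uniquePropertyMap[key] = mn.internalId;
        mns[mn.internalId] = mn;
    }

    // What a snapshot persists: the MNs only, not the derived index.
    std::vector<MN> SnapshotEntries() const {
        std::vector<MN> out;
        for (const auto& kv : mns) out.push_back(kv.second);
        return out;
    }

    // Loading a snapshot replays AddMN, which rebuilds the index as a side effect.
    static MNList FromSnapshot(const std::vector<MN>& entries) {
        MNList list;
        for (const MN& mn : entries) list.AddMN(mn);
        return list;
    }

    bool HasUniqueProperty(const std::string& key) const {
        return uniquePropertyMap.count(key) != 0;
    }

private:
    static std::vector<std::string> UniqueKeys(const MN& mn) {
        return {"collateral:" + mn.collateralOutpoint, "service:" + mn.service};
    }

    std::map<uint64_t, MN> mns;                        // persisted in snapshots
    std::map<std::string, uint64_t> uniquePropertyMap; // derived, never persisted
};
```

The trade-off is a little extra CPU work when a snapshot is loaded in exchange for not storing a second copy of every unique property on disk.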
MIPPL pushed a commit to biblepay/biblepay that referenced this pull request Nov 20, 2019
* commit '7d8eab2641023c78a72ccd6efc99fc35fd030a46': (32 commits)
  Add 0.14.0.3 change log to release-notes.md (dashpay#3055)
  Update release-notes.md for 0.14.0.3 (dashpay#3054)
  Bump version to 0.14.0.3 and copy release notes (dashpay#3053)
  Re-verify invalid IS sigs when the active quorum set rotated (dashpay#3052)
  Remove recovered sigs from the LLMQ db when corresponding IS locks get confirmed (dashpay#3048)
  Add "instantsendlocks" to getmempoolinfo RPC (dashpay#3047)
  Use fEnablePrivateSend instead of fPrivateSendRunning
  Show number of InstantSend locks in Debug Console (dashpay#2919)
  Optimize on-disk deterministic masternode storage to reduce size of evodb (dashpay#3017)
  Add "isValidMember" and "memberIndex" to "quorum memberof" and allow to specify quorum scan count (dashpay#3009)
  Implement "quorum memberof" (dashpay#3004)
  Bail out properly on Evo DB consistency check failures in ConnectBlock/DisconnectBlock (dashpay#3044)
  Do not count 0-fee txes for fee estimation (dashpay#3037)
  Fix broken link in PrivateSend info dialog (dashpay#3031)
  Merge pull request dashpay#3028 from PastaPastaPasta/backport-12588
  Add Dash Core Group codesign certificate (dashpay#3027)
  Fix osslsigncode compile issue in gitian-build (dashpay#3026)
  Backport bitcoin#12783: macOS: disable AppNap during sync (and mixing) (dashpay#3024)
  Remove support for InstantSend locked gobject collaterals (dashpay#3019)
  [v0.14.0.x] Update release notes for 0.14.0.2 (dashpay#3012)
  ...

# Conflicts:
#	.gitignore
#	.travis.yml
#	configure.ac
#	doc/man/biblepay-cli.1
#	doc/man/biblepay-qt.1
#	doc/man/biblepay-tx.1
#	doc/man/biblepayd.1
#	doc/release-notes.md
#	src/clientversion.h
#	src/qt/utilitydialog.cpp
barrystyle pushed a commit to PACGlobalOfficial/PAC that referenced this pull request Jan 22, 2020
PastaPastaPasta pushed a commit that referenced this pull request Dec 27, 2022

## Issue being fixed or feature implemented


## What was done?
Removed the code that upgrades the old `CDeterministicMNListDiff` format
to the new format.
This upgrade path was implemented in #3017.
I believe it can now be safely removed.

## How Has This Been Tested?


## Breaking Changes


## Checklist:
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have added or updated relevant unit/integration/functional/e2e tests
- [ ] I have made corresponding changes to the documentation

**For repository code-owners and collaborators only**
- [x] I have assigned this pull request to a milestone