
use partial parent cache and other methods to reduce the mem usage for sdr #1135

Closed
wants to merge 1 commit

Conversation


@dtynn dtynn commented May 31, 2020

Tested on a machine:

  • AMD Ryzen 9 3900X 12-Core Processor
  • 62.9GiB physical mem
  • 10GiB swap on ssd
  • 8T hdd (7200rpm) for staged file, sealed file & cache dir

with

  • 13:32:51 ~ 20:11:44 for seal_pre_commit_phase1
  • 20:11:44 ~ 23:04:04 for seal_pre_commit_phase2

TODOs:

  • patch bellman to serialize circuit synthesizing & calculating: bellman pr #81
  • use mixed-base-layer to see if we can reduce some more mem for seal_pre_commit_phase1
  • copying the base layer to exp costs ~9min in the last 4 layers, with a high ratio of SWAPIN, which I think can be optimized

see log here

```rust
    env::var(SDR_CACHE_ENV_VAR)
        .map(PathBuf::from)
        .unwrap_or(PathBuf::from(PARAMETER_CACHE_DIR))
}
```

@porcuquine porcuquine Jun 1, 2020


For env var configuration, please use the mechanism in settings.rs for consistency and to consolidate in one place developers and users can keep track of.
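For illustration only, a consolidated settings-style lookup might look like the sketch below. The constant names and the env-var name here are hypothetical stand-ins for the ones in the patch; the real change would register the variable in `settings.rs` alongside the other `FIL_PROOFS_*` options rather than reading it ad hoc. Note `unwrap_or_else` avoids constructing the default `PathBuf` when the override is present.

```rust
use std::env;
use std::path::PathBuf;

// Hypothetical names standing in for the constants used in the patch.
const SDR_CACHE_ENV_VAR: &str = "FIL_PROOFS_SDR_CACHE";
const PARAMETER_CACHE_DIR: &str = "/var/tmp/filecoin-proof-parameters";

/// Resolve the SDR cache directory, preferring the env-var override and
/// falling back to the compiled-in default.
fn sdr_cache_dir() -> PathBuf {
    env::var(SDR_CACHE_ENV_VAR)
        .map(PathBuf::from)
        .unwrap_or_else(|_| PathBuf::from(PARAMETER_CACHE_DIR))
}
```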

@dignifiedquire

@dtynn what are the numbers you are seeing on that setup with the current implementation, so we have a comparison?


dtynn commented Jun 4, 2020

> @dtynn what are the numbers you are seeing on that setup with the current implementation, so we have a comparison?

@dignifiedquire
Considering seal_pre_commit_phase1 only:

62.9GiB physical + 10GiB swap, both fully used, vs 56GiB (parent cache) + 64GiB, with 50% ~ 80% more time cost for each layer.

If we use the official implementation without fully cached parent indexes (FIL_PROOFS_MAXIMIZE_CACHING=0), mem usage is reduced, but the time would be much longer (I don't remember the exact numbers).

Update:

For the official implementation without fully cached parent indexes (FIL_PROOFS_MAXIMIZE_CACHING=0), layer 1 took 2h11m.
The benchy binary was compiled with:

```shell
CC=clang RUSTFLAGS="-C target-cpu=native -C target-feature=+sse4.1,+sse4.2,+avx,+avx2,+sse2,+sha,+adx," cargo build --release -p fil-proofs-tooling --bin benchy
```

@dignifiedquire

@dtynn for the numbers you reported, what value for `partial_num` did you use?


dtynn commented Jun 15, 2020

> @dtynn for the numbers you reported, what value for `partial_num` did you use?

@dignifiedquire
128
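For context, a minimal sketch of the windowing idea behind a `partial_num`-way split of the parent cache: only one of `partial_num` windows of the parent-index table is resident at a time, bounding memory at roughly total size divided by `partial_num`. This is an illustrative sketch with hypothetical names, not the code in this PR.

```rust
/// Hypothetical sketch: compute the [start, end) node range of one window
/// when the parent-index table over `num_nodes` nodes is split into
/// `partial_num` windows. Only one window's indexes would be held in RAM
/// at a time while labelling nodes in order.
fn window_bounds(num_nodes: usize, partial_num: usize, window: usize) -> (usize, usize) {
    // Ceiling division so the last window absorbs any remainder.
    let len = (num_nodes + partial_num - 1) / partial_num;
    let start = (window * len).min(num_nodes);
    let end = (start + len).min(num_nodes);
    (start, end)
}
```

With `partial_num = 128` (the value used above), each window covers 1/128 of the node range, trading repeated (partial) reloads of the cache for a much smaller resident set.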

@dignifiedquire

Thanks a lot for this @dtynn! I have used this code to build #1163 which has similar properties, but better integrated into the machinery.
