
Feat/sphere/fft mem opt #28

Closed · wants to merge 10 commits

Conversation


@lispc lispc commented Jan 30, 2023

We plan to run some tests on this branch. If nothing is broken and memory usage is indeed much improved, we will create a new `scroll-dev-01XX` branch, since this is a big new feature.

fyi @spherel


spherel commented Jan 30, 2023

Here is a draft explaining this optimization. Actually there are two points in this draft:

  1. partitioning the cosetFFT,
  2. exploiting different gate degrees.

By now only the first one has been implemented.
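
As an illustration of the first point, here is a minimal sketch of a partitioned coset FFT (assumed helper names and signature, not the code in this PR). The extended-domain evaluations on the coset zeta·⟨omega_ext⟩ of size 2^(extended_k) split into j = 2^(extended_k - k) interleaved parts; because omega_ext^j = omega, each part is just a size-2^k coset FFT.

```rust
use ff::Field;

// Stand-in for domain.coeff_to_extended_parts (assumed signature): evaluate
// `coeffs` (degree < 2^k) over the extended coset domain of size 2^k * j,
// returning the evaluations as j parts of size 2^k each. `fft` is an assumed
// in-place radix-2 FFT over a size-2^k multiplicative subgroup.
fn coeff_to_extended_parts<F: Field>(
    coeffs: &[F],   // polynomial in coefficient form, len <= 2^k
    omega: F,       // generator of the size-2^k domain (omega = omega_ext^j)
    omega_ext: F,   // generator of the size-(2^k * j) extended domain
    zeta: F,        // coset shift
    k: u32,
    j: usize,       // number of parts = 2^(extended_k - k)
    fft: impl Fn(&mut [F], F, u32),
) -> Vec<Vec<F>> {
    let n = 1usize << k;
    (0..j)
        .map(|i| {
            // Part i holds p(zeta * omega_ext^(t*j + i)) = p((zeta * omega_ext^i) * omega^t)
            // for t = 0..n. Scaling coefficient a_m by (zeta * omega_ext^i)^m reduces this
            // to a plain size-n FFT.
            let shift = zeta * omega_ext.pow_vartime([i as u64]);
            let mut part = vec![F::ZERO; n];
            let mut cur = F::ONE;
            for (dst, &a) in part.iter_mut().zip(coeffs.iter()) {
                *dst = a * cur;
                cur *= shift;
            }
            fft(&mut part, omega, k);
            part
        })
        .collect()
}
```

The memory benefit comes from downstream code being able to produce and consume parts one at a time instead of holding a single size-2^(extended_k) vector per column.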

@kunxian-xia kunxian-xia self-requested a review January 31, 2023 03:24

kunxian-xia commented Feb 1, 2023

By running the modified plonk_api test, which accepts K as an input (provided by this branch), we get the following stats on the gpu-cluster machine: for K = 25, the running time is 21 min 15 s with a peak memory of 124 GiB.

[2023-02-01T09:32:57Z INFO  plonk_api] [before load params] mem free: 987 GiB, mem available: 986 GiB
[2023-02-01T09:37:13Z INFO  plonk_api] [loaded params] mem free: 981 GiB, mem available: 982 GiB
[2023-02-01T09:37:13Z INFO  halo2_proofs::plonk::keygen] num_fixed_cols: 7
[2023-02-01T09:37:13Z INFO  halo2_proofs::plonk::keygen] permutation: 12 cols
[2023-02-01T09:37:13Z INFO  halo2_proofs::plonk::keygen] num_selectors: 0
[2023-02-01T09:41:13Z INFO  plonk_api] [keygen finished] mem free: 940 GiB, mem available: 941 GiB
[2023-02-01T09:45:36Z INFO  halo2_proofs::plonk::prover] [after phase1] mem free: 926 GiB, mem available: 927 GiB
[2023-02-01T09:46:23Z INFO  halo2_proofs::plonk::prover] [after phase2] mem free: 914 GiB, mem available: 915 GiB
[2023-02-01T09:49:01Z INFO  halo2_proofs::plonk::prover] [after phase3] mem free: 908 GiB, mem available: 909 GiB
[2023-02-01T09:56:24Z INFO  halo2_proofs::plonk::prover] [after quotient h] mem free: 898 GiB, mem available: 899 GiB
[2023-02-01T10:02:28Z INFO  plonk_api] [create proof] mem free: 909 GiB, mem available: 910 GiB
test plonk_api ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 1775.39s

	Command being timed: "cargo test --release --test plonk_api -- --nocapture"
	User time (seconds): 96234.61
	System time (seconds): 353.93
	Percent of CPU this job got: 5383%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 29:53.99
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 130144868
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 128
	Minor (reclaiming a frame) page faults: 285567295
	Voluntary context switches: 521405
	Involuntary context switches: 89166
	Swaps: 0
	File system inputs: 74872
	File system outputs: 5403512
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

Without this PR, the plonk_api test takes 17 min 42 s, but the peak memory is 280 GiB.

[2023-02-01T10:17:05Z INFO  plonk_api] [before load params] mem free: 984 GiB, mem available: 986 GiB
[2023-02-01T10:17:30Z INFO  plonk_api] [loaded params] mem free: 980 GiB, mem available: 982 GiB
[2023-02-01T10:17:30Z INFO  halo2_proofs::plonk::keygen] num_fixed_cols: 7
[2023-02-01T10:17:30Z INFO  halo2_proofs::plonk::keygen] permutation: 12 cols
[2023-02-01T10:17:30Z INFO  halo2_proofs::plonk::keygen] num_selectors: 0
[2023-02-01T10:23:33Z INFO  plonk_api] [keygen finished] mem free: 853 GiB, mem available: 856 GiB
[2023-02-01T10:28:01Z INFO  halo2_proofs::plonk::prover] [after phase1] mem free: 839 GiB, mem available: 842 GiB
[2023-02-01T10:28:48Z INFO  halo2_proofs::plonk::prover] [after phase2] mem free: 827 GiB, mem available: 830 GiB
[2023-02-01T10:32:38Z INFO  halo2_proofs::plonk::prover] [after phase3] mem free: 773 GiB, mem available: 775 GiB
[2023-02-01T10:35:02Z INFO  halo2_proofs::plonk::prover] [after quotient h] mem free: 768 GiB, mem available: 770 GiB
[2023-02-01T10:41:15Z INFO  plonk_api] [create proof] mem free: 823 GiB, mem available: 825 GiB
test plonk_api ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 1457.04s

	Command being timed: "cargo test --release --test plonk_api -- --nocapture"
	User time (seconds): 63374.62
	System time (seconds): 288.65
	Percent of CPU this job got: 4327%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 24:31.17
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 293726392
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 2
	Minor (reclaiming a frame) page faults: 252937119
	Voluntary context switches: 232665
	Involuntary context switches: 61065
	Swaps: 0
	File system inputs: 2008
	File system outputs: 427840
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
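
For reference, memory checkpoints in the style of the log lines above can be produced with a small helper that reads /proc/meminfo (a sketch assuming a Linux host and the log crate; not necessarily how the test in this branch implements it):

```rust
use std::fs;

// Log MemFree/MemAvailable in GiB at a named checkpoint, matching the format
// of the lines above. Reads /proc/meminfo (values reported in kB), so Linux only.
fn log_mem(tag: &str) {
    let meminfo = fs::read_to_string("/proc/meminfo").unwrap_or_default();
    let gib = |key: &str| -> u64 {
        meminfo
            .lines()
            .find(|line| line.starts_with(key))
            .and_then(|line| line.split_whitespace().nth(1))
            .and_then(|kb| kb.parse::<u64>().ok())
            .unwrap_or(0)
            / (1024 * 1024)
    };
    log::info!(
        "[{}] mem free: {} GiB, mem available: {} GiB",
        tag,
        gib("MemFree:"),
        gib("MemAvailable:")
    );
}
```

A call such as `log_mem("before load params")` would emit a line in the same shape as those above.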


spherel commented Feb 2, 2023

It seems domain.coeff_to_extended() and domain.coeff_to_extended_parts() have similar running times for k = 25, j = 8:

domain.coeff_to_extended time: 20.481749943s
domain.coeff_to_extended_parts time: 20.976022433s
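
A rough operation count (an illustrative estimate under the usual radix-2 FFT cost model, not a measurement from this PR) suggests this is expected. With n = 2^k and c = extended_k - k:

$$
\underbrace{2^{k+c}\,(k+c)}_{\text{one FFT of size }2^{k+c}}
\quad\text{vs.}\quad
\underbrace{2^{c}\left(2^{k}\,k + 2^{k}\right)}_{2^{c}\ \text{size-}2^{k}\ \text{FFTs plus coset scaling}}
= 2^{k+c}\,(k+1).
$$

For k = 25 and c = 3 (so j = 8 parts), the ratio is (k+c)/(k+1) = 28/26 ≈ 1.08, so up to constant factors and memory effects the two approaches should take nearly identical time, consistent with the timings above.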

@spherel spherel changed the title Feat/tianyi/fft mem opt Feat/sphere/fft mem opt Feb 2, 2023

lispc commented Feb 3, 2023

OK, I will do a final review before merging.


spherel commented Feb 24, 2023

Added several features:

  • Group constraints into clusters by degree, from 2^0 to 2^(extended_k - k); each cluster only requires a subset of the parts to recover the coefficient representation.
  • Track the fixed, instance, and advice columns used by each cluster, and skip FFT calls for unnecessary columns when computing the quotient polynomial h. (A sketch of this bookkeeping follows below.)
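
A minimal sketch of the clustering bookkeeping (hypothetical types and names, not the code in this PR): each gate is bucketed by the smallest power of two bounding its degree, and the columns it references are recorded per bucket, so the prover can extend only the columns a cluster actually uses and only over the 2^c parts that its degree requires.

```rust
use std::collections::BTreeSet;

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Column {
    Fixed(usize),
    Advice(usize),
    Instance(usize),
}

struct Gate {
    degree: usize,        // degree of the constraint as a polynomial in the columns
    columns: Vec<Column>, // columns referenced by the constraint
}

struct Cluster {
    parts_needed: usize,       // 2^c parts suffice for constraints of degree <= 2^c
    columns: BTreeSet<Column>, // union of columns used by the gates in this cluster
    gates: Vec<Gate>,
}

fn cluster_gates(gates: Vec<Gate>, extended_k: u32, k: u32) -> Vec<Cluster> {
    let max_cluster = (extended_k - k) as usize;
    // One cluster per degree bucket 2^0, 2^1, ..., 2^(extended_k - k).
    let mut clusters: Vec<Cluster> = (0..=max_cluster)
        .map(|c| Cluster {
            parts_needed: 1 << c,
            columns: BTreeSet::new(),
            gates: Vec::new(),
        })
        .collect();
    for gate in gates {
        // Smallest c with 2^c >= degree; that cluster's parts are enough to
        // recover the constraint's coefficient representation.
        let c = gate.degree.next_power_of_two().trailing_zeros() as usize;
        let c = c.min(max_cluster);
        clusters[c].columns.extend(gate.columns.iter().copied());
        clusters[c].gates.push(gate);
    }
    clusters
}
```

Each cluster would then extend only the columns in its `columns` set, and only into `parts_needed` of the extended-domain parts, before evaluating its gates.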

@mabbamOG

> Here is a draft explaining this optimization. Actually there are two points in this draft:
>
>   1. partitioning the cosetFFT,
>   2. exploiting different gate degrees.
>
> By now only the first one has been implemented.

draft link is broken?


spherel commented Apr 26, 2023

> > Here is a draft explaining this optimization. Actually there are two points in this draft:
> >
> >   1. partitioning the cosetFFT,
> >   2. exploiting different gate degrees.
> >
> > By now only the first one has been implemented.
>
> draft link is broken?

Sorry, I am currently working on revising this blog post to provide a more comprehensive explanation and additional figures for better clarity. The updated version will be published soon.


lispc commented Nov 30, 2023

merged

@lispc lispc closed this Nov 30, 2023
CPerezz added a commit to privacy-scaling-explorations/halo2 that referenced this pull request Feb 7, 2024
This incorporates the work done in scroll-tech#28 in order to significantly lower memory consumption, trading off some performance.

A much deeper analysis can be found here: axiom-crypto#17
CPerezz added a commit to privacy-scaling-explorations/halo2 that referenced this pull request Feb 26, 2024
This incorporates the work done in scroll-tech#28 in order to significantly lower memory consumption, trading off some performance.

A much deeper analysis can be found here: axiom-crypto#17
CPerezz added a commit to privacy-scaling-explorations/halo2 that referenced this pull request Mar 2, 2024
This incorporates the work done in scroll-tech#28 in order to significantly lower memory consumption, trading off some performance.

A much deeper analysis can be found here: axiom-crypto#17