
Add batched I/O for CO OT #13

Merged: 1 commit merged into markkurossi:master on Jul 25, 2023
Conversation

@rozag (Contributor) commented on Jul 25, 2023

Backstory

I've been working on a server that acts as a garbler for several quite large circuits. Its performance was unexpectedly poor, and the profiler made it clear that most of the time is spent waiting on connection reads/writes. After digging some more I noticed that ot.CO performs a lot of very small flushes. In my case roughly 20 MiB of data is transferred over the TCP connection, which would be fine as a few huge batches, but ot.CO's Send and Receive were writing 40-72 bytes, flushing, and waiting for the "response" to arrive before sending the next part. The ot/co.go header mentions that the implementation is based on emp-toolkit, and emp-toolkit does separate the stages: write everything first, then flush, then wait for the response. CO instead runs a send-flush-read loop, which makes p2p.Conn flush these small batches of data.
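
For illustration, here is a rough sketch of that pattern (hypothetical names, not the actual ot.CO code; conn, SendData, and ReceiveData are made up):

```go
// Hypothetical sketch of the old pattern: every element writes a tiny
// message, flushes it, and blocks on the peer's answer before the next
// iteration, so each element pays a full network round trip.
type conn interface {
	SendData(p []byte) error
	Flush() error
	ReceiveData() ([]byte, error)
}

func transferOneByOne(c conn, msgs [][]byte) ([][]byte, error) {
	resps := make([][]byte, 0, len(msgs))
	for _, m := range msgs {
		if err := c.SendData(m); err != nil { // ~40-72 bytes per message
			return nil, err
		}
		if err := c.Flush(); err != nil { // tiny flush on every iteration
			return nil, err
		}
		r, err := c.ReceiveData() // block until the peer answers
		if err != nil {
			return nil, err
		}
		resps = append(resps, r)
	}
	return resps, nil
}
```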

Fix

I propose using the same stage-separation approach: basically, we write everything first and flush only once after everything is written. The downside is that we loop over the data twice, but the work is still O(n). I've already tested this fix with my server; observed behavior: 4-5 minutes of execution before, 12-14 seconds after, which is a huge difference.
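
The same hypothetical sketch with the stage separation applied: write everything, flush once, then read everything (again, made-up names, not the actual diff):

```go
func transferBatched(c conn, msgs [][]byte) ([][]byte, error) {
	// Stage 1: write every message; the data just accumulates in the
	// connection's write buffer.
	for _, m := range msgs {
		if err := c.SendData(m); err != nil {
			return nil, err
		}
	}
	// A single flush for the whole batch instead of one per message.
	if err := c.Flush(); err != nil {
		return nil, err
	}
	// Stage 2: read all responses in order.
	resps := make([][]byte, 0, len(msgs))
	for range msgs {
		r, err := c.ReceiveData()
		if err != nil {
			return nil, err
		}
		resps = append(resps, r)
	}
	return resps, nil
}
```

Both peers have to agree on this write-then-read ordering (which is what emp-toolkit already does), otherwise they could deadlock on full buffers.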

Tests

$ cd ot && go test -v --count 1
Output
=== RUN   TestCO
--- PASS: TestCO (0.00s)
=== RUN   TestLabel
--- PASS: TestLabel (0.00s)
=== RUN   TestOTCO
--- PASS: TestOTCO (0.01s)
=== RUN   TestOTRSA
--- PASS: TestOTRSA (0.95s)
=== RUN   TestPipe
--- PASS: TestPipe (0.00s)
PASS
ok      github.com/markkurossi/mpc/ot   1.328s

Benchmarks

$ cd ot && go test --bench .

This change doesn't seem to influence the current benchmarks much:

Before
goos: darwin
goarch: amd64
pkg: github.com/markkurossi/mpc/ot
cpu: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
BenchmarkCO-8                       3817            277631 ns/op
BenchmarkLabelMul2-8            542378509                2.217 ns/op
BenchmarkLabelMul4-8            522979458                2.205 ns/op
BenchmarkLabelXor-8             1000000000               0.3096 ns/op
BenchmarkOTCO_1-8                   4683            238019 ns/op
BenchmarkOTCO_8-8                   1036           1133589 ns/op
BenchmarkOTCO_16-8                   537           2197241 ns/op
BenchmarkOTCO_32-8                   278           4344838 ns/op
BenchmarkOTCO_64-8                   139           8550312 ns/op
BenchmarkOTRSA_2048_1-8               97          12104427 ns/op
BenchmarkOTRSA_2048_8-8               12          97815929 ns/op
BenchmarkOTRSA_2048_64-8               2         779272778 ns/op
BenchmarkRSA512-8                   3300            349835 ns/op
BenchmarkRSA1024-8                   619           1904981 ns/op
BenchmarkRSA2048-8                    96          12051810 ns/op
PASS
ok      github.com/markkurossi/mpc/ot   23.257s
After
goos: darwin
goarch: amd64
pkg: github.com/markkurossi/mpc/ot
cpu: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
BenchmarkCO-8                       4022            278031 ns/op
BenchmarkLabelMul2-8            541403588                2.209 ns/op
BenchmarkLabelMul4-8            547191450                2.188 ns/op
BenchmarkLabelXor-8             1000000000               0.3042 ns/op
BenchmarkOTCO_1-8                   4851            243142 ns/op
BenchmarkOTCO_8-8                    739           1596737 ns/op
BenchmarkOTCO_16-8                   370           3189234 ns/op
BenchmarkOTCO_32-8                   182           6561498 ns/op
BenchmarkOTCO_64-8                    93          12940329 ns/op
BenchmarkOTRSA_2048_1-8               98          12186653 ns/op
BenchmarkOTRSA_2048_8-8               12          96631495 ns/op
BenchmarkOTRSA_2048_64-8               2         772423033 ns/op
BenchmarkRSA512-8                   3055            353650 ns/op
BenchmarkRSA1024-8                   634           1876732 ns/op
BenchmarkRSA2048-8                    99          12032335 ns/op
PASS
ok      github.com/markkurossi/mpc/ot   23.513s

At first I wanted to extend ot.Pipe with something like NewLatencyPipe(time.Duration) and to time.Sleep on e.g. each Write, to better represent my case of flushing often over a real TCP connection. The problem is that Pipe doesn't really use Flush; it simply reads/writes on demand. A p2p.Conn over something pipe-like with this latency emulation would probably be a more representative benchmark, but io.Pipe won't do here because it is unbuffered, so we'd deadlock. djherbis/nio has a nice buffered pipe that would probably work, but adding a dependency just for a benchmark seems like overkill. So I decided to leave the benchmarks unchanged for now, but I'd love to hear your opinion on this. If you find the proposed approach reasonable, I can do it.
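
For reference, the kind of wrapper I had in mind (a rough sketch only, not part of this PR; latencyConn and its fields are made-up names):

```go
import (
	"io"
	"time"
)

// latencyConn is a hypothetical io.ReadWriter wrapper that sleeps on every
// Write to emulate round-trip latency over a real TCP connection. A
// benchmark built on top of it would make the flush-per-message pattern
// visibly slower than the batched one.
type latencyConn struct {
	rw      io.ReadWriter
	latency time.Duration
}

func (c *latencyConn) Write(p []byte) (int, error) {
	time.Sleep(c.latency) // simulated per-write network delay
	return c.rw.Write(p)
}

func (c *latencyConn) Read(p []byte) (int, error) {
	return c.rw.Read(p)
}
```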

@markkurossi merged commit 68e0663 into markkurossi:master on Jul 25, 2023
2 checks passed
@markkurossi (Owner) commented:

Thanks for the pull request, it makes sense. My initial idea was to optimize memory allocations by doing each bit OT separately, but of course this write-sync-read loop kills performance.

No need to simulate the latency on the benchmark, I would rather avoid the extra dependency.

@rozag deleted the ot-batching branch on July 25, 2023 at 13:30