Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove all cryptoxide calls from fake_full_tx() #286

Merged
merged 1 commit into from
Dec 12, 2021

Conversation

rooooooooob
Copy link
Contributor

Profiling the tx builder showed that even after we removed the
fake_key_root construction in #214 due to terrible performance in asm.js
(#213) that the cryptoxide calls within fake_full_tx() were up a
ridiculous amount of the runtime. I investigated this as we had a report
that it was taking up to hundreds of milliseconds (in rust) just to make a tx and
I wanted to see if there would be any significant improvement by
migrating to an idiomatic rust API as discussed in #276 but from a
purely performance perspective in a release build (opt-level=3) this seemed to
be pretty minor, even with the unnecessary cloning forced by this
inside of the implementation.

This will need to be slightly modified once #273 gets merged as there are
some conflicts.

Previous perf results from building 100,000 simple tx bodies (no witnesses) and serializing
them:

-   53.64%        [.] <&cryptoxide::curve25519::Fe as
core::ops::arith::Mul>::mul
    - 53.63% 0xffffffffffffffff
      + 23.03% cardano_serialization_lib::tx_builder::min_fee
      - 21.32%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
          - 15.21%
cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output
              cardano_serialization_lib::tx_builder::min_fee
            + cardano_serialization_lib::tx_builder::fake_full_tx
          + 6.11%
cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee
(inlined)
      - 7.78% ser_lib_perf::main
          + 6.11%
cardano_serialization_lib::tx_builder::TransactionBuilder::build
          + 1.67%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
      + 1.46% std::rt::lang_start_internal
-   19.40%        [.] cryptoxide::curve25519::Fe::square
    - 0xffffffffffffffff
      - 8.34% cardano_serialization_lib::tx_builder::min_fee
            cardano_serialization_lib::tx_builder::TransactionBuilder::build
            cardano_serialization_lib::tx_builder::TransactionBuilder::build_and_size
          + cardano_serialization_lib::tx_builder::fake_full_tx
      - 7.41%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
          + 5.58%
cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output
          + 1.83%
cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee
(inlined)
      + 2.77% ser_lib_perf::main
      + 0.88% std::rt::lang_start_internal
+    7.80%        [.] cryptoxide::curve25519::GePrecomp::maybe_set
+    5.13%        [.] cryptoxide::curve25519::Fe::invert
+    3.98%        [.] <&cryptoxide::curve25519::GeP3 as
core::ops::arith::Add<&cryptoxide::curve25519::GePrecomp>>::add
+    2.84%        [.] cryptoxide::curve25519::GePrecomp::select
+    1.22%        [.] cryptoxide::curve25519::ge_scalarmult_base
+    1.00%        [.]
cryptoxide::sha2::impl512::reference::digest_block_u64
      0.47%        [.] cryptoxide::curve25519::GeP2::dbl
      0.13%        [.] cryptoxide::curve25519::sc_muladd
      0.12%        [.] cryptoxide::curve25519::Fe::to_bytes
      0.12%        [.] cryptoxide::curve25519::sc_reduce
      0.10%        [.] cbor_event::se::Serializer<W>::write_type
      0.07%        [.]
cardano_serialization_lib::tx_builder::fake_full_tx

which shows that almost the entire runtime was spent on cryptoxide calls
which were only necessary for fake_full_tx().

Running it again afterwards there is no overwhelming bottleneck anymore
and the remaining runtime was fairly distributed.

Profiling the tx builder showed that even after we removed the
fake_key_root construction in #214 due to terrible performance in asm.js
(#213) that the cryptoxide calls within `fake_full_tx()` were up a
ridiculous amount of the runtime. I investigated this as we had a report
that it was taking up to hundreds of milliseconds just to make a tx and
I wanted to see if there would be any significant improvement by
migrating to an idiomatic rust API as discussed in #276 but from a
purely performance perspective in a release build (opt-level=3) this seemed to
be pretty minor, even with the unnecessary cloning forced by this
inside of the implementation.

This will need to be slightly modified once #273 gets merged as there are
some conflicts.

Previous perf results from building 100,000 simple txs and serializing
them:
```
-   53.64%        [.] <&cryptoxide::curve25519::Fe as
core::ops::arith::Mul>::mul
    - 53.63% 0xffffffffffffffff
      + 23.03% cardano_serialization_lib::tx_builder::min_fee
      - 21.32%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
          - 15.21%
cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output
              cardano_serialization_lib::tx_builder::min_fee
            + cardano_serialization_lib::tx_builder::fake_full_tx
          + 6.11%
cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee
(inlined)
      - 7.78% ser_lib_perf::main
          + 6.11%
cardano_serialization_lib::tx_builder::TransactionBuilder::build
          + 1.67%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
      + 1.46% std::rt::lang_start_internal
-   19.40%        [.] cryptoxide::curve25519::Fe::square
    - 0xffffffffffffffff
      - 8.34% cardano_serialization_lib::tx_builder::min_fee
            cardano_serialization_lib::tx_builder::TransactionBuilder::build
            cardano_serialization_lib::tx_builder::TransactionBuilder::build_and_size
          + cardano_serialization_lib::tx_builder::fake_full_tx
      - 7.41%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
          + 5.58%
cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output
          + 1.83%
cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee
(inlined)
      + 2.77% ser_lib_perf::main
      + 0.88% std::rt::lang_start_internal
+    7.80%        [.] cryptoxide::curve25519::GePrecomp::maybe_set
+    5.13%        [.] cryptoxide::curve25519::Fe::invert
+    3.98%        [.] <&cryptoxide::curve25519::GeP3 as
core::ops::arith::Add<&cryptoxide::curve25519::GePrecomp>>::add
+    2.84%        [.] cryptoxide::curve25519::GePrecomp::select
+    1.22%        [.] cryptoxide::curve25519::ge_scalarmult_base
+    1.00%        [.]
cryptoxide::sha2::impl512::reference::digest_block_u64
      0.47%        [.] cryptoxide::curve25519::GeP2::dbl
      0.13%        [.] cryptoxide::curve25519::sc_muladd
      0.12%        [.] cryptoxide::curve25519::Fe::to_bytes
      0.12%        [.] cryptoxide::curve25519::sc_reduce
      0.10%        [.] cbor_event::se::Serializer<W>::write_type
      0.07%        [.]
cardano_serialization_lib::tx_builder::fake_full_tx
```

which shows that almost the entire runtime was spent on cryptoxide calls
which were only necessary for `fake_full_tx()`.

Running it again afterwards there is no overwhelming bottleneck anymore
and the remaining runtime was fairly distributed.
@rooooooooob
Copy link
Contributor Author

I should note that those inner calls when I expanded the top few are inclusive of the underlying cryptoxide calls. The cryptoxide calls were something like 95%+ of the runtime. For exclusive samples not a single non-cryptoxide function took up more than a fraction of a percent of total samples.

This is also only the time for the transaction body creation (full TransactionBuilder flow + also serializing the resulting tx body and summing the total lengths of the generated CBOR to print out at the end so I could assure myself that the tx building or any other part wasn't being optimized out since this was a release build (with debug symbols) that I was profiling). This seems to line up with the massive difference in regular release builds (albeit in asm.js but I bet the difference is there in other compilation targets but they're just faster in general so no one notices if they're only building 1 tx) when someone removed the other crpyto code (I assume it was the worst offender) in #214

@vsubhuman vsubhuman added this to the 10.0.0 milestone Dec 12, 2021
@vsubhuman vsubhuman merged commit e0b33c6 into master Dec 12, 2021
@vsubhuman vsubhuman deleted the tx-builder-optimization-remove-unncessary-crypto branch December 12, 2021 23:17
@vsubhuman vsubhuman restored the tx-builder-optimization-remove-unncessary-crypto branch January 10, 2022 22:45
@vsubhuman vsubhuman deleted the tx-builder-optimization-remove-unncessary-crypto branch January 10, 2022 22:46
vsubhuman pushed a commit that referenced this pull request Jan 10, 2022
Profiling the tx builder showed that even after we removed the
fake_key_root construction in #214 due to terrible performance in asm.js
(#213) that the cryptoxide calls within `fake_full_tx()` were up a
ridiculous amount of the runtime. I investigated this as we had a report
that it was taking up to hundreds of milliseconds just to make a tx and
I wanted to see if there would be any significant improvement by
migrating to an idiomatic rust API as discussed in #276 but from a
purely performance perspective in a release build (opt-level=3) this seemed to
be pretty minor, even with the unnecessary cloning forced by this
inside of the implementation.

This will need to be slightly modified once #273 gets merged as there are
some conflicts.

Previous perf results from building 100,000 simple txs and serializing
them:
```
-   53.64%        [.] <&cryptoxide::curve25519::Fe as
core::ops::arith::Mul>::mul
    - 53.63% 0xffffffffffffffff
      + 23.03% cardano_serialization_lib::tx_builder::min_fee
      - 21.32%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
          - 15.21%
cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output
              cardano_serialization_lib::tx_builder::min_fee
            + cardano_serialization_lib::tx_builder::fake_full_tx
          + 6.11%
cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee
(inlined)
      - 7.78% ser_lib_perf::main
          + 6.11%
cardano_serialization_lib::tx_builder::TransactionBuilder::build
          + 1.67%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
      + 1.46% std::rt::lang_start_internal
-   19.40%        [.] cryptoxide::curve25519::Fe::square
    - 0xffffffffffffffff
      - 8.34% cardano_serialization_lib::tx_builder::min_fee
            cardano_serialization_lib::tx_builder::TransactionBuilder::build
            cardano_serialization_lib::tx_builder::TransactionBuilder::build_and_size
          + cardano_serialization_lib::tx_builder::fake_full_tx
      - 7.41%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
          + 5.58%
cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output
          + 1.83%
cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee
(inlined)
      + 2.77% ser_lib_perf::main
      + 0.88% std::rt::lang_start_internal
+    7.80%        [.] cryptoxide::curve25519::GePrecomp::maybe_set
+    5.13%        [.] cryptoxide::curve25519::Fe::invert
+    3.98%        [.] <&cryptoxide::curve25519::GeP3 as
core::ops::arith::Add<&cryptoxide::curve25519::GePrecomp>>::add
+    2.84%        [.] cryptoxide::curve25519::GePrecomp::select
+    1.22%        [.] cryptoxide::curve25519::ge_scalarmult_base
+    1.00%        [.]
cryptoxide::sha2::impl512::reference::digest_block_u64
      0.47%        [.] cryptoxide::curve25519::GeP2::dbl
      0.13%        [.] cryptoxide::curve25519::sc_muladd
      0.12%        [.] cryptoxide::curve25519::Fe::to_bytes
      0.12%        [.] cryptoxide::curve25519::sc_reduce
      0.10%        [.] cbor_event::se::Serializer<W>::write_type
      0.07%        [.]
cardano_serialization_lib::tx_builder::fake_full_tx
```

which shows that almost the entire runtime was spent on cryptoxide calls
which were only necessary for `fake_full_tx()`.

Running it again afterwards there is no overwhelming bottleneck anymore
and the remaining runtime was fairly distributed.

(cherry picked from commit e0b33c6)
vsubhuman added a commit that referenced this pull request Jan 10, 2022
Remove all cryptoxide calls from fake_full_tx() (#286)
@vsubhuman vsubhuman mentioned this pull request Feb 6, 2022
40 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants