Remove all cryptoxide calls from fake_full_tx() #286

rooooooooob · 2021-12-10T01:00:33Z

Profiling the tx builder showed that even after we removed the
fake_key_root construction in #214 due to terrible performance in asm.js
(#213) that the cryptoxide calls within fake_full_tx() were up a
ridiculous amount of the runtime. I investigated this as we had a report
that it was taking up to hundreds of milliseconds (in rust) just to make a tx and
I wanted to see if there would be any significant improvement by
migrating to an idiomatic rust API as discussed in #276 but from a
purely performance perspective in a release build (opt-level=3) this seemed to
be pretty minor, even with the unnecessary cloning forced by this
inside of the implementation.

This will need to be slightly modified once #273 gets merged as there are
some conflicts.

Previous perf results from building 100,000 simple tx bodies (no witnesses) and serializing
them:

-   53.64%        [.] <&cryptoxide::curve25519::Fe as
core::ops::arith::Mul>::mul
    - 53.63% 0xffffffffffffffff
      + 23.03% cardano_serialization_lib::tx_builder::min_fee
      - 21.32%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
          - 15.21%
cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output
              cardano_serialization_lib::tx_builder::min_fee
            + cardano_serialization_lib::tx_builder::fake_full_tx
          + 6.11%
cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee
(inlined)
      - 7.78% ser_lib_perf::main
          + 6.11%
cardano_serialization_lib::tx_builder::TransactionBuilder::build
          + 1.67%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
      + 1.46% std::rt::lang_start_internal
-   19.40%        [.] cryptoxide::curve25519::Fe::square
    - 0xffffffffffffffff
      - 8.34% cardano_serialization_lib::tx_builder::min_fee
            cardano_serialization_lib::tx_builder::TransactionBuilder::build
            cardano_serialization_lib::tx_builder::TransactionBuilder::build_and_size
          + cardano_serialization_lib::tx_builder::fake_full_tx
      - 7.41%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
          + 5.58%
cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output
          + 1.83%
cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee
(inlined)
      + 2.77% ser_lib_perf::main
      + 0.88% std::rt::lang_start_internal
+    7.80%        [.] cryptoxide::curve25519::GePrecomp::maybe_set
+    5.13%        [.] cryptoxide::curve25519::Fe::invert
+    3.98%        [.] <&cryptoxide::curve25519::GeP3 as
core::ops::arith::Add<&cryptoxide::curve25519::GePrecomp>>::add
+    2.84%        [.] cryptoxide::curve25519::GePrecomp::select
+    1.22%        [.] cryptoxide::curve25519::ge_scalarmult_base
+    1.00%        [.]
cryptoxide::sha2::impl512::reference::digest_block_u64
      0.47%        [.] cryptoxide::curve25519::GeP2::dbl
      0.13%        [.] cryptoxide::curve25519::sc_muladd
      0.12%        [.] cryptoxide::curve25519::Fe::to_bytes
      0.12%        [.] cryptoxide::curve25519::sc_reduce
      0.10%        [.] cbor_event::se::Serializer<W>::write_type
      0.07%        [.]
cardano_serialization_lib::tx_builder::fake_full_tx

which shows that almost the entire runtime was spent on cryptoxide calls
which were only necessary for fake_full_tx().

Running it again afterwards there is no overwhelming bottleneck anymore
and the remaining runtime was fairly distributed.

Profiling the tx builder showed that even after we removed the fake_key_root construction in #214 due to terrible performance in asm.js (#213) that the cryptoxide calls within `fake_full_tx()` were up a ridiculous amount of the runtime. I investigated this as we had a report that it was taking up to hundreds of milliseconds just to make a tx and I wanted to see if there would be any significant improvement by migrating to an idiomatic rust API as discussed in #276 but from a purely performance perspective in a release build (opt-level=3) this seemed to be pretty minor, even with the unnecessary cloning forced by this inside of the implementation. This will need to be slightly modified once #273 gets merged as there are some conflicts. Previous perf results from building 100,000 simple txs and serializing them: ``` - 53.64% [.] <&cryptoxide::curve25519::Fe as core::ops::arith::Mul>::mul - 53.63% 0xffffffffffffffff + 23.03% cardano_serialization_lib::tx_builder::min_fee - 21.32% cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed - 15.21% cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output cardano_serialization_lib::tx_builder::min_fee + cardano_serialization_lib::tx_builder::fake_full_tx + 6.11% cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee (inlined) - 7.78% ser_lib_perf::main + 6.11% cardano_serialization_lib::tx_builder::TransactionBuilder::build + 1.67% cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed + 1.46% std::rt::lang_start_internal - 19.40% [.] cryptoxide::curve25519::Fe::square - 0xffffffffffffffff - 8.34% cardano_serialization_lib::tx_builder::min_fee cardano_serialization_lib::tx_builder::TransactionBuilder::build cardano_serialization_lib::tx_builder::TransactionBuilder::build_and_size + cardano_serialization_lib::tx_builder::fake_full_tx - 7.41% cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed + 5.58% cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output + 1.83% cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee (inlined) + 2.77% ser_lib_perf::main + 0.88% std::rt::lang_start_internal + 7.80% [.] cryptoxide::curve25519::GePrecomp::maybe_set + 5.13% [.] cryptoxide::curve25519::Fe::invert + 3.98% [.] <&cryptoxide::curve25519::GeP3 as core::ops::arith::Add<&cryptoxide::curve25519::GePrecomp>>::add + 2.84% [.] cryptoxide::curve25519::GePrecomp::select + 1.22% [.] cryptoxide::curve25519::ge_scalarmult_base + 1.00% [.] cryptoxide::sha2::impl512::reference::digest_block_u64 0.47% [.] cryptoxide::curve25519::GeP2::dbl 0.13% [.] cryptoxide::curve25519::sc_muladd 0.12% [.] cryptoxide::curve25519::Fe::to_bytes 0.12% [.] cryptoxide::curve25519::sc_reduce 0.10% [.] cbor_event::se::Serializer<W>::write_type 0.07% [.] cardano_serialization_lib::tx_builder::fake_full_tx ``` which shows that almost the entire runtime was spent on cryptoxide calls which were only necessary for `fake_full_tx()`. Running it again afterwards there is no overwhelming bottleneck anymore and the remaining runtime was fairly distributed.

rooooooooob · 2021-12-10T01:09:48Z

I should note that those inner calls when I expanded the top few are inclusive of the underlying cryptoxide calls. The cryptoxide calls were something like 95%+ of the runtime. For exclusive samples not a single non-cryptoxide function took up more than a fraction of a percent of total samples.

This is also only the time for the transaction body creation (full TransactionBuilder flow + also serializing the resulting tx body and summing the total lengths of the generated CBOR to print out at the end so I could assure myself that the tx building or any other part wasn't being optimized out since this was a release build (with debug symbols) that I was profiling). This seems to line up with the massive difference in regular release builds (albeit in asm.js but I bet the difference is there in other compilation targets but they're just faster in general so no one notices if they're only building 1 tx) when someone removed the other crpyto code (I assume it was the worst offender) in #214

Profiling the tx builder showed that even after we removed the fake_key_root construction in #214 due to terrible performance in asm.js (#213) that the cryptoxide calls within `fake_full_tx()` were up a ridiculous amount of the runtime. I investigated this as we had a report that it was taking up to hundreds of milliseconds just to make a tx and I wanted to see if there would be any significant improvement by migrating to an idiomatic rust API as discussed in #276 but from a purely performance perspective in a release build (opt-level=3) this seemed to be pretty minor, even with the unnecessary cloning forced by this inside of the implementation. This will need to be slightly modified once #273 gets merged as there are some conflicts. Previous perf results from building 100,000 simple txs and serializing them: ``` - 53.64% [.] <&cryptoxide::curve25519::Fe as core::ops::arith::Mul>::mul - 53.63% 0xffffffffffffffff + 23.03% cardano_serialization_lib::tx_builder::min_fee - 21.32% cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed - 15.21% cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output cardano_serialization_lib::tx_builder::min_fee + cardano_serialization_lib::tx_builder::fake_full_tx + 6.11% cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee (inlined) - 7.78% ser_lib_perf::main + 6.11% cardano_serialization_lib::tx_builder::TransactionBuilder::build + 1.67% cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed + 1.46% std::rt::lang_start_internal - 19.40% [.] cryptoxide::curve25519::Fe::square - 0xffffffffffffffff - 8.34% cardano_serialization_lib::tx_builder::min_fee cardano_serialization_lib::tx_builder::TransactionBuilder::build cardano_serialization_lib::tx_builder::TransactionBuilder::build_and_size + cardano_serialization_lib::tx_builder::fake_full_tx - 7.41% cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed + 5.58% cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output + 1.83% cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee (inlined) + 2.77% ser_lib_perf::main + 0.88% std::rt::lang_start_internal + 7.80% [.] cryptoxide::curve25519::GePrecomp::maybe_set + 5.13% [.] cryptoxide::curve25519::Fe::invert + 3.98% [.] <&cryptoxide::curve25519::GeP3 as core::ops::arith::Add<&cryptoxide::curve25519::GePrecomp>>::add + 2.84% [.] cryptoxide::curve25519::GePrecomp::select + 1.22% [.] cryptoxide::curve25519::ge_scalarmult_base + 1.00% [.] cryptoxide::sha2::impl512::reference::digest_block_u64 0.47% [.] cryptoxide::curve25519::GeP2::dbl 0.13% [.] cryptoxide::curve25519::sc_muladd 0.12% [.] cryptoxide::curve25519::Fe::to_bytes 0.12% [.] cryptoxide::curve25519::sc_reduce 0.10% [.] cbor_event::se::Serializer<W>::write_type 0.07% [.] cardano_serialization_lib::tx_builder::fake_full_tx ``` which shows that almost the entire runtime was spent on cryptoxide calls which were only necessary for `fake_full_tx()`. Running it again afterwards there is no overwhelming bottleneck anymore and the remaining runtime was fairly distributed. (cherry picked from commit e0b33c6)

Remove all cryptoxide calls from fake_full_tx() (#286)

rooooooooob requested a review from vsubhuman December 10, 2021 01:00

vsubhuman approved these changes Dec 12, 2021

View reviewed changes

vsubhuman added this to the 10.0.0 milestone Dec 12, 2021

vsubhuman mentioned this pull request Dec 12, 2021

BETA Release / 10.0.0-beta.8 #283

Merged

vsubhuman added the performance label Dec 12, 2021

vsubhuman merged commit e0b33c6 into master Dec 12, 2021

vsubhuman deleted the tx-builder-optimization-remove-unncessary-crypto branch December 12, 2021 23:17

rooooooooob mentioned this pull request Dec 16, 2021

add_change_if_needed very slow #271

Closed

vsubhuman restored the tx-builder-optimization-remove-unncessary-crypto branch January 10, 2022 22:45

vsubhuman deleted the tx-builder-optimization-remove-unncessary-crypto branch January 10, 2022 22:46

vsubhuman mentioned this pull request Jan 10, 2022

Remove all cryptoxide calls from fake_full_tx() (#286) #322

Merged

vsubhuman added a commit that referenced this pull request Jan 10, 2022

Merge pull request #322 from Emurgo/tx-builder-optimisation

bd49158

Remove all cryptoxide calls from fake_full_tx() (#286)

vsubhuman mentioned this pull request Feb 6, 2022

Release / 10.0.0 #358

Merged

40 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove all cryptoxide calls from fake_full_tx() #286

Remove all cryptoxide calls from fake_full_tx() #286

rooooooooob commented Dec 10, 2021

rooooooooob commented Dec 10, 2021

Remove all cryptoxide calls from fake_full_tx() #286

Remove all cryptoxide calls from fake_full_tx() #286

Conversation

rooooooooob commented Dec 10, 2021

rooooooooob commented Dec 10, 2021