Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BigNumber Type and syscalls for BPF programs #17082

Closed
wants to merge 59 commits into from
Closed

Add BigNumber Type and syscalls for BPF programs #17082

wants to merge 59 commits into from

Conversation

FrankC01
Copy link
Contributor

@FrankC01 FrankC01 commented May 6, 2021

@jackcmay - As per your comments... for review...

Problem

BigNum operations are not available in vanilla Solana for BPF programs. For big and prime numbers it is desirable to expose these capabilities to BPF programs.

In addition, hashing to a prime number is valuable as well.

Summary of Changes

  • Added budget costing for use of BigNumber syscalls as well as hashing to prime number. Can be reviewed and modified.
  • Added BigNumber operational syscalls and unit tests
  • Added BigNumber type and implementation. When used in BPF programs, passing a BigNumb back and forth to the syscalls does so via Box raw pointers

Fixes #

@FrankC01
Copy link
Contributor Author

FrankC01 commented May 7, 2021

@jackcmay Second commit should address the sanity failure

@jackcmay
Copy link
Contributor

jackcmay commented May 7, 2021

@joncinque @jstarry I think you guys have been using big numbers recently? If so, do you see a syscall for these types of things as a good solution?

@jstarry
Copy link
Member

jstarry commented May 8, 2021

@joncinque @jstarry I think you guys have been using big numbers recently? If so, do you see a syscall for these types of things as a good solution?

Yes, definitely. Had to make some compromises on the amount of precision used in token-lending in order to reduce compute cost of exponentiation, multiplication, and division. This is super useful if the compute cost is reasonable.

@FrankC01
Copy link
Contributor Author

FrankC01 commented May 9, 2021

@jackcmay From a PR completeness standpoint, is there anything else I should be adding to this?

@joncinque
Copy link
Contributor

This looks like a great set of syscalls to me. I'll echo @jstarry 's caveat, that as long as the compute usage is reasonable, these will definitely be useful. How were the current numbers decided?

@@ -21,6 +21,9 @@ solana-runtime = { path = "../../runtime", version = "=1.7.0" }
solana-sdk = { path = "../../sdk", version = "=1.7.0" }
solana_rbpf = "=0.2.8"
thiserror = "1.0"
hkdf = "0.11.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: nice to keep these in alphabetical order

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 03af751

@@ -1,5 +1,8 @@
use crate::{alloc, BpfError};
use alloc::Alloc;
use blake3::traits::digest::Digest;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you using cargo fmt on these changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 03af751

BorshSerialize,
BorshDeserialize,
BorshSchema,
// Clone,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is what rustfmt does, although I did find one other change with rustfmt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is clone commented out? Do you not want to support the default implementation of clone? If not, remove this line.


// Syscall interfaces
#[cfg(target_arch = "bpf")]
extern "C" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stick these in their representative functions below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 03af751

{
use openssl::bn::BigNum;
let braw: &BigNum = unsafe { &*(self.0 as *mut BigNum) };
write!(f, "{}", braw)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does bignum not already implement default?

}

impl AsRef<u64> for BigNumber {
fn as_ref(&self) -> &u64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will review

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackcmay No, it will always return the u64 of the BigNumber structure.0

}

/// Drop - removes the underlying BigNum
impl Drop for BigNumber {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that if someone hacks thier program they could force a leaked bignum in the runtime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need clarity here...

sdk/program/src/bignum.rs Outdated Show resolved Hide resolved
}

#[cfg(test)]
mod tests {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to see more comprehensive testing. Also would need testing from within a bpf program, check out the sha program in this or for an example: #16498

Copy link
Contributor Author

@FrankC01 FrankC01 May 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added BPF program testing in 2b1f3e1

@@ -934,901 +1150,1892 @@ impl<'a> SyscallObject<BpfError> for SyscallSha256<'a> {
}
}

fn get_sysvar<T: std::fmt::Debug + Sysvar + SysvarId>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why github is breaking the diff up like this but it's making it very hard to read

let rwptr = Box::into_raw(bbox);
let bignum_ptr = rwptr as u64;
*big_number = bignum_ptr;
question_mark!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should consume first so that the call fails before doing any work if the caller doesn't have enough units left

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved consumes to early as possible depending on when size is available to calculate byte costs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 03af751

);
let bytes: f64 = byte_slice.len() as f64;
let bbox = Box::new(BigNum::from_slice(byte_slice).unwrap());
let rwptr = Box::into_raw(bbox);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This screems memory leak, is it possible to create a container in program space (above the syscall) and just have the syscall fill in the result?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe pass back up a byte array and then call from_slice()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May need clarity here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model here looks like it is allocating a bignum type in the syscall and then passing back a raw pointer, which in program space is converted into a bignum type. This means the allocation is occurring in the runtime and relying on the program to trigger the drop. A malicious program could skip the drop resulting in a memory leak in the runtime.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use from_bytes and to_bytes to pass a byte array between the two. That does lead to local allocations on both ends which isn't ideal but better then a possible memory leak.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackcmay So, remember I am a Rust n00b:

  1. Instead of boxing and raw-pointers, send and receive byte arrays to/from syscalls and in the syscalls instantiate the BigNum's?
  2. Is there a way to pass in a Vec and translate it to type in a syscall? I've had trouble and can't seem to make it work.
  3. The reason for the Vec is that BigNum are ultimately variable length and can get quite large so knowing fixed buffers a priori would be daunting.

@@ -188,6 +226,25 @@ impl BpfComputeBudget {
max_cpi_instruction_size: 1280, // IPv6 Min MTU size
cpi_bytes_per_unit: 250, // ~50MB at 200,000 units
sysvar_base_cost: 100,
bignum_new_base_cost: 100,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the sense that these costs are too small. By chance have you done any perf measurements of the big-num calls on native?

@arthurgreef
Copy link
Contributor

The compute unit calculations are different for each of the following classes of operation.

  • Big number operations – new, drop, convert from bytes, convert to bytes, log.
  • Big number arithmetic operations – add, multiply, divide, exp.
  • Big number modular arithmetic operations – mod_mul, mod_inv, mod_sqr, mod_exp.
  • Hash operations – hash to prime, hashed generator, blake3 digest.

Big number operations. The base cost of 100 was chosen as that is what it cost to log and we really do not know how else to select or derive this number.

Big number arithmetic operations. The compute unit calculations are based on this (Ethereum precompile for big arithmetic issue] (ethereum/EIPs#101). The formula is GADDSUBBASE + GARITHWORD * ceil( / 32). The value for GARITHWORD is 6. The base cost of an operation GADDSUBBASE we took by logging the compute units required to perform the same operation in a Solana program using u64 numbers which is 15 or 30 units.

Big number modular arithmetic operations. The unit calculation uses the same formula from the Ethereum precompile for big arithmetic but the base cost is the cost of two operations in a Solana program on u64 numbers. For example, mod_mul base cost is the cost of a multiply operation plus the cost of a mod operation. We did not use the [Ethereum Big integer modular exponentiation precompile](EIP-198: Big integer modular exponentiation (ethereum.org)) formula for the mod_exp calculation but this is something we are looking for feedback on.

Hash operations. For hash operations we used the same formula from the Ethereum precompile for big arithmetic but the base costs vary based on the number of sub-operations. The blake3 digest base cost is the same as the SHA256 digest base cost. The hash generator has more sub-operatinos then a digest operation. The hash to prime operation has a variable number of loops which is a concern as I do not know the maximum number of loops required to find the next prime number if the digest is not a prime number. This is the most expensive hash operation and we wanted feedback on how to calculate compute units. It would not be acceptable for a hash to prime operation to fail for a 32 byte number.

When we were developing our program we moved all the operations in our RSA accumulator membership and non-membership proof verification to sys calls if an instruction resulted in us exceeding the 200_000 compute unit limit. When our program was able to run within the 200_000 compute units we tweaked the cost calculations so that we could still execute our proof verifications without exceeding the unit limit. It may be the case that the current Solana compute unit limits and the cost of executing the operations required for RSA accumulator proof verification make it impossible to execute these types of programs on Solana but I’m hoping that this is not the case.

@seanyoung
Copy link
Contributor

The solang compiler https://github.com/hyperledger-labs/solang could possibly use bignum operations if they are efficient enough for math of integer types > 64. However there are some issues with that.

  • Converting to/from BigNum is expensive, and needs to be done often. I have my doubts how well this will work out IRL.
  • Solang needs BigNum operations with overflow checks for BigNum of N bits width. This PR doesn't offer that.
  • The only relevant instruction not available in BPF is 64 bit mul with 128 bit result (imulq with result $rax/$rdax on x86). If BPF had that instruction then I don't know if this implementation is more efficient than a native one.

@FrankC01
Copy link
Contributor Author

@jackcmay - Checked in that addresses review comments I marked with 👍 , commented where I was not sure or did not address yet

@arthurgreef
Copy link
Contributor

@seanyoung these big number operations are useful for cryptographic protocols such as the RSA accumulator we are using. There is very little conversion between native numbers and big numbers in this protocol. Some RSA accumulator implementations use fixed width integers, u256 and u512, usually for hashing operations but we did not take that approach in this iteration.

@jackcmay
Copy link
Contributor

@FrankC01 Does this test pass for you locally:

thread 'test_program_bpf_finalize' panicked at 'assertion failed: `(left == right)`
left: `InstructionError(0, MissingAccount)`,
right: `InstructionError(0, ProgramFailedToComplete)`', tests/programs.rs:2427:5

@FrankC01
Copy link
Contributor Author

FrankC01 commented May 19, 2021

@jackcmay When I run either (in solana/programs/bpf) :
cargo test --features="bpf_c,bpf_rust" -- --nocapture test_program_bpf_sanity

or

cargo test --features="bpf_c,bpf_rust" -- --nocapture

I do not see that error locally. And here is what I captured from running:

[2021-05-19T08:20:39.545901000Z DEBUG solana_runtime::message_processor] Finalized account 2KSfz5hoWqr8vkBKKthsctpAP96vwdKcat7nHZ2dCXPB
[2021-05-19T08:20:39.549157000Z DEBUG solana_runtime::message_processor] Program 11111111111111111111111111111111 invoke [1]
[2021-05-19T08:20:39.549270000Z DEBUG solana_runtime::message_processor] Program 11111111111111111111111111111111 success
[2021-05-19T08:20:39.552253000Z DEBUG solana_runtime::message_processor] Program BfKnjeXBD7bQxRjpoK2VKNWQt1AGMU94mqn24EYDtGut invoke [1]
[2021-05-19T08:20:39.552498000Z DEBUG solana_runtime::message_processor] Program log: Finalize a program
[2021-05-19T08:20:39.552831000Z DEBUG solana_runtime::message_processor] Program BfKnjeXBD7bQxRjpoK2VKNWQt1AGMU94mqn24EYDtGut consumed 2131 of 200000 compute units
[2021-05-19T08:20:39.552921000Z DEBUG solana_runtime::message_processor] Program failed to complete: Program BPFLoader2111111111111111111111111111111111 not supported by inner instructions
[2021-05-19T08:20:39.553006000Z DEBUG solana_runtime::message_processor] Program BfKnjeXBD7bQxRjpoK2VKNWQt1AGMU94mqn24EYDtGut failed: Program failed to complete
test test_program_bpf_finalize ... ok

@jackcmay
Copy link
Contributor

jackcmay commented May 19, 2021

Hmm, I also don't see that error when running the exact test that CI is running :-(

I kicked CI, will try and resolve tomorrow

@jackcmay
Copy link
Contributor

The solang compiler https://github.com/hyperledger-labs/solang could possibly use bignum operations if they are efficient enough for math of integer types > 64. However there are some issues with that.

* Converting to/from BigNum is expensive, and needs to be done often. I have my doubts how well this will work out IRL.

* Solang needs BigNum operations with overflow checks for BigNum of N bits width. This PR doesn't offer that.

* The only relevant instruction not available in BPF is 64 bit mul with 128 bit result (imulq with result $rax/$rdax on x86). If BPF had that instruction then I don't know if this implementation is more efficient than a native one.

@seanyoung is the overflow checks something that can be added later or must it be part of these initial syscalls to meet Solang's requirements?

@FrankC01
Copy link
Contributor Author

@jackcmay It seems like a different error now (in coverage)... other PRs seem to have issues as well. Are any actually getting through?

@arthurgreef
Copy link
Contributor

@jackcmay The syscalls in this pull request do not provide fixed width integer arithmetic. For this you would need to use something like unit provided by parity. https://crates.io/crates/uint. These syscalls rely on OpenSSL to catch underflow and overflow errors.

@jackcmay
Copy link
Contributor

@FrankC01 Are you rebased up to the latest master?

@FrankC01
Copy link
Contributor Author

FrankC01 commented May 19, 2021

@FrankC01 Are you rebased up to the latest master?

Before each checkin i do
git fetch upstream
git merge upstream/master

jackcmay and others added 7 commits May 19, 2021 13:43
The current implementations use only the id and disregard other fields,
in particular wallclock. This can lead to bugs where an outdated
contact-info shadows or overrides a current one because they compare
equal.
* improve insert into map initially

* rework towards single code path

* rename

* update test
* don't log shrink metrics on first call

* simplify logic
- upgrade rustc to 1.52.1 and clang to 12.0
@FrankC01
Copy link
Contributor Author

@jackcmay @arthurgreef @seanyoung - Folks, I'm closing this PR as I had completely messed up the works. I will open a new Draft PR and reference this for the background

Apologies

@FrankC01 FrankC01 closed this May 20, 2021
@FrankC01 FrankC01 mentioned this pull request May 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants