Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add heuristic to determine contract origin from Wasm blob #1329

Merged
merged 11 commits into from
Oct 23, 2023

Conversation

smiasojed
Copy link
Collaborator

@smiasojed smiasojed commented Sep 12, 2023

Summary

Closes #784

  • [n] y/n | Does it introduce breaking changes?
  • [n] y/n | Is it dependent on the specific version of ink or pallet-contracts?

Add heuristic to determine contract origin from Wasm blob

Description

Introduced contract-analyze crate with detects the programming language of a smart contract from its WebAssembly (Wasm) binary code.
New crate has been integrated with cargo contract info command, which generates now the output:

          TrieId c7fa1c1ef70d66023c5ecf4a517d39ec908c92d56e5e80af00b222c441dd0f40
       Code Hash 0x2c1388d32d59b87d0a2b6208e3562a11bed29a2d46cf1e631ee1c211a41004c1
   Storage Items 1
 Storage Deposit 999500000
 Source Language ink!

The downside of implementing this change is that it results in the need to download the WebAssembly (Wasm) code from the blockchain for every "info" call.

Checklist before requesting a review

  • My code follows the style guidelines of this project
  • I have added an entry to CHANGELOG.md
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • Any dependent changes have been merged and published in downstream modules

@smiasojed smiasojed changed the title Sm/contract origin Add heuristic to determine contract origin from Wasm blob Sep 12, 2023
@SkymanOne
Copy link
Contributor

The downside of implementing this change is that it results in the need to download the WebAssembly (Wasm) code from the blockchain for every "info" call.

Perhaps you can cache it?

@smiasojed
Copy link
Collaborator Author

I am not sure if it is real problem the issue could be visible when you iterate over all contracts addresses on chain to call info on them. In this case having cache<hash, wasm> could improve the performance assuming that the same contract is instantiated many times.

crates/analyze/Cargo.toml Outdated Show resolved Hide resolved
Comment on lines 46 to 48
if import_section_first != "memory"
&& start_section.is_none()
&& custom_sections.peek().is_none()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% convinced by this heuristic, I know it is copied from that python script.

It is because it is identifying the language based on the combination of a lack of things, which has the potential for false positives.

I would prefer to match on the presence of something, where we can be reasonably sure that it is an ink contract. We could add a custom section like the other languages (although that excludes contracts built with older versions). Alternatively we could identify the signature of a commonly used function somehow? e.g. a quick glance at a few contracts I can spot something that similar to:

incrementer.wat

(func (;8;) (type 5) (param i32)
    (local i32 i32 i32 i32)
    global.get 0
    i32.const 16
    i32.sub
    local.tee 1
    global.set 0
    local.get 1
    i64.const 16384
    i64.store offset=4 align=4
    local.get 1
    i32.const 65536
    i32.store
    local.get 1
    i32.const 0
    call 9
    block  ;; label = @1

flipper.wat

(func (;11;) (type 6) (param i32)
    (local i32 i32 i32 i32)
    global.get 0
    i32.const 32
    i32.sub
    local.tee 1
    global.set 0
    local.get 1
    i64.const 16384
    i64.store offset=20 align=4
    local.get 1
    i32.const 65536
    i32.store offset=16
    local.get 1
    i32.const 0
    i32.store offset=12
    local.get 1
    i32.const 16
    i32.add
    local.get 1
    i32.const 12
    i32.add
    i32.const 4
    call 6
    block  ;; label = @1

Although not sure which function that corresponds to, but maybe there is another ink core library function we can find which is guaranteed to be included in the generated Wasm.

Anyway I would prefer to be more strict and miss some contracts that are written in ink rather than falsely categorising some contracts as having been written in ink.

Copy link
Collaborator Author

@smiasojed smiasojed Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this heuristic too. I only included it because the task was defined such way.
I can take a look to see if it could be done in a better way.

Copy link
Collaborator Author

@smiasojed smiasojed Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that basing on function presence can be risky, because inlining optimization can affect our logic. This function, which you are referring to, it is the deploy function. The advantage is that it will not be inlined, but its type is the same for Solang contracts.

For now I added check for the ink! function: pub fn deny_payment<E>() -> Result<(), DispatchError>
From the tests, it appears that its presence does not depend on the optimization type (size/performance) or its level, despite being defined with an inline attribute.
Stats from rococo wss://rococo-contracts-rpc.polkadot.io

total: 2778
ink!: 2643
Ask: 4
Solang: 22
Unknown: 109

Copy link
Collaborator

@ascjones ascjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the right direction. We can consider adding a very small custom section to the Wasm for future ink versions, except obviously that will appear in every contract so that is the cost.

Comment on lines 45 to 53
if start_section.is_none()
&& (is_ink_function_present(&module) || has_function_name(&module, "ink_env"))
{
return Ok(Language::Ink)
} else if start_section.is_none() && has_custom_section(&module, "producers") {
return Ok(Language::Solidity)
} else if start_section.is_some() && has_custom_section(&module, "sourceMappingURL") {
return Ok(Language::AssemblyScript)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are reasonably certain that ink contracts will not have the custom sections of the solidity and ask, then we can change the order of this so that the ink if branch goes last.

This would reduce the possibility of a false positive ink identification, since it is still be possible for another type of contract to have a function matching the signature of deny_payment or transferred_value which has a call to value_transferred

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I will change the order.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@smiasojed
Copy link
Collaborator Author

I think this is the right direction. We can consider adding a very small custom section to the Wasm for future ink versions, except obviously that will appear in every contract so that is the cost.

I can add an Ink custom section when the contract is built using the cargo-contract tool. A drawback of this solution is that if someone builds an Ink! contract without using the cargo-contract tool, the section will be missing.

Copy link
Collaborator

@ascjones ascjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I think @cmichi should also have a quick look since he originally specified the issue in #784.

@ascjones ascjones merged commit c2b550b into master Oct 23, 2023
11 checks passed
@ascjones ascjones deleted the sm/contract-origin branch October 23, 2023 09:34
@smiasojed smiasojed mentioned this pull request Nov 30, 2023
@smiasojed smiasojed mentioned this pull request Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add heuristic to determine contract origin from Wasm blob
3 participants