Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solve nondeterminism about AST IDs #11907

Closed
chriseth opened this issue Sep 7, 2021 · 9 comments
Closed

Solve nondeterminism about AST IDs #11907

chriseth opened this issue Sep 7, 2021 · 9 comments

Comments

@chriseth
Copy link
Contributor

chriseth commented Sep 7, 2021

The nondeterminism about AST IDs will always hurt us, and we should solve it once and for all.

The problem is that when you compile multiple unrelated files, the AST IDs get assigned uniquely for the whole compilation run.
This can result in different code depending on whether you compile a file in isolation or together with other files for the following reason:

These AST IDs are used in the code generator to create unique function names (for example for struct types in the ABI routines).
We are currently trying to strip the AST IDs as much as possible as long as the function names are still unique, but if this fails, it can happen that functions are sorted differently and thus the optimizer handles them in a different order or they are just laid out differently in the final code.

Proposals to overcome this problem:

Make AST IDs two-dimensional where the first dimension depends on the file. This can be the file name or the hash of the file content.

The problem here is that these IDs get very long and also it is a breaking change. We can keep the current AST IDs for anything that is not related to code generation. The length could be dealt with by specially marking those AST IDs in function names and reducing their length in the NameSimplifier (we already do something like that there). The NameSimplifier could start with one pass where it just collects all prefixes and replaces them with unique and shorter identifiers (either prefixes or just newly created numbers starting from 1).

@ekpyron
Copy link
Member

ekpyron commented Sep 7, 2021

We do generate code per-contract - can't we just have a mapping "global ID -> contract-local ID" that maps our current IDs to a new counter that is per-contract and increments based on codegen visit order, resp. first use in codegen? You considered that yourself earlier, not sure what made you drop the idea...
Especially, if we want to keep the current IDs for everything except codegen anyways.

@chriseth
Copy link
Contributor Author

chriseth commented Sep 7, 2021

Yes, that is the other way to do it. I'm a bit uneasy about the codegen visit order, and it would require Type::toString() to have access to this translation table.

@chriseth
Copy link
Contributor Author

chriseth commented Sep 7, 2021

Ah sorry, it's Type::richIdentifier(), and we could actually provide the mapping.

@chriseth
Copy link
Contributor Author

chriseth commented Sep 7, 2021

In

	string functionName =
		"abi_encode_" +
		_from.identifier() +
		"_to_" +
		to.identifier() +
		_options.toFunctionNameSuffix();

This might lead to .identifier() allocating a new ID, which means that the new ID will depend on the order in which this C++ expression is evaluated.

@ekpyron
Copy link
Member

ekpyron commented Sep 7, 2021

It's not exactly beautiful to pass a mapping to richIdentifier, but yeah, should work.

@ekpyron
Copy link
Member

ekpyron commented Sep 7, 2021

In

	string functionName =
		"abi_encode_" +
		_from.identifier() +
		"_to_" +
		to.identifier() +
		_options.toFunctionNameSuffix();

This might lead to .identifier() allocating a new ID, which means that the new ID will depend on the order in which this C++ expression is evaluated.

Yeah, that indeed makes it extremely error-prone...

@chriseth
Copy link
Contributor Author

chriseth commented Sep 7, 2021

It turns out that it might be better to make (a) the sol -> yul code generator output code that only depends on solidity source order and (b) check that the yul (util) function generation is deterministic. This is done in #11910

@axic
Copy link
Member

axic commented Sep 7, 2021

The problem here is that these IDs get very long and also it is a breaking change.

Why not take a truncated hash as the ID of the two-level ID?

@chriseth
Copy link
Contributor Author

chriseth commented Sep 8, 2021

Hopefully solved by not making anything depend on names at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants