Solve nondeterminism about AST IDs #11907

chriseth · 2021-09-07T10:20:17Z

The nondeterminism about AST IDs will always hurt us, and we should solve it once and for all.

The problem is that when you compile multiple unrelated files, the AST IDs get assigned uniquely for the whole compilation run.
This can result in different code depending on whether you compile a file in isolation or together with other files for the following reason:

These AST IDs are used in the code generator to create unique function names (for example for struct types in the ABI routines).
We currently try to strip the AST IDs as far as possible while keeping the function names unique. When that fails, functions can end up sorted differently, so the optimizer handles them in a different order or they are simply laid out differently in the final code.
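To make the failure mode concrete, here is a minimal sketch. `CompilationRun`, `parseFile` and the naming scheme in `abiHelperName` are hypothetical stand-ins, not the compiler's actual classes; the only thing taken from the description above is that one counter is shared by all files of a run.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a compilation run: one ID counter shared by all files.
struct CompilationRun
{
	std::int64_t nextID = 1;

	// Pretend the file declares `_nodeCount` AST nodes; return the ID given
	// to the last one (say, a struct definition).
	std::int64_t parseFile(int _nodeCount)
	{
		std::int64_t last = 0;
		for (int i = 0; i < _nodeCount; ++i)
			last = nextID++;
		return last;
	}
};

// Hypothetical helper-name scheme that embeds the struct's AST ID.
std::string abiHelperName(std::int64_t _structID)
{
	return "abi_encode_t_struct$_S_$" + std::to_string(_structID);
}

// Helper name for b.sol when it is compiled alone.
std::string nameInIsolation()
{
	CompilationRun run;
	return abiHelperName(run.parseFile(3));
}

// Helper name for the same b.sol when an unrelated file is parsed first.
std::string nameWithOtherFile()
{
	CompilationRun run;
	run.parseFile(10); // unrelated a.sol consumes IDs 1..10
	return abiHelperName(run.parseFile(3));
}
```

The same source yields `..._$3` in one run and `..._$13` in the other, so any ordering that depends on these names can differ between the two runs.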

Proposals to overcome this problem:

Make AST IDs two-dimensional where the first dimension depends on the file. This can be the file name or the hash of the file content.

The problem here is that these IDs get very long and also it is a breaking change. We can keep the current AST IDs for anything that is not related to code generation. The length could be dealt with by specially marking those AST IDs in function names and reducing their length in the NameSimplifier (we already do something like that there). The NameSimplifier could start with one pass where it just collects all prefixes and replaces them with unique and shorter identifiers (either prefixes or just newly created numbers starting from 1).
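The prefix-collecting pass could look roughly like the sketch below. `FileScopedID`, `collectPrefixes` and `render` are made-up names for illustration and are not the existing NameSimplifier API; the sketch also assumes that the order in which the IDs are visited is itself deterministic.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical two-dimensional ID: (file key, ID local to that file).
struct FileScopedID
{
	std::string fileKey; // e.g. the file name or a hash of the file content
	std::size_t localID;
};

// First pass: collect all file keys in order of appearance and give each
// a short replacement number starting from 1.
std::map<std::string, std::size_t> collectPrefixes(std::vector<FileScopedID> const& _ids)
{
	std::map<std::string, std::size_t> shortKeys;
	for (auto const& id: _ids)
		// emplace leaves an existing entry untouched, so a key keeps its first number.
		shortKeys.emplace(id.fileKey, shortKeys.size() + 1);
	return shortKeys;
}

// Second pass: render an ID using the short key instead of the long one.
std::string render(FileScopedID const& _id, std::map<std::string, std::size_t> const& _shortKeys)
{
	return std::to_string(_shortKeys.at(_id.fileKey)) + "_" + std::to_string(_id.localID);
}
```

With this, a long file key appears in the generated names only as `1_`, `2_` and so on, regardless of how long the underlying file name or hash is.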

ekpyron · 2021-09-07T10:26:27Z

We do generate code per contract - can't we just have a mapping "global ID -> contract-local ID" that maps our current IDs to a new per-contract counter, which increments in codegen visit order (that is, on first use in codegen)? You considered that yourself earlier, not sure what made you drop the idea...
Especially if we want to keep the current IDs for everything except codegen anyway.
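A minimal sketch of such a translation table, assuming one instance is created per contract; the class name `ContractLocalIDs` and its interface are invented for illustration.

```cpp
#include <cstdint>
#include <map>

// Hypothetical per-contract table: global AST ID -> dense contract-local ID,
// assigned on first use during code generation.
class ContractLocalIDs
{
public:
	std::int64_t localID(std::int64_t _globalID)
	{
		// The candidate value is computed before emplace runs; if the global ID
		// is already known, emplace keeps the previously assigned local ID.
		auto const candidate = static_cast<std::int64_t>(m_map.size()) + 1;
		return m_map.emplace(_globalID, candidate).first->second;
	}

private:
	std::map<std::int64_t, std::int64_t> m_map;
};
```

Two runs that assign different global IDs to the same declarations still produce the same local IDs, as long as codegen asks for them in the same order. That order dependence is exactly the part that needs care.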

chriseth · 2021-09-07T10:28:56Z

Yes, that is the other way to do it. I'm a bit uneasy about the codegen visit order, and it would require Type::toString() to have access to this translation table.

chriseth · 2021-09-07T10:30:14Z

Ah sorry, it's Type::richIdentifier(), and we could actually provide the mapping.

chriseth · 2021-09-07T10:31:22Z

In

	string functionName =
		"abi_encode_" +
		_from.identifier() +
		"_to_" +
		to.identifier() +
		_options.toFunctionNameSuffix();

This might lead to .identifier() allocating a new ID, which means that the new ID will depend on the order in which this C++ expression is evaluated.
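For illustration: C++ does not specify the order in which the operands of `+` are evaluated, so the two `identifier()` calls above may run in either order. One way to pin the order down is to put each allocating call into its own statement. The sketch below uses a hypothetical `IDAllocator` in place of the real type classes.

```cpp
#include <map>
#include <string>

// Hypothetical allocator that hands out a fresh number the first time it sees a type.
struct IDAllocator
{
	std::map<std::string, unsigned> ids;

	std::string identifier(std::string const& _type)
	{
		unsigned const candidate = static_cast<unsigned>(ids.size()) + 1;
		unsigned const id = ids.emplace(_type, candidate).first->second;
		return _type + "_" + std::to_string(id);
	}
};

std::string encodeFunctionName(IDAllocator& _alloc, std::string const& _from, std::string const& _to)
{
	// Separate statements are sequenced: `from` is always allocated before `to`.
	std::string const from = _alloc.identifier(_from);
	std::string const to = _alloc.identifier(_to);
	return "abi_encode_" + from + "_to_" + to;
}
```

This only helps if every call site follows the same discipline, which is what makes the approach fragile.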

ekpyron · 2021-09-07T10:32:01Z

It's not exactly beautiful to pass a mapping to richIdentifier, but yeah, should work.

ekpyron · 2021-09-07T10:36:38Z

> In
>
> 	string functionName =
> 		"abi_encode_" +
> 		_from.identifier() +
> 		"_to_" +
> 		to.identifier() +
> 		_options.toFunctionNameSuffix();
>
> This might lead to .identifier() allocating a new ID, which means that the new ID will depend on the order in which this C++ expression is evaluated.

Yeah, that indeed makes it extremely error-prone...

chriseth · 2021-09-07T15:58:48Z

It turns out that it might be better to (a) make the Solidity -> Yul code generator output code that depends only on Solidity source order and (b) check that the Yul utility function generation is deterministic. This is done in #11910.

axic · 2021-09-07T16:23:34Z

> The problem here is that these IDs get very long and also it is a breaking change.

Why not take a truncated hash as the file part of the two-level ID?
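A rough sketch of the idea, with loud assumptions: FNV-1a is used here only because it fits in a few lines, whereas a real implementation would presumably use a cryptographic hash of the file content; the function names and the 32-bit truncation length are arbitrary choices for illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// FNV-1a, 64 bit: a simple non-cryptographic hash, used here only as a stand-in.
std::uint64_t fnv1a(std::string const& _data)
{
	std::uint64_t hash = 14695981039346656037ULL; // FNV offset basis
	for (unsigned char c: _data)
	{
		hash ^= c;
		hash *= 1099511628211ULL; // FNV prime
	}
	return hash;
}

// Keep only the low 32 bits of the hash, rendered as 8 hex digits.
std::string truncatedFileKey(std::string const& _fileContent)
{
	char buffer[9];
	unsigned const low = static_cast<unsigned>(fnv1a(_fileContent) & 0xffffffffULL);
	std::snprintf(buffer, sizeof(buffer), "%08x", low);
	return buffer;
}

// Hypothetical two-level ID: truncated file key plus the ID local to that file.
std::string fileScopedID(std::string const& _fileContent, unsigned _localID)
{
	return truncatedFileKey(_fileContent) + "_" + std::to_string(_localID);
}
```

The trade-off is that truncation bounds the length but makes collisions between files possible, so uniqueness of the resulting names would still have to be checked.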

chriseth · 2021-09-08T17:53:34Z

Hopefully solved by not making anything depend on names at all.

chriseth closed this as completed Sep 8, 2021

cameel mentioned this issue Aug 16, 2023

Bytecode comparison run with extra input files #14495

Closed
