Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend "using for" #9211

Closed
Tracked by #13316
chriseth opened this issue Jun 16, 2020 · 31 comments · Fixed by #12121
Closed
Tracked by #13316

Extend "using for" #9211

chriseth opened this issue Jun 16, 2020 · 31 comments · Fixed by #12121
Assignees
Labels
language design :rage4: Any changes to the language, e.g. new features

Comments

@chriseth
Copy link
Contributor

chriseth commented Jun 16, 2020

This is a proposal to extend the "using for" directive in the following ways:

  • allow it at file level to be valid for all contracts and functions in that file
  • allow its first argument to be a free function
  • allow its first argument to be an imported module
  • allow its first argument to be *

The semantics are as follows:

The directive using * for ... matches all free functions in the current file (including non-scoped free functions imported from somewhere else), but it does not match functions in libraries or functions imported scoped into a module (moduleName.functionName).

The directive using functionName for ... matches the specified function. Here, functionName can be a path like moduleName.functionName.

The directive using moduleName for ... matches all free functions in the imported module (via import "fileName" as moduleName) including free functions imported from that module at file level. Here, moduleName can be a path like moduleA.moduleB.

Note that most of the new uses are incompatible with flatteners.


Update (2021-09-20):

Extend the using A for B statement everywhere to allow A to also be

  • a list of file-level functions in curly braces.
  • the name of a module
  • a single file-level function
  • *

If used at file-level, B cannot be *.

If used at file-level also allow export using A for B, when B is a type declared at file-level in the current file.

Clarification: In the above, all "items" can be paths (A.B.f) and don't have to be single identifiers.

@chriseth
Copy link
Contributor Author

Restrict proposal to:

  • allow using for at file level
  • only allow the name of a module where currently a library is allowed

@cameel
Copy link
Member

cameel commented Jun 17, 2020

I wanted to address one point from today's call: while doing using x for y separately for every function might be a bit verbose I think it does have some advantages over binding all functions from a module wholesale.

This is very similar to imports in Python where importing every name separately is in fact the recommended style. So this:

from math import abs, sin, ceil;

is recommended over this:

from math import *;

Wildcard imports are actually frowned upon.

With fine-grained imports you don't pollute the namespace with names you're not going to use (which helps avoid collisions) and they make it very straightforward to trace the source of every name in the file without having to refer to any other files. It seems like something that would be helpful when auditing. It's also something I loved when I switched to Python from Ruby which does not have these fine-grained imports and it can be very hard to find the definition of something. I'm missing it a bit in C++ too.

Solidity is not Python, the circumstances are a bit different (contracts are not full applications and are therefore short; also new definitions can't be added dynamically at runtime) and the scale is smaller (it's not about all imports, just bound functions) but I wanted to add another perspective on this. For some people (including me) importing things one by one may actually be the more natural way to use this feature.

@chriseth
Copy link
Contributor Author

@cameel this is actually a valid point - I also very much like it if you can see where each item was defined or imported. Since using M for uint[] is very similar to the syntax for libraries, I would still keep it. What syntax for attaching specific functions would you propose?

using {M.f, M.g} for uint[]?

@cameel
Copy link
Member

cameel commented Jun 17, 2020

Wouldn't using M.f, M.g for uint[] just work? Or does this need to be analyzed before imports are resolved when we can't yet tell if M.f is a function or a library or something else?

Braces are good too. They would allow doing this, which some people might like when the list gets long and needs to get wrapped:

using {
    math.add,
    math.sub,
    math.mul,
    convert.toString,
    convert.toBytes16,
} for uint[];

@cameel
Copy link
Member

cameel commented Jun 17, 2020

Alternatively, this would work too:

using math.{add,sub,mul}, convert.{toString,toBytes16} for uint[];

This is similar to braces work with shell expansion in Bash so it might seem familiar to some people.

@cameel
Copy link
Member

cameel commented Jun 17, 2020

Since using M for uint[] is very similar to the syntax for libraries, I would still keep it.

Yeah, I'm for having that form too. Even in Python both ways to import are available and preference for one or the other is just a convention, not something forced upon you by the language.

@chriseth
Copy link
Contributor Author

It would be nice to somehow also "import" the effects of using. This could be especially useful for #11531

Proposal: If the definition of X and the statement using ... for X are in the same file at the top level, then any way to access X automatically applies the using statement. This means even if you do import X from "file.sol", you can use X.f() even though f itself is not directly visible.

@chriseth
Copy link
Contributor Author

chriseth commented Aug 2, 2021

Especially with #11531 I think it would be useful to map operators to user-defined functions. This could be especially useful to define fixed point (or even floating point) types and make computations with them readable. It also removes the problem of how to apply compiler-supplied operators to user-defined value types (because the compiler-supplied operator is actually on a different type).

Proposal:

type MyFixed = bytes32;
function add(MyFixed _a, MyFixed _b) pure returns (MyFixed) { ... }
function sub(MyFixed _a, MyFixed _b) pure returns (MyFixed) { ... }
using {add as +, sub as -} for MyFixed;

So instead of a function or module, you can use <function> as <operator>.

For now, custom operators should not resolve inside unchecked blocks.

@chriseth
Copy link
Contributor Author

chriseth commented Aug 2, 2021

Question brought up by @hrkrshnn : How to handle mixed-type operators for user-defined operators:

Apart from exceptions like << and **, the compiler-supplied operators always only handle a single type. I think to prevent complications, we should do the following for user-defined operators:

  • each operator can only be named once for a type in a using statement (or all using statements combined), and you cannot use * in combination with operators.
  • the signature of functions like add in the above example has to be add(MyFixed, MyFixed) ... (it can be non-pure and it can return a different type)
  • the identifier has to be unique (cannot used function overloads)
  • the type (MyFixed) cannot be a built-in type.

The operator is resolved as follows: In x + y, the compiler first checks if the type of y is implicitly convertible to the type of x. If this is true, the using statement is looked up and the return type and pureness is determined from the function. If it is not implicitly convertible or if no operator matches, the compiler checks if the type of x is implicitly convertible to the type of y. If yes, it looks up the operator for the type of y.

@chriseth
Copy link
Contributor Author

chriseth commented Aug 2, 2021

As an alternative: Lookup fails if both x can be converted to y and vice-versa and an operator / function can be found in both cases.

@cameel
Copy link
Member

cameel commented Aug 2, 2021

(it can be non-pure and it can return a different type)

Would it be allowed to modify storage? I hope not. Side-effects in operators do not sound like a good idea. I'd require the user to have a function rather than an operator if he really wants that.

Even letting them be view seems a bit questionable but I guess that's only meant to allow operators to take storage parameters? If that's the only way for them to access storage (other than inline assembly) then it's fine.

@chriseth
Copy link
Contributor Author

chriseth commented Aug 3, 2021

I'm not sure there is a good way to restrict access for functions implementing operators. Operators on storage structs sound useful, so if we want to restrict something else, we would have to look at the body of a function which I think is not a good idea.

On the other hand this raises another question: If you want to implement an operator for reference types, you need several different functions for different storage locations. This would contradict the requirement above ("no overloads") - but I guess we can relax that to "no overloads, except for differences in storage locations". In order to make this work well, we also need #1256

@chriseth
Copy link
Contributor Author

chriseth commented Sep 2, 2021

Yet another alternative - which is way more permissive and maybe also achieves the same goals and is more consistent with how using works currently:

There can be only one using statement at file-level, but operators can be named multiple times and function names can resolve to multiple identifiers. Operators can neither be used with a using statement for built-in types nor with *.

For each x + y in the code, the matching set of functions according to the following rule has to contain exactly one element:

  • consider all using statements for the (mobile) type of x and the (mobile) type of y. Restrict to the given operator. Iterate through all functions the names resolve to and add them to the set if the parameter types match after an implicit type conversion.

We can ignore "built-in" operators because using cannot be applied to built-in types.

@chriseth
Copy link
Contributor Author

chriseth commented Sep 8, 2021

@hrkrshnn
Copy link
Member

hrkrshnn commented Sep 9, 2021

I'm not sure about allowing mixed types. Consider type Timestamp is uint and type Duration is uint. If we allow a + :: Timestamp -> Duration -> Timestamp, the expression t + d + d may be somewhat confusing: (t + d) + d is well-defined, whereas t + (d + d) is not. (Here d + d is not defined.) This can be even more problematic if the operator + :: Duration -> Duration -> Duration is defined, making (t + d) + d and t + (d + d) evaluate to do different timestamps.

The problem is that the current syntax hides away all these complexities. For example, the operations may be non-commutative. Or how associativity of operators isn't well specified, which can affect the final value or introduce ambiguity during parsing.


A second problem with the using syntax is that it doesn't allow easy extension into unchecked block. I think some form of traits, without mixed types, may be better. Here's a rough syntax:

type UFixed18 is uint256 with Add, UncheckedAdd;

instance Add(UFixed18) {
    function add(UFixed18 a, UFixed18 b) public returns (UFixed18 c) { ... }
}
instance UnCheckedAdd(UFixed18) {
   function add(UFixed18 a, UFixed18 b) public returns (UFixed18 c) {unchecked{ ... }}
}

This still doesn't make any guarantees about commutativity of UFixed18 or associativity. So a + b + c is again ambiguous.


Another approach is to allow defining global infix operators similar to free functions (and disallow redefining existing operators).

type UFixed18 is uint256;
infix function <*>(UFixed18 a, uint b) returns (uint) { ... }

uint constant a = UFixed18.wrap(25) <*> 50;

Maybe a stricter definition that allows specifying the precedence as a number, as well as associativity to resolve parsing ambiguity. Something similar to what Haskell already does.

@chriseth
Copy link
Contributor Author

chriseth commented Sep 9, 2021

Mixed types are also useful for things like elliptic_curve_point * integer -> elliptic_curve_point, so I would say the possibilities this opens outweighs the problem that you can implement the operators in a way that they are not associative.

@ekpyron
Copy link
Member

ekpyron commented Sep 9, 2021

Scalar vector space multiplication should get its own operator :-).

@ekpyron
Copy link
Member

ekpyron commented Sep 9, 2021

I wager this is the only other signatures you'd ever want, i.e. monoidal multiplication 'a -> 'a -> 'a and scalar multiplication over a vector space 'a -> 'b -> 'b are all you ever need.

@chriseth
Copy link
Contributor Author

If you also consider the inner product, I think we have now collected at least one example for all the possible combinations of two types (a * a -> a, a * b -> b, a * b -> a and a * a -> b).

Given that, my crystal ball clearly says we should just allow all possible signatures for all operators.

@hrkrshnn
Copy link
Member

type UncheckedInt is int;
function mul(UncheckedInt a, int b) returns (UncheckedInt c) {...}
using {mul as *} for UncheckedInt;

Going back to the associativity issue, this creates an annoying problem on whether the expression a * b * c (where a, b, c has the types UncheckedInt, int and int respectively) is fully unchecked or partially unchecked. The expression, (a * b) * c would be unchecked and a * (b * c) would be partially unchecked. I think this pattern should be discouraged.

I'm generally in favour of operators, but allowing mixed operators for *, +, /, -, == etc does not sound safe.

Custom operators (which can be mixed) surprisingly solve this problem:

infix function <*>(UncheckedInt a, int b) returns(Unchecked c) {...}

Now for a <*> b * c, only a <*> (b * c) is well-defined. The other expression, i.e., (a <*> b) * c doesn't type check, since there is no * : UncheckedInt -> int -> UncheckedInt defined. Similarly, for the expression a <*> b <*> c, only (a <*> b) <*> c is well-defined.

@chriseth
Copy link
Contributor Author

Another option would be to introduce a new set of operators: +', -', *', ..., disable the associativity during parsing for them and only allow them to be defined for non-matching types.

@ekpyron
Copy link
Member

ekpyron commented Sep 14, 2021

Another option would be to extend the concept of unchecked to arbitrary named operator scopes, each with its own set of operators (with its own signature requirements) and the ability to import compatible operators from other operator scopes.

This would even prevent obscure operator definitions that cannot be recognized locally, since if I don't use well-behaved standard-signature operators, I locally have to use e.g.

timestamp x;
duration y;
timestampOperators {
  x += y;
}

@chriseth
Copy link
Contributor Author

Rust restricts custom operators to be defined only on the left hand side type. This makes lookup easier and maybe also eliminates the associativity problem, but I currently don't see how you can do things like 2 * x.

@chriseth
Copy link
Contributor Author

It might be "clean" to (at least for now) restrict the target of a "using for" statement at file-level to data structures only, i.e. disallow the use for contract-like types and of course non-nameable types.

@chriseth
Copy link
Contributor Author

As far as I understood @ekpyron 's arguments, the only benefit of restricting operators to a fixed type signature (i.e. + always takes the same parameter types and also returns the same type, == always takes two equal parameter types and always returns bool, ...) is that it helps in the future when we introduce generics in the following way:

You can write functions add(a, b) -> a + b and the compiler only has to type-check this generic function and it will not have to type check it again when the function is used on specific types.

If we relax the restriction (like rust does it) and say that operators can be defined on different types (like elliptic_curve_point * integer), the only thing we have to add is traits and say that this add function is not absolutely generic, but can be applied to types that satisfy a certain trait.

Since I'm in general in favour of requiring explicit types and think that auto-type-deduction is nice but can lead to errors since explicit types always also force you to condense your thinking into categories that specify semantics (like AssociativeMul), I think the benefits of being able to specify cross-type operators far outweigh the downsides.

@chriseth
Copy link
Contributor Author

@hrkrshnn proposed to make applying the using statement upon import more explicit. One solution there would be to use

export using f for MyType;

I.e. if we have "export using" then the effect of using is applied whenever the type is visible, not only in the scope where the using is visible.

@chriseth
Copy link
Contributor Author

Updated the issue description.

@hrkrshnn hrkrshnn self-assigned this Sep 20, 2021
@cameel
Copy link
Member

cameel commented Nov 22, 2021

Some alternatives for the export using f for MyType syntax:

  • using {f, h} for MyType propagate
  • using {f, h} for MyType public (with or without an explicit using {f, h} for MyType private needed for the opposite case)
    • using {public f, private h} for MyType - would allow specifying it for each function separately.

latest: using {f, h} for MyType global;

@chriseth
Copy link
Contributor Author

chriseth commented Jan 5, 2022

@hrkrshnn wanted to know hy using {f, h} for MyType globally or export using... should not also be available it contract level.
As currently planned, using ... globally is only allowed at file level in the same file (and scope) where the type is defined and you can only use library functions or file-level functions. The reason is that these library functions and file-level functions can be called from everywhere. If we allow using ... globally inside a contract, then it would suggest that you can also use internal functions of that contract inside the contract. This would prevent the function to be called everywhere. Example:

contract C {
  struct S { uint x; }
  using {f} for S globally;
  function f(S memory) internal { ... }

  function g() public returns (S memory) { ... }
}

function t(C c) {
  // since the using statement above uses `globally`, the member is available on `S`
  // here. So you should be able to call c.g().f(), but you cannot because it is an internal
  // function of the contract.
  c.g().f();
}

I think only allowing file-level (and library) functions for using statements at file level is the simpler and saner approach.
I would actually lean towards not modifying the syntax and semantics of the using statement inside a contract at all and maybe also deprecating it over time.

@ekpyron
Copy link
Member

ekpyron commented Jan 5, 2022

EDIT: [this wasn't entirely thought through]


Thinking about it, I'm not so sure the whole "export using" or "using.. globally" stuff is a good idea...

I'd do the following:

  • All using statements always ever only extend to their syntactic scope period.
  • On each type definition we can immediately add functions to the type using a different mechanism, for example:
    type UncheckedInt is int with {f}; - for structs maybe even just as struct S { uint x; function f() ... } or also with with or some keyword.
  • Only those functions declared directly together with the type are available everywhere where the type is available. Extending this set of functions later on is impossible.
  • In situations in which I want to extend a type with more global functions, I simply define a new type that inherits all functions of the old type, make the type implicitly convertible and add more functions in the process (all of which we'd of course have to allow).

The result is a situation in which you can:

  • explicitly extend any type for your current scope without affecting anything or anyone else
  • have well-defined types with well-defined interfaces that do not depend on scope or the like at all
  • still have extensible types with implicit conversions

And I avoid to have weird implicit scoping effects and strange behaviour due to diamond-like cross-importing exported using statements on the same types and all such issues that will come up eventually with exporting using...

EDIT: Ok, I missed the fact that the intention was to only allow using...globally in the same file/scope as the type declaration...

@chriseth
Copy link
Contributor Author

When we have using {..., f, ...} for T, we should enforce that the first parameter of f is equal to T up to data location, because it is less confusing and we also will get fewer problems with operators.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
language design :rage4: Any changes to the language, e.g. new features
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants