Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Early version of RFC: Pattern matching ? #245

Closed
alehander92 opened this issue Jul 23, 2020 · 34 comments
Closed

Early version of RFC: Pattern matching ? #245

alehander92 opened this issue Jul 23, 2020 · 34 comments

Comments

@alehander92
Copy link

Pattern matching for Nim

Pattern matching can be a good addition to the stdlib/fusion. It seems it becomes more mainstream:

This is an early version of a RFC: we need to decide what to do before having a more complete document.

We already have several pattern matching libraries :

A first question is, what do we want:

  • just keep it in third party libraries
  • standartize one and put it in fusion
  • invent a new dsl based on case stamenets/better research of other languages and put it in fusion
  • standartize one and put it in stdlib
  • invent a new dsl based on case stamenets/better research of other languages and put it in stdlib

Motivation

  • Easier to write complicated conditions
  • Easier to extract values
  • More flexible syntax

TODO

case a:
of Circle(r: @b):
  echo b
of c@Other:
  echo c
else:
  discard

Many more examples in gara, ast_pattern_matching and patty's docs: TODO when a general approach is decided on

Constraints

  • compile time: ast and other values : as @krux02's ast-pattern-matching
  • runtime for most nim types
  • extensible
  • zero/low overhead
  • exhaustiveness checking
  • maybe: make it easily swapable for other pattern matching lib
  • maybe: make it support binary matching like Erlang/Elixir

Research

It would be useful to research a bit more Rust, Haskell, OCaml, Python, Ruby, Scala, Erlang, Prolog, Elixir

We can also compare several possible designs like the Python folks did in their preparation for the feature.

Next steps

  • Decide on general direction
  • Offer 1 or more designs in a more detailed version of the RFC
  • Implement

Future directions

Not sure

Disclaimer

I am the author of one of the libs gara, so I might be biased.

pinging @krux02 and @andreaferretti which probably have something to say on the topic

@andreaferretti
Copy link

I have nothing in particular to add, except for the fact that pattern matching is often done on algebraic data types. Nim today has almost algebraic data types, via variant objects, but:

  • with an ugly syntax
  • with a limitation that two different variants cannot have fields with the same name

If pattern matching lands in stdlib, it would be nice if at least the second of these issues is solved. If I remember correctly there is another RFC for that.

Other than this, I would be happy with just putting gara in the stdlib, as it seems the most complete option today

@Varriount
Copy link

Varriount commented Jul 24, 2020

Ideally, pattern matching should work well with AST manipulation, since recognizing structural patterns in the AST tends to be 50% of a macro's job.

That being said, I've only ever needed the kind of pattern matching provided by something like py-good. I can't recall ever having needed it anywhere else.

@haxscramper
Copy link

haxscramper commented Sep 20, 2020

Additional arguments in favor of pattern matching in stdlib

A lot of popular programming languages (almost all with exception of Go/C that position themselves as "simple" languages with very few functional features) either already have support for pattern matching or existing proposals.

I'm not a big functional programmer myself, but i think Nim is already a really good functional programming language - closures, sum types, optional types etc. Support for custom operators and UFCS makes transition even more seamless. Pattern matching is an important addition to the list if features that would allow nim to position itself as a functional programming language.

Use caseStmtMacro

I think pattern matching in stdlib should be implemented using caseStmtMacros that are currently experimental, but quite usable (I tried implementing pattern matching and there was no immediate issues). The reason - this way pattern matching would be just an extension of case statement and won't introduce any new concepts/keywords to the language. Can be added as another feature in sugar module.

Set of features should be relatively small

Second - I think it is important to keep list of supported features relatively small - e.g. no predicate validation (match field only if it satisfies some kind of predicate), unification etc. The reason - due to potential number of extensions to concept of pattern matching it is very easy to end up with feature creep where macro creates whole new semantics for case statement. Although some additions to pattern matching would certainly be useful. For example - allowing abitrary predicate extensions seems like a little too much for me, but checking if value is in set kind: in {enEE, enEE1} will certainly be useful in a lot of situations.

It is also important to not require any modifications to existing types - reuse existing len and `[]` overloads as well as kind fields. This makes certain assumptions about what functions are implemented for object, but I would argue this is a good thing. E.g. further promoting idea that all int-indexable (arrays, sequences etc.) shoud implement `[]` and len, all case objects should have kind etc. as it would only lead to more consistent API across all code.

Pattern matching on key-value pairs

Pattern matching on key-value tables (json for example) might be really practical in a lot of scenarios. Possible syntax:

    assertEq "000", case %{"hello" : %"world"}:
           of {"999": _}: "nice"
           of {"hello": _}: "000"
           else: "discard"

NimNode

One of the main use cases for pattern matching in nim is macros. NimNode probably the most used case object and large portion of macro is just re-parsing of input structures. Pattern matching should account for this.

In addition to making nim node pattern matching easier it will also make macros more declarative (for simple macros): it would be easier to understand what kind of input is expected and what will be generated from it.

My implementation

I'm currently working on my own pattern matching library based on case statement macros. You can see example tests here - https://github.com/haxscramper/hmisc/blob/master/tests/tMatching.nim

I originally wanted to PR it as a stdlib module but then decided to first ask for feedback on general idea/syntax. Right now it is a little over-engineered and has dependencies on other part of hmisc. If it can be accepted as stdlib module I will make it self-contained and PR it into stdlib.

Examples

    case [1,2,3,4]:
      of [_]: fail()
      of [_, 3, _]: fail()
      of [_, 2, 3, _]:
        discard

Support for regular elif branches in case stmts

    assertEq "default fallback", case (a: 22, b: 90):
           of (_, b: 91): "900999"
           elif "some other" == "check": "rly?"
           elif true: "default fallback"
           else: raiseAssert("#[ not possible ! ]#")

Todo list

There are some things that I still have on todo list such as

  • Trailing zero-or-more (*<patt>) for lists
  • Use *(<patt>) in non-trailing format - [$a, *_, $b] to match first and last element in sequence.
  • exact and prefix match for sequences: right now matching using [] pattern requires len to be equal - in a lot of situations it is necessary to just check first couple of items. Can be implemented using *_ with special treatment for this pattern (e.g. just don't check anything) but I would like to have something closer to [ , ] ... with trailing ellipsis. But that is just preference mostly.
  • List variables - similar to nimtrs where I have distinction between $scalarVariable that can only be assigned once and @listVariable which will store all appended elements.
  • Other suggestions?
    • Pattern matching on data streams ?
    • Maybe adapt some syntax ideas from rust pattern matching for macros: link. Pattern matching for NimNode is a most common use case, so it might be a good idea to adapt some solutions from rust.
    • Another potential source of syntax ideas are term rewriting macros

@juancarlospaco
Copy link
Contributor

More programming languages using Pattern Matching:

  • Scala, Scala-Native, ScalaJS.
  • Swift.
  • F Sharp.
  • ML.
  • Haskell.
  • D.

@alehander92
Copy link
Author

@haxscramper cool! but please take a look at impl-s of patty/ast-pattern-matching/gara . They do some of those, so this might be cool. Also please keep in mind the goal of zero overhead : expanded code should be usually similar to hand-written condition/extraction(not always happening in those libs tho)

Awesome!

@alehander92
Copy link
Author

@haxscramper thanks and Also: can you write down a bit more detailed test spec/proposal here or in your own rfc : that was one of the goals of the rfc in my mind, to discuss a more official dsl

@haxscramper
Copy link

@alehander92

After some thinking I came to conclusion it is not necessary to specifically annotate captured variables with prefix since it really differs from any other nim dsl in stdlib. I started with $capturedVar but now I think it is not really necessary. Type of matched variable can be inferred based on pattern context - e.g. variable is assigned (=) if used in regular pattern and appended (.add) if it is inside of repetition *(<pattern>) although I've done this in nimtrs and current version of matching I think it is not that good of a solution in general.

Capture and submatch can be done using infix is, equality comparison (any other prefix?) will be rewritten in infix notation, e.g. fld: == SomeVal to match field against value will be rewritten into ... .fld == SomeVal. Capture and match can be written as fld: captured == SomeVal for infix operator checks (in, ==) or as fld: capt is SomePattern( ... )

Although I think this is a really slippery slope because there will be a lot of different behaviors of the match depending on written expression (e.g. == Val vs var == Val vs var is Patt) so it is necessary to be really careful here. As I already mentioned it is really easy to just sit for a while and come up with dozens of possible nice things that can be added (regex pattern extraction, custom unpackers, predicatess, regex-like pattern matching for sequences (like in nimtrs) and so on).

One thing I think that is really necessary is to make it look closer to pattern matching in functional language and not enum case-of v2.0. Default nim convention for enum naming is to have abbrValueName and ObjectName. I think it should be possible to omit the abbr prefix. Eg. I want to write IfStmt instead of nnkIfStmt etc.

More examples

Create value to compare against

    macro e(body: untyped): untyped =
      case body[0]:
        of ForStmt([$ident, _, $expr]):
          quote do:
            9
        of ForStmt([$ident, Infix([%ident(".."), $rbegin, $rend]),
                    $body]):
          quote do:
            `rbegin` + `rend`
        else:
          quote do:
            90


    let a = e:
      for i in 10 .. 12:
        echo i

    assertEq a, 22

I would prefer to write `expr` instead of %expr but right now it seems it is not possible to add arbitrary expressions inside of quoted part. Syntax is just taken from nimtrs. Since it is not possible to differentiate newValue() from KindOfCaseObject() I decided to just prefix it with % for now.

Iflet macro

    macro ifLet2(head: untyped,  body: untyped): untyped =
      case head[0]:
        of Asgn([$lhs is Ident(), $rhs]):
          quote do:
            let expr = `rhs`
            if expr.isSome():
              let `lhs` = expr.get()
              `body`
        else:
          head[0].assertNodeKind({nnkAsgn})
          head[0][0].assertNodeKind({nnkIdent})
          head[0].raiseCodeError("Expected assgn expression")

    ifLet2 (nice = some(69)):
      echo nice

    ifLet2 (`nice` = some(69)):
      echo nice

It uses some of my own node kind assertions, but this can be removed when design is finalized. Although due to complexity of pattern matching I think having something like this to show errors would be nice. But that is a topic for another RFC/PR.

Unexpected node kind. Expected one of {nnkIdent} but found nnkAccQuoted

 112          echo nice
 113    
 116:12     ifLet2 (`nice` = some(69)):
                    ^~~~~~
                    |
                    nnkAccQuoted

Non-goals

  • Support for variable unification - even though it is really nice it have in some cases it adds too much complexity to the implementation (e.g. when you have alternatives that each bind the same variable). I think it is a good idea to completely disallow
  • High extensibility - pattern matching is complex enough on it's own and I would prefer to have more simple implementation that forces end user to provide all additional checks either via if/else or using nested pattern matches. Some things are used often enough (check if in set or equal to some specific value) so they should be supported out of the box.

Other goals

make it easily swapable for other pattern matching lib

If case statement macro will be moved to 'stable' part of the language (e.g. no longer hidden behind experimental switch) this can be done quite easily I think.

Some comments on other libraries

Patty

I don't think matching non-tuple types by position only is a good idea as it makes matches really fragile in case new fields are added (which is much more likely to happen than with regular anonymous tuple).

ast-pattern-matching

Supports only nim nodes

@haxscramper
Copy link

@alehander92 just addition to whatever I wrote above to RFC comment for pattern matching - I think it is better to just take as much potential ideas for syntax and features and then sort through them to determine if this is really necessary. But what I'm really certain about several things

  1. Pattern matching should be an extensio of case
  2. Unification is overkill in most cases
  3. Special syntax for variable captures should be carefully considered - is it really necessary or will it only make code less readable
  4. Omitting abbreviated enum name when matching kind is a huge plus for readability.

@Araq
Copy link
Member

Araq commented Sep 21, 2020

The only thing I really care about is supporting if conditions with the pattern like so:

case n.kind
of foo and n.len == 4:
  actionA
else:
  actionB

This might require a language extension as a macro can only support it well via code duplication.

@haxscramper
Copy link

haxscramper commented Sep 21, 2020

@Araq, This is already supported thanks to UFCS - I just generate access path to the field in object, without differentiating between fields and function calls

    macro e(body: untyped): untyped =
      expandMacros:
        case body:
          of Bracket(len: in {3 .. 6}):
            newLit(expr.toStrLit().strVal() & " matched")
          else:
            newLit("not matched")

    echo e([2,3,4])
    echo e([3, 4])

Generated code

block:
  let expr = body
  if kind(expr) == nnkBracket and contains({3..6}, len(expr)): newLit(
      strVal(toStrLit(expr)) & " matched")
  else:
    newLit("not matched")

@alehander92
Copy link
Author

@haxscramper thanks!

Pattern matching should be an extensio of case

sounds good

Unification is overkill in most cases

iirc it seemed not hard to implement, but this is not a top goal indeed

Special syntax for variable captures should be carefully considered - is it really necessary or will it only make code less readable

makes sense: I'll look at the bigger comment later

Omitting abbreviated enum name when matching kind is a huge plus for readability.

yep!

@alehander92
Copy link
Author

please keep in mind that probably nim nodes should work in a similar way to other variant types(even if nim nodes are defined as a more special type): we don't want to overspecialize for one case !

Also, exhaustiveness checking would be useful even if not for all possible situations 👍

@haxscramper
Copy link

@alehander92

There is zero specific distinction for Nim node in a way I currently implement it - I have a little hack that just takes .kind type implementation and determines correct prefix that should be added if necessary.

https://github.com/haxscramper/hmisc/blob/f58838040170547e19c01bd6b9f3e42a16811f91/src/hmisc/macros/matching.nim#L44

Exhaustiveness check is possible in theory but only for simpler patterns and it would make things more complicated as it would be necessary to either write custom checker or somehow lift kind fields and do more sophisticated rewrite to take advantage of case checks. Right now this is a simple rewrite to series of if branches with no specific checks.

@alehander92
Copy link
Author

iirc exhaustiveness checking isn't really that hard . Also we want to take advantage of generation of case if possible: pattern matching should be zero overhead. Of course those are not top priorities, so probably they don't need to be implemented now

@alehander92
Copy link
Author

alehander92 commented Sep 21, 2020

basically for compound patterns you need to compose the simpler checks and if you already have those: this might not be too hard, but this is just old brainstorming, so I might be very wrong about it

@haxscramper
Copy link

tl;dr - maybe I'm missing something, but to me it seems like extremely difficult task.

Whether or not it is possible to check for exhaustiveness is a question that should be delayed until after we have finalized set of supported features. For example set checks in {3..5} would require answering questions like "does this level of pattern tries all values or something is not matched yet". Things like ("hello", _), [2 | 3, _] will also make it necessary to handle arrays, tuples etc. Due to how matching is done for function (e.g. example with len) it will be even more complicated.

@alehander92
Copy link
Author

alehander92 commented Sep 21, 2020

disclaimer: don't really look hard into this: maybe you're right and it's not so useful for Nim !

hm, I might not understand all the details .. but this means that it might be good to design the dsl in a way that it makes it a bit more prone to exhaustiveness checking.

but what's the problem with set / array checks? we would have just several data structures to check for, I guess

and yeah, as it's not only about the shape, the analysis can always be a bit more conservative, but a useful thing would be to just autogenerate a case which is not being handled as a warning: this is out of scope tho, so I am just discussing ideas

@haxscramper
Copy link

haxscramper commented Sep 22, 2020

More todo

  • Remove $a for variable captures and replace it with just a for variable, add non-trailling pattern matches.
  • Support for set checks.
    • {2} means len(set) == 1 and contains(set, 2),
    • {2, *@others} means len(set) >= 1 and contains(set, 2) followed by for val in set: if val != 2: other.incl val
  • Disallow multiple uses for the same variable in single pattern

@haxscramper
Copy link

haxscramper commented Sep 23, 2020

Replacing special prefixes with keywords?

  • any instead of .._
  • any @var, until Patt() after variadic instead of *@var, Patt()
  • any @var, incl Patt() instead of **@var, Patt()
    # With special keyworkds like `add`, `until`, `incl`
    [any @leading, until @middle is "d", any @trailing]
    [any, Patt(), any]
    # With prefix annotations like `*`
    [*@leading, @middle is "d", *@trailing]
    [.._, Patt(), .._]

Variadic match would use any, search-until is until or incl. For optional match use opt. Capture wll be in form <kwd> @<var> or <kwd> @<var> <op> <rhs> (@middle is "d" for example)

And just in general would you rather prefer DSL to introduce keywords or work with special symbols like ?@, *@, .._ ?

We will still have opt @var ? default and other things, but in general it would be <kwd> @<var> or <kwd> @<var> <infix> <rhs>

And <fld>: <prefix> <rhs> or <fld> @<var> <infix> <rhs> for field/kv-pair captures

@haxscramper
Copy link

Match expression

  • Check if tuple element is in set: (in {1, 2, 3})
  • Tuple element equals to something: (== 123)
  • Field is in set (fld: in {2, 3, 4})
  • Field matches pattern (fld: is Patt())
  • Match and capture field (fld: @captured is Patt())

Tuple matching

case (true, false):
  of (true, true) | (false, false): 3
  else: 2

Object/named tuple matching

Matching of fields is performed using (fld: <expr>) for both objects and tuples.

To match case object you can either use Kind( ... ) to match only instances of single kind or (kind: in <set of values>) to test against multiple possible values. It is not necessary to add enum prefix - e.g. to match nnkIfStmt you can write IfStmt( ... ).

Object matches can contains either fld: ... matches or [elems] for cases when you need to check items accessible via [] operator.

  • Match for statement: ForStmt([@ident, _, @expr])
  • For statement with infix .. as expression: ForStmt([@ident, Infix([== ident(".."), @rbegin, @rend]), @body]) and capture loop variable, @rbegin .. @rend and @body

Matching optionas ((fld: Option[T]))

  • Get optional value from object field, otherwise set to default: fld: @opt ? "default"

Thanks to UFCS I can treat functions as regular fields, making it possible to have patterns like

macro e(body: untyped): untyped =
    case body:
      of Bracket([Bracket(len: in {1 .. 3})]):
        newLit("Nested bracket !")
      of Bracket(len: in {3 .. 6}):
        newLit(expr.toStrLit().strVal() & " matched")
      else:
        newLit("not matched")

echo e([2,3,4])
echo e([[1, 3, 4]])
echo e([3, 4])
[2, 3, 4] matched
Nested bracket !
not matched

Sequence matching

Sequence elements can be either matched exactly (for example [_] to match sequence with exactly one element) or using variadic expressions.

Right now I have two ideas for expression syntax - first one relies on use of symbols like * and the other one introduces more keywords. Example of uses

  • capture all elements in sequence: [any @elems] or [*@elems]
  • get all elements until (not including "d"): [any @leading, until "d"] or [*@leading, "d"]
  • All elements including all following: [any @leading, with "d"] or [**@leading, "d"]
  • Match first two elements and ignore the rest [_, _, any] or [_, _, .._]
  • Match optional third element [_, _, opt @trail] or [_, _, ?@trail]
  • Match third element and if not matched use default value [_, _, opt @trail ? "default"] or [_, _, "default" ?@trail]
  • Capture all elements until first separator: [add @leading, until @middle is "sep", add @trailing]
  • Extract all conditions from IfStmt: IfStmt([add ElseIf([@cond, _])])

I'm more in favor of keyword-heavy syntax because **@, *@, .._ is looks worse than any , until, any, with. Trailing ellipsis might be added as alternative to standalone any

Set matching

  • Check if only 2 and 3 is in set: {2, 3}
  • Set contains 2: {2, .._}
  • Check if set contains 9 and capture all other elements: {9, @other}

Key-value pairs matching

  • Match only if has "key": {"key": _}
  • Capture value from key: {"key": @val}
case %{"hello" : %"world"}:
  of {"999": _}: "nice"
  of {"hello": _}: "000"
  else: "discard"

@metagn
Copy link
Contributor

metagn commented Sep 23, 2020

of (true, true) | (false, false): 3

@haxscramper Just to clarify: Is this different from of (true, true), (false, false):, and if so, would the comma version still work?

@haxscramper
Copy link

@hlaaftana No it is not different and comma version should work the same way it works in regular case.

So

of (true, false), (false, true): 
of (true, false) | (false, true):

Should be identical

@haxscramper
Copy link

Another use case for pattern matching is input data validation. One of the main use cases for pattern matching is macro implementation.

Use isnot as optional clause to execute when something does not match. It is not really useful for pattern matching with multiple branches (because not matched branch does not signal that input data is invalid), but should be really handy when doing basically anything related to external input validation (json, nim node etc.)

I withdraw my previous statement about support for custom predicates - it is not really that hard to implement, so there is no reason to avoid it, but it will be very useful for input validation.

  • @var.matchesCond() - check if element matches condition and bind it to var
  • _.matchesCond() - only check predicate

I think pattern matching (data destructuring) should at least have all features of data synthesis (object construction, json input, array comprehension etc), so support for validation is a good thing to have.

  {
    # Match pattern, otherwise execute `doError`
    "key": @a is ("2" | "3") isnot doError(),
    # execute regular code
    "key": @a is ("2" | "3") isnot (
      echo "Expected `2` or `3`"
    ),
    # Do nothing on fail
    "key": @a is ("2" | "3"),
    # Execute callback if match failed
    "key": "2" | "3" isnot doError(),
    # Check for match
    "key": "2" | "3",

    "key": _.isString() isnot (echo "Expected string for key"),
    "key": @a.isInt() isnot (echo "expected integer")
  }

  (
    fld: ForStmt([
      # First element must be an `nnkIdent("..")`
      == ident("..") isnot (
        echo "Expected infix `..` on ", it.lineInfoObj()
      ),

      # Second element should have kind `IntLit()` and will be bound
      # to injected vaeriable `a`
      @a is IntLit() isnot (
        echo "Expected integer literal `..`", it.lineInfoObj()
      ),

      @a is (
        IntLit() or StrLit()
      ) isnot (
        echo "Expected either integer literal or string but found " &
          $it.lineInfoObj()
      )
    ])
  )

https://github.com/Originate/lodash-match-pattern

  • is Patt(): check if matches pattern
  • in Set: is in set
  • == Val: Is equal to value
  • add @var: only in sequence patterns. Add all elements to var until input end reached or next pattern matched
  • add, until Expr: match all until frist Expr
  • add, incl Expr: match all including all Expr
  • opt @a: match optional element and bind it to a
  • opt @a ? "default": either match element to a or set a to "default"

@haxscramper
Copy link

haxscramper commented Sep 23, 2020

Final draft of the syntax

To me it looks good enough, so most likely I will be implementing this or something really simiar to it.

  • Introduce stdlib module matching

  • Make caseStmtMacros a non-experimental feature

  • Overload for match macro with following prototype:

    macro match*(
      n: tuple | object | ref object | seq | array | set
    ): untyped
  • New matches macro with following prottype:

    macro matches*(
      n: tuple | object | ref object | seq | array | set,
      expr: untyped,
      makeDiagnostics: bool = false
          ): untyped
  • assertMatch template

    template assertMatch*(input: typed, pattern: untyped): untyped
  • MatchDiagnostics objec

    type
      MatchDiagnostics = object
        matchFails*: seq[tuple[path: Path, diagn: string]] ## Position of
        ## the the first match failure. If multiple alternatives are tried
        ## (`1 | 2`) may contain more than one path. `diagn` is a string
        ## describing match failure
  • matches is a version of match that works for single checks, evaluating either as true or false if makeDiangostics is false. If makeDiagnostics is true tuple[matchOk: bool, matchDiagn: MatchDiagnostics] is returned.

  • assertMatch is a wrapper on top of match with semantics similar to

    template assertMatch(input: typed, pattern: untyped): untyped =
      let (matchRes, diagnostics) = matches(input, pattern, makeDiagnostics = true)
      if not matchRes:
        raise newMatchError(diagnostics)

    All variables injected by matches are available in the scope, so you can write assertMatch(<user input>, [all @elems is JString()] and just use elems down the line, knowing this is a array of json strings.

Supported structures for pattersn

Similar to flexible serialization RFC #247, nimtrs several kinds of collections are supported which are differentiated based on pattern match syntax.

  • seqs - matched using [Patt1(), Patt2(), ..]. Must have len(): int and [int]: T defined. 1
  • tuples - matched using (Patt1(), Patt2(), ..). 2
  • pairable - matched using {Key: Patt()}. Must have [Key]: T defined. Key is not a pattern - search for whole collection won't be performed.
  • set - matched using {Val1, Val2, .._}. Must have contains defined. If variable is captured then Val1 must be comparable and collection should also implement items and incl.
  • object - matched using (field: Val). Case objects are matched using Kind(field: Val) or, if you want to check agains multiple values for kind field (kind: in {set, of, values})

Element access

Where expr is a result of evaluation for case head. If case head is an identifier it will be used as-is in pat subsitution. Otherwise let expr = <case-head> will be injected in scope. expr is not gensymed, making it possible to access it when necessary.

  • (fld: "3"): Match field fld against "0". Generated access is expr.fld == "3".
  • ["2"]: Match first element of expression agains patt. Generate acess expr[0] == "2"
  • ("2"): Same as sequence
  • {"key": "val"}: First check "key" in expr and then expr["key"] == "val". No exception on missing keys, just fail match.

It is possible to have mixed assess for objects. Mixed object access via (gg: _, [], {}) creates the same code for checking. E.g ([_]) is the same as [_], ({"key": "val"}) is is identical to just {"key": "val"}. You can also call functions and check their values (like (len: _(it < 10)) or (len: in {0 .. 10})) to check for sequence length.

Checks

  • Any operator with exception of is is considered final comparison and just pasted as-is into generated pattern match code. E.g. fld: in {2,3,4} will generate expr.fld in {2,3,4}
  • (fld: is Patt()) - check if expr.fld matches pattern Patt()
  • (fld: _.matchesPredicate()) - if call to matchesPredicate(expr.fld) evaluates to true.

Notation: <expr> refers to any possible combination of checks. For example

  • fld: in {1,2,3} - <expr> is in {1,2,3}
  • [_] - <expr> is _
  • fld: is Patt() - <expr> is is Patt()

Examples

  • (fld: 12) If rhs for key-value pair is integer, string or identifier then == comparison will be generated.
  • (fld: == ident("33")) if rhs is a prefix of == then == will be generated. Any for of prefix operator will be converted to expr.fld <op> <rhs>. 3
  • (fld: in {1, 3, 3}) or (fld: in Anything) creates fld.expr in Anything. Either in or notin can be used.

Variable binding

Match can be bound to new varaible. All variable declarations happen via @varname syntax.

  • To bind element to variable without any additional checks do: (fld: @varname)
  • To bind element with some additional operator checks do:
    • (fld: @varname <operator> Value) first perform check using <operator> and then add Value to @varname
      • (fld: @hello is ("2" | "3"))
  • Predicate checks: fld: @a.matchPredicate()
  • Arbitrary expression: fld: @a(it mod 2 == 0). If expression has no type it is considered true.

Bind order

Bind order: if check evaluates to true variable is bound immediately, making it possible to use in other checks. [@head, any @tail != head] is a valid pattern. First match head and then any number of @tail elements. Can use any _(if it != head: tail.add it) and declare tail externally.

Variable is never rebound. After it is bound, then it will have the value of first binding.

Bind variable type

  • Any variadics are mapped to sequence
  • Only once in alternative is option
  • Explicitly optional is option
  • Optional with default value is regular value
  • Variable can be used only once if in alternative
Pattern Ijected variables
[@a] var a: typeof(expr[0])
{"key": @val} var val: typeof(expr["key"])
[all @a] var a: seq[typeof(expr[0])]
[opt @val] var a: Option[typeof(expr[0])]
[opt @val or default] var a: typeof(expr[0])
(fld: @val) var val: typeof(expr.fld)

Matching different things

Sequence matching

Input sequence: [1,2,3,4,5,6,5,6]

Pattern Result Comment
[_] Fail Input sequence size mismatch
[.._] Ok
[@a] Fail Input sequence size mismatch
[@a, .._] Ok, a = 1
[any @a, .._] Error 4
[any @a.(it < 10)] Ok, a = [1..6] Capture all elements that match
[until @a == 6, .._] Ok All until first ocurrence of 6
[all @a == 6, .._] Ok a = [] All leading 6
[any @a.(it > 100)] Fail No elements > 100
[none @a.(it in {6 .. 10})] Fail There is an element == 6
  • until: non-greedy. Match everything until <expr>
    • until <expr>: match all until frist element that matches Expr
  • all: greedy. Match everything that matches <expr>
    • all <expr>: all elements should match Expr
    • all @val is <expr>: capture all elements in @val if <expr> is true for every one of them.
  • opt: Single element match
    • opt @a: match optional element and bind it to a
    • opt @a or "default": either match element to a or set a to "default"
  • any: greedy. Consume all sequence elements until the end and succed only if any element has matched.
    • any @val is "d": capture all element that match is "d"
  • none: greedy. Consume all sequence elements until the end and succed only if any element has matched.

Greedy patterns match until the end of a sequence and cannot be followed by anything else.

For sequence to match is must either be completely matched by all subpatterns or have trailing .._ in pattern. 5.

Sequence Pattern Match result
[1,2,3] [1,2] Fail
[1, .._] Ok
[1,2,_] Ok

More use examples

  • capture all elements in sequence: [all @elems]
  • get all elements until (not including "d"): [until @a is "d"]
  • All leading "d": [all @leading is "d"]
  • Match first two elements and ignore the rest [_, _, .._]. 6
  • Match optional third element [_, _, opt @trail]
  • Match third element and if not matched use default value [_, _, opt @trail or "default"]
  • Capture all elements until first separator: [until @leading is "sep", @middle is "sep", all @trailing]
  • Extract all conditions from IfStmt: IfStmt([all ElseIf([@cond, _]), .._])

Tuple matching

Input tuple: (1, 2, "fa")

Pattern Result Comment
(_, _, _) Ok Match all
(@a, @a, _) Fail 7
(@a is (1 | 2), @a, _) Error 8
(1, 1 | 2, _) Ok

Case object matching

Input AST

ForStmt
  Ident "i"
  Infix
    Ident ".."
    IntLit 1
    IntLit 10
  StmtList
    Command
      Ident "echo"
      IntLit 12
  • ForStmt([== ident("i"), .._]) Only for loops with i as variable
  • ForStmt([@a is Ident(), .._]) Capture for loop variable
  • ForStmt([@a.isTuple(), .._]) for loops in which first subnode satisfies predicate isTuple(). Bind match to a
  • ForStmt([_, _, (len: in {1 .. 10})]) between one to ten statements in the for loop body

KV-pairs matching

Input json string

{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
  • Get input ["menu"]["file"] from node and

    case inj:
      of {"menu" : {"file": @file is JString()}}:
        # ...
      else:
        raiseAssert("Expected [menu][file] as string, but found " & $inj)

Option matching

Some(@x) and None() is a special case that will be rewritten into (isSome: true, get: @x) and (isNone: true) respectively. This is made to allow better integration with optional types. 9.

Footnotes

1 it might be possible to generate
input sequence if only items is defined

2 you can,
technically match arrays using the same syntax since generated
access paths are the same

3 even though it opens possibilities for
various hacks it is advisable to use only comparison operators
like ==, or =~

4 consecutiveGreedy

5 Trailing .._
does not generate exchaustive checks for sequence match, so you can
use it to do things like pattern match next tokens in lexer for
example. This is one of the main use cases that I have in min for
sequence matching - you only need to have len and [], so it should
be pretty easy to adapt anything basically

6 actually you can also use (len: _(it >= 2)) - it will
generate call to expr.len which is perfectly valid. It is not
strictly necessary to only match sequences with [...] and objects
with (fld: )

7 First match for a has value 1 but second one should be equal to 2

8 Binding of variables to pattern that contains alternative is only supported if varible occurs once in the expression. Reason: there is no backtracking for match. First tuple element matches to 1, but then fails on second element. To make correct match we would have to backtrack, rebind @a to two and then try binding again.

9 if your
enum contains enSome and enNone you can match for elements using
enNone(fld: @val) or (kind: enNone, fld: @val). Option is much
more common case so it takes priority and special-cased

@haxscramper
Copy link

haxscramper commented Sep 28, 2020

@alehander92 initial implementation as described in the draft is mostly complete (sets and predicate calls (@a.matches) are not finished & I have some other ideas that I need to test). I will be adding more unit test and searching for potential flaws in design/edge cases + improving documentation.

You can try current version by installing hmisc@#head - right now implementation depends on some other modules from the library; I will of course clean this up before PRing it, but now I just used it to save time. Minimal standalone use example

In addition to syntax described above I would also like to introduce two operators

template `:=`*(lhs, rhs: untyped): untyped = assertMatch(rhs, lhs)
template `?=`*(lhs, rhs: untyped): untyped = matches(rhs, lhs)

Which would allow sequence and nested tuple unpacking, similar to #194, unpack and definesugar module. rust-like iflet macro (even including some features from RFC) - all of that, basically for free (two-line implementation)

    (@a, (@b, @c), @d) := (1, (2, 3), 4)
    [all @head] := [1,2,3]

    if (Some(@x) ?= some("hello")) and
       (Some(@y) ?= some("world")):
      assertEq x, "hello"
      assertEq y, "world"
    else:
      discard

@haxscramper
Copy link

Implementation is completed (feature-wise) in nim-lang/fusion#33. Some additional notes in this comment, but everything else is complete (as far as I'm concerned)

@juancarlospaco
Copy link
Contributor

This bug should be closed, pattern matching already exists and works.

@konsumlamm
Copy link

konsumlamm commented Jan 28, 2021

I'm not sure if it's already too late to change stuff, but here are some opinions (<pattern> means an arbitrary pattern):

  • Nested matching (and in general, the different kinds of patterns, like sequences, objects etc.) seems to have inconsistent syntax, for example to match for a nested pattern in objects, you need to use (field: is <pattern>), while in sequences, you can nest a pattern directly ([<pattern>]). (EDIT: seems like this is not the case and you can just leave out the is, so I don't see the point of having it)

  • Using operators (@a == 42), predicates (@a.matchesPredicate()) and trailing expressions with it (@a(it mod 2 == 0)) don't really fit in, it feels more like a tacked on, overloaded DSL. Instead, I think something like trailing guards (putting if <condition> or and condition after the pattern) and as-patterns (a pattern where the value is bound to a name) (EDIT: this already exists with is) would be better.

  • Some should allow matching for Some(<pattern>) (but I assume something like that is planned). I also don't really like that Option is special cased (it's in the stdlib, so it's not that bad, but still), with custom unpackers (see next point), that should be possible to solve.

  • An option to easily create new patterns would be nice (something like unpackers in gara or the unapply method in Scala). This should also allow to implement matching on Options in a library, without special casing it, by creating custom patterns for Some and None.

Don't get me wrong, I love pattern matching and I appreciate the effort you make, but I think we can do better (especially since this is in fusion now and will likely become the recommended pattern matching implementation for Nim).

@Araq
Copy link
Member

Araq commented Jan 28, 2021

It's not too late but further progress needs my upcoming "let/var inside expressions" RFC. IMHO.

@haxscramper
Copy link

I think this RFC can be closed, because there nothing else to add to it. "let expressions" should be discussed in a separate RFC (when it comes out), and particular details of pattern matching implementation should be discussed in fusion.

@levovix0
Copy link

levovix0 commented May 24, 2021

Is it possible to make capturing matched values to variables not so ugly?
for example

const b = 2

case some_seq
of [a, (b)]: ... # b is constant and = 2
of [a, b]: ... # b = some_seq[1]
else: ...

instead of

const b = 2

case some_seq
of [@a, b]: ... # b is constant and = 2
of [@a, @b]: ... # b = some_seq[1]
else: ...

@haxscramper
Copy link

haxscramper commented May 24, 2021

Current implementation won't be changed, but when/if we get let expressions this part of the syntax could be revised. Drop the requirements for @ prefix and instead require use ==myVarToCompare when equality comparison is necessary.

@al6x
Copy link

al6x commented Jun 2, 2021

A bit late to the party. It's strange that among other langauges the Elixir/Erlang was not mentioned. As it has one of the most powerfull and elegant pattern matching capabilities https://elixir-lang.org/getting-started/pattern-matching.html and https://elixir-lang.org/getting-started/case-cond-and-if.html

@haxscramper
Copy link

haxscramper commented Jun 2, 2021

Prolog also wasn't mentioned, and it seems like elixir adopted a large portion of the list operations from its [head | tail] syntax. Variable pinning via ^ is probably better than current @var syntax. Patterns for function arguments is an interesting idea (and it can be implemented with fusion/matching matching easily (someone already asked about this, so I have a working example)):

import fusion/matching
import std/[macros, options]

{.experimental: "caseStmtMacros".}

macro matchCall(procs: untyped): untyped =
  var patterns: seq[tuple[pattern: NimNode, funcName: string]]

  result = newStmtList()

  var topParams: tuple[name: string, arg0, type0Name, returnType: NimNode]

  for idx, pr in pairs(procs):
    pr.assertMatch:
      ProcDef:
        Ident(strVal: @name)
        @trmTemplate # Term rewriting template
        @genParams # Generic params
        FormalParams:
          @returnType
          all @arguments

        @pragmas
        _ # Reserved
        @implementation

    topParams.name = name

    arguments[0].assertMatch: # FIXME handles only one-argument functions
      IdentDefs:
        @arg0Name
        CurlyExpr[@type0Name, @pattern]
        _

    topParams.returnType = returnType
    topParams.type0Name = type0Name
    topParams.arg0 = arg0Name

    let funcName = name & "impl" & $idx
    result.add nnkProcDef.newTree(
      ident(funcName),
      trmTemplate,
      genParams,
      nnkFormalParams.newTree(@[
        returnType, nnkIdentDefs.newTree(arg0Name, type0Name, newEmptyNode())
      ]),
      pragmas,
      newEmptyNode(),
      implementation
    )


    patterns.add((pattern, funcName))

  var dispatchImpl = nnkCaseStmt.newTree(topParams.arg0)
  for (patt, funcName) in patterns:
    dispatchImpl.add nnkOfBranch.newTree(
      patt, nnkReturnStmt.newTree(newCall(funcName, topParams.arg0)))

  result.add nnkProcDef.newTree(
    ident(topParams.name),
    newEmptyNode(),
    newEmptyNode(),
    nnkFormalParams.newTree(@[
      topParams.returnType,
      nnkIdentDefs.newTree(topParams.arg0, topParams.type0Name, newEmptyNode())
    ]),
    newEmptyNode(),
    newEmptyNode(),
    dispatchImpl
  )

  echo result.repr
matchCall:
  proc pattern(a: NimNode{Infix[Ident(strVal: == "+"), .._]}): NimNode =
    echo "Matches plus"
    result = newEmptyNode()

  proc pattern(a: NimNode{Infix[Ident(strVal: == "-"), .._]}): NimNode =
    echo "Matches minus"
    result = newEmptyNode()


macro usesPattern(body: untyped): untyped =
  return pattern(body)

usesPattern(12 + 2)
usesPattern(2 - 2)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants