Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Literal into several subtypes #4198

Merged
merged 1 commit into from
Mar 5, 2016

Conversation

lydell
Copy link
Collaborator

@lydell lydell commented Feb 2, 2016

Previously, the parser created Literal nodes for many things. This resulted in
information loss. Instead of being able to check the node type, we had to use
regexes to tell the different types of Literals apart. That was a bit like
parsing literals twice: Once in the lexer, and once (or more) in the compiler.
It also caused problems, such as this and this being indistinguishable
(fixes #2009).

Instead returning new Literal in the grammar, subtypes of it are now returned
instead, such as NumberLiteral, StringLiteral and IdentifierLiteral. new Literal by itself is only used to represent code chunks that fit no category.

StringWithInterpolations has been added as a subtype of Parens, and
RegexWithInterpolations as a subtype of Call. This makes it easier for other
programs to make use of CoffeeScript's "AST" (nodes). For example, it is now
possible to distinguish between "a #{b} c" and "a " + b + " c". Fixes #4192.

SuperCall has been added as a subtype of Call.

Note, though, that some information is still lost, especially in the lexer. For
example, there is no way to distinguish a heredoc from a regular string, or a
heregex without interpolations from a regular regex.

After the new subtypes were added, they were taken advantage of, removing most
regexes in nodes.coffee. SIMPLENUM (which matches non-hex integers) had to be
kept, though, because such numbers need special handling in JavaScript (for
example in 1..toString()).

An especially nice hack to get rid of was using new String() for the token
value for reserved identifiers (to be able to set a property on them which could
survive through the parser). Now it's a good old regular string.

In range literals, slices, splices and for loop steps when number literals
are involved, CoffeeScript can do some optimizations, such as precomputing the
value of, say, 5 - 3 (outputting 2 instead of 5 - 3 literally). As a side
bonus, this now also works with hexadecimal number literals, such as 0x02.

Finally, this also improves the output of coffee --nodes:

# Before:
$ bin/coffee -ne 'while true
  "#{a}"
  break'
Block
  While
    Value
      Bool
    Block
      Value
        Parens
          Block
            Op +
              Value """"
              Value
                Parens
                  Block
                    Value "a" "break"

# After:
$ bin/coffee -ne 'while true
  "#{a}"
  break'
Block
  While
    Value BooleanLiteral: true
    Block
      Value
        StringWithInterpolations
          Block
            Op +
              Value StringLiteral: ""
              Value
                Parens
                  Block
                    Value IdentifierLiteral: a
      StatementLiteral: break

@vendethiel
Copy link
Collaborator

Looks pretty good on the surface.

Can you please add a test with the new (fixed) behavior of this (not getting rewritten to _this in a bound function)?

@lydell
Copy link
Collaborator Author

lydell commented Feb 2, 2016

I actually thought about adding a test, but then I chose not to because I wasn’t sure if we’d then lock us down on implementation details. I was thinking something like this:

nonce = {}
fn = null
(->
  fn = => `this`
).call null
eq nonce, fn.call nonce

But let’s say we wanted to compile => to ES2015 => – then the test wouldn’t work anymore. But I can add a test if that’s wanted (but not tonight).

@bjmiller
Copy link

bjmiller commented Feb 2, 2016

This may be a little out of scope, but I don't think that we ever want to compile CS => to ES6 arrow functions. The semantics are very different, and they would be breaking changes for each other. There may need to be, unfortunately, another symbol. :-/

isAssignable: NO
isComplex: NO
compileNode: (o) ->
code = if o.scope.method?.bound then o.scope.method.context else @value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is indented too much, I think.

@lydell
Copy link
Collaborator Author

lydell commented Feb 3, 2016

Fixed indentation, added test … and also removed parseNum() (turns out Number() does the job).

(Regarding the test: What if we decide to compile to ES5 some day? (foo) =>function(foo) {}.bind(this)? :) But whatever, we'll think about that then.)

@lydell
Copy link
Collaborator Author

lydell commented Feb 9, 2016

@michaelficarra Do you have any opinion on this PR? (Asking since you 👍ed #4192.)

isComplex: NO
compileNode: -> [@makeCode "null"]
class exports.Null extends Literal
constructor: -> super 'null'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ThisLiteral constructor is on two lines and this is on one line. Be consistent.

@michaelficarra
Copy link
Collaborator

Agreed with @vendethiel about super having its own node. But I think it should be done as special case subclasses of Call and Access. That's the only way super is used after all.

@lydell
Copy link
Collaborator Author

lydell commented Feb 9, 2016

Special case of Call and Access? Mind elaborating on that?

@michaelficarra
Copy link
Collaborator

Sorry, I had ECMAScript super on the mind. You're right, just Call.

@lydell
Copy link
Collaborator Author

lydell commented Feb 9, 2016

I've fixed most things now. Would be happy for some feedback regarding IdentifierLiteral and SuperCall.

nonceB = {}
fn = null
(->
fn = => [this is nonceA, `this` is nonceB]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fn = => this is nonceA and `this` is nonceB

And then you can just ok it below.

]

# Alphanumerics are separated from the other **Literal** matchers because
# they can also serve as keys in object literals.
AlphaNumeric: [
o 'NUMBER', -> new Literal $1
o 'NUMBER', -> new NumberLiteral $1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to separate InfinityLiteral from NumberLiteral. Whenever parseFloat returns an infinity value, we can render it simply as 2e308.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ bin/coffee -ne 'Infinity'
Block
  Value IdentifierLiteral: Infinity

Hehe.

CoffeeScript actually allows Infinity = 0. (That's not valid in strict mode.) Not sure if it's intentional.

Don't really know what to do now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we don't really do "strict"... #2337 has been open for a while.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we have test/strict.coffee. #1547

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lydell I'm not talking about a reference to the Infinity global, I'm talking about infinity values (such as the initial value of the Infinity global or the value created by the literals 2e308 or 500 9s in a row).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what you mean?

exports.NumberLiteral = class NumberLiteral extends Literal
  constructor: (@value) ->
    return new InfinityLiteral if parseFloat(@value) is Infinity

exports.InfinityLiteral = class InfinityLiteral extends Literal
  constructor: ->
    super '2e308'

Or should we do the parseFloat check already in the lexer and output an INFINITY token?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter please. Sorry for the delay here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. (See the next line.)


test "Infinity", ->
eq Infinity, CoffeeScript.eval "0b#{Array(1024 + 1).join('1')}"
eq Infinity, CoffeeScript.eval "0o#{Array(342 + 1).join('7')}"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above two lines actually failed before adding InfinityLiteral. Invalid JS was generated: 0xInfinity.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome 😄.

@lydell
Copy link
Collaborator Author

lydell commented Mar 5, 2016

InfinityLiteral is done. I’ll now continue with improving IdentifierLiteral.

@michaelficarra
Copy link
Collaborator

@lydell Maybe submit it as a separate PR? This should be good to merge right now, right?

Previously, the parser created `Literal` nodes for many things. This resulted in
information loss. Instead of being able to check the node type, we had to use
regexes to tell the different types of `Literal`s apart. That was a bit like
parsing literals twice: Once in the lexer, and once (or more) in the compiler.
It also caused problems, such as `` `this` `` and `this` being indistinguishable
(fixes jashkenas#2009).

Instead returning `new Literal` in the grammar, subtypes of it are now returned
instead, such as `NumberLiteral`, `StringLiteral` and `IdentifierLiteral`. `new
Literal` by itself is only used to represent code chunks that fit no category.
(While mentioning `NumberLiteral`, there's also `InfinityLiteral` now, which is
a subtype of `NumberLiteral`.)

`StringWithInterpolations` has been added as a subtype of `Parens`, and
`RegexWithInterpolations` as a subtype of `Call`. This makes it easier for other
programs to make use of CoffeeScript's "AST" (nodes). For example, it is now
possible to distinguish between `"a #{b} c"` and `"a " + b + " c"`. Fixes jashkenas#4192.

`SuperCall` has been added as a subtype of `Call`.

Note, though, that some information is still lost, especially in the lexer. For
example, there is no way to distinguish a heredoc from a regular string, or a
heregex without interpolations from a regular regex. Binary and octal number
literals are indistinguishable from hexadecimal literals.

After the new subtypes were added, they were taken advantage of, removing most
regexes in nodes.coffee. `SIMPLENUM` (which matches non-hex integers) had to be
kept, though, because such numbers need special handling in JavaScript (for
example in `1..toString()`).

An especially nice hack to get rid of was using `new String()` for the token
value for reserved identifiers (to be able to set a property on them which could
survive through the parser). Now it's a good old regular string.

In range literals, slices, splices and for loop steps when number literals
are involved, CoffeeScript can do some optimizations, such as precomputing the
value of, say, `5 - 3` (outputting `2` instead of `5 - 3` literally). As a side
bonus, this now also works with hexadecimal number literals, such as `0x02`.

Finally, this also improves the output of `coffee --nodes`:

    # Before:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value
          Bool
        Block
          Value
            Parens
              Block
                Op +
                  Value """"
                  Value
                    Parens
                      Block
                        Value "a" "break"

    # After:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value BooleanLiteral: true
        Block
          Value
            StringWithInterpolations
              Block
                Op +
                  Value StringLiteral: ""
                  Value
                    Parens
                      Block
                        Value IdentifierLiteral: a
          StatementLiteral: break
@lydell
Copy link
Collaborator Author

lydell commented Mar 5, 2016

Ok, I’ll do it in a separate PR. Yes, I consider this PR good to merge.

@vendethiel
Copy link
Collaborator

👍

michaelficarra added a commit that referenced this pull request Mar 5, 2016
Refactor `Literal` into several subtypes
@michaelficarra michaelficarra merged commit 8afb7cc into jashkenas:master Mar 5, 2016
@michaelficarra
Copy link
Collaborator

Thanks. Great work, @lydell.

@lydell lydell deleted the node-types branch March 5, 2016 16:29
@lydell
Copy link
Collaborator Author

lydell commented Mar 5, 2016

The follow-up PR became three: #4219, #4220 and #4221.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expand string interpolation expressions into dedicated AST Bug with backtick
5 participants