Skip to content
This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Attempt to rewrite grammar rules correctly (#7) #18

Merged
merged 10 commits into from
May 7, 2014

Conversation

moshee
Copy link
Contributor

@moshee moshee commented Apr 30, 2014

Due to being machine converted several times and actually being an amalgam of Java, C, D, Python, a little Go, and some other syntaxes I can't recognize, the Go grammar has been having a bit of a hard time highlighting actual Go code. I'm sure many (at least #7) have noticed.

It was kind of bothering me so I deleted most of it and replaced it with a port of the "misc/vim" syntax in the standard Go distribution. Some things are missing, but for now, it's much more Go-compliant than before.

I realize this is a big rewrite but it needed to be done. Observe: before and after (images not inline because they're kinda big). I forgot to comment on this but you'll also notice that the imaginary number literals are not being highlighted at all.

I realize the features I have in here are probably not what everybody would like, so I welcome and encourage any PRs to my PR. In particular, I've noticed a few things:

  • TODO, etc highlight inside line comments (it works in block comments only)
  • Decide what to do with package.Type, package.Function, obj.field, etc. highlighting. It's impossible to tell what's a method/field/package/custom type/etc without using the type system (oracle integration, anyone?) but I didn't like highlighting everything as a Type either.
    • In x.Y, x could be an identifier or a type, and Y could be a type, constant, variable, field, or method/function.
    • In x.Y(z), Y could still be either a type (conversion), method, or function. It's safe to say that it is a method or function in x.Y(), however.
  • I've also tried to test it as well as I could, but I can't think of every possible syntax combination, and as such my test file probably isn't comprehensive. How does one even test a language grammar?
  • Some parts are probably also a bit redundant, but the tmBundle file layout was already confusing me enough before being converted to CSON, so I am not terribly well-versed in the format.
  • Highlighting of format verbs in printf strings
  • And so on.

Due to being machine converted several times and actually being an
amalgam of Java, C, D, Python, a little Go, and some other syntaxes I
can't recognize, the Go grammar has been having a bit of a hard time
highlighting actual Go code.

It was kind of bothering me so I deleted most of it and replaced it with
a port of the "misc/vim" syntax in the standard Go distribution. Some
things are missing, but for now, it's much more Go-compliant than
before.
@joefitzgerald
Copy link
Contributor

Thanks @moshee! This is a step in the right direction. I do think that your 2nd point is particularly important. os should be visually distinct from File when os.File is typed.

Scott Barron (@rubyist) is integrating oracle with go-plus currently - if grammars can be modified / augmented programmatically (can they, @kevinsawicki?), it would follow that we could use that integration (when oracle is present) to add additional highlighting. We should probably have some naive highlighting even when oracle is not present.

We should be able to add specs to the package and write tests against it. Have you tried and run into issues?

@@ -2,235 +2,87 @@
'fileTypes': [
'go'
]
'foldingStartMarker': '(?x)/\\*\\*(?!\\*)|^(?![^{]*?//|[^{]*?/\\*(?!.*?\\*/.*?\\{)).*?\\{\\s*($|//|/\\*(?!.*?\\*/.*\\S))'
'foldingStopMarker': '(?<!\\*)\\*\\*/|^\\s*\\}'
'foldingStartMarker': '({|\\()\\s*$'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still ensure comments (block style /* ... */ and line style //) fold...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually noticed that my comments still fold. If this wasn't it, my guess is it was included from somewhere, but my Atom is pretty vanilla right now so unless there's something happening behind the scenes here, I dunno.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just fold when indentation increases, so only the stop marker really matters. It allows the fold to extend to the next line if it matches the pattern, even if it's outdented.

@kevinsawicki
Copy link
Contributor

@joefitzgerald There probably aren't sufficient grammar APIs for this currently but we can add them for something like this and atom/language-todo#4

@moshee
Copy link
Contributor Author

moshee commented Apr 30, 2014

We should be able to add specs to the package and write tests against it. Have you tried and run into issues?

Well, I'm not even sure where to start. I haven't seen another language-* repo with any tests so I didn't find anything to go off of.

@kevinsawicki
Copy link
Contributor

language-toml and language-gfm both have specs

@joefitzgerald
Copy link
Contributor

@moshee apm init -p test-package <-- steal the 'spec' directory from there

@moshee
Copy link
Contributor Author

moshee commented Apr 30, 2014

Ah thanks, I'll look into that when I get back this evening.

@rubyist
Copy link

rubyist commented Apr 30, 2014

I'm not sure oracle will be much help for syntax highlighting. You have to ask it questions about specific things and it's kind of slow.

@joefitzgerald
Copy link
Contributor

An alternative would be to write some helper that hooks into the golang AST package: (http://golang.org/pkg/go/ast/) to try to provide the required information.

@rubyist
Copy link

rubyist commented Apr 30, 2014

That could be interesting, fun, and a lot faster than oracle.

@joefitzgerald
Copy link
Contributor

On the plus side, said helper could be cross-platform 😉 but I'm not sure how @kevinsawicki feels about distributing go binaries with Atom, nor how the CI pipeline would work for something like that.

Let me clarify what I mean when I say go binaries: go apps (binaries) do not require a runtime - they natively execute on the platform they are running on (they're statically linked). Just didn't want you to think I'm suggesting shipping the entire go binary distribution.

@moshee
Copy link
Contributor Author

moshee commented Apr 30, 2014

You have to ask it questions about specific things and it's kind of slow.

Anything that can answer "what is this expression's type?" will do. go/ast might be cool, but yeah, anything is going to need an extra utility bundled. This might be in the scope of an external package instead of language-go, if other packages can inject highlighting after the fact.

@kevinsawicki
Copy link
Contributor

but I'm not sure how @kevinsawicki feels about distributing go binaries with Atom, nor how the CI pipeline would work for something like that.

We opted to ship ctags binaries on all platforms in the symbols-view package so I'm fine with it for the right cases.

You could also make downloading the binaries part of the install script, apm does this to install node to make sure a specific version is always available.

@joefitzgerald
Copy link
Contributor

OK, with that barrier crossed - got any plans for building an AST model that is language agnostic that we could plug it into? 😉 Thinking something like IntelliJ's model.

That was a bit facetious, but it would probably be helpful to constrain the scope of anything we do in this area or we'll end up building a new IDE.

  • Make tokens pretty
  • Stop

?

@kevinsawicki
Copy link
Contributor

OK, with that barrier crossed - got any plans for building an AST model that is language agnostic that we could plug it into? 😉 Thinking something like IntelliJ's model.

Long term, absolutely, @nathansobo has been keen on upping our parser game for some time now and will probably be leading that effort once we complete some of the other items first regarding performance and cross-platform support.

@moshee
Copy link
Contributor Author

moshee commented Apr 30, 2014

For now, how about this: highlight

identifier '.' identifier '('
               ^~~~~~~~~~

as function and

identifier '.' identifier [ NOT '(' | word-boundary ]
               ^~~~~~~~~~

as something temporary -- not storage.type, but maybe constant? Skip it entirely, even?

@joefitzgerald
Copy link
Contributor

In theory, any time you use a package identifier, followed by a period, followed by a UTF-8 'capital letter', you're accessing an exported token in a package. We should have a test for that 😉

Makes sense.

Numeric parsing was leaving off leading and trailing periods in numbers
like '.25' and '0.' because of erroneously putting word borders around
both sides. Each of those needs a special case.
@moshee
Copy link
Contributor Author

moshee commented May 1, 2014

Seems good enough to me for now. Does this need a .travis.yml?

The '(' was being included in the match and didn't get captured as an
operator by the appropriate matcher elsewhere.
It was incorrectly expecting the rest of the function (method)
declaration to follow it after a space.
@joefitzgerald
Copy link
Contributor

Wouldn't hurt to have a .travis.yml. Just use the contents from here: https://github.com/atom/language-toml/blob/master/.travis.yml

@kevinsawicki
Copy link
Contributor

I just enabled the travis hook for this repo

@kevinsawicki
Copy link
Contributor

Seems good enough to me for now.

I see there are still some unchecked items in the task list in this PR, are you planning to do this in separate PRs then?

@moshee
Copy link
Contributor Author

moshee commented May 1, 2014

Yeah, the way I see it, TODO highlighting is not critical, type-aware highlighting must wait for a more complex solution, and redundancy can be fixed over time if it causes trouble.

So if nobody finds any outstanding issues (please try using this first, somebody), then I'm done for now, at least.

@joefitzgerald
Copy link
Contributor

@moshee My gut has me wishing for a few more things:

  1. Highlight type names
  2. Highlight function names
  3. Highlight the name of a variable during declaration/initialization

@moshee
Copy link
Contributor Author

moshee commented May 2, 2014

type names

How do you suggest identifying those? Aside from the built-in types, I mean.

function names

Regular functions in calls, right? I already have the names in declarations and method/package qualified calls.

name of a variable

Let's see, how many possibilities are there?

var x int
x := 0
var (
    x int
    x = 0
)

Any others? Go's pretty predictable so I'd tend to think there aren't any corner cases, but you never know...

@joefitzgerald
Copy link
Contributor

type names
How do you suggest identifying those?

Anything following type separated by whitespace until the next whitespace.

type Animal struct {
     ^----^       
type Jumper interface {
     ^----^  

name of a variable
Let's see, how many possibilities are there?

In addition to your suggestions:

  • Anything on a line preceded only by white space (or white space and var) and then followed by = or :=.
  • Anything on a line preceded only by white space and then followed by a predeclared identifier for a type: http://golang.org/ref/spec#Predeclared_identifiers
  • Keep in mind that multiple variables can be declared / initialized using , (x, y := 0)

I'm sure that is not exhaustive. It seems like we should be developing a set of rules – based directly on the spec – that we can codify as tests and then satisfy in the grammar.

Variable assignment can be pretty complex[1]. The simple var
declarations matched in this commit are limited to "var <name>" and
"<name> :=".
@moshee
Copy link
Contributor Author

moshee commented May 2, 2014

I got type and func call in, but I stopped after the single "var" declaration after I realized how hard it would be to support all of this (from the spec):

var i int
var U, V, W float64
var k = 0
var x, y float32 = -1, -2
var (
    i       int
    u, v, s = 2.0, 3.0, "bar"
)
var re, im = complexSqrt(-1)
var _, found = entries[name]

Multiple declaration seems particularly troublesome. This also raises questions about other things like const and type declaration blocks.

Something else I'm wondering about is how to handle this:

func (g *Gopher) Dig()
//   ^  ^      ^ should be keyword.operator
//       ^~~~~~ should be storage.type

My regex-fu is showing its limitations here...perhaps we can defer this until later/give it to someone else? I've added some tests (and some disabled ones), whoever works on this can easily add the other cases as needed.

@MrTravisB
Copy link

Let me just say "THANK YOU" for working on this. This has been driving me insane and I will be patiently awaiting this update. 👍

@nathansobo
Copy link
Contributor

This is a huge change and I'm not a Go user yet. But I'm really happy to see the addition of spec coverage. Do we have a couple people who have tried this branch out that can help me at least get some independent verification that it's in good shape?

@joefitzgerald
Copy link
Contributor

@nathansobo I have tried it out, and it is much better; we still have a ways to go, but this moves us forward from the current state and is worth merging on that basis.

nathansobo pushed a commit that referenced this pull request May 7, 2014
Attempt to rewrite grammar rules correctly (#7)
@nathansobo nathansobo merged commit 069e822 into atom:master May 7, 2014
@nathansobo
Copy link
Contributor

Lets do it then! Thanks for your excellent contributions everyone!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants