Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable JuliaSyntax.jl as the default Julia parser #46372

Merged
merged 9 commits into from
Jun 19, 2023

Conversation

c42f
Copy link
Member

@c42f c42f commented Aug 17, 2022

This enables the use of JuliaSyntax.jl as the Julia parser for the runtime, which greatly improves parser error messages in various cases. As an experimental feature, this is only enabled if the JULIA_USE_NEW_PARSER=true environment variable is set. Update: it's the default!

20220817_17h14m06s_grim

For now I've installed it after bootstrapping because this is the simplest (flisp parser still used for bootstrap). In the future we can solve bootstrapping but this isn't necessary right away.

API Versioning: Stdlib woes vs vendoring

The simplest way to get this working immediately was as a stdlib. However in retrospect this has several downsides:

  • As only version JuliaSyntax-0.1, the JuliaSyntax API is not at all stable. The only stable API for using this from Base should be Meta.parse() and Expr. Therefore, I think users should not really be able to do using JuliaSyntax and get the version bundled with Base.
  • For use in tooling like the language server, it's necessary for the JuliaSyntax API (custom syntax tree types etc) to be able to evolve with its own versioning scheme independent from whatever version of Base someone is using. Users should be able to install a "normal" version of JuliaSyntax via Pkg.

So I think having this as a stdlib doesn't really make sense and we are left with some kind of vendoring approach. I think what I'll try next is to include the parser at the end of building Base.

Having said all that, it should be easy to try this out already :-)

Fixes #31449

@ViralBShah ViralBShah added the parser Language parsing and surface syntax label Aug 17, 2022
base/boot.jl Outdated Show resolved Hide resolved
@vtjnash
Copy link
Sponsor Member

vtjnash commented Sep 15, 2022

Can we drop the implied Mmap usage and dependency? Doing mmap leads to segfaults and bus errors, where a read would have just gotten a permission error or file truncation error, and can lead to other problems like that, with cleanup and continued edits by the user on the original file.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Sep 15, 2022

It is really cool to see how small this is! And exciting!

@c42f
Copy link
Member Author

c42f commented Sep 16, 2022

Can we drop the implied Mmap usage and dependency

Yes :-) Already done on a WIP branch of mine, will be merged soon.

BTW, for anyone wondering what's happening here - all the development is happening over at JuliaSyntax.jl which is why this PR is just sitting around at the moment. I'll get back to this PR once I've sorted out a few remaining large usability problems (Particularly generating Expr(:incomplete) for REPL completions. Which isn't hard in itself, but has proven to be a bit of a rabbit hole leading to other things.)

@oscardssmith
Copy link
Member

Now that version 0.2 of JuliaSyntax has been released, can this PR be updated to use it? We have 2 weeks left if we want to merge this in the 1.9 timeframe.

@PallHaraldsson
Copy link
Contributor

PallHaraldsson commented Sep 28, 2022

only enabled if the JULIA_USE_NEW_PARSER=true environment variable is set.

Can it be merged, and with it on by default? Would it be good for PkgEval, or should it alternatively be run with:

$ JULIA_USE_NEW_PARSER=true julia

I'm not necessarily thinking it should be in Julia 1.9-rc1, it just think it might be good to have it from now or the feature freeze (up to then) to check, and not have to explain the ENV to users (and you will likely not get feedback from master users with a non-default option), though this should also be a possibility:

$ JULIA_USE_NEW_PARSER=false julia

@c42f
Copy link
Member Author

c42f commented Oct 2, 2022

We have 2 weeks left if we want to merge this in the 1.9 timeframe.

Yep I've got julia-1.9 in mind for this. Will probably do a JuliaSyntax-0.3 release prior to that.

Can it be merged, and with it on by default

I don't think we're ready to turn this on by default. PkgEval would be good though...

@maleadt
Copy link
Member

maleadt commented Oct 3, 2022

PkgEval would be good though...

PkgEval currently doesn't configuring environment flags, so to test this either add a commit that enables the new parser by default, or add a Make flag that results in the new parser being used by default (which can then be set using the configuration keyword to Nanosoldier's runtests, resulting in a Make.user entry).

@maleadt
Copy link
Member

maleadt commented Oct 5, 2022

Tried running this under PkgEval with the new parser enabled by default (using the latest JuliaSyntax.jl, not the one from this PR), but Julia fails to bootstrap:

error during bootstrap:
LoadError("sysimg.jl", 19, LoadError("/source/usr/share/julia/stdlib/v1.9/JuliaSyntax/src/JuliaSyntax.jl", 1, LoadError("/source/usr/share/julia/stdlib/v1.9/JuliaSyntax/src/parser_api.jl", 137, LoadError("/source/usr/share/julia/stdlib/v1.9/JuliaSyntax/src/parser_api.jl", 137, UndefRefError()))))
getproperty at ./Base.jl:37 [inlined]
getindex at ./refvalue.jl:56 [inlined]
docm at ./docs/Docs.jl:521
unknown function (ip: 0x7f90895bcc8d)
_jl_invoke at /source/src/gf.c:2447 [inlined]
ijl_apply_generic at /source/src/gf.c:2629
jl_apply at /source/src/julia.h:1866 [inlined]
do_apply at /source/src/builtins.c:730
@doc at ./boot.jl:534

I'll probably add an environment option to PkgEval then.

@maleadt
Copy link
Member

maleadt commented Oct 5, 2022

@nanosoldier runtests(ALL, vs = "%self", configuration = (buildflags=["DEPS_GIT=JuliaSyntax"], environment=["JULIA_USE_NEW_PARSER=true"]))

@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@maleadt
Copy link
Member

maleadt commented Oct 5, 2022

Packages that seem to occur a lot in failure logs: Macrotools, Unitful, and CpuId.

@oscardssmith
Copy link
Member

Unitful, AbstractAlgebra, and OrdinaryDiffeq all have failures that I have filed issues about.

@PallHaraldsson
Copy link
Contributor

PallHaraldsson commented Oct 5, 2022

In total, 8352 packages were tested, out of which 3596 succeeded, 3615 failed and 1141 were skipped.

Testing took 5 hours, 3 minutes, 31 seconds (or, sequentially, 20 days, 6 hours, 50 minutes, 52 seconds to execute 16704 package tests suites).

This seems rather bad, but comparing to some random other PkgEval, about half failing isn't unusual (for all kinds of reasons including syntax):

#46372 (comment)

I'm amazed by how long it takes (and not scalable, with package ecosystem increasing), but is the timing maybe (only slightly?) better here?

This one has timing: #45646 (comment)

@maleadt
Copy link
Member

maleadt commented Oct 5, 2022

I'm amazed by how long it takes (and not scalable, with package ecosystem increasing), but is the timing maybe (only slightly?) better here?

The timings cannot be compared, as quite some packages now fail to precompile and thus never execute any test.

@oscardssmith
Copy link
Member

It is honestly kind of silly to use PkgEval to test this, since we could just test that everything parses the same between the 2 versions (which would allow us to skip loading the code and running the tests).

@Krastanov
Copy link

Krastanov commented Oct 5, 2022

The CpuId failure is due to something like commas after new lines:

julia> [1
       , 2]
ERROR: ParseError:
Error: Expected `]`
@ REPL[33]:2:1
[1
, 2]

However, that parses fine if you use the current main branch of JuliaSyntax.jl

@oscardssmith
Copy link
Member

yeah. this pr is still using 0.1 but should be using 0.2

@KristofferC
Copy link
Sponsor Member

It is honestly kind of silly to use PkgEval to test this, since we could just test that everything parses the same between the 2 versions (which would allow us to skip loading the code and running the tests)

Using existing infrastructure to quickly get a result instead of having to write something tailor made that wouldn't have gotten any better results in the end seems to me like an efficient use of resources, not something silly.

@oscardssmith
Copy link
Member

fair :). I'm coming from the perspective that we'll probably have to run this between 10 and 100 times, at which point the balance might shift.

@oscardssmith
Copy link
Member

especially since the custom version could give more information per run since pkgeval doesn't give us errors for packages that have a failing dependency and does give us spurious errors for the thousand or so packages with bad CI

c42f added 3 commits June 17, 2023 15:25
This is more consistent with the way we're likely call it from the Julia
side via Meta.parse().
REPL completion of paths within strings need to be escaped according to
the usual escaping rules, and delimited by the starting " rather than
whitespace.

This differs from completion of paths within cmd backticks which need to
be escaped according to shell escaping rules.

Separate these cases and fix string escaping.

This was found because JuliaSyntax emits an Expr(:error) rather than
Expr(:incomplete) for paths inside strings with invalid escape sequences
before whitespace.
* Vendor JuliaSyntax into Base via deps directory
* Install JuliaSyntax as the Julia parser unless the environment
  variable JULIA_USE_NEW_PARSER=0 is set.
* Add a function to set the Core._parse binding.
  Required because we'd like to set the binding during Base.__init__.
  This can be done with `Core.eval` but that doesn't work well in
  incremental compilation mode.

Also accommodate JuliaSyntax within tests:

* When JuliaSyntax is enabled, ignore error messages in parser tests
  which are tested separately upstream - error messages are inherently
  expressed a bit differently when they go alongside full source
  location info.
* Accommodate a small number of incompatibilities where in JuliaSyntax
  - `import .Mod.x as (a.b)` is a syntax not lowering error
  - `f(2x for x=1:10, y` is `Expr(:incomplete)` not `Expr(:error)`
  - `incomplete_tag` is more precise for `:block` vs `:other`
  - `global const` without an assignment is a syntax error, in keeping
    with plain `const` without assignment being a syntax error (not
    lowering error).
* Adjust a few tests to be more precise about testing lowering vs the
  parser.
* Make Meta.parse doctest compatible with JuliaSyntax errors
@c42f
Copy link
Member Author

c42f commented Jun 17, 2023

Any idea what's up with the i686 windows build?

We seem to have an LLVM out of memory error. But I think a previous build on this same branch succeeded, then failed, then succeeded again with this same error on the i686 build.

So it might be build environment related?

Snip of build log:

Sysimage built. Summary:
Base ────────  67.772410 seconds 53.4589%
Stdlibs ─────  59.000268 seconds 46.5394%
Total ─────── 126.774893 seconds
make[1]: Leaving directory '/c/workdir'
make[1]: Entering directory '/c/workdir'
 cd /c/workdir/base && if ! JULIA_BINDIR=`cygpath -w /c/workdir/usr/bin` WINEPATH="`cygpath -w /c/workdir/usr/bin`;$WINEPATH" JULIA_LOAD_PATH='@stdlib' JULIA_PROJECT= JULIA_DEPOT_PATH=':' JULIA_NUM_THREADS=1  /c/workdir/usr/bin/julia.exe -O3 -C "pentium4;sandybridge,-xsaveopt,clone_all" --output-o `cygpath -w /c/workdir/usr/lib/julia/sys-o.a`.tmp  --startup-file=no --warn-overwrite=yes --sysimage `cygpath -w /c/workdir/usr/lib/julia/sys.ji` `cygpath -w /c/workdir/contrib/generate_precompile.jl` 1; then echo '*** This error is usually fixed by running `make clean`. If the error persists, try `make cleanall`. ***'; false; fi 
Collecting and executing precompile statements
└ Collect (Basic: ✓ 819, REPL LLVM ERROR: out of memory
Allocation failed

[7388] signal (22): SIGABRT
in expression starting at none:0
crt_sig_handler at C:/workdir/src\signals-win.c:100
raise at C:\Windows\System32\msvcrt.dll (unknown line)
abort at C:\Windows\System32\msvcrt.dll (unknown line)
.text$_ZN4llvm22report_bad_alloc_errorEPKcb at C:\workdir\usr\bin\libLLVM-15jl.dll (unknown line)
Allocations: 86200899 (Pool: 86152611; Big: 48288); GC: 250
*** This error is usually fixed by running `make clean`. If the error persists, try `make cleanall`. ***
make[1]: *** [sysimage.mk:93: /c/workdir/usr/lib/julia/sys-o.a] Error 1
make[1]: Leaving directory '/c/workdir'
🚨 Error: The command exited with status 2
^^^ +++
^^^ +++
user command error: The plugin docker command hook exited with status 2
~~~ Running plugin docker pre-exit hook
C:\buildkite-agent\plugins\github-com-buildkite-plugins-docker-buildkite-plugin-v3-13-0\hooks\pre-exit
cd C:\buildkite-agent\builds\win2k22-amdci6-7\julialang\julia-master\c\buildkite-agent\builds\win2k22-amdci6-7\julialang\julia-master

@giordano
Copy link
Contributor

@gbaraldi said that error has been seen in other PRs, so it doesn't seem to be specific to this one.

@gbaraldi
Copy link
Member

Well, I've seen it in #50144, which is a PR that does mess with the GC, but I do think we near the limit when compiling the sysimg at 32 bit.

@giordano
Copy link
Contributor

#50205 it's broken also on master.

@c42f
Copy link
Member Author

c42f commented Jun 18, 2023

@giordano thanks, what a relief. I also tried to repro the LLVM out of memory on the i686 windows build locally with cygwin, and could not.

@c42f
Copy link
Member Author

c42f commented Jun 18, 2023

I think this is good to go. I'll do one last nanosoldier run and have a look through those results before merging.

@nanosoldier runtests()

@c42f c42f mentioned this pull request Jun 18, 2023
@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected.
A full report can be found here.

@oscardssmith oscardssmith merged commit 82ab124 into master Jun 19, 2023
@oscardssmith oscardssmith deleted the cjf/juliasyntax-stdlib branch June 19, 2023 23:08
@oscardssmith
Copy link
Member

Thank you so much @c42f for all the work to make this a reality!

@c42f
Copy link
Member Author

c42f commented Jun 20, 2023

Ah. I must admit I was really really looking forward to pushing the merge button on this myself. It kind of felt symbolic given this was like 1.5 years of work. (Not if it meant missing feature freeze, but that's tomorrow morning my time IIUC?)

I was just preparing a final minor bugfix based on the latest nanosolider run. I'll make a separate PR for that.

Thanks for the enthusiasm anyway!

@aviatesk
Copy link
Sponsor Member

I think there may be some elements, such as the method of merging, that would be better handled by @c42f . If you feel it's preferrable, we wouldn't mind if the PR were to be temporarily reverted and reland it quickly with minor fixes, what are your thoughts? When it comes to major PRs like this one, it often seems best if the person who created the PR decides matters such as whether to squash the commits. Regarding the feature freeze, it's set to be extended until the 27th of June, so there's no need to get hurry just yet:)

@c42f
Copy link
Member Author

c42f commented Jun 20, 2023

At this point I'd rather not revert this - that will just cause extra churn. I think it'll be fine just to put a separate PR in for any minor fixes because this was very, very close and the remaining bug I know about doesn't seem very disruptive.

Luckily I'd already rearranged the commits into logical order and done a bunch of squashing so that side of it was in pretty good shape.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parser Language parsing and surface syntax
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Julep: Rewrite Julia Parser in Julia