This repository has been archived by the owner on Sep 21, 2023. It is now read-only.

Org mode parser #1

Open
schoettl opened this issue May 29, 2021 · 3 comments

Comments

@schoettl

Hi Gerry, I stumbled upon your post mentioning tree-sitter. Now I just found your tree-sitter repo.

We at org-parser have a pretty much complete EBNF grammar of the org syntax. Maybe it can be converted with some tool and kept in sync for the tree-sitter grammar? I don't know anything about the grammar required by tree-sitter, but maybe these projects can work together?

The raw parse tree from org-parser will be transformed to a simpler AST, but for tree-sitter it might already be sufficient. However, our parser does not recognize all styled text in org-mode (especially nested styles), because that is hard or impossible with EBNF alone.

@gagbo
Owner

gagbo commented Jun 1, 2021

Disclaimer: I have zero experience in parsing/lexing.

I’m trying to work on multiple fronts, and this parser in particular is probably not going to see any work, especially as I have other projects and I’m getting a little frustrated with this effort.
I know about that EBNF grammar, and I know that org-mode’s syntax isn’t really the best fit for that kind of grammar, so that’s why I wanted to see if there’s any way to make it better.
Then I saw @tgbugs’s work on Laundry (through a message on the mailing list), which seems to do exactly what I want: do as much of the parsing as possible with a parser tool, and add the last stateful touches in the programming language that ships with the parser.
I’m trying to implement this approach in tree-sitter, mostly here: https://github.com/gagbo/tree-sitter-org/blob/laundry_parser/grammar.js (and I would try to finish the parsing using the C++ state a tree-sitter parser can use). Then again, I can’t say when (or if) I’ll be able to work on it; it’s quite taxing and I’m really not advancing at a rate that motivates me.

I guess that last paragraph was more of a "my current state of affairs", and I just want to finish by suggesting that we move all Org grammar discussions to the mailing list. Tom (@tgbugs) sent a message there to try and spark a conversation, and I’m pretty sure that Nicolas Goaziou would be glad to participate there and give insights to move the grammar efforts for Org-mode forward.

Best regards,
Gerry

@schoettl
Author

schoettl commented Jun 1, 2021

Thanks, Gerry, for the insights and links. Wow, you have already defined a lot of grammar in your tree-sitter-org! It would be cool if it could all be generated from, and kept in sync with, the grammar from org-parser or from tgbugs’s Laundry.

@tgbugs

tgbugs commented Jun 1, 2021

@gagbo thanks for the @ and also wow :). I'm in the process (https://github.com/tgbugs/laundry/tree/next) of revising my original draft to remove as much of the ambiguity from the grammar as I can, and am happy to help keep things in sync as it evolves. @schoettl I think it is a bit premature, but a little further down the road I think reconciling all of these different approaches using a set of common test cases would be extremely productive (re: https://orgmode.org/list/CA+G3_PNksnSf=P_-4UQ_bBASiA9fpgeN_jwJT8rRmP6XJGPGLg@mail.gmail.com/).
