Tokenizers lex their own child tokens #2124
This pull request is being automatically deployed with Vercel (learn more). 🔍 Inspect: https://vercel.com/markedjs/markedjs/wA4Xa4rc9Kz4J8GUSfwU33a5zJYR |
I like that idea. Then extensions can use those properties as well if they need to.
Ya, the way I see it, the Lexer handles the overall state (including the order in which to call the tokenizers) and the tokenizers handle everything about creating the token (including children). Currently the tokenizers handle everything about creating the token except for the children, but as we see in #2112 (comment), sometimes the tokens need to know the children to create the token. |
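To make that split concrete, here is a rough sketch of a tokenizer that builds its own children while the lexer only decides call order. The class names, the regexes, and the reduced emphasis-only inline pass are all made up for illustration; this is not the actual marked source.

```javascript
// Hypothetical, simplified names and rules; not the actual marked source.
class SketchTokenizer {
  constructor(lexer) {
    this.lexer = lexer; // back-reference so the tokenizer can lex its own children
  }

  heading(src) {
    const cap = /^(#{1,6}) +([^\n]+)(?:\n+|$)/.exec(src);
    if (!cap) return undefined;
    return {
      type: 'heading',
      raw: cap[0],
      depth: cap[1].length,
      text: cap[2],
      // The tokenizer creates the children itself instead of leaving it to the lexer.
      tokens: this.lexer.inlineTokens(cap[2])
    };
  }

  paragraph(src) {
    // Always consumes at least one character of a non-empty src.
    const cap = /^[^\n]*(?:\n+|$)/.exec(src);
    const text = cap[0].trim();
    return { type: 'paragraph', raw: cap[0], text, tokens: this.lexer.inlineTokens(text) };
  }
}

class SketchLexer {
  constructor() {
    this.tokenizer = new SketchTokenizer(this);
  }

  // Block pass: the lexer only decides which tokenizer to try, and in what order.
  blockTokens(src) {
    const tokens = [];
    while (src) {
      const token = this.tokenizer.heading(src) || this.tokenizer.paragraph(src);
      src = src.slice(token.raw.length);
      tokens.push(token);
    }
    return tokens;
  }

  // Inline pass, reduced to *emphasis* and plain text for the sketch.
  inlineTokens(src) {
    const tokens = [];
    for (const m of src.matchAll(/\*([^*]+)\*|[^*]+|\*/g)) {
      if (m[1] !== undefined) {
        tokens.push({ type: 'em', raw: m[0], text: m[1] });
      } else {
        tokens.push({ type: 'text', raw: m[0], text: m[0] });
      }
    }
    return tokens;
  }
}

const sketchTokens = new SketchLexer().blockTokens('# Hello *world*\nplain text');
```

Because `heading()` fills in `tokens` before returning, it could also inspect those children when deciding other properties of the token, which is the point raised about #2112.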
Alright. I've added this now. Just need to finish moving |
@UziTech New issue I'm running into in moving
You end up with trailing/starting spaces between the two
This can also lead to some child tokens not being detected because the ending half is in the other token before merging. Any ideas? |
nptable is 95% identical to table, except for handling a weird special case. Special case now handled in splitCells()
Would it be feasible to only do inline tokens for |
I might be able to manage the other ones. I'm pretty sure I can get Tables at least. It would be kind of inconsistent but it might be something we just need to leave until a deeper revision down the road. |
@UziTech Sigh... new problem. Reflinks that are defined after they are referenced don't get handled since they aren't registered to the
I'm stumped on this one. How do we approach this? It seems like we would have to make a first pass just to get the link definitions before parsing anything else. |
Hmm. Maybe we need to have a property on the token that is a function that the lexer will call to parse the inline tokens after all block tokens are handled? Not sure what that would do with the benchmarks, but it might slow it down too much. Right now the extensions parse their inline tokens right away, but we might want them to wait for all block tokens as well in case they need reflinks. |
or instead of tokenizers calling
// in lexer
inline(src, tokens) {
  this.inlineQueue.push({src, tokens});
}
then after the block tokens are complete we would run
let next;
while ((next = this.inlineQueue.shift())) {
  this.inlineTokens(next.src, next.tokens);
}
|
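For reference, a self-contained sketch of how that queue could resolve a reflink that is defined after it is used. Everything besides `inline`, `inlineQueue`, and `inlineTokens` (the `QueueLexer` class, the line-based block pass, the regexes) is an assumption made up for the example and is much simpler than the real lexer.

```javascript
// Hypothetical sketch; the block and inline rules are deliberately tiny.
class QueueLexer {
  constructor() {
    this.links = Object.create(null); // reflink definitions, keyed by lowercased label
    this.inlineQueue = [];            // deferred inline work
  }

  lex(src) {
    const tokens = this.blockTokens(src);
    // All block tokens (and therefore all link definitions) are known by now.
    let next;
    while ((next = this.inlineQueue.shift())) {
      this.inlineTokens(next.src, next.tokens);
    }
    return tokens;
  }

  // Called by block-level code instead of lexing inline content right away.
  inline(src, tokens) {
    this.inlineQueue.push({ src, tokens });
  }

  blockTokens(src) {
    const tokens = [];
    for (const line of src.split('\n')) {
      if (!line.trim()) continue;
      const def = /^\[([^\]]+)\]: *(\S+)$/.exec(line);
      if (def) {
        this.links[def[1].toLowerCase()] = { href: def[2] };
        continue;
      }
      const token = { type: 'paragraph', raw: line, text: line, tokens: [] };
      this.inline(token.text, token.tokens); // children are filled in later
      tokens.push(token);
    }
    return tokens;
  }

  inlineTokens(src, tokens) {
    for (const m of src.matchAll(/\[([^\]]+)\]|[^\[]+|\[/g)) {
      const link = m[1] !== undefined ? this.links[m[1].toLowerCase()] : undefined;
      if (link) {
        tokens.push({ type: 'link', raw: m[0], href: link.href, text: m[1] });
      } else {
        tokens.push({ type: 'text', raw: m[0], text: m[0] });
      }
    }
    return tokens;
  }
}

const queueLexer = new QueueLexer();
const queued = queueLexer.lex('See [docs] for details.\n\n[docs]: https://example.com/docs');
```

The trade-off discussed in the thread still applies: at the time the paragraph token is created its `tokens` array is empty, so the code that creates it cannot look at its children yet.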
We could do something like that. That might also work better with paragraphs and text that can't be inlined until they are merged. The only problem I see with that, though, is that the original problem from #2112 (comment) comes back, since the parent token won't be able to see its child tokens until it's too late. |
I guess it would have to be a function to accomplish setting token properties based on children. Or should that be what walkTokens is for? |
Maybe we could use walkTokens. I'm a little hesitant to break the token function apart further, but that might make sense. Hm... What about handling it in the Renderer? The token can organize all the text info but the Renderer decides how to render it based on what the children look like? Meh... |
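A rough sketch of the walkTokens idea, in case it helps the discussion: a second pass over the finished token tree that sets a parent property from its children. The walker below is a stand-in written for this example (its traversal order is a choice of the sketch, not necessarily what marked does), and `containsLink` is a made-up property.

```javascript
// Hypothetical stand-in for a token walker; visits children before the parent.
function walkTokensSketch(tokens, callback) {
  for (const token of tokens) {
    if (Array.isArray(token.tokens)) {
      walkTokensSketch(token.tokens, callback);
    }
    callback(token);
  }
}

// Example tree, shaped like the output of a lexer that already ran the inline pass.
const tree = [
  {
    type: 'paragraph',
    tokens: [
      { type: 'text', text: 'See ' },
      { type: 'link', href: 'https://example.com', text: 'docs' }
    ]
  },
  { type: 'paragraph', tokens: [{ type: 'text', text: 'No links here.' }] }
];

// Set a parent property based on what its children look like.
walkTokensSketch(tree, (token) => {
  if (token.type === 'paragraph') {
    token.containsLink = token.tokens.some((child) => child.type === 'link');
  }
});
```

This keeps token creation and the child-dependent decision in separate steps, which is the split the comment above is hesitant about.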
I updated the documentation. Double-check it for typos if you like. I think everything else up to this point is resolved now? |
Nice work! 💯
Do we want to close #2126 since this PR also removes the nptable tokenizer? Otherwise this one will have to be rebased after that one is merged. |
Yep, did it just now. 👍 |
#2112 might need tweaking now, but it should go out in the same major version bump after this is merged. |
@davisjam @joshbruce @styfle Is anyone able to take a look at this one? I'm eager to get this one merged! |
Hm, do we need to add to the documentation mention of the |
There are the following breaking changes that I can think of in this PR:
Am I missing any? |
That wouldn't hurt but it could be done in a separate PR. |
Just that some function signatures have changed. If people were overwriting any of those functions they might not work anymore. I guess that's kind of covered by the "tokenizers lex their own child tokens" though. Edit: never mind, you got those listed already. |
One minor thing is the naming of Might make it easier for users to understand when to use each one. |
I don't think We could change them to |
🎉 This PR is included in version 3.0.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
Based on the conversation in #2112 (comment), this is the start of an attempt to have the Tokenizers handle the lexing of their own children tokens, rather than making the Lexer.js do it.
For block tokens this was relatively simple. For inline tokens it's also not a huge issue, except for the ugliness that comes with passing in inRawBlock and inLink to a bunch of the Tokenizers since it kind of muddies up the legibility in what the Tokenizers are actually doing. Passing those values around seems like a code smell we could avoid but I don't know how, or how much those variables actually need to be passed around. Any thoughts? I wanted to get some early feedback before going through the whole thing.
Edit: What about refactoring the inLink and inRawBlock flags to instead be properties of the Lexer? I.e. in the constructor:
And a second question: do we want the logic of inline() from Lexer.js to also be handled by the Tokenizers themselves?
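As a sketch of the Edit above, this is one way the flags could live on the Lexer so that tokenizers read and flip them there instead of receiving them as arguments. The `state` object shape, the class names, and the three toy tokenizer rules are assumptions made for illustration only.

```javascript
// Hypothetical sketch; not the actual marked Lexer or Tokenizer.
class StateTokenizer {
  constructor(lexer) {
    this.lexer = lexer;
  }

  // Raw HTML tag: flips the shared flag instead of returning it to the caller.
  tag(src) {
    const cap = /^<\/?[a-z]+[^>]*>/i.exec(src);
    if (!cap) return undefined;
    if (/^<a /i.test(cap[0])) {
      this.lexer.state.inLink = true;
    } else if (/^<\/a>/i.test(cap[0])) {
      this.lexer.state.inLink = false;
    }
    return { type: 'html', raw: cap[0] };
  }

  // Bare URL: skipped while already inside a link, with no inLink parameter.
  url(src) {
    if (this.lexer.state.inLink) return undefined;
    const cap = /^https?:\/\/[^\s<]+/.exec(src);
    if (!cap) return undefined;
    return { type: 'link', raw: cap[0], href: cap[0] };
  }

  // Fallback: one character, then everything up to the next possible tag or URL start.
  text(src) {
    const cap = /^[\s\S][^<h]*/.exec(src);
    return { type: 'text', raw: cap[0], text: cap[0] };
  }
}

class StateLexer {
  constructor() {
    // Assumed shape: flags are lexer properties, visible to every tokenizer.
    this.state = { inLink: false, inRawBlock: false };
    this.tokenizer = new StateTokenizer(this);
  }

  inlineTokens(src) {
    const tokens = [];
    while (src) {
      const token =
        this.tokenizer.tag(src) || this.tokenizer.url(src) || this.tokenizer.text(src);
      src = src.slice(token.raw.length);
      tokens.push(token);
    }
    return tokens;
  }
}

const stateLexer = new StateLexer();
const stateTokens = stateLexer.inlineTokens(
  '<a href="#">https://a.example</a> https://b.example'
);
const autolinks = stateTokens.filter((t) => t.type === 'link');
```

Only the second URL becomes a link token, because the first one is seen while `state.inLink` is true. Extensions holding a reference to the lexer could read the same properties.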