[Emphasis/strong] Consolidate issues #1036
I apologize in advance for the following rant. Generally speaking, our emphasis parsing is quite broken with respect to CommonMark. The fact is, the spec is really a p. of s., because they wrote it with the goal of parsing every possible combination of multiple, nested, escaped, or unescaped underscores and asterisks that any monkey could type on a keyboard. So much so that they came up with idiotic examples such as:
So, really, really enlightening stuff.
We have a couple alternatives:
Now, marked was built with speed in mind. The bulk of the parsing is done by regex rules, and they should be as "greedy" as possible: no backtracking, and very little to no manual fiddling with the source. If we went with alternative 2, we'd have to break this "rule". There's no way to parse emphasis the way the spec dictates using only regular expressions; my guess is that a recursive algorithm would be needed. I can't really predict its performance, and I don't want to write it only to scrap it because it's too slow or causes a stack overflow on pathologically long inputs. |
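For reference, the CommonMark spec's own parsing-strategy appendix does not use recursion for this: it scans delimiter runs (`*`/`_`) left to right and pairs each potential closer with the nearest matching opener on a delimiter stack ("process emphasis"). Below is a heavily simplified TypeScript sketch of that idea. It is not marked's code and not the full spec: it assumes whitespace-only flanking checks and ignores punctuation, escapes, the "multiple of 3" rule, intraword `_`, and HTML escaping. The names `tokenize` and `renderEmphasis` are made up for illustration.

```typescript
// Simplified sketch of CommonMark-style delimiter-run pairing (not marked's code).
// Assumptions: only `*` and `_`; flanking is judged by whitespace alone;
// no escapes, no "multiple of 3" rule, no intraword `_` rule, no HTML escaping.

type TextToken = { kind: "text"; value: string };
type DelimToken = {
  kind: "delim";
  char: string;
  count: number;       // delimiters not yet used; rendered literally
  canOpen: boolean;
  canClose: boolean;
  openTags: string[];  // emitted after the literal remainder
  closeTags: string[]; // emitted before the literal remainder
};
type Token = TextToken | DelimToken;

const isDelimChar = (c: string): boolean => c === "*" || c === "_";
const isSpace = (c: string): boolean => /\s/.test(c);

// Split the source into plain text and runs of identical delimiters.
function tokenize(src: string): Token[] {
  const out: Token[] = [];
  let i = 0;
  while (i < src.length) {
    let j = i;
    if (isDelimChar(src[i])) {
      while (j < src.length && src[j] === src[i]) j++;
      const before = i === 0 ? " " : src[i - 1];
      const after = j === src.length ? " " : src[j];
      out.push({
        kind: "delim", char: src[i], count: j - i,
        canOpen: !isSpace(after), canClose: !isSpace(before),
        openTags: [], closeTags: [],
      });
    } else {
      while (j < src.length && !isDelimChar(src[j])) j++;
      out.push({ kind: "text", value: src.slice(i, j) });
    }
    i = j;
  }
  return out;
}

function renderEmphasis(src: string): string {
  const tokens = tokenize(src);
  for (let c = 0; c < tokens.length; c++) {
    const closer = tokens[c];
    if (closer.kind !== "delim" || !closer.canClose) continue;
    // A closer may match several openers, e.g. `***x***` -> strong, then em.
    while (closer.count > 0) {
      let o = c - 1;
      while (o >= 0) {
        const t = tokens[o];
        if (t.kind === "delim" && t.canOpen && t.char === closer.char && t.count > 0) break;
        o--;
      }
      if (o < 0) break; // no opener: the rest of the run stays literal
      const opener = tokens[o] as DelimToken;
      // Strong if both runs have at least two delimiters left, em otherwise.
      const used = opener.count >= 2 && closer.count >= 2 ? 2 : 1;
      const tag = used === 2 ? "strong" : "em";
      opener.count -= used;
      closer.count -= used;
      opener.openTags.unshift(`<${tag}>`); // later matches wrap earlier ones
      closer.closeTags.push(`</${tag}>`);
      // Delimiters between the pair can no longer open or close anything.
      for (let k = o + 1; k < c; k++) {
        const t = tokens[k];
        if (t.kind === "delim") {
          t.canOpen = false;
          t.canClose = false;
        }
      }
    }
  }
  return tokens
    .map((t) =>
      t.kind === "text"
        ? t.value
        : t.closeTags.join("") + t.char.repeat(t.count) + t.openTags.join("")
    )
    .join("");
}
```

With `*test **test** test*` this yields `<em>test <strong>test</strong> test</em>`, and `**foo*` yields `*<em>foo</em>`, as the spec expects. Pairing is a single left-to-right pass with a short backward search, so there is no regex backtracking and no recursion depth to overflow; the backward search is what the spec's `openers_bottom` bookkeeping exists to bound on pathological input.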
Agreed. Emphasis and strong are the tough ones, because the overemphasis possibilities from the HTML spec don't translate well to the plain-text world, as you point out. I mean, what is this supposed to end up as:
I also agree on the beauty and elegance of Markdown and, honestly, I can't imagine someone designing a written document that would have more than three consecutive asterisks or underscores. Bold, italic, bold + italic.
At which point, maybe we can take a page from one of Kevlin Henney's FizzBuzz examples?? Do all the asterisks or underscores as
I'm not a big regex person, so I'm not sure how to help here. I thought regex had a recurse-type option. For this, I'm definitely leaning toward solving the most likely cases (one, two, three) and giving a caveat in the docs...if someone can solve that problem while maintaining the spirit and principles of Marked, great; otherwise, some of those cases in the spec just seem like someone making things complicated based on the HTML spec. |
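If we go the "one, two, three" route, a hypothetical rule could look like the sketch below. It only accepts an opening run of 1 to 3 identical delimiters, non-space content with no delimiters inside, and the same run closing it, mapping the lengths to `<em>`, `<strong>`, and `<em><strong>` (the nesting CommonMark produces for `***x***`). `TAGS`, `SIMPLE_EMPHASIS`, and `renderSimpleEmphasis` are illustrative names; anything longer or nested is left as literal text, which is what the docs caveat would have to describe.

```typescript
// Hypothetical "one, two, three" rule: equal-length runs only, no nesting, no escapes.

// Wrapper tags for a run of 1, 2 or 3 identical delimiters
// (3 nests like CommonMark's output for `***x***`).
const TAGS: Record<number, [string, string]> = {
  1: ["<em>", "</em>"],
  2: ["<strong>", "</strong>"],
  3: ["<em><strong>", "</strong></em>"],
};

// (^|[^*_])        start of input or a non-delimiter character (kept as-is)
// (\*{1,3}|_{1,3}) opening run of 1-3 identical delimiters
// ([^*_\s]...)     content with no delimiters that doesn't start/end with space
// \2(?![*_])       the same run again, not followed by more delimiters
const SIMPLE_EMPHASIS =
  /(^|[^*_])(\*{1,3}|_{1,3})([^*_\s](?:[^*_]*[^*_\s])?)\2(?![*_])/g;

function renderSimpleEmphasis(src: string): string {
  return src.replace(
    SIMPLE_EMPHASIS,
    (_match: string, pre: string, run: string, body: string): string => {
      const [open, close] = TAGS[run.length];
      return pre + open + body + close;
    }
  );
}
```

For example, `renderSimpleEmphasis("**bold** and *it*")` gives `<strong>bold</strong> and <em>it</em>`, while the nested `*test **test** test*` only gets its inner `<strong>` and keeps the outer asterisks literal.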
?what's this about? 🤔
Well, there are different implementations of regexes, and almost none of them support recursion. I'd like to give the possible recursive algorithm some more thought and then decide whether to implement it or just monkey-patch regexes until all the test cases we deemed interesting pass (but I'd rather have a clean recursive algorithm than an ugly two-line regex that no one can understand, not even me). Now that I think about it: when are we going to tackle performance? Have you got a milestone in mind for that? I'd do it after fixing issues and improving the architecture, but not too late in the process. I think our goal should be to take on commonmark.js and especially markdown-it in terms of performance and extensibility. They're both fast, but with #975 we still have an edge on them, even after my work on CommonMark compliance, which usually complicates the rules. After that, I think it would be worth doing some work on profiling the code and analyzing hot spots. |
Roman Numerals, not FizzBuzz, sorry: https://youtu.be/nrVIlhtoE3Y?t=38m18s (I was thinking of the last one, where you get all the ones and convert in passes...sarcasm was heavy in that). The question of performance and things is a good one. I think once we get into developer experience on extensibility, the milestones will go away and the general idea will be to always be doing all of them continuously: fixing defects, making minor architecture changes, and improving the developer experience, which I put performance in with...faster processing == happier developers. :) (We do have that one PR we've been sitting on that has an apparent nice performance boost; I just want to make sure it's working consistently as expected.) Thanks for all the work you've been putting into this, by the way. You're awesome. |
P.S. It would be nice if our benches could display the entire call stack with the times for each step...that way we can actually target areas for optimization instead of guessing at it. (A bit more empirical.) Maybe regex isn't the way to go in some cases, because it can be slower than a method-based approach, for example. Roping in @UziTech on that score since it's in the realm of testing and whatnot. As to taking someone else on - I tend to not worry about other people and just make myself the best version I can using the knowledge and skills I acquire over time. (I lean heavily on the collaborative side of Game Theory as opposed to the competitive. Not saying everyone else should, but between me wanting to help make Marked awesome by itself and others wanting to make it better compared to others...we should be good to go.) :) |
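To make the regex-versus-method question empirical rather than a guess, one option is to time two implementations of the same small task side by side. The sketch below assumes a hypothetical helper that measures the length of a leading `*`/`_` run; `timeIt`, `runLengthRegex`, and `runLengthLoop` are illustrative names, and no particular result is implied, since numbers vary by engine and input.

```typescript
// Hypothetical micro-benchmark: the same task done with a regex and with a loop.
// Task: length of the leading run of `*` or `_` characters (0 if none).

const LEADING_RUN = /^(\*+|_+)/;

function runLengthRegex(s: string): number {
  const m = LEADING_RUN.exec(s);
  return m ? m[1].length : 0;
}

function runLengthLoop(s: string): number {
  if (s.length === 0) return 0;
  const first = s.charCodeAt(0);
  if (first !== 42 /* '*' */ && first !== 95 /* '_' */) return 0;
  let i = 1;
  while (i < s.length && s.charCodeAt(i) === first) i++;
  return i;
}

// Runs `fn` over every input `rounds` times and reports wall-clock milliseconds.
function timeIt(
  label: string,
  fn: (s: string) => number,
  inputs: string[],
  rounds = 10000
): number {
  let sink = 0; // accumulate results so the calls are less likely to be optimized away
  const start = performance.now();
  for (let r = 0; r < rounds; r++) {
    for (const s of inputs) sink += fn(s);
  }
  const ms = performance.now() - start;
  console.log(`${label}: ${ms.toFixed(2)} ms (checksum ${sink})`);
  return ms;
}

const samples = ["***strong emph***", "__x__", "plain text", "*a*"];
timeIt("regex", runLengthRegex, samples);
timeIt("loop", runLengthLoop, samples);
```

A single `performance.now()` loop like this is only a rough signal: JIT warm-up and garbage collection skew short runs, so a dedicated benchmark harness or the engine's profiler (e.g. Node's `--prof`) would be needed for the per-function breakdown described above.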
By "taking on" I meant trying to do better than them. Probably a language mishap on my part there :)
|
lol - I think I was picking up what you were putting down there. Maybe the language failing is on my side (despite language used in #963). I think the comparison piece is more an after-the-fact discovery from just trying to make Marked the greatest parser it can be, not something I target others for during the process; it's not the thing that drives me. What drives me is making Marked awesome and making the team, such as it is, more awesome every day. :) |
Yep. I'm not trying to get into a "competition" over who can do more and faster, but I like having a baseline to compare my work with. When I look at the alternatives I see a bunch of people doing amazing work, so if I'm going to keep contributing to marked, I think it should be worth my time.
|
Agree wholeheartedly! |
Sorry I haven't been as active in this project lately. I'm finishing up a big project at work and you know how it goes (90% of the project takes 90% of the time and the other 10% of the project takes up the other 90% of the time) so I should have more time in the near future. As far as benchmarks go, I think it would be a great idea to drill down into each benchmark test for some more insight into specific cases. Here is my idea of priorities:
I feel like we are still on I really like @Feder1co5oave's comment on #659 (comment) about not adding features that can be implemented by overriding the renderer. I would also like to see some of the options that could be implemented by overriding the renderer removed in v1.0 (e.g. |
@UziTech - The three of us are definitely on the same page, I think. Not sure how far out of the loop you've been - totally understandable (had a hiccup like that myself) - so, we have the three milestones; 0.4.0 is definitely number two in the list you provided. I do try to make sure the tests pass and the benches don't change dramatically before merging. The 0.5.0 milestone is items 4 and 5 in the list. I think this is when we would start implementing some of these things - if we can even do it; I know it's possible, just not sure with the overall setup. Definitely agree that we should consider deprecating some of these added features in favor of a more coherent extensibility strategy. My inner product owner (and UX brain) would love to find out how many people are using which features...and make sure we have a transition strategy laid out for them. The notion of deep-dives into the benches would also allow us to empirically demonstrate to the community of users why we are making some of these decisions, thereby reducing feelings that we're just doing it because we want to. I think we would be using the 0.6.0 milestone for a lot of the performance-related decisions moving forward. Then, once things have calmed down, go ahead and make a 1.0 release. By then we should have the biggest hurdles overcome, a stable architecture going into the foreseeable future that is easy to change in the future, and a crystallized product vision and scope. Thank you both so much. This is actually proving to be one of my favorite projects. I'm glad we've been able to revive it and work pretty seamlessly. (Was thinking it might be worthwhile to put together a Skype or Hangouts session going into 0.5.0 - there are gonna be some major changes there, and it might be good to use something other than text-based communication. Could tie in @Nalinc as well from the #746 issue. Just a thought and no pressure.) |
I was looking at an issue we were having with a BoldBoldItalicItalic combination. That is, "Bold" next to "BoldItalic" next to "Italic" or The following
|
@robertwbradford thanks for reporting this. It is one of those cases where CommonMark parsers get it right and older, Markdown.pl-inspired ones get it wrong: try it. However, just replacing the bin/marked
*test **test** test*
_test __test__ test_
<p><em>test </em><em>test</em><em> test</em></p>
<p><em>test </em><em>test</em><em> test</em></p>
whereas it should be
<p><em>test <strong>test</strong> test</em></p>
<p><em>test <strong>test</strong> test</em></p> |
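The reported output has exactly the shape a lazy, non-recursive pairing rule produces. The hypothetical rule below (not marked's actual inline grammar) closes each `*x*` or `_x_` span at the first matching delimiter it meets, so the inner `**` gets split across neighbouring `<em>` spans instead of opening a `<strong>`:

```typescript
// Hypothetical naive rule (NOT marked's actual grammar): a lazy match between
// single delimiters, applied left to right with no notion of run length or nesting.
function naiveEmphasis(src: string): string {
  return src
    .replace(/\*(.+?)\*/g, "<em>$1</em>")
    .replace(/_(.+?)_/g, "<em>$1</em>");
}

console.log(naiveEmphasis("*test **test** test*"));
// -> <em>test </em><em>test</em><em> test</em>
console.log(naiveEmphasis("_test __test__ test_"));
// -> <em>test </em><em>test</em><em> test</em>
```

Getting `<strong>` nested inside `<em>` instead requires pairing runs by length, as CommonMark's delimiter rules do.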
No need to post that on every single issue.
|
@Feder1co5oave: Agreed. I'm actually trying to work up the case for moving the main repo to a different owner and demonstrate how effectively we are working as a team to make Marked awesome again...this gives the players involved a central starting place - I just wish I hadn't marked some of the other issues that aren't directly related to our efforts as a team. |