Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Roundtrip implementation #481

Merged
merged 124 commits into from
Mar 7, 2021
Merged

Roundtrip implementation #481

merged 124 commits into from
Mar 7, 2021

Conversation

generateui
Copy link

Implements a CST aka LST (Complete Syntax Tree aka Lossless Syntax Tree).
Please keep the following in mind:

  • some changes have broken Markdig parser
  • tabs not yet implemented
  • some hacky code here and there
  • performance not taken into consideration/no perf tests run on this

As such, this is a very modest first attempt to learn if a CST can be implemented in markdig while keeping the current focus on performance. I'd be happy to hear feedback about

  1. is having CST support desirable
  2. what are pain points currently in included changes
  3. any tips on moving forward

Any feedback is highly appreciated.

For context, the usecase for this is creating bots that automatically enhance and fix markdown documentation. Think creating links automatically, fixing links automatically, inserting generated blocks of markdown code, et cetera. This requires a markdown renderer that does not change any input markdown as much as possible in the output markdown.

@generateui generateui changed the title Cst implementation [WIP] Cst implementation Oct 3, 2020
@xoofx
Copy link
Owner

xoofx commented Oct 4, 2020

Indeed, Markdig was not built with that requirement but we have seen many cases where users would have loved to do roundtrip modifications to a Markdown document, so thanks for starting to work on this!

So far, the changes make senses. I will just add a couple of remarks and requirements:

  • It's a huge amount of work because storing precisely spaces information (e.g need to store precisely which spaces characters were used) in every details will be a daunting detailed task. Expect several weeks of work, full-time. I also can't participate to the development due to limited sparetime availability for this project. I won't be able to finish the PR if it is stopped in the middle, but I should be able to review it once in a while you have made progress.
  • Don't use the term Cst or Lst in the code. Similar to the fact that we are not using the term Ast in the code.
  • Focus on the CommonMark part before working on any extensions.
  • The NormalizeRenderer would be the MarkdownRenderer (couldn't use this name because of the interface name and it would have been confusing) that performs no modifications on the Markdown. Any normalization could be done in a separate pass on the Ast.
  • Because of performance, this feature must be optional. By default, when performing a Markdown to HTML, the code should not involve more allocations than it should and should stay within the same CPU usage. It will allocate a bit more for sure (many of the nodes will have to store for details), but we should try to minimize that. Ideally, this should not modify the performance drastically when turned off (e.g within 20%) while it can be e.g 2 times slower if it is turned on. It's important to keep that as a primary requirement because it will make your work more difficult but I wouldn't be able to merge it if we regress too much in terms of performance. This feature is only interesting for users that would like to do roundtrip modifications while the vast majority of users are just looking for the md2html feature.

I'm gonna add a few comments on the current PR. Ping me in the PR whenever you have made enough progress and you want another checkpoint.

Good luck with the work!

src/Markdig.Tests/TestCst.cs Outdated Show resolved Hide resolved
Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>
src/Markdig/Markdown.cs Outdated Show resolved Hide resolved
@generateui
Copy link
Author

generateui commented Mar 4, 2021

Update:

  1. docs
  2. NoBlocksFoundBlock
  3. handle review feedback
  4. TODO: RTP
  5. Assertion fail on a larger dataset (I cloned 500 repos, and got some assertion fails when parsing all markdown)
  6. fix CI
  7. optional: tree tests and IEqualityComparers
  8. optional: dive into some edge cases
  9. 🆕: Rename whitespace to trivia where possible
  10. fix ContainerBlocksBlockQuotes_Example210

🎉🥳 DONE! As far as I can see, all items are done and this PR is ready to merge.

@MihaZupan
Copy link
Collaborator

I'll do a once-over tomorrow.

Comment on lines +18 to +34
RoundTrip(value, expected);
}

// this method is copied intentionally to ensure all other tests
// do not unintentionally use the expected parameter
private static void RoundTrip(string markdown, string expected)
{
var pipelineBuilder = new MarkdownPipelineBuilder();
pipelineBuilder.EnableTrackTrivia();
MarkdownPipeline pipeline = pipelineBuilder.Build();
MarkdownDocument markdownDocument = Markdown.Parse(markdown, pipeline);
var sw = new StringWriter();
var rr = new RoundtripRenderer(sw);

rr.Write(markdownDocument);

Assert.AreEqual(expected, sw.ToString());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use TestRoundtrip.TestSpec instead?

/// </summary>
/// <param name="pipeline">The pipeline.</param>
/// <returns>he modified pipeline</returns>
public static MarkdownPipelineBuilder EnableTrackTrivia(this MarkdownPipelineBuilder pipeline)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static MarkdownPipelineBuilder EnableTrackTrivia(this MarkdownPipelineBuilder pipeline)
public static MarkdownPipelineBuilder TrackTrivia(this MarkdownPipelineBuilder pipeline)

Comment on lines +51 to +55
/// <summary>
/// True to parse trivia such as whitespace, extra heading characters and unescaped
/// string values.
/// </summary>
public bool TrackTrivia { get; set; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// <summary>
/// True to parse trivia such as whitespace, extra heading characters and unescaped
/// string values.
/// </summary>
public bool TrackTrivia { get; set; }
internal bool TrackTrivia { get; }

MarkdownPipeline should have no way of changing after creation.

/// string values.
/// </summary>
public bool TrackTrivia { get; set; }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this being used at all? I assume it can be removed.

Copy link
Collaborator

@MihaZupan MihaZupan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments.

There's a few outstanding comments that should also be resolved
#481 (comment)
#481 (comment) (missing Or)
#481 (comment)
#481 (comment)

Newline.cs should be renamed to NewLine.cs, NoBlocksFoundBlock.cs to EmptyBlock.cs

src/Markdig/Parsers/QuoteBlockParser.cs Outdated Show resolved Hide resolved
src/Markdig/Renderers/Roundtrip/CodeBlockRenderer.cs Outdated Show resolved Hide resolved
src/Markdig/Renderers/Roundtrip/ThematicBreakRenderer.cs Outdated Show resolved Hide resolved
src/Markdig/Syntax/IBlock.cs Outdated Show resolved Hide resolved
@generateui
Copy link
Author

Thanks for taking a look again, MihaZupan! There were quite some small things I could fix for the better.

Let's merge this one :)

@xoofx xoofx merged commit 8b48acc into xoofx:master Mar 7, 2021
@xoofx
Copy link
Owner

xoofx commented Mar 7, 2021

It's merged! Thanks for all the massive work @generateui that you put on this PR, really appreciated. 👍

@generateui
Copy link
Author

It's merged! Thanks for all the massive work @generateui that you put on this PR, really appreciated. 👍

w00t! It was a fun and lengthy challenge. Thanks for your collab, @xoofx and @MihaZupan!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants