MyST Specification (IN-DEVELOPMENT)

A CommonMark compliant AST format for the MyST specification.

This is some initial work on a specification for the MyST syntax.

The package currently contains functions to:

Convert CommonMark to mdast, via parsing with Markdown-it
Convert the mdast to CommonMark compliant HTML (tested against https://spec.commonmark.org/0.30/spec.json)

$ pip install .
$ myst-spec --help
usage: myst-spec [-h] COMMAND ...

MyST Specification tools.

optional arguments:
  -h, --help  show this help message and exit

Commands:
    to-mdast  Convert CommonMark to MDAST JSON.
    to-html   Convert CommonMark to HTML.

$ echo "hallo" | myst-spec to-mdast
{"type": "root", "children": [{"type": "paragraph", "position": {"start": {"line": 1, "column": 1}, "end": {"line": 2, "column": 1}}, "children": [{"type": "text", "value": "hallo"}]}]}

$ echo "hallo" | myst-spec to-html
<p>hallo</p>

This can then be extended, to include the MyST syntax nodes.
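As a rough illustration of how a consumer might process the JSON shown above, here is a minimal sketch that renders it back to HTML using only the standard library. The `render_html` function is hypothetical and is not the package's implementation; it handles only `paragraph` and `text` nodes, and it assumes that a node of an unknown type (for example a future MyST node) can be handled by emitting just its children.

```python
import html
import json

# The MDAST JSON printed by `myst-spec to-mdast` in the example above.
MDAST_JSON = (
    '{"type": "root", "children": [{"type": "paragraph", "position": '
    '{"start": {"line": 1, "column": 1}, "end": {"line": 2, "column": 1}}, '
    '"children": [{"type": "text", "value": "hallo"}]}]}'
)


def render_html(node: dict) -> str:
    """Render a tiny subset of mdast node types to HTML (illustrative only)."""
    node_type = node.get("type")
    if node_type == "text":
        # Escape only &, < and >; this sketch makes no claim of CommonMark compliance.
        return html.escape(node.get("value", ""), quote=False)
    inner = "".join(render_html(child) for child in node.get("children", []))
    if node_type == "paragraph":
        return f"<p>{inner}</p>\n"
    # "root" and any node type this sketch does not know: emit only the children.
    return inner


print(render_html(json.loads(MDAST_JSON)), end="")
```

Because the input is plain JSON, the same walk could be written in any language that has a JSON parser.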

The CommonMark Specification

The creation of commonmark-spec represented a great step forward in Markdown standardisation. However, the current specification only specifies the expected HTML output, which conflates two aspects of markup language processing:

The reading of the source input
The writing of the output format

There are other aspects of Markdown processing that would benefit from such a specification, such as:

Output to formats other than HTML
Syntax highlighting of the source text
Language Server Protocol integration

This would promote interoperability between different implementations for reading and processing of Markdown.

Note, there is an open issue (#274) suggesting an XML specification, but this discussion has not been revisited since 2017.

Design decisions

The format should be language agnostic.
- A program written in any programming language should be able to generate the AST, then offload to a different language for processing.
The format should be extensible.
- The format should allow for new syntax types to be added, and not be hard-coded to only the CommonMark types.
- Not all processors may be able to handle extended syntax types, but they should be able to "fail gracefully".
- An example of this would be to allow for the GitHub Flavored Markdown extensions.
The AST format should be lossless.
- The AST should be able to be converted back to the source text, without loss of syntax information.
- Note, this does not mean that round-trip conversion should be "byte equivalent", only that re-parsing the regenerated text will produce the same AST again.
- Line/column information, for example, would not be preserved.
The format should allow incremental parsing.
- This would allow for sub-parsing of a modified document, without having to re-parse the entire document.
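The "lossless" criterion above can be sketched as a comparison that ignores position data. This is only one possible reading of the design decision: `strip_positions` and `same_syntax` are hypothetical helper names, and the sketch assumes that source locations live under a `position` key, as in the `to-mdast` output shown earlier.

```python
from typing import Any


def strip_positions(node: Any) -> Any:
    """Return a copy of an AST with every "position" key removed."""
    if isinstance(node, dict):
        return {k: strip_positions(v) for k, v in node.items() if k != "position"}
    if isinstance(node, list):
        return [strip_positions(item) for item in node]
    return node


def same_syntax(ast_a: Any, ast_b: Any) -> bool:
    """Compare two ASTs while ignoring line/column information."""
    return strip_positions(ast_a) == strip_positions(ast_b)


def paragraph(text: str, line: int) -> dict:
    """Build a one-paragraph AST starting on the given line (test data only)."""
    return {
        "type": "root",
        "children": [
            {
                "type": "paragraph",
                "position": {
                    "start": {"line": line, "column": 1},
                    "end": {"line": line + 1, "column": 1},
                },
                "children": [{"type": "text", "value": text}],
            }
        ],
    }


# Same syntax at a different source location still counts as a round trip.
print(same_syntax(paragraph("hallo", 1), paragraph("hallo", 3)))
# Different text does not.
print(same_syntax(paragraph("hallo", 1), paragraph("hello", 1)))
```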

Inspiration is also taken from:

Markdown-it tokens
Docutils doctrees
Pandoc JSON AST
Language Server Protocol text documents (https://microsoft.github.io/language-server-protocol/specifications/specification-current/#textDocuments)
agoose77/jupyterlab-markup#12

Markdown-It to MDAST

Markdown-it-py is used as the parser here, since it is what we currently use for MyST-Parser. It is the best Python Markdown parser I know of:

It is pure Python
It is fast
It is CommonMark compliant
It captures source line number information
It is easy to extend by plugins

However, it is not an ideal reference implementation, since it does not capture source column positions (currently we always set the column to 1) or line information for individual inline nodes. The conversion here is also not currently supported by the markdown-it JavaScript implementation, since it relies on the store_labels and inline_definitions options, which are only implemented in markdown-it-py.
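To make the position limitation concrete, here is a minimal sketch of how a block token's line map could be turned into an mdast-style position. It assumes the markdown-it convention that `map` is a pair of 0-based line numbers with an exclusive end; the `map_to_position` helper is hypothetical, and the mapping is inferred from the `to-mdast` example output rather than taken from the package source.

```python
from typing import List


def map_to_position(token_map: List[int]) -> dict:
    """Convert a 0-based, end-exclusive [start, end] line map to 1-based positions."""
    start_line, end_line = token_map
    return {
        # No column data is available from the parser, so the column is always 1.
        "start": {"line": start_line + 1, "column": 1},
        "end": {"line": end_line + 1, "column": 1},
    }


# A one-line paragraph has the map [0, 1].
print(map_to_position([0, 1]))
```

For a single-line paragraph this gives a start of line 1, column 1 and an end of line 2, column 1, which matches the position in the "hallo" example.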

Notes

A general issue with CommonMark is that (inline) link/image references are only recognised if the (block-level) definitions have already been parsed. This is an issue for incremental parsing, since we would need to parse all the definitions first if we were to allow them at "any level".
docutils records the source for every node, since it may differ from the parent document when the include directive is used.
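The ordering problem with link reference definitions can be sketched as a two-pass process: collect every definition first, then resolve the references. This is a deliberately simplified illustration and not a CommonMark-compliant parser: the regular expressions only recognise single-line definitions and full `[text][label]` references, and all function names are hypothetical.

```python
import re

# Simplified patterns: one-line "[label]: url" definitions and "[text][label]" references.
DEFINITION = re.compile(r"^\[([^\]]+)\]:\s*(\S+)\s*$")
REFERENCE = re.compile(r"\[([^\]]+)\]\[([^\]]+)\]")


def normalize(label: str) -> str:
    """Collapse whitespace and fold case, so that "CM" and "cm" match."""
    return " ".join(label.split()).casefold()


def collect_definitions(lines):
    """Pass 1: gather every definition in the document."""
    definitions = {}
    for line in lines:
        match = DEFINITION.match(line)
        if match:
            # The first definition of a label takes precedence.
            definitions.setdefault(normalize(match.group(1)), match.group(2))
    return definitions


def resolve_references(lines):
    """Pass 2: resolve references against the complete set of definitions."""
    definitions = collect_definitions(lines)
    resolved = []
    for line in lines:
        if DEFINITION.match(line):
            continue
        for text, label in REFERENCE.findall(line):
            url = definitions.get(normalize(label))
            if url is not None:
                resolved.append((text, url))
    return resolved


source = [
    "See [the spec][cm] for details.",
    "",
    "[CM]: https://spec.commonmark.org/",
]
print(resolve_references(source))
```

Here the reference on the first line resolves only because the whole document was scanned for definitions beforehand, which is exactly what an incremental parser would like to avoid.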
