forked from am-kantox/md

Stream-aware markdown parser with custom syntax setting

christianbourgeois/md

 
 

Stream markup parser, extendable, flexible, blazingly fast, with callbacks and more, ready for markdown…


Main Focus

This library is not yet another markdown parser; rather, it is a highly configurable and extendable parser for any custom markdown-like markup. It was created mostly to allow custom markdown syntax, like ^foo^ for superscript or ⇓bar⇓ for subscript. It also supports custom parsers for markdown-inspired constructs that generic parsers cannot handle, that is, anything more complex than what standard markdown provides.

The library provides callbacks for all the default syntax handlers, as well as for custom handlers, allowing on-the-fly modification of whatever is currently being processed.

Md parses the incoming stream once and keeps the state, producing an AST of the input document. It can recover from errors, collecting them as it goes.
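To make the "parse once, keep the state, collect errors" idea concrete, here is a toy sketch in plain Elixir. It is not Md's implementation: the ToyParser module, its struct fields, the `{tag, attributes, content}` node shape and the `:unclosed_brace` error atom are all assumptions made up for illustration, and only a single `*` brace is understood.

```elixir
defmodule ToyParser do
  # Toy single-pass parser (hypothetical, NOT part of Md).
  # The state carries the AST built so far, the currently open brace,
  # the text buffer, and the errors collected along the way.
  defstruct ast: [], open: nil, buffer: "", errors: []

  def parse(input) when is_binary(input) do
    input |> step(%__MODULE__{}) |> finish()
  end

  # Opening marker: flush the pending text and remember the open brace.
  defp step("*" <> rest, %__MODULE__{open: nil} = state) do
    step(rest, %{flush(state) | open: :b})
  end

  # Closing marker: wrap the buffered text into a {:b, nil, [...]} node.
  defp step("*" <> rest, %__MODULE__{open: :b} = state) do
    node = {:b, nil, [state.buffer]}
    step(rest, %{state | ast: [node | state.ast], buffer: "", open: nil})
  end

  # Any other character is appended to the buffer.
  defp step(<<char::utf8, rest::binary>>, state) do
    step(rest, %{state | buffer: state.buffer <> <<char::utf8>>})
  end

  defp step("", state), do: state

  defp flush(%__MODULE__{buffer: ""} = state), do: state
  defp flush(state), do: %{state | ast: [state.buffer | state.ast], buffer: ""}

  # End of input with an unclosed brace: record the error and recover
  # by treating the marker and the text after it as plain text.
  defp finish(%__MODULE__{open: :b} = state) do
    finish(%{
      state
      | buffer: "*" <> state.buffer,
        open: nil,
        errors: [:unclosed_brace | state.errors]
    })
  end

  # End of input: wrap everything into an outer paragraph node.
  defp finish(%__MODULE__{open: nil} = state) do
    state = flush(state)
    %{state | ast: [{:p, nil, Enum.reverse(state.ast)}]}
  end
end
```

Calling `ToyParser.parse("a *b* c")` walks the input exactly once; an input such as `"a *b"` still yields an AST, with the problem recorded in the `errors` field rather than raised.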

It currently does not support (and I frankly doubt it ever will) lists with embedded quotes, and other contrived syntax. If one needs to perfectly parse the common markdown, Md is probably not the correct choice.

But if one wants to easily extend syntax almost without limits, Md might be good.

Markup Handling

There are several different syntax patterns recognizable by Md. Those are:

  • custom — the custom parser implementing the Md.Parser behaviour will be called
  • substitute — simple substitution, like "<" → "&lt;"
  • escape — characters to be treated as is, not as a part of syntax
  • comment — characters to be treated as a comment, discarded in the output
  • flush — somewhat breaking a paragraph flow, like triple-dash
  • magnet — the markup for a single word following the pattern, like #tag
  • block — the whole block of input is treated specially, like triple-backtick
  • shift — the same as block, but the opening marker should precede each line and "\n" is treated as the closing marker
  • pair — the opening marker followed by closing marker, and a subsequent pair of opening and closing, like ![name](#anchor); the second element might be an internal shortcut to the deferred disclosure
  • disclosure — the disclosure of elements previously declared as pair with deferred parameter provided
  • paragraph — a header, blockquote, or such, followed by a paragraph flow break
  • list — a list, like - one\n- two
  • tag — allowed tags (e. g. <sup>2</sup>)
  • brace — a most common markdown feature, like text decoration or such (e. g. **bold**)

Syntax description

The syntax must be configured at compile time (because the parse/2 handlers are generated at compile time). It is a map with a settings key

settings: %{
  outer: :p,
  span: :span,
  empty_tags: ~w|img hr br|a
}

and key ⇒ list_of_tuples key-value pairs that provide a textual markup representation and its handling rules. Here is an excerpt from the default parser for braces

  brace: %{
    "*" => %{tag: :b},
    "_" => %{tag: :i},
    "**" => %{tag: :strong, attributes: %{class: "nota-bene"}},
    "__" => %{tag: :em},
    "~" => %{tag: :s},
    "~~" => %{tag: :del},
    "`" => %{tag: :code, mode: :raw, attributes: %{class: "code-inline"}}
  }

For more examples of which properties are allowed for each kind of handler, see the sources (for now).
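Since the syntax is plain data, it can be assembled with ordinary map functions before it is handed to a parser. The sketch below is a standalone illustration that calls no Md functions: the variable names are hypothetical, and the `:sup` and `:sub` tags for ^foo^ and ⇓bar⇓ are assumptions based on the examples mentioned in Main Focus. The changelog also lists Md.Parser.Syntax.merge/2; its exact semantics are not described here, so the sketch sticks to Map.merge/3 from the standard library.

```elixir
# Hypothetical base syntax: settings plus a couple of brace handlers.
base_syntax = %{
  settings: %{outer: :p, span: :span, empty_tags: ~w|img hr br|a},
  brace: [
    {"*", %{tag: :b}},
    {"_", %{tag: :i}}
  ]
}

# Custom additions (assumed tags): ^foo^ for superscript, ⇓bar⇓ for subscript.
custom_syntax = %{
  brace: [
    {"^", %{tag: :sup}},
    {"⇓", %{tag: :sub}}
  ]
}

# Plain Map.merge/2 would let the right-hand :brace list replace the
# left-hand one; Map.merge/3 lets us concatenate handler lists instead.
merged =
  Map.merge(base_syntax, custom_syntax, fn
    _key, left, right when is_list(left) and is_list(right) -> left ++ right
    _key, _left, right -> right
  end)
```

After the merge, `merged.brace` holds all four handlers and `merged.settings` is carried over from the base syntax unchanged.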

Predefined parsers

Md comes with a generic predefined parser Md.Parser.Default, which includes all the markup currently supported by Md.

A custom parser definition is usually based on the Md.Parser.Syntax.Void syntax, as shown below

defmodule MyParser do
  use Md.Parser

  alias Md.Parser.Syntax.Void

  @default_syntax Map.put(Void.syntax(), :settings, Void.settings())
  @syntax @default_syntax |> Map.merge(%{
    comment: [{"<!--", %{closing: "-->"}}],
    paragraph: [
      {"##", %{tag: :h2}},
      {"###", %{tag: :h3}},
      {">", %{tag: :blockquote}}
    ],
    list: [
      {"- ", %{tag: :li, outer: :ul}},
      {"+ ", %{tag: :li, outer: :ol}}
    ],
    brace: [
      {"*", %{tag: :b}},
      {"_", %{tag: :i}},
      {"~", %{tag: :s}},
      {"`", %{tag: :code, mode: :raw, attributes: %{class: "code-inline"}}}
    ]
  })
end

The @syntax module attribute must be declared, or the DSL used as shown below (separate declarations), or the syntax passed as an argument in a call to use Md.Parser. Separate declarations are collected and merged.

defmodule MyDSLParser do
  @my_syntax %{brace: [{"***", %{tag: "u"}}]}
  
  use Md.Parser, syntax: @my_syntax
  import Md.Parser.DSL

  comment "<!--", %{closing: "-->"}
  ...
end

Instead of @syntax module attribute, one might use

  • a parameter to use Md.Parser as use Md.Parser, syntax: map()
  • a DSL like paragraph {"#", %{tag: :h1}}.

Changelog

  • 0.9.4 adds class: "empty-anchor" to tags expecting attributes to be set, fixed 🐜 with nested brackets
  • 0.9.3 adds class: "empty-tag" to empty tags to allow their display suppression
  • 0.9.2 nested tags shallow support, default support for <dl>
  • 0.9.1 accept config :md, :httpc_options config used with Floki in TwitterCard/OG retrieval
  • 0.9.0 use unicode_set instead of string_naming by default for guards
  • 0.8.5 advanced terminators: in magnet
  • 0.8.4 Md.Parser.Syntax.merge/2
  • 0.8.2 walker: and parser: options in a call to generate/2
  • 0.8.0 Md.Transforms.Anchor supporting twitter/og cards via Floki
  • 0.7.4 configurable linebreaks (defaults: \n, \r\n)
  • 0.7.1 Md.Guards through StringNaming
  • 0.7.0 allow payload to be passed through within state
  • 0.6.8 xml_builder → xml_builder_ex (default formatting for code is now :none)
  • 0.6.0 allow HTML tags (without attributes yet)
  • 0.5.0 DSL + use Md.Parser, syntax: … for configurable syntax
  • 0.4.0 configurable syntax as @syntax
  • 0.3.0 relaxed support for comments and tables
  • 0.2.1 deferred references like in [link][1] followed by [1]: https://example.com somewhere
  • 0.2.0 PoC, most of reasonable markdown is supported

Installation

def deps do
  [
    {:md, "~> 0.1"}
  ]
end
