How can I write custom parser to convert to link with prefix? #385
Hello, and thank you for this amazing library. I am trying out the arena parser by adding hashtag link support to the Markdown. My approach is to convert text starting with a hashtag into a link, like this:

```rust
use comrak::{
    format_html,
    nodes::{AstNode, NodeValue},
    parse_document, Arena, Options,
};
use once_cell::sync::Lazy;
use regex::Regex;

pub static HASHTAG_REGEX: Lazy<Regex> = Lazy::new(|| Regex::new(r"(?m)#([\w\d_]+)").unwrap());

// Visit `node` and all of its descendants, depth first.
fn iter_nodes<'a, F>(node: &'a AstNode<'a>, f: &F)
where
    F: Fn(&'a AstNode<'a>),
{
    f(node);
    for c in node.children() {
        iter_nodes(c, f);
    }
}

fn main() {
    let arena = Arena::new();
    let root = parse_document(
        &arena,
        "# welcomeee\n\n[#welcome_to_github](https://github.com/grindarius)\n\nmy guy",
        &Options::default(),
    );

    // Rewrite the contents of every Text node in place.
    iter_nodes(root, &|node| {
        if let &mut NodeValue::Text(ref mut text) = &mut node.data.borrow_mut().value {
            let original = std::mem::replace(text, "".to_string());
            let tagged_text =
                HASHTAG_REGEX.replace_all(&original, "<a href=\"/hashtags/$1\">#$1</a>");
            *text = tagged_text.to_string();
        }
    });

    let mut html: Vec<u8> = vec![];
    format_html(root, &Options::default(), &mut html).unwrap();
    println!("{}", String::from_utf8(html).unwrap());
}
```

The problem is with the output of this code for the given root string: it looks like comrak escaped the inserted HTML again (which I think is correct). So my question is: how can I create a custom parser and parse parts of the …
Replies: 2 comments 1 reply
This will be relatively complicated to do with a post-processing step on the AST itself: you'd need to split the `Text` node into separate `Text` and `Link` nodes within the container, rather than just modifying the `Text` node itself. How about preprocessing the Markdown input instead?

```rust
use comrak::{format_html, parse_document, Arena, Options};
use once_cell::sync::Lazy;
use regex::Regex;
use std::error::Error;

pub static HASHTAG_REGEX: Lazy<Regex> = Lazy::new(|| Regex::new(r"(?m)#([\w\d_]+)").unwrap());

fn main() -> Result<(), Box<dyn Error>> {
    let arena = Arena::new();
    let doc = "# welcomeee\n\n#welcome\n\nmy guy";

    // Turn each hashtag into a Markdown link before comrak sees the input.
    let tagged_doc = HASHTAG_REGEX.replace_all(doc, "[#$1](/hashtags/$1)");
    let root = parse_document(&arena, &tagged_doc, &Options::default());

    let mut html = vec![];
    format_html(root, &Options::default(), &mut html)?;
    println!("{}", String::from_utf8(html)?);
    Ok(())
}
```

Output:

```html
<h1>welcomeee</h1>
<p><a href="/hashtags/welcome">#welcome</a></p>
<p>my guy</p>
```

This lacks some of the safety of the more complicated solution: the regex could match in places you might not want it to. You could compromise halfway by first parsing, then replacing within …
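For the AST route described at the start of this reply, the core step is cutting one run of text into alternating plain-text and link pieces. Below is a minimal sketch of just that step, using only the standard library. `Segment` and `split_hashtags` are hypothetical names invented for illustration and are not part of comrak. `Segment::Text` and `Segment::Link` stand in for what would become `Text` and `Link` nodes, and the character test only approximates the regex's `\w` class. Wiring the result back into the tree is left out, since the exact node-construction API depends on the comrak version in use.

```rust
/// Hypothetical stand-in for the two node kinds involved (not a comrak type).
#[derive(Debug, PartialEq)]
enum Segment {
    Text(String),
    Link { url: String, label: String },
}

/// Split `text` into plain-text and hashtag-link segments.
/// A hashtag here is `#` followed by one or more alphanumeric or `_` characters.
fn split_hashtags(text: &str) -> Vec<Segment> {
    let mut segments = Vec::new();
    let mut plain = String::new();
    let mut rest = text;

    while let Some(pos) = rest.find('#') {
        let after = &rest[pos + 1..];
        // Byte length of the tag name that follows the '#'.
        let tag_len = after
            .char_indices()
            .find(|&(_, c)| !(c.is_alphanumeric() || c == '_'))
            .map_or(after.len(), |(i, _)| i);

        if tag_len == 0 {
            // A lone '#' (e.g. followed by a space) stays plain text.
            plain.push_str(&rest[..=pos]);
            rest = after;
            continue;
        }

        // Flush the plain text collected before this hashtag.
        plain.push_str(&rest[..pos]);
        if !plain.is_empty() {
            segments.push(Segment::Text(std::mem::take(&mut plain)));
        }

        let tag = &after[..tag_len];
        segments.push(Segment::Link {
            url: format!("/hashtags/{tag}"),
            label: format!("#{tag}"),
        });
        rest = &after[tag_len..];
    }

    // Whatever follows the last hashtag is plain text.
    plain.push_str(rest);
    if !plain.is_empty() {
        segments.push(Segment::Text(plain));
    }
    segments
}

fn main() {
    for segment in split_hashtags("hello #welcome my guy") {
        println!("{segment:?}");
    }
}
```

Because this would run on the contents of individual `Text` nodes rather than on the raw Markdown, it never touches code spans or link destinations. That is the safety the preprocessing approach gives up.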
Wish we all had that.