Embedding Markup inside markup #60
Comments
I confess that I'm a little confused by this question. By default, everything inside a So I would have expected that your first “invalid code” example would generate an AST something like:
RakuAST::Doc::Block.new(
type => "code",
paragraphs => (
"1. an entity E<",
RakuAST::Doc::Markup.new(
letter => "B",
opener => "<",
closer => ">",
atoms => ( "this is fallback |" )
),
" raquo>\n2. an indexed item X<",
RakuAST::Doc::Markup.new(
letter => "B",
opener => "<",
closer => ">",
atoms => ( "this is display text |" )
),
"one, two>\n3. an alias A<",
RakuAST::Doc::Markup.new(
letter => "B",
opener => "<",
closer => ">",
atoms => ( "this is fallback |" )
),
"ALIAS_NAME>\n"
)
)
...which I would have thought would be relatively easy to render correctly. Is that not the AST you get? (BTW, I certainly agree that, if the example code were actual raw RakuDoc, rather than the contents of a |
@thoughtstream @lizmat See the snippet for what we get at the present:
|
Question: does the |
Your assumption is correct. An interior Though, of course, it might still have special meaning to another formatting code that is itself contained within the
|
Right, but Right? |
If it's actual independent RakuDoc source code, yes, that's correct: But the case we're dealing with here is when something like that is inside a
In this case, the entire construct is perfectly valid, because that's not actually an As far as RakuDoc is concerned, in terms of things that are special inside the
(where |
@thoughtstream I get the feeling that
I think that is correct code. The idea is to create a clickable link to header2 inside a code block, whilst also indexing the Alias. But no Alias is actually created. The renderer would process B L X and V, but not A. The question then is what happens when any of the letters BLXV are then taken out of the allow list? |
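@finanalyst, you raise some good points. My view of the matter is as follows...
Normally, everything inside a code block is treated as “something foreign to RakuDoc so we don’t care about the internal structure of it”. Which means those contents need not be parsed at all...just matched with a minimal .*?.
In other words, the default rules to parse a code block would be something like:
rule code-block {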
^^ \h* '=code' >>
<code-contents>
<blank-line>
|
^^ \h* '=for' 'code' <metaoption>* \n
<code-contents>
<blank-line>
|
^^ \h* '=begin' 'code' <metaoption>* \n
<code-contents>
^^ \h* '=end' 'code' \h* \n
}
token code-contents {
.*?
}
token blank-line {
^^ \h* $$
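}
But if an :allow is one of the metaoptions, then the value of that metaoption is supposed to configure the way contents are matched, so the parser needs to be somewhat more sophisticated. Something like:
rule code-block {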
^^ \h* '=code' >>
<code-contents>
<blank-line>
|
# Capture :allow option separately...
^^ \h* '=for' 'code' [ <allow-option> | <metaoption>]* \n
# Then pass the allowed values to the contents parser...
<code-contents($<allow-option><value>)>
<blank-line>
|
# And the same here...
^^ \h* '=begin' 'code' [ <allow-option> | <metaoption>]* \n
<code-contents($<allow-option><value>)>
^^ \h* '=end' 'code' \h* \n
}
token code-contents ($allowed = '') {
[
@($allowed.words) # Match any of the allowed format code letters
'<' # ...then the left delimiter
<code-contents($allowed)> # ...then any nested code contents
'>' # ...then the right delimiter
|
. # Or else any single non-special character
]*?
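}
In other words, after we parse an :allow option we pass that option’s values into the code-contents parser, so that it can parse those particular formatting codes specially.
Of course, in the real parser, the parsing of allowed formatting codes would have to be more sophisticated, to account for the different structures of various formatting codes, and to allow for different delimiters than just a single <..> pair. But the above example (which, BTW, I haven’t verified!) should at least illustrate the approach I had presupposed for parsing code blocks with :allow exceptions.
Having said that, I begin to wonder whether :allow is just too difficult to (ahem) allow. In reality, the only formatting codes people are likely to actually want to place inside a code block are: B<>, I<>, U<>, H<>, J<>, T<>, K<>, O<>, R<>, and V<>. All of which have no special internal structure.
So now I’m wondering whether, instead of a full :allow mechanism, what if we just defined a second kind of code block (perhaps a formcode block), which automatically allows all of those formatting codes.
In which case, our problematical example would become:
=formcode
1. an entity E<B<this is fallback |> raquo>
2. an indexed item X<B<this is display text |>one, two>
3. an alias A<B<this is fallback |>ALIAS_NAME>
Now, since E<>, X<>, and A<> are never special in a formcode block, there’s no need for special parsing of them. Or, in other words, the rules for parsing a formcode can be hardwired, without any run-time reconfiguration of the contents parser:
rule formcode-block {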
^^ \h* '=formcode' >>
<formcode-contents>
<blank-line>
|
^^ \h* '=for' 'formcode' <metaoption>* \n
<formcode-contents>
<blank-line>
|
^^ \h* '=begin' 'formcode' <metaoption>* \n
<formcode-contents>
^^ \h* '=end' 'formcode' \h* \n
}
token formcode-contents {
[
# Fixed set of permitted formatting codes (possibly nested)...
<[BHIJKORTUV]> '<' <formcode-contents> '>'
|
# Anything else...
.
]*?
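}
It wouldn’t be as powerful or as flexible as the :allow option, but it might be a whole lot simpler to implement (and to use).
The only real downside is that, if you didn’t want one or more of the permitted formatting codes to actually behave like a formatting code (i.e. you wanted it to just be literal contents), you’d have to escape that particular code with a V<>.
For example:
=begin code :allow< B R > :lang<raku>
sub demo {
B<say> 'Hello R<name>';
I<note> 'The I format is not recognised';
U<warn> 'The U format is not recognised either';
}
=end code
...would then have to be written:
=begin formcode :lang<raku>
sub demo {
B<say> 'Hello R<name>';
V<I><note> 'The I format is not recognised';
V<U><warn> 'The U format is not recognised either';
}
=end formcode
That would probably be more annoying (and error-prone) for those of us who really like to mark-up our code, but maybe that occasional inconvenience would be an acceptable price to pay in order to get the feature at all. |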
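To make the idea concrete, here is a minimal sketch of a contents parser that takes the set of allowed formatting-code letters as a parameter, so that a formcode block is just the same parser with a fixed set. It is written in Python rather than Raku so it can be run standalone, and it is not the Rakudo implementation. The names `Markup`, `parse_contents`, and `FORMCODE_LETTERS` are hypothetical, and it assumes only single `<`..`>` delimiters and no balancing of angle brackets that belong to non-allowed codes.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a markup node: a letter plus a list of atoms,
# where each atom is either a plain string or a nested Markup.
@dataclass
class Markup:
    letter: str
    atoms: list = field(default_factory=list)

# The fixed set suggested in the thread for a formcode block (an assumption
# taken from the <[BHIJKORTUV]> character class in the sketch grammar).
FORMCODE_LETTERS = frozenset("BHIJKORTUV")

def parse_contents(text, allowed=frozenset()):
    """Split code-block contents into plain strings and Markup nodes.

    Only letters in `allowed` followed by '<' open a formatting code;
    every other character, including E<, X< and A<, stays literal text.
    """
    atoms, _, _ = _parse(text, 0, allowed, nested=False)
    return atoms

def _parse(text, pos, allowed, nested):
    atoms, buf = [], []

    def flush():
        # Emit accumulated literal characters as one string atom.
        if buf:
            atoms.append("".join(buf))
            buf.clear()

    while pos < len(text):
        ch = text[pos]
        if nested and ch == ">":
            # Closing delimiter of the formatting code we are inside.
            flush()
            return atoms, pos + 1, True
        if ch in allowed and text.startswith("<", pos + 1):
            inner, end, closed = _parse(text, pos + 2, allowed, nested=True)
            if closed:
                flush()
                atoms.append(Markup(ch, inner))
                pos = end
                continue
            # No closing '>' found: fall through and keep the letter literal.
        buf.append(ch)
        pos += 1
    flush()
    return atoms, pos, False

if __name__ == "__main__":
    sample = "1. an entity E<B<this is fallback |> raquo>\n"
    print(parse_contents(sample, allowed=frozenset("B")))
    print(parse_contents("V<I><note> 'x'", allowed=FORMCODE_LETTERS))
```

With only B allowed, the first sample splits into the literal text up to and including `E<`, one B node, and the literal remainder, which is the shape of the AST shown earlier in the thread. Taking a letter out of the allowed set simply turns that code back into literal text.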
Suppose we simply restrict allow to the format codes? They by definition
have no internal structure.
…On Tue, 26 Nov 2024, 04:40 thoughtstream, ***@***.***> wrote:
@finanalyst <https://github.com/finanalyst>, you raise some good points.
My view of the matter is as follows...
Normally, everything inside a code block is treated as
*“something foreign to RakuDoc so we don’t care about the internal
structure of it”*. Which means those contents
need not be parsed at all...just matched with a minimal .*?.
In other words, the default rules to parse a code block would be
something like:
rule code-block {
^^ \h* '=code' >>
<code-contents>
<blank-line>
|
^^ \h* '=for' 'code' <metaoption>* \n
<code-contents>
<blank-line>
|
^^ \h* '=begin' 'code' <metaoption>* \n
<code-contents>
^^ \h* '=end' 'code' \h* \n
}
token code-contents {
.*?
}
token blank-line {
^^ \h* $$
}
But if an :allow is one of the metaoptions, then the value of that
metaoption
is supposed to configure the way contents are matched, so the parser needs
to be somewhat more sophisticated. Something like:
rule code-block {
^^ \h* '=code' >>
<code-contents>
<blank-line>
|
# Capture :allow option separately...
^^ \h* '=for' 'code' [ <allow-option> | <metaoption>]* \n
# Then pass the allowed values to the contents parser...
<code-contents($<allow-option><value>)>
<blank-line>
|
# And the same here...
^^ \h* '=begin' 'code' [ <allow-option> | <metaoption>]* \n
<code-contents($<allow-option><value>)>
^^ \h* '=end' 'code' \h* \n
}
token code-contents ($allowed = '') {
[
@($allowed.words) # Match any of the allowed format code letters
'<' # ...then the left delimiter
<code-contents($allowed)> # ...then any nested code contents
'>' # ...then the right delimiter
|
. # Or else any single non-special character
]*?
}
In other words, after we parse an :allow option we pass that option’s
values into the
code-contents parser, so that it can parse those particular formatting
codes specially.
Of course, in the real parser, the parsing of allowed formatting codes
would have to
be more sophisticated, to account for the different structures of various
formatting
codes, and to allow for different delimiters than just a single <..> pair.
But the above example *(which, BTW, I haven’t verified!)* should at least
illustrate the
approach I had presupposed for parsing code blocks with :allow exceptions.
Having said that, I begin to wonder whether :allow is just too difficult
to *(ahem)* allow.
In reality, the only formatting codes people are likely to actually want
to place inside
a code block are: B<>, I<>, U<>, H<>, J<>, T<>, K<>, O<>, R<>, and V<>.
All of which have no special internal structure.
So now I’m wondering whether, instead of a full :allow mechanism, what if
we just
defined a second kind of code block (perhaps a formcode block), which
automatically
allows *all* of those formatting codes.
In which case, our problematical example would become:
=formcode
1. an entity E<B<this is fallback |> raquo>
2. an indexed item X<B<this is display text |>one, two>
3. an alias A<B<this is fallback |>ALIAS_NAME>
Now, since E<>, X<>, and A<> are never special in a formcode block,
there’s no need for special parsing of them. Or, in other words,
the rules for parsing a formcode can be hardwired, without any
run-time reconfiguration of the contents parser:
rule formcode-block {
^^ \h* '=formcode' >>
<formcode-contents>
<blank-line>
|
^^ \h* '=for' 'formcode' <metaoption>* \n
<formcode-contents>
<blank-line>
|
^^ \h* '=begin' 'formcode' <metaoption>* \n
<formcode-contents>
^^ \h* '=end' 'formcode' \h* \n
}
token formcode-contents {
[
# Fixed set of permitted formatting codes (possibly nested)...
<[BHIJKORTUV]> '<' <formcode-contents> '>'
|
# Anything else...
.
]*?
}
It wouldn’t be as powerful or as flexible as the :allow option,
but it might be a whole lot simpler to implement (and to use).
The only real downside is that, if you *didn’t* want one or more of the
permitted formatting codes to actually behave like a formatting code
(*i.e.* you wanted it to just be literal contents), you’d have to escape
that particular code with a V<>.
For example:
=begin code :allow< B R > :lang<raku>
sub demo {
B<say> 'Hello R<name>';
I<note> 'The I format is not recognised';
U<warn> 'The U format is not recognised either';
}
=end code
...would then have to be written:
=begin formcode :lang<raku>
sub demo {
B<say> 'Hello R<name>';
V<I><note> 'The I format is not recognised';
V<U><warn> 'The U format is not recognised either';
}
=end formcode
That would probably be more annoying (and error-prone) for those of us who
really like to mark-up our code,
but maybe that occasional inconvenience would be an acceptable price to
pay in order to get the feature at all.
|
I'd be perfectly fine with specifying that |
There might be evil edge cases, but Rainbow (a mere highlighter, not a full parser) has just gained support for But I do wonder, if the parser used to parse |
@patrickbkr It is the evil edge cases that cause the problems. :)
|
Yeah. That issue I don't have. In Rainbow I only have to spit out a flat list of tokens, not a tree. As such I don't care about composability. Rainbow does it like this:
This does get tricky when the output is a tree and not a flat token list, because you then need to untangle partially overlapping nodes. I.e. |
I'm closing this issue here but mentioning it in the suggestions for V3. The specification has some commented out examples related in some way to this issue, but not quite. |
@thoughtstream @lizmat This issue arises with :allow and in particular some examples in the RakuDoc specification. Several markup codes, such as E and X, have more complex structures. The examples use B<> in ways that then create invalid code. Consider
The first two are difficult to render, but the third will render easily. The problem is that once the B<...> is rendered into a string (which may include extra elements for the output format) and then substituted back into the embedding markup, the contents of the embedding markup are no longer valid, and the inner structure of the contents is lost. The following, however, seem to be OK because whatever is on the left of the | is a string.
@thoughtstream is this analysis correct?
FYI, We are close to removing most of the remaining RakuDoc issues.