Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd get_current returns empty if optional arguments are after mandatory ones #1983

Closed
poetaman opened this issue Mar 8, 2021 · 8 comments
Closed

Comments

@poetaman
Copy link

poetaman commented Mar 8, 2021

vimtex#cmd#get_current() explained in comment: #1981 (comment), returns an empty value if optional argument is after curly argument, unless the name of the macro is \begin.

Why distinguish between color of optional arguments of \begin, and those of \setmainfont. Both are just macros for TeX/LaTeX. Perhaps this is rooted in the fact that earlier commands in TeX world required optional argument to be before mandatory ones. That restriction does not exist for modern key-value based commands, of which there are many... including fontspec macros. Most of the fontspec code online uses this syntax pattern: optional font parameters after font name. This will also truly resolve #1981

Screen Shot 2021-03-08 at 12 53 34 PM

I came across this problem while after coding a function to determine the name of the current command/enviornment. Put the following in your vimrc file, move the cursor to parts of macro arguments (in a .tex file), hit <C-r>, and read messages with :messages

function! s:printVimTeXScope()
    echomsg "==== PrintVimTeXScope ===="
    let l:cmd = vimtex#cmd#get_current()
    if !empty(l:cmd)
        echomsg "cmd:" . l:cmd.name[1:]
        if l:cmd.name[1:] ==# "begin"
            let [l:open, l:close] = vimtex#delim#get_surrounding('env_tex')
            if !empty(l:open)
                echomsg "env:" . l:open.name
            else
                echomsg "ERROR: env empty"
            endif
        endif
    else
        echomsg "ERROR: cmd empty"
    endif
    return ""
endfunction
 
inoremap <buffer> <expr> <C-r> <SID>printVimTeXScope()

Sample code block from above screenshot: (as you will notice, pressing <C-r> inside [] of \setmainfont will return ERROR: cmd empty)

\setmainfont{LucidaBrightOT}[
    Extension = .otf,
    Path = fonts/LucidaSans/,
    UprightFont = *,
    BoldFont = *-Demi,
    ItalicFont = *-Italic,
    BoldItalicFont = *-DemiItalic
]

\begin{helloworld}[
    Extension = .otf,
    Path = fonts/LucidaSans/,
    UprightFont = *,
    BoldFont = *-Demi,
    ItalicFont = *-Italic,
    BoldItalicFont = *-DemiItalic
]
@poetaman poetaman changed the title cmd get_current returns empty if optional arguments are after mandatory one cmd get_current returns empty if optional arguments are after mandatory ones Mar 8, 2021
@lervag
Copy link
Owner

lervag commented Mar 8, 2021

It would help if you can be more clear what you think is wrong and how you think it should be fixed. As it stands, I have to guess. Even though I probably guess right, it is much better if I don't need to guess.

My guess: You think I should update vimtex#cmd#get_current() (and related functions) to also allow options after arguments as a general rule. I.e., to relax the current implementation to gobble any number arguments/options regardless of the order in which they appear. So, given \cmd[a]{b}[c], this should give result in the parsed command containing both a and c in the opts list.

Note: As explained, it is really hard or impossible to actually determine if something is an actual argument/option group or not, so we need to specify a behaviour that works as expected nearly always, but that we must allow to probably fail in some cases. We won't achieve perfect, so let's figure out the next best thing. I think I'm pretty close, but I'll accept that the current behaviour might need an update, but this would require a slightly clearer description and explanation.

@poetaman
Copy link
Author

poetaman commented Mar 8, 2021

Yes, I think you got it right :) Will keep a note on adding more exact note on expected behavior in future. Yes this syntax \cmd[a]{b}[c], with optional arguments at either/both ends is a good place to begin. Calling vimtex#cmd#get_current() while cursor is in any of those arguments should return the command/macro name \cmd

P.S. I have seen macros with optional arguments on both ends of mandatory arguments like \cmd[a]{b}[c]. Though don't remember seeing optional arguments to appear anywhere between positional arguments.

@lervag
Copy link
Owner

lervag commented Mar 8, 2021

The question is: should we allow spaces between }{, ]{, }[ and, ][? I think doing so would lead to more false positives, but if we don't allow them, it will lead to a lot of failed parses. An example to show what I mean with false positives:

$\frac{1}{2} {\color{red} a}$ % how to prevent the last {...} to be recognized as an argument?

Whatever we do, it will end up being wrong in some way. Still, if vimtex#cmd#get_*() simply catches the largest possible command atom, then it will be the user of the APIs problem to deal with the result. Thus, when we work with a parsed \frac command, we can safely ignore any erroneously catched arguments.

The current behaviour allows any number of whitespace characters, including newlines.

My current idea is that it seems relatively safe to adjust the behaviour as follows:

  1. Match command name \word
  2. If \begin, then match the name argument
  3. Match an overlay argument (for beamer) <>
  4. Greedily match any number of {...} and [...] that follows, in any order.

I think a possible alternative is to adjust step 4 to greedily match first [...], then {...}, then again [...]. The only differen
ce from the current behaviour would in this case be the very final matching of the [...] at the end.

@poetaman
Copy link
Author

poetaman commented Mar 8, 2021

@lervag Good point, I think it boils down to this rule: Starting from the macro name \<somemacro>, keep including {.*},[.*],(.*) option groups till the opening delimiter {, [, ( of some group ([],{},()) does not "touch" the closing delimiter of last included option group.

What does "touch" mean?
If they (opening delimiter of potential option group, and closing one of last option group) are right next to each other on same line, i.e there is no space in between, then they "touch". Or if they are on different lines, and all lines in between end with comment % and just have white space/comments in before we reach the opening delimiter, then too they "touch".

$\frac{1}{2} {\color{red} a}$ % how to prevent the last {...} to be recognized as an argument? The above rule should cover this, and all other such cases... Whitespace between macros argument delimiters on same line does not make sense (it does seem to compile for some macros), but is just a bad coding style for TeX as one cannot decipher the meaning by looking at TeX code... So it makes little sense to support that, IMHO.

I did some testing on a sample tex file I have that has textpos environment (note the parentheses delimiters for last argument):

\begin{textblock*}{}[,](,)

and it seems like spaces are allowed between some of the arguments of textblock even when they are on same line. Though if you ask TeX experts on https://tex.stackexchange.com, I bet they will recommend against writing such code (as it lends easily itself to ambivalence).

This pattern compiles:

\begin{textblock*}{3in}[0,0]%
    %   
    %   
    (3in,1in)%

and this too:

\begin{textblock*}{3in}[0,0]%
    %   
    %   
    (3in,%
      1in)%

But this does not compile (as atleast one line does not end with % )

\begin{textblock*}{3in}[0,0]
    %   
    %   
    (3in,%
    1in)%

Feel free to ask more questions, and you can also refer this on https://tex.stackexchange.com to see what TeX experts have to say about my proposal.

@poetaman
Copy link
Author

poetaman commented Mar 9, 2021

And, regarding 2. If \begin, then match the name argument: In the current API there is a way to distinguish whether cursor is in command or environment, hopefully that will remain that way (just confirming). That's because there are environments & commands that have same name & do same thing; good packages provide that for cases where a property is applied on a small piece of text & its too much verbosity to have an environment around such text.

Lastly, it will be a great addition of capability if the the API can give one more information: The number of argument cursor is in. For instance in a macro command like this: \begin{textblock*}{3in}[0,0](3in,1in), it will return 1 when cursor is in {textblock*}, 2 when cursor is in {3in}, 3 when cursor is in [0,0], and 4 when cursor is in (3in,1in). This will enable users to write functions that do specific things based on combination of position & delimiter type (we already know the type of delimiter I guess, looking at table returned by the current API). In my case, it will help me switch completion dictionary from argument to argument (because not all arguments accept same type of values).

Update (from prior observation): Also, this example \begin{textblock*}{3in}[0,0](3in,1in) shows we can have optional arguments mixed in between mandatory ones. Something I didn't think about in my previous comment in this thread: #1983 (comment) (check the PostScript at the bottom of that comment).

@lervag
Copy link
Owner

lervag commented Mar 9, 2021

First, a minor comment: it is a relatively large mental effort to parse big "walls of text", and since I can't focus on a single issue for a lot of time in one go, this can become a barrier for me wrt. focus on actually implementing anything. So, to get progress here, it would be very beneficial to get a short and concise summary of what we want done here. If it is a short proposal that makes sense, then we can also allow others to comment on it before I act.

So, I propose the following: we finish this discussion with the goal of having a short and simple definition of where we want to go, then we open a new issue where this is clearly stated where we/I can invite for some more comments before I implement the changes. I want to do this carefully, because I don't want to break things for anyone, and we are now discussing a relatively deep point in the VimTeX APIs.

So, my current idea of this thread is the following:

Feature request

vimtex#cmd#get*() is a family of commands to parse a LaTeX command. The parser is generic and greedy, and it does not guarantee that it parses the "right amount", but a core idea is that the matched content should always contain the actual command. As it stands, the matching is not good enough, because it is restricted to the following pattern \cmd[...]*{...}*. Instead, we should apply the following pattern: \cmd\([...]\|{...}\|(...)\)*.

We should also adhere to the following special cases and considerations:

  • Allow overlay specifications from the beamer package, which can be roughly specified as \cmd<...> or \begin{...}<...>. The <...> group is always the first element, except for the \begin command.
  • We consume any number of arguments separated by whitespace and commented newlines.

The latter requirement is actually a little bit complicated. The current behaviour does not allow newlines.

@poetaman
Copy link
Author

poetaman commented Mar 9, 2021

@lervag Sure, its good idea to have a proposal ticket with just a clear proposal!

Regarding your special considerations:

  • Regarding beamer's<...>. It might be a good idea to just include these delimiters too in the generic the main command pattern \cmd\([...]\|{...}\|(...)\|<...>\)*. Few advantages: 1) you'll reduce a special case handling, 2) maybe there are other packages out there which now or in future use use <...> for some options (just like I found those parentheses (...) for textblock* environment), 3) you won't be bugged for it in future.

Note: This should be safe to do as our proposed definition does not allow space in between text delimited by (..),[..], <..>, {..}, as such code in latex always leads to ambiguity of meaning.

  • Regarding your second special case/consideration above: The description of the bullet is not that clear, it doesn't convey that whitespace between delimited pairs on same line will "end the command" before the space.

The latter requirement is actually a little bit complicated. The current behaviour does not allow newlines.

Yes I understand, it could be staggered, and perhaps lower priority. Most of the well formatted code I come across on https://tex.stackexchange.com/, (which also happens to be intuitive) looks like code below, where opening delimiter of new argument and closing delimiter of previous argument are on same line and have no whitespace in between (like }[ in code below). And you already support this (for \begin environments at least). So just changing the implementation to \cmd\([...]\|{...}\|(...)\|<...>\)*, and adding an attribute that conveys what argument number cursor is in might be good enough.

\begin{helloworld}[
    Extension = .otf,
    Path = fonts/LucidaSans/,
    UprightFont = *,
    BoldFont = *-Demi,
    ItalicFont = *-Italic,
    BoldItalicFont = *-DemiItalic
]

@lervag
Copy link
Owner

lervag commented Oct 7, 2021

Very sorry about lack of progress here; it's simply been to much the last half year so this did not get high enough priority. But I finally got around to it and I've opened a PR. Let's continue the discussion there: #2197

@lervag lervag closed this as completed Oct 7, 2021
lervag added a commit that referenced this issue May 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants