refactor(shellwords)!: change arg handling strategy #11149
In working on this I came across some things I wasn't so sure about, and was hoping you might be able to help me understand them. For instance, the checks done to see if the input ended in a space, as well as the tests that checked whether it was preserved. I retained this behavior in the parsing, but I'm just not sure what the use for it is. For instance, in this check: helix/helix-term/src/commands/typed.rs Lines 3160 to 3172 in a75b1cf
It seems that if parsed in a way that removes any whitespace at the end would have been covered by just The next thing is if the And lastly, for now, list support: Thanks! |
affbeb9 to 94740f5
Starting to propagate the changes and came across: helix/helix-term/src/commands/typed.rs Lines 1958 to 1961 in e6bf97b
I've never really used the non For example, using |
4b136bc to 1ff0084
When refactoring the changes to the main entry command function, the completion behavior slightly changed: when typing a partial command, to get the fuzzy matches, and selecting one with This is further an issue for a command like
helix/helix-term/src/commands/typed.rs Lines 3165 to 3208 in e6bf97b
helix/helix-term/src/commands/typed.rs Lines 3097 to 3135 in 1ff0084
I also have some questions around what the count is expecting here as this seems pretty unintuitive and this might be part of the problem. helix/helix-term/src/commands/typed.rs Lines 3265 to 3281 in e6bf97b
I suspect this is connected to the earlier question I had about the significance of end whitespace, as this seems like something someone would have done to solve a problem you'd only understand when initially implementing the function. |
baf5128 to 88419ad
892116b to 7929ea7
091740d to b2998f2
I believe this should be ready for reviewing. |
It sounds like this could help address #10549 in a future PR. |
I added a Also I believe that even now the We can still change over to the |
Yeah we should restore the behavior of |
Ah, you are right, I remember now that I had brought this up too, about how to deal with escaping slashes and this. Would all slashes need to be escaped? Or just when it's for spaces? And one more thing I remember was that the |
Right yeah, if I remember correctly that function was for the sake of completion and we wanted I'm not sure if all backslashes need escaping. With the prior shellwords we may have only used backslash to escape on non-windows? (Thinking that on Windows you would want to use it as a path separator) |
Would it be fine to let this sit for a bit longer so more people can see if they need it? I don't think this is needed anymore now that |
I believe we should have different backslashing rules between unix and windows. The completion results for unix are different in this example: I would consider the backslashing behavior pretty important as it follows unix shell conventions so if this isn't a small fix then I'd prefer to revert this PR temporarily. |
I see. I don't think it should be too bad, so we can leave it for now, and then see what the PR looks like to get this implemented. I think this only fixes it for unix completions though, with the windows one still able to produce invalid file paths in the path completion. One solution besides escaping would be to handle unix and windows the same with an "all paths in quotes" approach. We would no longer need to escape anything on any platform. I forget whether quotes are valid in file names on Linux, but that would then be the sole edge case. |
Also, is this only for completions or are backslashes expected to be used by a user? |
Yeah I think this will be ok, propagated the Cow change and got these to pass:
assert_eq!(
    Shellwords::from(r":o a\ ").args().next(),
    Some(Cow::Borrowed("a "))
);
assert_eq!(
    Shellwords::from(r":o a\ b.txt").args().next(),
    Some(Cow::Borrowed("a b.txt"))
);
Though tabs are being weird, so I have to figure that out. I think I might just not be representing the expected text correctly, but I'll leave that to last. I still have to correct the other tests, but so far I think it's straightforward. Will see if that holds true as I get to them. |
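To make the behavior those two assertions exercise concrete, here is a minimal sketch of splitting an argument string on unescaped whitespace while treating `\ ` as a literal space. It is not the PR's `Shellwords` implementation: `split_args` is a hypothetical helper, it takes only the argument text (without the `:o` command name), it ignores quoting entirely, and keeping a lone backslash verbatim (so that a Windows-style path survives) is an assumption made for illustration.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not Helix's `Shellwords`): splits `input` on unescaped
/// whitespace and turns `\ ` into a literal space. Quotes are not handled.
fn split_args(input: &str) -> Vec<Cow<'_, str>> {
    let mut args = Vec::new();
    let mut chars = input.char_indices().peekable();

    while let Some(&(start, c)) = chars.peek() {
        if c.is_whitespace() {
            chars.next(); // skip separators between arguments
            continue;
        }
        // Stay borrowed until the first escape forces an owned copy.
        let mut owned: Option<String> = None;
        let mut end = start;
        while let Some(&(i, c)) = chars.peek() {
            if c.is_whitespace() {
                break; // unescaped whitespace ends the argument
            }
            chars.next();
            if c == '\\' && matches!(chars.peek(), Some(&(_, ' '))) {
                // Escaped space: copy the text seen so far, then add the space.
                let buf = owned.get_or_insert_with(|| input[start..i].to_string());
                buf.push(' ');
                let (j, sp) = chars.next().unwrap();
                end = j + sp.len_utf8();
            } else {
                // Any other character (including a lone backslash) is kept as-is.
                if let Some(buf) = owned.as_mut() {
                    buf.push(c);
                }
                end = i + c.len_utf8();
            }
        }
        args.push(match owned {
            Some(s) => Cow::Owned(s),
            None => Cow::Borrowed(&input[start..end]),
        });
    }
    args
}
```

In this sketch an argument only allocates when it actually contains an escape, which is the trade-off discussed in the thread: arguments without escapes can stay borrowed, but backslash escapes force an owned `String` at least some of the time.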
@the-mikedavis How should backslashes in quotes be handled exactly? Want to make sure we are all on the same page. Should I think this is the last edge case to deal with. If you can perhaps paste some more examples that you feel could be wonky and I can make tests around them |
Ok actually I'm going to revert for the time being because I think we may also want to re-examine how we store the arguments and that could change what the API looks like. Sorry for the flip/flop on this - Pascal and I both thought this looked pretty promising but I don't want to commit to a really big change like this without being totally satisfied with the API. What I liked reading through this previously was the lack of allocation for the args type but I don't think it's necessary to avoid allocating since parsing the command line is not a hot loop, and I think it will make it harder to add switches in an ergonomic way later. Plus we'll need to allocate sometimes for backslash escapes anyways. I'd like to explore parsing similar to how Kakoune does it since Kakoune already has variable expansions and switches. The parsing in Kakoune is done in two steps in |
Before a final choice is made, could you look at the progress I have made on this fix? |
I'm only one test away from getting this done. It's just a matter of deciding on the exact behavior. |
I'll take a look at |
The parsing rules would also be the same, right? So it would be the same as using the existing iterator approach but just collecting Strings into a Vec instead. I don't see how that fixes anything or what opportunities it presents going forward. |
An immediate benefit of collecting the parameters upfront is checking the minimum and maximum number of non-flag params a command can accept |
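As a rough illustration of the check described here, the sketch below validates the number of non-flag parameters against a minimum and an optional maximum once the parameters have been collected. Everything in it is hypothetical: the `Signature` struct, its field names, `check_param_count`, and the convention that a flag is anything starting with `--` are assumptions for the example, not part of Helix or Kakoune.

```rust
/// Hypothetical per-command metadata; the field names are assumptions.
struct Signature {
    min_params: usize,
    max_params: Option<usize>, // `None` means "no upper bound"
}

/// Counts the non-flag parameters and checks them against the signature.
fn check_param_count(sig: &Signature, params: &[&str]) -> Result<(), String> {
    // Assumed convention: anything starting with `--` is a flag.
    let count = params.iter().filter(|p| !p.starts_with("--")).count();

    if count < sig.min_params {
        return Err(format!(
            "expected at least {} parameter(s), got {}",
            sig.min_params, count
        ));
    }
    if let Some(max) = sig.max_params {
        if count > max {
            return Err(format!(
                "expected at most {} parameter(s), got {}",
                max, count
            ));
        }
    }
    Ok(())
}
```

A check like this only covers counts. As noted elsewhere in the thread, constraints such as mutually exclusive flags or option-dependent arity would still have to be validated inside the command body.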
This reverts commit 64b38d1.
There is |
So what kind of changes are wanted? I will get the backslash escaping done, but what you describe as benefits of an upfront collect don't make a lot of sense to me, so I assume there are other issues besides not having a |
Looking at the Kakoune code, it looks like it only validates the count as far as validating arguments goes, which is something we also do, just in the command body. But even this validation is only partially done, as there could be flags that are mutually exclusive, and that would have to be checked in the bulk of the logic on execution, as there is no simple way to declare this statically. |
There are also some commands that have different requirements, like toggle or set, which for bools are very specific in their requirements but for strings or numbers can have many. There is no way to know, when deciding, if it's enough or if it's valid for the option. |
Not all validation can be done with static information attached to the command description but we should be setting a min and optional max number of parameters a command will accept so that I'd like to see the parameters parsed eagerly and stored as a Vec of either One difference between Kakoune's and our commands is that Kakoune doesn't provide commands that need to avoid their line being split into parameters like |
Perhaps something like a similar in nature to size_hint?:
This was handled exactly by Pre collecting into a Vec also means that the
So like a With my experience of exploring a flags solution and implementing a working one, I'll say once again the issues there wouldn't be magically solved by having a All I can see this doing is that it moves the validation of basic parameter counts up one level. And this could be done with the current implementation by adding a Should we try to explore more for flags and see what actually needs to be done, rather than guessing limitations or freedoms one solution offers over another? Because even after reading this, the problems were either solved already by this PR, or the proposed alternative seems to offer no benefits over it, but comes with more limitations that need their own workarounds. I understand the gut feeling to revert something you feel sketchy about. I am committed to this work either way; I just feel this was written off too early for vague issues when this solves existing issues and doesn't clearly create more (besides the escaping rules, which is something I did bring up multiple times and was never given a clear answer for until this was merged). I will get another PR up with the fixes from the other PRs added in and we can discuss more there with fresh eyes on the problem. |
…itor#11149)" This reverts commit 64b38d1.
While attempting to implement issue #11012, it became evident that Shellwords required more substantial changes to correctly interpret how arguments are handled by commands. This was further supported by feedback from @the-mikedavis, highlighting the need for improved parsing strategies, similar to those used by Kakoune.
Summary of Changes
Shellwords Parsing Refactor:
- `Vec<Cow<str>>`, with a non-allocating iterator `Args<'_>`. This change aims to streamline argument interaction, making it more linear, performant, and idiomatic.
Formalize `:sh` and `:pipe` escaping
Introduction of Unescape Function:
- `unescape` function to process escape sequences in strings and convert them into their corresponding literal characters. Supported sequences include `\n` (newlines), `\t` (tabs), and Unicode characters (`\u{...}`).
- `yank-join` command to unescape separator sequences before joining selections, aligning with user expectations. Similarly, implemented for the `:sh` and `:pipe`-like commands.
Parse JSON Lists
- `Args::rest`, an argument position aware function that returns the remaining raw text, it's now trivial to parse JSON lists from the appropriate syntax, opening up the ability to `:toggle` and `:set` config options that are lists. e.g. `:toggle shell ["nu", "-c"] ["bash", "-c"]`.
Visual Examples
`sh` Command:
Breaking Changes!
- `:sh`, `:yank-join`-like and `:pipe`-like commands using the raw args, previous config commands that relied on specific escaping might no longer function as expected.
- `\n` or `\t` might now have these replaced with their literal character (This is command specific).
Further work
For now, basic conversions are done at the `dap` boundary, where the `Args` iterator is turned into something like a `Vec<Cow>` or `Vec<String>` to maintain the previous api. Mainly, `Args` still needs to be integrated into `dap_start_impl` and `debug_parameter_prompt`. This is being left till later when `dap` is developed more. For now, `TODO`s are added among the other future work to transition to `shellwords::Args` when refactoring.
Related Issues
- `yank-join` #10993
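For readers who want a concrete picture of the escape handling described in the summary, the following is a minimal sketch of an unescape routine covering the three sequences the description names: `\n`, `\t`, and `\u{...}`. It is not the PR's `unescape` function. The signature, the choice to return `Cow`, and the policy of leaving unknown or malformed escapes untouched are all assumptions made for illustration.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for the PR's `unescape`; the real signature and
/// edge-case behavior may differ.
fn unescape(input: &str) -> Cow<'_, str> {
    // Fast path: nothing to unescape, so no allocation is needed.
    if !input.contains('\\') {
        return Cow::Borrowed(input);
    }

    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('u') if chars.peek() == Some(&'{') => {
                chars.next(); // consume '{'
                let mut hex = String::new();
                let mut closed = false;
                for h in chars.by_ref() {
                    if h == '}' {
                        closed = true;
                        break;
                    }
                    hex.push(h);
                }
                let decoded = if closed {
                    u32::from_str_radix(&hex, 16).ok().and_then(char::from_u32)
                } else {
                    None
                };
                match decoded {
                    Some(ch) => out.push(ch),
                    // Malformed sequence: keep the original text (assumed policy).
                    None => {
                        out.push_str("\\u{");
                        out.push_str(&hex);
                        if closed {
                            out.push('}');
                        }
                    }
                }
            }
            // Unknown escape: keep the backslash and the character (assumed policy).
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            // Trailing backslash: keep it.
            None => out.push('\\'),
        }
    }
    Cow::Owned(out)
}
```

With a routine along these lines, a separator typed as the two characters `\` and `n` would become a real newline before selections are joined, which is the user-facing effect the description attributes to `yank-join`.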