Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor!: parsing, revisit short option groups, add support for combined short and value #75

Merged
merged 37 commits into from
Mar 12, 2022

Conversation

shadowspawn
Copy link
Collaborator

@shadowspawn shadowspawn commented Mar 4, 2022

This incorporates ideas from #64, #68, and #69. Background on short options in #2.

  1. Refactor parsing to use independent blocks of code, rather than nested cascading context. This makes it easier to reason about the behaviour.

  2. Split out small pieces of logic to named routines to improve readability, and allow extra documentation and examples without cluttering the parsing. (Thanks to @aaronccasanova for inspiration.)

  3. Existing tests untouched to make it clear that the tested functionality has not changed.

  4. Be more explicit about short option group expansion, and ready to throw error in strict mode for string option in the middle of the argument. (See Is strict actually a goal? #11 and feat: Add strict mode to parser #74.)

  5. Add support for short option combined with value (without intervening =). This is what Commander and Open Group Utility Conventions do, but is not what Yargs does. I don't want to block PR on this and happy to comment it out for further discussion if needed. (I have found some interesting variations in the wild.) [Edit: see also Syntax for combined short option and value #78]

  6. Add support for multiple unit tests files. Expand tests from 33 to 113, but many for internal routines rather than testing exposed API.

  7. Added .editorconfig file, mainly for my own convenience!

Copy link
Member

@ljharb ljharb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add “exports”, if we want the utils not to be part of the public api.

package.json Outdated Show resolved Hide resolved
utils.js Outdated Show resolved Hide resolved
utils.js Outdated Show resolved Hide resolved
shadowspawn and others added 5 commits March 4, 2022 16:21
Co-authored-by: Jordan Harband <ljharb@gmail.com>
Co-authored-by: Jordan Harband <ljharb@gmail.com>
Co-authored-by: Jordan Harband <ljharb@gmail.com>
@shadowspawn
Copy link
Collaborator Author

I was neglecting the exports from the shim point of view, but makes good sense from the package point of view. Done. Thanks.

@shadowspawn
Copy link
Collaborator Author

shadowspawn commented Mar 4, 2022

  1. Added tests for duck-typing of misused options, which came out of discussion in Behaviour for zero config --foo=a ? #24 and Behaviour for withValue --foo followed by --bar ? #25.

@shadowspawn shadowspawn requested a review from ljharb March 4, 2022 04:34
utils.js Outdated Show resolved Hide resolved
Co-authored-by: Aaron Casanova <32409546+aaronccasanova@users.noreply.github.com>
Copy link
Member

@ljharb ljharb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do note that adding "exports" is a breaking change, which pre-1.0 increments the second number :-)

package.json Outdated Show resolved Hide resolved
utils.js Outdated Show resolved Hide resolved
utils.js Outdated Show resolved Hide resolved
shadowspawn and others added 3 commits March 5, 2022 14:11
Co-authored-by: Jordan Harband <ljharb@gmail.com>
Co-authored-by: Jordan Harband <ljharb@gmail.com>
Co-authored-by: Jordan Harband <ljharb@gmail.com>
@shadowspawn shadowspawn changed the title Refactor parsing, revisit short option groups, add support for combined short and value refactor!: parsing, revisit short option groups, add support for combined short and value Mar 5, 2022
Copy link
Collaborator

@aaronccasanova aaronccasanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work @shadowspawn! Lot to unpack here and didn't get through everything. Submitting this partial review to provide some initial feedback and introduce some potential discussion points/opportunities for clarification. Note: no comments are blocking, so feel free to resolve any of the callouts 👍

index.js Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
utils.js Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
test/dash.js Outdated Show resolved Hide resolved
test/is-lone-long-option.js Show resolved Hide resolved
test/is-lone-short-option.js Show resolved Hide resolved
});

test('when combine string short with value like short option then parsed as value', (t) => {
const passedArgs = ['-a-b'];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (non-blocking): I'm not too familiar with this pattern (e.g. lack of ' ' or '=' delimiters between short option values). Is this common for short options/backed by any guidelines?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are from the hard-core Open Group guidelines, which does not even mention long options. But I find useful to understand the origins and underpinnings of many command-line parsing details.

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html#tag_12_01

(Emphasis on last sentence added by me.)

12.1 Utility Argument Syntax
...
2. Option-arguments are shown separated from their options by characters, except when the option-argument is enclosed in the '[' and ']' notation to indicate that it is optional. This reflects the situation in which an optional option-argument (if present) is included within the same argument string as the option; for a mandatory option-argument, it is the next argument. The Utility Syntax Guidelines in Utility Syntax Guidelines require that the option be a separate argument from its option-argument and that option-arguments not be optional, but there are some exceptions in POSIX.1-2017 to ensure continued operation of historical applications:

a. If the SYNOPSIS of a standard utility shows an option with a mandatory option-argument (as with [ -c option_argument] in the example), a conforming application shall use separate arguments for that option and its option-argument. However, a conforming implementation shall also permit applications to specify the option and option-argument in the same argument string without intervening characters.


https://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html

The description has been written to make it clear that getopt(), like the getopts utility, deals with option-arguments whether separated from the option by characters or not. Note that the requirements on getopt() and getopts are more stringent than the Utility Syntax Guidelines.


https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap12.html

SYNOPSIS Shows: -a arg
Conforming application uses: -a arg
System supports: -a arg and -aarg
Non-conforming applications may use: -aarg

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very helpful reference! And reading further answered another question I had regarding what characters are allowed in option arguments:

Guideline 7 allows any string to be an option-argument; an option-argument can begin with any character, can be - or --, and can be an empty string. For example, the commands pr -h -, pr -h --, pr -h -d, pr -h +2, and pr -h " contain the option-arguments -, --, -d, +2, and an empty string, respectively. Conversely, the command pr -h -- -d treats -d as an option, not as an argument, because the -- is an option-argument here, not a delimiter.

Almost makes me wonder if we should always capture the nextArg if options[longOption].type === 'string'. Perhaps we shouldn't open that can of worms 😅 I'm quite satisfied with the current behavior.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed, glad you mentioned it! I have actually been thinking about this and doing some research this week. I opened #76 partly to provide a context for revisiting whether options are "greedy", and was leaning towards opening a new issue about that. Now you have raised it, I definitely will!

Before I was clear on the implications I had opened #25 which covers this ground too, but much of the discussion there is about what to store in strict:false mode if the value is missing, and not deep coverage of greedy vs non-greedy.

I'm ok with shipping either way, but would like people to be clearer on where the behaviour stands in comparison with other implementations and why.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would like people to be clearer on where the behavior stands in comparison with other implementations and why.

Yes! I think it would be super helpful to have a clear understanding/consensus on what standards and implementations we follow and/or influence the parseArgs API.

Comment on lines 42 to 48
test('when use boolean long option used as string then result as if string', (t) => {
const passedArgs = ['--bool=OOPS'];
const stringOptions = { bool: { short: 'b', type: 'string' } };
const booleanOptions = { bool: { short: 'b', type: 'boolean' } };

const stringConfigResult = parseArgs({ args: passedArgs, options: stringOptions, strict: false });
const booleanConfigResult = parseArgs({ args: passedArgs, options: booleanOptions, strict: false });
Copy link
Collaborator

@aaronccasanova aaronccasanova Mar 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought (non-blocking): I'm surprised by the results of the boolean assertion. If I the author explicitly say --bool is type: 'boolean', I don't understand why we would capture values from incorrect usage by users.

// node user-intent.js --bool=OOPS --more-flags...

const result = parseArgs({ options: { bool: { type: 'boolean' } } })

for (const [longOption, optionValue] of Object.entries(result.options)) {
  if (typeof optionValue === 'string') // process string options
  if (typeof optionValue === 'boolean') // process boolean options
}

☝️ This is an incredibly contrived example, but if my intent as the author was to do one thing with strings and another with booleans it would break due to incorrect argument usage.

Note: This makes total sense to me if there was no configuration options, but if provided, I would think you're saying this is the argument contract my program accepts and relies on internally. If your program allows boolean-like options to accept values, you would simply set the type to string and account for that in your implementation. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% if i configure an arg to be a boolean, it should throw if it's not one.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have strict:false and strict:true.

In strict:true mode, the "wrong" usage will throw (well, hopefully, soon!). Note, these tests use strict:false in anticipation of that arriving.

In strict:false mode we are doing best effort parsing no matter the input. We don't want to throw away information which might be needed for the author to make use of. We hope they pay some attention...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood the test was for strict: false and that we would like to capture user intent without throwing away information. However, that seems more applicable to the zero-config behavior to me. If I configure a CLI to accept specific input, I should be confident the args parser will return the type I defined.

TS Playground
I took a shot at implementing some TypeScript definitions for this and it's not feeling very intuitive imo.
image

Notice bar and baz are configured as type: boolean, yet the results can be a string.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again this is non-blocking, but think this could lead to less than ideal implementations that require checking if type: 'boolean' options are actually booleans. As well as convoluted TypeScript definitions that have different parse args results based on the strict mode config. Honestly, this isn't a major issue and really just poking at possible holes in the implementation as we get closer to the Node release.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding TypeScript, you said:

// Notice bar and baz are configured as type: boolean, yet the results can be a string.

The converse should apply to foo too, it is configured as type: string, yet the result can be true.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. What behavior do you have in mind for mis-using configured types? (Throwing, despite being strict:false?)

Not throw, I was thinking we just return a boolean regardless if it's misconfigured.

// node misconfigured.js --foo=bar
const parsed = parseArgs({ strict: false, options: { foo: { type: 'boolean' } } })
parsed.values.foo // true

The input is already flawed, so might as well return the type I expect (as opposed to a completely different type string). Missing --foo would still be undefined. Thoughts?

and for interest Minimist returns empty string for a configured "string" option which is misused without a value

Not against this either tbh, but don't have strong opinions. I'm more "concerned" about parseArgs returning different types than the author defined.

  1. A high level question is why do you want to use strict:false mode at all?

I will likely always use strict:true and haven't put much thought into what use cases folks have for strict:false. Would be curious to hear your thoughts here!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The converse should apply to foo too, it is configured as type: string, yet the result can be true.

Oh really!? Maybe I do prefer the Minimist approach. i.e. setting an empty string for a misconfigured type: string

It would be great if authors didn't have to check the typeof before using prototype methods for a defined option type: e.g.

// node misconfigured.js --foo
const parsed = parseArgs({ strict: false, options: { foo: { type: 'string' } } })

if (parsed.values.foo?.includes('substring')) // Crashes the program

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if authors didn't have to check the typeof before using prototype methods for a defined option type

Without a throw, coercing the type means the program does not crash, but does not ensure safe and intended behaviour. In that sense crashing is arguably better behaviour than silently doing the wrong thing. Two examples:

// explode is configured as a boolean (but user mis-used)
$ node detonate.js --explode=false
// no-explode is configured as a boolean (but user typo)
$ node detonate.js --no-expode foo bar

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great examples! Definitely demonstrates the severity and potential impact of misconfigured options. So I'm realizing strict:false is probably more suited for bring your own validation and making authors responsible for robustness. Whereas, strict:true allows authors to rely on parseArgs for option validation and throwing errors for misuse.

@shadowspawn
Copy link
Collaborator Author

Thanks @aaronccasanova , you have picked up several errata in the comments and some good suggestions and questions.

shadowspawn and others added 5 commits March 5, 2022 20:29
Co-authored-by: Aaron Casanova <32409546+aaronccasanova@users.noreply.github.com>
Co-authored-by: Aaron Casanova <32409546+aaronccasanova@users.noreply.github.com>
Co-authored-by: Aaron Casanova <32409546+aaronccasanova@users.noreply.github.com>
@shadowspawn
Copy link
Collaborator Author

Thanks for reviews and comments @ljharb and @Eomm and @aaronccasanova . I'll wait for @bcoe to at least have a quick look before I consider merging.

Further comments welcome from any gentle readers.

@shadowspawn shadowspawn merged commit a92600f into pkgjs:main Mar 12, 2022
@shadowspawn
Copy link
Collaborator Author

Waiting no longer. 😄

@shadowspawn shadowspawn deleted the feature/refactor-parse branch June 5, 2022 03:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants