util: add util.parseArgs() #35015

boneskull · 2020-09-01T22:31:32Z

Add a function, util.parseArgs(), which accepts an array of
arguments and returns a parsed object representation thereof.

Ref: nodejs/tooling#19

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
documentation is changed or added
commit message follows commit guidelines

Motivation

While this has been discussed at length in the Node.js Tooling Group and its associated issue, let me provide a summary:

process.argv.slice(2) is rather awkward boilerplate, especially for those new to Node.js
Parsing options without the use of a userland module reliably requires, well, about as much code as this PR adds; the complexity quickly ramps up going from one argument to two
It's often useful to add a few command-line flags in an ad-hoc manner to, say, app.js
It's often useful to parse command-line flags in example code, which would otherwise demand installation of a userland module (thus complicating things)

process.parseArgs() makes handling command-line arguments "natively" much easier. Given that so many tools are being written in Node.js, it makes sense to center this experience.

Design Considerations

First, let me acknowledge that there are many ways to parse command-line arguments. There is no standard, cross-platform, agreed-upon convention. Command-line arguments look different in Windows vs. POSIX vs. GNU, and there's much variation across programs. And these are still conventions, not hard & fast requirements. We can easily be paralyzed attempting to choose what "styles" to support or not. It is certain that there will be someone who agrees that Node.js should have this feature, but should not do it in this way.

But to implement the feature, we have to do it in some way. This is why the way is the way it is:

I have researched the various features and behavior of many popular userland command-line parsing libraries, and have distilled it down to the most commonly supported features, while striving further to trim any features which are not strictly necessary to get the bulk of the work done. While these do not align to, say, POSIX conventions, they do align with end-user expectations of how a Node.js CLI should work. What follows is consideration of a few specific features.

The Requirement of `=` for Options Expecting a Value

For example, one may argue that --foo=bar should be the only way to use the value bar for the option foo; but users of CLI apps built on Node.js expect --foo bar to work just as well. There was not a single popular argument-parsing library that did not support this behavior. Thus, process.parseArgs() supports this behavior (it cannot be automatic without introducing ambiguity, but I will discuss that later).

Combination of Single-Character Flags

Another one is combining (or concatenating?) "short flags"--those using a single hyphen, like -v--where -vD would be equivalent to -v -D. While this is a POSIX convention, it is not universally supported by the popular command-line parsers. Since it is inherently sugar (and makes the implementation more complicated), we chose not to implement it.

Data Types

Like HTML attribute values (<tag attr="1">), command-line arguments are provided to programs as strings, regardless of the data type they imply. While most of the userland arg parsers support some notion of a "data type"-i.e., this argument value is a number, string, or boolean--it is not strictly necessary. It is up to the user to handle the coercion of these values.

Default Behavior: Boolean Flags

The default behavior is to treat anything that looks like an argument (that's mainly "arguments beginning with one or more dashes") as a boolean flag. The presence of one of these arguments implies true. From investigation of popular CLI apps, we found that most arguments are treated as boolean flags, so it makes sense for this to be the default behavior. This means that a developer who just wants to know whether something is "on" or "off" will not need to provide any options to process.parseArgs().

Handling Values

Some arguments do need values, (e.g., --require my-script.js), and in order to eliminate ambiguity, the API consumer must define which arguments expect a value. This is done via the expectsValue option to process.parseArgs(), which is the only option to process.parseArgs(). This is the only option process.parseArgs() accepts.

Possible alternatives:

Rename expectsValue to something else

Repeated Arguments

It's common to need to support multiple values for a single argument, e.g., --require a.js --require b.js. In this example, require needs to be listed in the expectsValue option. The result is an object containing a require property whose value is an array of strings; ['b.js', 'c.js']. In the example of --require c.js, the value of the require property is a string, 'c.js'.

When working with boolean flags (those not declared in expectsValue), it was trivial to support the case in which repeated arguments result in a count. One -v will result in an object where {v: true}, but -v -v will result in {v: 2}. Either way, the value will be truthy.

Possible alternatives:

Every argument expecting a value (as declared in expectsValue) will parse to Array of strings, even if there is only one string in the Array (e.g., --require c.js becomes {require: ['c.js']}. That makes the API more consistent at the expense of making the common case (no repetition) slightly more awkward.
Remove the "count" behavior. While this is widely supported by modules, I don't often see it used in the wild in Node.js CLI apps.

Positional Arguments

Arguments after -- or without a dash prefix are considered "positional". These are placed into the Array property _ of the returned object. This is a convention used by many other userland argparsers in Node.js. It is always present, even if empty. This also means that _ is reserved as a flag/option name (e.g., --_ will be ignored).

Possible alternatives:

Throw if _ is provided in expectsValue

Intended Audience

It is already possible to build great arg parsing modules on top of what Node.js provides; the prickly API is abstracted away by these modules. Thus, process.parseArgs() is not necessarily intended for library authors; it is intended for developers of simple CLI tools, ad-hoc scripts, deployed Node.js applications, and learning materials.

It is exceedingly difficult to provide an API which would both be friendly to these Node.js users while being extensible enough for libraries to build upon. We chose to prioritize these use cases because these are currently not well-served by Node.js' API.

Questions

In particular, I'm not 100% confident in the terminology I chose for the documentation ("Flags", "Options", "Positionals"). This does align with other documentation I've read on the subject of CLI arguments, I am unsure if introducing this terminology to our documentation is a Good Idea. Perhaps it can be expressed without new terminology.
I sorted some files around my modification in in node.gyp, which looked like it wanted to be in order, but was not. It did not seem to affect the build, but I can revert these changes if need be.
Should it be process.parseArgv()? While it does parse process.argv by default, it does not necessarily need to be used with process.argv.
Do I need to do more input validation, throw more exceptions, or take other defensive measures?

Credits

While this is my implementation, the design is a product of work by myself, @bcoe, @ruyadorno, and @nodejs/tooling.

DerekNonGeneric · 2020-09-01T23:11:33Z

Nice, I didn't know if this was ever going to actually happen or not.

In particular, I'm not 100% confident in the terminology I chose for the documentation ("Flags", "Options", "Positionals").

There's an IEEE standard for this, which might help.

The utility in the example is named utility_name. It is followed by options, option-arguments, and operands. The arguments that consist of characters and single letters or digits, such as 'a', are known as "options" (or, historically, "flags"). Certain options are followed by an "option-argument", as shown with [ -c option_argument]. The arguments following the last options and option-arguments are named "operands".
— https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html

You mention the possibility of renaming the expectsValue option to something else, so you might be interested in the excellent getopt API documentation, which calls this has_arg.

This field says whether the option takes an argument. It is an integer, and there are three legitimate values: no_argument, required_argument and optional_argument.
— https://www.gnu.org/software/libc/manual/html_node/Getopt-Long-Options.html

Hope to have time for a more thorough review later, but I do agree that command-line argument parsing is a fundamental capability that should be readily available. What I've noticed is that there are basically three major styles:

No frills (what you propose)
Subcommands (i.e., the Git CLI git commit -m …)
Docker-style (multiple subcommands & args)

boneskull · 2020-09-01T23:19:26Z

@DerekNonGeneric Thanks, didn't know about the IEEE thing. I can adopt this terminology if we think it'd be more clear.

devsnek · 2020-09-01T23:19:13Z

lib/internal/process/parse_args.js

+      'object',
+      options);
+  }
+  if (typeof options.expectsValue === 'string') {


is there a reason for this overload? it seems somewhat nonsensical.

It's friendly to support a string if the value can be singular, instead of requiring an array. This sort of behavior is very common.

that just seems confusing to me. can we remove it?

I'd rather not!

This overload seems to be a category error, it should be removed.

I'm -0 on having this overload: it doesn't really bother me, but since we should rename expectsValue to something else, and if we rename to something on plural (like optionsWithValues) it might make more sense/be more intuitive to have this as an Array.

For yargs' chaining API, I like that I can provide options as either an array or string. If I only have a couple options, I'll usually write:

yargs .boolean('option-1') .boolean('option-2')

But, if I have many options that I'm configuring, I might write:

yargs.boolean(['option-1', 'option-2', 'option-3', 'option-4')

I don't have as strong of an opinion in this case, given that this API isn't chaining like yargs (you configure all the options in one go any ways).

I don't think either implementation would be a major usability issue, and am supportive of whatever we land on.

Haven't decided yet where I land on "accept string or not", but if we don't accept string it should throw when string is passed, otherwise we'll get situations where the string is treated as an array ("foo" getting treated as ['f', 'o', 'o'] leads to hard to debug issues).

I'm -0 too. Either is fine by me, but I guess if it was up to me, I'd prefer to not permit a string initially, because adding it later is no problem but taking it away once it's out there is a big problem. That said, I doubt we'd have to/want to take it away, hence my -0 rather than -1.

I guess I am also a member of the -0 club.

devsnek · 2020-09-01T23:24:00Z

lib/internal/process/parse_args.js

+    const arg = argv[pos];
+    if (arg === undefined) {
+      return result;
+    }


Might be safer to remove the _ checks below and move them up here, maybe something like if (/^-{0,2}_$/.test(arg)) continue.

can't just do that, because --_ is ignored, --_=foo is ignored, and --_ foo is ignored (in the case of _ in expectsValue), so we need to parse our array further before just skipping ahead (or we introduce ambiguity). We could throw if _ is in expectsValue and make this somewhat easier.

devsnek · 2020-09-01T23:25:53Z

doc/api/process.md

+  object supporting the following property:
+  * `expectsValue` {Array<string>|string} (Optional) One or more argument
+    strings which _expect a value_ when present
+* Returns: {Object} An object having properties corresponding to parsed Options


why not just return { options: { ... }, positionals: [...] }? it gets rid of the awkward _ handling and is imo much cleaner than magic properties.

I'm not opposed to this, but it is based on userland conventions. Personally I would prefer the magic over nested properties, but that's just me (as a potential consumer of this API).

Yeah I've seen the userland libraries. I'd like to think we can do better 😄

I think I'm with Gus here, userland convention originated before deconstructors, so having a _ property could be considered better/more attractive than returning options: { ... }, positionals: [...] }. The explicit object return is, IMO, more intuitive and user-friendly:

const { options, positionals } = processArgs()

_ might be somewhat familiar to users used to yargs, commander etc, but it won't be intuitive for new users.

I think that const { options, positionals } = processArgs() is a pretty elegant compromise, I never loved _ 😆

mscdex · 2020-09-02T02:02:31Z

This seems like something that should live off of util instead.

ljharb

+1 that we can do way better than the magic _

lib/internal/process/parse_args.js

boneskull · 2020-09-02T17:40:52Z

Would like someone else to weigh in on the _ convention for positionals, e.g., @bcoe

boneskull · 2020-09-02T18:17:40Z

I'm interested in feedback on all "possible alternatives" and questions listed in the description. Consolidating it into a list here:

Should we rename the expectsValue option? For example, optionArguments (to follow the IEEE guidelines)
For args expecting a value (e.g., --foo=bar) should the parsed result always be an array of values? {foo: 'bar'} vs {foo: ['bar']} in this case
Should we remove the "count" behavior for repeated flags? What should happen instead, in the case of --foo --foo --foo?
Should we adopt IEEE terminology in the code and documentation?
Should I revert the change to the order of files in node.gyp?
Should the function be named process.parseArgv()?
Should more exceptions be thrown somewhere?

In addition, I would like further input on these:

Should this function live somewhere else besides process, e.g., util?
Should we abandon the _ convention, and return a nested object instead (with e.g., options and positionals properties)?

When answering these, please consider the expected users of this API and the developer experience. What would be easiest to understand at first glance? Once familiar, will the API be ergonomic or tedious?

doc/api/process.md

devsnek

https://github.com/nodejs/node/pull/35015/files#r482259319

boneskull · 2020-09-02T23:16:57Z

@devsnek I will not be changing the API to force expectsValue to be an array (though I'm happy to rename it!), as I strongly believe providing a smoother developer experience is more important than somebody's idea of theoretical purity. It's my hope you will not block the PR for this reason.

devsnek · 2020-09-02T23:19:56Z

I don't think confusing apis provide a smoother developer experience.

boneskull · 2020-09-02T23:20:33Z

@devsnek There's nothing confusing about the API; as I mentioned, this is widely supported across userland.

boneskull · 2020-09-02T23:29:26Z

Sounds like @nodejs/tsc may need to weigh in.

mmarchini · 2020-09-03T02:58:35Z

Some comments on the issue summary first, will look at the code later.

Another one is combining (or concatenating?) "short flags"--those using a single hyphen, like -v--where -vD would be equivalent to -v -D. While this is a POSIX convention, it is not universally supported by the popular command-line parsers. Since it is inherently sugar (and makes the implementation more complicated), we chose not to implement it.

How much more complicated is it to implement this behavior, and why is it more complicated vs more complex?

Every argument expecting a value (as declared in expectsValue) will parse to Array of strings, even if there is only one string in the Array (e.g., --require c.js becomes {require: ['c.js']}. That makes the API more consistent at the expense of making the common case (no repetition) slightly more awkward.

String type when there's no repetition seems fine.

Remove the "count" behavior. While this is widely supported by modules, I don't often see it used in the wild in Node.js CLI apps.

I never seen this behavior on any CLI apps (Node.js or otherwise), IMO it can be removed

In particular, I'm not 100% confident in the terminology I chose for the documentation ("Flags", "Options", "Positionals"). This does align with other documentation I've read on the subject of CLI arguments, I am unsure if introducing this terminology to our documentation is a Good Idea. Perhaps it can be expressed without new terminology.

Python uses the terminology "arguments" (https://docs.python.org/3/library/argparse.html), but we don't have to use the same terminology. We need to use a terminology, the one you proposed seems as good as any, as long as we are consistent across our documentation (including documentation about Node.js CLI options) about it we're good.

I sorted some files around my modification in in node.gyp, which looked like it wanted to be in order, but was not. It did not seem to affect the build, but I can revert these changes if need be.

We tend to avoid reordering things unless necessary because it makes git archaeology harder.

mmarchini

(still reviewing, commenting on the docs first so I don't lose my comments)

mmarchini · 2020-09-03T03:04:16Z

doc/api/process.md

+* `argv` {Array<string>|Object} (Optional) Array of argument strings; defaults
+  to [`process.argv.slice(2)`](process_argv). If an Object, the default is used,
+  and this parameter is considered to be the `options` parameter.
+* `options` {Object} (Optional) The `options` parameter, if present, is an
+  object supporting the following property:
+  * `expectsValue` {Array<string>|string} (Optional) One or more argument
+    strings which _expect a value_ when present


That's not how we usually write optional parameters ("If an..., it is ...")

https://nodejs.org/docs/latest-v12.x/api/crypto.html#crypto_cipher_update_data_inputencoding_outputencoding

I'm worried the over verbosity might make it more complicated to understand.

I've addressed this somewhat, but I find the linked documentation too terse. 🤷

doc/api/process.md

mmarchini · 2020-09-03T03:09:06Z

doc/api/process.md

+  object supporting the following property:
+  * `expectsValue` {Array<string>|string} (Optional) One or more argument
+    strings which _expect a value_ when present
+* Returns: {Object} An object having properties corresponding to parsed Options


I think I'm with Gus here, userland convention originated before deconstructors, so having a _ property could be considered better/more attractive than returning options: { ... }, positionals: [...] }. The explicit object return is, IMO, more intuitive and user-friendly:

const { options, positionals } = processArgs()

_ might be somewhat familiar to users used to yargs, commander etc, but it won't be intuitive for new users.

doc/api/process.md

mmarchini · 2020-09-03T03:23:27Z

doc/api/process.md

+  following a `--` (e.g., `['--', 'script.js']`)
+  * Positionals appear in the Array property `_` of the returned object
+  * The `_` property will _always_ be present and an Array, even if empty
+  * If present in the `argv` Array, `--` is discarded and is omitted from the


I think some user land lib has this behavior and I remember finding it very confusing. If users want to pass through everything after -- to a child process, there's no way of knowing which positionals were after --.

This is a good point, though I'm unsure if it's really common enough to worry about here... again, if this API doesn't cut it, there are great userland libraries that will (though, I'm not sure if they do in this case).

though I'm unsure if it's really common enough to worry about here

It's more common than the count behavior :)

doc/api/process.md

node.gyp

lib/internal/process/parse_args.js

mmarchini · 2020-09-03T03:54:30Z

IMO if a flag with = is provided by the user but the option is not present on expectsValue, parseArgs should throw. I want to hear other's opinion wrt this though.

devsnek · 2020-09-03T03:58:20Z

I think the proper solution would be a system you have to pass in all the args you expect, whether they're flags or options, if it's an option whether it's required, whether certain positional arguments are required, etc.

mmarchini · 2020-09-03T04:09:06Z

@devsnek I kinda agree, this API is kind of a mix of low level and high level API: high level due to the magic bits, but low level because the user has to implement so many boilerplate things that are common on CLI tools (--help, options validation, type conversion, etc.). The way it is implemented today it would be possible to add a "strict" mode in the future though, or a more comprehensive parser builder.

The API clearly has a goal of simplicity and is not intended to fully replace existing parser modules, nor is intended to cater for more complex use cases without offloading the heavy work to the user, or at least that's my understanding.

boneskull · 2020-09-03T07:41:48Z

Regarding the "count" feature: ssh uses it; see man page. But that's the only one I can think of offhand. It's not in wide use, but it's also an information channel that is lost if we do not support it.

Regarding returning diff't data structures (a string for singular option values and an array for multiples), I would rather @bcoe could speak to the relative wisdom of such an API, because that is yargs's default behavior, and it is the most popular userland arg parsing lib. Does it trip people up?

Regarding whether options should be explicitly defined... maybe, but, a better developer experience is to not throw on unknown options, and let the API consumer decide what to do with them. In a CLI app, throwing an exception should be avoided wherever possible--end-users typically don't need to see a stack trace, and whether the exception message would actually provide actionable context is a crapshoot. I think we'd rather let the API consumer decide how to express such an error (or silently accept unknown options).

But otherwise, I'm unclear on what explicit definition of boolean-flag-style options really buys an API consumer. If it doesn't throw (and IMO it should not), should this API silently ignore the unknown options? The API consumer would not know if unknown options were passed, unless we threw them into another bucket property, e.g. unknownFlags or provided another option to instruct parseArgs how to handle unknown options. Adding more ways to control its behavior is a very slippery slope (see yargs-parser; apologies Ben 😉).

Also: @nodejs/tooling in case you missed it

lundibundi · 2020-09-03T08:23:09Z

lib/internal/process/parse_args.js

+  if (typeof options !== 'object' || options === null) {
+    throw new ERR_INVALID_ARG_TYPE(
+      'options',
+      'object',
+      options);
+  }


Suggested change

if (typeof options !== 'object' || options === null) {

throw new ERR_INVALID_ARG_TYPE(

'options',

'object',

options);

}

validateObject(options, 'options');

(from internal/validators)

lundibundi · 2020-09-03T08:25:00Z

lib/internal/process/parse_args.js

+    if (arg === undefined) {
+      return result;
+    }
+    if (StringPrototypeStartsWith(arg, '-')) {


perhaps

Suggested change

if (StringPrototypeStartsWith(arg, '-')) {

if (arg[0] === '-') {

? (and same for other single-char startsWith use-cases)

tniessen · 2020-09-03T16:14:31Z

Many popular languages don't have argument parsers in their standard libraries, including Java, R, C, C++, C#, etc.

But a few standard libraries do have built-in option parsers.

For C/C++, there are getopt, getopt_long, and getopt_long_only in the POSIX/GNU standard library. Even though these are historic remnants and most C/C++ projects implement their own parsing logic or use external libraries, the newer getopt variants do provide a bit of flexibility. For example, they allow defining options before parsing.
Boost has probably almost reached the point of being a standard library for C++, and provides the powerful Boost.Program_options class.
Python has an older, deprecated option parser getopt that is very similar to the traditional getopt function in POSIX C/C++. This parser has since been replaced with the new optparse module, which is much more flexible and configurable.
Similarly, Go has flag, which is also quite flexible.

All popular languages provide option parsers as external libraries. Node.js probably has hundreds via npm, C# has System.CommandLine. Java has many, including Apache Commons CLI. R has optparse.

So it seems to me that any modern framework either brings no option parser with it, or a much more powerful parser than the one being proposed here. Also, I am afraid that adding this to Node.js might reduce awareness for better solutions in the ecosystem.

See also Sam's concerns in nodejs/tooling#19 (comment), nodejs/tooling#19 (comment), nodejs/tooling#19 (comment).

mmarchini · 2020-09-03T16:41:13Z

Regarding whether options should be explicitly defined... maybe, but, a better developer experience is to not throw on unknown options, and let the API consumer decide what to do with them. In a CLI app, throwing an exception should be avoided wherever possible--end-users typically don't need to see a stack trace, and whether the exception message would actually provide actionable context is a crapshoot. I think we'd rather let the API consumer decide how to express such an error (or silently accept unknown options).

But otherwise, I'm unclear on what explicit definition of boolean-flag-style options really buys an API consumer. If it doesn't throw (and IMO it should not), should this API silently ignore the unknown options? The API consumer would not know if unknown options were passed, unless we threw them into another bucket property, e.g. unknownFlags or provided another option to instruct parseArgs how to handle unknown options. Adding more ways to control its behavior is a very slippery slope (see yargs-parser; apologies Ben wink).

With the approach on this PR the developer has to validate all options through if statements or similar and then manually write a help message for their end users, whereas with explicit declaration that throws the developer can catch the error and print the help message (because we can generate a help message with explicit declaration). It seems a lot more work for developers or the developer will take the simple road and it will be a worse experience for their final user.

bcoe · 2020-09-03T19:38:23Z

So it seems to me that any modern framework either brings no option parser with it, or a much more powerful parser than the one being proposed here. Also, I am afraid that adding this to Node.js might reduce awareness for better solutions in the ecosystem.

I would like to chime in from the perspective of someone who is both supportive of this functionality, and leads development yargs (one of the most widely adopted argument parsers in the npm ecosystem).

Thinking from the perspective of my day job

For my job, I work at Google as a DPE. We behave as customer zero for Google Cloud Products:

helping build the client libraries that interact with APIs.
writing samples and documentation for our various products.
interacting with language communities (👋 hello).

It's my work around samples that has me advocating for a built-in command line argument parser in Node.js.

It allows you to provide an elegant snippet of code to folks that just works:

const {projectId, datasetName} = process.parseArgs()
const bigqueryClient = new BigQuery({
  projectId: args
});
const [dataset] = await bigqueryClient.createDataset(datasetName);
console.log(`Dataset ${dataset.id} created.`);

It creates confusion for users that they need to bring other dependencies to the table to use an exemplary snippet of code (beyond the code you're writing an example for).

Thinking from the perspective of a Node.js tooling author

Node.js is a powerful platform, and I would love to be able to do more with it without pulling in dependencies:

Node.js has great http built-ins, fs built-ins, encryption built-ins, etc., and yet when I write a new library I tend to pull in dependencies like: mkdirp, rimraf, or serve. Why? so I can write npm scripts like this:

{
  "scripts": {
    "precompile": "rimraf ./build",
    "compile": "... some build step"
  }
}

Python exposes a variety of useful built-ins as modules, and allows you to run them as command line applications:

python -m http.server 8000 --bind 127.0.0.1

What if I could do this in Node.js?

{
  "scripts": {
    "precompile": "node -m rmdir --recursive ./build",
    "compile": "... some build step"
  }
}

Having a built-in command line argument parser I'm convinced could help unlock a variety of interesting ideas like this.

jkrems · 2020-09-03T19:54:31Z

In terms of opinions for repeated flags: I have seen too many bugs caused by repeated query params breaking apps that didn't bother to check for isArray. In practice this kind of "almost always a string but if you pass 2+ it becomes an array" leads to one of two things: super verbose defensive code that needs to type check every step of the way OR tools that throw errors because the file named "[a,b]" couldn't be found (or even worse "x.substring isn't a function").

So... I personally would prefer if support for repetition would be dropped (it's the much less often used kind afaict), made explicit via a 2nd option, or if it's always an array and never a string.

bcoe

Left a few initial nits, but I feel like this is an awesome start.

doc/api/process.md

bcoe · 2020-09-03T20:29:46Z

doc/api/process.md

+  object supporting the following property:
+  * `expectsValue` {Array<string>|string} (Optional) One or more argument
+    strings which _expect a value_ when present
+* Returns: {Object} An object having properties corresponding to parsed Options


I think that const { options, positionals } = processArgs() is a pretty elegant compromise, I never loved _ 😆

doc/api/process.md

mmarchini · 2020-09-03T20:50:14Z

So... I personally would prefer if support for repetition would be dropped (it's the much less often used kind afaict), made explicit via a 2nd option, or if it's always an array and never a string.

I don't think it's uncommon enough for us to drop it (personally I've used it on quite a few CLIs I wrote). Always an array seems like a good option for me.

One feature I don't think this proposal supports, which I know is quite popular, is the concept of arrays

The current proposal supports arrays :D

devsnek · 2020-09-18T18:42:53Z

Honestly I'm not sure we're ever going to approach "good" behaviour, much less perfect, if so much guessing and coercion has to be involved.

gabrielschulhof · 2020-09-18T19:06:21Z

@boneskull I think we should leave interpretation to the user. The only thing we should absolutely ensure is that, if the option has a parameter, that it makes it into the result as a string so the user can then process it as they see fit.

boneskull · 2020-09-18T19:15:09Z

@gabrielschulhof So essentially you're proposing:

if you do not use optionsWithValues, you can expect a boolean true, string, or undefined
if you use optionsWithValues, you can expect a string (empty string in the case of a missing value) or undefined

(Is that correct?)

boneskull · 2020-09-18T19:17:52Z

It's important that @tniessen gives the green light to this strategy.

tniessen · 2020-09-18T20:45:31Z

please describe how it should work instead of waiting for me to make a change as @tniessen is suggesting, then coming up with something else.

I don't think I suggested any changes directly, at least none that you implemented. I did highlight problems, but I never said I had solutions to these problems (apart from leaving option parsing to the ecosystem).

You are looking for an API that should be as simple as possible, should work with basically zero configuration, and should never throw exceptions based on argv. From the very beginning, I was doubtful towards such an API in core, and I expressed that, like multiple others have in nodejs/tooling#19. My specific concerns are correctness and safety, first mentioned here and later here. Based on that concern, I requested changes here. My position did not change, and I never suggested any changes.

You expect a solution from me, but I don't have one. I simply don't know how to combine the properties I am concerned about (safety and correctness) with the properties you are focusing on (simplicity and not throwing).

This is nothing personal @boneskull, and I understand your frustration. Nobody has proposed any complete solutions, and at this point, I don't even know if a good solution exists :)

if you do not use optionsWithValues, you can expect a boolean true, string, or undefined

As you correctly mentioned in #35015 (comment), this is not safe either, but probably at least closer to being correct. However, it introduces return values with unpredictable types, and I don't understand why this behavior would be preferable over throwing an error. I know that you are not considering that an option (no pun intended), but I personally don't get why.

To me, it seems illogical to accept values for flags, which, according to the documentation, don't have values.

To use the proposed API safely, developers must check whether values were passed for flags (which don't accept values). If a value is permitted for a flag, why didn't the developer declare it as an option, which takes an optional value? If a value is passed for a flag, most applications should probably throw or display an error. In other words, it seems to me that not throwing an error in this API makes safe usage much more difficult for developers, compared to just handling an error. I simply don't see the logic behind this.

This is what I assume safe usage would look like:

const { options, positionals } = util.parseArgs({ optionsWithValues: ['foo'] });

// Since Node.js allows passing anything for flags, we need to check every single flag for a value.
// The value might be 'false' or something similar, which would lead to incorrect behavior if ignored.
// Node.js could absolutely do this itself, but it doesn't want to.
for (const name of ['bar', 'baz', 'qux', 'quux', 'quuz', 'corge', 'grault']) {
  if (options[name] !== undefined && options[name] !== true) {
    console.error(`I did not expect a value for the flag ${name}, but Node.js accepted one.`);
    console.error(`I cannot really help you, other than tell you not to specify a value for this flag.`);
    process.exit(1);
  }
}

// Okay, after manually checking whether Node.js accepted any values for
// flags that should not accept values, we can finally actually use the values.
const { foo, bar, baz, qux, quux, quuz, corge, grault } = options;

// Oh wait, we also need to make sure no unexpected positionals were passed...
// (Not included here)

I honestly don't understand how this is better than simply handling one exception. I know you don't want to consider this, but I am interested in the reason.

It's important that @tniessen gives the green light to this strategy.

If the PR is somewhat close to implementing somewhat correct behavior, and if the resulting unsafe behavior is properly documented, I'll dismiss my request for changes. That doesn't mean I'll approve the PR, but I also won't stand in the way.

There must be some way for the API to be used safely. With its current state, I don't see how that would be possible, and adding inherently unsafe APIs to Node.js does not seem logical to me.

As much as I am not a fan of the proposed solution, it seems to allow safe usage of the API, assuming developers are aware of the potential pitfalls caused by it, and implement their own error checking.

Of course, you can also leave it to the TSC to overrule my request for changes. Maybe others are fine with landing it as it is, despite the resulting safety and correctness issues.

mmarchini · 2020-09-18T21:58:24Z

fwiw I also raised concerns about correctness and safety, but I think those comments got lost when the PR was updated at some point.

boneskull · 2020-09-18T22:02:20Z

@mmarchini I haven't rebased... what are you referring to?

mmarchini · 2020-09-18T22:03:37Z

Also as a point of order: given @tniessen objection and the fact that to reach a correct and safe implementation we might end up with material changes to the API, I don't think it's worth discussing Gus objection right now (as the code they're objecting might not exist on a future solution). If that piece of code remains once @tniessen objection is either resolved or dismissed, the TSC must reach a decision on Gus' objection in a timely manner.

mmarchini · 2020-09-18T22:04:26Z

@boneskull I'll try to find my comments, but as I said they disappeared (at least in the code view), so it might take a while to find it.

boneskull · 2020-09-18T22:06:21Z

Won't be pushing any changes until Tuesday. I plan on implementing this and updating the documentation accordingly.

mmarchini · 2020-09-18T22:20:19Z

Ok, so here are my comments related to correctness and safety (although I didn't use those specific words, and I apologize if I wasn't as clear as @tniessen on my concerns):

(note I didn't make it an explicit objection because I do believe we can have a follow up API with more strict/structured format, but I definitely see why others might think a loose API is not ideal in the first place)

boneskull · 2020-09-18T22:25:58Z

I don't believe balancing these concerns is impossible.

OKAY, what about this:

We want to know if an end-user gave us weird or unexpected input. A flag having a value, for instance. e.g., --verbose=yes is unexpected; --verbose was the correct usage.

Instead of just returning {options: {verbose: 'yes'}}, we return {options: {}, errors: [SomeError]}, where errors is one or more Problems.

This way, we can:

keep the value of "flags" as boolean true (consistent types) and
we lose no information and
we do not throw.

This means that --verbose=false is an "error." The developer can choose how to interpret what was provided. SomeError could be an actual Error or not, but a code and contextual information would be helpful, so I'm leaning towards yes.

Likewise, if optionsWithValues was specified, a missing value would also be considered an "error", instead of returning an empty string.

mmarchini · 2020-09-18T22:27:58Z

doc/api/util.md

+// called via `node script.js --foo bar baz`
+const argv = util.parseArgs();
+
+if (argv.foo === true) {


argv.options.foo?

mmarchini

Marking objection explicitly to make sure this don't fall off the cracks:

util: add util.parseArgs() #35015 (comment)

-- shouldn't be discarded as it makes it impossible for the developer to provide pass-through behavior for their scripts.

What is the behavior of the API today:

> util.parseArgs(['--foo', '--', '--bar'])
{ options: { 'foo': true }, { positionals: ['--bar'] } }

What I expect from the API:

> util.parseArgs(['--foo', '--', '--bar'])
{ options: { 'foo': true }, { positionals: ['--', '--bar'] } }

My objection is resolved with the above change in behavior.

boneskull · 2020-09-18T22:30:34Z

@mmarchini I think that is reasonable.

mmarchini · 2020-09-19T19:27:53Z

Edited my comment just to clarify that my objection is resolved with the change above (just to avoid confusion). Also, I'm happy with other solutions for the passthrough problem, I only suggested that one because it's the simplest solution I could think of.

Other options I find reasonable:

preserving the -- being optional via parseArgs parameter
return passthrough items in a separate array ({options, positionals, passthrough})
return the index of the first passthrough option (same return structure as previous suggestion, but passthrough is a numeric attribute)

ruyadorno · 2020-09-20T00:15:31Z

lib/internal/util/parse_args.js

+) => {
+  if (!ArrayIsArray(argv)) {
+    options = argv;
+    argv = ArrayPrototypeSlice(process.argv, 2);


Defaulting to process.argv.slice(2) is fine but I think the implementation should somewhat also consider the eval usecase, right now if I try running node -p|-e I get a very unhelpful error:

$ node -p 'require('util').parseArgs()' foo bar internal/validators.js:122 throw new ERR_INVALID_ARG_TYPE(name, 'string', value); ^ TypeError [ERR_INVALID_ARG_TYPE]: The "id" argument must be of type string. Received an instance of Object at new NodeError (internal/errors.js:253:15) at validateString (internal/validators.js:122:11) at Module.require (internal/modules/cjs/loader.js:972:3) at require (internal/modules/cjs/helpers.js:88:18) at [eval]:1:1 at Script.runInThisContext (vm.js:132:18) at Object.runInThisContext (vm.js:309:38) at internal/process/execution.js:77:19 at [eval]-wrapper:6:22 at evalScript (internal/process/execution.js:76:60) { code: 'ERR_INVALID_ARG_TYPE' }

In the past I've done something like process.argv.slice(require.main ? 2 : 1) in order to support it (though there might be better ways to handle the check in core).

IMO parseArgs should either handle eval/script properly OR at least throw an useful error instead 😊

kibertoad · 2020-12-21T15:06:27Z

@boneskull Why was this closed?

bcoe · 2020-12-21T22:29:59Z

@kibertoad there are efforts to reopen the discussion around this feature in the new year.

kibertoad · 2020-12-21T22:31:24Z

Awesome!
If any development support would be needed, more than happy to help.

shadowspawn · 2021-11-13T00:59:04Z

Questions
In particular, I'm not 100% confident in the terminology I chose for the documentation ("Flags", "Options", "Positionals"). This does align with other documentation I've read on the subject of CLI arguments, [...]

For possible interest, I researched what terminology to use consistently in Commander docs last year. I settled on different terms, but did mention these as used elsewhere.

Short version, Commander terminology: The command line arguments are made up of options, option-arguments, commands, and command-arguments.

Long version: https://github.com/tj/commander.js/blob/master/docs/terminology.md

nodejs-github-bot added build Issues and PRs related to build files or the CI. process Issues and PRs related to the process subsystem. labels Sep 1, 2020

devsnek reviewed Sep 1, 2020

View reviewed changes

ljharb reviewed Sep 2, 2020

View reviewed changes

addaleax reviewed Sep 2, 2020

View reviewed changes

doc/api/process.md Outdated Show resolved Hide resolved

devsnek suggested changes Sep 2, 2020

View reviewed changes

mmarchini reviewed Sep 3, 2020

View reviewed changes

lib/internal/process/parse_args.js Outdated Show resolved Hide resolved

lundibundi reviewed Sep 3, 2020

View reviewed changes

bcoe suggested changes Sep 3, 2020

View reviewed changes

boneskull added the notable-change PRs with changes that should be highlighted in changelogs. label Sep 18, 2020

mmarchini reviewed Sep 18, 2020

View reviewed changes

mmarchini suggested changes Sep 18, 2020

View reviewed changes

ruyadorno reviewed Sep 20, 2020

View reviewed changes

boneskull closed this Sep 20, 2020

joesepi mentioned this pull request Jan 20, 2021

util: add util.parseArgs() joesepi/node#2

Open

JessNah mentioned this pull request Sep 28, 2021

Fleshing out parseArgs to support withValue and Multiples options pkgjs/parseargs#9

Merged

This was referenced Nov 14, 2021

Is strict actually a goal? pkgjs/parseargs#11

Closed

Finalize variable names for arguments returned by parse pkgjs/parseargs#12

Closed

shadowspawn mentioned this pull request Nov 27, 2021

feat: expand scenarios covered by default arguments for environments pkgjs/parseargs#20

Merged

shadowspawn mentioned this pull request Jan 25, 2022

Improve API for configuring a high number of arguments pkgjs/parseargs#45

Closed

bakkot mentioned this pull request Apr 11, 2022

util: add parseArgs module #42675

Merged

4 tasks

	if (StringPrototypeStartsWith(arg, '-')) {
	if (arg[0] === '-') {

util: add util.parseArgs() #35015

util: add util.parseArgs() #35015

Conversation

boneskull commented Sep 1, 2020 • edited Loading

Checklist

Motivation

Design Considerations

The Requirement of = for Options Expecting a Value

Combination of Single-Character Flags

Data Types

Default Behavior: Boolean Flags

Handling Values

Repeated Arguments

Positional Arguments

Intended Audience

Questions

Credits

DerekNonGeneric commented Sep 1, 2020 • edited Loading

boneskull commented Sep 1, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bcoe Sep 11, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mmarchini Sep 3, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mscdex commented Sep 2, 2020

ljharb left a comment

Choose a reason for hiding this comment

boneskull commented Sep 2, 2020

boneskull commented Sep 2, 2020

devsnek left a comment

Choose a reason for hiding this comment

boneskull commented Sep 2, 2020

devsnek commented Sep 2, 2020

boneskull commented Sep 2, 2020

boneskull commented Sep 2, 2020

mmarchini commented Sep 3, 2020

mmarchini left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mmarchini Sep 3, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mmarchini commented Sep 3, 2020

devsnek commented Sep 3, 2020

mmarchini commented Sep 3, 2020 • edited Loading

boneskull commented Sep 3, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tniessen commented Sep 3, 2020

mmarchini commented Sep 3, 2020

bcoe commented Sep 3, 2020 • edited Loading

Thinking from the perspective of my day job

Thinking from the perspective of a Node.js tooling author

jkrems commented Sep 3, 2020

bcoe left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mmarchini commented Sep 3, 2020

devsnek commented Sep 18, 2020 • edited Loading

gabrielschulhof commented Sep 18, 2020

boneskull commented Sep 18, 2020 • edited Loading

boneskull commented Sep 18, 2020

tniessen commented Sep 18, 2020

boneskull commented Sep 1, 2020 •

edited

Loading

The Requirement of `=` for Options Expecting a Value

DerekNonGeneric commented Sep 1, 2020 •

edited

Loading

bcoe Sep 11, 2020 •

edited

Loading

mmarchini Sep 3, 2020 •

edited

Loading

mmarchini Sep 3, 2020 •

edited

Loading

mmarchini commented Sep 3, 2020 •

edited

Loading

bcoe commented Sep 3, 2020 •

edited

Loading

devsnek commented Sep 18, 2020 •

edited

Loading

boneskull commented Sep 18, 2020 •

edited

Loading

boneskull commented Sep 18, 2020 •

edited

Loading

mmarchini left a comment •

edited

Loading

mmarchini commented Sep 19, 2020 •

edited

Loading