Allow adding "external argv"s to be parsed alongside/before command line #748

casey · 2016-11-13T23:56:57Z

Maintainer's notes

This is related to Layering env, args, and config #3113, Designing for layered configs #2763

I wrote up a sketch of how this might look here:

https://github.com/casey/clap-config

Basically, it would be pretty cool to generate a simple config file parser from a clap argument parser. There's a longer description of some use cases in the readme, but it would be great for projects which are configurable with both the command line and a config file, and for helping projects add config file configuration in addition to command line configuration.

kbknapp · 2016-11-14T21:30:11Z

This relates to #251

I'm not against this at all, but I don't have the bandwidth to implement it just yet. I'll keep it on the issue tracker. I'd say supporting YAML and TOML would be fine.

The part I'd still need fleshed out is whether a user can use both the command line and this config file as well? Does one override the other? What about arguments that accept multiple values, and are passed both in the command line and config files, do they get overridden or appended to?

casey · 2016-11-15T05:19:27Z

I think that a MergeType enum would work, something like:

enum MergeType {
  Override, // replace values with those from config file
  Defaults, // use values from config file only where not given on command line
  Conflict, // complain if any value is provided in both places
}

...

matches.merge_from_config(config, MergeType::Override);

For arguments that accept multiple values, I think that until someone has a specific use case for merging values from both places, it's probably best to only allow overriding or ignoring.
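A minimal sketch of how the three merge modes could behave, assuming parsed values are modelled as a plain `HashMap<String, String>` rather than clap's real match type. The free function `merge_from_config` is a hypothetical stand-in for the method named above, not an existing clap API.

```rust
use std::collections::HashMap;

/// How values from a config file combine with values from the command line.
#[derive(Clone, Copy, Debug, PartialEq)]
enum MergeType {
    Override, // config file values replace command-line values
    Defaults, // config file values are used only where the command line is silent
    Conflict, // a key present in both places is an error
}

/// Hypothetical stand-in for `matches.merge_from_config(config, merge_type)`.
/// On conflict, `matches` is left untouched and the error names one offending key.
fn merge_from_config(
    matches: &mut HashMap<String, String>,
    config: &HashMap<String, String>,
    merge_type: MergeType,
) -> Result<(), String> {
    if merge_type == MergeType::Conflict {
        // Check up front so a failed merge never leaves a half-merged result.
        if let Some(key) = config.keys().find(|k| matches.contains_key(*k)) {
            return Err(format!("`{}` is set on the command line and in the config file", key));
        }
    }
    for (key, value) in config {
        if merge_type == MergeType::Defaults && matches.contains_key(key) {
            continue; // keep the command-line value
        }
        matches.insert(key.clone(), value.clone());
    }
    Ok(())
}
```

With `foo` given on the command line and both `foo` and `bar` in the config file, `Defaults` keeps the command-line `foo` and adds `bar`, `Override` takes both from the file, and `Conflict` returns an error.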

ssokolow · 2016-11-23T22:56:49Z

This is actually something I'm going to have to deal with in a project soon. (In Python, I tend to dodge the issue by hard-coding my defaults at the top of the only (or top-level) .py file and telling users to just edit them there).

While I'm not familiar with clap internals, I don't see why it would be necessary to over-complicate things. Let's start by separating what clap needs out from a more generic "config file for my program" solution.

Standard format: So people can write frontends easily, use something like JSON or TOML. (Ideally, a lightweight parser, since clap already results in my experimental "smallest possible libstd-using Rust binaries" tests hitting a wall at 156KiB (static/musl) or 129KiB (dynamic/glibc) with nothing but clap and some stub argument definitions... and it'd be nice to include config file support.)
String-oriented key-value store: As I see it, the only use clap has for a configuration file is as an alternative representation of the command-line arguments, so...
1. the only need to support data types beyond key="string" pairs is to allow things like key=1 as more intuitive alternatives to key="1". (Being able to have syntax highlighting for data types correspond to the data's actual/conceptual/intuitive type is also a nice thing to have when editing manually.)
2. It's only necessary to support non-flat data structures as a comfortable shorthand for things clap has built-in support for producing from key=value pairs, like a list equivalent to --append=this --append=that.
3. Positional arguments generally aren't explicitly supported in configuration files since the situations where giving them defaults makes sense are so uncommon and esoteric. (eg. Allowing "target_dir" to be optional in fancy_cp <file> [...] [target_dir] would be an example where it'd actually make sense but be esoteric.)
Simple precedence: I've never actually seen a situation where MergeType is worth the trouble. Generally, a much better solution is to just use simple linear precedence:
1. Make parsing a command-line and parsing a file have the same API so you can call either first and then merge the other. (This also has the benefit that it makes it really easy to build precedence chains for config files like the "system-wide for vendor, system-wide for application, user-specific for vendor, user-specific for application" one that Qt's QSettings API implements.)
2. The only situation I can see where Override wouldn't be frustrating and/or confusing is when a --foo-from-file= argument comes after a --foo= argument in the command-line... so just add "all from file" and "specific field from file" as argument types and call it a day. (ie. --foo=A --foo=B_trumps_A --foo-from-file=contents_trump_B --foo=winner.)
3. The only situation where I've seen Conflict making sense is within the same source (ie. Complaining about specifying two values within the same config file or on the same command line.), so supporting it in an intuitive way is orthogonal to supporting config files. (The key being to understand what purpose Conflict actually serves. Forcing people to modify config files for a one-off run is never a good design (it encourages human error), so Conflict only really makes practical sense as a way to say that a single source is internally contradictory.)

With all of that said, what I'd suggest is that it's not really the config file parsing you need to focus on, but providing a clearly documented API that allows something like (human_readable_source_identifier, parsed_data_structure) to be fed into clap so we can feed in JSON/TOML/whatever at our leisure. (The human-readable source identifier would be used for error messages.)

That would also allow clap defaults to be embedded into larger JSON/TOML/etc. configuration files without complicating clap.

Heck, with a little thought, the "from command-line arguments" functions you already have could just be parsers on the same tier as JSON/TOML/etc. which just feed into the API I'm proposing.
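A rough sketch of what feeding a (source identifier, parsed data) pair through a shared validation step might look like. The `Value` and `Kind` enums, the schema-as-`HashMap` shape, and the `validate` function are all assumptions made for illustration; none of them are clap types.

```rust
use std::collections::HashMap;

/// Structured value as a JSON/TOML adapter (or an argv parser) might produce it.
#[derive(Debug, PartialEq)]
enum Value {
    Bool(bool),
    Str(String),
    List(Vec<String>),
}

/// What the (hypothetical) schema expects for one key.
#[derive(Clone, Copy)]
enum Kind {
    Flag,
    Single,
    List,
}

/// Validate one already-parsed source against the schema. `source` is only
/// used so that error messages point at the right file or input.
fn validate(
    source: &str,
    data: &HashMap<String, Value>,
    schema: &HashMap<&str, Kind>,
) -> Result<(), String> {
    for (key, value) in data {
        let kind = schema
            .get(key.as_str())
            .ok_or_else(|| format!("{}: unknown key `{}`", source, key))?;
        let type_matches = match (kind, value) {
            (Kind::Flag, Value::Bool(_)) => true,
            (Kind::Single, Value::Str(_)) => true,
            (Kind::List, Value::List(_)) => true,
            _ => false,
        };
        if !type_matches {
            return Err(format!("{}: wrong type for `{}`", source, key));
        }
    }
    Ok(())
}
```

Because the validator only sees the structured form, the same call would serve a TOML section, a JSON object, or the output of an argv parser, with the source name being the only thing that differs.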

BurntSushi · 2016-11-24T03:59:48Z

The only situation where I've seen Conflict making sense is within the same source (ie. Complaining about specifying two values within the same config file or on the same command line.), so supporting it in an intuitive way is orthogonal to supporting config files.

I'm not sure I really buy this. If I have a flag --foo that takes a value N, then it seems reasonable for me to want to assign a default value to it in a config file, but retain the right to want to override it on the actual command line. Depending on how the --foo flag was defined, this might not be allowed in the form of --foo X --foo Y, so a config file might need special handling?

ssokolow · 2016-11-24T05:36:20Z

I'm not sure I really buy this. If I have a flag --foo that takes a value N, then it seems reasonable for me to want to assign a default value to it in a config file, but retain the right to want to override it on the actual command line. Depending on how the --foo flag was defined, this might not be allowed in the form of --foo X --foo Y, so a config file might need special handling?

You misunderstand. I made only two claims and neither was that Conflict should be mandatory within a single source. Those two claims are:

Conflict never produces a worthwhile behaviour when applied to two pieces of data from different sources.
Regardless of how MergeType behaviour is handled, it is entirely internal to the parsing of each individual source.

What I'm arguing is that:

If we're doing fine without MergeType now, there's nothing about adding support for configuration files which adds it to the list of requirements.
If we need MergeType urgently and implement it before config file support, no beneficial functionality can be gained by expanding that "within a single source" implementation to also operate between multiple sources.

Hence, orthogonal.

First, let's discuss the "never produces worthwhile behaviour" point, starting with a few examples:

If ~/.user_default can't override /etc/global_default, then that's bad.
If --one-time-setting can't override either of those, then that's bad.
If ~/.defaults_a and ~/.defaults_b don't have a defined precedence order, you're forcing users to manage pointless busywork in the best case and, in the worst case, you wind up with a "FAT table copies 1 and 2 disagree? Which one did you intend?" situation.

This derives from two issues:

If at all possible, you want to have a single, authoritative source for settings to limit the potential for human error.
Multiple sources of configuration are introduced specifically to take advantage of defined-precedence overriding to allow customizing in progressively narrower scopes. (eg. site, user, project, task)

Or, let's come at it from the other direction and address Override, Defaults, and Conflict as options.

Defaults needs no justification. When applied to reconcile data from two different sources, it implements the behaviour configuration files serve in every case I've ever encountered.

Conflict is the hardest to justify because:

If you're the administrator, it just cripples the utility of the config file if you can't put anything into it which you might want to override on a task-by-task basis. (And then people end up implementing another layer of default handling using a wrapper which passes in --foo=false unless you call it with --foo=true.)
If you're not the administrator, it means "You can't specify --foo=true because /etc/config_defaults specified foo=false".

The former case is counter-productive and the latter sounds like trying to kitchen-sink some kind of user permissions system into clap when that's a job for code which runs after clap is done.

Finally, Override is just Defaults with the ordering swapped, so it's only justifiable in the one case where you can't swap the order: The command-line arguments, which must always be the last in the chain.

The problem is that, by their very nature, command-line arguments are also the most suitable for one-off overrides, which means that there is no justifiable reason to use Override to silently ignore what the user specified on the command-line because some config file somewhere disagrees.

Therefore, it's much better to just make defined-precedence overriding an inherent, unavoidable feature of how conflicts between multiple sources are resolved. It's the de facto standard way to handle things and it's the only solution which addresses all of the concerns I brought up.

...it's also easy for people to conceptualize. For example, in Python (since it's pseudocode-like):

matches = dict()
matches.update(parse_global_config())
matches.update(parse_local_config())
matches.update(parse_command_line_args())

Therefore, the concept of MergeType is only meaningful within the process of parsing a single source, because merging input from multiple sources should always behave like Defaults.

We may indeed need to discuss the MergeType idea more, but the conclusion we come to has no bearing on a well-designed multi-source implementation.

Parse command-line input
Parse one or more config files (optionally with a source file specified in step 1 via --config-file or the like)
Merge the results using a simple "When in doubt, the last to speak wins" algorithm.

The only potential kink I see is to at least warn users that step 3 might reset config_file so it's not the actual value used if the config file itself contains a config_file value.
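The "last to speak wins" merge from the steps above can be sketched in Rust much like the Python snippet, assuming each parsed source is a `HashMap<String, String>`. The `Settings` alias and `merge_last_wins` are illustrative names, not part of clap.

```rust
use std::collections::HashMap;

type Settings = HashMap<String, String>;

/// Merge sources given in order of increasing precedence: a later source
/// overwrites any key it shares with an earlier one.
fn merge_last_wins(sources: &[Settings]) -> Settings {
    let mut merged = Settings::new();
    for source in sources {
        // `extend` replaces the value of an existing key, like Python's `dict.update`.
        merged.extend(source.iter().map(|(k, v)| (k.clone(), v.clone())));
    }
    merged
}
```

Calling it as `merge_last_wins(&[global, user, command_line])` puts the command line last, so it wins over both config layers, while keys set only in a config layer still come through.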

The beauty of this approach is that it's made of very small, very extensible pieces and you can get a lot of utility early on, then amend it later. Here are some example steps:

Split the argv parsing out from the rest so that anyone can implement the connecting interface on top of a JSON/TOML/etc. parser. (Now, even if they have to manually hook up the config parser and manually merge things, users don't have to reinvent clap's "apply the validation" code in between those two steps.)
Add a simple function for merging two parse results. (Now users don't have to manually merge entry-by-entry anymore.)
Anything else you think is necessary.

What I'm envisioning is that the argv-parsing side of the interface would take a reference it could use to query the schema-validation side to disambiguate things that would be inherent in the framing of some formats but not others. For example:

Please resolve this key name (so argv can resolve command-line shorthands like -s without complaint while config files can print a deprecation message or exit with a failure message asking the user if they meant sidechannel_data=... in config.cfg)
What type of value does this key take? (so parsers can perform any additional transformations needed, like argv deciding whether --foo --bar actually means --foo=--bar)
Does this key accept a list of values? (so argv can use --foo=this --foo=that --foo=those while the internal representation it outputs can be closer to the {"foo": ["this", "that", "those"]} that JSON would use.)
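To make the three schema queries concrete, here is a small sketch of an argv parser that consults a schema while building a structured result. Everything in it (`Kind`, `Value`, `Schema`, `parse_argv`, `example_schema`) is hypothetical and far simpler than clap's internals; positional arguments and combined short flags are deliberately left out.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum Kind {
    Flag,   // takes no value: `--bar`
    Single, // takes one value: `--foo=x` or `--foo x`
    List,   // may repeat: `--append=a --append=b`
}

#[derive(Debug, PartialEq)]
enum Value {
    Bool(bool),
    Str(String),
    List(Vec<String>),
}

struct Schema {
    kinds: HashMap<&'static str, Kind>,
    short: HashMap<char, &'static str>, // `-b` resolves to `bar`
}

fn example_schema() -> Schema {
    Schema {
        kinds: vec![("foo", Kind::Single), ("bar", Kind::Flag), ("append", Kind::List)]
            .into_iter()
            .collect(),
        short: vec![('b', "bar")].into_iter().collect(),
    }
}

fn parse_argv(schema: &Schema, argv: &[&str]) -> Result<HashMap<String, Value>, String> {
    let mut out: HashMap<String, Value> = HashMap::new();
    let mut args = argv.iter().copied();
    while let Some(arg) = args.next() {
        // "Please resolve this key name": split `--name=value`, or map `-s` to its long name.
        let (name, inline) = if let Some(long) = arg.strip_prefix("--") {
            match long.split_once('=') {
                Some((n, v)) => (n, Some(v)),
                None => (long, None),
            }
        } else if let Some(short) = arg.strip_prefix('-') {
            let mut chars = short.chars();
            match (chars.next(), chars.next()) {
                (Some(c), None) => match schema.short.get(&c) {
                    Some(n) => (*n, None),
                    None => return Err(format!("unknown option `{}`", arg)),
                },
                _ => return Err(format!("unsupported argument `{}`", arg)),
            }
        } else {
            return Err(format!("positional argument `{}` is not handled in this sketch", arg));
        };
        // "What type of value does this key take?"
        let kind = *schema
            .kinds
            .get(name)
            .ok_or_else(|| format!("unknown option `--{}`", name))?;
        if kind == Kind::Flag {
            out.insert(name.to_string(), Value::Bool(true));
            continue;
        }
        // The schema says a value is expected, so the next token is consumed
        // as that value even if it looks like another option.
        let value = inline
            .or_else(|| args.next())
            .ok_or_else(|| format!("`--{}` needs a value", name))?
            .to_string();
        if kind == Kind::Single {
            out.insert(name.to_string(), Value::Str(value)); // last occurrence wins
        } else {
            // "Does this key accept a list of values?"
            match out.entry(name.to_string()).or_insert_with(|| Value::List(Vec::new())) {
                Value::List(items) => items.push(value),
                _ => unreachable!("list options only ever store lists"),
            }
        }
    }
    Ok(out)
}
```

With this schema, `--foo --bar` parses as `foo = "--bar"` because `foo` is declared to take a value, while `-b --append=this --append that` yields `bar = true` and `append = ["this", "that"]`, which is the same shape a JSON adapter could produce directly.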

BurntSushi · 2016-11-24T22:51:39Z

@ssokolow Frankly, I'm having a really hard time following any of what you're saying. In particular, I find it hard to tie what you're saying to a concrete user experience. I'm probably missing some important context. In any case, I discussed some of the issues I personally see with config files: BurntSushi/ripgrep#196 (comment)

ssokolow · 2016-11-25T00:41:47Z

Ugh. That's the worst kind of response because it's a perfectly fair one, yet it's the hardest to formulate a meaningful answer to. Give me an hour or two to do "morning routine" stuff and I'll see what I can do to come at it from a different angle and simplify it so that, at the very least, maybe we can figure out where the disconnect is.

ssokolow · 2016-11-25T01:29:48Z

OK. From a user-experience standpoint, the most intuitive solution is to have later arguments always override earlier ones and to treat the fallback chain from command-line to user defaults to global defaults the same way.

That gets you 99% of the way to supporting all behaviours I've seen in the wild.

For example:

/etc/global_defaults could set foo=bar
~/.user_defaults could override it with foo=baz
alias mycmd="mycmd --foo=quux" could override it with foo=quux
mycmd --foo=glorp could resolve to mycmd --foo=quux --foo=glorp

The end result clap should return is foo=glorp in that circumstance, rather than erroring out because some prior default the user had forgotten about is in conflict with what they want to do this one time.

(ie. They shouldn't have to type command mycmd --foo=glorp to explicitly bypass the shell alias or edit configuration files and then edit them back.)

What I'm arguing is that it's very simple to accomplish this as a collection of small, easily-composed pieces:

Make the --foo=bar --foo=baz resolution mentioned above for non-appending arguments into the default. (Having it error out is, at best, useful only in niche situations thanks to the alias shell built-in and wrapper scripts which don't carry around their own argument parsers.)
Split clap's current get_matches and friends into two pieces:
1. Something which goes from a raw argv string to a structured internal representation. (With access to query the schema to disambiguate things like "Does --foo take a value?" and "Does --bar append, count, or override when specified multiple times?".)
2. Something which applies the schema to that structured internal representation
Provide documentation for how to write JSON/TOML/etc. adapters which produce the same structured internal representation argv now produces.
Write a convenience function which takes a bunch of parsed outputs (what you currently get from get_matches()) and merges them in a "last in the list wins" manner.

Getting paths to things like XDG configuration directories is external to clap, because we don't want to preclude parsing config files stored elsewhere or tie clap to a specific implementation.

kbknapp · 2016-11-25T02:21:06Z

I've read all the comments and have some thoughts on the matter, but I'm on mobile right now so I'll try to write up my thoughts early tomorrow. It boils down to this: I'd like the command line to override config files or env vars, but those two extra sources to be added in a first-come, first-served manner. I'd also like to limit clap's responsibility to parsing arguments from a source, not determining source order or source validity.

I'll type up my full thoughts soon.

ssokolow · 2016-11-25T02:37:00Z

It boils down to this: I'd like the command line to override config files or env vars, but those two extra sources to be added in a first-come, first-served manner. I'd also like to limit clap's responsibility to parsing arguments from a source, not determining source order or source validity.

I fully agree. Hence my idea limiting clap to:

Parse argv to structured representation (more akin to what JSON can represent)
Process structured representation according to schema
Provide a function which can be called to merge the output of multiple parsings according to application-specified precedence order. (As opposed to each application's developer manually reinventing the boilerplate which does this.)

It then becomes easy for the clap-using application or 3rd-party crates to implement things like:

Looking up the path to a config file using an XDG paths crate
Using the parsed output from the command-line args to determine the path for the config files. (Parse the command line, then parse any config file it specifies, then merge the results with the command line winning on conflict.)
Parsing a JSON/TOML/etc. config file into clap's structured representation.
Using a crate equivalent to Python's shlex to tokenize an environment variable's contents into what get_matches_from expects.
Deciding the precedence order to request of clap's merging function and when in the process to perform the merging.

As long as you provide examples of how to accomplish all of this with minimal boilerplate using 3rd-party crates, you can indefinitely defer the question of whether anything else belongs in clap itself.
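As an illustration of the environment-variable item, here is a much-simplified tokenizer in the spirit of Python's shlex. It is a sketch written for this discussion, not an existing crate's API: it handles whitespace, single and double quotes, and backslash escapes outside single quotes, and it does not try to follow every POSIX shell rule.

```rust
/// Split a string into argv-style tokens.
fn tokenize(input: &str) -> Result<Vec<String>, String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut in_token = false; // true once the current token has started, even if empty
    let mut quote: Option<char> = None;
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        match (quote, c) {
            // Closing quote.
            (Some(q), _) if c == q => quote = None,
            // A backslash escapes the next character, except inside '...'.
            (None, '\\') | (Some('"'), '\\') => match chars.next() {
                Some(escaped) => {
                    current.push(escaped);
                    in_token = true;
                }
                None => return Err("trailing backslash".to_string()),
            },
            // Anything else inside quotes is literal.
            (Some(_), _) => current.push(c),
            // Opening quote; it also starts a (possibly empty) token.
            (None, '\'') | (None, '"') => {
                quote = Some(c);
                in_token = true;
            }
            // Unquoted whitespace ends the current token.
            (None, _) if c.is_whitespace() => {
                if in_token {
                    tokens.push(std::mem::take(&mut current));
                    in_token = false;
                }
            }
            (None, _) => {
                current.push(c);
                in_token = true;
            }
        }
    }
    if quote.is_some() {
        return Err("unterminated quote".to_string());
    }
    if in_token {
        tokens.push(current);
    }
    Ok(tokens)
}
```

The tokens from something like `std::env::var("MYCMD_OPTS")` could then be passed to get_matches_from. As far as I recall, that function treats the first item as the program name, so a placeholder name would need to be prepended.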

I'll type up my full thoughts soon.

I look forward to it.

kbknapp · 2016-11-26T21:58:19Z

Ok so it took me a little longer to get to this than I'd planned, but here's where I stand on the issue.

First, I'm not terribly interested in clap handling the file I/O part of this, i.e. the reading files, handling permissions, errors that come from this, precedence, etc. I intend for this to all happen in the consumer code. The consumer would then pass in the deserialized config representation, for now I'm only imagining TOML/YAML, but others could be added.

The goal is ultimately to get clap the info it needs to do its job, i.e. a normalized structure of an argv. In clap's case, that is simply an ordered vector of strings (meaning anything from OsStr, String, etc.). I'm OK with giving clap either a Toml or Yaml object, and having clap normalize it down to just the ordered Vec.

The details would probably be something like defining a trait Normalize (aside, all names in this proposal are 🚲 🏠-able) that does the normalizing, and as a start implementing that trait on Yaml, Toml, String, and OsString (the latter two in order to support env vars). The App struct would then accept any number of "external argv" sources via something like fn external_argv<T: Normalize>(argv: T), and save these in a first come first serve basis. I.e. the pseudocode below is the same to clap, and totally up to the consumer to decide (which allows us to side step all this precedence discussion and allow clap to stay more focused on a single purpose):

let some_env_var = env::var("SOME_ENV_VAR").ok();
let global_cfg = load_toml("/etc/myconfig.toml");
let user_cfg = load_toml("~/myconfig.toml");
let m = App::new("test")
    .external_argv(some_env_var)
    .external_argv(global_cfg)
    .external_argv(user_cfg)
    .get_matches();

If the consumer wants global_cfg to take precedence over user_cfg for whatever reason, they'd swap those two lines.

This makes parsing very simple because internally clap just parses them in reverse order, and if it reaches an arg that already exists in the matches, it just skips it.
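A sketch of that "reverse order, skip what already exists" rule as I read it, assuming each source has already been reduced to a `HashMap<String, String>`. The `resolve` function and `Parsed` alias are made up for illustration and say nothing about how clap would store matches internally.

```rust
use std::collections::HashMap;

type Parsed = HashMap<String, String>;

/// `command_line` always wins. `external` lists the extra sources in the
/// order they were registered; they are visited in reverse, and a key that is
/// already present is skipped, so a later-registered source beats an earlier one.
fn resolve(command_line: &Parsed, external: &[Parsed]) -> Parsed {
    let mut matches = command_line.clone();
    for source in external.iter().rev() {
        for (key, value) in source {
            // Only fill in keys that no higher-precedence source has set.
            matches.entry(key.clone()).or_insert_with(|| value.clone());
        }
    }
    matches
}
```

Registering the sources as env, global, user means a key set in both config files resolves to the user value, and swapping the last two makes the global value win, which matches the "swap those two lines" remark.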

There are two outstanding issues that this would present though. One is if you have an arg that accepts multiple values, and has a value in some external argv, should a later value in either another external argv or via the explicit command line add to these values, or entirely override? The proposed system above entirely overrides, which I'm more of a fan of. If a consumer wants to provide a global default and allow users to add to those values, it'd be easiest to simply tell the user to include that default value in their own "overrides" or just re-add that value back after clap is done with its parsing.

The second issue is where this MergeType came into play. Since clap allows two types of conflicts, POSIX style overrides and hard conflicts, a MergeType would allow consumers to effectively convert from hard conflicts to POSIX style overrides only in the case of external argv conflicts. Basically it gives the choice of whether they want hard conflicts or overrides.

This situation, in my estimation, only happens in user defined configs. I.e. a user wants to specify a default that conflicts with other options, but yet may want to override that behaviour at some point. I can't imagine why a consumer would put a conflicting argument into a default config and actually want a hard conflict. Think of unix style aliases such as ls="ls -l": due to POSIX style overrides, using values that conflict with -l is perfectly fine, so long as they come afterwards. However, as the developer/consumer it's sometimes difficult to decide what should be a hard conflict and what should be POSIX overridable because overrides sometimes seem confusing at runtime, whereas with a hard conflict, the user knows exactly what is going on and how to fix it.

I'm of the thought that all conflicts arising from external argvs should be treated as overridable, and it should just be documented well. I can't think of a concrete example where I'd actually want a hard conflict because I explicitly set something via the commandline.

This may sound like I'm in favor of a MergeType, but actually the more I think about it, the less I am. As I stated earlier, I'd prefer to treat all things as overridable, and disallow adding values at the commandline to values defined in the configs.

The only thing left to determine is how to represent free/positional arguments in these configs. Another 🚲 🏠 for sure, but options, and flags are simple. I'd suggest simply using a single "args" key and assigning the values in sequence such as args="foo bar baz".

Thoughts?

kbknapp · 2016-11-26T22:39:40Z

I wasn't clear about the positional args part. We could just as easily use the key to individualize them, but I kind of like that they're forced to be in order with only a single key, to avoid any confusion from accidentally putting them in the wrong order.

ssokolow · 2016-11-27T05:04:30Z

We basically agree on the design aside from whether the merging should be declarative or procedural.

Your declarative approach is definitely nicer to look at, but it puts more onus on clap to support edge-case features (or constrain users by refusing to), as I'll explain in reply to one of your outstanding issues...

There are two outstanding issues that this would present though. One is if you have an arg that accepts multiple values, and has a value in some external argv, should a later value in either another external argv or via the explicit command line add to these values, or entirely override? The proposed system above entirely overrides, which I'm more of a fan of. If a consumer wants to provide a global default and allow users to add to those values, it'd be easiest to simply tell the user to include that default value in their own "overrides" or just re-add that value back after clap is done with its parsing.

That's part of the reason I wanted the merge to be a later step. It allows these two cases to be implemented in the consumer:

Supporting things like --config-file. With your solution, one of the following has to happen:

a. Clap needs explicit support for a new argument type
b. Users need to call get_matches, then external_argv(load_toml(matches.value_of("config_file").unwrap())), then get_matches again.

With my solution, they just call get_matches, then call something in the vein of get_matches_via_normalize(load_toml(matches.value_of("config_file").unwrap())), then call a clap-provided merging function like matches1.update(matches2).

No special "look ahead, then dynamically inject an external_arg" or "re-parse from the beginning with changed parameters" necessary.
Allowing users to control behaviour of multiple arguments.

With your solution, clap dictates how it works. With my solution, users can easily extract the values which should add together before doing the merging and then add them together manually.
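To illustrate the consumer-side control being argued for, here is a sketch in which the application decides which keys add together and which are replaced. It assumes parsed results are `HashMap<String, Vec<String>>`; `merge_with_append` and the `additive` parameter are hypothetical names, not clap functions.

```rust
use std::collections::HashMap;

type Parsed = HashMap<String, Vec<String>>;

/// Merge `later` into `earlier`. Keys named in `additive` have their values
/// concatenated (earlier values first); every other key is replaced outright.
fn merge_with_append(earlier: &Parsed, later: &Parsed, additive: &[&str]) -> Parsed {
    let mut merged = earlier.clone();
    for (key, values) in later {
        if additive.iter().any(|a| *a == key.as_str()) {
            merged
                .entry(key.clone())
                .or_default()
                .extend(values.iter().cloned());
        } else {
            merged.insert(key.clone(), values.clone());
        }
    }
    merged
}
```

An application that wants `--include` values from the config file and the command line to accumulate would pass `&["include"]`, and everything else would keep plain override semantics.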

It also has the benefit that there's less uncertainty about whether clap will allow users to reuse the same App to parse multiple sets of merged inputs. That is, I'd be worried that external_argv might invalidate earlier references, requiring me to throw out App and create it anew every time when I need to do something like this:

let m = App::new("test");

let matches1 = m
    .external_argv(some_env_var1)
    .get_matches_from(args1);

let matches2 = m
    .external_argv(some_env_var2)
    .external_argv(user_cfg2)
    .get_matches_from(args2);

let matches3 = m
    .external_argv(some_env_var3)
    .external_argv(global_cfg3)
    .external_argv(user_cfg3)
    .get_matches_from(&[]);

The only thing left to determine is how to represent free/positional arguments in these configs. Another 🚲 🏠 for sure, but options, and flags are simple. I'd suggest simply using a single "args" key and assigning the values in sequence such as args="foo bar baz".

I wasn't clear about the positional args part. We could just as easily use the key to individualize them, but I kind of like that they're forced to be in order with only a single key, to avoid any confusion from accidentally putting them in the wrong order.

At the very least, you'll want it to be args=["foo", "bar", "baz"] to avoid reinventing quoting/escaping.

With that said, this is definitely a tricky thing to address because:

Mapping one key to multiple schema entries feels like undesirable magic behaviour and makes external_argv more than merely a merging, trait-enabled version of get_matches_from, which also feels conceptually wrong.
If clap doesn't enforce the "list of positional arguments is never sparse" invariant, mapping them to individual keys in config files could introduce potential for logic bugs.
If clap does enforce non-sparseness for positional arguments, it could frustrate users when cmd foo bar keeps turning into cmd foo bar baz, yet the shell is adamant that it's not meddling with it.

(Wrapper scripts and shell functions are the standard, accepted way to augment or meddle with positional arguments. As someone who cares about user experience, I might go as far as slipping my own filter in between load_toml and clap if it didn't provide a way to opt out of positional arguments into the config file... and then we're right back to reinventing the config file schema because clap's implementation is too inflexible.)

I'd just treat positional arguments in config files as a validation failure and wait to see if anyone complains, since it can be relaxed without breaking anything. (I've never seen a config file or environment variable that allows specifying positional arguments in 10 years of using DOS/Windows command-line applications and another 16 of using Linux ones. It's always been wrapper scripts or shell features like alias and function done() { command mv "$@" ~/done/; })

...or, in the case of DOS and Windows, wrapper scripts as .BAT files would look like these:

REM DOS
move %1 %2 %3 %4 %5 %6 %7 %8 %9 %HOME%\DONE\
REM Windows
move %* %HOME%\Done\

Finally, since you didn't mention them either way, here are a couple of other points:

I have no problem with Normalize as long as it's implemented such that, if .external_argv accepts the output of load_toml directly, there's also an easy way to use only a specified subtree, so the file can contain multiple sections with only one of them being for clap.
You'll need Normalize implementations to have access to the schema because, otherwise, the only attainable normalized form is to flatten structured data from TOML/JSON/YAML/etc. into an argv Vec, just so they can be re-structured again once access to the schema is available... and that feels very unpleasant to think about.

(Without access to the schema, tell me whether --foo --bar is {"foo": true, "bar": true} or {"foo": "--bar"}... or {'foo': 1 + (-1)} for that matter, if you're dealing with options like --verbose and --quiet.)

kbknapp · 2016-11-28T18:14:09Z

Supporting things like --config-file. [...]

IMO, that's a little bit of a niche use-case. I'm not saying it's uncommon, but I think it's FAR more common to simply provide these "defaults" in a predetermined location. Unless you're using aliases, typing $ myprog --config-file foo.toml is basically the same as providing those defaults in the first place.

Due to how the internals of clap work, you can't just have a matches.update(load_yaml!("foo.yaml")); at a minimum you'd have to provide the App instance too, because that's where the schema is defined. So it would end up being something along the lines of app.update_matches_from(load_yaml!("foo.yaml"), &mut matches).

Also, parsing the arguments is very fast; it's building the App instance that takes more of the time. So calling get_matches more than once shouldn't be a real issue in practice, especially if it's only for a small percentage of use cases (such as --config-file <file>). I'm not saying this is the best solution, but it's definitely the easiest considering I'll have to support this solution 😉

there's less uncertainty about whether clap will allow users to reuse the same App to parse multiple sets of merged inputs.

There shouldn't be any issue with re-using the same App instance to update the matches. This may just need to be documented better in this case. As for the example with matches{1,2,3}, it would depend on how you're trying to do this, but I'd need to see a real use case. The example you posted looks like it would work via this app.update_matches_from(), but not with an app.external_argv(), because app.external_argv() would always allow adding to the App instance, but not taking away from it on additional parses.

have no problem with Normalize as long as it's implemented such that, if .external_argv accepts the output of load_toml directly, there's also an easy way to use only a specified subtree, so the file can contain multiple sections with only one of them being for clap

That's the plan. As far as using a subtree, that should work, depending on the deserialization framework. I.e. this external_argv would simply take a TomlTable or YamlTree (not real objects, just abstract ideas); whether that's an entire file's table/tree or just a subset doesn't really matter. If the consumer only wants to provide a subset, they just give that subset to the external_argv and not the entire deserialized file.

You'll need Normalize implementations to have access to the schema because, otherwise, the only attainable normalized form is to flatten structured data from TOML/JSON/YAML/etc. into an argv Vec, just so they can be re-structured again once access to the schema is available... and that feels very unpleasant to think about. (Without access to the schema, tell me whether --foo --bar is {"foo": true, "bar": true} or {"foo": "--bar"}... or {'foo': 1 + (-1)} for that matter, if you're dealing with options like --verbose and --quiet.)

The App instance contains the schema. And parsing a flat argv is exactly what clap does already, so that's not an issue at all. In fact, the Normalize trait would simply flatten the Yaml/Toml/JSON/whatever into what clap is already good at parsing, a flat Vec.
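A sketch of what such a flattening trait could look like, assuming the deserialized config has already been reduced to a small `Value` enum keyed by option name. The trait name follows the proposal above, but the method signature, the `Value` type, and both impls are guesses for illustration only.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a deserialized TOML/YAML value.
enum Value {
    Bool(bool),
    Str(String),
    List(Vec<String>),
}

/// Hypothetical trait: turn a source into the flat list of strings that an
/// argv parser already knows how to handle.
trait Normalize {
    fn normalize(&self) -> Vec<String>;
}

impl Normalize for BTreeMap<String, Value> {
    fn normalize(&self) -> Vec<String> {
        let mut argv = Vec::new();
        for (key, value) in self {
            match value {
                Value::Bool(true) => argv.push(format!("--{}", key)),
                Value::Bool(false) => {} // an unset flag is simply absent
                // `--key=value` keeps a value such as "--bar" attached to its key.
                Value::Str(s) => argv.push(format!("--{}={}", key, s)),
                Value::List(items) => {
                    for item in items {
                        argv.push(format!("--{}={}", key, item));
                    }
                }
            }
        }
        argv
    }
}

/// A plain string (for example an env var) is split on whitespace here;
/// real code would need proper quoting rules.
impl Normalize for str {
    fn normalize(&self) -> Vec<String> {
        self.split_whitespace().map(String::from).collect()
    }
}
```

A `BTreeMap` is used only so the flattened order is deterministic. Emitting the `--key=value` form sidesteps the `--foo --bar` ambiguity raised earlier, because the value never appears as a separate token.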

At the very least, you'll want it to be args=["foo", "bar", "baz"] to avoid reinventing quoting/escaping.

Good point. @nabijaczleweli has a branch with a parser which does all the whitespace/quotation handling that we could use/adapt. Although, the more I think about it, the more I'd rather do one of the following:

1. Outlaw providing positional args in these configs (because you're right, it could be very confusing to users, and I don't see a huge benefit; if someone does, please let me know).
2. If we do allow positional args, they could be in the simple 1="foo", 2="bar" form where 2 requires 1... but I still don't really like this, because a user who types $ myprog may not realize that what they're really running is $ myprog foo bar, and then typing $ myprog baz does what... $ myprog baz bar? It just seems like extra magic that could go very wrong.

Edit: updated Option 2 about the positional args

TruputiGalvotas · 2017-03-21T11:04:35Z

I don't know whether anyone knows about this here: https://docs.rs/preferences/1.1.0/preferences/

But it seems you could just use this as an optional dependency instead of a separate implementation within clap to achieve the same thing. clap would still be clap, instead of a Swiss Army knife that is both a configuration file parser and a command line parser.

cmcqueen · 2021-11-12T06:54:36Z

I reckon the ConfigArgParse package for Python is a terrific reference for a useful and featureful implementation.

jaskij · 2021-12-13T22:49:01Z

While looking into solutions to this issue I came across the merge crate. Haven't tested it yet, but it seems that #[derive(Deserialize, StructOpt, Merge)] is a solution.

The only fault is the amount of boilerplate needed for defaults: serde expects to use the Default trait, while StructOpt wants those as strings in field attributes.

The code sample in structopt#150 shows how to have a single authority for those defaults in impl Default.

Kixunil · 2021-12-13T23:06:34Z

Actually, serde allows you to specify a custom default function. If StructOpt could use functions too, you could reference the same function.

epage · 2021-12-14T00:36:01Z

The only fault is the amount of boilerplate needed for defaults: serde expects to use the Default trait, while StructOpt wants those as strings in field attributes.

StructOpt supports Default via #[structopt(default_value)] (no parameter). We've renamed this in clap_derive to #[clap(default_value_t)] because it also supports a typed value, rather than a string value.

The limitation of both is that the type needs to implement Display.

jaskij · 2021-12-14T06:50:58Z

StructOpt supports Default via #[structopt(default_value)] (no parameter). We've renamed this in clap_derive to #[clap(default_value_t)] because it also supports a typed value, rather than a string value.

I must have misunderstood how StructOpt works then, because to me it seemed as if the no-param default_value took the default value from the type of the field, not the config struct's Default impl.

Edit:

I do not want to be argumentative, just stating my observations: a seven-value config has ballooned to over fifty lines of code using the method I mentioned.

@Kixunil: I indeed overlooked serde's option of using functions, but I don't think this helps here, especially with the difference in types. clap_derive seems to fix that - although it's hard to tell from the current docs whether it expects a callable or a value.

epage · 2021-12-14T14:38:56Z

I must have misunderstood how StructOpt works then, because to me it seemed as if the no-param default_value took the default value from the type of the field, not the config struct's Default impl.

Yes, I thought you had meant impl Default for the field, rather than the container. We have #3116 for using the container's Default::default.

clap_derive seems to fix that - although it's hard to tell from the current docs whether it expects a callable or a value.

We support any expression. In general, we need to loosen up our attributes to allow any expression so there isn't a question of what you can or can't do (#3173).

jaskij · 2021-12-14T15:03:31Z

@epage seems like clap v3 will indeed be a big change in a good direction!

I'd still love to see file handling too, but understand that you're reluctant to include it. I've given it some thought myself and it indeed is a tricky thing.

epage · 2021-12-14T15:09:29Z

@jaskij note there is also a discussion on layered config at #2763, exploring different options. It also includes the option I've landed on for now.

jettero · 2022-08-30T20:54:40Z

Came here looking for the ConfigArgParse functionality as well. I think it's fine to leave the config parsing as an exercise for an external package... The problem I'm having is that if I really use clap and it's handling the defaults and things, then there are no interfaces available for me to complete the merge, with the command line overwriting the config where specified.

Ideally, I'd like to use --name value if it's there, and fall back on config name if it's there and finally try clap --name default if there wasn't anything else to use.

The problem is I don't see any obvious way to test the matches to see if the user specified the value or if clap fell back on the specified default so there's no way to complete such a merge.

Another strategy I've used in the past is to just read the config from the usual places (and give up on the idea of a --configfile flag) and use anything from the config files as the defaults for the switches and things. But I don't think you can do that at all with either the builder (using arg!()) or, especially, with the derive macro interface.

I'm pretty new to rust, but this seems like a bit of a showstopper on using clap for my current little project... which really bums me out because clap is so neat and there seem to be very few alternatives (perhaps none).

epage · 2022-08-30T20:58:28Z

The problem is I don't see any obvious way to test the matches to see if the user specified the value or if clap fell back on the specified default so there's no way to complete such a merge.

matches.value_source("arg") will let you know what set the value.

btw if you want to read up on more thoughts on layering clap with config, check out #2763
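As a rough illustration of the merge described above (command line first, then config file, then the parser's default), here is a minimal sketch. It does not depend on clap: the `Source` enum is a hypothetical local stand-in for whatever `matches.value_source("arg")` reports, and `layer` is an illustrative helper name, not a clap function.

```rust
/// Hypothetical stand-in for where a parsed value came from.
/// With clap, this information would come from `matches.value_source(...)`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    DefaultValue,
    CommandLine,
}

/// Pick the effective value: an explicit command-line value wins,
/// then the config file, then the parser's built-in default.
pub fn layer<T>(parsed: T, source: Source, from_config: Option<T>) -> T {
    match (source, from_config) {
        // The user typed it, so it always wins.
        (Source::CommandLine, _) => parsed,
        // The parser fell back to its default, so the config may override it.
        (Source::DefaultValue, Some(cfg)) => cfg,
        // Nothing in the config either: keep the parser's default.
        (Source::DefaultValue, None) => parsed,
    }
}
```

For example, `layer("default", Source::DefaultValue, Some("cfg"))` yields the config value, while passing `Source::CommandLine` always keeps the parsed value. Environment variables and multi-value arguments are left out of this sketch.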

mzagrabe · 2022-08-30T21:20:02Z

On Tue, Aug 30, 2022 at 3:58 PM Ed Page wrote:

The problem is I don't see any obvious way to test the matches to see if the user specified the value or if clap fell back on the specified default so there's no way to complete such a merge.

matches.value_source("arg") <https://docs.rs/clap/latest/clap/parser/struct.ArgMatches.html#method.value_source> will let you know what set the value.

@epage, value_source looks pretty slick. Thanks for mentioning it.

btw if you want to read up on more thoughts on layering clap with config, check out #2763

Looking at #2763 it is hard for me to tell if the (eventual) plan for clap is to incorporate some sort of ConfigArgParse functionality. Is that something that clap will eventually provide, or will it always be left up to the application to parse options from a config file? Thanks for the help and answers... and code!

-m

epage · 2022-08-31T02:46:11Z

Looking at #2763 it is hard for me to tell if the (eventual) plan for clap
is to incorporate some sort of ConfigArgParse functionality. Is that
something that clap will eventually provide, or will it always be left up
to the application to parse options from a config file?

The recommended route for layering configs is still TBD. There are multiple strategies people can use now and some improvements that we are making for v4 that will unblock other strategies.

jettero · 2022-08-31T05:44:57Z

Oh, fantastic. Thanks very much for the tips. I feel like there's ways out of the trap now and I can stick with clap (which I like very much). Game on.

I'm daydreaming about coming up with an external crate that solves the problems I have with regard to ConfigArgParse behavior. I like the idea that rust packages can be "finished", and to avoid feature creep, this new functionality should perhaps be added in another package. I'm just not the right guy for the job yet as I'm still very new to rust.

I'm encouraged that the stubs/api-thingies exist to let this happen in another crate.

[edit] it also looks like others have already done the things I was thinking about but I didn't know what to search for. (e.g., clap_conf looks promising). Hrm, that seems tied to ^2.33.0 ... oh well, some day.

mzagrabe · 2022-09-28T18:48:41Z

Looking at #2763 it is hard for me to tell if the (eventual) plan for clap
is to incorporate some sort of ConfigArgParse functionality. Is that
something that clap will eventually provide, or will it always be left up
to the application to parse options from a config file?

The recommended route for layering configs is still TBD. There are multiple strategies people can use now and some improvements that we are making for v4 that will unblock other strategies.

@epage, would you be willing to list the "multiple strategies", say which v4 improvements have been made to unblock other strategies, and describe what those "other strategies" would be?

Thanks!

epage · 2022-09-28T19:33:06Z

#2763 explores the different options.

clap v4 added ArgMatches::ids so you can iterate over what arguments are present. Generic layering was a common motivation for people requesting that feature.

kbknapp added A-parsing Area: Parser's logic and needs it changed somehow. D: intermediate labels Nov 14, 2016

kbknapp mentioned this issue Nov 15, 2016

Early support for reading program arguments from env-variable (issue #712) #734

Closed

BurntSushi mentioned this issue Nov 24, 2016

Support persistent configuration (.rgrc?) BurntSushi/ripgrep#196

Closed

This was referenced Jan 11, 2017

Values from environment variables #814

Closed

Add support for automatic negation flags #815

Open

kbknapp changed the title ~~Feature request – allow parsing options from a config file instead of command line arguments~~ Allow parsing options from a config file instead of command line arguments Jan 30, 2017

kbknapp added W: 3.x blocker and removed W: 2.x labels May 9, 2017

TeXitoi mentioned this issue Nov 24, 2017

Support TOML and env vars parsing TeXitoi/structopt#32

Closed

kbknapp added the W: 3.x label Feb 5, 2018

bkchr mentioned this issue Aug 11, 2020

Add support for specifying the cli args for running a node from config file paritytech/substrate#6856

Closed

pksunkara removed this from the 3.0 milestone Jun 16, 2021

pksunkara removed the W: 3.x label Aug 13, 2021

epage mentioned this issue Dec 6, 2021

Allow adding "external argv"s to be parsed alongside/before command line epage/clapng#66

Open

epage added C-enhancement Category: Raise on the bar on expectations and removed E: optional dep labels Dec 8, 2021

epage mentioned this issue Dec 9, 2021

Layering env, args, and config #3113

Open

epage added S-waiting-on-design Status: Waiting on user-facing design to be resolved before implementing and removed P4: nice to have labels Dec 9, 2021

chipsenkbeil mentioned this issue Jul 13, 2022

derive: Store indices, value source next to field values #3846

Open

2 tasks

stanislav-tkach mentioned this issue Jul 14, 2022

Subsystems configuration mintlayer/mintlayer-core#277

Merged

jpmckinney mentioned this issue Dec 20, 2022

2.2. Using config files feedback rust-cli/book#212

Open

tillrohrmann mentioned this issue Jan 13, 2023

Sketch entrypoint for Restate application restatedev/restate#8

Merged
