Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: CLI list possible values for format options. #549

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

alekzvik
Copy link
Contributor

@alekzvik alekzvik commented Dec 3, 2024

Closes

Description

clap has a ValueEnum trait that gives better help messages for you.
Since Format is in a different crate, I can not add an impl of this trait for it.
I also do not want to have clap in core.
So, I created a wrapper that has a needed trait and then converts into Format

Here is how it looks:

❯ cargo run --
...
Command line interface for stac-rs

Usage: stacrs [OPTIONS] <COMMAND>

Commands:
  [...]
  help       Print this message or the help of the given subcommand(s)

Options:
  -i, --input-format <INPUT_FORMAT>     The input format, if not provided will be inferred from the input file's extension, falling back to json [possible values: json, ndjson, geoparquet]
      --input-option <INPUT_OPTIONS>    key=value pairs to use for the input object store
  -o, --output-format <OUTPUT_FORMAT>   The output format, if not provided will be inferred from the output file's extension, falling back to json [possible values: json, ndjson, geoparquet]

Checklist

Delete any checklist items that do not apply (e.g. if your change is minor, it may not require documentation updates).

  • Unit tests
  • Documentation, including doctests
  • Git history is linear
  • Commit messages are descriptive
  • (optional) Git commit messages follow conventional commits
  • Code is formatted (cargo fmt)
  • cargo test
  • Changes are added to the CHANGELOG

@alekzvik alekzvik requested a review from gadomski as a code owner December 3, 2024 18:54
Copy link
Member

@gadomski gadomski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, good idea for the wrapper. Do you have any ideas on how to show the possibility of specifying the geoparquet compression, ala

#[test]
#[cfg(feature = "geoparquet")]
fn parse_geoparquet_compression() {
let format: Format = "geoparquet[snappy]".parse().unwrap();
assert_eq!(format, Format::Geoparquet(Some(Compression::SNAPPY)));
}
?

@alekzvik
Copy link
Contributor Author

alekzvik commented Dec 3, 2024

I like it, good idea for the wrapper. Do you have any ideas on how to show the possibility of specifying the geoparquet compression, ala

#[test]
#[cfg(feature = "geoparquet")]
fn parse_geoparquet_compression() {
let format: Format = "geoparquet[snappy]".parse().unwrap();
assert_eq!(format, Format::Geoparquet(Some(Compression::SNAPPY)));
}

?

Hmm, I think it can be achieved with a manual implementation of the trait. I'll try and report back.

@gadomski
Copy link
Member

gadomski commented Dec 3, 2024

Hmm, I think it can be achieved with a manual implementation of the trait. I'll try and report back.

Ok, if it doesn't go, we could separate out the format from the "format flavor", e.g. have separate --format and --geoparquet-compresion options (and probably a --json-pretty flag too).

@alekzvik alekzvik marked this pull request as draft December 3, 2024 23:14
Copy link
Contributor Author

@alekzvik alekzvik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried two possible approaches and I hate both. Both share the same constraint, that you have to list all the possible options.
I decided to commit them to share, but have no intention of merging it in this state.
Here is how both versions look on UI (input and output implemented differently):

❯ cargo run -- validate --help
[...]
Options:
  -i, --input-format <INPUT_FORMAT>
          The input format, if not provided will be inferred from the input file's extension, falling back to json

          [possible values: json, json-pretty, ndjson, geoparquet, geoparquet[snappy]]

  -o, --output-format <OUTPUT_FORMAT>
          The output format, if not provided will be inferred from the output file's extension, falling back to json

          Possible values:
          - json-pretty:        Pretty-printed GeoJson
          - json:               GeoJson
          - ndjson
          - geoparquet
          - geoparquet[lz4]
          - geoparquet[snappy]

[...]

At this point, I see two ways to go forward:

  1. Separate compression into a different option, as you proposed
  2. Good-old hardcode in help param. This way we can instead of listing all possible values, curate a few, maybe with an example

@@ -17,7 +20,7 @@ use tracing::metadata::Level;
#[command(author, version, about, long_about = None)]
pub struct Args {
/// The input format, if not provided will be inferred from the input file's extension, falling back to json
#[arg(short, long, global = true)]
#[arg(short, long, global = true, value_parser=PossibleValuesParser::new(["json", "json-pretty", "ndjson", "geoparquet", "geoparquet[snappy]"]).try_map(|s| Format::from_str(&s[..])))]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one way, list all possible options as strings, use Format and pass them on to Format::from_str

Comment on lines +383 to +394
fn value_variants<'a>() -> &'a [Self] {
&[
FormatWrapper::Json(true),
FormatWrapper::Json(false),
FormatWrapper::Ndjson,
FormatWrapper::Geoparquet(None),
FormatWrapper::Geoparquet(Some(Compression::LZ4)),
FormatWrapper::Geoparquet(Some(Compression::SNAPPY)),
// Do not know how to deal with options below:
// FormatWrapper::Geoparquet(Some(Compression::BROTLI(???))),
// FormatWrapper::Geoparquet(Some(Compression::from_str("BROTLI").unwrap())),
]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another way. Also includes listing all possible options with additional complications - I have no idea how to construct FormatWrapper::Geoparquet(Compression::BROTLI(BrotliLevel)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could just make BROTLI go to "geoparquet[brotli(n)]", as the levels are all u32: https://docs.rs/parquet/latest/src/parquet/basic.rs.html#410-424

@gadomski
Copy link
Member

gadomski commented Dec 4, 2024

Yeah, I'm thinking separate options is the way to go ... by my count this would be 2 + 1 + 8 formats, which is a bit overwhelming for a user I think.

@alekzvik
Copy link
Contributor Author

alekzvik commented Dec 4, 2024

Yeah, I'm thinking separate options is the way to go ... by my count this would be 2 + 1 + 8 formats, which is a bit overwhelming for a user I think.

I tried to extract input and output params into separate structs and then include them only when they are needed in a specific command. With that, I was also adding pretty-json and compression for parquet as a separate params, but only for output.

Here are the problems I have encountered:

  • CLI API design: multiple params that contradict each other and sometimes depend on values of others. It is both cumbersome to implement and make sense of as a user.
  • options param that is shared for both input and output.

Overall, I will put this issue out for some time and think about the CLI API design "in the background" before I proceed.

@gadomski gadomski added the [crate] cli stac-cli label Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[crate] cli stac-cli
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants