refactor: Use a common from_value method #472

Open
polarathene wants to merge 8 commits into main from refactor/common-parser-transform

polarathene
Collaborator

@polarathene polarathene commented Oct 7, 2023

Inspired by the json5 feature (original PR July 2020, revised PR May 2021), with the refactoring in the revision by @matthiasbeyer.

It looked useful for simplifying the logic of other formats that use serde (so long as the test coverage is sufficient to catch any issues 🙏).


DRY: The json, json5 and toml parsers all leverage serde and can share a common enum to deserialize data into, instead of individual methods performing roughly the same transformations.

  • While ron improves its support for serde untagged enums with v0.9, it is still not compatible with this approach (their README details why). (UPDATE: At least for the current config-rs test coverage, this appears to be passing now.)
  • The yaml support doesn't leverage serde, thus is not compatible. (UPDATE: Since there is talk about the unmaintained status, it could be switched for serde-yaml; I've verified it can pass the tests with an extra deserializer method.)

from_parsed_value() is based on the approach used by format/json5.rs:

  • It has been adjusted to reflect the ValueKind enum, which could not be used directly because the Table and Array types use Value as their value storage type instead of self-referencing the enum.
  • Very similar to an impl From, but supports the complementary uri parameter for each Value derived.
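To illustrate the two points above, here is a minimal std-only sketch of the idea. The `ParsedValue`, `ValueKind` and `Value` definitions below are simplified stand-ins written for this example (an assumption, not the actual config-rs types), and the serde derive with the untagged attribute that the PR relies on is omitted so the sketch compiles without external crates.

```rust
use std::collections::HashMap;

// Intermediate type a format parser would deserialize into.
// Assumption: in the PR this role is played by an enum deriving
// `serde::Deserialize` with `#[serde(untagged)]`; omitted here.
#[derive(Debug, Clone, PartialEq)]
enum ParsedValue {
    Nil,
    Boolean(bool),
    I64(i64),
    Float(f64),
    String(String),
    // Self-referencing: nested values are `ParsedValue`, not `Value`.
    Table(HashMap<String, ParsedValue>),
    Array(Vec<ParsedValue>),
}

// Simplified stand-ins for the `Value` / `ValueKind` pair.
#[derive(Debug, Clone, PartialEq)]
enum ValueKind {
    Nil,
    Boolean(bool),
    I64(i64),
    Float(f64),
    String(String),
    // Nested values are stored as `Value`, so each carries its own origin.
    Table(HashMap<String, Value>),
    Array(Vec<Value>),
}

#[derive(Debug, Clone, PartialEq)]
struct Value {
    origin: Option<String>,
    kind: ValueKind,
}

// Recursively convert, attaching the same `uri` to every derived `Value`.
// A plain `impl From<ParsedValue> for Value` could not accept this extra argument.
fn from_parsed_value(uri: Option<&String>, value: ParsedValue) -> Value {
    let kind = match value {
        ParsedValue::Nil => ValueKind::Nil,
        ParsedValue::Boolean(b) => ValueKind::Boolean(b),
        ParsedValue::I64(n) => ValueKind::I64(n),
        ParsedValue::Float(f) => ValueKind::Float(f),
        ParsedValue::String(s) => ValueKind::String(s),
        ParsedValue::Table(table) => ValueKind::Table(
            table
                .into_iter()
                .map(|(k, v)| (k, from_parsed_value(uri, v)))
                .collect(),
        ),
        ParsedValue::Array(items) => ValueKind::Array(
            items
                .into_iter()
                .map(|v| from_parsed_value(uri, v))
                .collect(),
        ),
    };
    Value { origin: uri.cloned(), kind }
}
```

Calling `from_parsed_value(Some(&uri), parsed)` on a nested table yields a `Value` tree in which every node, including nested entries, records the same origin.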

Resolves: #394

@polarathene

This comment was marked as outdated.

@polarathene polarathene force-pushed the refactor/common-parser-transform branch from 6ca707b to 1c8ed59 Compare October 9, 2023 08:46
Explicit `ValueKind` mapping instead of implicitly inferred `Value` type.

Matches the `format/json5.rs` logic.

Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
DRY: The `json`, `json5` and `toml` parsers all leverage `serde`
and can share a common enum to deserialize data into,
instead of individual methods performing roughly the same transformations.

- While `ron` improves their support for serde untagged enums with `v0.9`,
  it is still not compatible with this approach (_Their README details why_).
- The `yaml` support doesn't leverage `serde`, thus is not compatible.

`from_parsed_value()` is based on the approach used by `format/json5.rs`:
- It has been adjusted to reflect the `ValueKind` enum,
  which could not directly be used due to the `Table` and `Array` types using
  `Value` as their value storage type instead of self-referencing the enum.
- Very similar to an `impl From`, but supports the complementary `uri` parameter
  for each `Value` derived.

Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
- The enum did not properly handle the `Datetime` TOML value which needed
to be converted into a `String` type.
- This workaround approach avoids duplicating `from_parsed_value()` logic
to support one enum variant.

Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
The enum is only intended as a helper for this deserializer into `String` type,
it can be bundled inside. Likewise, no need to `impl Display`.

Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
If no other variant could be deserialized into successfully,
the type is not supported and treated as the `Nil` type.

This better communicates failures within tests when a type is
compared and is not expected to be `Nil`.

Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
@polarathene polarathene force-pushed the refactor/common-parser-transform branch from 1c8ed59 to cfabdba Compare October 10, 2023 12:15
- `Char` variant needed to be converted to `String`.
- `Option` variant could be introduced generically.

NOTE: With `v0.9` of Ron, more types are introduced. Without tests, if these
do not deserialize into a supported type they will be treated as `Nil` type.

Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
@polarathene polarathene force-pushed the refactor/common-parser-transform branch 3 times, most recently from 68df817 to c82013e Compare October 10, 2023 23:05
- Required switching to `serde_yaml`.
- Like with `yaml-rust`, requires special handling for table keys.

Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
@polarathene polarathene force-pushed the refactor/common-parser-transform branch from c82013e to d4f2f35 Compare October 10, 2023 23:47
@polarathene
Collaborator Author

This PR has been updated to also include support for formats Ron and Yaml.

  • They're split into their own commits, so those can be easily dropped (and split into follow-up PRs if preferred, especially for the Yaml crate switch).
  • Alternative PRs have been provided that could optionally be merged before this PR, or if any portion of this PR is rejected.

Ron:

Yaml:


During the integration of Ron and Yaml into this PR (as with the earlier TOML test for datetime), both failed their tests due to the more implicit deserialize_any used by Serde to support deserializing into the untagged enum.

Resolving requires either:

  • Adding a new type to deserialize into for from_value() to then handle
  • Using the serde with attribute to provide a custom deserializer that better guides handling of each format's types into the expected type for config-rs (eg: Ron Char or Toml Datetime conversion to String).

It's not too different from the prior approach: these types were previously handled in each format's own from_value() method, whereas this approach better identifies when such handling is actually necessary. They still leverage feature toggles for conditional compilation.


For TOML, the Datetime type was easier to resolve thanks to the specific test failure. Ron and YAML failures were usually more cryptic, such as:

running 7 tests
test test_error_parse ... ok
test test_override_lowercase_value_for_enums ... ok
test test_override_uppercase_value_for_enums ... ok
test test_override_lowercase_value_for_struct ... ok
test test_override_uppercase_value_for_struct ... ok
test test_file ... ok

thread 'test_yaml_parsing_key' has overflowed its stack
fatal runtime error: stack overflow
error: test failed, to rerun pass `--test file_yaml`

IIRC, I had read that this is due to how deserialize_any works with the untagged enum: it recursively tries to resolve the enum incorrectly after reaching a variant that references Self, performs deserialize_any on that, and so forth.

  • That's the main maintenance gotcha, but I doubt it would be encountered much going forward, and this approach simplifies the maintenance as a whole quite well.
  • deserialize_any (due to untagged enum approach) won't be as performant as the previous match statements for each individual format, but unlikely a concern for the purpose of config-rs.

Note
Ron 0.9 will introduce a Bytes value variant (Vec<u8>) for byte strings.

  • I don't believe the current approach would match it? You'd get the fallback Nil.
  • Probably acceptable given it's unclear how config-rs would want to support that.

Collaborator Author

@polarathene polarathene left a comment

Some inline context if it assists with review.

If it's easier to follow/digest, I've staged out changes into scoped commits. Each commit has an associated commit message describing the changes 👍

Comment on lines +165 to +169
let key = match key {
    serde_yaml::Value::Number(k) => Some(k.to_string()),
    serde_yaml::Value::String(k) => Some(k),
    _ => None,
};
Collaborator Author

Originally I tried to use:

let key = serde_yaml::from_value::<String>(k).ok();

However this resulted in a "stack overflow" according to my notes at the time.

IIRC this failure wasn't restricted to the untagged enum approach here, and the explicit match is also required for the existing approach used (including for the map variant in the previous yaml-rust crate).


My notes also include this failure, caused by a mismatched key string (eg: a similar string like unexpected_key instead of the expected string expected_key), which the test failure output did not surface well:

Test failure output
running 7 tests
test test_error_parse ... ok
test test_override_uppercase_value_for_struct ... FAILED
test test_override_uppercase_value_for_enums ... FAILED
test test_override_lowercase_value_for_struct ... FAILED
test test_file ... FAILED
test test_yaml_parsing_key ... FAILED
test test_override_lowercase_value_for_enums ... FAILED

failures:

---- test_override_uppercase_value_for_struct stdout ----
thread 'test_override_uppercase_value_for_struct' panicked at tests/file_yaml.rs:151:66:
called `Result::unwrap()` on an `Err` value: missing field `bar`

---- test_override_uppercase_value_for_enums stdout ----
thread 'test_override_uppercase_value_for_enums' panicked at tests/file_yaml.rs:203:54:
called `Result::unwrap()` on an `Err` value: value of enum EnumSettings should be represented by either string or table with exactly one key

---- test_override_lowercase_value_for_struct stdout ----
thread 'test_override_lowercase_value_for_struct' panicked at tests/file_yaml.rs:186:56:
called `Result::unwrap()` on an `Err` value: missing field `foo`

---- test_file stdout ----
thread 'test_file' panicked at tests/file_yaml.rs:43:43:
called `Result::unwrap()` on an `Err` value: missing field `debug`

---- test_yaml_parsing_key stdout ----
thread 'test_yaml_parsing_key' panicked at tests/file_yaml.rs:115:10:
called `Result::unwrap()` on an `Err` value: missing field `inner_string`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- test_override_lowercase_value_for_enums stdout ----
thread 'test_override_lowercase_value_for_enums' panicked at tests/file_yaml.rs:221:54:
called `Result::unwrap()` on an `Err` value: value of enum EnumSettings should be represented by either string or table with exactly one key

Comment on lines +172 to +176
// Option to Result:
match (key, value) {
    (Some(k), Some(v)) => Ok((k, v)),
    _ => Err(serde::de::Error::custom("should not be serialized to Map")),
}
Collaborator Author

This is a bit verbose, but easier to grok?

An alternative with .zip().ok_or_else() was considered, included here if you prefer it.


When I completed my first iteration and was after some feedback on better handling of this .map() call having mixed error return types (serde_yaml + de::Error::custom), it was advised:

  • Have the key + value conversions convert their Result to Option.

  • Then if either key or value is None, use the desired serde deserialization error here (which deserialize_any uses to understand that this variant was unsuccessful and to try deserializing the next variant of the ParsedValue enum).

  • They had refactored to use separate functions for key and value, instead of nesting them inline within the map() closure.

    Instead of this match, they used zip() on the two options:

    let key = key_to_string(k);
    let value = val_to_parsed_value(v);
    key.zip(value).ok_or_else(|| serde::de::Error::custom("should not be serialized to Map"))

Which accomplishes the same by combining the key and value options:

  • Option::zip() (a.zip(b)) produces Some((key, value)) only if both are Some, otherwise None.
  • ok_or_else() then returns the option as a Result.
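The equivalence of the two forms can be checked with a small std-only sketch. `RawKey`, `key_to_string`, `pair_with_match` and `pair_with_zip` are hypothetical names for this illustration, and a plain `String` stands in for the serde error type so that no external crates are needed.

```rust
// Hypothetical stand-in for a parsed map key that may or may not be
// usable as a table key.
#[derive(Debug)]
enum RawKey {
    Number(i64),
    String(String),
    Other,
}

// Only numbers and strings become table keys; anything else is `None`.
fn key_to_string(key: RawKey) -> Option<String> {
    match key {
        RawKey::Number(n) => Some(n.to_string()),
        RawKey::String(s) => Some(s),
        RawKey::Other => None,
    }
}

// Verbose form: explicit match on the pair of options.
fn pair_with_match(key: Option<String>, value: Option<i64>) -> Result<(String, i64), String> {
    match (key, value) {
        (Some(k), Some(v)) => Ok((k, v)),
        _ => Err("should not be serialized to Map".to_string()),
    }
}

// Compact form: `Option::zip` is `Some((k, v))` only when both are `Some`,
// and `ok_or_else` turns the remaining `None` case into an error.
fn pair_with_zip(key: Option<String>, value: Option<i64>) -> Result<(String, i64), String> {
    key.zip(value)
        .ok_or_else(|| "should not be serialized to Map".to_string())
}
```

Both functions return the same result for every combination of `Some` and `None` inputs, so the choice between them is purely about readability.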

Comment on lines +158 to +164
match ParsedMap::deserialize(deserializer)? {
    ParsedMap::Table(v) => Ok(v),
    #[cfg(feature = "yaml")]
    ParsedMap::YamlMap(table) => {
        table
            .into_iter()
            .map(|(key, value)| {
Collaborator Author

The .map() will produce an iterator of Result<(String, ParsedValue), E> items, and .collect() will transform that into a single Result<T, E> (where T would be Map<String, ParsedValue>).

This is built into Rust's standard library, no explicit transform of each Result into a collection is required.


I wasn't sure if it'd short-circuit on the first error encountered, but Result implements the FromIterator trait, so collect() stops at the first Err and try_collect() isn't required.
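The short-circuit behaviour can be observed with a small std-only sketch. `convert` and `convert_all` are hypothetical helpers for this example, using `HashMap<String, i64>` and a `String` error in place of the real map and serde error types.

```rust
use std::collections::HashMap;

// Hypothetical per-entry conversion: fails for empty keys and counts how
// many entries were actually visited.
fn convert(key: &str, value: i64, visited: &mut usize) -> Result<(String, i64), String> {
    *visited += 1;
    if key.is_empty() {
        Err(format!("invalid key for value {}", value))
    } else {
        Ok((key.to_string(), value))
    }
}

// Collecting an iterator of `Result<(K, V), E>` into `Result<HashMap<K, V>, E>`
// relies on `Result`'s `FromIterator` impl, which stops at the first `Err`.
fn convert_all(
    entries: &[(&str, i64)],
    visited: &mut usize,
) -> Result<HashMap<String, i64>, String> {
    entries
        .iter()
        .map(|&(k, v)| convert(k, v, visited))
        .collect()
}
```

With three entries where the second has an empty key, `visited` ends at 2: the third entry is never converted because `collect()` returns as soon as it sees the first `Err`.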

@@ -27,12 +27,13 @@ async = ["async-trait"]
[dependencies]
lazy_static = "1.0"
serde = "1.0.8"
serde_with = "3"
Collaborator Author

Required for the Nil variant fallback (scoped to its own commit) if deserialize_any fails to match anything prior.

The crate could be avoided if you wanted to implement the equivalent directly 🤷‍♂️

Any format that supports a `char` type would now be matched
for conversion to the expected `String` type.

Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
@polarathene
Collaborator Author

@matthiasbeyer this is ready for review, I know you're quite busy so here's a TLDR:

@polarathene polarathene mentioned this pull request Oct 20, 2023
@polarathene
Collaborator Author

For custom format support, the format method added here, one I previously contributed (extract_root_table()), and sometimes ParsedValue (to call ParsedValue::deserialize(input_value)) are required. This would require lib.rs to make the module pub, or some refactoring.

As it's all focused on using Serde with an untagged enum, perhaps it shouldn't live in format.rs if it isn't useful to other formats (not that I expect those to be contributed much).


I also need to verify an assumption that other number types like u8 would correctly be cast to the u64 supported by config-rs. Prior to this the mapping was more explicit, and I may have misunderstood how the untagged enum approach uses deserialize_any() this way 😅

I will also share the other formats I integrated locally, if desired they can be contributed via follow-up PR.

@matthiasbeyer
Member

So from what I read, we want this in.
@polarathene I'd like this PR to be rebased to latest master, so we get a CI run with latest master.
After that, I think you can merge this!

@polarathene
Collaborator Author

> After that, I think you can merge this!

I still need to address my concerns from my prior comment here. I will return to config-rs later today or tomorrow hopefully, presently tied up elsewhere for a bit.

After that's sorted locally, I'll rebase as requested and it'd be great to see this get merged :)

Regarding custom format support, how would you approach verifying that? Mostly so that you can catch that the format module should be made public, in case someone were to revert that accidentally?

@polarathene
Collaborator Author

Note to self: since this PR was raised, the proposed yaml alternative (serde compatible) has been archived and a different yaml crate has been introduced instead, so this PR will need to drop that support (unfortunate, since I think the serde approach was preferable).
