Drake application yaml logic handles !!binary tag #22318
+@jwnimmer-tri Here's the promised read functionality. Ignore R1, it's still floating on top of the path/FileSource serialization. R2 has the python stuff and R3 has the C++ side.
Reviewable status: LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @SeanCurtis-TRI)
2af36ef
to
5eda807
Compare
I'll post my review of the test cases now. Big picture, it seems like the C++ and Python tests lack parity -- details inline below.
For my review of the actual implementation changes, I think my feedback will make the most sense when accompanied by a patch, which I'm working on now and will share a bit later.
Reviewed 1 of 2 files at r2, 2 of 7 files at r3, 8 of 9 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: 12 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @SeanCurtis-TRI)
a discussion (no related file):
This PR only seems to have reading. Usually we add both reading and writing in the same PR, as a way to help cross-check that both sides of the story are mutually compatible.
I suppose if it's an easier PR train we can add reading and then PR the writing half immediately after (while backfilling any glitches in reading that show up), but we don't want to delay that very long.
common/yaml/test/example_structs.h
line 70 at r5 (raw file):
} std::vector<std::byte> value{std::byte(0), std::byte(1), std::byte(2)};
nit This should match the Python test value, i.e., deadbeef. (Or Python could change to match this.)
common/yaml/test/example_structs.h
line 92 at r5 (raw file):
} struct AllScalarsStruct {
nit Don't we need a std::vector<std::byte> here, both for completeness and to match the Python tests?
common/yaml/yaml_node.h
line 174 at r4 (raw file):
static constexpr std::string_view kTagStr{"tag:yaml.org,2002:str"}; // https://yaml.org/spec/1.2.2/#generic-string
nit Wrong documentation link
bindings/pydrake/common/test/yaml_typed_test.py
line 38 at r5 (raw file):
@dc.dataclass class IntStruct:
See the comment about 10 lines up.
We need IntStruct on the C++ side now, too.
bindings/pydrake/common/test/yaml_typed_test.py
line 321 at r5 (raw file):
for value, error_msg in cases: data = f"value: !!binary {value}" with self.assertRaisesRegex(yaml.constructor.ConstructorError,
nit This is overly coupling our test case to the implementation. We don't care about the type, we only care that the message is sensible.
(Therefore also remember to remove the import yaml.)
Suggestion:
Exception
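A minimal sketch of this suggestion, assuming for illustration that the decode step boils down to Python's standard base64.b64decode. The helper decode_binary is hypothetical and not part of Drake. The test catches the broadest exception type and checks only the message:

```python
import base64
import unittest


def decode_binary(value: str) -> bytes:
    """Hypothetical stand-in for the code under test (not Drake's API)."""
    return base64.b64decode(value)


class BinaryErrorTest(unittest.TestCase):
    def test_bad_base64(self):
        cases = [
            ("A3Rfc3RyAw=", "Incorrect padding"),
            ("A3Rfc*RyAw==", "Invalid base64-encoded string"),
        ]
        for value, error_regex in cases:
            with self.subTest(value=value):
                # Only the message matters; the concrete exception type is
                # an implementation detail, so we match any Exception.
                with self.assertRaisesRegex(Exception, error_regex):
                    decode_binary(value)


def run_tests() -> bool:
    """Runs the test case above and reports whether it passed."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(BinaryErrorTest)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

With this shape the test no longer needs to import the yaml module just to name an exception class.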
bindings/pydrake/common/test/yaml_typed_test.py
line 354 at r5 (raw file):
self.assertEqual(x.value, b'') # Using !!binary and assigning it to non-bytes should throw.
Do these test cases exist on the C++ side? I didn't immediately find them.
Our goal is to keep the Python and C++ test case inputs in as close alignment as possible. It's okay to diverge in the expected outcome, but the input panel should be roughly the same, declared in the same order, with the same phrasing, data, comments, etc. As naive a cross-language port as possible.
bindings/pydrake/common/test/yaml_typed_test.py
line 307 at r4 (raw file):
("!!binary |\n A3Rlc3Rfc3RyAw==", b"\x03test_str\x03"), ("!!binary |\n A3Rlc3Rf\n c3RyAw====", b"\x03test_str\x03"), ("!!binary", b''),
nit Double quotes for consistency?
In general throughout this method, probably the single vs double quotes could be made more uniform. I'll note a couple more instances inline, but not try to ding all of them.
Suggestion:
b""
bindings/pydrake/common/test/yaml_typed_test.py
line 318 at r4 (raw file):
("A3Rfc3RyAw=", "Incorrect padding"), ("A3Rfc*RyAw==", "Invalid base64-encoded string")] for value, error_msg in cases:
nit Disallowed styleguide abbreviation "msg". (I don't think we have any sodium salt of glutamic acid in our yaml parse.)
(Also nit too much whitespace in the middle.)
Suggestion:
for value, error_regex in cases:
bindings/pydrake/common/test/yaml_typed_test.py
line 346 at r4 (raw file):
("true", "Expected.*bytes.*bool"), ] for value, error_message in cases:
BTW
Suggestion:
error_regex
bindings/pydrake/common/test/yaml_typed_test.py
line 355 at r4 (raw file):
# Using !!binary and assigning it to non-bytes should throw. cases = [ (b'.inf', FloatStruct),
BTW Prior test cases in this method used double quotes for b"stuff" input. Might as well stay consistent here too?
tools/workspace/yaml_cpp_internal/repository.bzl
line 8 at r4 (raw file):
github_archive( name = name, # local_repository_override = "/home/seancurtis/code/yaml-cpp",
nit Stray line
Reviewable status: 13 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @SeanCurtis-TRI)
common/yaml/yaml_read_archive.h
line 1 at r5 (raw file):
#pragma once
See the head commit here for my suggestions:
https://github.com/jwnimmer-tri/drake/commits/yaml-binary-fixups/
Big picture, we want to keep the complexity in the cc file and we must use ReportError for errors (so that we get line numbers).
5eda807
to
c4eb3f4
Compare
There's one way in which writing binary strings differs between C++ and Python. It's apparent in the acceptance strings in the two corresponding tests. Python always formats it as:
value: !!binary |
  bacdasldkfj
Whereas C++ prefers:
value: !!binary bacdasldkfj
Reading on both sides has been shown to accept both formats, so I'm choosing not to worry about the fact that they get formatted slightly differently on writing.
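Both spellings carry the same base64 payload; only the scalar style differs. The stdlib-only sketch below is not Drake's or PyYAML's parser: payload_from is a hypothetical helper that merely strips the key, the tag, and an optional block indicator. It illustrates why a reader can treat the two forms alike, since the decoder skips whitespace and line breaks in the payload:

```python
import base64

BLOCK_STYLE = "value: !!binary |\n  A3Rlc3Rfc3RyAw==\n"
PLAIN_STYLE = "value: !!binary A3Rlc3Rfc3RyAw==\n"


def payload_from(document: str) -> bytes:
    """Hypothetical helper: decodes the payload of a one-entry document.

    This is not a YAML parser. It only drops the `value: !!binary` prefix
    and an optional `|` block indicator, then decodes what is left.
    """
    _, _, rest = document.partition("!!binary")
    rest = rest.strip()
    if rest.startswith("|"):
        rest = rest[1:]
    # In its default (non-validating) mode, b64decode discards characters
    # outside the base64 alphabet, such as spaces and newlines.
    return base64.b64decode(rest)
```

Calling payload_from on either constant yields the same bytes, which is the property the round-trip tests rely on.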
Reviewable status: 1 unresolved discussion, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, missing label for release notes (waiting on @SeanCurtis-TRI)
a discussion (no related file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
This PR only seems to have reading. Usually we add both reading and writing in the same PR, as a way to help cross-check that both sides of the story are mutually compatible.
I suppose if it's an easier PR train we can add reading and then PR the writing half immediately after (while backfilling any glitches in reading that show up), but we don't want to delay that very long.
I thought I had been clear about its state. Apparently not. I'd intentionally deferred adding writing to the PR until we were in agreement on the reading semantics.
Writing has now been added.
bindings/pydrake/common/test/yaml_typed_test.py
line 318 at r4 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
nit Disallowed styleguide abbreviation "msg". (I don't think we have any sodium salt of glutamic acid in our yaml parse.)
(Also nit too much whitespace in the middle.)
But you gotta love that dash of umami...
bindings/pydrake/common/test/yaml_typed_test.py
line 38 at r5 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
See the comment about 10 lines up.
We need IntStruct on the C++ side now, too.
Sadly, Reviewable isn't good at providing relevant context for "comment about 10 lines up" as there are no other comments in this file.
However, the request for the IntStruct is clear.
(FTR, I hadn't bothered because of the difference between the yaml parsing logic in Python vs C++. Python does all of the type conversion before we get a hold of the result, whereas in C++ we do it ourselves. So, I felt the DoubleStruct provided coverage. I still believe it does. However, in the name of keeping the two tests in parallel -- in case our C++ yaml parser changes its behavior -- it makes sense to provide the redundant coverage now.)
bindings/pydrake/common/test/yaml_typed_test.py
line 354 at r5 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
Do these test cases exist on the C++ side? I didn't immediately find them.
Our goal is to keep the Python and C++ test case inputs in as close alignment as possible. It's okay to diverge in the expected outcome, but the input panel should be roughly the same, declared in the same order, with the same phrasing, data, comments, etc. As naive a cross-language port as possible.
As alluded to above re: IntStruct, I felt what got tested here got tested there, albeit in a more compact form. However, for the reasons indicated above, I've expanded the C++ test to be a more literal translation of the Python.
Binary runs face first into writing json. I'm going to look into it.
I think it would be okay to leave that as future work, and have the json writer throw (maybe with a change-detector expect-throws test to guard us). The …
Oops, wrong thread.
Anyway -- checkpoint on final reviews of all test code.
Reviewed 7 of 11 files at r6, all commit messages.
Reviewable status: 8 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, missing label for release notes (waiting on @SeanCurtis-TRI)
a discussion (no related file):
Previously, SeanCurtis-TRI (Sean Curtis) wrote…
Binary runs face first into writing json. I'm going to look into it.
I think it would be okay to leave that as future work, and have the json writer throw (maybe with a change-detector expect-throws test to guard us). The AllScalarsStruct could have a way to opt out of the binary data (e.g., a bool template argument, or a non-serialized bool member field, which skipped it during Serialize).
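As a loose analogy only, using Python's standard json module rather than Drake's JSON writer: raw bytes have no JSON representation, so the encoder raises. A change-detector check in the spirit described above pins that behavior down so that a future change gets noticed. The helper name json_write_throws is hypothetical:

```python
import json


def json_write_throws(value) -> bool:
    """Returns True if writing `value` as JSON raises TypeError."""
    try:
        json.dumps({"value": value})
    except TypeError:
        # The stdlib encoder rejects bytes; this is the "expected throw".
        return True
    return False
```

A change-detector test would assert that json_write_throws(b"...") is true today, and would be revisited if binary output to JSON is ever given a defined meaning.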
bindings/pydrake/common/test/yaml_typed_test.py
line 38 at r5 (raw file):
Previously, SeanCurtis-TRI (Sean Curtis) wrote…
Sadly, reviewable isn't good at providing relevant context for "comment about 10 lines up" as there are no other comments in this file.
However, the request for the IntStruct is clear.
(FTR, I hadn't bothered because of the difference between the yaml parsing logic in Python vs C++. Python does all of the type conversion before we get a hold of the result, whereas in C++ we do it ourselves. So, I felt the DoubleStruct provided coverage. I still believe it does. However, in the name of keeping the two tests in parallel -- in case our C++ yaml parser changes its behavior -- it makes sense to provide the redundant coverage now.)
Oops. I meant 10 lines up in the original file:
# To provide test coverage for all of the special cases of YAML loading, we'll
# define some dataclasses. These classes mimic
# drake/common/yaml/test/example_structs.h
# and should be roughly kept in sync with the definitions in that file.
Basically the same idea as other threads -- that the C++ and Python tests should very strongly imitate each other. To the extent they differ at all, it should be for meaningful reasons that highlight an actual difference in behavior or capabilities between the two.
Perhaps this comment paragraph (and its twin on the C++ side) need to be expanded?
common/yaml/test/example_structs.h
line 52 at r6 (raw file):
// A value used in the test data below to include a default (placeholder) value // when initializing struct data members. constexpr double kNominalInt = -1;
typo
Suggestion:
int
common/yaml/test/yaml_write_archive_test.cc
line 74 at r6 (raw file):
TEST_F(YamlWriteArchiveTest, Bytes) { const auto test = [](const std::string& value, const std::string& expected) { const auto* data = reinterpret_cast<const std::byte*>(value.c_str());
BTW Using c_str() implies that we care about the trailing NUL, but in this case it is not relevant. Saying data() would better align with our intentions.
Suggestion:
value.data()
bindings/pydrake/common/test/yaml_typed_test.py
line 319 at r6 (raw file):
("A3Rfc3RyAw=", "Incorrect padding"), ("A3Rfc*RyAw==", "Invalid base64-encoded string")] for value, error_regex in cases:
nit Whitespace
Suggestion:
value, error_regex
bindings/pydrake/common/test/yaml_typed_test.py
line 330 at r6 (raw file):
("!!str 1234", "Expected.*bytes.*str"), # Int. ("12", "Expected.*bytes.*int"),
nit Match C++ input exactly.
FYI it would also be okay by me to test both "12" and "!!int 12" as separate cases (in both languages), if you think that's better.
Suggestion:
"!!int 12"
bindings/pydrake/common/test/yaml_typed_test.py
line 334 at r6 (raw file):
# Pyyaml defect: 0o3 should be an int. ("0o3", "Expected.*bytes.*str"), # Pyyaml defect: 00:03 should be an int (value of 3).
BTW I am okay with leaving this comment alone, but as background the full answer for whether this is a "defect" depends on which version of the YAML specification pyyaml is claiming to implement. The specified regexes for inferring data types are different across yaml versions.
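To make the version dependence concrete, here is a small sketch comparing the octal-integer patterns of the two specification versions. The regexes are transcribed from the YAML 1.1 int type and the YAML 1.2 core schema as best recalled, so treat them as an assumption and check them against the specs. Only the octal forms are shown, not the full int grammar:

```python
import re

# Octal integer patterns (assumed transcriptions; verify against the specs).
YAML_1_1_OCTAL = re.compile(r"[-+]?0[0-7_]+")  # YAML 1.1 int type, base 8
YAML_1_2_OCTAL = re.compile(r"0o[0-7]+")       # YAML 1.2 core schema, base 8


def is_octal_int(text: str, pattern: re.Pattern) -> bool:
    """Returns True if the whole plain scalar matches the octal pattern."""
    return pattern.fullmatch(text) is not None
```

Under these patterns "0o3" is an octal int only by the 1.2 rule, while "03" is an octal int only by the 1.1 rule, so a parser following 1.1 would reasonably leave "0o3" as a string.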
bindings/pydrake/common/test/yaml_typed_test.py
line 337 at r6 (raw file):
("00:03", "Expected.*bytes.*str"), # Float. ("1234.5", "Expected.*bytes.*float"),
nit Match C++ input exactly.
Suggestion:
("1234.5", "Expected.*bytes.*float"),
("!!float 1234.5", "Expected.*bytes.*float"),
bindings/pydrake/common/test/yaml_typed_test.py
line 383 at r6 (raw file):
x = yaml_load_typed(schema=AllScalarsStruct, data=data, **options) self.assertEqual(x.some_bool, True) self.assertEqual(x.some_bytes, b'test string')
This test case does not match the C++ test case. Both should use the same yaml document as input.
feature but still should fix some open discussions prior to assigning platform.
Reviewed 1 of 2 files at r2, 1 of 7 files at r3, 4 of 11 files at r6.
Reviewable status: 11 unresolved discussions, needs at least two assigned reviewers, missing label for release notes (waiting on @SeanCurtis-TRI)
a discussion (no related file):
Working
At least in Python (haven't checked C++ yet), the handling of Optional[bytes] and Union[..., bytes, ...] does not work correctly. I'm trying to play around a bit to see if I have advice on whether to try to fix it or else declare it to be out of scope. Stay tuned.
tools/workspace/yaml_cpp_internal/patches/upstream/b64_decode_failure_is_empty.patch
line 6 at r6 (raw file):
- The input has an invalid character. However, it doesn't have the proper number of encoding characters (a multiple of
typo
Suggestion:
if the input
common/yaml/yaml_write_archive.cc
line 124 at r6 (raw file):
} else { emitted_tag = node_tag; }
Take note of the comment quoted here. It talks about "json schema" and explains how the data type will be correctly implied by the plain string because we are careful to emit text which the "json schema" regexes will notice when processing the plain string -- and therefore we elect not to emit the !!foo marker so that we don't clutter up the document.
Those conditions do not hold for !!binary -- it is not part of the json schema, and it does not have any regexes that would match against a plain scalar.
Therefore, the right way to handle binary is as a distinct else-if branch. See sample code below.
Suggestion:
if ((node_tag == internal::Node::kTagNull) ||
    (node_tag == internal::Node::kTagBool) ||
    (node_tag == internal::Node::kTagInt) ||
    (node_tag == internal::Node::kTagFloat) ||
    (node_tag == internal::Node::kTagStr)) {
  // In most cases we don't need to emit the "JSON Schema" tags for YAML data,
  // because they are implied by default. However, YamlWriteArchive on variant
  // types sometimes marks the tag as important.
  if (node.IsTagImportant()) {
    // The `internal::Node::kTagFoo` all look like "tag:yaml.org,2002:foo".
    // We only want the "foo" part (after the second colon).
    emitted_tag = std::string("!!");
    emitted_tag.append(node_tag.substr(18));
  }
} else if (node_tag == internal::Node::kTagBinary) {
  // Use the more compact "secondary tag" spelling.
  emitted_tag = "!!binary";
} else {
  emitted_tag = node_tag;
}
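The same three-way decision can be sketched in Python for readers who want to exercise it in isolation. This is an illustrative port, not Drake code: the function name, the boolean parameter, and the assumption that the emitted tag starts out empty are all hypothetical.

```python
TAG_PREFIX = "tag:yaml.org,2002:"
JSON_SCHEMA_TAGS = {
    TAG_PREFIX + name for name in ("null", "bool", "int", "float", "str")
}
TAG_BINARY = TAG_PREFIX + "binary"


def emitted_tag(node_tag: str, is_tag_important: bool) -> str:
    """Returns the tag text to write, or "" to write no tag at all."""
    if node_tag in JSON_SCHEMA_TAGS:
        # JSON schema tags are implied by the plain scalar's text, so they
        # are written only when explicitly marked as important.
        if is_tag_important:
            return "!!" + node_tag[len(TAG_PREFIX):]
        return ""
    if node_tag == TAG_BINARY:
        # Not implied by any plain-scalar regex, so always written, using
        # the compact "secondary tag" spelling.
        return "!!binary"
    # Any other tag is written verbatim.
    return node_tag
```

Note that len(TAG_PREFIX) is 18, which matches the substr(18) offset in the C++ suggestion.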
Reviewable status: 11 unresolved discussions, needs at least two assigned reviewers, missing label for release notes (waiting on @SeanCurtis-TRI)
a discussion (no related file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
Working
At least in Python (haven't checked C++ yet), the handling of Optional[bytes] and Union[..., bytes, ...] does not work correctly. I'm trying to play around a bit to see if I have advice on whether to try to fix it or else declare it to be out of scope. Stay tuned.
Here is a start on the fix (only for python). See the new head commit:
https://github.com/jwnimmer-tri/drake/commits/yaml-binary-fixups/
Do we want to try to finish this approach?
Reviewable status: 11 unresolved discussions, needs at least two assigned reviewers, missing label for release notes (waiting on @SeanCurtis-TRI)
a discussion (no related file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
Here is a start on the fix (only for python). See the new head commit:
https://github.com/jwnimmer-tri/drake/commits/yaml-binary-fixups/
Do we want to try to finish this approach?
(The essential idea is that in yaml_load_typed, our control flow must be solely dictated by the schema, not the yaml document. We can't validate the yaml value right at the start of the function and expect to be able to decide anything, we need to flow through the conditions that probe against the schema to decide which yaml values are allowed.)
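A minimal sketch of that idea, not the actual yaml_load_typed implementation: the function name convert and the (tag, text) node representation are hypothetical. The schema chooses the branch, and a !!binary value is accepted only in the branch where the schema asks for bytes, which is what lets Optional[bytes] and Union[..., bytes, ...] fall out naturally:

```python
import base64
import typing


def convert(schema, tag: str, text: str):
    """Converts one scalar, letting `schema` (not the document) pick the branch.

    `tag` is a short YAML tag name such as "binary", "str", or "null".
    """
    if typing.get_origin(schema) is typing.Union:
        # Optional[X] is Union[X, None]: probe each alternative in order.
        for alternative in typing.get_args(schema):
            try:
                return convert(alternative, tag, text)
            except TypeError:
                continue
        raise TypeError(f"No alternative in {schema} accepts !!{tag}")
    if schema is type(None):
        if tag != "null":
            raise TypeError(f"Expected null, got !!{tag}")
        return None
    if schema is bytes:
        # Only here, where the schema asks for bytes, is !!binary allowed.
        if tag != "binary":
            raise TypeError(f"Expected bytes, got !!{tag}")
        return base64.b64decode(text)
    if schema is str:
        if tag != "str":
            raise TypeError(f"Expected str, got !!{tag}")
        return text
    raise TypeError(f"Unsupported schema {schema}")
```

For example, convert(typing.Optional[bytes], "binary", "3q2+7w==") decodes the payload, while convert(str, "binary", "3q2+7w==") raises because the str branch never consults the binary rules.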
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
I took your commit and extended it for C++. I was explicitly looking at the optional to make sure it was happy when I ran out of time. Hopefully, I'll get a chance to push something new this morning, but my main priority is to get other PRs through platform as efficiently as possible.
c4eb3f4
to
00f1181
Compare
Note: It just worked, unlike variant. Are the tests redundant?
00f1181
to
7053485
Compare
Alright, I've got things updated. I have a third commit that adds coverage for "optional[bytes]". With all previous changes, they all simply seem to work. It's not clear to me if those additional tests (and the related infrastructure) are worthwhile.
There are various other "typed tests" in yaml_typed_test.py that I didn't explore. Should I? I glanced and they seemed to be testing a higher level of abstraction so I think I'm safe.
Anyhoooo....things seem to currently "work".
Reviewable status: 2 unresolved discussions, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @SeanCurtis-TRI)
common/yaml/yaml_write_archive.cc
line 124 at r6 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
Take note of the comment quoted here. It talks about "json schema" and explains how the data type will be correctly implied by the plain string because we are careful to emit text which the "json schema" regexes will notice when processing the plain string -- and therefore we elect not to emit the !!foo marker so that we don't clutter up the document.
Those conditions do not hold for !!binary -- it is not part of the json schema, and it does not have any regexes that would match against a plain scalar.
Therefore, the right way to handle binary is as a distinct else-if branch. See sample code below.
This whole thing felt a bit murky to me.
However, as I follow that vector, it seems obvious that shoving the kBinary into JsonSchemaTag is likewise misguided. kStr is acceptable because the json schema inherits from the failsafe schema. But the !!binary tag isn't part of that.
So, it seems to me that it would be better to use the Node::SetTag() overload that simply takes the string instead of extending the enumeration in a misleading direction. (This did necessitate a change to VisitScalar().)
It's not clear to me if those additional tests (and the related infrastructure) are worthwhile.
I hear you on this, but I think for parsers (and yaml type checking in particular) having a costly battery of tests pays off in the long term. We used to not have so many, and it was often painful. In this case since we've identified that JSON primitives vs YAML primitives require different logic, I think the tests will be helpful. If we ever add a second kind of YAML-only primitive, that would be a point where we might lean on !!binary to catch certain kinds of bugs instead of writing a full complement of new tests for the new YAML primitive.
In any case, I'll take a final look at the test suite from scratch, to offer any final thoughts.
Reviewed 12 of 12 files at r7, all commit messages.
Reviewable status: 3 unresolved discussions, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @SeanCurtis-TRI)
common/yaml/yaml_write_archive.cc
line 124 at r6 (raw file):
... kBinary into JsonSchemaTag is ... misguided.
Ah, good observation. I agree.
For !!binary, I am fine with calling SetTag with a std::string. However, FYI another approach would be to rename the enum to cover the wider domain (e.g., all yaml primitives), or to introduce a second enum for yaml-only primitives and overload SetTag on that new enum.
In any case, this thread is resolved.
a discussion (no related file):
Working
One more pass over all of the tests, from scratch.
common/yaml/yaml_write_archive.h
line 239 at r7 (raw file):
if constexpr (std::is_same_v<T, std::string>) { text = value; tag = internal::Node::kTagStr;
Whatever we do for the !!binary tag, we shouldn't use SetTag(std::string) here for all of the json tags. It's important to keep using the compact (enum) representation in the common case, for performance.
Without writing it on paper to check, my best guess is that the tactic in the snippet below might be the smoothest? Up to you though, I'll let you play around and decide.
// Exactly one of these will be set by the if-else chain.
std::optional<JsonSchemaTag> json_tag;
std::string full_tag;
...
if (json_tag.has_value()) {
  scalar.SetTag(*json_tag);
} else {
  scalar.SetTag(std::move(full_tag));
}
common/yaml/yaml_read_archive.h
line 356 at r7 (raw file):
} else if constexpr (std::is_same_v<T, std::string>) { return tag == internal::Node::kTagStr; } else if constexpr (std::is_same_v<T, std::vector<std::byte>>) {
BTW The comment atop this if-else chain says "JSON schema tags". Per the other thread, possibly we want to amend that slightly for precision, since !!binary isn't part of the JSON schema.
Adds the ability for Drake to treat the yaml !!binary tag in a consistent and coherent manner.