Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enables the StreamingRawReader to detect incomplete text #784

Merged
merged 4 commits into from
Jun 5, 2024

Conversation

zslayton
Copy link
Contributor

@zslayton zslayton commented Jun 5, 2024

In binary encodings, stream items contain enough data for the reader to tell whether they are complete. In text encodings, it's possible for the buffer to end with data that looks like a complete item but is not. The only way to be certain is to try to read again from the input source to confirm there's no more data. Consider the following examples in which Ion is being pulled from a File into a Vec<u8>:

foo /* comment */   ::bar::baz::1000
└────────┬───────┘ └────────┬───────┘
  buffer contents   remaining in File

              $ion _1_0
└────────┬───────┘ └────────┬───────┘
  buffer contents   remaining in File

               75 1.20
────────┬───────┘ └────────┬───────┘
  buffer contents   remaining in File

To avoid misinterpreting the data, the StreamingRawReader now performs a final check for text readers who have emptied their buffer: it does not consider the item complete unless the input source is exhausted.

Other changes:

  • Fixes an indentation issue that affected annotated values emitted by the text writer.
  • Makes IonStream a public part of the experimental-reader-writer API.
  • Fixes a bug in the tooling access API EncodedBinaryValueData_1_0::body_span that caused it to fail if the value in question was annotated.

The three commits are each self-contained; reviewing them separately may be helpful.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@zslayton zslayton force-pushed the detect-incomplete-text branch from a4451e0 to 66a4683 Compare June 5, 2024 20:15
Copy link
Contributor Author

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ PR tour 🧭

@@ -402,6 +396,16 @@ impl<'data> LazyRawReader<'data, AnyEncoding> for LazyRawAnyReader<'data> {
Binary_1_1(r) => r.position(),
}
}

fn encoding(&self) -> IonEncoding {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ All raw readers now provide a fn encoding(&self) -> IonEncoding, which allows generic code to tell which Ion encoding it's working with.

@@ -258,7 +258,7 @@ impl<'a, 'top> EncodedBinaryValueData_1_0<'a, 'top> {
/// value's body (that is: the content of the value that follows its opcode and length).
pub fn body_range(&self) -> Range<usize> {
let encoded = &self.value.encoded_value;
let body_offset = encoded.header_length();
let body_offset = encoded.annotations_header_length as usize + encoded.header_length();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This fixes a bug that the ion-cli has to work around.

@zslayton zslayton requested review from popematt and nirosys June 5, 2024 20:18
@zslayton zslayton merged commit 62d0755 into main Jun 5, 2024
32 checks passed
@zslayton zslayton deleted the detect-incomplete-text branch June 5, 2024 21:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants