Adds binary 1.1 read support for e-expressions, macro expansion #789

Merged
merged 16 commits into from
Jun 14, 2024

@zslayton zslayton commented Jun 10, 2024

Introduces a RawBinaryEExpression_1_1 type (mirroring the existing RawTextEExpression_1_1 type) and adds methods to the 1.1 ImmutableBuffer to parse them. This requires access to the signature of the expression being parsed, which in turn requires access to the macro table. Previously, the buffer types (ImmutableBuffer, TextBufferView) each held a reference to the allocator in case they needed scratch space for caching child values or sanitizing/decoding input text. I have replaced the allocator field with a context field. The EncodingContextRef type allows access to both the allocator and the macro table.

Finally, this PR also wires the new binary e-expressions up to the macro evaluator for expansion. It adds a handful of unit tests demonstrating this.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@zslayton zslayton marked this pull request as draft June 10, 2024 16:58
@zslayton zslayton left a comment

🗺️ PR tour 🧭

Comment on lines -265 to +277
- Some(Ok(RawValueExpr::MacroInvocation(invocation))) => {
- Some(Ok(RawValueExpr::MacroInvocation(LazyRawAnyEExpression {
+ Some(Ok(RawValueExpr::EExp(invocation))) => {
+ Some(Ok(RawValueExpr::EExp(LazyRawAnyEExpression {

🗺️ I renamed RawValueExpr::MacroInvocation to RawValueExpr::EExp because at the raw level we're always talking about syntactic elements.

Comment on lines -418 to +444
- Text_1_0(r) => Ok(r.next(allocator)?.into()),
+ Text_1_0(r) => Ok(r.next(context)?.into()),
  Binary_1_0(r) => Ok(r.next()?.into()),
- Text_1_1(r) => Ok(r.next(allocator)?.into()),
- Binary_1_1(r) => Ok(r.next()?.into()),
+ Text_1_1(r) => Ok(r.next(context)?.into()),
+ Binary_1_1(r) => Ok(r.next(context)?.into()),

🗺️ Most of the buffer types used to hold a reference to the bump allocator in case they needed to decode text escapes or cache child expressions. Now that the reader needs to parse binary e-expressions, the parser needs access to the macro table to look up the macro signature. The encoding context has a reference to both the allocator and the macro table, so now the buffers get a reference to the encoding context.
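
To make the shape of that change concrete, here is a minimal sketch of a buffer that carries a context handle instead of a bare allocator reference. Only the names EncodingContextRef, ImmutableBuffer, BumpAllocator, and allocator() come from this PR; the fields, the MacroTable stand-in, and every other method below are assumptions made for illustration and are not the crate's actual definitions.

```rust
/// Stand-in for the bump allocator (hypothetical; the real one provides scratch memory).
#[derive(Debug, Default)]
pub struct BumpAllocator;

/// Stand-in for the macro table (hypothetical). A macro's address is its index
/// in `param_counts`, and the stored value is the number of parameters it takes.
#[derive(Debug, Default)]
pub struct MacroTable {
    param_counts: Vec<usize>,
}

impl MacroTable {
    /// Registers a macro with `param_count` parameters and returns its address.
    pub fn add_macro(&mut self, param_count: usize) -> usize {
        self.param_counts.push(param_count);
        self.param_counts.len() - 1
    }

    /// Looks up how many parameters the macro at `address` expects.
    pub fn param_count(&self, address: usize) -> Option<usize> {
        self.param_counts.get(address).copied()
    }
}

/// Owns both resources a parser may need.
#[derive(Debug, Default)]
pub struct EncodingContext {
    pub allocator: BumpAllocator,
    pub macro_table: MacroTable,
}

/// A cheap, copyable handle to the context.
#[derive(Debug, Clone, Copy)]
pub struct EncodingContextRef<'top> {
    context: &'top EncodingContext,
}

impl<'top> EncodingContextRef<'top> {
    pub fn new(context: &'top EncodingContext) -> Self {
        Self { context }
    }
    pub fn allocator(&self) -> &'top BumpAllocator {
        &self.context.allocator
    }
    pub fn macro_table(&self) -> &'top MacroTable {
        &self.context.macro_table
    }
}

/// A buffer that holds the context rather than only the allocator.
#[derive(Debug, Clone, Copy)]
pub struct ImmutableBuffer<'top> {
    data: &'top [u8],
    context: EncodingContextRef<'top>,
}

impl<'top> ImmutableBuffer<'top> {
    pub fn new(data: &'top [u8], context: EncodingContextRef<'top>) -> Self {
        Self { data, context }
    }
    pub fn bytes(&self) -> &'top [u8] {
        self.data
    }
    /// The lookup a binary e-expression parser needs before it can read arguments.
    pub fn param_count_for(&self, address: usize) -> Option<usize> {
        self.context.macro_table().param_count(address)
    }
}
```

Because the handle is Copy, passing it to every call to next(context) costs no more than passing the allocator reference did.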

Comment on lines -79 to 81
- pub annotations_header_length: u8,
+ pub annotations_header_length: u16,
  // The number of bytes used to encode the series of symbol IDs inside the annotations wrapper.
  pub annotations_sequence_length: u16,

🗺️ There was a disagreement between how Ion 1.0 and Ion 1.1 were using these fields.

Ion 1.1 annotations encodings have two parts: a header, and the sequence itself. It treated the annotations_header_length and annotations_sequence_length as descriptions of non-overlapping pieces of the encoding.

Ion 1.0 annotations encodings have several parts: a header, a wrapper length, a sequence length, and the sequence itself. It treated annotations_header_length as the complete length of all of these pieces combined and annotations_sequence_length as the number of bytes at the end of the header that comprised the sequence itself.

For the moment, I've adjusted 1.1's behavior to align with 1.0's. This required me to increase the size of the header field since it's storing the total length. I actually think 1.1's interpretation was better, but switching to that will require changing lots of small accessor methods so I've left it for a future PR.
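
A small sketch of the arithmetic behind the wider field, under the 1.0-style interpretation described above. The struct and its methods are hypothetical illustrations; only the two field names and their u16 types come from the diff.

```rust
/// Hypothetical mirror of the two fields, using the 1.0-style interpretation:
/// `annotations_header_length` is the total length of the annotations encoding,
/// and `annotations_sequence_length` counts the trailing bytes of that total
/// that make up the sequence itself.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct AnnotationsLengths {
    pub annotations_header_length: u16,
    pub annotations_sequence_length: u16,
}

impl AnnotationsLengths {
    /// Builds the overlapping (1.0-style) representation from two
    /// non-overlapping pieces, which is how 1.1 previously viewed them.
    /// Returns `None` if the total does not fit in a `u16`.
    pub fn from_parts(prefix_length: u16, sequence_length: u16) -> Option<Self> {
        let total = prefix_length.checked_add(sequence_length)?;
        Some(Self {
            annotations_header_length: total,
            annotations_sequence_length: sequence_length,
        })
    }

    /// The number of bytes that precede the sequence.
    pub fn prefix_length(&self) -> u16 {
        self.annotations_header_length - self.annotations_sequence_length
    }

    /// Whether the total would still have fit in the old `u8` field.
    pub fn total_fits_in_u8(&self) -> bool {
        self.annotations_header_length <= u8::MAX as u16
    }
}
```

For example, a 2-byte prefix followed by a 300-byte sequence gives a total of 302, which a u8 cannot hold even though the prefix alone easily could.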

@@ -125,7 +125,7 @@ impl<'data> LazyRawReader<'data, BinaryEncoding_1_0> for LazyRawBinaryReader_1_0

  fn next<'top>(
  &'top mut self,
- _allocator: &'top BumpAllocator,
+ _context: EncodingContextRef<'top>,

🗺️ The binary 1.0 reader is the only one that doesn't use anything from the encoding context (the allocator or the macro table) during parsing.

}

#[test]
fn read_eexp_without_args() -> IonResult<()> {

🗺️ These tests confirm that the parser is capable of reading e-expressions based on the number of parameters in their signature. They do not perform any evaluation/expansion.
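
As an illustration of what "reading based on the number of parameters in the signature" means, here is a sketch over a deliberately simplified toy encoding: one address byte followed by exactly one byte per parameter. This is not the Ion 1.1 binary encoding, and ToyEExp, parse_toy_eexp, and the param_counts slice (standing in for the macro table) are all hypothetical names.

```rust
/// A parsed toy e-expression: the macro address plus its raw argument bytes.
#[derive(Debug, PartialEq)]
pub struct ToyEExp<'a> {
    pub address: usize,
    pub args: &'a [u8],
}

/// Parses one toy e-expression from the front of `input`.
/// `param_counts[address]` stands in for the macro table's signature lookup.
/// Returns the expression and the unparsed remainder of the input.
pub fn parse_toy_eexp<'a>(
    input: &'a [u8],
    param_counts: &[usize],
) -> Result<(ToyEExp<'a>, &'a [u8]), String> {
    // The first byte selects the macro.
    let (&address_byte, rest) = input
        .split_first()
        .ok_or_else(|| "empty input".to_string())?;
    let address = address_byte as usize;

    // Without the signature there is no way to know where the arguments end.
    let param_count = *param_counts
        .get(address)
        .ok_or_else(|| format!("no macro at address {}", address))?;
    if rest.len() < param_count {
        return Err(format!(
            "macro {} expects {} argument bytes, found {}",
            address,
            param_count,
            rest.len()
        ));
    }

    // Consume exactly as many argument bytes as the signature calls for.
    let (args, remaining) = rest.split_at(param_count);
    Ok((ToyEExp { address, args }, remaining))
}
```

The parser only delimits the expression; like the tests above, it performs no evaluation or expansion.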

Comment on lines +305 to +313
#[test]
fn expand_binary_template_macro() -> IonResult<()> {
let macro_source = "(macro seventeen () 17)";
let encode_macro_fn = |address| vec![0xE0, 0x01, 0x01, 0xEA, address as u8];
expand_macro_test(macro_source, encode_macro_fn, |mut reader| {
assert_eq!(reader.expect_next()?.read()?.expect_i64()?, 17);
Ok(())
})
}

🗺️ The tests in this file are doing actual expansion of binary encoded e-expressions.
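
For readers unfamiliar with the bytes in encode_macro_fn, here is a sketch of how that stream breaks down. The first four bytes follow the Ion version marker layout (0xE0, major, minor, 0xEA) for version 1.1. Treating the final byte as a one-byte invocation of the macro at that address is my reading of the test, not something the PR states, and both function names plus the usize address type are hypothetical.

```rust
/// The Ion 1.1 version marker: 0xE0, major version, minor version, 0xEA.
pub const IVM_1_1: [u8; 4] = [0xE0, 0x01, 0x01, 0xEA];

/// Mirrors the closure in the test: the version marker followed by a single
/// byte holding the macro's address (assumed here to act as the invocation).
pub fn encode_no_arg_invocation(address: usize) -> Vec<u8> {
    let mut bytes = IVM_1_1.to_vec();
    bytes.push(address as u8);
    bytes
}

/// Splits a stream into its (major, minor) version and the bytes after the
/// marker. Returns `None` if the stream does not begin with a version marker.
pub fn split_version_marker(stream: &[u8]) -> Option<((u8, u8), &[u8])> {
    match stream {
        [0xE0, major, minor, 0xEA, rest @ ..] => Some(((*major, *minor), rest)),
        _ => None,
    }
}
```

Since the seventeen macro takes no parameters, nothing follows the address byte; the reader is expected to expand it to the single value 17.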

Err(nom::Err::Failure(IonParseError::Invalid(error)))
}
let (span, child_exprs) = match TextListSpanFinder_1_1::new(
self.context.allocator(),

🗺️ The change from self.allocator to self.context.allocator() caused rustfmt to reflow the entire expression.

Err(nom::Err::Failure(IonParseError::Invalid(error)))
}
let (span, fields) = match TextStructSpanFinder_1_1::new(
self.context.allocator(),

🗺️ Same here; rustfmt reflow.

.with_description(format!("{}", e));
Err(nom::Err::Failure(IonParseError::Invalid(error)))
let (span, child_expr_cache) =
match TextSExpSpanFinder_1_1::new(self.context.allocator(), sexp_iter)

🗺️ Reflow here too.

@@ -34,6 +34,84 @@ pub struct LazyRawTextReader_1_1<'data> {
local_offset: usize,
}

impl<'data> LazyRawReader<'data, TextEncoding_1_1> for LazyRawTextReader_1_1<'data> {

🗺️ I moved this impl block so it would be right below the definition of LazyRawTextReader_1_1. There are no logic changes.

@zslayton zslayton marked this pull request as ready for review June 10, 2024 18:27
@zslayton zslayton requested review from popematt and nirosys June 10, 2024 18:27
Base automatically changed from binary-1_1-roundtrip to main June 10, 2024 18:29
@@ -21,7 +29,7 @@ use std::ops::Range;
/// and a copy of the `ImmutableBuffer` that starts _after_ the bytes that were parsed.
///
/// Methods that `peek` at the input stream do not return a copy of the buffer.
#[derive(PartialEq, Clone, Copy)]
Contributor

Do you think there is any value in providing a PartialEq implementation that ignores the Context?

Contributor Author

I think at one time I was using these buffers in unit tests and wanted assert_eq! to work. I don't think we're using them like that anymore though. We can/should give it a new PartialEq impl like that when we need one.
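
If that need does come up, a hand-written impl along these lines would do it. This is only a sketch: Buffer, ContextRef, and their fields are hypothetical stand-ins, not the real ImmutableBuffer or EncodingContextRef definitions.

```rust
/// Stand-in for the context handle (hypothetical).
#[derive(Debug, Clone, Copy)]
pub struct ContextRef<'a> {
    pub label: &'a str,
}

/// Stand-in for a buffer that carries its bytes, an offset, and a context.
#[derive(Debug, Clone, Copy)]
pub struct Buffer<'a> {
    data: &'a [u8],
    offset: usize,
    context: ContextRef<'a>,
}

impl<'a> Buffer<'a> {
    pub fn new(data: &'a [u8], offset: usize, context: ContextRef<'a>) -> Self {
        Self { data, offset, context }
    }
    pub fn context(&self) -> ContextRef<'a> {
        self.context
    }
}

/// Equality compares the bytes and the position only; the context is ignored.
impl PartialEq for Buffer<'_> {
    fn eq(&self, other: &Self) -> bool {
        self.data == other.data && self.offset == other.offset
    }
}
```

With this in place, assert_eq! treats two buffers over the same bytes at the same offset as equal even when they were created from different contexts.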

@zslayton zslayton merged commit caebdbd into main Jun 14, 2024
32 checks passed
@zslayton zslayton deleted the binary-1_1-eexp branch December 5, 2024 16:48