Adds binary 1.1 read support for e-expressions, macro expansion #789
…okes `annotations`
🗺️ PR tour 🧭
- Some(Ok(RawValueExpr::MacroInvocation(invocation))) => {
- Some(Ok(RawValueExpr::MacroInvocation(LazyRawAnyEExpression {
+ Some(Ok(RawValueExpr::EExp(invocation))) => {
+ Some(Ok(RawValueExpr::EExp(LazyRawAnyEExpression {
🗺️ I renamed `RawValueExpr::MacroInvocation` to `RawValueExpr::EExp` because at the raw level we're always talking about syntactic elements.
- Text_1_0(r) => Ok(r.next(allocator)?.into()),
+ Text_1_0(r) => Ok(r.next(context)?.into()),
  Binary_1_0(r) => Ok(r.next()?.into()),
- Text_1_1(r) => Ok(r.next(allocator)?.into()),
- Binary_1_1(r) => Ok(r.next()?.into()),
+ Text_1_1(r) => Ok(r.next(context)?.into()),
+ Binary_1_1(r) => Ok(r.next(context)?.into()),
🗺️ Most of the buffer types used to hold a reference to the bump allocator in case they needed to decode text escapes or cache child expressions. Now that the reader needs to parse binary e-expressions, the parser needs access to the macro table to look up the macro signature. The encoding context has a reference to both the allocator and the macro table, so now the buffers get a reference to the encoding context.
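To make the shape of that change concrete, here is a minimal sketch of a buffer that carries a context handle instead of a bare allocator reference. Every type here (`Allocator`, `Signature`, `MacroTable`, `EncodingContext`, `ContextRef`, `Buffer`) is a simplified, hypothetical stand-in and not the crate's actual definition; the real `EncodingContextRef` may differ in both fields and methods.

```rust
use std::collections::HashMap;

/// Stand-in for the bump allocator (hypothetical; the real code uses an arena).
#[derive(Debug, Default)]
struct Allocator;

/// Hypothetical macro signature: reduced to a parameter count.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Signature {
    parameter_count: usize,
}

/// Hypothetical macro table mapping a macro address to its signature.
#[derive(Debug, Default)]
struct MacroTable {
    signatures: HashMap<usize, Signature>,
}

/// Owns both resources a parser may need.
#[derive(Debug, Default)]
struct EncodingContext {
    allocator: Allocator,
    macro_table: MacroTable,
}

/// A cheap, copyable handle to the context that each buffer can carry.
#[derive(Debug, Clone, Copy)]
struct ContextRef<'top> {
    context: &'top EncodingContext,
}

impl<'top> ContextRef<'top> {
    fn allocator(&self) -> &'top Allocator {
        &self.context.allocator
    }

    fn macro_table(&self) -> &'top MacroTable {
        &self.context.macro_table
    }
}

/// A buffer that previously would have held `&Allocator` and now holds the handle.
#[derive(Debug, Clone, Copy)]
struct Buffer<'top> {
    data: &'top [u8],
    context: ContextRef<'top>,
}

impl<'top> Buffer<'top> {
    fn new(data: &'top [u8], context: ContextRef<'top>) -> Self {
        Buffer { data, context }
    }

    /// Looks up the signature for the macro at `address`, if one is defined.
    fn signature_for(&self, address: usize) -> Option<Signature> {
        self.context.macro_table().signatures.get(&address).copied()
    }
}
```

Because the handle is `Copy`, passing it to every buffer costs about the same as passing the allocator reference did, while a single field now reaches both the allocator and the macro table.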
- pub annotations_header_length: u8,
+ pub annotations_header_length: u16,
  // The number of bytes used to encode the series of symbol IDs inside the annotations wrapper.
  pub annotations_sequence_length: u16,
🗺️ There was a disagreement between how Ion 1.0 and Ion 1.1 were using these fields.
Ion 1.1 annotations encodings have two parts: a header, and the sequence itself. It treated the `annotations_header_length` and `annotations_sequence_length` as descriptions of non-overlapping pieces of the encoding.
Ion 1.0 annotations encodings have several parts: a header, a wrapper length, a sequence length, and the sequence itself. It treated `annotations_header_length` as the complete length of all of these pieces combined and `annotations_sequence_length` as the number of bytes at the end of the header that comprised the sequence itself.
For the moment, I've adjusted 1.1's behavior to align with 1.0's. This required me to increase the size of the `header` field since it's storing the total length. I actually think 1.1's interpretation was better, but switching to that will require changing lots of small accessor methods, so I've left it for a future PR.
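The two interpretations can be compared with a small sketch. The `AnnotationLengths` struct and both method names are hypothetical, and the numbers in the usage note are made up for illustration; only the two field names come from the diff above.

```rust
use std::ops::Range;

/// Hypothetical, simplified holder for the two length fields discussed above.
#[derive(Debug, Clone, Copy)]
struct AnnotationLengths {
    annotations_header_length: u16,
    annotations_sequence_length: u16,
}

impl AnnotationLengths {
    /// 1.0-style reading: the header length is the total length of all pieces,
    /// and the sequence occupies its final `annotations_sequence_length` bytes.
    /// Returns `None` if the sequence length exceeds the total.
    fn sequence_range_total_style(&self) -> Option<Range<usize>> {
        let end = self.annotations_header_length as usize;
        let start = end.checked_sub(self.annotations_sequence_length as usize)?;
        Some(start..end)
    }

    /// 1.1-style reading: the two lengths describe non-overlapping pieces,
    /// so the sequence begins where the header ends.
    fn sequence_range_disjoint_style(&self) -> Range<usize> {
        let start = self.annotations_header_length as usize;
        start..start + self.annotations_sequence_length as usize
    }
}
```

For example, 3 bytes of leading information followed by a 300-byte sequence would be stored as `(303, 300)` under the total-style reading and as `(3, 300)` under the disjoint reading; both yield the range `3..303`. The total-style value of 303 does not fit in a `u8`, which is why the field had to grow once it started storing the combined length.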
@@ -125,7 +125,7 @@ impl<'data> LazyRawReader<'data, BinaryEncoding_1_0> for LazyRawBinaryReader_1_0

  fn next<'top>(
      &'top mut self,
-     _allocator: &'top BumpAllocator,
+     _context: EncodingContextRef<'top>,
🗺️ The binary 1.0 reader is the only one that doesn't use anything from the encoding context (the allocator or the macro table) during parsing.
}

#[test]
fn read_eexp_without_args() -> IonResult<()> {
🗺️ These tests confirm that the parser is capable of reading e-expressions based on the number of parameters in their signature. They do not perform any evaluation/expansion.
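As a rough illustration of what "reading based on the signature" means, the sketch below parses a toy encoding in which one address byte is followed by exactly one byte per parameter. This encoding, the `RawEExp` type, and the `read_eexp` function are all assumptions made for the example; the real binary 1.1 argument encoding is richer than this.

```rust
use std::collections::HashMap;

/// Hypothetical parse result: the macro address plus the raw argument bytes.
#[derive(Debug, PartialEq)]
struct RawEExp<'a> {
    address: usize,
    args: &'a [u8],
}

/// Reads one e-expression from `input` and returns it with the remaining bytes.
/// `parameter_counts` plays the role of the macro table: address -> parameter count.
fn read_eexp<'a>(
    input: &'a [u8],
    parameter_counts: &HashMap<usize, usize>,
) -> Result<(RawEExp<'a>, &'a [u8]), String> {
    let (&address_byte, rest) = input.split_first().ok_or("empty input")?;
    let address = address_byte as usize;

    // Without the signature, the parser cannot know where the arguments end.
    let count = *parameter_counts
        .get(&address)
        .ok_or_else(|| format!("no macro at address {address}"))?;

    if rest.len() < count {
        return Err(format!(
            "expected {count} argument bytes, found {}",
            rest.len()
        ));
    }
    let (args, remaining) = rest.split_at(count);
    Ok((RawEExp { address, args }, remaining))
}
```

Note that nothing is evaluated here: the function only determines the extent of the invocation, which mirrors the distinction the tests draw between parsing and expansion.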
#[test]
fn expand_binary_template_macro() -> IonResult<()> {
    let macro_source = "(macro seventeen () 17)";
    let encode_macro_fn = |address| vec![0xE0, 0x01, 0x01, 0xEA, address as u8];
    expand_macro_test(macro_source, encode_macro_fn, |mut reader| {
        assert_eq!(reader.expect_next()?.read()?.expect_i64()?, 17);
        Ok(())
    })
}
🗺️ The tests in this file are doing actual expansion of binary encoded e-expressions.
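For readers unfamiliar with the byte sequence in that test: `0xE0 0x01 0x01 0xEA` is the Ion 1.1 version marker, and the trailing byte is the macro's address. The sketch below mimics the round trip with a toy evaluator. `encode_zero_arg_eexp`, `expand_stream`, and the `templates` map are hypothetical; the sketch assumes every macro takes no arguments, has a constant integer body, and has an address small enough to fit in a single byte.

```rust
use std::collections::HashMap;

/// The Ion 1.1 version marker that begins the stream.
const ION_1_1_IVM: [u8; 4] = [0xE0, 0x01, 0x01, 0xEA];

/// Builds a stream containing the version marker and one zero-argument invocation.
fn encode_zero_arg_eexp(address: u8) -> Vec<u8> {
    let mut bytes = ION_1_1_IVM.to_vec();
    bytes.push(address);
    bytes
}

/// Toy expansion: each byte after the version marker is treated as the address
/// of a zero-argument macro whose template body is a constant list of integers.
fn expand_stream(
    input: &[u8],
    templates: &HashMap<u8, Vec<i64>>,
) -> Result<Vec<i64>, String> {
    let body = input
        .strip_prefix(&ION_1_1_IVM)
        .ok_or("missing Ion 1.1 version marker")?;
    let mut values = Vec::new();
    for address in body {
        let template = templates
            .get(address)
            .ok_or_else(|| format!("no macro at address {address}"))?;
        // Expansion here is just substitution of the constant body.
        values.extend_from_slice(template);
    }
    Ok(values)
}
```

With a template table that maps some address to `[17]`, expanding the encoded invocation produces the single value 17, which is the behavior the `seventeen` test asserts against the real reader.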
    Err(nom::Err::Failure(IonParseError::Invalid(error)))
}
let (span, child_exprs) = match TextListSpanFinder_1_1::new(
    self.context.allocator(),
🗺️ The change from `self.allocator` to `self.context.allocator()` caused `rustfmt` to reflow the entire expression.
    Err(nom::Err::Failure(IonParseError::Invalid(error)))
}
let (span, fields) = match TextStructSpanFinder_1_1::new(
    self.context.allocator(),
🗺️ Same here; `rustfmt` reflow.
    .with_description(format!("{}", e));
Err(nom::Err::Failure(IonParseError::Invalid(error)))
let (span, child_expr_cache) =
    match TextSExpSpanFinder_1_1::new(self.context.allocator(), sexp_iter)
🗺️ Reflow here too.
@@ -34,6 +34,84 @@ pub struct LazyRawTextReader_1_1<'data> {
    local_offset: usize,
}

impl<'data> LazyRawReader<'data, TextEncoding_1_1> for LazyRawTextReader_1_1<'data> {
🗺️ I moved this `impl` block so it would be right below the definition of `LazyRawTextReader_1_1`. There are no logic changes.
@@ -21,7 +29,7 @@ use std::ops::Range;
/// and a copy of the `ImmutableBuffer` that starts _after_ the bytes that were parsed.
///
/// Methods that `peek` at the input stream do not return a copy of the buffer.
#[derive(PartialEq, Clone, Copy)]
Do you think there is any value in providing a PartialEq implementation that ignores the Context?
I think at one time I was using these buffers in unit tests and wanted `assert_eq!` to work. I don't think we're using them like that anymore, though. We can/should give it a new `PartialEq` impl like that when we need one.
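If such an impl is ever needed, it could look roughly like the sketch below. The `ContextRef` and `Buffer` types are hypothetical stand-ins with made-up fields, not the crate's `EncodingContextRef` or `ImmutableBuffer`; the point is only that a hand-written `PartialEq` can compare the data and position while skipping the context.

```rust
/// Hypothetical stand-in for the encoding context handle.
#[derive(Debug, Clone, Copy)]
struct ContextRef<'top> {
    label: &'top str,
}

/// Hypothetical buffer: input bytes, a stream offset, and the context handle.
#[derive(Debug, Clone, Copy)]
struct Buffer<'top> {
    data: &'top [u8],
    offset: usize,
    context: ContextRef<'top>,
}

/// Equality considers only the bytes and the offset; `context` is deliberately ignored.
impl PartialEq for Buffer<'_> {
    fn eq(&self, other: &Self) -> bool {
        self.data == other.data && self.offset == other.offset
    }
}
```

Two buffers over the same bytes at the same offset would then compare equal in `assert_eq!` even when they were created with different contexts.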
Introduces a `RawBinaryEExpression_1_1` type (mirroring the existing `RawTextEExpression_1_1` type) and adds methods to the 1.1 `ImmutableBuffer` to parse them. This requires access to the signature of the expression being parsed, which in turn requires access to the macro table. Previously, the buffer types (`ImmutableBuffer`, `TextBufferView`) each held a reference to the allocator in case they needed scratch space for caching child values or sanitizing/decoding input text. I have replaced the `allocator` field with a `context` field; the `EncodingContextRef` type allows access to both the allocator and the macro table.

Finally, this PR also wires the new binary e-expressions up to the macro evaluator for expansion. It adds a handful of unit tests demonstrating this.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.