Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

&convert usage makes self.field unavailable within field's %done hook #1661

Closed
awelzel opened this issue Jan 25, 2024 · 11 comments · Fixed by #1703
Closed

&convert usage makes self.field unavailable within field's %done hook #1661

awelzel opened this issue Jan 25, 2024 · 11 comments · Fixed by #1703
Assignees

Comments

@awelzel
Copy link
Contributor

awelzel commented Jan 25, 2024

A minimal reproducer:

module Test;

import spicy;

public type X = unit {
  x: uint8;
  xs: bytes &eod &chunked &convert=(spicy::bytes_to_hexstring($$)) {
    print "$$", $$;
    print "xs", self.xs;
  }
};

$ printf "\x80\x11\x22\x33\x44\x55\x66" | spicy-driver --increment 4 x.spicy
$$, 112233
[error] terminating with uncaught exception of type hilti::rt::UnsetOptional: unset optional value (x.spicy:9:5)

If one removes the &convert attribute on xs, The current self.xs value becomes available within the %done hook of the field. Seems &convert shouldn't have this effect and instead ensure the "converted" value is available in self.xs.


This came up in Slack.

I'm looking at raising an event whenever one "chunk" of a &chunked field was consumed (think http_entity_data style event).

In the .evt file I'd like to use:

on WebSocket::Frame::chunk -> event websocket_frame_data($conn, $is_orig, self.chunk);

And in the grammar, chunk is roughly the following.

  chunk: bytes &size=self.payload_len &chunked &convert=unmask(self, $$) {
    zeek::protocol_data_in(zeek::is_orig(), $$);
  }

When using self.chunk in the .evt file, a violation is triggered:

unset optional value (analyzer/protocol/websocket/websocket.evt:18)        \x89\x04Zeek

A workaround is to assign self.chunk to $$ within the inline %done hook before zeek::protocol_data_in().

@bbannier bbannier self-assigned this Jan 25, 2024
@bbannier
Copy link
Member

The issue here seems to be that the parsing with &chunked doesn't directly populate xs with a value, but this only occurs after the chunked parsing is done.

@awelzel
Copy link
Contributor Author

awelzel commented Jan 25, 2024

The issue here seems to be that the parsing with &chunked doesn't directly populate xs with a value, but this only occurs after the chunked parsing is done.

Just to make sure this didn't go under, there's no issue if I remove the &convert:

module Test;

import spicy;

public type X = unit {
  x: uint8;
  xs: bytes &eod &chunked {
    print "$$", $$;
    print "xs", self.xs;
  }
};


$ printf "\x80\x11\x22\x33\x44\x55\x66" | spicy-driver --increment 4 x.spicy
$$, \x11"3
xs, \x11"3
$$, DUf
xs, DUf

@bbannier
Copy link
Member

bbannier commented Jan 25, 2024

Just to make sure this didn't go under, there's no issue if I remove the &convert:

Yep, that's the normal assignment of the value after parsing is done. We might need to trigger also trigger that whenever a chunk is processed since we allow users to run hooks at that point.

@bbannier
Copy link
Member

bbannier commented Jan 25, 2024

There are actually more things wrong with chunked parsing. After a field is done parsing in addition to potentially setting the unit field we currently also update offsets,

void postParseField(const Production& p, const production::Meta& meta,
const std::optional<Expression>& pre_container_offset) {
const auto& field = meta.field();
assert(field); // Must only be called if we have a field.
// If the field holds a container we expect to see the offset of the field, not the individual container
// elements inside e.g., this unit's fields hooks. Temporarily restore the previously stored offset.
std::optional<Expression> prev;
if ( pre_container_offset ) {
prev = builder()->addTmp("prev",
builder::ternary(featureConstant(state().unit_id, "uses_offset"),
builder::member(state().self, "__offset"), builder::integer(0)));
pb->guardFeatureCode(state().unit_id, {"uses_offset"}, [&]() {
builder()->addAssign(builder::member(state().self, "__offset"), *pre_container_offset);
});
}
HILTI_DEBUG(spicy::logging::debug::ParserBuilder, fmt("- post-parse field: %s", field->id()));
if ( auto a = AttributeSet::find(field->attributes(), "&try") )
pb->finishBacktracking();
if ( pb->options().getAuxOption<bool>("spicy.track_offsets", false) ) {
assert(field->index());
auto __offsets = builder::member(state().self, "__offsets");
auto cur_offset = builder::memberCall(state().cur, "offset", {});
auto offsets = builder::index(__offsets, *field->index());
builder()->addAssign(offsets, builder::tuple({builder::index(builder::deref(offsets), 0), cur_offset}));
}
auto ncur = state().ncur;
state().ncur = {};
if ( auto a = AttributeSet::find(field->attributes(), "&max-size") ) {
// Check that we did not read into the sentinel byte.
auto cond = builder::greaterEqual(builder::memberCall(state().cur, "offset", {}),
builder::memberCall(*ncur, "offset", {}));
auto exceeded = builder()->addIf(std::move(cond));
pushBuilder(exceeded, [&]() {
// We didn't finish parsing the data, which is an error.
if ( ! field->isAnonymous() )
// Clear the field in case the type parsing has started
// to fill it.
builder()->addExpression(builder::unset(state().self, field->id()));
pb->parseError("parsing not done within &max-size bytes", a->meta());
});
}
else if ( auto a = AttributeSet::find(field->attributes(), "&size") ) {
// Make sure we parsed the entire &size amount.
auto missing = builder::lower(builder::memberCall(state().cur, "offset", {}),
builder::memberCall(*ncur, "offset", {}));
auto insufficient = builder()->addIf(std::move(missing));
pushBuilder(insufficient, [&]() {
// We didn't parse all the data, which is an error.
if ( ! field->isAnonymous() )
// Clear the field in case the type parsing has started
// to fill it.
builder()->addExpression(builder::unset(state().self, field->id()));
pb->parseError("&size amount not consumed", a->meta());
});
}
auto val = destination();
if ( field->convertExpression() ) {
// Value was stored in temporary. Apply expression and store result
// at destination.
popDestination();
pb->applyConvertExpression(*field, val, destination());
}
popState(); // From &size (pushed even if absent).
if ( AttributeSet::find(field->attributes(), "&parse-from") ||
AttributeSet::find(field->attributes(), "&parse-at") ) {
ncur = {};
popState();
pb->saveParsePosition();
}
if ( ncur )
builder()->addAssign(state().cur, *ncur);
if ( ! meta.container() ) {
if ( pb->isEnabledDefaultNewValueForField() && state().literal_mode == LiteralMode::Default )
pb->newValueForField(meta, destination(), val);
}
if ( state().captures )
popState();
if ( prev )
pb->guardFeatureCode(state().unit_id, {"uses_offset"},
[&]() { builder()->addAssign(builder::member(state().self, "__offset"), *prev); });
if ( field->condition() )
popBuilder();
}

module Test;

public type X = unit {
    xs: bytes &eod &chunked {
        print self.offset();
    }
};
# Use incremental feeding so we get multiple chunks.
$ printf "\x80\x11\x22\x33\x44\x55\x66" | spicy-driver -d foo.spicy -i 1
0
0
0
0
0
0
0

I was hoping to call that code after parsing of a chunk is done, but unfortunately we don't seem to have direct access to it from there,

pb->newValueForField(meta, value, target);

@bbannier bbannier removed their assignment Jan 25, 2024
@rsmmr
Copy link
Member

rsmmr commented Jan 26, 2024

I think we should make '$$' work here and let it refer to the current chunk of bytes, and then generally interpret attributes on a &chunked field (like &convert) as applicable to the final chunk only, and assign that to the field once we're done.

@awelzel
Copy link
Contributor Author

awelzel commented Jan 26, 2024

I think we should make '$$' work here ..

Yes, $$ in the .evt would solve this.

... and let it refer to the current chunk of bytes, and then generally interpret attributes on a &chunked field (like &convert) as applicable to the final chunk only, and assign that to the field once we're done.

$$ within .evt would be the &converted chunk of bytes? Otherwise, at least for my use-case, it wouldn't really be useful.

@rsmmr
Copy link
Member

rsmmr commented Jan 26, 2024

Yes, $$ in the .evt would solve this.

+1

$$ within .evt would be the &converted chunk of bytes? Otherwise, at least for my use-case, it wouldn't really be useful.

No, it would be the current bytes value. I believe the converted value would only very rarely be useful (or even meaningful) because it's undefined how the chunking takes place: you could be getting 1 byte or a thousand, and have no control over where things are split. Your bytes_to_hexstring($$) is one of the few cases where it could be useful (basically whenever it's just per-byte conversions, not multiple bytes), but couldn't you do that on your end as well? Just call bytes_to_hexstring($$) on the received value before passing it on (even inside the EVT)?

@bbannier
Copy link
Member

bbannier commented Jan 26, 2024

@rsmmr $$ refers to the converted value. Like you wrote, whether that makes sense for the user is up to them, e.g., if &convert does a base64 conversion the results could be wrong (and the proper tool would be to use a filter).

@rsmmr
Copy link
Member

rsmmr commented Jan 26, 2024

@rsmmr $$ refers to the converted value. Like you wrote, whether that makes sense for the user is up to them, e.g., if &convert does a base64 conversion the results could be wrong (and the proper tool would be to use a filter).

That's my point: I don't think it should refer to the converted value inside a &chunked field. In fact, I'd go so far and disallow using &convert on a &chunked field altogether.

@awelzel
Copy link
Contributor Author

awelzel commented Jan 26, 2024

Your bytes_to_hexstring($$) is one of the few cases where it could be useful (basically whenever it's just per-byte conversions, not multiple bytes),
In fact, I'd go so far and disallow using &convert on a &chunked field altogether.

The use-case was per byte xor decoding. But yes, I see now and agree with the proposed exclusion.

Even pondering a bit if &chunked fields should be recommended to be anonymous due to not being set until the last chunk (and then only with whatever remained).

$$ for my use-case in .evt won't quite help. Yes, could use convert($$) in the .evt, but it's also needed in the %done hook of the chunked field (for zeek::protocol_data_in) and the decoding is not cheap. Doing convert explicit within %done and assigning to a unit variable sounds pragmatic enough now that &convert is out of the picture.

@rsmmr
Copy link
Member

rsmmr commented Mar 28, 2024

Sounds like conclusion here is to remove support for the combination of &chunked and &convert. #1703 puts a deprecation in place.

@rsmmr rsmmr closed this as completed in 47f7019 Apr 3, 2024
rsmmr added a commit that referenced this issue Apr 3, 2024
…ert'

* origin/topic/robin/gh-1661-chunked-convert:
  Deprecate usage of `&convert` with `&chunked`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants