Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Expose non-io_buf API for fb_parser #4683

Merged
merged 3 commits into from
Feb 14, 2024

Conversation

lokitoth
Copy link
Member

This API will be used by consumers like RLClientLib to enable inputting FlatBuffer input.

lokitoth and others added 3 commits February 12, 2024 11:04
* also enables using the fb_parser_test prototype types in downstream projects
* Fix logic issues with detecting presence/absence of feature names and hashes in example data, leading to an access violation crash
* Fix issues with properly parsing ExampleCollection buffers, around reporting doneness and advancing when parsing inner MultiExample objects.
* Fix tests to include coverage for incoming flatbuffers with/without names/hashes
* Fixes validation logic to avoid crashing when feature names/hashes are missing
* Add read_span_flatbuffer tests
Parser logic needs to use VW::finish_example() rather than simple deletion to get rid of examples because the incoming example_factory may be grabbing examples from the example pool.
@@ -420,7 +420,7 @@ if(VW_FEAT_CSV)
endif()

if(VW_FEAT_FLATBUFFERS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time to turn this on by default?

if (_active_multi_ex) { RETURN_IF_FAIL(parse_multi_example(all, examples[0], _multi_example_object, status)); }
else if (_active_collection) { RETURN_IF_FAIL(process_collection_item(all, examples, status)); }
#define RETURN_SUCCESS_FINISHED() \
return buffer_pointer ? VW::experimental::error_code::nothing_to_parse : VW::experimental::error_code::success;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time to remove experimental? It's been long enough.

bool features_have_names(const Namespace& ns)
{
return flatbuffers::IsFieldPresent(&ns, Namespace::VT_FEATURE_NAMES) && (ns.feature_names()->size() != 0);
// TODO: It is not clear what the right answer is when feature_values->size is 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the name collection is present at all and greater than 0 in length, it should match the value collection lenght for things to be rational

@@ -342,23 +445,13 @@ int parser::parse_namespaces(VW::workspace* all, example* ae, const Namespace* n
// assuming the usecase that if feature names would exist, they would exist for all features in the namespace
if (ns->feature_names()->size() != ns->feature_values()->size())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implicit assumption is that there are either hashes or names and that that must match the values collection in size. An explicit comment here would be helpful for future us :).


auto feature_value_iter = (ns->feature_values())->begin();
const auto feature_value_iter_end = (ns->feature_values())->end();

bool has_hashes = features_have_hashes(*ns);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much easier to read and reason about. Thanks!

int parser::parse_namespaces(VW::workspace* all, example* ae, const Namespace* ns, VW::experimental::api_status* status)
{
#define RETURN_NS_PARSER_ERROR(status, error_code) \
if (_active_collection && _active_multi_ex) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what is happening in this state. A comment would be helpful.

// thus context.size() = sizeof(length) + length
io_buf unused;

// TODO: How do we report errors out of here? (This is a general API problem with the parsers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. This is not great. Does this introduce exceptions to clientlib during parsing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless given that we should get this feature out, probably ok to live with this and improve later if needed.

namespace parsers
{
namespace flatbuffer
{
int flatbuffer_to_examples(VW::workspace* all, io_buf& buf, VW::multi_ex& examples);
bool read_span_flatbuffer(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the new interface for clientlib?

@rajan-chari
Copy link
Member

Looks good to me. Added some suggestions for comments.

Copy link
Member

@rajan-chari rajan-chari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. (Left some suggestions for comments.)

@lokitoth lokitoth merged commit 7c2963e into VowpalWabbit:master Feb 14, 2024
114 of 116 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants