-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Flatbuffer format #3989
Merged
Merged
feat: Flatbuffer format #3989
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
rajan-chari
reviewed
Jun 20, 2022
rajan-chari
reviewed
Jun 20, 2022
rajan-chari
requested changes
Jun 20, 2022
rajan-chari
reviewed
Jun 20, 2022
rajan-chari
reviewed
Jun 20, 2022
rajan-chari
reviewed
Jun 20, 2022
rajan-chari
reviewed
Jun 20, 2022
rajan-chari
reviewed
Jun 20, 2022
rajan-chari
reviewed
Jun 20, 2022
rajan-chari
reviewed
Jun 20, 2022
rajan-chari
reviewed
Jun 20, 2022
vowpalwabbit/fb_parser/include/vw/fb_parser/parse_example_flatbuffer.h
Outdated
Show resolved
Hide resolved
Sharvani2002
commented
Oct 2, 2022
Closing for now, we can revisit this later, thank you :) |
The reason for odd pass/fail behaviour across the CI is that MSVC introduces additional code in debug mode (or fails to apply certain optimizations, I did not fully investigate) which leads it to use a move on return of the test object (introduced when we removed the copy constructor). Because Linux was optimizing that away in all cases, the test runs in CI (or locally) did not pick up the double-free bug which the previous commit (5deef6b) fixed - an issue that was discovered on Windows[Debug]. The issue now is that the move constructor failed to copy the value of p.
note: the added test is not yet working
In the prototype_namespace_t's validator, we need to carefully iterate the example's features because we need to be in the correct extent. There was an indexing issue where we were using the extent-index in a pure namespace-index manner, which caused us to read beyond the end of the vector, leading to spurious data being loaded.
In order to simulate enough of the parse dispatch loop to get the flatbuffer_parser to read multiple examples, we need to "dispatch" the examples after each pass through the parser, othrwise it is going to attempt to write in place over a single example. When dispatching was introduced in the tests, it was used to swallow an extraneous newline example (which is used and swallowed internally by the example dispatcher to collect examples into multi_ex for, e.g. cb_adf), and without cleanup it leaks.
* Expands the example data generation to create larger test scenarios * Fixes the example dispatch logic to handle dispatching streams of multi_ex examples * Fix support for unlabeled examples
rajan-chari
reviewed
Feb 8, 2024
rajan-chari
reviewed
Feb 8, 2024
rajan-chari
reviewed
Feb 8, 2024
rajan-chari
reviewed
Feb 8, 2024
rajan-chari
reviewed
Feb 8, 2024
rajan-chari
reviewed
Feb 8, 2024
rajan-chari
reviewed
Feb 8, 2024
rajan-chari
reviewed
Feb 8, 2024
rajan-chari
reviewed
Feb 8, 2024
rajan-chari
reviewed
Feb 8, 2024
rajan-chari
approved these changes
Feb 8, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. The error handling is much better.
* Remove api_status as a CLI flag - this is a liberary affordance; when in CLI mode we should always report errors to the user. Later we may want to consider adding a verbosity selector. * Remove an unnecessary bitshift in the alignment padding calculation. * Add explanation of what is going on with the desired_align inputs in the io_buf code.
* finish removing --api_status as a CLI flag
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Changes the FB data format to be more performant by avoiding extra level of tables at the feature data level.
Remaining work for CI: