Commit
fix: Re-create the Flatbuffer test input files to match new schema
* Add information about data provenance and generation methodology
lokitoth committed Jan 20, 2024
1 parent 7a4b013 commit 550da45
Showing 10 changed files with 22 additions and 1 deletion.
Binary file modified test/train-sets/0001.fb
Binary file not shown.
Binary file modified test/train-sets/ccb.fb
Binary file not shown.
Binary file modified test/train-sets/cs.fb
Binary file not shown.
Binary file modified test/train-sets/multiclass.fb
Binary file not shown.
Binary file modified test/train-sets/multilabel.fb
Binary file not shown.
Binary file modified test/train-sets/rcv1_cb_eval.fb
Binary file not shown.
Binary file modified test/train-sets/rcv1_raw_cb_small.fb
Binary file not shown.
Binary file modified test/train-sets/wiki256_no_label.fb
Binary file not shown.
2 changes: 1 addition & 1 deletion vowpalwabbit/fb_parser/tests/flatbuffer_parser_test.cc
@@ -256,4 +256,4 @@ TEST(flatbuffer_parser_tests, test_flatbuffer_standalone_example_error_code)
EXPECT_EQ(examples[0]->indices[0], VW::details::CONSTANT_NAMESPACE);

VW::finish_example(*all, *examples[0]);
}
}
21 changes: 21 additions & 0 deletions vowpalwabbit/fb_parser/tests/runtest_data.md
@@ -0,0 +1,21 @@
# Flatbuffer RunTests Data Generation

Changes to the FB schema, particularly breaking ones, can easily lead to broken tests with silent, difficult-to-debug failures, because the stored input files still use the old version of the schema. After this change becomes part of mainline VW, schema evolution will need to be carefully controlled. If this PR is put on hold for a significant time, however, regenerating the data may prove difficult without a record of how it was produced.

## General Approach

Given a VW command line, add `--fb_out <target_file>` to it and run it via `<build>/utl/flatbuffer/to_flatbuff`.
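
As a rough illustration of the approach above, the following sketch assembles the `to_flatbuff` invocation for a single data file and prints it for review rather than running it. `BUILD_DIR` and the helper name `fb_gen_cmd` are hypothetical and not part of VW; the example arguments are taken from the row for test 246 in the table below, and it is an assumption that the command is run from the `test/` directory.

```shell
#!/bin/sh
# Sketch only: BUILD_DIR is a hypothetical placeholder for <build>.
BUILD_DIR="${BUILD_DIR:-build}"

# fb_gen_cmd <target_file> <generation args...>
# Prints the to_flatbuff command line (generation args plus --fb_out)
# without executing it, so it can be checked before regenerating data.
fb_gen_cmd() {
  target="$1"
  shift
  echo "$BUILD_DIR/utl/flatbuffer/to_flatbuff $* --fb_out $target"
}

# Test 246, assuming the working directory is test/ (not stated in the table).
fb_gen_cmd train-sets/ccb.fb -d ccb_test.dat --ccb_explore_adf
```

Once the printed command looks right, it can be pasted into the shell and run from the directory that the `-d` path is relative to.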

## Existing data files

| Test ID | --fb_out | Generation Args |
|---------|--------------|------------------------|
| 239 | train-sets/0001.fb | `-d train-sets/0001.fb` |
| 240 | train-sets/rcv1_raw_cb_small.fb | `--cb_force_legacy --cb 2 --examples 500` |
| 241 | train-sets/multilabel.fb | `-d multilabel --multilabel_oaa 10` |
| 242 | train-sets/multiclass.fb | `-d multiclass -k --ect 10` |
| 243 | train-sets/cs.fb | `-d cs_test.ldf --invariant --csoaa_ldf multiline` |
| 244 | train-sets/rcv1_cb_eval.fb | `-d rcv1_cb_eval --cb 2 --eval --examples 500` |
| 245 | train-sets/wiki256_no_label.fb | `-d wiki256.dat --lda 100 --lda_alpha 0.01 --lda_rho 0.01 --lda_D 1000 -l 1 -b 13 --minibatch 128 -k` |
| 246 | train-sets/ccb.fb | `-d ccb_test.dat --ccb_explore_adf` |
