feat(fuzzer): Add TopNRowNumberFuzzer #12103

aditi-pandit · 2025-01-16T22:20:04Z

topNRowNumber node is an optimized planNode for SQL with ranking window functions but which limits them to only the topN results. Add a TopNRowNumberFuzzer for plans with this planNode.

This fuzzer is closely modeled after the RowNumberFuzzer. So the common code is abstracted to a RowNumberFuzzerBase class which is used as the parent class for both RowNumberFuzzer and TopnRowNumberFuzzer.

The fuzzer generates plans only for row_number function right now. It will be enhanced to support rank and dense_rank functions after #11554

netlify · 2025-01-16T22:20:23Z

✅ Deploy Preview for meta-velox canceled.

Name	Link
🔨 Latest commit	`7de27c6`
🔍 Latest deploy log	https://app.netlify.com/sites/meta-velox/deploys/67b56c55c9c8110008a7a254

aditi-pandit · 2025-02-02T18:39:30Z

Have run the fuzzer for an hour without failures.

kagamiori

Hi @aditi-pandit, thank you for adding fuzzer coverage for TopNRowNumber node. I noticed that the member methods of TopNRowNumberFuzzer are very similar to RowNumberFuzzer's. Does it make sense to create a RowNumberFuzzerBase class with the common functions and let RowNumberFuzzer and TopNRowNumberFuzzer inherit that? Similar to how we organize AggregationFuzzer, WindowFuzzer, and AggregationFuzzerBase?

kagamiori · 2025-02-05T03:15:22Z

velox/exec/fuzzer/TopNRowNumberFuzzer.cpp

+  int32_t randInt(int32_t min, int32_t max) {
+    return boost::random::uniform_int_distribution<int32_t>(min, max)(rng_);
+  }


You can use the T rand(FuzzerGenerator& rng, T min, T max) in velox/common/fuzzer/Utils.h

kagamiori · 2025-02-05T03:20:39Z

velox/exec/fuzzer/TopNRowNumberFuzzer.cpp

+
+std::pair<std::vector<std::string>, std::vector<TypePtr>>
+TopNRowNumberFuzzer::generateKeys(const std::string& prefix) {
+  static const std::vector<TypePtr> kKeyTypes{


Can the key type be floating-point?

kagamiori · 2025-02-05T03:48:25Z

velox/exec/fuzzer/TopNRowNumberFuzzer.cpp

+          LOG(ERROR) << "Expected\n"
+                     << expected->toString(0, expected->size()) << "\nActual\n"
+                     << actual->toString(0, actual->size());
+          throw;


nit: assertEqualResults() should already print out the unmatched rows. Is that not enough?

i picked this up from some RowNumberFuzzer code. Though you're right, it might not be needed.

kagamiori · 2025-02-05T04:02:49Z

velox/exec/fuzzer/TopNRowNumberFuzzerRunner.cpp

+std::unique_ptr<exec::test::ReferenceQueryRunner> setupReferenceQueryRunner(
+    memory::MemoryPool* aggregatePool,
+    const std::string& prestoUrl,
+    const std::string& runnerName,
+    const uint32_t& reqTimeoutMs) {
+  if (prestoUrl.empty()) {
+    auto duckQueryRunner =
+        std::make_unique<exec::test::DuckQueryRunner>(aggregatePool);
+    LOG(INFO) << "Using DuckDB as the reference DB.";
+    return duckQueryRunner;
+  }
+
+  LOG(INFO) << "Using Presto as the reference DB.";
+  return std::make_unique<exec::test::PrestoQueryRunner>(
+      aggregatePool,
+      prestoUrl,
+      runnerName,
+      static_cast<std::chrono::milliseconds>(reqTimeoutMs));
+}


You can use setupReferenceQueryRunner() defined in exec/fuzzer/FuzzerUtil.cpp.

+1. I moved that in a separate PR as it affected multiple fuzzers. Will change it here as well.

aditi-pandit · 2025-02-06T17:19:36Z

Hi @aditi-pandit, thank you for adding fuzzer coverage for TopNRowNumber node. I noticed that the member methods of TopNRowNumberFuzzer are very similar to RowNumberFuzzer's. Does it make sense to create a RowNumberFuzzerBase class with the common functions and let RowNumberFuzzer and TopNRowNumberFuzzer inherit that? Similar to how we organize AggregationFuzzer, WindowFuzzer, and AggregationFuzzerBase?

@kagamiori : Most of the common logic in those classes could have even broader applicability so I moved them to FuzzerUtil class. What remains are the flags etc. I notice that AggregationFuzzerBase covers flags as well. We can try something like that. Will modify the code shortly.

aditi-pandit · 2025-02-07T03:09:57Z

@kagamiori : Have added a RowNumberFuzzerBase class now. PTAL. Thanks !

kagamiori

Hi @aditi-pandit, the code looks good to me, except a few comments for small refactoring. Thanks!

kagamiori · 2025-02-08T01:42:11Z

velox/exec/fuzzer/FuzzerUtil.cpp

+std::vector<RowVectorPtr> flatten(const std::vector<RowVectorPtr>& vectors) {
+  std::vector<RowVectorPtr> flatVectors;
+  for (const auto& vector : vectors) {
+    auto flat = BaseVector::create<RowVector>(
+        vector->type(), vector->size(), vector->pool());
+    flat->copy(vector.get(), 0, 0, vector->size());
+    flatVectors.push_back(flat);
+  }
+
+  return flatVectors;
+}


nit: Can this method be replaced with BaseVector::flattenVector()?

@kagamiori : BaseVector::flattenVector flattens the original input vector, while this flatten method returns a separate output vector for the flattened list of input vectors. The original vector is retained and we want to run the fuzzer on the original input vectors only.

An option could be to copy the input vector and then use BaseVector::flatten(...) on the copied vector. But the rest of the stubbing around it makes it seem simpler to use the current flatten method. wdyt ?

@kagamiori : Nvm... I realize after using common logVectors method from AggregationFuzzerBase that this method isn't needed. see #12300

Updated with #12300

kagamiori · 2025-02-08T01:49:08Z

velox/exec/fuzzer/FuzzerUtil.cpp

+  // Disable testing with TableScan when input contains TIMESTAMP type, due to
+  // the issue #8127.
+  if (type->kind() == TypeKind::TIMESTAMP) {
+    return false;
+  }


Hi @aditi-pandit, I remember that for running fuzzers with PQR, the issue #8127 can be resolved by adding -Duser.timezone=America/Los_Angeles to etc/jvm.config in the directory where you installed the Presto server (e.g., presto-server-0.284/etc/jvm.config). Could you please have a try and let me know if it works?

@kagamiori : Have tried with the user.timezone config and I don't see the errors either.

But of course it fails in Velox CI here.

So we'll have to change that separately and let this PR remain as is.

wdyt ?

Hi @aditi-pandit, I believe we already set -Duser.timezone=America/Los_Angeles in the CI jobs here, so if you still see an error when enabling Timestamp type, that needs to be looked into.

For now, could we move if (type->kind() == TypeKind::TIMESTAMP) { return false; } to a local method only used in TopNRowNumberFuzzer.cpp, so that this won't affect other fuzzers that use this isTableScanSupported() utility method? Then you can look into the error caused by Timestamp type after this PR is merged.

kagamiori · 2025-02-08T02:05:30Z

velox/exec/fuzzer/RowNumberFuzzerBase.cpp

+void RowNumberFuzzerBase::logInput(const std::vector<RowVectorPtr>& input) {
+  if (VLOG_IS_ON(1)) {
+    // Flatten inputs.
+    const auto flatInput = test::flatten(input);
+    VLOG(1) << "Input: " << input[0]->toString();
+    for (const auto& v : flatInput) {
+      VLOG(1) << std::endl << v->toString(0, v->size());
+    }
+  }
+}


nit: AggregationFuzzerBase also has a logVectors() method. We can move that to a common place to reuse the code.

Check #12300. I'll update this PR post its merge.

Updated with #12300

aditi-pandit · 2025-02-19T04:12:23Z

@kagamiori : Have updated this PR after the refactoring PR is merged. PTAL.

Timestamp issue

kagamiori · 2025-02-20T18:56:42Z

LGTM. Let's look into the error caused by Timestamp type and enable Timestamp in a separate PR.

facebook-github-bot · 2025-02-20T18:58:12Z

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 16, 2025

aditi-pandit force-pushed the topn_fuzzer branch 3 times, most recently from 9a7590d to a9011fa Compare January 16, 2025 23:26

aditi-pandit force-pushed the topn_fuzzer branch 12 times, most recently from f7beb33 to 25ac51c Compare January 31, 2025 03:48

aditi-pandit changed the title ~~Add TopNRowNumberFuzzer~~ feat(Fuzzer) : Add TopNRowNumberFuzzer Jan 31, 2025

aditi-pandit requested review from duanmeng, xiaoxmeng, kagamiori and kgpai January 31, 2025 03:52

aditi-pandit force-pushed the topn_fuzzer branch from 25ac51c to 49a43dd Compare January 31, 2025 03:55

aditi-pandit changed the title ~~feat(Fuzzer) : Add TopNRowNumberFuzzer~~ feat(Fuzzer): Add TopNRowNumberFuzzer Jan 31, 2025

aditi-pandit changed the title ~~feat(Fuzzer): Add TopNRowNumberFuzzer~~ feat(fuzzer): Add TopNRowNumberFuzzer Jan 31, 2025

aditi-pandit force-pushed the topn_fuzzer branch from 49a43dd to f458e56 Compare January 31, 2025 03:58

aditi-pandit marked this pull request as ready for review January 31, 2025 03:59

aditi-pandit requested review from assignUser and majetideepak as code owners January 31, 2025 03:59

aditi-pandit mentioned this pull request Jan 31, 2025

Add TopNRowNumberFuzzer for TopNRowNumber operator #12017

Open

aditi-pandit requested a review from Yuhta January 31, 2025 04:25

aditi-pandit mentioned this pull request Jan 31, 2025

feat: TopNRank optimization #11554

Open

aditi-pandit force-pushed the topn_fuzzer branch 3 times, most recently from 8be77d4 to 7fbe198 Compare February 1, 2025 03:09

aditi-pandit changed the title ~~feat(fuzzer): Add TopNRowNumberFuzzer~~ feat(fuzzer): Add TopNRowNumberFuzzer Feb 2, 2025

kagamiori reviewed Feb 5, 2025

View reviewed changes

aditi-pandit force-pushed the topn_fuzzer branch from 7fbe198 to 04b0749 Compare February 6, 2025 07:23

aditi-pandit force-pushed the topn_fuzzer branch 6 times, most recently from cafd0d1 to cfa8421 Compare February 7, 2025 03:06

kagamiori reviewed Feb 8, 2025

View reviewed changes

aditi-pandit force-pushed the topn_fuzzer branch 2 times, most recently from 2d7ec1a to 6dbf273 Compare February 19, 2025 04:06

feat(fuzzer): Add TopNRowNumberFuzzer

7de27c6

Timestamp issue

aditi-pandit force-pushed the topn_fuzzer branch from 6dbf273 to 7de27c6 Compare February 19, 2025 05:29

kagamiori approved these changes Feb 20, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(fuzzer): Add TopNRowNumberFuzzer #12103

feat(fuzzer): Add TopNRowNumberFuzzer #12103

aditi-pandit commented Jan 16, 2025 •

edited

Loading

netlify bot commented Jan 16, 2025 •

edited

Loading

aditi-pandit commented Feb 2, 2025

kagamiori left a comment

kagamiori Feb 5, 2025

kagamiori Feb 5, 2025

kagamiori Feb 5, 2025

aditi-pandit Feb 6, 2025

kagamiori Feb 5, 2025

aditi-pandit Feb 6, 2025

aditi-pandit commented Feb 6, 2025

aditi-pandit commented Feb 7, 2025

kagamiori left a comment

kagamiori Feb 8, 2025

aditi-pandit Feb 10, 2025

aditi-pandit Feb 10, 2025

aditi-pandit Feb 18, 2025

kagamiori Feb 8, 2025

aditi-pandit Feb 19, 2025 •

edited

Loading

kagamiori Feb 20, 2025 •

edited

Loading

kagamiori Feb 8, 2025

aditi-pandit Feb 10, 2025

aditi-pandit Feb 18, 2025

aditi-pandit commented Feb 19, 2025

kagamiori commented Feb 20, 2025

facebook-github-bot commented Feb 20, 2025

feat(fuzzer): Add TopNRowNumberFuzzer #12103

Are you sure you want to change the base?

feat(fuzzer): Add TopNRowNumberFuzzer #12103

Conversation

aditi-pandit commented Jan 16, 2025 • edited Loading

netlify bot commented Jan 16, 2025 • edited Loading

✅ Deploy Preview for meta-velox canceled.

aditi-pandit commented Feb 2, 2025

kagamiori left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aditi-pandit commented Feb 6, 2025

aditi-pandit commented Feb 7, 2025

kagamiori left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aditi-pandit Feb 19, 2025 • edited Loading

Choose a reason for hiding this comment

kagamiori Feb 20, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aditi-pandit commented Feb 19, 2025

kagamiori commented Feb 20, 2025

facebook-github-bot commented Feb 20, 2025

aditi-pandit commented Jan 16, 2025 •

edited

Loading

netlify bot commented Jan 16, 2025 •

edited

Loading

aditi-pandit Feb 19, 2025 •

edited

Loading

kagamiori Feb 20, 2025 •

edited

Loading