Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(fuzzer): Add TopNRowNumberFuzzer #12103

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

feat(fuzzer): Add TopNRowNumberFuzzer #12103

wants to merge 1 commit into from

Conversation

aditi-pandit
Copy link
Collaborator

@aditi-pandit aditi-pandit commented Jan 16, 2025

topNRowNumber node is an optimized planNode for SQL with ranking window functions but which limits them to only the topN results. Add a TopNRowNumberFuzzer for plans with this planNode.

This fuzzer is closely modeled after the RowNumberFuzzer. So the common code is abstracted to a RowNumberFuzzerBase class which is used as the parent class for both RowNumberFuzzer and TopnRowNumberFuzzer.

The fuzzer generates plans only for row_number function right now. It will be enhanced to support rank and dense_rank functions after #11554

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 16, 2025
Copy link

netlify bot commented Jan 16, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 7de27c6
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/67b56c55c9c8110008a7a254

@aditi-pandit aditi-pandit force-pushed the topn_fuzzer branch 3 times, most recently from 9a7590d to a9011fa Compare January 16, 2025 23:26
@aditi-pandit aditi-pandit force-pushed the topn_fuzzer branch 12 times, most recently from f7beb33 to 25ac51c Compare January 31, 2025 03:48
@aditi-pandit aditi-pandit changed the title Add TopNRowNumberFuzzer feat(Fuzzer) : Add TopNRowNumberFuzzer Jan 31, 2025
@aditi-pandit aditi-pandit changed the title feat(Fuzzer) : Add TopNRowNumberFuzzer feat(Fuzzer): Add TopNRowNumberFuzzer Jan 31, 2025
@aditi-pandit aditi-pandit changed the title feat(Fuzzer): Add TopNRowNumberFuzzer feat(fuzzer): Add TopNRowNumberFuzzer Jan 31, 2025
@aditi-pandit aditi-pandit marked this pull request as ready for review January 31, 2025 03:59
@aditi-pandit aditi-pandit requested a review from Yuhta January 31, 2025 04:25
@aditi-pandit aditi-pandit force-pushed the topn_fuzzer branch 3 times, most recently from 8be77d4 to 7fbe198 Compare February 1, 2025 03:09
@aditi-pandit aditi-pandit changed the title feat(fuzzer): Add TopNRowNumberFuzzer feat(fuzzer): Add TopNRowNumberFuzzer Feb 2, 2025
@aditi-pandit
Copy link
Collaborator Author

Have run the fuzzer for an hour without failures.

Copy link
Contributor

@kagamiori kagamiori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aditi-pandit, thank you for adding fuzzer coverage for TopNRowNumber node. I noticed that the member methods of TopNRowNumberFuzzer are very similar to RowNumberFuzzer's. Does it make sense to create a RowNumberFuzzerBase class with the common functions and let RowNumberFuzzer and TopNRowNumberFuzzer inherit that? Similar to how we organize AggregationFuzzer, WindowFuzzer, and AggregationFuzzerBase?

Comment on lines 107 to 106
int32_t randInt(int32_t min, int32_t max) {
return boost::random::uniform_int_distribution<int32_t>(min, max)(rng_);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the T rand(FuzzerGenerator& rng, T min, T max) in velox/common/fuzzer/Utils.h


std::pair<std::vector<std::string>, std::vector<TypePtr>>
TopNRowNumberFuzzer::generateKeys(const std::string& prefix) {
static const std::vector<TypePtr> kKeyTypes{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the key type be floating-point?

Comment on lines 400 to 403
LOG(ERROR) << "Expected\n"
<< expected->toString(0, expected->size()) << "\nActual\n"
<< actual->toString(0, actual->size());
throw;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: assertEqualResults() should already print out the unmatched rows. Is that not enough?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i picked this up from some RowNumberFuzzer code. Though you're right, it might not be needed.

Comment on lines 86 to 104
std::unique_ptr<exec::test::ReferenceQueryRunner> setupReferenceQueryRunner(
memory::MemoryPool* aggregatePool,
const std::string& prestoUrl,
const std::string& runnerName,
const uint32_t& reqTimeoutMs) {
if (prestoUrl.empty()) {
auto duckQueryRunner =
std::make_unique<exec::test::DuckQueryRunner>(aggregatePool);
LOG(INFO) << "Using DuckDB as the reference DB.";
return duckQueryRunner;
}

LOG(INFO) << "Using Presto as the reference DB.";
return std::make_unique<exec::test::PrestoQueryRunner>(
aggregatePool,
prestoUrl,
runnerName,
static_cast<std::chrono::milliseconds>(reqTimeoutMs));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use setupReferenceQueryRunner() defined in exec/fuzzer/FuzzerUtil.cpp.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. I moved that in a separate PR as it affected multiple fuzzers. Will change it here as well.

@aditi-pandit
Copy link
Collaborator Author

Hi @aditi-pandit, thank you for adding fuzzer coverage for TopNRowNumber node. I noticed that the member methods of TopNRowNumberFuzzer are very similar to RowNumberFuzzer's. Does it make sense to create a RowNumberFuzzerBase class with the common functions and let RowNumberFuzzer and TopNRowNumberFuzzer inherit that? Similar to how we organize AggregationFuzzer, WindowFuzzer, and AggregationFuzzerBase?

@kagamiori : Most of the common logic in those classes could have even broader applicability so I moved them to FuzzerUtil class. What remains are the flags etc. I notice that AggregationFuzzerBase covers flags as well. We can try something like that. Will modify the code shortly.

@aditi-pandit aditi-pandit force-pushed the topn_fuzzer branch 6 times, most recently from cafd0d1 to cfa8421 Compare February 7, 2025 03:06
@aditi-pandit
Copy link
Collaborator Author

@kagamiori : Have added a RowNumberFuzzerBase class now. PTAL. Thanks !

Copy link
Contributor

@kagamiori kagamiori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aditi-pandit, the code looks good to me, except a few comments for small refactoring. Thanks!

Comment on lines 196 to 206
std::vector<RowVectorPtr> flatten(const std::vector<RowVectorPtr>& vectors) {
std::vector<RowVectorPtr> flatVectors;
for (const auto& vector : vectors) {
auto flat = BaseVector::create<RowVector>(
vector->type(), vector->size(), vector->pool());
flat->copy(vector.get(), 0, 0, vector->size());
flatVectors.push_back(flat);
}

return flatVectors;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can this method be replaced with BaseVector::flattenVector()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kagamiori : BaseVector::flattenVector flattens the original input vector, while this flatten method returns a separate output vector for the flattened list of input vectors. The original vector is retained and we want to run the fuzzer on the original input vectors only.

An option could be to copy the input vector and then use BaseVector::flatten(...) on the copied vector. But the rest of the stubbing around it makes it seem simpler to use the current flatten method. wdyt ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kagamiori : Nvm... I realize after using common logVectors method from AggregationFuzzerBase that this method isn't needed. see #12300

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with #12300

Comment on lines +221 to +213
// Disable testing with TableScan when input contains TIMESTAMP type, due to
// the issue #8127.
if (type->kind() == TypeKind::TIMESTAMP) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aditi-pandit, I remember that for running fuzzers with PQR, the issue #8127 can be resolved by adding -Duser.timezone=America/Los_Angeles to etc/jvm.config in the directory where you installed the Presto server (e.g., presto-server-0.284/etc/jvm.config). Could you please have a try and let me know if it works?

Copy link
Collaborator Author

@aditi-pandit aditi-pandit Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kagamiori : Have tried with the user.timezone config and I don't see the errors either.

But of course it fails in Velox CI here.

So we'll have to change that separately and let this PR remain as is.

wdyt ?

Copy link
Contributor

@kagamiori kagamiori Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aditi-pandit, I believe we already set -Duser.timezone=America/Los_Angeles in the CI jobs here, so if you still see an error when enabling Timestamp type, that needs to be looked into.

For now, could we move if (type->kind() == TypeKind::TIMESTAMP) { return false; } to a local method only used in TopNRowNumberFuzzer.cpp, so that this won't affect other fuzzers that use this isTableScanSupported() utility method? Then you can look into the error caused by Timestamp type after this PR is merged.

Comment on lines 97 to 106
void RowNumberFuzzerBase::logInput(const std::vector<RowVectorPtr>& input) {
if (VLOG_IS_ON(1)) {
// Flatten inputs.
const auto flatInput = test::flatten(input);
VLOG(1) << "Input: " << input[0]->toString();
for (const auto& v : flatInput) {
VLOG(1) << std::endl << v->toString(0, v->size());
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: AggregationFuzzerBase also has a logVectors() method. We can move that to a common place to reuse the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check #12300. I'll update this PR post its merge.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with #12300

@aditi-pandit aditi-pandit force-pushed the topn_fuzzer branch 2 times, most recently from 2d7ec1a to 6dbf273 Compare February 19, 2025 04:06
@aditi-pandit
Copy link
Collaborator Author

@kagamiori : Have updated this PR after the refactoring PR is merged. PTAL.

@kagamiori
Copy link
Contributor

LGTM. Let's look into the error caused by Timestamp type and enable Timestamp in a separate PR.

@facebook-github-bot
Copy link
Contributor

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants