Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor!: replace DataAccessor with Table in ProofExpr && remove input_length from ProofPlan::result_evaluate #366

Merged
merged 4 commits into from
Nov 14, 2024

Conversation

iajoiner
Copy link
Contributor

@iajoiner iajoiner commented Nov 11, 2024

Please be sure to look over the pull request guidelines here: https://github.com/spaceandtimelabs/sxt-proof-of-sql/blob/main/CONTRIBUTING.md#submit-pr.

Please go through the following checklist

Rationale for this change

We would like to evaluate ProofExpr on Table similar to how datafusion evaluates PhysicalExpr on RecordBatch. It is better to use a Table than to use a DataAccessor since the latter has access to more than one table while a ProofExpr shouldn't be able to be evaluated on multiple tables (CTEs etc count as tables). Moreover it is absurd to have a single input_length in a ProofPlan since it can have multiple sources.

What changes are included in this PR?

  • remove arg input_length from ProofPlan::result_evaluate
  • remove arg table_length from ProofExpr::result_evaluate
  • replace arg accessor from ProofPlan::result_evaluate with table: &Table<'a, S>
  • replace arg accessor from ProofPlan::prover_evaluate with table: &Table<'a, S>

Are these changes tested?

Yes.

@iajoiner iajoiner force-pushed the refactor/rem-acc branch 6 times, most recently from f5cfcae to 33c745b Compare November 12, 2024 20:09
@iajoiner iajoiner marked this pull request as ready for review November 12, 2024 20:36
@iajoiner iajoiner changed the title refactor!: replace DataAccessor with Table in ProofExpr refactor!: replace DataAccessor with Table in ProofExpr && remove table_length from ProofPlan::result_evaluate Nov 12, 2024
@iajoiner iajoiner changed the title refactor!: replace DataAccessor with Table in ProofExpr && remove table_length from ProofPlan::result_evaluate refactor!: replace DataAccessor with Table in ProofExpr && remove table_length from ProofPlan::result_evaluate Nov 12, 2024
@iajoiner iajoiner changed the title refactor!: replace DataAccessor with Table in ProofExpr && remove table_length from ProofPlan::result_evaluate refactor!: replace DataAccessor with Table in ProofExpr && remove input_length from ProofPlan::result_evaluate Nov 12, 2024
@iajoiner iajoiner force-pushed the refactor/rem-acc branch 4 times, most recently from 695d4d1 to 4b8f3e0 Compare November 12, 2024 21:55
let lhs = lhs_column.as_boolean().expect("lhs is not boolean");
let rhs = rhs_column.as_boolean().expect("rhs is not boolean");
Column::Boolean(alloc.alloc_slice_fill_with(table_length, |i| lhs[i] && rhs[i]))
Column::Boolean(
alloc.alloc_slice_fill_with(table.num_rows().unwrap_or(0), |i| lhs[i] && rhs[i]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that 0 is the correct default.
If table has no columns, it feels like lhs_column and rhs_column will have to be literals.
Eventually, we should get a proper solution with ColumnarValue, but for now, I think it makes more sense to pull the length from the lhs_column and rhs_column (and assert that the lengths equal).

This is also what is done in prover_evaluate.

///
/// Will panic if the column is not found. Shouldn't happen in practice since
/// code in `sql/parse` should have already checked that the column exists.
pub fn get_column<'a, S: Scalar>(&self, table: &Table<'a, S>) -> Column<'a, S> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

) -> Column<'a, S> {
Column::from_literal_with_length(&self.value, table_length, alloc)
Column::from_literal_with_length(&self.value, table.num_rows().unwrap_or(0), alloc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only spot where a default makes sense to me. If there are no columns, it's not obvious to me what the proper behavior here is. Probably 1 instead of 0.

Also, please add a test to cover this case.

alloc: &'a Bump,
accessor: &'a dyn DataAccessor<S>,
) -> Vec<Column<'a, S>> {
let column_refs = self.get_column_references();
let used_table = Table::<'a, S>::try_from_iter(column_refs.iter().map(|column_ref| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT:

Suggested change
let used_table = Table::<'a, S>::try_from_iter(column_refs.iter().map(|column_ref| {
let used_table = Table::<'a, S>::try_from_iter(column_refs.into_iter().map(|column_ref| {

Comment on lines 181 to 186
let column_refs = self.get_column_references();
let used_table = Table::<'a, S>::try_from_iter(column_refs.iter().map(|column_ref| {
let column = accessor.get_column(*column_ref);
(column_ref.column_id(), column)
}))
.expect("Failed to create table from column references");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to put this in a function since it is repeated multiple times.

Comment on lines 206 to 208
//TODO: Currently we have to have non-empty column references to have a non-empty table
// to evaluate `ProofExpr`s on. Once we restrict [`DataAccessor`] to [`TableExec`]
// and use input `DynProofPlan`s we should no longer need this.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be wise to add row_count as a field of Table. This would fix this issue as well as the comments I have above. See RecordBatch.

…ve `table_length` from `ProofPlan::result_evaluate`
@iajoiner iajoiner force-pushed the refactor/rem-acc branch 3 times, most recently from 61da2f0 to 118f432 Compare November 13, 2024 20:11
}
impl<'a, S: Scalar> Table<'a, S> {
/// Creates a new [`Table`].
pub fn try_new(table: IndexMap<Identifier, Column<'a, S>>) -> Result<Self, TableError> {
if table.is_empty() {
return Ok(Self { table });
// `EmptyExec` should have one row for queries such as `SELECT 1`.
return Ok(Self { table, num_rows: 1 });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// # Panics
/// Missing columns or column length mismatches can occur if the accessor doesn't
/// contain the necessary columns. In practice, this should not happen.
pub(crate) fn from_columns(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer for this to be a provided method on DataAccessor, since Table doesn't really need to know about the concept of DataAccessor to be useful.

Comment on lines 61 to 67
// TODO: Currently we have to have non-empty column references to have a non-empty table
// to evaluate `ProofExpr`s on. Once we restrict [`DataAccessor`] to [`TableExec`]
// and use input `DynProofPlan`s we should no longer need this.
let input_length = accessor.get_length(table_ref);
let bogus_vec = vec![true; input_length];
let bogus_col = Column::Boolean(alloc.alloc_slice_copy(&bogus_vec));
Table::<'a, S>::try_from_iter(core::iter::once(("bogus".parse().unwrap(), bogus_col)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the proper behavior to create a Table with no columns, but the proper length?

@iajoiner iajoiner merged commit cecf821 into main Nov 14, 2024
11 checks passed
Copy link

🎉 This PR is included in version 0.45.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants