refactor!: replace `DataAccessor` with `Table` in `ProofExpr` && remove `input_length` from `ProofPlan::result_evaluate` #366
 let lhs = lhs_column.as_boolean().expect("lhs is not boolean");
 let rhs = rhs_column.as_boolean().expect("rhs is not boolean");
-Column::Boolean(alloc.alloc_slice_fill_with(table_length, |i| lhs[i] && rhs[i]))
+Column::Boolean(
+    alloc.alloc_slice_fill_with(table.num_rows().unwrap_or(0), |i| lhs[i] && rhs[i]),
I'm not convinced that `0` is the correct default. If `table` has no columns, it feels like `lhs_column` and `rhs_column` will have to be literals.
Eventually, we should get a proper solution with `ColumnarValue`, but for now, I think it makes more sense to pull the length from the `lhs_column` and `rhs_column` (and assert that the lengths are equal). This is also what is done in `prover_evaluate`.
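The suggestion above can be sketched roughly as follows. This is only an illustration: plain `&[bool]` slices stand in for the crate's `Column::Boolean`, the result is a `Vec` rather than a bump allocation, and the helper name `and_columns` is hypothetical.

```rust
/// Element-wise AND of two boolean columns.
/// The output length comes from the operands themselves, not from the table,
/// so no default such as `unwrap_or(0)` is needed.
/// (`and_columns` is a hypothetical name used for this sketch.)
pub fn and_columns(lhs: &[bool], rhs: &[bool]) -> Vec<bool> {
    // Both operands were evaluated on the same table, so they should agree.
    assert_eq!(lhs.len(), rhs.len(), "lhs and rhs must have the same length");
    lhs.iter().zip(rhs).map(|(&l, &r)| l && r).collect()
}
```

With this shape, two literal operands evaluated on a column-less table still yield a result of the length they were expanded to.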
///
/// Will panic if the column is not found. Shouldn't happen in practice since
/// code in `sql/parse` should have already checked that the column exists.
pub fn get_column<'a, S: Scalar>(&self, table: &Table<'a, S>) -> Column<'a, S> {

 ) -> Column<'a, S> {
-    Column::from_literal_with_length(&self.value, table_length, alloc)
+    Column::from_literal_with_length(&self.value, table.num_rows().unwrap_or(0), alloc)
This is the only spot where a default makes sense to me. If there are no columns, it's not obvious to me what the proper behavior here is. Probably `1` instead of `0`.
Also, please add a test to cover this case.
alloc: &'a Bump,
accessor: &'a dyn DataAccessor<S>,
) -> Vec<Column<'a, S>> {
    let column_refs = self.get_column_references();
    let used_table = Table::<'a, S>::try_from_iter(column_refs.iter().map(|column_ref| {
NIT:
-let used_table = Table::<'a, S>::try_from_iter(column_refs.iter().map(|column_ref| {
+let used_table = Table::<'a, S>::try_from_iter(column_refs.into_iter().map(|column_ref| {
let column_refs = self.get_column_references();
let used_table = Table::<'a, S>::try_from_iter(column_refs.iter().map(|column_ref| {
    let column = accessor.get_column(*column_ref);
    (column_ref.column_id(), column)
}))
.expect("Failed to create table from column references");
Might be nice to put this in a function since it is repeated multiple times.
//TODO: Currently we have to have non-empty column references to have a non-empty table
// to evaluate `ProofExpr`s on. Once we restrict [`DataAccessor`] to [`TableExec`]
// and use input `DynProofPlan`s we should no longer need this.
Might be wise to add `row_count` as a field of `Table`. This would fix this issue as well as the comments I have above. See RecordBatch.
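One way to picture this suggestion is the sketch below. It is a simplified illustration, not the crate's code: `SketchTable`, its `Vec<i64>` columns, and `literal_column` are hypothetical stand-ins for `Table`, `Column`, and literal evaluation.

```rust
use std::collections::BTreeMap;

/// Simplified, hypothetical stand-in for the crate's `Table`.
/// Columns are plain `Vec<i64>`; the real type is generic over a scalar.
#[derive(Debug)]
pub struct SketchTable {
    columns: BTreeMap<String, Vec<i64>>,
    /// Stored explicitly, so it is well defined even when `columns` is empty.
    row_count: usize,
}

impl SketchTable {
    /// Builds a table, checking every column against `row_count`.
    /// Returns `None` if any column has a different length.
    pub fn new(columns: BTreeMap<String, Vec<i64>>, row_count: usize) -> Option<Self> {
        columns
            .values()
            .all(|col| col.len() == row_count)
            .then(|| Self { columns, row_count })
    }

    /// Always available: no `Option`, so call sites need no `unwrap_or` default.
    pub fn num_rows(&self) -> usize {
        self.row_count
    }
}

/// Expands a literal to the table's length, as a literal expression would.
pub fn literal_column(value: i64, table: &SketchTable) -> Vec<i64> {
    vec![value; table.num_rows()]
}
```

Because the length is a field, a table with zero columns and three rows is representable, and a literal evaluated on it has three entries.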
 }
 impl<'a, S: Scalar> Table<'a, S> {
     /// Creates a new [`Table`].
     pub fn try_new(table: IndexMap<Identifier, Column<'a, S>>) -> Result<Self, TableError> {
         if table.is_empty() {
-            return Ok(Self { table });
+            // `EmptyExec` should have one row for queries such as `SELECT 1`.
+            return Ok(Self { table, num_rows: 1 });
This default behavior doesn't really make sense.
This is what arrow does:
https://github.com/apache/arrow-rs/blob/3ee5048c8ea3aa531d111afe33d0a3551eabcd84/arrow-array/src/record_batch.rs#L86
https://github.com/apache/arrow-rs/blob/3ee5048c8ea3aa531d111afe33d0a3551eabcd84/arrow-array/src/record_batch.rs#L501
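As a rough sketch of the convention being pointed to (my reading: with no columns the row count must be supplied explicitly rather than defaulted), the row-count resolution could look like the following. The function and error names are hypothetical, and this is neither arrow's code nor the crate's `TableError`.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    /// Columns disagree with each other or with the explicit row count.
    LengthMismatch,
    /// No columns and no explicit row count: the length cannot be inferred.
    UnknownRowCount,
}

/// Resolves the row count of a table from its column lengths and an
/// optional explicit count, without falling back to a silent default.
pub fn resolve_row_count(
    column_lens: &[usize],
    explicit: Option<usize>,
) -> Result<usize, SketchError> {
    let expected = match (explicit, column_lens.first()) {
        // An explicit count always wins and is checked against the columns.
        (Some(n), _) => n,
        // Otherwise infer the count from the first column.
        (None, Some(&n)) => n,
        // Nothing to infer from: refuse instead of guessing 0 or 1.
        (None, None) => return Err(SketchError::UnknownRowCount),
    };
    if column_lens.iter().all(|&len| len == expected) {
        Ok(expected)
    } else {
        Err(SketchError::LengthMismatch)
    }
}
```

Under this scheme, a caller that needs one row for something like `SELECT 1` passes `Some(1)` explicitly instead of relying on the constructor's default.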
/// # Panics
/// Missing columns or column length mismatches can occur if the accessor doesn't
/// contain the necessary columns. In practice, this should not happen.
pub(crate) fn from_columns(
I'd prefer for this to be a provided method on `DataAccessor`, since `Table` doesn't really need to know about the concept of `DataAccessor` to be useful.
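A provided (default) trait method, as requested here, might look like the sketch below. `SketchAccessor`, `get_columns`, and `OneColumn` are hypothetical, and columns are simplified to `Vec<i64>`; the real `DataAccessor` trait has a different signature.

```rust
/// Hypothetical, simplified accessor trait for this sketch.
pub trait SketchAccessor {
    /// Required method: look up one column by name.
    fn get_column(&self, name: &str) -> Vec<i64>;

    /// Provided method: gather several named columns.
    /// Implementors get this for free, and the table type never has to
    /// depend on the accessor trait.
    fn get_columns(&self, names: &[&str]) -> Vec<(String, Vec<i64>)> {
        names
            .iter()
            .map(|&name| (name.to_string(), self.get_column(name)))
            .collect()
    }
}

/// Toy accessor: every column has two rows holding the name's length.
pub struct OneColumn;

impl SketchAccessor for OneColumn {
    fn get_column(&self, name: &str) -> Vec<i64> {
        vec![name.len() as i64; 2]
    }
}
```

The dependency then points from the accessor to the table's inputs, so a table can be built from any iterator of named columns.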
// TODO: Currently we have to have non-empty column references to have a non-empty table
// to evaluate `ProofExpr`s on. Once we restrict [`DataAccessor`] to [`TableExec`]
// and use input `DynProofPlan`s we should no longer need this.
let input_length = accessor.get_length(table_ref);
let bogus_vec = vec![true; input_length];
let bogus_col = Column::Boolean(alloc.alloc_slice_copy(&bogus_vec));
Table::<'a, S>::try_from_iter(core::iter::once(("bogus".parse().unwrap(), bogus_col)))
Isn't the proper behavior to create a `Table` with no columns, but the proper length?
🎉 This PR is included in version 0.45.0 🎉
The release is available on GitHub release
Your semantic-release bot 📦🚀
Please be sure to look over the pull request guidelines here: https://github.com/spaceandtimelabs/sxt-proof-of-sql/blob/main/CONTRIBUTING.md#submit-pr.

Please go through the following checklist
- `!` is used if and only if at least one breaking change has been introduced.
- `source scripts/run_ci_checks.sh`.

Rationale for this change
We would like to evaluate `ProofExpr` on `Table`, similar to how `datafusion` evaluates `PhysicalExpr` on `RecordBatch`. It is better to use a `Table` than a `DataAccessor`, since the latter has access to more than one table, while a `ProofExpr` shouldn't be able to be evaluated on multiple tables (CTEs etc. count as tables). Moreover, it is absurd to have a single `input_length` in a `ProofPlan`, since it can have multiple sources.

What changes are included in this PR?
- remove `input_length` from `ProofPlan::result_evaluate`
- remove `table_length` from `ProofExpr::result_evaluate`
- replace `accessor` in `ProofPlan::result_evaluate` with `table: &Table<'a, S>`
- replace `accessor` in `ProofPlan::prover_evaluate` with `table: &Table<'a, S>`

Are these changes tested?
Yes.