Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: Initial Support for FixedSizeBinary #240

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

tareknaser
Copy link

@tareknaser tareknaser commented Oct 8, 2024

Rationale for this change

This PR adds support for FixedSizeBinary 

This PR is not meant to close #229 but serves as initial support. We should use this as a stepping stone and, after discussions arising from the code, decide how to build on this and address the issue in a follow-up PR.

What changes are included in this PR?

  • Added support for FixedSizeBinary in various parts of the codebase. The proposed representation is FixedSizeBinary(i32, Vec<u8>), with conversion implementations from (i32, Vec<u8>) and (i32, &[u8]).
  • Updated documentation to reflect the new changes (docs/SQLSyntaxSpecification.md)
  • Modified the implementation of impl<T: MontConfig<4>> From<&[u8]> for MontScalar<T> {:
    • Previously, this was only used for strings and hashed the value regardless.
    • Now, if the byte slice is empty, the result is the zero scalar. If the byte slice has length 31 or less, the bytes are directly converted to a scalar. If the byte slice has length 32, the bytes are hashed using blake3 and the result is converted to a scalar.
    • String handling was updated to keep it hashed all the time.

Are these changes tested?

Yes, new tests were added.
However, there are some "FIXME" comments in places where I wasn't sure if the implementation is correct. These are open for discussion and review.

Additional Notes

  • I have not added literal support yet.
  • I have attempted to add support in every required place in the code, but please let me know if I missed any areas!

@JayWhite2357
Copy link
Contributor

This PR is not meant to close #229 but serves as initial support. We should use this as a stepping stone and, after discussions arising from the code, decide how to build on this and address the issue in a follow-up PR.

Thanks for the PR! The CI is not passing (which I think you expected).
If there is anything you want our team to give you feedback on, it would be good if you can highlight that with comments since we won't have time for thorough feedback until a PR is ready to be merged/passing CI.

@tareknaser
Copy link
Author

Thanks, @JayWhite2357

Could we rerun the CI?
cargo test --all-features now passes locally

If there is anything you want our team to give you feedback on,

Could you start by taking a look at DataType, Column, OwnedColumn, and the others? I made some decisions where I wasn’t sure of the best approach, so your feedback would be really helpful. I also added some ‘FIXME’ comments.

By the way, I’d love to connect on Discord. I joined the server but not sure how to reach you there.

Copy link
Contributor

@JayWhite2357 JayWhite2357 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! This is heading in the right direction.
I don't have an amazing solution to the concerns here, but hopefully my feedback can help with direction.

@@ -266,6 +303,9 @@ pub enum ColumnType {
/// Mapped to [`Curve25519Scalar`](crate::base::scalar::Curve25519Scalar)
#[serde(alias = "SCALAR", alias = "scalar")]
Scalar,
/// Mapped to fixed size binary
#[serde(alias = "FIXEDSIZEBINARY", alias = "fixedsizebinary")]
FixedSizeBinary(i32),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that would be nice would be having some way of ensuring that the number of bytes must be in the range [0, 32]. (Or at least positive.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first, I wanted to use FixedSizeBinary(usize) since it made more sense to me, but Arrow actually uses i32 for that, so I decided to stick with the convention.

having some way of ensuring that the number of bytes must be in the range [0, 32]

where do you think such a check should go?

crates/proof-of-sql/src/base/database/owned_column.rs Outdated Show resolved Hide resolved
Comment on lines 65 to 67
// FIXME: Is this interpretation correct? or should we chunk the bytes
// into `byte_width`-sized chunks and convert each chunk into a scalar?
compute_dory_commitment_impl(column, offset, setup)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking it into chunks and converting each chunk into a scalar would work.
However, we should avoid that because it is an expensive conversion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe something like this may work:

Suggested change
// FIXME: Is this interpretation correct? or should we chunk the bytes
// into `byte_width`-sized chunks and convert each chunk into a scalar?
compute_dory_commitment_impl(column, offset, setup)
// FIXME: Is this interpretation correct? or should we chunk the bytes
// into `byte_width`-sized chunks and convert each chunk into a scalar?
compute_dory_commitment_impl(&column.iter().chunks_exact(byte_width).collect_vec(), offset, setup)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be sure to add some tests here ensuring that this is behaving correctly. Even though CI is currently passing, this code is currently incorrect.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I’ve updated that based on your suggestion and also added a test we_can_compute_a_dory_commitment_with_fixed_size_binary_values that is currently failing with the following error:

panicked at crates/proof-of-sql/src/proof_primitive/dory/pack_scalars.rs:179:14:

source slice length (1) does not match destination slice length (4)

tracing it back, I found that the pack_bit function is the one panicking. It seems to be related to the implementation of OffsetToBytes.

column.iter().enumerate().for_each(|(i, value)| {
	let index = i + offset;
	let row_offset = (index % num_columns) * bit_table_sum_in_bytes;
	let col_offset = current_byte_size * (index / num_columns);
	let offset_index = row_offset + col_offset + byte_offset;

	packed_scalars[offset_index..offset_index + current_byte_size]
		.copy_from_slice(&value.offset_to_bytes()[..]);
});

The issue is that it’s currently iterating over values and calling offset_to_bytes, treating each element in the column as its own value.

This is not ideal for FixedSizeBinary because multiple consecutive values form a single value (byte_size). I’m not quite sure how to handle this properly. Could you point me in the right direction?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any updates on this? I’ve rebased on top of main @JayWhite2357

Copy link
Contributor

@JayWhite2357 JayWhite2357 Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still marked as draft. I just approved the workflows. Once it's ready, mark it as ready for review.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marked it as ready for review. The test we_can_compute_a_dory_commitment_with_fixed_size_binary_values is still failing the CI, though. Let me know what you think

@JayWhite2357 JayWhite2357 self-requested a review November 6, 2024 18:05
@tareknaser tareknaser marked this pull request as ready for review November 8, 2024 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants