Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add ByteArray to storage #229

Closed
wants to merge 5 commits into from

Conversation

glihm
Copy link

@glihm glihm commented Dec 11, 2023

Pull Request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no API changes)
  • Build-related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

StarkWare is working on ByteArray support. Currently there is no support for ByteArray (aka string) in storage of contracts.

What is the new behavior?

This PR introduces a naive implementation for ByteArray support in storage.

  • simple value in storage.
  • as a value of a LegacyMap (induces by the first).
  • as a key of a LegacyMap by deriving LegacyHash.

Does this introduce a breaking change?

  • Yes
  • No

Other information

About the Store trait, I've a doubt on the size which returns 1 currently. In the current design, we only store one felt at the base address. Other addresses are derived by a computation of keys.

I bumped to 2.4.0 to have it compiling, all tests are in green. But if it's preferable to have a PR dedicated to the bump, I can rebase later.

@glihm glihm requested a review from 0xLucqs as a code owner December 11, 2023 03:31
src/storage/src/byte_array.cairo Show resolved Hide resolved
src/storage/src/byte_array.cairo Outdated Show resolved Hide resolved
src/storage/src/tests/byte_array_test.cairo Outdated Show resolved Hide resolved
src/storage/src/tests/byte_array_test.cairo Outdated Show resolved Hide resolved
@glihm glihm requested a review from 0xLucqs December 11, 2023 15:58
@0xLucqs
Copy link
Collaborator

0xLucqs commented Dec 11, 2023

Are we storing 1 array / "cell" or are we storing 1 byte / "cell" ?

@glihm
Copy link
Author

glihm commented Dec 11, 2023

Are we storing 1 array / "cell" or are we storing 1 byte / "cell" ?

In this proposed PR, the idea is to not rely on segments (perhaps we should?), and use one slot for each element of the ByteArray struct and required data.

ByteArray {
    data: array![1, 2],
    pending_word: 0,
    pending_word_len: 0,
}

This will use 5 slots:

slot 0: base address (usually the name of the storage variable) => data.len()
slot 1: pending_word
slot 2: pending_word_len
slot 3: data[0]
slot 4: data[1]

And the slots addresses are all derived from the base address when write/read is called, and derived with poseidon from a known set of keys.

Is this something that can work, or we should rework this approach?

@0xLucqs
Copy link
Collaborator

0xLucqs commented Dec 12, 2023

I'm just wondering the difference between that and the List<T> already done, because here it's just a specific usage of the list where T = Byte31 ?

@glihm
Copy link
Author

glihm commented Dec 12, 2023

I'm just wondering the difference between that and the List<T> already done, because here it's just a specific usage of the list where T = Byte31 ?

In a sense yes, agree on that. But on the other hand, the management of the pending word is then totally manual.
With the current solution, the whole data structure is handled by the trait.

@0xLucqs
Copy link
Collaborator

0xLucqs commented Dec 12, 2023

Does the pending word make sense in this context ?

@glihm
Copy link
Author

glihm commented Dec 12, 2023

Does the pending word make sense in this context ?

To be sure I get you, you are talking about using the List<T> as the underlying type for storing the info in the StoreByteArray trait? Or not using this trait at all and let the user deal with List<T>?

In any case, we do need to keep the exact state of the pending word, as it can be any content between 0 and 30 bytes long. And Bytes31 does not retain the length, or does it? We need to store it explicitly if not.

@0xLucqs
Copy link
Collaborator

0xLucqs commented Dec 12, 2023

2nd option!
To get the length of the last word we can add a helper function to get the next power of 2. WDYT ?

@glihm
Copy link
Author

glihm commented Dec 12, 2023

2nd option! To get the length of the last word we can add a helper function to get the next power of 2. WDYT ?

I'm really not sure about the DevX in this situation. Users will want to use "strings" (aka ByteArray) in legacy maps too I guess, which means implementing the LegacyHash trait on List<bytes31>?

Let me know if there is something I'm missing, but it would be like this right?

    #[storage]
    struct Storage {
        my_string: List<bytes31>,
        strings: List<List<bytes31>>,
    }

Where we also need to reconstruct / deconstruct the ByteArray to put it in the list with the helper function. So any read/write require a call to an helper function to format the data.

Where we can simply do the following:

    #[storage]
    struct Storage {
        my_string: ByteArray,
        strings: List<ByteArray>,
    }

If it's a matter of efficiency by using the storage segment, I can adapt.
Otherwise, if you do prefer to keep the bytes31, feel free to close the PR as it's not relevant anymore as list is already implemented. 👍

@0xLucqs
Copy link
Collaborator

0xLucqs commented Dec 14, 2023

I don't have a strong opinion, just want to understand the trade offs here

@0xLucqs
Copy link
Collaborator

0xLucqs commented Dec 14, 2023

I see where you're going, ok this pr makes sense (misunderstood it)

Copy link
Collaborator

@0xLucqs 0xLucqs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to make sure that you're not overwriting non zero cells. also i still see some unwrap syscalls in functions that return a result

@glihm
Copy link
Author

glihm commented Dec 14, 2023

you need to make sure that you're not overwriting non zero cells. also i still see some unwrap syscalls in functions that return a result

Sorry I shouldn't have resolved the first change request you've made here where I added a comment to mention that ? operator is not supported in loop (at least in 2.4.0). 👍

@glihm
Copy link
Author

glihm commented Dec 14, 2023

you need to make sure that you're not overwriting non zero cells. also i still see some unwrap syscalls in functions that return a result

And about not re-writing 0 cell, I'm not sure why we need to do that. We rely on poseidon and huge slot space, no?
Because when for instance a user want to update it's value in the storage with a write, we inevitably overwrite a non-0 slot.

@0xLucqs
Copy link
Collaborator

0xLucqs commented Dec 14, 2023

instance a user want to update it's value in the storage with a write, we inevitably overwrite a non-0 slot.

Yes ofc i just don't want that the byte array in cell n-1 overwrites the byte array in cell n, i didn't check what storage addresses looked like when computed that way (if they're far enough from themselves). If it's far enough i'll be too expensive to store something that would "spill" over the next one so it should be fine

@glihm
Copy link
Author

glihm commented Dec 14, 2023

instance a user want to update it's value in the storage with a write, we inevitably overwrite a non-0 slot.

Yes ofc i just don't want that the byte array in cell n-1 overwrites the byte array in cell n, i didn't check what storage addresses looked like when computed that way (if they're far enough from themselves). If it's far enough i'll be too expensive to store something that would "spill" over the next one so it should be fine

Totally agree, that's why I've not use the segments, because in my understanding segments are a region of 256 consecutive slots.
So in the proposed PR, I use poseidon for each field of ByteArray, and then poseidon again for each element in data to ensure as much entropy as possible by selecting a fresh new slot (base address computed with poseidon each time).

@0xLucqs
Copy link
Collaborator

0xLucqs commented Dec 14, 2023

Now i think that the size isn't correct anymore and should be the length of the ByteArray ?

@glihm
Copy link
Author

glihm commented Dec 14, 2023

Now i think that the size isn't correct anymore and should be the length of the ByteArray ?

Here I have my doubt yes. Because the ByteArray has 3 fields, but data is an array. So I in the sense I've understood this, the size is what we expect in the segment (because it is bounded by u8 type). And in the current implementation, we always have only 1 felt of the ByteArray (data.len()) that is stored in the segment given by the base address. That's why I've put 1.

And the length of data of the ByteArray can't be known as the size method does not take any instance as input. So yes, it's not clear to me what should be this value, as it's only constant size that should be returned by this function.

@glihm
Copy link
Author

glihm commented Dec 18, 2023

@LucasLvy did you have more inputs about the conformity in the storage for this solution?

@0xLucqs
Copy link
Collaborator

0xLucqs commented Dec 19, 2023

lol sorry for that they'll add it in the core lib so i guess we don't need to answer this question

@0xLucqs
Copy link
Collaborator

0xLucqs commented Dec 19, 2023

@glihm
Copy link
Author

glihm commented Dec 19, 2023

starkware-libs/cairo#4596

That even better! Thank you very much and sorry I didn't see it lately.
Will close. :)

@glihm glihm closed this Dec 19, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Dec 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants