Layouts now store a reference to array_data instead of a value #51

JohanMabille · 2024-03-27T20:34:01Z

Fixes #50

include/sparrow/fixed_size_layout.hpp

Klaim · 2024-03-27T20:47:46Z

include/sparrow/fixed_size_layout.hpp

@@ -121,7 +121,7 @@ namespace sparrow
        // using iterator = layout_iterator<self_type, false>;
        // using const_iterator = layout_iterator<self_type, true>;

-        explicit fixed_size_layout(array_data p);
+        explicit fixed_size_layout(array_data& data);


In general, can layouts be moved or copied? My assumption is yes but if so we need to add ways for these operations to be safe, in particular in the context of moving array_data objects around (if that would happen). Because now that it's a pointer being stored, moved-from layouts might lead to UB?

Ah good catch, the overall structure would be something like:

tempate < .... > class typed_array { private: array_data m_data; layout_type m_layout; ];

With m_layout referencing m_data, and everything being movable / copiable ...

One solution would be to move m_data into a std::unique_ptr to guarantee in-memory stability.
Another solution is to have the layout classes provide a void reset(array_data&) method to update its pointer after the array_data has been moved.

A third option is to delete copy and move semantics of layouts and reinstantiate them each time we copy / move a typed_array. This is actually my favorite option.

So that leads to questions like:

if we put a unique_ptr in typed_array , is the cost of allocation a problem? (assuming both members would be contained in the allocated object, like a pimpl)

is it ok to implement typed_array's copy operations manually? what happens to m_layout if it's not copyable but we need to copy the typed_array?

I feel like keeping layouts copy and move enabled might be the simplest solution, not sure...

The copy / move constructor of typed_array would be something like:

typed_array(const typed_array& rhs) : m_data(rhs.m_data) , m_layout(m_data) { } typed_array(typed_array&& rhs) : m_data(std::move(rhs.m_data)) , m_layout(m_data) { }

The issue is for assignment operators (copy and move). We probably need an additional "reset_array_data" method to be able to implement these operators.

The issue with keeping the copy and move semantic is that it will be sliently wrong the layout ogf a copied array will still refer to the array_data of the initial array.

if we put a unique_ptr in typed_array , is the cost of allocation a problem? (assuming both members would be contained in the allocated object, like a pimpl)

I don't really know; I would say that as much as we can avoid dynamic allocation, we should do it. Especially if the copy / move semantic of typed_array is "easy enough" to implement without the unique_ptr (even if I admit that "easy enough" is totally suggestive)

I would say that as much as we can avoid dynamic allocation, we should do it.

I agree.

The issue is for assignment operators (copy and move). We probably need an additional "reset_array_data" method to be able to implement these operators.

In those cases, only the m_data should be moved and copied, as m_layout always refers to the local m_data. No need to change m_layout... Or am I missing something? Does m_layout have an actual state? Depends on the layout type?

In those cases, only the m_data should be moved and copied, as m_layout always refers to the local m_data. No need to change m_layout... Or am I missing something?

Oh indeed, the layout will still point to the same object, so no need to do anything!

Does m_layout have an actual state? Depends on the layout type?

For me a layout class is just a "way" to read the data, it should be stateless.

Klaim · 2024-03-27T20:49:13Z

include/sparrow/fixed_size_layout.hpp

    {
        // We only require the presence of the bitmap and the first buffer.
-        assert(m_data.buffers.size() > 0);
-        assert(m_data.length == m_data.bitmap.size());
+        assert(p_data->buffers.size() > 0);


We might need a free function checking if an array_data seems well-formed, and use that in asserts too.

Whether an array_data is well-formed is actually specific to the layout (even if some layouts share some invariants).

In that case it might be necessary to make a concept for layouts that requests, among other thing, that these layout types provide each a way to check if the provided array_data matches the expected structure.
Then, because of the nature of the construction of the layout, the check must be done whatever the build context and then we must throw an exception (or abort, that depends on what we decide to support as error handling) or keep assertions and document that providing an array_data to constructor layout that doesnt match the expected structure is undefined (and maybe provide a way to activate the check).
Lots of tradeoff choices here.

I think we should do this in a dedicated PR. I'll open an issue to track it.

include/sparrow/fixed_size_layout.hpp

Klaim

LGTM, dont forget to note the things to do in future prs

Klaim reviewed Mar 27, 2024

View reviewed changes

Alex-PLACET reviewed Mar 28, 2024

View reviewed changes

include/sparrow/fixed_size_layout.hpp Outdated Show resolved Hide resolved

JohanMabille changed the title ~~Layouts now store a pointer to array_data instead of a value~~ Layouts now store a reference to array_data instead of a value Mar 28, 2024

Alex-PLACET approved these changes Mar 28, 2024

View reviewed changes

jjerphan approved these changes Apr 2, 2024

View reviewed changes

JohanMabille mentioned this pull request Apr 3, 2024

Add dictionary encoded layout #56

Merged

JohanMabille added 4 commits April 6, 2024 08:19

Layouts now store a pointer to array_data instead of a value

f1d9df5

Delete copy and move semantics in layout classes

387fc4f

Store reference wrapper instead of pointer

64912a4

Rebased and adpated dictionary_encoded layout

8d7c2b2

JohanMabille force-pushed the array_data_ref branch from 4615b78 to 8d7c2b2 Compare April 8, 2024 04:42

Removed copy and move semantics of dictionary_encoded_layout

398c200

Klaim approved these changes Apr 8, 2024

View reviewed changes

JohanMabille merged commit 9887596 into man-group:main Apr 8, 2024
20 checks passed

JohanMabille deleted the array_data_ref branch April 8, 2024 12:59

JohanMabille mentioned this pull request Apr 9, 2024

Implement typed_array #65

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Layouts now store a reference to array_data instead of a value #51

Layouts now store a reference to array_data instead of a value #51

JohanMabille commented Mar 27, 2024

Klaim Mar 27, 2024

JohanMabille Mar 27, 2024 •

edited

Loading

JohanMabille Mar 27, 2024 •

edited

Loading

JohanMabille Mar 28, 2024

Klaim Mar 28, 2024

JohanMabille Mar 28, 2024

Klaim Mar 28, 2024

JohanMabille Mar 28, 2024 •

edited

Loading

Klaim Mar 27, 2024

JohanMabille Mar 28, 2024

Klaim Mar 28, 2024

JohanMabille Apr 8, 2024

Klaim left a comment

Layouts now store a reference to array_data instead of a value #51

Layouts now store a reference to array_data instead of a value #51

Conversation

JohanMabille commented Mar 27, 2024

Choose a reason for hiding this comment

JohanMabille Mar 27, 2024 • edited Loading

Choose a reason for hiding this comment

JohanMabille Mar 27, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JohanMabille Mar 28, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Klaim left a comment

Choose a reason for hiding this comment

JohanMabille Mar 27, 2024 •

edited

Loading

JohanMabille Mar 27, 2024 •

edited

Loading

JohanMabille Mar 28, 2024 •

edited

Loading