Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

add computer-reorder feature to allow in-place sorting #1483

Open
bguo068 opened this issue May 13, 2023 · 0 comments
Open

add computer-reorder feature to allow in-place sorting #1483

bguo068 opened this issue May 13, 2023 · 0 comments

Comments

@bguo068
Copy link

bguo068 commented May 13, 2023

If I understand it correctly, multiple-column sorting in arrow2 is done in two steps:

While the method works quite well, it sometimes causes unnecessary memory allocation in step 2. For instance, when references to the columns are unique (not shared), we can do in-place sorting without allocation.

To avoid allocation, maybe we can have a reorder function (within a feature like compute-reorder) to do it.
The idea has been implemented for a single slice here (source:

fn sort_by_indices<T>(data: &mut [T], mut indices: Vec<usize>) {
    for idx in 0..data.len() {
        if indices[idx] != idx {
            let mut current_idx = idx;
            loop {
                let target_idx = indices[current_idx];
                indices[current_idx] = current_idx;
                if indices[target_idx] == target_idx {
                    break;
                }
                data.swap(current_idx, target_idx);
                current_idx = target_idx;
            }
        }
    }
}

This idea can be easily expanded to multiple columns or chunks.

Hope this makes sense. If not please comment on how in-place sorting can be done with the current version of arrow2. Thanks!!

@bguo068 bguo068 changed the title add computer-reorder feature to allow inplace sortting add computer-reorder feature to allow in-place sorting May 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant