fix: Fix `read_csv` to respect the order specified by the `columns` argument #13240

romanovacca · 2023-12-24T15:25:50Z

In `fn parse_csv()` we call `get_projection()`, which starts from the projection that contains the correct columns in the correct order. A comment in that function says:
"we also need to sort the projection to have predictable output". The sort is done with `sort_unstable()`, which changes the order of the projection and therefore of the columns.

In my fix, we now take the original projection and use it to fetch the matching column names. Once all the chunks of data have been processed, I reorder the DataFrame (if `column_names` was specified) into the order that was originally passed.
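To illustrate the idea, here is a minimal standalone sketch rather than the code in this PR. It does not use Polars; `sorted_projection` and `reorder_columns` are hypothetical helpers, and a column is modelled as a plain `(name, values)` pair. The first function shows how sorting the projection drops the requested order, and the second restores it after reading.

```rust
/// Hypothetical stand-in for the projection step: look up each requested
/// column in the schema, then sort the indices (this is where the order
/// given by the caller is lost).
fn sorted_projection(schema: &[&str], requested: &[&str]) -> Option<Vec<usize>> {
    let mut prj = requested
        .iter()
        .map(|c| schema.iter().position(|s| s == c))
        .collect::<Option<Vec<usize>>>()?;
    prj.sort_unstable();
    Some(prj)
}

/// Hypothetical stand-in for the reordering step: move the columns that
/// were read into the order the caller originally asked for.
fn reorder_columns<T>(mut columns: Vec<(String, T)>, requested: &[&str]) -> Vec<(String, T)> {
    let mut out = Vec::with_capacity(requested.len());
    for name in requested {
        // Find the column with this name among the ones not yet placed.
        if let Some(pos) = columns.iter().position(|(n, _)| n.as_str() == *name) {
            out.push(columns.remove(pos));
        }
    }
    out
}
```

With a schema of `a, b, c` and a request for `c, a`, the sorted projection is `[0, 2]`, so the columns come back as `a, c`; passing them through `reorder_columns` yields `c, a` again.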

I would expect this reordering to exist somewhere already, but I couldn't find it.

stinodego · 2024-02-14T22:25:04Z

I did a rebase. However, I don't think this is the right fix. The offending code seems to be here:

polars/crates/polars-io/src/csv/read_impl/mod.rs

Lines 227 to 240 in 11c6f9b

    
    if let Some(cols) = columns {
        let mut prj = Vec::with_capacity(cols.len());
        for col in cols {
            let i = schema.try_index_of(&col)?;
            prj.push(i);
        }
        // update null values with projection
        if let Some(nv) = null_values.as_mut() {
            nv.apply_projection(&prj);
        }
        projection = Some(prj);
    }

After this, the `columns` information is lost. So we should make sure the projection is correct at that point, rather than trying to fix it later.
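One way to read this suggestion is sketched below. It is only an assumption about what "correct at that point" could look like, not the fix that was eventually merged. The function `projection_with_order` is hypothetical and works on plain slices instead of the Polars schema type: at the moment the names are resolved to indices, it also records where each requested column will sit in the sorted output, so the order never has to be reconstructed from names later.

```rust
/// Hypothetical helper: resolve `requested` against `schema` and return
/// (sorted indices for the reader, for each requested column its position
/// in that sorted list). Returns `None` if a name is not in the schema.
fn projection_with_order(
    schema: &[&str],
    requested: &[&str],
) -> Option<(Vec<usize>, Vec<usize>)> {
    // Indices in the order the caller asked for.
    let prj = requested
        .iter()
        .map(|c| schema.iter().position(|s| s == c))
        .collect::<Option<Vec<usize>>>()?;

    // Sorted copy, as a reader that wants ascending indices would use.
    let mut sorted = prj.clone();
    sorted.sort_unstable();

    // For each requested column, where it ends up in the sorted projection.
    let order = prj
        .iter()
        .map(|i| sorted.binary_search(i).ok())
        .collect::<Option<Vec<usize>>>()?;

    Some((sorted, order))
}
```

For a schema of `a, b, c` and a request for `c, a`, this gives the sorted projection `[0, 2]` and the order `[1, 0]`, meaning the first output column is the second one read.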

Since this PR is pretty old, has been in draft for a while, and doesn't seem (close to) ready, I'll close it for now. Feel free to open a fresh PR and continue the work!

romanovacca requested review from ritchie46, stinodego, alexander-beedie, MarcoGorelli and orlp as code owners December 24, 2023 15:25

romanovacca changed the title ~~#13066 order column~~ fix(rust): fix specified column order for read_csv Dec 24, 2023

github-actions bot added fix Bug fix rust Related to Rust Polars labels Dec 24, 2023

romanovacca marked this pull request as draft December 24, 2023 15:36

stinodego changed the title ~~fix(rust): fix specified column order for read_csv~~ fix: Fix read_csv to respect the order specified by the columns argument Feb 14, 2024

github-actions bot added the python Related to Python Polars label Feb 14, 2024

romanovacca added 2 commits February 14, 2024 23:17

add reordering if column is specified

c5169d1

added test

9cfeae3

stinodego force-pushed the #13066_order_column branch from e86ea56 to 9cfeae3 Compare February 14, 2024 22:17

stinodego closed this Feb 14, 2024
