pola-rs · braaannigan · Jun 6, 2024
@@ -253,7 +253,10 @@ The snippet is delimited by `--8<-- [start:<snippet_name>]` and `--8<-- [end:<sn
 
 #### Linting
 
-Before committing, install `dprint` (see above) and run `dprint fmt` from the `docs` directory to lint the markdown files.
+Before committing:
+
+- install `dprint` (see above) and run `dprint fmt` from the `docs` directory to lint the markdown files
+- run `cargo fmt` for the `docs` directory to format the Rust code snippets
 
 ### API reference
 

@@ -26,6 +26,30 @@
 print(df_vertical_concat)
 # --8<-- [end:vertical]
 
+# --8<-- [start:vertical_relaxed]
+df_v1 = pl.DataFrame(
+    {
+        "a": [1.0],
+        "b": [3],
+    },
+)
+df_v2 = pl.DataFrame(
+    {
+        "a": [2],
+        "b": [4],
+    },
+)
+df_vertical_relaxed_concat = pl.concat(
+    [
+        df_v1,
+        df_v2,
+    ],
+    how="vertical_relaxed",
+)
+print(df_vertical_relaxed_concat)
+# --8<-- [end:vertical_relaxed]
+
+
 # --8<-- [start:horizontal]
 df_h1 = pl.DataFrame(
     {
@@ -73,6 +97,21 @@
 print(df_horizontal_concat)
 # --8<-- [end:horizontal_different_lengths]
 
+# --8<-- [start:horizontal_align]
+df_h1 = pl.DataFrame({"a": ["a", "b", "d", "e", "e"], "b": [1, 2, 4, 5, 6]})
+df_h2 = pl.DataFrame({"a": ["a", "b", "c", "d", "e"], "d": ["w", "x", "y", "z", None]})
+df_align = pl.concat(
+    [
+        df_h1,
+        df_h2,
+    ],
+    how="align",
+)
+print(df_align)
+
+# --8<-- [end:horizontal_align]
+
+
 # --8<-- [start:cross]
 df_d1 = pl.DataFrame(
     {

@@ -20,6 +20,23 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
     println!("{}", &df_vertical_concat);
     // --8<-- [end:vertical]
 
+    // --8<-- [start:vertical_relaxed]
+    let df_v1 = df!(
+            "a"=> &[1.0],
+            "b"=> &[3],
+    )?;
+    let df_v2 = df!(
+            "a"=> &[2],
+            "b"=> &[4],
+    )?;
+    let df_vertical_relaxed_concat = concat(
+        [df_v1.clone().lazy(), df_v2.clone().lazy()],
+        UnionArgs { to_supertypes: true, ..Default::default() },
+    )?
+    .collect()?;
+    println!("{}", &df_vertical_relaxed_concat);
+    // --8<-- [end:vertical_relaxed]
+
     // --8<-- [start:horizontal]
     let df_h1 = df!(
             "l1"=> &[1, 2],
@@ -47,6 +64,11 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
     println!("{}", &df_horizontal_concat);
     // --8<-- [end:horizontal_different_lengths]
 
+    // --8<-- [start:horizontal_align]
+    println!("Not available in Rust");
+
+    // --8<-- [end:horizontal_align]
+
     // --8<-- [start:cross]
     let df_d1 = df!(
         "a"=> &[1],

@@ -17,7 +17,15 @@ In a vertical concatenation you combine all of the rows from a list of `DataFram
 --8<-- "python/user-guide/transformations/concatenation.py:vertical"
 ```
 
-Vertical concatenation fails when the dataframes do not have the same column names.
+Vertical concatenation fails when the dataframes do not have the same column names and dtypes.
+
+For certain differences in dtypes, Polars can do a relaxed vertical concatenation, where columns with the same name but different dtypes are cast to a common _supertype_ before concatenation. For example, if column `'a'` in the first `DataFrame` is `Float32` but column `'a'` in the second `DataFrame` is `Int64`, then both columns are cast to their supertype `Float64`. If the set of dtypes for a column does not have a supertype, the concatenation fails. The supertype mappings are defined internally in Polars.
+
+{{code_block('user-guide/transformations/concatenation','vertical_relaxed',['concat'])}}
+
+```python exec="on" result="text" session="user-guide/transformations/concatenation"
+--8<-- "python/user-guide/transformations/concatenation.py:vertical_relaxed"
+```
 
 ## Horizontal concatenation - getting wider
 
@@ -40,21 +48,29 @@ columns will be padded with `null` values at the end up to the maximum length.
 --8<-- "python/user-guide/transformations/concatenation.py:horizontal_different_lengths"
 ```
 
+An alternative horizontal concatenation strategy is `align`, where Polars determines the key columns that the frames have in common and aligns the rows on those keys.
+{{code_block('user-guide/transformations/concatenation','horizontal_align',['concat'])}}
+
+```python exec="on" result="text" session="user-guide/transformations/concatenation"
+--8<-- "python/user-guide/transformations/concatenation.py:horizontal_align"
+```
+
 ## Diagonal concatenation - getting longer, wider and `null`ier
 
-In a diagonal concatenation you combine all of the row and columns from a list of `DataFrames` into a single longer and/or wider `DataFrame`.
+In a diagonal concatenation you combine all of the rows and columns from a list of `DataFrames` into a single longer and/or wider `DataFrame`.
 
 {{code_block('user-guide/transformations/concatenation','cross',['concat'])}}
 
 ```python exec="on" result="text" session="user-guide/transformations/concatenation"
 --8<-- "python/user-guide/transformations/concatenation.py:cross"
 ```
 
-Diagonal concatenation generates nulls when the column names do not overlap.
+Diagonal concatenation generates nulls when the column names do not overlap, but fails if the dtypes do not match for columns with the same name. As with vertical concatenation, there is an alternative `diagonal_relaxed` method that tries to cast columns with the same name but different dtypes to a supertype.
 
 When the dataframe shapes do not match and we have an overlapping semantic key then [we can join the dataframes](joins.md) instead of concatenating them.
 
 ## Rechunking
 
-Before a concatenation we have two dataframes `df1` and `df2`. Each column in `df1` and `df2` is in one or more chunks in memory. By default, during concatenation the chunks in each column are copied to a single new chunk - this is known as **rechunking**. Rechunking is an expensive operation, but is often worth it because future operations will be faster.
-If you do not want Polars to rechunk the concatenated `DataFrame` you specify `rechunk = False` when doing the concatenation.
+Suppose we have a `list` of `DataFrames` that we want to concatenate. Each column in each `DataFrame` is stored in one or more chunks in memory. When we concatenate the `DataFrames`, the data for each column can be copied into a single chunk in memory - this is known as **rechunking**. Rechunking is expensive because it requires copying data from one location to another. However, it can make subsequent operations faster because the data for each column is in a single location in memory.
+
+By default, rechunking does not happen when we do a concatenation in eager mode. If we want Polars to rechunk the concatenated `DataFrame`, we specify `rechunk=True` when doing the concatenation. In lazy mode, the query optimizer assesses whether to rechunk based on the query plan.