Merge remote-tracking branch 'upstream/main' into unify-temporal
MarcoGorelli committed Nov 2, 2023
2 parents 6d6a997 + 240ad78 commit 2b18749
Showing 48 changed files with 319 additions and 264 deletions.
5 changes: 5 additions & 0 deletions .github/workflows/release-python.yml
@@ -96,6 +96,11 @@ jobs:
exclude:
- os: windows-32gb-ram
architecture: aarch64
# Temporarily disable linux aarch64 cross compilation
# TODO: Renable this when issue is fixed
# https://github.com/pola-rs/polars/issues/12180
- os: ubuntu-latest
architecture: aarch64
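
The exclusion above works like filtering a cross-product: GitHub Actions expands `strategy.matrix` to every os/architecture combination, then drops any combination matching an `exclude` entry. A minimal sketch of that resolution, with hypothetical runner names (only the two excluded pairs come from the diff):

```python
from itertools import product

# Hypothetical matrix values standing in for the workflow's full matrix.
oses = ["ubuntu-latest", "windows-32gb-ram", "macos-latest"]
architectures = ["x86-64", "aarch64"]

# Exclusions mirroring the diff: drop aarch64 on the two named runners.
excluded = {
    ("windows-32gb-ram", "aarch64"),
    ("ubuntu-latest", "aarch64"),
}

jobs = [
    (os, arch)
    for os, arch in product(oses, architectures)
    if (os, arch) not in excluded
]
```

With the three hypothetical runners, this leaves four jobs; re-enabling aarch64 cross compilation later is just a matter of removing the exclude entry.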

steps:
- uses: actions/checkout@v4
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -47,7 +47,7 @@ serde = "1.0.188"
serde_json = "1"
simd-json = { version = "0.13", features = ["known-key"] }
smartstring = "1"
sqlparser = "0.38"
sqlparser = "0.39"
strum_macros = "0.25"
thiserror = "1"
tokio = "1.26"
24 changes: 12 additions & 12 deletions README.md
@@ -126,7 +126,7 @@ shape: (5, 8)
└────────┴────────┘
>>> ## OPTION 2
>>> # Don't materialize the query, but return as LazyFrame
>>> # and continue in python
>>> # and continue in Python
>>> lf = context.execute(query)
>>> (lf.join(other_table)
... .group_by("foo")
@@ -158,7 +158,7 @@ Refer to the [Polars CLI repository](https://github.com/pola-rs/polars-cli) for
Polars is very fast. In fact, it is one of the best performing solutions available.
See the results in [DuckDB's db-benchmark](https://duckdblabs.github.io/db-benchmark/).

In the [TPCH benchmarks](https://www.pola.rs/benchmarks.html) polars is orders of magnitudes faster than pandas, dask, modin and vaex
In the [TPCH benchmarks](https://www.pola.rs/benchmarks.html) Polars is orders of magnitudes faster than pandas, dask, modin and vaex
on full queries (including IO).

### Lightweight
@@ -200,8 +200,8 @@ You can also install the dependencies directly.
| Tag | Description |
| ---------- | ---------------------------------------------------------------------------- |
| **all** | Install all optional dependencies (all of the following) |
| pandas | Install with Pandas for converting data to and from Pandas DataFrames/Series |
| numpy | Install with numpy for converting data to and from numpy arrays |
| pandas | Install with pandas for converting data to and from pandas DataFrames/Series |
| numpy | Install with NumPy for converting data to and from NumPy arrays |
| pyarrow | Reading data formats using PyArrow |
| fsspec | Support for reading from remote file systems |
| connectorx | Support for reading from SQL databases |
@@ -228,9 +228,9 @@ Required Rust version `>=1.71`.

Want to contribute? Read our [contribution guideline](/CONTRIBUTING.md).

## Python: compile polars from source
## Python: compile Polars from source

If you want a bleeding edge release or maximal performance you should compile **polars** from source.
If you want a bleeding edge release or maximal performance you should compile **Polars** from source.

This can be done by going through the following steps in sequence:

@@ -249,24 +249,24 @@ Note that the Rust crate implementing the Python bindings is called `py-polars`
Rust crate `polars` itself. However, both the Python package and the Python module are named `polars`, so you
can `pip install polars` and `import polars`.

## Use custom Rust function in python?
## Use custom Rust function in Python?

Extending polars with UDFs compiled in Rust is easy. We expose pyo3 extensions for `DataFrame` and `Series`
Extending Polars with UDFs compiled in Rust is easy. We expose pyo3 extensions for `DataFrame` and `Series`
data structures. See more in https://github.com/pola-rs/pyo3-polars.

## Going big...

Do you expect more than `2^32` ~4,2 billion rows? Compile polars with the `bigidx` feature flag.

Or for python users install `pip install polars-u64-idx`.
Or for Python users install `pip install polars-u64-idx`.

Don't use this unless you hit the row boundary as the default polars is faster and consumes less memory.
Don't use this unless you hit the row boundary as the default Polars is faster and consumes less memory.
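
The row boundary in question is just the range of the default 32-bit index type; `bigidx` (and the `polars-u64-idx` wheel) widens it to 64 bits. A quick sanity check of the arithmetic:

```python
# The default Polars row index is 32-bit, capping a frame at 2**32 rows
# (~4.3 billion); the bigidx feature switches the index to 64-bit.
default_row_cap = 2**32
bigidx_row_cap = 2**64

print(f"default cap: {default_row_cap:,} rows")
```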

## Legacy

Do you want polars to run on an old CPU (e.g. dating from before 2011), or on an `x86-64` build
Do you want Polars to run on an old CPU (e.g. dating from before 2011), or on an `x86-64` build
of Python on Apple Silicon under Rosetta? Install `pip install polars-lts-cpu`. This version of
polars is compiled without [AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) target
Polars is compiled without [AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) target
features.

## Sponsors
4 changes: 1 addition & 3 deletions crates/polars-sql/src/context.rs
@@ -223,13 +223,11 @@ impl SQLContext {
concatenated.map(|lf| lf.unique(None, UniqueKeepStrategy::Any))
},
// UNION ALL BY NAME
// TODO: add recognition for SetQuantifier::DistinctByName
// when "https://github.com/sqlparser-rs/sqlparser-rs/pull/997" is available
#[cfg(feature = "diagonal_concat")]
SetQuantifier::AllByName => concat_lf_diagonal(vec![left, right], opts),
// UNION [DISTINCT] BY NAME
#[cfg(feature = "diagonal_concat")]
SetQuantifier::ByName => {
SetQuantifier::ByName | SetQuantifier::DistinctByName => {
let concatenated = concat_lf_diagonal(vec![left, right], opts);
concatenated.map(|lf| lf.unique(None, UniqueKeepStrategy::Any))
},
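
Both `ByName` arms above do the same two things: concatenate the inputs diagonally (aligning columns by name, filling gaps with null) and then drop duplicate rows. A rough Python model of that behavior, operating on lists of dicts rather than LazyFrames (this is an illustration, not Polars' implementation):

```python
def union_by_name(left, right, distinct=True):
    """Model of UNION [DISTINCT] BY NAME: align rows on the union of
    column names, fill missing columns with None, optionally dedupe."""
    columns = sorted(set().union(*(row.keys() for row in left + right)))
    aligned = [tuple(row.get(c) for c in columns) for row in left + right]
    if distinct:
        seen, deduped = set(), []
        for row in aligned:
            if row not in seen:
                seen.add(row)
                deduped.append(row)
        aligned = deduped
    return columns, aligned
```

For example, unioning `[{"a": 1, "b": 2}]` with `[{"b": 2, "a": 1}, {"a": 3}]` aligns both sides on columns `a` and `b`, pads the missing `b` with `None`, and keeps one copy of the duplicated row.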
38 changes: 28 additions & 10 deletions crates/polars-sql/src/sql_expr.rs
@@ -7,9 +7,9 @@ use polars_plan::prelude::{col, lit, when};
use rand::distributions::Alphanumeric;
use rand::{thread_rng, Rng};
use sqlparser::ast::{
ArrayAgg, BinaryOperator as SQLBinaryOperator, BinaryOperator, DataType as SQLDataType,
Expr as SqlExpr, Function as SQLFunction, Ident, JoinConstraint, OrderByExpr,
Query as Subquery, SelectItem, TrimWhereField, UnaryOperator, Value as SqlValue,
ArrayAgg, ArrayElemTypeDef, BinaryOperator as SQLBinaryOperator, BinaryOperator, CastFormat,
DataType as SQLDataType, Expr as SqlExpr, Function as SQLFunction, Ident, JoinConstraint,
OrderByExpr, Query as Subquery, SelectItem, TrimWhereField, UnaryOperator, Value as SqlValue,
};
use sqlparser::dialect::GenericDialect;
use sqlparser::parser::{Parser, ParserOptions};
@@ -19,7 +19,8 @@ use crate::SQLContext;

pub(crate) fn map_sql_polars_datatype(data_type: &SQLDataType) -> PolarsResult<DataType> {
Ok(match data_type {
SQLDataType::Array(Some(inner_type)) => {
SQLDataType::Array(ArrayElemTypeDef::AngleBracket(inner_type))
| SQLDataType::Array(ArrayElemTypeDef::SquareBracket(inner_type)) => {
DataType::List(Box::new(map_sql_polars_datatype(inner_type)?))
},
SQLDataType::BigInt(_) => DataType::Int64,
@@ -32,7 +33,7 @@ pub(crate) fn map_sql_polars_datatype(data_type: &SQLDataType) -> PolarsResult<D
| SQLDataType::Character(_)
| SQLDataType::CharacterVarying(_)
| SQLDataType::Clob(_)
| SQLDataType::String
| SQLDataType::String(_)
| SQLDataType::Text
| SQLDataType::Uuid
| SQLDataType::Varchar(_) => DataType::Utf8,
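
The pattern above is a many-to-one mapping: a long list of SQL type spellings collapses onto a handful of Polars dtypes. A string-keyed sketch of the idea (names only; the real function matches on sqlparser AST nodes, and this covers just the variants visible in the hunk):

```python
# Many SQL spellings map to one Polars dtype; unknown names are rejected,
# mirroring the PolarsResult error path in map_sql_polars_datatype.
SQL_TO_POLARS = {
    "BIGINT": "Int64",
    "CHAR": "Utf8",
    "CHARACTER VARYING": "Utf8",
    "CLOB": "Utf8",
    "STRING": "Utf8",
    "TEXT": "Utf8",
    "UUID": "Utf8",
    "VARCHAR": "Utf8",
}

def map_sql_type(name: str) -> str:
    try:
        return SQL_TO_POLARS[name.upper()]
    except KeyError:
        raise ValueError(f"unsupported SQL type: {name}")
```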
@@ -90,7 +91,11 @@ impl SqlExprVisitor<'_> {
high,
} => self.visit_between(expr, *negated, low, high),
SqlExpr::BinaryOp { left, op, right } => self.visit_binary_op(left, op, right),
SqlExpr::Cast { expr, data_type } => self.visit_cast(expr, data_type),
SqlExpr::Cast {
expr,
data_type,
format,
} => self.visit_cast(expr, data_type, format),
SqlExpr::Ceil { expr, .. } => Ok(self.visit_expr(expr)?.ceil()),
SqlExpr::CompoundIdentifier(idents) => self.visit_compound_identifier(idents),
SqlExpr::Floor { expr, .. } => Ok(self.visit_expr(expr)?.floor()),
@@ -124,7 +129,8 @@ impl SqlExprVisitor<'_> {
expr,
trim_where,
trim_what,
} => self.visit_trim(expr, trim_where, trim_what),
trim_characters,
} => self.visit_trim(expr, trim_where, trim_what, trim_characters),
SqlExpr::UnaryOp { op, expr } => self.visit_unary_op(op, expr),
SqlExpr::Value(value) => self.visit_literal(value),
e @ SqlExpr::Case { .. } => self.visit_when_then(e),
@@ -342,7 +348,15 @@ impl SqlExprVisitor<'_> {
/// Visit a SQL CAST
///
/// e.g. `CAST(column AS INT)` or `column::INT`
fn visit_cast(&mut self, expr: &SqlExpr, data_type: &SQLDataType) -> PolarsResult<Expr> {
fn visit_cast(
&mut self,
expr: &SqlExpr,
data_type: &SQLDataType,
format: &Option<CastFormat>,
) -> PolarsResult<Expr> {
if format.is_some() {
return Err(polars_err!(ComputeError: "unsupported use of FORMAT in CAST expression"));
}
let polars_type = map_sql_polars_datatype(data_type)?;
let expr = self.visit_expr(expr)?;

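The new `format` parameter exists because sqlparser 0.39 now parses `CAST(expr AS type FORMAT ...)`; Polars accepts the parse but explicitly rejects the clause before doing the cast. A small Python model of that control flow (the `casters` table is illustrative, not the Polars cast machinery):

```python
def visit_cast(value, target_type, fmt=None):
    """Model of the visitor above: reject FORMAT up front, then cast."""
    if fmt is not None:
        raise ValueError("unsupported use of FORMAT in CAST expression")
    casters = {"INT": int, "FLOAT": float, "VARCHAR": str}
    try:
        return casters[target_type.upper()](value)
    except KeyError:
        raise ValueError(f"unsupported SQL type: {target_type}")
```

Rejecting the unsupported clause early, rather than silently ignoring it, keeps `CAST(col AS INT)` and `col::INT` working while surfacing a clear error for syntax the engine cannot honor.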
@@ -440,15 +454,19 @@
expr: &SqlExpr,
trim_where: &Option<TrimWhereField>,
trim_what: &Option<Box<SqlExpr>>,
trim_characters: &Option<Vec<SqlExpr>>,
) -> PolarsResult<Expr> {
if trim_characters.is_some() {
// TODO: allow compact snowflake/bigquery syntax?
return Err(polars_err!(ComputeError: "unsupported TRIM syntax"));
};
let expr = self.visit_expr(expr)?;
let trim_what = trim_what.as_ref().map(|e| self.visit_expr(e)).transpose()?;
let trim_what = match trim_what {
Some(Expr::Literal(LiteralValue::Utf8(val))) => Some(val),
None => None,
_ => return self.err(&expr),
};

Ok(match (trim_where, trim_what) {
(None | Some(TrimWhereField::Both), None) => expr.str().strip_chars(lit(Null)),
(None | Some(TrimWhereField::Both), Some(val)) => expr.str().strip_chars(lit(val)),
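
The diff truncates the match here after the `BOTH` arms, but the shape of the mapping is clear: the optional `TrimWhereField` selects which side to strip and `trim_what` supplies the character set. A Python model, assuming the cut-off `LEADING`/`TRAILING` arms behave analogously to `str.lstrip`/`str.rstrip`:

```python
def visit_trim(s, trim_where=None, trim_what=None):
    """Model of the TRIM arms: BOTH (or unspecified) strips both ends,
    LEADING/TRAILING strip one end; trim_what is the char set (None =
    whitespace, matching strip_chars with a null literal)."""
    if trim_where in (None, "BOTH"):
        return s.strip(trim_what)
    if trim_where == "LEADING":
        return s.lstrip(trim_what)
    if trim_where == "TRAILING":
        return s.rstrip(trim_what)
    raise ValueError(f"unsupported TRIM position: {trim_where}")
```

Note that, as in the Rust code above, only a string literal makes sense for the character set; an arbitrary expression has no static character set to strip.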
@@ -676,7 +694,7 @@ pub(super) fn process_join_constraint(
) -> PolarsResult<(Vec<Expr>, Vec<Expr>)> {
if let JoinConstraint::On(SqlExpr::BinaryOp { left, op, right }) = constraint {
if op != &BinaryOperator::Eq {
polars_bail!(InvalidOperation:
polars_bail!(InvalidOperation:
"SQL interface (currently) only supports basic equi-join \
constraints; found '{:?}' op in\n{:?}", op, constraint)
}
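
The constraint check above is a simple gate: an `ON` clause is accepted only if its binary operator is equality, since only equi-joins can be lowered onto the engine's join keys. A minimal sketch of that validation (string operators stand in for the sqlparser AST):

```python
def process_join_constraint(left_col, op, right_col):
    """Model of the check above: only basic equality constraints are
    accepted; anything else is rejected with a descriptive error."""
    if op != "=":
        raise ValueError(
            "SQL interface (currently) only supports basic equi-join "
            f"constraints; found {op!r}"
        )
    return [left_col], [right_col]
```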
2 changes: 1 addition & 1 deletion docs/getting-started/expressions.md
@@ -1,6 +1,6 @@
# Expressions

`Expressions` are the core strength of `Polars`. The `expressions` offer a versatile structure that both solves easy queries and is easily extended to complex ones. Below we will cover the basic components that serve as building block (or in `Polars` terminology contexts) for all your queries:
`Expressions` are the core strength of Polars. The `expressions` offer a versatile structure that both solves easy queries and is easily extended to complex ones. Below we will cover the basic components that serve as building block (or in Polars terminology contexts) for all your queries:

- `select`
- `filter`
2 changes: 1 addition & 1 deletion docs/index.md
@@ -34,7 +34,7 @@ Polars is a highly performant DataFrame library for manipulating structured data

## About this guide

The Polars user guide is intended to live alongside the API documentation. Its purpose is to explain (new) users how to use `Polars` and to provide meaningful examples. The guide is split into two parts:
The Polars user guide is intended to live alongside the API documentation. Its purpose is to explain (new) users how to use Polars and to provide meaningful examples. The guide is split into two parts:

- [Getting started](getting-started/intro.md): A 10 minute helicopter view of the library and its primary function.
- [User guide](user-guide/index.md): A detailed explanation of how the library is setup and how to use it most effectively.
4 changes: 2 additions & 2 deletions docs/releases/upgrade/index.md
@@ -9,5 +9,5 @@ A full list of all changes is available in the [changelog](../changelog.md).

!!! rust "Note"

There is no upgrade guide yet for Rust releases.
It will be added once the rate of breaking changes to the rust API slows down and a [deprecation policy](../../development/versioning.md#deprecation-period) is added.
There are no upgrade guides yet for Rust releases.
These will be added once the rate of breaking changes to the Rust API slows down and a [deprecation policy](../../development/versioning.md#deprecation-period) is added.
