Run SQL queries against DataFrame/LazyFrame data.
Optional
frames: Record<string, LazyDataFrame | pl.DataFrame>
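A minimal sketch of the API above (assuming execute returns a LazyDataFrame that can be materialized with collectSync):

import pl from "nodejs-polars";

const df = pl.DataFrame({ a: [1, 2, 3], b: ["x", "y", "z"] });
// Register the frame under the name "tbl", then query it with SQL.
const ctx = pl.SQLContext({ tbl: df });
const out = ctx.execute("SELECT a, b FROM tbl WHERE a > 1").collectSync();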
Creates a new Series from a set of values.
values — A set of values to include in the new Series object.

Create a new named series
name — The name of the series
values — A set of values to include in the new Series object.
Optional
dtype: any

Returns a new Series from a set of elements.
Rest
...items: T3[] — A set of elements to include in the new Series object.

Compute the bitwise AND horizontally across columns.
>>> const df = pl.DataFrame(
...   {
...     "a": [false, false, true, true],
...     "b": [false, true, null, true],
...     "c": ["w", "x", "y", "z"],
...   }
... )
>>> df.withColumns(pl.allHorizontal([pl.col("a"), pl.col("b")]))
shape: (4, 4)
┌───────┬───────┬─────┬───────┐
│ a ┆ b ┆ c ┆ all │
│ --- ┆ --- ┆ --- ┆ --- │
│ bool ┆ bool ┆ str ┆ bool │
╞═══════╪═══════╪═════╪═══════╡
│ false ┆ false ┆ w ┆ false │
│ false ┆ true ┆ x ┆ false │
│ true ┆ null ┆ y ┆ null │
│ true ┆ true ┆ z ┆ true │
└───────┴───────┴─────┴───────┘
+
+
+Compute the bitwise OR horizontally across columns.
+ >>> const df = pl.DataFrame(
... {
... "a": [false, false, true, null],
... "b": [false, true, null, null],
... "c": ["w", "x", "y", "z"],
... }
... )
>>> df.withColumns(pl.anyHorizontal([pl.col("a"), pl.col("b")]))
shape: (4, 4)
┌───────┬───────┬─────┬───────┐
│ a ┆ b ┆ c ┆ any │
│ --- ┆ --- ┆ --- ┆ --- │
│ bool ┆ bool ┆ str ┆ bool │
╞═══════╪═══════╪═════╪═══════╡
│ false ┆ false ┆ w ┆ false │
│ false ┆ true ┆ x ┆ true │
│ true ┆ null ┆ y ┆ true │
│ null ┆ null ┆ z ┆ null │
└───────┴───────┴─────┴───────┘
+
+
A column in a DataFrame. Can be used to select:
- a single column by name
- all columns by using a wildcard "*"
- columns by regular expression if the regex starts with ^ and ends with $
> df = pl.DataFrame({
> "ham": [1, 2, 3],
> "hamburger": [11, 22, 33],
> "foo": [3, 2, 1]})
> df.select(col("foo"))
shape: (3, 1)
╭─────╮
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 3 │
├╌╌╌╌╌┤
│ 2 │
├╌╌╌╌╌┤
│ 1 │
╰─────╯
> df.select(col("*"))
shape: (3, 3)
╭─────┬───────────┬─────╮
│ ham ┆ hamburger ┆ foo │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═══════════╪═════╡
│ 1 ┆ 11 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 22 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 33 ┆ 1 │
╰─────┴───────────┴─────╯
> df.select(col("^ham.*$"))
shape: (3, 2)
╭─────┬───────────╮
│ ham ┆ hamburger │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═══════════╡
│ 1 ┆ 11 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ 22 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ 33 │
╰─────┴───────────╯
> df.select(col("*").exclude("ham"))
shape: (3, 2)
╭───────────┬─────╮
│ hamburger ┆ foo │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═══════════╪═════╡
│ 11 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 22 ┆ 2 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 33 ┆ 1 │
╰───────────┴─────╯
> df.select(col(["hamburger", "foo"]))
shape: (3, 2)
╭───────────┬─────╮
│ hamburger ┆ foo │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═══════════╪═════╡
│ 11 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 22 ┆ 2 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 33 ┆ 1 │
╰───────────┴─────╯
> df.select(col(pl.Series(["hamburger", "foo"])))
shape: (3, 2)
╭───────────┬─────╮
│ hamburger ┆ foo │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═══════════╪═════╡
│ 11 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 22 ┆ 2 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 33 ┆ 1 │
╰───────────┴─────╯
+
+
Aggregate all the DataFrames/Series in a List of DataFrames/Series to a single DataFrame/Series.
items — DataFrames/Series/LazyFrames to concatenate.
Optional
options: ConcatOptions

> const df1 = pl.DataFrame({"a": [1], "b": [3]});
> const df2 = pl.DataFrame({"a": [2], "b": [4]});
> pl.concat([df1, df2]);
shape: (2, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 4 │
└─────┴─────┘
> const a = pl.DataFrame({ a: ["a", "b"], b: [1, 2] });
> const b = pl.DataFrame({ c: [5, 6], d: [7, 8], e: [9, 10]});
> pl.concat([a, b], { how: "horizontal" });
shape: (2, 5)
┌─────┬─────┬─────┬─────┬──────┐
│ a ┆ b ┆ c ┆ d ┆ e │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╪═════╪══════╡
│ a ┆ 1.0 ┆ 5.0 ┆ 7.0 ┆ 9.0 │
│ b ┆ 2.0 ┆ 6.0 ┆ 8.0 ┆ 10.0 │
└─────┴─────┴─────┴─────┴──────┘
> const df_d1 = pl.DataFrame({"a": [1], "b": [3]});
> const df_d2 = pl.DataFrame({"a": [2], "c": [4]});
> pl.concat([df_d1, df_d2], { how: "diagonal" });
shape: (2, 3)
┌─────┬──────┬──────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪══════╪══════╡
│ 1 ┆ 3 ┆ null │
│ 2 ┆ null ┆ 4 │
└─────┴──────┴──────┘
+
+
Optional
options: ConcatOptions

Concat the arrays in a Series dtype List in linear time.
exprs — Columns to concat into a List Series
Rest
...exprs: ExprOrString[]

Concat Utf8 Series in linear time. Non-Utf8 columns are cast to Utf8.
Optional
sep: string
Optional
ignoreNulls: boolean
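A small sketch of the above (assuming concatString takes the columns followed by the optional separator):

import pl from "nodejs-polars";

const df = pl.DataFrame({ a: ["dogs", "cats"], b: ["play", "sleep"] });
// Concatenate two Utf8 columns row-wise with a separator.
df.select(pl.concatString(["a", "b"], " ").alias("sentence"));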
Alias for an element being evaluated in an eval expression.
A horizontal rank computation by taking the elements of a list:

> df = pl.DataFrame({"a": [1, 8, 3], "b": [4, 5, 2]})
> df.withColumn(
...   pl.concatList(["a", "b"]).arr.eval(pl.element().rank()).alias("rank")
... )
shape: (3, 3)
┌─────┬─────┬────────────┐
│ a ┆ b ┆ rank │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ list[f32] │
╞═════╪═════╪════════════╡
│ 1 ┆ 4 ┆ [1.0, 2.0] │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 8 ┆ 5 ┆ [2.0, 1.0] │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ 2 ┆ [2.0, 1.0] │
└─────┴─────┴────────────┘

A mathematical operation on array elements:

> df = pl.DataFrame({"a": [1, 8, 3], "b": [4, 5, 2]})
> df.withColumn(
...   pl.concatList(["a", "b"]).arr.eval(pl.element().multiplyBy(2)).alias("a_b_doubled")
... )
shape: (3, 3)
┌─────┬─────┬─────────────┐
│ a ┆ b ┆ a_b_doubled │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ list[i64] │
╞═════╪═════╪═════════════╡
│ 1 ┆ 4 ┆ [2, 8] │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 8 ┆ 5 ┆ [16, 10] │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ 2 ┆ [6, 4] │
└─────┴─────┴─────────────┘
+
+
String format utility for expressions.
Note: strings will be interpolated as col(<value>). If you want a literal string, use lit(<value>).
Rest
...expr: ExprOrString[]

> df = pl.DataFrame({
... "a": ["a", "b", "c"],
... "b": [1, 2, 3],
... })
> df.select(
... pl.format("foo_{}_bar_{}", pl.col("a"), "b").alias("fmt"),
... )
shape: (3, 1)
┌─────────────┐
│ fmt │
│ --- │
│ str │
╞═════════════╡
│ foo_a_bar_1 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ foo_b_bar_2 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ foo_c_bar_3 │
└─────────────┘
// You can use format as tag function as well
> pl.format("foo_{}_bar_{}", pl.col("a"), "b") // is the same as
> pl.format`foo_${pl.col("a")}_bar_${"b"}`
+
+
Generate a range of integers.
This can be used in a select, withColumn, etc.
Be sure that the range size is equal to the length of the DataFrame you are collecting.
Returns an Expr or Series column of integer data type dtype.
> df.lazy()
> .filter(pl.col("foo").lt(pl.intRange(0, 100)))
> .collect()
+
+
Generate an index column by using intRange in conjunction with len.

> df = pl.DataFrame({"a": [1, 3, 5], "b": [2, 4, 6]})
> df.select(
... pl.intRange(pl.len()).alias("index"),
... pl.all(),
... )
shape: (3, 3)
┌───────┬─────┬─────┐
│ index ┆ a ┆ b │
│ --- ┆ --- ┆ --- │
│ u32 ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╡
│ 0 ┆ 1 ┆ 2 │
│ 1 ┆ 3 ┆ 4 │
│ 2 ┆ 5 ┆ 6 │
└───────┴─────┴─────┘
+
+
Generate a range of integers for each row of the input columns.

start — Start of the range (inclusive). Defaults to 0.
end — End of the range (exclusive). If set to null (default), the value of start is used and start is set to 0.
Optional
step: number — Step size of the range.
Optional
dtype: DataType — Integer data type of the ranges. Defaults to Int64.
Optional
eager: boolean — Evaluate immediately and return a Series. If set to false (default), return an expression instead.
Returns a column of data type List(dtype).
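A short sketch of the eager flag (assuming the positional overload (start, end, step, dtype, eager) listed above):

import pl from "nodejs-polars";

// Lazy (default): returns an expression for use inside a query plan.
const expr = pl.intRange(0, 5);
// Eager: materializes the range immediately as a Series.
const s = pl.intRange(0, 5, 1, pl.Int64, true);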
Return the number of elements in the column.
This is similar to COUNT(*) in SQL.
Returns an Expr of data type UInt32.
>>> const df = pl.DataFrame(
... {
... "a": [1, 2, None],
... "b": [3, None, None],
... "c": ["foo", "bar", "foo"],
... }
... )
>>> df.select(pl.len())
shape: (1, 1)
┌─────┐
│ len │
│ --- │
│ u32 │
╞═════╡
│ 3 │
└─────┘
+
+
Generate an index column by using len in conjunction with intRange.
>>> df.select(
... pl.intRange(pl.len(), dtype=pl.UInt32).alias("index"),
... pl.all(),
... )
shape: (3, 4)
┌───────┬──────┬──────┬─────┐
│ index ┆ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- ┆ --- │
│ u32 ┆ i64 ┆ i64 ┆ str │
╞═══════╪══════╪══════╪═════╡
│ 0 ┆ 1 ┆ 3 ┆ foo │
│ 1 ┆ 2 ┆ null ┆ bar │
│ 2 ┆ null ┆ null ┆ foo │
└───────┴──────┴──────┴─────┘
+
+
+Get the maximum value horizontally across columns.
>>> const df = pl.DataFrame(
... {
... "a": [1, 8, 3],
... "b": [4, 5, null],
... "c": ["x", "y", "z"],
... }
... )
>>> df.withColumns(pl.maxHorizontal(pl.col("a"), pl.col("b")))
shape: (3, 4)
┌─────┬──────┬─────┬─────┐
│ a ┆ b ┆ c ┆ max │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ i64 │
╞═════╪══════╪═════╪═════╡
│ 1 ┆ 4 ┆ x ┆ 4 │
│ 8 ┆ 5 ┆ y ┆ 8 │
│ 3 ┆ null ┆ z ┆ 3 │
└─────┴──────┴─────┴─────┘
+
+
+Get the minimum value horizontally across columns.
>>> const df = pl.DataFrame(
... {
... "a": [1, 8, 3],
... "b": [4, 5, null],
... "c": ["x", "y", "z"],
... }
... )
>>> df.withColumns(pl.minHorizontal(pl.col("a"), pl.col("b")))
shape: (3, 4)
┌─────┬──────┬─────┬─────┐
│ a ┆ b ┆ c ┆ min │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ i64 │
╞═════╪══════╪═════╪═════╡
│ 1 ┆ 4 ┆ x ┆ 1 │
│ 8 ┆ 5 ┆ y ┆ 5 │
│ 3 ┆ null ┆ z ┆ 3 │
└─────┴──────┴─────┴─────┘
+
+
Read into a DataFrame from an Avro file.
pathOrBody — path or buffer or string
Optional
options: Partial<ReadAvroOptions>

Read a CSV file or string into a DataFrame.
pathOrBody — path or buffer or string, e.g. file.csv
Optional
options: Partial<ReadCsvOptions>
Returns DataFrame
Read a stream into a DataFrame.
Warning: this is much slower than scanCSV or readCSV, as it consumes the entire stream into a single buffer and then calls readCSV. Only use it when you must consume from a stream, or when performance is not a major consideration.
stream — readable stream containing csv data
Optional
options: Partial<ReadCsvOptions>
Returns Promise<DataFrame>
>>> const readStream = new Stream.Readable({read(){}});
>>> readStream.push(`a,b\n`);
>>> readStream.push(`1,2\n`);
>>> readStream.push(`2,2\n`);
>>> readStream.push(`3,2\n`);
>>> readStream.push(`4,2\n`);
>>> readStream.push(null);
>>> pl.readCSVStream(readStream).then(df => console.log(df));
shape: (4, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 4 ┆ 2 │
└─────┴─────┘
+
+
Read into a DataFrame from an Arrow IPC file (Feather v2).
pathOrBody — path or buffer or string, e.g. file.ipc
Optional
options: Partial<ReadIPCOptions>

Read into a DataFrame from an Arrow IPC stream.
pathOrBody — path or buffer or string, e.g. file.ipc
Optional
options: Partial<ReadIPCOptions>

Read a JSON file or string into a DataFrame.
pathOrBody — path or buffer or string, e.g. file.json
Optional
options: Partial<ReadJsonOptions>

> const jsonString = `
{"a", 1, "b", "foo", "c": 3}
{"a": 2, "b": "bar", "c": 6}
`
> const df = pl.readJSON(jsonString)
> console.log(df)
shape: (2, 3)
╭─────┬─────┬─────╮
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞═════╪═════╪═════╡
│ 1 ┆ foo ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ bar ┆ 6 │
╰─────┴─────┴─────╯
+
+
Read a newline delimited JSON stream into a DataFrame.
stream — readable stream containing json data
Optional
options: Partial<ReadJsonOptions>

>>> const readStream = new Stream.Readable({read(){}});
>>> readStream.push(`${JSON.stringify({a: 1, b: 2})} \n`);
>>> readStream.push(`${JSON.stringify({a: 2, b: 2})} \n`);
>>> readStream.push(`${JSON.stringify({a: 3, b: 2})} \n`);
>>> readStream.push(`${JSON.stringify({a: 4, b: 2})} \n`);
>>> readStream.push(null);
>>> pl.readJSONStream(readStream, { format: "lines" }).then(df => console.log(df));
shape: (4, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 4 ┆ 2 │
└─────┴─────┘
+
+
Lazily read from a CSV file or multiple files via glob patterns.
This allows the query optimizer to push down predicates and projections to the scan level, thereby potentially reducing memory overhead.
path — path to a file
Optional
options: Partial<ScanCsvOptions>

Lazily read from an Arrow IPC file (Feather v2) or multiple files via glob patterns.
path — Path to an IPC file.
Optional
options: Partial<ScanIPCOptions>

Read a JSON file or string into a DataFrame.
Note: Currently only newline delimited JSON is supported.
path — path to a json file, e.g. ./file.json
Optional
options: Partial<ScanJsonOptions>

Lazily read from a local or cloud-hosted parquet file (or files).
This function allows the query optimizer to push down predicates and projections to the scan level, typically increasing performance and reducing memory overhead.
path — Path(s) to a file. If a single path is given, it can be a globbing pattern.
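A minimal sketch of a lazy scan pipeline (the glob path is hypothetical); the projection and filter below can be pushed down into the scan:

import pl from "nodejs-polars";

const df = pl
  .scanParquet("./data/*.parquet") // hypothetical glob path
  .select("foo", "ham")            // projection pushdown
  .filter(pl.col("foo").gt(1))     // predicate pushdown
  .collectSync();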
Sum all values horizontally across columns.
>>> const df = pl.DataFrame(
... {
... "a": [1, 8, 3],
... "b": [4, 5, null],
... "c": ["x", "y", "z"],
... }
... )
>>> df.withColumns(pl.sumHorizontal(pl.col("a"), pl.col("b")))
shape: (3, 4)
┌─────┬──────┬─────┬──────┐
│ a ┆ b ┆ c ┆ sum │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ i64 │
╞═════╪══════╪═════╪══════╡
│ 1 ┆ 4 ┆ x ┆ 5 │
│ 8 ┆ 5 ┆ y ┆ 13 │
│ 3 ┆ null ┆ z ┆ null │
└─────┴──────┴─────┴──────┘
+
+
+Start a when, then, otherwise expression.
+// Below we add a column with the value 1, where column "foo" > 2 and the value -1 where it isn't.
> df = pl.DataFrame({"foo": [1, 3, 4], "bar": [3, 4, 0]})
> df.withColumn(pl.when(pl.col("foo").gt(2)).then(pl.lit(1)).otherwise(pl.lit(-1)))
shape: (3, 3)
┌─────┬─────┬─────────┐
│ foo ┆ bar ┆ literal │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i32 │
╞═════╪═════╪═════════╡
│ 1 ┆ 3 ┆ -1 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ 4 ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 4 ┆ 0 ┆ 1 │
└─────┴─────┴─────────┘
// Or with multiple `when, thens` chained:
> df.withColumn(
... pl.when(pl.col("foo").gt(2))
... .then(1)
... .when(pl.col("bar").gt(2))
... .then(4)
... .otherwise(-1)
... )
shape: (3, 3)
┌─────┬─────┬─────────┐
│ foo ┆ bar ┆ literal │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i32 │
╞═════╪═════╪═════════╡
│ 1 ┆ 3 ┆ 4 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ 4 ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 4 ┆ 0 ┆ 1 │
└─────┴─────┴─────────┘
+
+
Polars: Blazingly fast DataFrames in Rust, Python, Node.js, R and SQL

Documentation: Node.js - Rust - Python - R | StackOverflow: Node.js - Rust - Python | User Guide | Discord
+// esm
import pl from 'nodejs-polars';
// require
const pl = require('nodejs-polars');
+
+
+> const fooSeries = pl.Series("foo", [1, 2, 3])
> fooSeries.sum()
6
// a lot of operations support both positional and named arguments
// you can see the full specs in the docs or the type definitions
> fooSeries.sort(true)
> fooSeries.sort({reverse: true})
shape: (3,)
Series: 'foo' [f64]
[
3
2
1
]
> fooSeries.toArray()
[1, 2, 3]
// Series are 'Iterables' so you can use javascript iterable syntax on them
> [...fooSeries]
[1, 2, 3]
> fooSeries[0]
1
+
+
+>const df = pl.DataFrame(
... {
... A: [1, 2, 3, 4, 5],
... fruits: ["banana", "banana", "apple", "apple", "banana"],
... B: [5, 4, 3, 2, 1],
... cars: ["beetle", "audi", "beetle", "beetle", "beetle"],
... }
... )
> df.sort("fruits").select(
... "fruits",
... "cars",
... pl.lit("fruits").alias("literal_string_fruits"),
... pl.col("B").filter(pl.col("cars").eq(pl.lit("beetle"))).sum(),
... pl.col("A").filter(pl.col("B").gt(2)).sum().over("cars").alias("sum_A_by_cars"),
... pl.col("A").sum().over("fruits").alias("sum_A_by_fruits"),
... pl.col("A").reverse().over("fruits").flatten().alias("rev_A_by_fruits")
... )
shape: (5, 8)
┌──────────┬──────────┬──────────────┬─────┬─────────────┬─────────────┬─────────────┐
│ fruits ┆ cars ┆ literal_stri ┆ B ┆ sum_A_by_ca ┆ sum_A_by_fr ┆ rev_A_by_fr │
│ --- ┆ --- ┆ ng_fruits ┆ --- ┆ rs ┆ uits ┆ uits │
│ str ┆ str ┆ --- ┆ i64 ┆ --- ┆ --- ┆ --- │
│ ┆ ┆ str ┆ ┆ i64 ┆ i64 ┆ i64 │
╞══════════╪══════════╪══════════════╪═════╪═════════════╪═════════════╪═════════════╡
│ "apple" ┆ "beetle" ┆ "fruits" ┆ 11 ┆ 4 ┆ 7 ┆ 4 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "apple" ┆ "beetle" ┆ "fruits" ┆ 11 ┆ 4 ┆ 7 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "banana" ┆ "beetle" ┆ "fruits" ┆ 11 ┆ 4 ┆ 8 ┆ 5 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "banana" ┆ "audi" ┆ "fruits" ┆ 11 ┆ 2 ┆ 8 ┆ 2 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "banana" ┆ "beetle" ┆ "fruits" ┆ 11 ┆ 4 ┆ 8 ┆ 1 │
└──────────┴──────────┴──────────────┴─────┴─────────────┴─────────────┴─────────────┘
+
+
+> df["cars"] // or df.getColumn("cars")
shape: (5,)
Series: 'cars' [str]
[
"beetle"
"beetle"
"beetle"
"audi"
"beetle"
]
+
+
+Install the latest polars version with:
+$ yarn add nodejs-polars # yarn
$ npm i -s nodejs-polars # npm
+
+
+Releases happen quite often (weekly / every few days) at the moment, so updating polars regularly to get the latest bugfixes / features might not be a bad idea.
+>=18
>=1.59
- Only needed for developmentIn Deno modules you can import polars straight from npm
:
import pl from "npm:nodejs-polars";
+
+
+With Deno 1.37, you can use the display
function to display a DataFrame
in the notebook:
import pl from "npm:nodejs-polars";
import { display } from "https://deno.land/x/display@v1.1.1/mod.ts";
let response = await fetch(
"https://cdn.jsdelivr.net/npm/world-atlas@1/world/110m.tsv",
);
let data = await response.text();
let df = pl.readCSV(data, { sep: "\t" });
await display(df)
+
+
+With Deno 1.38, you only have to make the dataframe be the last expression in the cell:
+import pl from "npm:nodejs-polars";
let response = await fetch(
"https://cdn.jsdelivr.net/npm/world-atlas@1/world/110m.tsv",
);
let data = await response.text();
let df = pl.readCSV(data, { sep: "\t" });
df
+
+
+
+Want to know about all the features Polars supports? Read the docs!
$ pip3 install polars
$ yarn add nodejs-polars

Want to contribute? Read our contribution guideline.

If you want a bleeding edge release or maximal performance you should compile polars from source:
$ npm|yarn install
$ cd nodejs-polars && yarn build && yarn build:ts # this will generate a /bin directory with the compiled TS code, as well as the rust binary
+
+
$ cd nodejs-polars && yarn build:debug && yarn build:ts # this will generate a /bin directory with the compiled TS code, as well as the rust binary
+
+
To use nodejs-polars with Webpack, please use node-loader in your webpack.config.js.
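A minimal webpack.config.js sketch (the exact rule is an assumption about a typical node-loader setup, not an official config):

// webpack.config.js
module.exports = {
  target: "node",
  module: {
    rules: [
      // Let webpack resolve the native .node binary that nodejs-polars ships.
      { test: /\.node$/, loader: "node-loader" },
    ],
  },
};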
Development of Polars is proudly powered by

DataFrame constructor
Create an empty DataFrame

Create a DataFrame from a JavaScript object
data — object or array of data
Optional
options: options
Optional
columns?: any[] — column names
Optional
inferSchemaLength?: number — The maximum number of rows to scan for schema inference. If set to null, the full data may be scanned (this can be slow). This parameter only applies if the input data is a sequence or generator of rows; other input is read as-is. The number of entries in the schema should match the underlying data dimensions, unless a sequence of dictionaries is being passed, in which case a partial schema can be declared to prevent specific fields from being loaded.
Optional
orient?: "row" | "col" — orientation of the data [row, col]. Whether to interpret two-dimensional data as columns or as rows. If null, the orientation is inferred by matching the columns and data dimensions. If this does not yield conclusive results, column orientation is used.
Optional
schema?: Record<string, string | DataType> — The schema of the resulting DataFrame. The schema may be declared in several ways:
- As a dict of {name: type} pairs; if type is null, it will be auto-inferred.
- As a list of column names; in this case types are automatically inferred.
- As a list of (name, type) pairs; this is equivalent to the dictionary form.
If you supply a list of column names that does not match the names in the underlying data, the names given here will overwrite them. The number of names given in the schema should match the underlying data dimensions. If set to null (default), the schema is inferred from the data.
Optional
schemaOverrides — Support type specification or override of one or more columns; note that any dtypes inferred from the schema param will be overridden.
+Starts a new GroupBy operation.
+Use multiple aggregations on columns. +This can be combined with complete lazy API and is considered idiomatic polars.
+// use lazy api rest parameter style
> df.groupBy('foo', 'bar')
> .agg(pl.sum('ham'), col('spam').tail(4).sum())
// use lazy api array style
> df.groupBy('foo', 'bar')
> .agg([pl.sum('ham'), col('spam').tail(4).sum()])
// use a mapping
> df.groupBy('foo', 'bar')
> .agg({'spam': ['sum', 'min']})
+
+
Return first n rows of each group.
Optional
n: number — Number of values of the group to select
+> df = pl.DataFrame({
> "letters": ["c", "c", "a", "c", "a", "b"],
> "nrs": [1, 2, 3, 4, 5, 6]
> })
> df
shape: (6, 2)
╭─────────┬─────╮
│ letters ┆ nrs │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════╪═════╡
│ "c" ┆ 1 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 2 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "a" ┆ 3 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 4 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "a" ┆ 5 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "b" ┆ 6 │
╰─────────┴─────╯
> df.groupby("letters")
> .head(2)
> .sort("letters");
> >>
shape: (5, 2)
╭─────────┬─────╮
│ letters ┆ nrs │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════╪═════╡
│ "a" ┆ 3 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "a" ┆ 5 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "b" ┆ 6 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 1 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 2 │
╰─────────┴─────╯
+
+
Do a pivot operation based on the group key, a pivot column and an aggregation function on the values column.

options for join operations

Options for scanJson

De-serializes buffer via serde

options for lazy join operations
Optional
allowParallel
Optional
forceParallel
Optional
how — join type
Optional
leftOn — left join column
Optional
on — left and right join column
Optional
rightOn — right join column
Optional
suffix

Options for readParquet
options for rolling window operations

options for rolling quantile operations

options for rolling mean operations

Options for scanParquet

Options for sinkCSV
Optional fields: batchSize, dateFormat, datetimeFormat, floatPrecision, includeBom, includeHeader, lineTerminator, maintainOrder, nullValue, quoteChar, quoteStyle, separator, timeFormat

Options for sinkParquet
Optional fields: compression, compressionLevel, dataPageSize, maintainOrder, noOptimization, predicatePushdown, projectionPushdown, rowGroupSize, simplifyExpression, slicePushdown, statistics, typeCoercion

namespace containing expr string functions
Decodes a value using the provided encoding
encoding — hex | base64
Optional
strict: boolean — how to handle invalid inputs
- true: method will throw an error if unable to decode a value
- false: unhandled values will be replaced with `null`

Decodes a value using the provided encoding
Optional
strict?: boolean

Extract the target capture group from provided patterns.
groupIndex — Index of the targeted capture group. Group 0 means the whole pattern, the first group begins at index 1. Defaults to the first capture group.
Returns a Utf8 array. Contains null if the original value is null or the regex captured nothing.
+> df = pl.DataFrame({
... 'a': [
... 'http://vote.com/ballon_dor?candidate=messi&ref=polars',
... 'http://vote.com/ballon_dor?candidat=jorginho&ref=polars',
... 'http://vote.com/ballon_dor?candidate=ronaldo&ref=polars'
... ]})
> df.select(pl.col('a').str.extract(/candidate=(\w+)/, 1))
shape: (3, 1)
┌─────────┐
│ a │
│ --- │
│ str │
╞═════════╡
│ messi │
├╌╌╌╌╌╌╌╌╌┤
│ null │
├╌╌╌╌╌╌╌╌╌┤
│ ronaldo │
└─────────┘
+
+
Parse string values as JSON. Throws an error if invalid JSON strings are encountered.
Optional
dtype: DataType
Optional
inferSchemaLength: number
Returns a DataFrame with a struct column.
Not implemented ATM
+>>> df = pl.DataFrame( {json: ['{"a":1, "b": true}', null, '{"a":2, "b": false}']} )
>>> df.select(pl.col("json").str.jsonDecode())
shape: (3, 1)
┌─────────────┐
│ json │
│ --- │
│ struct[2] │
╞═════════════╡
│ {1,true} │
│ {null,null} │
│ {2,false} │
└─────────────┘
See Also
----------
jsonPathMatch : Extract the first match of json string with provided JSONPath expression.
+
+
Parse string values as JSON. Throws an error if invalid JSON strings are encountered.
Optional
dtype: DataType
Optional
inferSchemaLength: number
Returns a DataFrame with a struct column.
Not implemented ATM
Since 0.8.4
+>>> df = pl.DataFrame( {json: ['{"a":1, "b": true}', null, '{"a":2, "b": false}']} )
>>> df.select(pl.col("json").str.jsonExtract())
shape: (3, 1)
┌─────────────┐
│ json │
│ --- │
│ struct[2] │
╞═════════════╡
│ {1,true} │
│ {null,null} │
│ {2,false} │
└─────────────┘
See Also
----------
jsonPathMatch : Extract the first match of json string with provided JSONPath expression.
+
+
Extract the first match of a json string with the provided JSONPath expression. Throws an error if invalid json strings are encountered. All return values will be cast to Utf8 regardless of the original value.
Returns a Utf8 array. Contains null if the original value is null or the jsonPath returns nothing.
https://goessner.net/articles/JsonPath/
+>>> df = pl.DataFrame({
... 'json_val': [
... '{"a":"1"}',
... null,
... '{"a":2}',
... '{"a":2.1}',
... '{"a":true}'
... ]
... })
>>> df.select(pl.col('json_val').str.jsonPathMatch('$.a'))
shape: (5,)
Series: 'json_val' [str]
[
"1"
null
"2"
"2.1"
"true"
]
+
+
Add a trailing fillChar to a string until the given length is reached. If the string is longer than or equal to the given length, no modifications will be made.
length — of the final string
fillChar — that will fill the string. If a string longer than 1 character is provided only the first character will be used.
+> df = pl.DataFrame({
... 'foo': [
... "a",
... "b",
... "LONG_WORD",
... "cow"
... ]})
> df.select(pl.col('foo').str.padEnd(3, "_"))
shape: (4, 1)
┌──────────┐
│ foo │
│ --- │
│ str │
╞══════════╡
│ a__ │
├╌╌╌╌╌╌╌╌╌╌┤
│ b__ │
├╌╌╌╌╌╌╌╌╌╌┤
│ LONG_WORD│
├╌╌╌╌╌╌╌╌╌╌┤
│ cow │
└──────────┘
+
+
Add a leading fillChar to a string until the given length is reached. If the string is longer than or equal to the given length, no modifications will be made.
length — of the final string
fillChar — that will fill the string. If a string longer than 1 character is provided only the first character will be used.
+> df = pl.DataFrame({
... 'foo': [
... "a",
... "b",
... "LONG_WORD",
... "cow"
... ]})
> df.select(pl.col('foo').str.padStart(3, "_"))
shape: (4, 1)
┌──────────┐
│ foo │
│ --- │
│ str │
╞══════════╡
│ __a │
├╌╌╌╌╌╌╌╌╌╌┤
│ __b │
├╌╌╌╌╌╌╌╌╌╌┤
│ LONG_WORD│
├╌╌╌╌╌╌╌╌╌╌┤
│ cow │
└──────────┘
+
+
Parse a Series of dtype Utf8 to a Date/Datetime Series.
dtype — Date or Datetime.
Optional
fmt: string — formatting syntax. Read more
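For example (a short sketch; the format string is an assumption about the input data):

> const df = pl.DataFrame({ date: ["2021-01-02", "2021-02-03"] })
> df.select(pl.col("date").str.strptime(pl.Date, "%Y-%m-%d"))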
+Add leading "0" to a string until string length is reached. +If string is longer or equal to given length no modifications will be done
+padStart +*
+> df = pl.DataFrame({
... 'foo': [
... "a",
... "b",
... "LONG_WORD",
... "cow"
... ]})
> df.select(pl.col('foo').str.zFill(3))
shape: (4, 1)
┌──────────┐
│ foo │
│ --- │
│ str │
╞══════════╡
│ 00a │
├╌╌╌╌╌╌╌╌╌╌┤
│ 00b │
├╌╌╌╌╌╌╌╌╌╌┤
│ LONG_WORD│
├╌╌╌╌╌╌╌╌╌╌┤
│ cow │
└──────────┘
+
+
+Struct functions
Options for writing Avro files

Options for DataFrame.writeCSV
Optional fields: batchSize, dateFormat, datetimeFormat, floatPrecision, includeBom, includeHeader, lineTerminator, nullValue, quoteChar, sep, timeFormat

Options for DataFrame.writeIPC

Options for DataFrame.writeJSON

Options for DataFrame.writeParquet

Configure polars; offers options for table formatting and more.
+A DataFrame is a two-dimensional data structure that represents data as a table +with rows and columns.
+Object, Array, or Series +Two-dimensional data in various forms. object must contain Arrays. +Array may contain Series or other Arrays.
+Array of str, default undefined +Column labels to use for resulting DataFrame. If specified, overrides any +labels already present in the data. Must match data dimensions.
+'col' | 'row' default undefined +Whether to interpret two-dimensional data as columns or as rows. If None, +the orientation is inferred by matching the columns and data dimensions. If +this does not yield conclusive results, column orientation is used.
+Constructing a DataFrame from an object :
+> const data = {'a': [1n, 2n], 'b': [3, 4]};
> const df = pl.DataFrame(data);
> console.log(df.toString());
shape: (2, 2)
╭─────┬─────╮
│ a ┆ b │
│ --- ┆ --- │
│ u64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 4 │
╰─────┴─────╯
+
+
Notice that the dtypes are automatically inferred as polars UInt64 and Int64:

> df.dtypes
['UInt64', 'Int64']
+
+
+In order to specify dtypes for your columns, initialize the DataFrame with a list +of Series instead:
> const data = [pl.Series('col1', [1, 2], pl.Float32), pl.Series('col2', [3, 4], pl.Int64)];
> const df2 = pl.DataFrame(data);
> console.log(df2.toString());
shape: (2, 2)
╭──────┬──────╮
│ col1 ┆ col2 │
│ --- ┆ --- │
│ f32 ┆ i64 │
╞══════╪══════╡
│ 1 ┆ 3 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ 4 │
╰──────┴──────╯
+
+
+Constructing a DataFrame from a list of lists, row orientation inferred:
+> const data = [[1, 2, 3], [4, 5, 6]];
> const df4 = pl.DataFrame(data, { columns: ['a', 'b', 'c'] });
> console.log(df4.toString());
shape: (2, 3)
╭─────┬─────┬─────╮
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1 ┆ 2 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 4 ┆ 5 ┆ 6 │
╰─────┴─────┴─────╯
+
+
Optional
destOrOptions: any
Optional
options: any
Deprecated since 0.4.0 — use writeCSV

Write the DataFrame to disk in avro format.
Optional
options: WriteAvroOptions

Write DataFrame to comma-separated values file (csv).
If no options are specified, it will return a new string containing the contents.
+> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ['a', 'b', 'c']
... });
> df.writeCSV();
foo,bar,ham
1,6,a
2,7,b
3,8,c
// using a file path
> df.head(1).writeCSV("./foo.csv")
// foo.csv
foo,bar,ham
1,6,a
// using a write stream
> const writeStream = new Stream.Writable({
... write(chunk, encoding, callback) {
... console.log("writeStream: %O', chunk.toString());
... callback(null);
... }
... });
> df.head(1).writeCSV(writeStream, {includeHeader: false});
writeStream: '1,6,a'
+
+
Optional
options: WriteCsvOptions

Write to Arrow IPC feather file, either to a file path or to a write stream.
Optional
options: WriteIPCOptions

Write to Arrow IPC stream file, either to a file path or to a write stream.
Optional
options: WriteIPCOptions

Write DataFrame to JSON string, file, or write stream.
Optional
options: { format: "json" | "lines" }
+> const df = pl.DataFrame({
... foo: [1,2,3],
... bar: ['a','b','c']
... })
> df.writeJSON({format:"json"})
`[ {"foo":1.0,"bar":"a"}, {"foo":2.0,"bar":"b"}, {"foo":3.0,"bar":"c"}]`
> df.writeJSON({format:"lines"})
`{"foo":1.0,"bar":"a"}
{"foo":2.0,"bar":"b"}
{"foo":3.0,"bar":"c"}`
// writing to a file
> df.writeJSON("/path/to/file.json", {format:'lines'})
+
+
Optional
options: { format: "json" | "lines" }

Write the DataFrame to disk in parquet format.
Optional
options: WriteParquetOptions
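For example (the path is hypothetical; calling it without a destination is assumed to return a Buffer, as with the other write methods):

> const df = pl.DataFrame({ foo: [1, 2, 3] })
// write to disk
> df.writeParquet("./out.parquet")
// or keep the contents in memory
> const buf = df.writeParquet()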
Optional
destination: any
Optional
options: any
Deprecated since 0.4.0 — use writeIPC

Optional
destination: any
Optional
options: any
Deprecated since 0.4.0 — use writeParquet

Sample from this DataFrame by setting either n or frac.
Optional
opts: { n?: number; seed?: number | bigint; withReplacement?: boolean }
Optional
opts: { frac?: number; seed?: number | bigint; withReplacement?: boolean }
Optional
n: number
Optional
frac: number
Optional
withReplacement: boolean
Optional
seed: number | bigint

Summary statistics for a DataFrame.
Only summarizes numeric datatypes at the moment and returns nulls for non-numeric datatypes.
+Example
+> const df = pl.DataFrame({
... 'a': [1.0, 2.8, 3.0],
... 'b': [4, 5, 6],
... "c": [True, False, True]
... });
... df.describe()
shape: (5, 4)
╭──────────┬───────┬─────┬──────╮
│ describe ┆ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 ┆ f64 │
╞══════════╪═══════╪═════╪══════╡
│ "mean" ┆ 2.267 ┆ 5 ┆ null │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤
│ "std" ┆ 1.102 ┆ 1 ┆ null │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤
│ "min" ┆ 1 ┆ 4 ┆ 0.0 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤
│ "max" ┆ 3 ┆ 6 ┆ 1 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤
│ "median" ┆ 2.8 ┆ 5 ┆ null │
╰──────────┴───────┴─────┴──────╯
+
+
+Remove column from DataFrame and return as new.
+> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6.0, 7.0, 8.0],
... "ham": ['a', 'b', 'c'],
... "apple": ['a', 'b', 'c']
... });
> console.log(df.drop(['ham', 'apple']).toString());
shape: (3, 2)
╭─────┬─────╮
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1 ┆ 6 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 7 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 8 │
╰─────┴─────╯
+
+
Rest
...names: string[]

Return a new DataFrame where the null values are dropped.
This method drops rows row-wise: a row is dropped if any single value of the row is null.
+> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, null, 8],
... "ham": ['a', 'b', 'c']
... });
> console.log(df.dropNulls().toString());
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ "a" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 8 ┆ "c" │
└─────┴─────┴─────┘
+
+
Rest
...columns: string[]

Explode DataFrame to long format by exploding a column with Lists.
> const df = pl.DataFrame({
... "letters": ["c", "c", "a", "c", "a", "b"],
... "nrs": [[1, 2], [1, 3], [4, 3], [5, 5, 5], [6], [2, 1, 2]]
... });
> console.log(df.toString());
shape: (6, 2)
╭─────────┬────────────╮
│ letters ┆ nrs │
│ --- ┆ --- │
│ str ┆ list [i64] │
╞═════════╪════════════╡
│ "c" ┆ [1, 2] │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "c" ┆ [1, 3] │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "a" ┆ [4, 3] │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "c" ┆ [5, 5, 5] │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "a" ┆ [6] │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "b" ┆ [2, 1, 2] │
╰─────────┴────────────╯
> df.explode("nrs")
shape: (13, 2)
╭─────────┬─────╮
│ letters ┆ nrs │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════╪═════╡
│ "c" ┆ 1 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 2 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 1 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 3 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ ... ┆ ... │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 5 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "a" ┆ 6 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "b" ┆ 2 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "b" ┆ 1 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "b" ┆ 2 │
╰─────────┴─────╯
+
+
Rest
...columns: ExprOrString[]

Extend the memory backed by this DataFrame with the values from other.
Different from vstack, which adds the chunks from other to the chunks of this DataFrame, extend appends the data from other to the underlying memory locations and thus may cause a reallocation. If this does not cause a reallocation, the resulting data structure will not have any extra chunks and thus will yield faster queries.
Prefer extend over vstack when you want to do a query after a single append. For instance, during online operations where you add n rows and rerun a query.
Prefer vstack over extend when you want to append many times before doing a query. For instance, when you read in multiple files and want to store them in a single DataFrame. In the latter case, finish the sequence of vstack operations with a rechunk, as shown in the sketch below.
Fill null/missing values by a filling strategy.
strategy — one of: "backward", "forward", "mean", "min", "max", "zero", "one"
Returns a DataFrame with nulls replaced according to the filling strategy.
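For instance (a sketch using two of the strategies named above):

> const df = pl.DataFrame({ a: [1, null, 3] })
> df.fillNull("forward") // a: [1, 1, 3]
> df.fillNull("zero")    // a: [1, 0, 3]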
+Filter the rows in the DataFrame based on a predicate expression.
+Expression that evaluates to a boolean Series.
+> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ['a', 'b', 'c']
... });
// Filter on one condition
> df.filter(pl.col("foo").lt(3))
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 7 ┆ b │
└─────┴─────┴─────┘
// Filter on multiple conditions
> df.filter(
... pl.col("foo").lt(3)
... .and(pl.col("ham").eq(pl.lit("a")))
... )
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
└─────┴─────┴─────┘
+
+
Apply a horizontal reduction on a DataFrame.
This can be used to effectively determine aggregations on a row level, and can be applied to any DataType that can be supercasted (cast to a similar parent type).
An example of the supercast rules when applying an arithmetic operation on two DataTypes:
- Int8 + Utf8 = Utf8
- Float32 + Int64 = Float32
- Float32 + Float64 = Float64
Returns a Series
+> // A horizontal sum operation
> let df = pl.DataFrame({
... "a": [2, 1, 3],
... "b": [1, 2, 3],
... "c": [1.0, 2.0, 3.0]
... });
> df.fold((s1, s2) => s1.plus(s2))
Series: 'a' [f64]
[
4
5
9
]
> // A horizontal minimum operation
> df = pl.DataFrame({
... "a": [2, 1, 3],
... "b": [1, 2, 3],
... "c": [1.0, 2.0, 3.0]
... });
> df.fold((s1, s2) => s1.zipWith(s1.lt(s2), s2))
Series: 'a' [f64]
[
1
1
3
]
> // A horizontal string concatenation
> df = pl.DataFrame({
... "a": ["foo", "bar", 2],
... "b": [1, 2, 3],
... "c": [1.0, 2.0, 3.0]
... })
> df.fold((s1, s2) => s1.plus(s2))
Series: 'a' [str]
[
"foo11"
"bar22"
"233"
]
+
+
Groups based on a time value (or index value of type Int32, Int64). Time windows are calculated and rows are assigned to windows. Different from a normal groupBy, a row can be a member of multiple groups. The time/index window could be seen as a rolling window, with a window size determined by dates/times/values instead of slots in the DataFrame.

A window is defined by:
- every: interval of the window
- period: length of the window
- offset: offset of the window

The every, period and offset arguments are created with the following string language:
- 1ns (1 nanosecond)
- 1us (1 microsecond)
- 1ms (1 millisecond)
- 1s (1 second)
- 1m (1 minute)
- 1h (1 hour)
- 1d (1 day)
- 1w (1 week)
- 1mo (1 calendar month)
- 1y (1 calendar year)
- 1i (1 index count)
Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds
In case of a groupByDynamic on an integer column, the windows are defined by:
- "1i" # length 1
- "10i" # length 10

Optional
by?: ColumnsOrExpr
Optional
check_sorted?: boolean
Optional
closed?: "none" | "left" | "right" | "both"
Optional
includeBoundaries?: boolean
Optional
offset?: string
Optional
period?: string

Create rolling groups based on a time column (or index value of type Int32, Int64).
Different from a rolling groupBy, the windows are now determined by the individual values and are not of constant intervals. For constant intervals use groupByDynamic.
The period and offset arguments are created with the same string language as described under groupByDynamic above. Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds

In case of a groupByRolling on an integer column, the windows are defined by:
- "1i" # length 1
- "10i" # length 10

Optional
by?: ColumnsOrExpr
Optional
check_sorted?: boolean
Optional
closed?: "none" | "left" | "right" | "both"
Optional
offset?: string
>dates = [
... "2020-01-01 13:45:48",
... "2020-01-01 16:42:13",
... "2020-01-01 16:45:09",
... "2020-01-02 18:12:48",
... "2020-01-03 19:45:32",
... "2020-01-08 23:16:43",
... ]
>df = pl.DataFrame({"dt": dates, "a": [3, 7, 5, 9, 2, 1]}).withColumn(
... pl.col("dt").str.strptime(pl.Datetime)
... )
>out = df.groupByRolling({indexColumn:"dt", period:"2d"}).agg(
... [
... pl.sum("a").alias("sum_a"),
... pl.min("a").alias("min_a"),
... pl.max("a").alias("max_a"),
... ]
... )
>assert(out["sum_a"].toArray() === [3, 10, 15, 24, 11, 1])
>assert(out["max_a"].toArray() === [3, 7, 7, 9, 9, 1])
>assert(out["min_a"].toArray() === [3, 3, 3, 3, 2, 1])
>out
shape: (6, 4)
┌─────────────────────┬───────┬───────┬───────┐
│ dt ┆ a_sum ┆ a_max ┆ a_min │
│ --- ┆ --- ┆ --- ┆ --- │
│ datetime[ms] ┆ i64 ┆ i64 ┆ i64 │
╞═════════════════════╪═══════╪═══════╪═══════╡
│ 2020-01-01 13:45:48 ┆ 3 ┆ 3 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-01 16:42:13 ┆ 10 ┆ 7 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-01 16:45:09 ┆ 15 ┆ 7 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-02 18:12:48 ┆ 24 ┆ 9 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-03 19:45:32 ┆ 11 ┆ 9 ┆ 2 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-08 23:16:43 ┆ 1 ┆ 1 ┆ 1 │
└─────────────────────┴───────┴───────┴───────┘
+
+
Hash and combine the rows in this DataFrame. (Hash value is UInt64.)
Optional
k0: number — seed parameter
Optional
k1: number — seed parameter
Optional
k2: number — seed parameter
Optional
k3: number — seed parameter

Get first N rows as DataFrame.
+Optional
length: numberLength of the head.
+> const df = pl.DataFrame({
... "foo": [1, 2, 3, 4, 5],
... "bar": [6, 7, 8, 9, 10],
... "ham": ['a', 'b', 'c', 'd','e']
... });
> df.head(3)
shape: (3, 3)
╭─────┬─────┬─────╮
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ "a" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 7 ┆ "b" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 8 ┆ "c" │
╰─────┴─────┴─────╯
+
+
+Return a new DataFrame grown horizontally by stacking multiple Series to it.
+> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ['a', 'b', 'c']
... });
> const x = pl.Series("apple", [10, 20, 30])
> df.hStack([x])
shape: (3, 4)
╭─────┬─────┬─────┬───────╮
│ foo ┆ bar ┆ ham ┆ apple │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ i64 │
╞═════╪═════╪═════╪═══════╡
│ 1 ┆ 6 ┆ "a" ┆ 10 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2 ┆ 7 ┆ "b" ┆ 20 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 3 ┆ 8 ┆ "c" ┆ 30 │
╰─────┴─────┴─────┴───────╯
+
+
+SQL like joins.
+> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6.0, 7.0, 8.0],
... "ham": ['a', 'b', 'c']
... });
> const otherDF = pl.DataFrame({
... "apple": ['x', 'y', 'z'],
... "ham": ['a', 'b', 'd']
... });
> df.join(otherDF, {on: 'ham'})
shape: (2, 4)
╭─────┬─────┬─────┬───────╮
│ foo ┆ bar ┆ ham ┆ apple │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ str │
╞═════╪═════╪═════╪═══════╡
│ 1 ┆ 6 ┆ "a" ┆ "x" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2 ┆ 7 ┆ "b" ┆ "y" │
╰─────┴─────┴─────┴───────╯
+
+
Perform an asof join. This is similar to a left join except that we match on nearest key rather than equal keys.
Both DataFrames must be sorted by the asof join key.
For each row in the left DataFrame:
- A "backward" search selects the last row in the right DataFrame whose 'on' key is less than or equal to the left's key.
- A "forward" search selects the first row in the right DataFrame whose 'on' key is greater than or equal to the left's key.
The default is "backward".

other — DataFrame to join with.
Optional
allowParallel — Allow the physical plan to optionally evaluate the computation of both DataFrames up to the join in parallel.
Optional
by?: string | string[]
Optional
byLeft — join on these columns before doing the asof join
Optional
byRight — join on these columns before doing the asof join
Optional
forceParallel — Force the physical plan to evaluate the computation of both DataFrames up to the join in parallel.
Optional
leftOn — Join column of the left DataFrame.
Optional
on?: string — Join column of both DataFrames. If set, leftOn and rightOn should be undefined.
Optional
rightOn — Join column of the right DataFrame.
Optional
strategy?: "backward" | "forward" — One of 'forward', 'backward'.
Optional
suffix?: string — Suffix to append to columns with a duplicate name.
Optional
tolerance?: string | number — Numeric tolerance. By setting this the join will only be done if the near keys are within this distance. If the asof join is done on columns of dtype "Date" or "Datetime" you can use the duration string language described under groupByDynamic above. Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds
+> const gdp = pl.DataFrame({
... date: [
... new Date('2016-01-01'),
... new Date('2017-01-01'),
... new Date('2018-01-01'),
... new Date('2019-01-01'),
... ], // note record date: Jan 1st (sorted!)
... gdp: [4164, 4411, 4566, 4696],
... })
> const population = pl.DataFrame({
... date: [
... new Date('2016-05-12'),
... new Date('2017-05-12'),
... new Date('2018-05-12'),
... new Date('2019-05-12'),
... ], // note record date: May 12th (sorted!)
... "population": [82.19, 82.66, 83.12, 83.52],
... })
> population.joinAsof(
... gdp,
... {leftOn:"date", rightOn:"date", strategy:"backward"}
... )
shape: (4, 3)
┌─────────────────────┬────────────┬──────┐
│ date ┆ population ┆ gdp │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ i64 │
╞═════════════════════╪════════════╪══════╡
│ 2016-05-12 00:00:00 ┆ 82.19 ┆ 4164 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2017-05-12 00:00:00 ┆ 82.66 ┆ 4411 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2018-05-12 00:00:00 ┆ 83.12 ┆ 4566 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2019-05-12 00:00:00 ┆ 83.52 ┆ 4696 │
└─────────────────────┴────────────┴──────┘
+
+
Aggregate the columns of this DataFrame to their mean value.
Optional
nullStrategy: "ignore" | "propagate"

Create a spreadsheet-style pivot table as a DataFrame.
values — Column values to aggregate. Can be multiple columns if the columns argument contains multiple columns as well.
Optional
aggregateFunc — Any of: "sum", "max", "min", "mean", "median", "first", "last", "count". Defaults to "first".
index — One or multiple keys to group by.
Optional
maintainOrder — Sort the grouped keys so that the output order is predictable.
Optional
separator?: string — Used as separator/delimiter in generated column names.
Optional
sortColumns — Sort the transposed columns by name. Default is by order of discovery.
> const df = pl.DataFrame(
... {
... "foo": ["one", "one", "one", "two", "two", "two"],
... "bar": ["A", "B", "C", "A", "B", "C"],
... "baz": [1, 2, 3, 4, 5, 6],
... }
... );
> df.pivot("baz", {index:"foo", columns:"bar"});
shape: (2, 4)
┌─────┬─────┬─────┬─────┐
│ foo ┆ A ┆ B ┆ C │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╪═════╡
│ one ┆ 1 ┆ 2 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ two ┆ 4 ┆ 5 ┆ 6 │
└─────┴─────┴─────┴─────┘
+
+
+Rename column names.
+Key value pairs that map from old name to new name.
+> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ['a', 'b', 'c']
... });
> df.rename({"foo": "apple"});
╭───────┬─────┬─────╮
│ apple ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═══════╪═════╪═════╡
│ 1 ┆ 6 ┆ "a" │
├╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 7 ┆ "b" │
├╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 8 ┆ "c" │
╰───────┴─────┴─────╯
+
+
+Replace a column at an index location.
+> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ['a', 'b', 'c']
... });
> const x = pl.Series("apple", [10, 20, 30]);
> df.replaceAtIdx(0, x);
shape: (3, 3)
╭───────┬─────┬─────╮
│ apple ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═══════╪═════╪═════╡
│ 10 ┆ 6 ┆ "a" │
├╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 20 ┆ 7 ┆ "b" │
├╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 30 ┆ 8 ┆ "c" │
╰───────┴─────┴─────╯
+
+
Serializes object to desired format via serde

Shift the values by a given period and fill the parts that will be empty due to this operation with nulls.
periods — Number of places to shift (may be negative).
+> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ['a', 'b', 'c']
... });
> df.shift(1);
shape: (3, 3)
┌──────┬──────┬──────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ null ┆ null ┆ null │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1 ┆ 6 ┆ "a" │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ 7 ┆ "b" │
└──────┴──────┴──────┘
> df.shift(-1)
shape: (3, 3)
┌──────┬──────┬──────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 2 ┆ 7 ┆ "b" │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3 ┆ 8 ┆ "c" │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ null ┆ null ┆ null │
└──────┴──────┴──────┘
+
+
Shift the values by a given period and fill the parts that will be empty due to this operation with the result of the fillValue expression.
> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ['a', 'b', 'c']
... });
> df.shiftAndFill({n: 1, fillValue: 0});
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 0 ┆ 0 ┆ "0" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1 ┆ 6 ┆ "a" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 7 ┆ "b" │
└─────┴─────┴─────┘
+
+
+Shrink memory usage of this DataFrame to fit the exact capacity needed to hold the data.
+Slice this DataFrame over the rows direction.
+Length of the slice
+Offset index.
+> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6.0, 7.0, 8.0],
... "ham": ['a', 'b', 'c']
... });
> df.slice(1, 2); // Alternatively `df.slice({offset:1, length:2})`
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 2 ┆ 7 ┆ "b" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 8 ┆ "c" │
└─────┴─────┴─────┘
+
+
Sort the DataFrame by column.
by — By which columns to sort. Only accepts string.
Optional
descending?: boolean
Optional
nullsLast?: boolean
Optional
maintainOrder?: boolean

Aggregate the columns of this DataFrame to their mean value.
Optional
nullStrategy: "ignore" | "propagate"
Optional
length: number

> const df = pl.DataFrame({
... "letters": ["c", "c", "a", "c", "a", "b"],
... "nrs": [1, 2, 3, 4, 5, 6]
... });
> console.log(df.toString());
shape: (6, 2)
╭─────────┬─────╮
│ letters ┆ nrs │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════╪═════╡
│ "c" ┆ 1 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 2 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "a" ┆ 3 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 4 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "a" ┆ 5 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "b" ┆ 6 │
╰─────────┴─────╯
> df.groupby("letters")
... .tail(2)
... .sort("letters")
shape: (5, 2)
╭─────────┬─────╮
│ letters ┆ nrs │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════╪═════╡
│ "a" ┆ 3 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "a" ┆ 5 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "b" ┆ 6 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 2 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 4 │
╰─────────┴─────╯
+
+
+Transpose a DataFrame over the diagonal.
Optional
options: { columnNames?, headerName?, includeHeader? }
Optional
columnNames — Optional generator/iterator that yields column names. Will be used to replace the columns in the DataFrame.
Optional
headerName — If includeHeader is set, this determines the name of the column that will be inserted.
Optional
includeHeader — If set, the column names will be added as first column.

This is a very expensive operation. Perhaps you can do it differently.
+> const df = pl.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]});
> df.transpose({includeHeader:true})
shape: (2, 4)
┌────────┬──────────┬──────────┬──────────┐
│ column ┆ column_0 ┆ column_1 ┆ column_2 │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 │
╞════════╪══════════╪══════════╪══════════╡
│ a ┆ 1 ┆ 2 ┆ 3 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ b ┆ 1 ┆ 2 ┆ 3 │
└────────┴──────────┴──────────┴──────────┘
// replace the auto generated column names with a list
> df.transpose({includeHeader:false, columnNames:["a", "b", "c"]})
shape: (2, 3)
┌─────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1 ┆ 2 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1 ┆ 2 ┆ 3 │
└─────┴─────┴─────┘
// Include the header as a separate column
> df.transpose({
... includeHeader:true,
... headerName:"foo",
... columnNames:["a", "b", "c"]
... })
shape: (2, 4)
┌─────┬─────┬─────┬─────┐
│ foo ┆ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ a ┆ 1 ┆ 2 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ b ┆ 1 ┆ 2 ┆ 3 │
└─────┴─────┴─────┴─────┘
// Replace the auto generated column with column names from a generator function
> function *namesGenerator() {
...   const baseName = "my_column_";
...   let count = 0;
...   while (true) {
...     yield `${baseName}${count}`;
...     count++;
...   }
... }
> df.transpose({includeHeader:false, columnNames:namesGenerator})
shape: (2, 3)
┌─────────────┬─────────────┬─────────────┐
│ my_column_0 ┆ my_column_1 ┆ my_column_2 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════════════╪═════════════╪═════════════╡
│ 1 ┆ 2 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ 2 ┆ 3 │
└─────────────┴─────────────┴─────────────┘
+
+
Drop duplicate rows from this DataFrame.
Note that this fails if there is a column of type List in the DataFrame.
Optional
maintainOrder: boolean
Optional
subset: ColumnSelection — subset to drop duplicates for
Optional
keep: "first" | "last"

Decompose a struct into its fields. The fields will be inserted into the DataFrame at the location of the struct type.
Names of the struct columns that will be decomposed by its fields
+> const df = pl.DataFrame({
... "int": [1, 2],
... "str": ["a", "b"],
... "bool": [true, null],
... "list": [[1, 2], [3]],
... })
... .toStruct("my_struct")
... .toFrame();
> df
shape: (2, 1)
┌─────────────────────────────┐
│ my_struct │
│ --- │
│ struct[4]{'int',...,'list'} │
╞═════════════════════════════╡
│ {1,"a",true,[1, 2]} │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ {2,"b",null,[3]} │
└─────────────────────────────┘
> df.unnest("my_struct")
shape: (2, 4)
┌─────┬─────┬──────┬────────────┐
│ int ┆ str ┆ bool ┆ list │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ bool ┆ list [i64] │
╞═════╪═════╪══════╪════════════╡
│ 1 ┆ a ┆ true ┆ [1, 2] │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ b ┆ null ┆ [3] │
└─────┴─────┴──────┴────────────┘
+
+
+Unpivot DataFrame to long format.
+Columns to use as identifier variables.
+Values to use as value variables.
+> const df1 = pl.DataFrame({
... 'id': [1],
... 'asset_key_1': ['123'],
... 'asset_key_2': ['456'],
... 'asset_key_3': ['abc'],
... });
> df1.unpivot('id', ['asset_key_1', 'asset_key_2', 'asset_key_3']);
shape: (3, 3)
┌─────┬─────────────┬───────┐
│ id ┆ variable ┆ value │
│ --- ┆ --- ┆ --- │
│ f64 ┆ str ┆ str │
╞═════╪═════════════╪═══════╡
│ 1 ┆ asset_key_1 ┆ 123 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 1 ┆ asset_key_2 ┆ 456 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 1 ┆ asset_key_3 ┆ abc │
└─────┴─────────────┴───────┘
+
+
Upsample a DataFrame at a regular frequency.

The every and offset arguments are created with the following string language:
- 1ns (1 nanosecond)
- 1us (1 microsecond)
- 1ms (1 millisecond)
- 1s (1 second)
- 1m (1 minute)
- 1h (1 hour)
- 1d (1 calendar day)
- 1w (1 calendar week)
- 1mo (1 calendar month)
- 1q (1 calendar quarter)
- 1y (1 calendar year)
Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds

By "calendar day", we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for "calendar week", "calendar month", "calendar quarter", and "calendar year".
+Time column will be used to determine a date range. +Note that this column has to be sorted for the output to make sense.
+Interval will start 'every' duration.
Optional
by: string | string[] — First group by these columns and then upsample for every group.
Optional
maintainOrder: boolean — Keep the ordering predictable. This is slower.
Returns a DataFrame.
Result will be sorted by timeColumn (but note that if by columns are passed, it will only be sorted within each by group).
Upsample a DataFrame by a certain interval.
> const df = pl.DataFrame({
...   "date": [
...     new Date(2024, 1, 1),
...     new Date(2024, 3, 1),
...     new Date(2024, 4, 1),
...     new Date(2024, 5, 1),
...   ],
...   "groups": ["A", "B", "A", "B"],
...   "values": [0, 1, 2, 3],
... })
...   .withColumn(pl.col("date").cast(pl.Date).alias("date"))
...   .sort("date");
> df.upsample({timeColumn: "date", every: "1mo", by: "groups", maintainOrder: true})
...   .select(pl.col("*").forwardFill());
shape: (7, 3)
┌────────────┬────────┬────────┐
│ date ┆ groups ┆ values │
│ --- ┆ --- ┆ --- │
│ date ┆ str ┆ f64 │
╞════════════╪════════╪════════╡
│ 2024-02-01 ┆ A ┆ 0.0 │
│ 2024-03-01 ┆ A ┆ 0.0 │
│ 2024-04-01 ┆ A ┆ 0.0 │
│ 2024-05-01 ┆ A ┆ 2.0 │
│ 2024-04-01 ┆ B ┆ 1.0 │
│ 2024-05-01 ┆ B ┆ 1.0 │
│ 2024-06-01 ┆ B ┆ 3.0 │
└────────────┴────────┴────────┘
Optional
by?: string | string[]
Optional
maintainOrder?: boolean

Grow this DataFrame vertically by stacking a DataFrame to it.
+> const df1 = pl.DataFrame({
... "foo": [1, 2],
... "bar": [6, 7],
... "ham": ['a', 'b']
... });
> const df2 = pl.DataFrame({
... "foo": [3, 4],
... "bar": [8 , 9],
... "ham": ['c', 'd']
... });
> df1.vstack(df2);
shape: (4, 3)
╭─────┬─────┬─────╮
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ "a" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 7 ┆ "b" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 8 ┆ "c" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 4 ┆ 9 ┆ "d" │
╰─────┴─────┴─────╯
+
+
+Return a new DataFrame with the column renamed.
+Expressions that can be used in various contexts.
+Datetime namespace
+List namespace
+String namespace
+Struct namespace
Clip (limit) the values in an array to values that fit within the 64-bit floating point range.
Only works for the following dtypes: {Int32, Int64, Float32, Float64, UInt32}.
If you want to clip other dtypes, consider writing a when -> then -> otherwise expression.
+Minimum value
+Maximum value
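A minimal sketch of clip on a numeric column (frame and values are illustrative):
> const df = pl.DataFrame({ a: [-50, 5, 50, 300] });
> df.select(pl.col("a").clip(1, 10));
> // expected values: [1, 5, 10, 10]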
Sample from this DataFrame by setting either n or frac.
Optional
opts: { n?: number, withReplacement?: boolean, seed?: number | bigint }
Optional
opts: { frac?: number, withReplacement?: boolean, seed?: number | bigint }
Optional
n?: number
Optional
frac?: number
Optional
withReplacement?: boolean
Optional
seed?: number | bigint

Get the group indexes of the group by operation.
Should be used in aggregation context only.
+ >>> const df = pl.DataFrame(
... {
... "group": [
... "one",
... "one",
... "one",
... "two",
... "two",
... "two",
... ],
... "value": [94, 95, 96, 97, 97, 99],
... }
... )
>>> df.groupBy("group").agg(pl.col("value").aggGroups())
shape: (2, 2)
┌───────┬───────────┐
│ group ┆ value │
│ --- ┆ --- │
│ str ┆ list[u32] │
╞═══════╪═══════════╡
│ one ┆ [0, 1, 2] │
│ two ┆ [3, 4, 5] │
└───────┴───────────┘
+
+
+Rename the output of an expression.
+new name
+> const df = pl.DataFrame({
... "a": [1, 2, 3],
... "b": ["a", "b", None],
... });
> df
shape: (3, 2)
╭─────┬──────╮
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪══════╡
│ 1 ┆ "a" │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ "b" │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3 ┆ null │
╰─────┴──────╯
> df.select([
... pl.col("a").alias("bar"),
... pl.col("b").alias("foo"),
... ])
shape: (3, 2)
╭─────┬──────╮
│ bar ┆ foo │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪══════╡
│ 1 ┆ "a" │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ "b" │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3 ┆ null │
╰─────┴──────╯
+
+
+Get the index values that would sort this column.
Optional
reverse?: boolean
false -> order from small to large; true -> order from large to small.
Optional
maintain_order?: boolean

Returns a UInt32 Series.

Calculate the n-th discrete difference.
+number of slots to shift
How to handle nulls: 'ignore' | 'drop'
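For example, a first-order difference that keeps the leading null (frame and values are illustrative):
> const df = pl.DataFrame({ a: [20, 10, 30] });
> df.select(pl.col("a").diff(1, "ignore"));
> // expected values: [null, -10, 20]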
+Exponentially-weighted moving average.
Expr that evaluates to a Float64 Series.
Optional
adjust?: boolean
Optional
alpha?: number
Optional
bias?: boolean
Optional
ignoreNulls?: boolean
Optional
minPeriods?: number

Exponentially-weighted standard deviation.
Expr that evaluates to a Float64 Series.
Optional
adjust?: boolean
Optional
alpha?: number
Optional
bias?: boolean
Optional
ignoreNulls?: boolean
Optional
minPeriods?: number

Exponentially-weighted variance.
Expr that evaluates to a Float64 Series.
Optional
adjust?: boolean
Optional
alpha?: number
Optional
bias?: boolean
Optional
ignoreNulls?: boolean
Optional
minPeriods?: number

Exclude certain columns from a wildcard/regex selection.
+You may also use regexes in the exclude list. They must start with ^
and end with $
.
Rest
...columns: string[]Column(s) to exclude from selection
+ > const df = pl.DataFrame({
... "a": [1, 2, 3],
... "b": ["a", "b", None],
... "c": [None, 2, 1],
...});
> df
shape: (3, 3)
╭─────┬──────┬──────╮
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞═════╪══════╪══════╡
│ 1 ┆ "a" ┆ null │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ "b" ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3 ┆ null ┆ 1 │
╰─────┴──────┴──────╯
> df.select(
... pl.col("*").exclude("b"),
... );
shape: (3, 2)
╭─────┬──────╮
│ a ┆ c │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪══════╡
│ 1 ┆ null │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3 ┆ 1 │
╰─────┴──────╯
+
+
Extend the Series with a given number of values.
The value to extend the Series with. This value may be null to fill with nulls.
The number of values to extend.
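A minimal sketch, assuming the method is exposed as extendConstant (values are illustrative):
> pl.Series("a", [1, 2, 3]).extendConstant(99, 2);
> // expected values: [1, 2, 3, 99, 99]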
+Hash the Series.
Optional
k0?: number
Optional
k1?: number
Optional
k2?: number
Optional
k3?: number

Check if elements of this Series are in the right Series, or List values of the right Series.
+Series of primitive type or List type.
+Expr that evaluates to a Boolean Series.
+> const df = pl.DataFrame({
... "sets": [[1, 2, 3], [1, 2], [9, 10]],
... "optional_members": [1, 2, 3]
... });
> df.select(
... pl.col("optional_members").isIn("sets").alias("contains")
... );
shape: (3, 1)
┌──────────┐
│ contains │
│ --- │
│ bool │
╞══════════╡
│ true │
├╌╌╌╌╌╌╌╌╌╌┤
│ true │
├╌╌╌╌╌╌╌╌╌╌┤
│ false │
└──────────┘
+
+
+Keep the original root name of the expression.
+A groupby aggregation often changes the name of a column.
+With keepName
we can keep the original name of the column
> const df = pl.DataFrame({
... "a": [1, 2, 3],
... "b": ["a", "b", None],
... });
> df
... .groupBy("a")
... .agg(pl.col("b").list())
... .sort({by:"a"});
shape: (3, 2)
╭─────┬────────────╮
│ a ┆ b_agg_list │
│ --- ┆ --- │
│ i64 ┆ list [str] │
╞═════╪════════════╡
│ 1 ┆ [a] │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ [b] │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ [null] │
╰─────┴────────────╯
Keep the original column name:
> df
... .groupBy("a")
... .agg(pl.col("b").list().keepName())
... .sort({by:"a"})
shape: (3, 2)
╭─────┬────────────╮
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ list [str] │
╞═════╪════════════╡
│ 1 ┆ [a] │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ [b] │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ [null] │
╰─────┴────────────╯
+
+
+Apply window function over a subgroup.
This is similar to a groupby + aggregation + self-join, or to window functions in Postgres.
+Rest
...partitionBy: ExprOrString[]Column(s) to partition by.
+> const df = pl.DataFrame({
... "groups": [1, 1, 2, 2, 1, 2, 3, 3, 1],
... "values": [1, 2, 3, 4, 5, 6, 7, 8, 8],
... });
> df.select(
... pl.col("groups").sum().over("groups")
... );
╭────────┬────────╮
│ groups ┆ values │
│ --- ┆ --- │
│ i32 ┆ i32 │
╞════════╪════════╡
│ 1 ┆ 16 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1 ┆ 16 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2 ┆ 13 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2 ┆ 13 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ ... ┆ ... │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1 ┆ 16 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2 ┆ 13 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3 ┆ 15 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3 ┆ 15 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1 ┆ 16 │
╰────────┴────────╯
+
+
Add a prefix to the root column name of the expression.
+> const df = pl.DataFrame({
... "A": [1, 2, 3, 4, 5],
... "fruits": ["banana", "banana", "apple", "apple", "banana"],
... "B": [5, 4, 3, 2, 1],
... "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
... });
shape: (5, 4)
╭─────┬──────────┬─────┬──────────╮
│ A ┆ fruits ┆ B ┆ cars │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞═════╪══════════╪═════╪══════════╡
│ 1 ┆ "banana" ┆ 5 ┆ "beetle" │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ "banana" ┆ 4 ┆ "audi" │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ "apple" ┆ 3 ┆ "beetle" │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4 ┆ "apple" ┆ 2 ┆ "beetle" │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 5 ┆ "banana" ┆ 1 ┆ "beetle" │
╰─────┴──────────┴─────┴──────────╯
> df.select(
... pl.col("*").reverse().prefix("reverse_"),
... )
shape: (5, 4)
╭───────────┬────────────────┬───────────┬──────────────╮
│ reverse_A ┆ reverse_fruits ┆ reverse_B ┆ reverse_cars │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞═══════════╪════════════════╪═══════════╪══════════════╡
│ 5 ┆ "banana" ┆ 1 ┆ "beetle" │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4 ┆ "apple" ┆ 2 ┆ "beetle" │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ "apple" ┆ 3 ┆ "beetle" │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ "banana" ┆ 4 ┆ "audi" │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ "banana" ┆ 5 ┆ "beetle" │
╰───────────┴────────────────┴───────────┴──────────────╯
+
+
+Replace the given values by different values of the same data type.
+Value or sequence of values to replace. +Accepts expression input. Sequences are parsed as Series, other non-expression inputs are parsed as literals.
+Value or sequence of values to replace by.
+Accepts expression input. Sequences are parsed as Series, other non-expression inputs are parsed as literals.
+Length must match the length of old
or have length 1.
Replace a single value by another value. Values that were not replaced remain unchanged.
+ >>> const df = pl.DataFrame({"a": [1, 2, 2, 3]});
>>> df.withColumns(pl.col("a").replace(2, 100).alias("replaced"));
shape: (4, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪══════════╡
│ 1 ┆ 1 │
│ 2 ┆ 100 │
│ 2 ┆ 100 │
│ 3 ┆ 3 │
└─────┴──────────┘
+
+
+Replace multiple values by passing sequences to the old
and new_
parameters.
>>> df.withColumns(pl.col("a").replace([2, 3], [100, 200]).alias("replaced"));
shape: (4, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪══════════╡
│ 1 ┆ 1 │
│ 2 ┆ 100 │
│ 2 ┆ 100 │
│ 3 ┆ 200 │
└─────┴──────────┘
+
+
Passing a mapping with replacements is also supported as syntactic sugar. Values that were not matched remain unchanged.
 >>> const mapping = {2: 100, 3: 200};
>>> df.withColumns(pl.col("a").replace({ old: mapping }).alias("replaced"));
shape: (4, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪══════════╡
│ 1 ┆ 1 │
│ 2 ┆ 100 │
│ 2 ┆ 100 │
│ 3 ┆ 200 │
└─────┴──────────┘
+
+
+Replace values by different values.
+Value or sequence of values to replace. +Accepts expression input. Sequences are parsed as Series, other non-expression inputs are parsed as literals.
+Value or sequence of values to replace by.
+Accepts expression input. Sequences are parsed as Series, other non-expression inputs are parsed as literals.
+Length must match the length of old
or have length 1.
Optional
default_: Set values that were not replaced to this value. +Defaults to keeping the original value. +Accepts expression input. Non-expression inputs are parsed as literals.
+Optional
returnDtype: DataType
The data type of the resulting expression. If set to null (default), the data type is determined automatically based on the other inputs.
Replace a single value by another value. Values that were not replaced remain unchanged.
+ >>> const df = pl.DataFrame({"a": [1, 2, 2, 3]});
>>> df.withColumns(pl.col("a").replace(2, 100).alias("replaced"));
shape: (4, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪══════════╡
│ 1 ┆ 1 │
│ 2 ┆ 100 │
│ 2 ┆ 100 │
│ 3 ┆ 3 │
└─────┴──────────┘
+
+
+Replace multiple values by passing sequences to the old
and new_
parameters.
>>> df.withColumns(pl.col("a").replace([2, 3], [100, 200]).alias("replaced"));
shape: (4, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪══════════╡
│ 1 ┆ 1 │
│ 2 ┆ 100 │
│ 2 ┆ 100 │
│ 3 ┆ 200 │
└─────┴──────────┘
+
+
+Passing a mapping with replacements is also supported as syntactic sugar. +Specify a default to set all values that were not matched.
+ >>> const mapping = {2: 100, 3: 200};
>>> df.withColumns(pl.col("a").replaceStrict({ old: mapping, default_: -1, returnDtype: pl.Int64 }).alias("replaced"));
shape: (4, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪══════════╡
│ 1 ┆ -1 │
│ 2 ┆ 100 │
│ 2 ┆ 100 │
│ 3 ┆ 200 │
└─────┴──────────┘
+
+
Replacing by values of a different data type sets the return type based on a combination of the new_ data type and either the original data type or the default data type if it was set.
>>> const df = pl.DataFrame({"a": ["x", "y", "z"]});
>>> const mapping = {"x": 1, "y": 2, "z": 3};
>>> df.withColumns(pl.col("a").replaceStrict({ old: mapping }).alias("replaced"));
shape: (3, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪══════════╡
│ x ┆ 1 │
│ y ┆ 2 │
│ z ┆ 3 │
└─────┴──────────┘
>>> df.withColumns(pl.col("a").replaceStrict({ old: mapping, default_: null }).alias("replaced"));
shape: (3, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════════╡
│ x ┆ 1 │
│ y ┆ 2 │
│ z ┆ 3 │
└─────┴──────────┘
+
+
+Set the returnDtype
parameter to control the resulting data type directly.
>>> df.withColumns(pl.col("a").replaceStrict({ old: mapping, returnDtype: pl.UInt8 }).alias("replaced"));
shape: (3, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ str ┆ u8 │
╞═════╪══════════╡
│ x ┆ 1 │
│ y ┆ 2 │
│ z ┆ 3 │
└─────┴──────────┘
+
+
+Expression input is supported for all parameters.
+ >>> const df = pl.DataFrame({"a": [1, 2, 2, 3], "b": [1.5, 2.5, 5.0, 1.0]});
>>> df.withColumns(
... pl.col("a").replaceStrict({
... old: pl.col("a").max(),
... new_: pl.col("b").sum(),
... default_: pl.col("b"),
... }).alias("replaced")
... );
shape: (4, 3)
┌─────┬─────┬──────────┐
│ a ┆ b ┆ replaced │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 │
╞═════╪═════╪══════════╡
│ 1 ┆ 1.5 ┆ 1.5 │
│ 2 ┆ 2.5 ┆ 2.5 │
│ 2 ┆ 5.0 ┆ 5.0 │
│ 3 ┆ 1.0 ┆ 10.0 │
└─────┴─────┴──────────┘
+
+
+Serializes object to desired format via serde
+Shift the values by a given period and fill the parts that will be empty due to this operation
+Optional
periods: number
Number of places to shift (may be negative).
+Shift the values by a given period and fill the parts that will be empty due to this operation
+Number of places to shift (may be negative).
+Fill null values with the result of this expression.
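A minimal sketch of shiftAndFill (frame and values are illustrative):
> const df = pl.DataFrame({ a: [1, 2, 3] });
> df.select(pl.col("a").shiftAndFill(1, 0));
> // expected values: [0, 1, 2]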
+Compute the sample skewness of a data set. +For normally distributed data, the skewness should be about zero. For +unimodal continuous distributions, a skewness value greater than zero means +that there is more weight in the right tail of the distribution.
Optional
bias?: boolean
If false, the calculations are corrected for statistical bias.
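For example (values are illustrative; the result is a single-row aggregation):
> const df = pl.DataFrame({ a: [1, 2, 3, 10] });
> df.select(pl.col("a").skew());
> // biased sample skewness, roughly 1.02 for this data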
Sort this column. In projection/selection context the whole column is sorted.
Optional
reverse?: boolean
Optional
nullsLast?: boolean
If true, nulls are considered to be larger than any valid value.

Sort this column by the ordering of another column, or multiple other columns.
In projection/selection context the whole column is sorted.
If used in a groupby context, the groups are sorted.
The column(s) used for sorting.
Optional
reverse?: boolean | boolean[]
false -> order from small to large; true -> order from large to small.

Apply a rolling max (moving max) over the values in this Series.
A window of length windowSize will traverse the series. The values that fill this window are (optionally) multiplied by the weights vector, and the result is then aggregated.
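A minimal sketch of one rolling aggregation; the same pattern applies to the other rolling methods below (frame and window size are illustrative):
> const df = pl.DataFrame({ a: [1, 3, 2, 5, 4] });
> df.select(pl.col("a").rollingMax(2));
> // expected values: [null, 3, 3, 5, 5]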
Optional
weights?: number[]
Optional
minPeriods?: number[]
Optional
center?: boolean

Apply a rolling mean (moving mean) over the values in this Series.
A window of length windowSize will traverse the series. The values that fill this window are (optionally) multiplied by the weights vector, and the result is then aggregated.
Optional
weights?: number[]
Optional
minPeriods?: number[]
Optional
center?: boolean

Apply a rolling min (moving min) over the values in this Series.
A window of length windowSize will traverse the series. The values that fill this window are (optionally) multiplied by the weights vector, and the result is then aggregated.
Optional
weights?: number[]
Optional
minPeriods?: number[]
Optional
center?: boolean

Compute a rolling quantile.
Optional
interpolation?: InterpolationMethod
Optional
windowSize?: number
Optional
weights?: number[]
Optional
minPeriods?: number[]
Optional
center?: boolean
Optional
by?: string
Optional
closed?: ClosedWindow

Compute a rolling skew.
Size of the rolling window.
Optional
bias?: boolean
If false, the calculations are corrected for statistical bias.

Compute a rolling skew.

Compute a rolling std dev.
A window of length windowSize will traverse the array. The values that fill this window are (optionally) multiplied by the weights vector, and the result is then aggregated.
Optional
weights?: number[]
Optional
minPeriods?: number[]
Optional
center?: boolean
Optional
ddof?: number

Apply a rolling sum (moving sum) over the values in this Series.
A window of length windowSize will traverse the series. The values that fill this window are (optionally) multiplied by the weights vector, and the result is then aggregated.
Optional
weights?: number[]
Optional
minPeriods?: number[]
Optional
center?: boolean

Compute a rolling variance.
A window of length windowSize will traverse the series. The values that fill this window are (optionally) multiplied by the weights vector, and the result is then aggregated.
Optional
weights?: number[]
Optional
minPeriods?: number[]
Optional
center?: boolean
Optional
ddof?: number

Representation of a Lazy computation graph / query.
+Cache the result once the execution of the physical plan hits this node.
+Collect into a DataFrame.
Note: use fetch if you want to run this query on only the first n rows.
This can be a huge time saver when debugging queries.
Optional
opts: LazyOptions
Returns a DataFrame.
Optional
opts: LazyOptions

A string representation of the optimized query plan.
+Optional
opts: LazyOptions

Drop duplicate rows from this DataFrame.
+Note that this fails if there is a column of type List
in the DataFrame.
Optional
maintainOrder?: boolean
Optional
subset?: ColumnSelection
Subset of columns to consider when dropping duplicates.
Optional
keep?: "first" | "last"
Which of the duplicate rows to keep.

Remove one or multiple columns from a DataFrame.
+Rest
...names: string[]Drop rows with null values from this DataFrame. +This method only drops nulls row-wise if any single value of the row is null.
+Rest
...columns: string[]Explode lists to long format.
+Rest
...columns: ExprOrString[]Fetch is like a collect operation, but it overwrites the number of rows read by every scan
+Note that the fetch does not guarantee the final number of rows in the DataFrame. +Filter, join operations and a lower number of rows available in the scanned file influence +the final number of rows.
Optional
numRows: number
Collect 'n' rows from the data source.
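A minimal sketch (the file path is hypothetical):
> const lf = pl.scanCSV("./big_file.csv");
> const preview = await lf.fetch(100); // reads roughly 100 rows per scan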
+Fill missing values
+Filter the rows in the DataFrame based on a predicate expression.
+> lf = pl.DataFrame({
> "foo": [1, 2, 3],
> "bar": [6, 7, 8],
> "ham": ['a', 'b', 'c']
> }).lazy()
> // Filter on one condition
> lf.filter(pl.col("foo").lt(3)).collect()
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 7 ┆ b │
└─────┴─────┴─────┘
+
+
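A second sketch, combining predicates with and (building on the frame above):
> // Filter on multiple conditions
> lf.filter(
...   pl.col("foo").lt(3).and(pl.col("ham").eq(pl.lit("a")))
... ).collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
└─────┴─────┴─────┘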
+Groups based on a time value (or index value of type Int32, Int64). Time windows are calculated and rows are assigned to windows. +Different from a normal groupby is that a row can be member of multiple groups. The time/index window could +be seen as a rolling window, with a window size determined by dates/times/values instead of slots in the DataFrame.
+A window is defined by:
+The every
, period
and offset
arguments are created with
the following string language:
- "1ns" (1 nanosecond)
- "1us" (1 microsecond)
- "1ms" (1 millisecond)
- "1s" (1 second)
- "1m" (1 minute)
- "1h" (1 hour)
- "1d" (1 calendar day)
- "1w" (1 calendar week)
- "1mo" (1 calendar month)
- "1q" (1 calendar quarter)
- "1y" (1 calendar year)
- "1i" (1 index count)
Or combine them: "3d12h4m25s" (3 days, 12 hours, 4 minutes, and 25 seconds)

In case of a groupbyDynamic on an integer column, the windows are defined by:
- "1i" (length 1)
- "10i" (length 10)
Optional
by?: ColumnsOrExpr
Optional
check_sorted?: boolean
Optional
closed?: ClosedWindow
Optional
includeBoundaries?: boolean
Optional
offset?: string
Optional
period?: string

Create rolling groups based on a time column (or index value of type Int32, Int64).
+Different from a rolling groupby the windows are now determined by the individual values and are not of constant +intervals. For constant intervals use groupByDynamic
+The period
and offset
arguments are created with
the following string language:
- "1ns" (1 nanosecond)
- "1us" (1 microsecond)
- "1ms" (1 millisecond)
- "1s" (1 second)
- "1m" (1 minute)
- "1h" (1 hour)
- "1d" (1 calendar day)
- "1w" (1 calendar week)
- "1mo" (1 calendar month)
- "1q" (1 calendar quarter)
- "1y" (1 calendar year)
- "1i" (1 index count)
Or combine them: "3d12h4m25s" (3 days, 12 hours, 4 minutes, and 25 seconds)

In case of a groupby_rolling on an integer column, the windows are defined by:
- "1i" (length 1)
- "10i" (length 10)
+Optional
by?: ColumnsOrExpr
Optional
check_sorted?: boolean
Optional
closed?: ClosedWindow
Optional
offset?: string
>dates = [
... "2020-01-01 13:45:48",
... "2020-01-01 16:42:13",
... "2020-01-01 16:45:09",
... "2020-01-02 18:12:48",
... "2020-01-03 19:45:32",
... "2020-01-08 23:16:43",
... ]
>df = pl.DataFrame({"dt": dates, "a": [3, 7, 5, 9, 2, 1]}).withColumn(
... pl.col("dt").str.strptime(pl.Datetime)
... )
>out = df.groupbyRolling({indexColumn:"dt", period:"2d"}).agg(
... [
... pl.sum("a").alias("sum_a"),
... pl.min("a").alias("min_a"),
... pl.max("a").alias("max_a"),
... ]
... )
>assert(out["sum_a"].toArray() === [3, 10, 15, 24, 11, 1])
>assert(out["max_a"].toArray() === [3, 7, 7, 9, 9, 1])
>assert(out["min_a"].toArray() === [3, 3, 3, 3, 2, 1])
>out
shape: (6, 4)
┌─────────────────────┬───────┬───────┬───────┐
│ dt ┆ sum_a ┆ max_a ┆ min_a │
│ --- ┆ --- ┆ --- ┆ --- │
│ datetime[ms] ┆ i64 ┆ i64 ┆ i64 │
╞═════════════════════╪═══════╪═══════╪═══════╡
│ 2020-01-01 13:45:48 ┆ 3 ┆ 3 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-01 16:42:13 ┆ 10 ┆ 7 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-01 16:45:09 ┆ 15 ┆ 7 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-02 18:12:48 ┆ 24 ┆ 9 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-03 19:45:32 ┆ 11 ┆ 9 ┆ 2 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-08 23:16:43 ┆ 1 ┆ 1 ┆ 1 │
└─────────────────────┴───────┴───────┴───────┘
+
+
+Gets the first n
rows of the DataFrame. You probably don't want to use this!
Consider using the fetch
operation.
+The fetch
operation will truly load the first n
rows lazily.
Optional
length: numberSQL like joins.
+>>> const df = pl.DataFrame({
>>> foo: [1, 2, 3],
>>> bar: [6.0, 7.0, 8.0],
>>> ham: ['a', 'b', 'c'],
>>> }).lazy()
>>>
>>> const otherDF = pl.DataFrame({
>>> apple: ['x', 'y', 'z'],
>>> ham: ['a', 'b', 'd'],
>>> }).lazy();
>>> const result = await df.join(otherDF, { on: 'ham', how: 'inner' }).collect();
shape: (2, 4)
╭─────┬─────┬─────┬───────╮
│ foo ┆ bar ┆ ham ┆ apple │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ str │
╞═════╪═════╪═════╪═══════╡
│ 1 ┆ 6 ┆ "a" ┆ "x" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2 ┆ 7 ┆ "b" ┆ "y" │
╰─────┴─────┴─────┴───────╯
+
+
Optional
allowParallel?: boolean
Optional
forceParallel?: boolean
Optional
suffix?: string

Perform an asof join. This is similar to a left join except that we match on the nearest key rather than equal keys.
+Both DataFrames must be sorted by the asof_join key.
+For each row in the left DataFrame:
+A "backward" search selects the last row in the right DataFrame whose +'on' key is less than or equal to the left's key.
+A "forward" search selects the first row in the right DataFrame whose +'on' key is greater than or equal to the left's key.
+The default is "backward".
+DataFrame to join with.
+Optional
allowParallel?: boolean
Allow the physical plan to optionally evaluate the computation of both DataFrames up to the join in parallel.
+Optional
by?: string | string[]
Optional
byLeft?: string | string[]
Join on these columns before doing the asof join.
Optional
byRight?: string | string[]
Join on these columns before doing the asof join.
Optional
forceParallel?: boolean
Force the physical plan to evaluate the computation of both DataFrames up to the join in parallel.
+Optional
leftOn?: string
Join column of the left DataFrame.
+Optional
on?: stringJoin column of both DataFrames. If set, leftOn
and rightOn
should be undefined.
Optional
rightOn?: string
Join column of the right DataFrame.
+Optional
strategy?: "backward" | "forward"One of {'forward', 'backward'}
+Optional
suffix?: stringSuffix to append to columns with a duplicate name.
+Optional
tolerance?: string | number
Numeric tolerance. By setting this, the join will only be done if the near keys are within this distance. If an asof join is done on columns of dtype "Date" or "Datetime", you can use the following string language:
- "1ns" (1 nanosecond)
- "1us" (1 microsecond)
- "1ms" (1 millisecond)
- "1s" (1 second)
- "1m" (1 minute)
- "1h" (1 hour)
- "1d" (1 calendar day)
- "1w" (1 calendar week)
- "1mo" (1 calendar month)
- "1q" (1 calendar quarter)
- "1y" (1 calendar year)
- "1i" (1 index count)
Or combine them: "3d12h4m25s" (3 days, 12 hours, 4 minutes, and 25 seconds)
+ >const gdp = pl.DataFrame({
... date: [
... new Date('2016-01-01'),
... new Date('2017-01-01'),
... new Date('2018-01-01'),
... new Date('2019-01-01'),
... ], // note record date: Jan 1st (sorted!)
... gdp: [4164, 4411, 4566, 4696],
... })
>const population = pl.DataFrame({
... date: [
... new Date('2016-05-12'),
... new Date('2017-05-12'),
... new Date('2018-05-12'),
... new Date('2019-05-12'),
... ], // note record date: May 12th (sorted!)
... "population": [82.19, 82.66, 83.12, 83.52],
... })
>population.joinAsof(
... gdp,
... {leftOn:"date", rightOn:"date", strategy:"backward"}
... )
shape: (4, 3)
┌─────────────────────┬────────────┬──────┐
│ date ┆ population ┆ gdp │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ i64 │
╞═════════════════════╪════════════╪══════╡
│ 2016-05-12 00:00:00 ┆ 82.19 ┆ 4164 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2017-05-12 00:00:00 ┆ 82.66 ┆ 4411 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2018-05-12 00:00:00 ┆ 83.12 ┆ 4566 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2019-05-12 00:00:00 ┆ 83.52 ┆ 4696 │
└─────────────────────┴────────────┴──────┘
+
+
+Get the last row of the DataFrame.
+Optional
n: number

Reverse the DataFrame.

Select columns from this DataFrame.
Rest
...columns: ExprOrString[]

Serializes object to desired format via serde
+Evaluate the query in streaming mode and write to a CSV file.
+.. warning:: +Streaming mode is considered unstable. It may be changed +at any point without it being considered a breaking change.
+This allows streaming results that are larger than RAM to be written to disk.
+File path to which the file should be written.
+Optional
options: SinkCsvOptions

Evaluate the query in streaming mode and write to a Parquet file.
+.. warning:: +Streaming mode is considered unstable. It may be changed +at any point without it being considered a breaking change.
+This allows streaming results that are larger than RAM to be written to disk.
+File path to which the file should be written.
+Optional
options: SinkParquetOptions
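A minimal sketch of streaming a lazy query straight to disk (file paths are hypothetical):
> await pl.scanCSV("./big_input.csv").sinkParquet("./out.parquet");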
Sort the LazyFrame by the given column or expression.
Optional
descending?: ValueOrArray<boolean>
Optional
nulls_last?: boolean
Optional
maintain_order?: boolean

Aggregate the columns in the DataFrame to their sum value.
+Get the last n
rows of the DataFrame.
Optional
length: number

Drop duplicate rows from this DataFrame.
Note that this fails if there is a column of type List in the DataFrame.
Optional
maintainOrder?: boolean
Optional
subset?: ColumnSelection
Subset of columns to consider when dropping duplicates.
Optional
keep?: "first" | "last"
Which of the duplicate rows to keep.

Aggregate the columns in the DataFrame to their variance value.
+Add or overwrite column in a DataFrame.
+Add or overwrite multiple columns in a DataFrame.
+Add a column at index 0 that counts the rows.
+A Series represents a single column in a polars DataFrame.
+Get an array with the cumulative product computed at every element.
+Optional
reverse: booleanreverse the operation
Clip (limit) the values in an array to values that fit within the 64-bit floating point range.
Only works for the following dtypes: {Int32, Int64, Float32, Float64, UInt32}.
If you want to clip other dtypes, consider writing a when -> then -> otherwise expression.
+Minimum value
+Maximum value
Round underlying floating point data by decimals digits.
Similar functionality to JavaScript's toFixed.
Number of decimals to round by.
Sample from this DataFrame by setting either n or frac.
Optional
opts: { n?: number, withReplacement?: boolean, seed?: number | bigint }
Optional
opts: { frac?: number, withReplacement?: boolean, seed?: number | bigint }
Optional
n?: number
Optional
frac?: number
Optional
withReplacement?: boolean
Optional
seed?: number | bigint

Index location of the sorted variant of this Series.
+indexes - Indexes that can be used to sort this array.
Quick summary statistics of a series.
+Series with mixed datatypes will return summary statistics for the datatype of the first value.
+> const seriesNum = pl.Series([1,2,3,4,5])
> seriesNum.describe()
shape: (6, 2)
┌──────────────┬────────────────────┐
│ statistic ┆ value │
│ --- ┆ --- │
│ str ┆ f64 │
╞══════════════╪════════════════════╡
│ "min" ┆ 1 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "max" ┆ 5 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "null_count" ┆ 0.0 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "mean" ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "std" ┆ 1.5811388300841898 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "count" ┆ 5 │
└──────────────┴────────────────────┘
> const seriesStr = pl.Series(["a", "a", null, "b", "c"])
> seriesStr.describe()
shape: (3, 2)
┌──────────────┬───────┐
│ statistic ┆ value │
│ --- ┆ --- │
│ str ┆ i64 │
╞══════════════╪═══════╡
│ "unique" ┆ 4 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ "null_count" ┆ 1 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ "count" ┆ 5 │
└──────────────┴───────┘
+
+
+Calculates the n-th discrete difference.
+number of slots to shift
+'ignore' | 'drop'
Exponentially-weighted moving average.
Expr that evaluates to a Float64 Series.
Optional
adjust?: boolean
Optional
alpha?: number
Optional
bias?: boolean
Optional
ignoreNulls?: boolean
Optional
minPeriods?: number

Exponentially-weighted standard deviation.
Expr that evaluates to a Float64 Series.
Optional
adjust?: boolean
Optional
alpha?: number
Optional
bias?: boolean
Optional
ignoreNulls?: boolean
Optional
minPeriods?: number

Exponentially-weighted variance.
Expr that evaluates to a Float64 Series.
Optional
adjust?: boolean
Optional
alpha?: number
Optional
bias?: boolean
Optional
ignoreNulls?: boolean
Optional
minPeriods?: number

Fill null values with a filling strategy.
+Filling Strategy
+Hash the Series
+The hash value is of type UInt64
Optional
k0?: number | bigint (seed parameter)
Optional
k1?: number | bigint (seed parameter)
Optional
k2?: number | bigint (seed parameter)
Optional
k3?: number | bigint (seed parameter)

Interpolate intermediate values.
+The interpolation method is linear.
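For example (values are illustrative):
> pl.Series("a", [1, null, 3]).interpolate();
> // expected values: [1, 2, 3]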
+Optional
method: InterpolationMethod

Compute the kurtosis (Fisher or Pearson) of a dataset.
+Kurtosis is the fourth central moment divided by the square of the +variance. If Fisher's definition is used, then 3.0 is subtracted from +the result to give 0.0 for a normal distribution. +If bias is False then the kurtosis is calculated using k statistics to +eliminate bias coming from biased moment estimators
+Optional
bias?: boolean
Optional
fisher?: boolean

Assign ranks to data, dealing with ties appropriately.
+Optional
method: RankMethod
The method used to assign ranks to tied elements. The following methods are available (default is 'average'):
- 'average': the average of the ranks that would have been assigned to all tied values.
- 'min': the minimum of the ranks that would have been assigned to all tied values.
- 'max': the maximum of the ranks that would have been assigned to all tied values.
- 'dense': like 'min', but the rank of the next highest element is assigned the rank immediately after those assigned to the tied elements.
- 'ordinal': all values are given a distinct rank, corresponding to the order in which they occur.
- 'random': like 'ordinal', but the rank for ties is not dependent on the order in which the values occur.

Reinterpret the underlying bits as a signed/unsigned integer.
This operation is only allowed for 64-bit integers; for integers with fewer bits you can safely use the cast operation.
Optional
signed?: boolean
signed or unsigned
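A minimal sketch (the cast to UInt64 is only there to satisfy the 64-bit requirement):
> const s = pl.Series("a", [1, 2, 3]).cast(pl.UInt64);
> s.reinterpret(true); // view the same 64-bit data as Int64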
+Rename this Series.
+new name
+Optional
inPlace?: boolean

Serializes object to desired format via serde
+Check if series is equal with another Series.
+Shift the values by a given period
+the parts that will be empty due to this operation will be filled with fillValue
.
Number of places to shift (may be negative).
+Fill null & undefined values with the result of this expression.
Compute the sample skewness of a data set.
For normally distributed data, the skewness should be about zero. For unimodal continuous distributions, a skewness value greater than zero means that there is more weight in the right tail of the distribution. The function skewtest can be used to determine if the skewness value is close enough to zero, statistically speaking.
Optional
bias: booleanIf false, then the calculations are corrected for statistical bias.
+Get dummy/indicator variables.
+Optional
separator: string
Optional
dropFirst: boolean

const s = pl.Series("a", [1, 2, 3])
>>> s.toDummies()
shape: (3, 3)
┌─────┬─────┬─────┐
│ a_1 ┆ a_2 ┆ a_3 │
│ --- ┆ --- ┆ --- │
│ u8 ┆ u8 ┆ u8 │
╞═════╪═════╪═════╡
│ 1 ┆ 0 ┆ 0 │
│ 0 ┆ 1 ┆ 0 │
│ 0 ┆ 0 ┆ 1 │
└─────┴─────┴─────┘
>>> s.toDummies(":", true)
shape: (3, 2)
┌─────┬─────┐
│ a:2 ┆ a:3 │
│ --- ┆ --- │
│ u8 ┆ u8 │
╞═════╪═════╡
│ 0 ┆ 0 │
│ 1 ┆ 0 │
│ 0 ┆ 1 │
└─────┴─────┘
+
+
+Count the unique values in a Series.
+Optional
sort: booleanSort the output by count in descending order.
If set to false (default), the order of the output is random.
Optional
parallel: booleanExecute the computation in parallel. +.. note:: +This option should likely not be enabled in a group by context, +as the computation is already parallelized per group.
+Optional
name: stringGive the resulting count column a specific name;
if normalize is true, defaults to "count"; otherwise defaults to "proportion".
Optional
normalize: booleanIf true gives relative frequencies of the unique values
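A minimal sketch, assuming the positional form valueCounts(sort) (values are illustrative):
> pl.Series("a", ["x", "x", "y"]).valueCounts(true);
> // two rows: ("x", 2) and ("y", 1)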
Apply a rolling max (moving max) over the values in this Series.
A window of length windowSize will traverse the series. The values that fill this window are (optionally) multiplied by the weights vector, and the result is then aggregated.
Optional
weights?: number[]
Optional
minPeriods?: number[]
Optional
center?: boolean

Apply a rolling mean (moving mean) over the values in this Series.
A window of length windowSize will traverse the series. The values that fill this window are (optionally) multiplied by the weights vector, and the result is then aggregated.
Optional
weights?: number[]
Optional
minPeriods?: number[]
Optional
center?: boolean

Compute a rolling median.
Optional
weights?: number[]
Optional
minPeriods?: number[]
Optional
center?: boolean

Apply a rolling min (moving min) over the values in this Series.
A window of length windowSize will traverse the series. The values that fill this window are (optionally) multiplied by the weights vector, and the result is then aggregated.
Optional
weights?: number[]
Optional
minPeriods?: number[]
Optional
center?: boolean

Compute a rolling quantile.
Optional
interpolation?: InterpolationMethod
Optional
windowSize?: number
Optional
weights?: number[]
Optional
minPeriods?: number[]
Optional
center?: boolean
Optional
by?: string
Optional
closed?: ClosedWindow

Compute a rolling skew.
Size of the rolling window.
Optional
bias?: boolean
If false, the calculations are corrected for statistical bias.

Compute a rolling skew.

Compute a rolling std dev.
A window of length windowSize will traverse the array. The values that fill this window are (optionally) multiplied by the weights vector, and the result is then aggregated.
Optional
weights?: number[]
Optional
minPeriods?: number[]
Optional
center?: boolean
Optional
ddof?: number

Apply a rolling sum (moving sum) over the values in this Series.
A window of length windowSize will traverse the series. The values that fill this window are (optionally) multiplied by the weights vector, and the result is then aggregated.
Optional
weights?: number[]
Optional
minPeriods?: number[]
Optional
center?: boolean

Compute a rolling variance.
A window of length windowSize will traverse the series. The values that fill this window are (optionally) multiplied by the weights vector, and the result is then aggregated.
Optional
weights?: number[]
Optional
minPeriods?: number[]
Optional
center?: boolean
Optional
ddof?: number

ClosedWindow types
+Downsample rules
+Fill null strategies
+Interpolation types
+Join types
Options for lazy operations
+Rank methods
Calendar date and time type