From 9f98a90fa1fe06f31de8bcdfb7893dd78667dc17 Mon Sep 17 00:00:00 2001
From: philss The DataFrame struct and API. Dataframes are two-dimensional tabular data structures similar to a spreadsheet.
-For example, the Iris dataset: This dataframe has 150 rows and five columns. Each column is an put(backend)
Examples
-iex> Explorer.Backend.put(Lib.CustomBackend)
+
diff --git a/Explorer.DataFrame.html b/Explorer.DataFrame.html
index 2cbea44f7..5db57a4ad 100644
--- a/Explorer.DataFrame.html
+++ b/Explorer.DataFrame.html
@@ -113,38 +113,38 @@ iex> Explorer.Backend.put(Lib.CustomBackend)
Explorer.PolarsBackend
-iex> Explorer.Backend.get()
+iex> Explorer.Backend.get()
Lib.CustomBackend
iex> Explorer.Datasets.iris()
-#Explorer.DataFrame<
- Polars[150 x 5]
- sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
- sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
- petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
- petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
Explorer.Series
-of the same size (150):iex> df = Explorer.Datasets.iris()
-iex> df["sepal_length"]
-#Explorer.Series<
- Polars[150]
- float [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5.0, 5.0, 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5.0, 5.5, 4.9, 4.4, 5.1, 5.0, 4.5, 4.4, 5.0, 5.1, 4.8, 5.1, 4.6, 5.3, 5.0, ...]
->
+For example, the Iris dataset:
iex> Explorer.Datasets.iris()
+#Explorer.DataFrame<
+ Polars[150 x 5]
+ sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
+ sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
+ petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
+ petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
This dataframe has 150 rows and five columns. Each column is an Explorer.Series
+of the same size (150):
iex> df = Explorer.Datasets.iris()
+iex> df["sepal_length"]
+#Explorer.Series<
+ Polars[150]
+ float [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5.0, 5.0, 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5.0, 5.5, 4.9, 4.4, 5.1, 5.0, 4.5, 4.4, 5.0, 5.1, 4.8, 5.1, 4.6, 5.3, 5.0, ...]
+>
Dataframes can be created from normal Elixir terms. The main way you might do this is
-with the new/1
function. For example:
iex> Explorer.DataFrame.new(a: ["a", "b"], b: [1, 2])
-#Explorer.DataFrame<
- Polars[2 x 2]
- a string ["a", "b"]
- b integer [1, 2]
->
Or with a list of maps:
iex> Explorer.DataFrame.new([%{"col1" => "a", "col2" => 1}, %{"col1" => "b", "col2" => 2}])
-#Explorer.DataFrame<
- Polars[2 x 2]
- col1 string ["a", "b"]
- col2 integer [1, 2]
->
new/1
function. For example:iex> Explorer.DataFrame.new(a: ["a", "b"], b: [1, 2])
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ a string ["a", "b"]
+ b integer [1, 2]
+>
Or with a list of maps:
iex> Explorer.DataFrame.new([%{"col1" => "a", "col2" => 1}, %{"col1" => "b", "col2" => 2}])
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ col1 string ["a", "b"]
+ col2 integer [1, 2]
+>
Explorer supports reading and writing of:
Adbc
in from_query/3
The convention Explorer uses is to have from_*
and to_*
functions to read and write
to files in the formats above. load_*
and dump_*
versions are also available to read
and write those formats directly in memory.
Files can be fetched from a local or remote file system, such as S3, using the following formats:
# path to a file on disk
-Explorer.DataFrame.from_parquet("/path/to/file.parquet")
+Explorer.DataFrame.from_parquet("/path/to/file.parquet")
# path to a URL schema (with optional configuration)
-Explorer.DataFrame.from_parquet("s3://bucket/file.parquet", config: FSS.S3.config_from_system_env())
+Explorer.DataFrame.from_parquet("s3://bucket/file.parquet", config: FSS.S3.config_from_system_env())
# it's possible to configure using keyword lists
-Explorer.DataFrame.from_parquet("s3://bucket/file.parquet", config: [access_key_id: "my-key", secret_access_key: "my-secret"])
+Explorer.DataFrame.from_parquet("s3://bucket/file.parquet", config: [access_key_id: "my-key", secret_access_key: "my-secret"])
# a FSS entry (it already includes its config)
-Explorer.DataFrame.from_parquet(FSS.S3.parse("s3://bucket/file.parquet"))
The :config
option of from_*
functions is only required if the filename is a path
+Explorer.DataFrame.from_parquet(FSS.S3.parse("s3://bucket/file.parquet"))
The :config
option of from_*
functions is only required if the filename is a path
to a remote resource. In case it's a FSS entry, the requirement is that the config is passed
inside the entry struct.
Explorer.DataFrame
also implements the Access
behaviour (also known as the brackets
syntax). This should be familiar for users coming from other languages with dataframes
-such as R or Python. For example:
iex> df = Explorer.Datasets.wine()
-iex> df["class"]
-#Explorer.Series<
- Polars[178]
- integer [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
->
Accessing the dataframe with a column name either as a string or an atom, will return -the column. You can also pass an integer representing the column order:
iex> df = Explorer.Datasets.wine()
-iex> df[0]
-#Explorer.Series<
- Polars[178]
- integer [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
->
You can also pass a list, a range, or a regex to return a dataframe matching -the given data type. For example, by passing a list:
iex> df = Explorer.Datasets.wine()
-iex> df[["class", "hue"]]
-#Explorer.DataFrame<
- Polars[178 x 2]
- class integer [1, 1, 1, 1, 1, ...]
- hue float [1.04, 1.05, 1.03, 0.86, 1.04, ...]
->
Or a range for the given positions:
iex> df = Explorer.Datasets.wine()
-iex> df[0..2]
-#Explorer.DataFrame<
- Polars[178 x 3]
- class integer [1, 1, 1, 1, 1, ...]
- alcohol float [14.23, 13.2, 13.16, 14.37, 13.24, ...]
- malic_acid float [1.71, 1.78, 2.36, 1.95, 2.59, ...]
->
Or a regex to keep only columns matching a given pattern:
iex> df = Explorer.Datasets.wine()
-iex> df[~r/(class|hue)/]
-#Explorer.DataFrame<
- Polars[178 x 2]
- class integer [1, 1, 1, 1, 1, ...]
- hue float [1.04, 1.05, 1.03, 0.86, 1.04, ...]
->
Given you can also access a series using its index, you can use -multiple accesses to select a column and row at the same time:
iex> df = Explorer.Datasets.wine()
-iex> df["class"][3]
+such as R or Python. For example:iex> df = Explorer.Datasets.wine()
+iex> df["class"]
+#Explorer.Series<
+ Polars[178]
+ integer [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
+>
Accessing the dataframe with a column name either as a string or an atom, will return
+the column. You can also pass an integer representing the column order:
iex> df = Explorer.Datasets.wine()
+iex> df[0]
+#Explorer.Series<
+ Polars[178]
+ integer [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
+>
You can also pass a list, a range, or a regex to return a dataframe matching
+the given data type. For example, by passing a list:
iex> df = Explorer.Datasets.wine()
+iex> df[["class", "hue"]]
+#Explorer.DataFrame<
+ Polars[178 x 2]
+ class integer [1, 1, 1, 1, 1, ...]
+ hue float [1.04, 1.05, 1.03, 0.86, 1.04, ...]
+>
Or a range for the given positions:
iex> df = Explorer.Datasets.wine()
+iex> df[0..2]
+#Explorer.DataFrame<
+ Polars[178 x 3]
+ class integer [1, 1, 1, 1, 1, ...]
+ alcohol float [14.23, 13.2, 13.16, 14.37, 13.24, ...]
+ malic_acid float [1.71, 1.78, 2.36, 1.95, 2.59, ...]
+>
Or a regex to keep only columns matching a given pattern:
iex> df = Explorer.Datasets.wine()
+iex> df[~r/(class|hue)/]
+#Explorer.DataFrame<
+ Polars[178 x 2]
+ class integer [1, 1, 1, 1, 1, ...]
+ hue float [1.04, 1.05, 1.03, 0.86, 1.04, ...]
+>
Given you can also access a series using its index, you can use
+multiple accesses to select a column and row at the same time:
iex> df = Explorer.Datasets.wine()
+iex> df["class"][3]
1
@@ -1603,15 +1603,15 @@
Series can be given either as keyword lists or maps -where the keys are the name and the values are series:
iex> Explorer.DataFrame.new(%{
-...> floats: Explorer.Series.from_list([1.0, 2.0]),
-...> ints: Explorer.Series.from_list([1, nil])
-...> })
-#Explorer.DataFrame<
- Polars[2 x 2]
- floats float [1.0, 2.0]
- ints integer [1, nil]
->
iex> Explorer.DataFrame.new(%{
+...> floats: Explorer.Series.from_list([1.0, 2.0]),
+...> ints: Explorer.Series.from_list([1, nil])
+...> })
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ floats float [1.0, 2.0]
+ ints integer [1, nil]
+>
To create a dataframe from tensors, you can pass a matrix as an argument. Each matrix column becomes a dataframe column with names x1, x2, x3, -etc:
iex> Explorer.DataFrame.new(Nx.tensor([
-...> [1, 2, 3],
-...> [4, 5, 6]
-...> ]))
-#Explorer.DataFrame<
- Polars[2 x 3]
- x1 integer [1, 4]
- x2 integer [2, 5]
- x3 integer [3, 6]
->
Explorer expects tensors to have certain types, so you may need to cast
-the data accordingly. See Explorer.Series.from_tensor/2
for more info.
You can also pass a keyword list or maps of vectors (rank 1 tensors):
iex> Explorer.DataFrame.new(%{
-...> floats: Nx.tensor([1.0, 2.0], type: :f64),
-...> ints: Nx.tensor([3, 4])
-...> })
-#Explorer.DataFrame<
- Polars[2 x 2]
- floats float [1.0, 2.0]
- ints integer [3, 4]
->
Use dtypes to force a particular representation:
iex> Explorer.DataFrame.new([
-...> floats: Nx.tensor([1.0, 2.0], type: :f64),
-...> times: Nx.tensor([3_000, 4_000])
-...> ], dtypes: [times: :time])
-#Explorer.DataFrame<
- Polars[2 x 2]
- floats float [1.0, 2.0]
- times time [00:00:00.000003, 00:00:00.000004]
->
iex> Explorer.DataFrame.new(Nx.tensor([
+...> [1, 2, 3],
+...> [4, 5, 6]
+...> ]))
+#Explorer.DataFrame<
+ Polars[2 x 3]
+ x1 integer [1, 4]
+ x2 integer [2, 5]
+ x3 integer [3, 6]
+>
Explorer expects tensors to have certain types, so you may need to cast
+the data accordingly. See Explorer.Series.from_tensor/2
for more info.
You can also pass a keyword list or maps of vectors (rank 1 tensors):
iex> Explorer.DataFrame.new(%{
+...> floats: Nx.tensor([1.0, 2.0], type: :f64),
+...> ints: Nx.tensor([3, 4])
+...> })
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ floats float [1.0, 2.0]
+ ints integer [3, 4]
+>
Use dtypes to force a particular representation:
iex> Explorer.DataFrame.new([
+...> floats: Nx.tensor([1.0, 2.0], type: :f64),
+...> times: Nx.tensor([3_000, 4_000])
+...> ], dtypes: [times: :time])
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ floats float [1.0, 2.0]
+ times time [00:00:00.000003, 00:00:00.000004]
+>
Tabular data can be either columnar or row-based. -Let's start with column data:
iex> Explorer.DataFrame.new(%{floats: [1.0, 2.0], ints: [1, nil]})
-#Explorer.DataFrame<
- Polars[2 x 2]
- floats float [1.0, 2.0]
- ints integer [1, nil]
->
-
-iex> Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
-#Explorer.DataFrame<
- Polars[2 x 2]
- floats float [1.0, 2.0]
- ints integer [1, nil]
->
-
-iex> Explorer.DataFrame.new([floats: [1.0, 2.0], ints: [1, nil], binaries: [<<239, 191, 19>>, nil]], dtypes: [{:binaries, :binary}])
-#Explorer.DataFrame<
- Polars[2 x 3]
- floats float [1.0, 2.0]
- ints integer [1, nil]
- binaries binary [<<239, 191, 19>>, nil]
->
-
-iex> Explorer.DataFrame.new(%{floats: [1.0, 2.0], ints: [1, "wrong"]})
-** (ArgumentError) cannot create series "ints": the value "wrong" does not match the inferred series dtype :integer
From row data:
iex> rows = [%{id: 1, name: "José"}, %{id: 2, name: "Christopher"}, %{id: 3, name: "Cristine"}]
-iex> Explorer.DataFrame.new(rows)
-#Explorer.DataFrame<
- Polars[3 x 2]
- id integer [1, 2, 3]
- name string ["José", "Christopher", "Cristine"]
->
-
-iex> rows = [[id: 1, name: "José"], [id: 2, name: "Christopher"], [id: 3, name: "Cristine"]]
-iex> Explorer.DataFrame.new(rows)
-#Explorer.DataFrame<
- Polars[3 x 2]
- id integer [1, 2, 3]
- name string ["José", "Christopher", "Cristine"]
->
+Let's start with column data:iex> Explorer.DataFrame.new(%{floats: [1.0, 2.0], ints: [1, nil]})
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ floats float [1.0, 2.0]
+ ints integer [1, nil]
+>
+
+iex> Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ floats float [1.0, 2.0]
+ ints integer [1, nil]
+>
+
+iex> Explorer.DataFrame.new([floats: [1.0, 2.0], ints: [1, nil], binaries: [<<239, 191, 19>>, nil]], dtypes: [{:binaries, :binary}])
+#Explorer.DataFrame<
+ Polars[2 x 3]
+ floats float [1.0, 2.0]
+ ints integer [1, nil]
+ binaries binary [<<239, 191, 19>>, nil]
+>
+
+iex> Explorer.DataFrame.new(%{floats: [1.0, 2.0], ints: [1, "wrong"]})
+** (ArgumentError) cannot create series "ints": the value "wrong" does not match the inferred series dtype :integer
From row data:
iex> rows = [%{id: 1, name: "José"}, %{id: 2, name: "Christopher"}, %{id: 3, name: "Cristine"}]
+iex> Explorer.DataFrame.new(rows)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ id integer [1, 2, 3]
+ name string ["José", "Christopher", "Cristine"]
+>
+
+iex> rows = [[id: 1, name: "José"], [id: 2, name: "Christopher"], [id: 3, name: "Cristine"]]
+iex> Explorer.DataFrame.new(rows)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ id integer [1, 2, 3]
+ name string ["José", "Christopher", "Cristine"]
+>
iex> df = Explorer.DataFrame.new(ints: [1, nil], floats: [1.0, 2.0])
-iex> Explorer.DataFrame.to_columns(df)
-%{"floats" => [1.0, 2.0], "ints" => [1, nil]}
+iex> df = Explorer.DataFrame.new(ints: [1, nil], floats: [1.0, 2.0])
+iex> Explorer.DataFrame.to_columns(df)
+%{"floats" => [1.0, 2.0], "ints" => [1, nil]}
-iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
-iex> Explorer.DataFrame.to_columns(df, atom_keys: true)
-%{floats: [1.0, 2.0], ints: [1, nil]}
+iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
+iex> Explorer.DataFrame.to_columns(df, atom_keys: true)
+%{floats: [1.0, 2.0], ints: [1, nil]}
iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
-iex> Explorer.DataFrame.to_rows(df)
-[%{"floats" => 1.0, "ints" => 1}, %{"floats" => 2.0 ,"ints" => nil}]
+iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
+iex> Explorer.DataFrame.to_rows(df)
+[%{"floats" => 1.0, "ints" => 1}, %{"floats" => 2.0 ,"ints" => nil}]
-iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
-iex> Explorer.DataFrame.to_rows(df, atom_keys: true)
-[%{floats: 1.0, ints: 1}, %{floats: 2.0, ints: nil}]
+iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
+iex> Explorer.DataFrame.to_rows(df, atom_keys: true)
+[%{floats: 1.0, ints: 1}, %{floats: 2.0, ints: nil}]
iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
-iex> Explorer.DataFrame.to_rows_stream(df) |> Enum.map(& &1)
-[%{"floats" => 1.0, "ints" => 1}, %{"floats" => 2.0 ,"ints" => nil}]
+iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
+iex> Explorer.DataFrame.to_rows_stream(df) |> Enum.map(& &1)
+[%{"floats" => 1.0, "ints" => 1}, %{"floats" => 2.0 ,"ints" => nil}]
-iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
-iex> Explorer.DataFrame.to_rows_stream(df, atom_keys: true) |> Enum.map(& &1)
-[%{floats: 1.0, ints: 1}, %{floats: 2.0, ints: nil}]
+iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
+iex> Explorer.DataFrame.to_rows_stream(df, atom_keys: true) |> Enum.map(& &1)
+[%{floats: 1.0, ints: 1}, %{floats: 2.0, ints: nil}]
iex> df = Explorer.DataFrame.new(ints: [1, nil], floats: [1.0, 2.0])
-iex> map = Explorer.DataFrame.to_series(df)
-iex> Explorer.Series.to_list(map["floats"])
-[1.0, 2.0]
-iex> Explorer.Series.to_list(map["ints"])
-[1, nil]
+iex> df = Explorer.DataFrame.new(ints: [1, nil], floats: [1.0, 2.0])
+iex> map = Explorer.DataFrame.to_series(df)
+iex> Explorer.Series.to_list(map["floats"])
+[1.0, 2.0]
+iex> Explorer.Series.to_list(map["ints"])
+[1, nil]
A single column name will sort ascending by that column:
iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
-iex> Explorer.DataFrame.arrange(df, a)
-#Explorer.DataFrame<
- Polars[3 x 2]
- a string ["a", "b", "c"]
- b integer [3, 1, 2]
->
You can also sort descending:
iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
-iex> Explorer.DataFrame.arrange(df, desc: a)
-#Explorer.DataFrame<
- Polars[3 x 2]
- a string ["c", "b", "a"]
- b integer [2, 1, 3]
->
Sorting by more than one column sorts them in the order they are entered:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.arrange(df, asc: total, desc: country)
-#Explorer.DataFrame<
- Polars[1094 x 10]
- year integer [2010, 2010, 2011, 2011, 2012, ...]
- country string ["NIUE", "TUVALU", "TUVALU", "NIUE", "NIUE", ...]
- total integer [1, 2, 2, 2, 2, ...]
- solid_fuel integer [0, 0, 0, 0, 0, ...]
- liquid_fuel integer [1, 2, 2, 2, 2, ...]
- gas_fuel integer [0, 0, 0, 0, 0, ...]
- cement integer [0, 0, 0, 0, 0, ...]
- gas_flaring integer [0, 0, 0, 0, 0, ...]
- per_capita float [0.52, 0.0, 0.0, 1.04, 1.04, ...]
- bunker_fuels integer [0, 0, 0, 0, 0, ...]
->
A single column name will sort ascending by that column:
iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
+iex> Explorer.DataFrame.arrange(df, a)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a string ["a", "b", "c"]
+ b integer [3, 1, 2]
+>
You can also sort descending:
iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
+iex> Explorer.DataFrame.arrange(df, desc: a)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a string ["c", "b", "a"]
+ b integer [2, 1, 3]
+>
Sorting by more than one column sorts them in the order they are entered:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.arrange(df, asc: total, desc: country)
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ year integer [2010, 2010, 2011, 2011, 2012, ...]
+ country string ["NIUE", "TUVALU", "TUVALU", "NIUE", "NIUE", ...]
+ total integer [1, 2, 2, 2, 2, ...]
+ solid_fuel integer [0, 0, 0, 0, 0, ...]
+ liquid_fuel integer [1, 2, 2, 2, 2, ...]
+ gas_fuel integer [0, 0, 0, 0, 0, ...]
+ cement integer [0, 0, 0, 0, 0, ...]
+ gas_flaring integer [0, 0, 0, 0, 0, ...]
+ per_capita float [0.52, 0.0, 0.0, 1.04, 1.04, ...]
+ bunker_fuels integer [0, 0, 0, 0, 0, ...]
+>
Here is an example using the Iris dataset. We group by species and then we try to sort the dataframe by species and sepal width, but only "sepal width" is taken into account -because "species" is a group.
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.arrange(grouped, desc: species, asc: sepal_width)
-#Explorer.DataFrame<
- Polars[150 x 5]
- Groups: ["species"]
- sepal_length float [4.5, 4.4, 4.9, 4.8, 4.3, ...]
- sepal_width float [2.3, 2.9, 3.0, 3.0, 3.0, ...]
- petal_length float [1.3, 1.4, 1.4, 1.4, 1.1, ...]
- petal_width float [0.3, 0.2, 0.2, 0.1, 0.1, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
+because "species" is a group.iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.arrange(grouped, desc: species, asc: sepal_width)
+#Explorer.DataFrame<
+ Polars[150 x 5]
+ Groups: ["species"]
+ sepal_length float [4.5, 4.4, 4.9, 4.8, 4.3, ...]
+ sepal_width float [2.3, 2.9, 3.0, 3.0, 3.0, ...]
+ petal_length float [1.3, 1.4, 1.4, 1.4, 1.1, ...]
+ petal_width float [0.3, 0.2, 0.2, 0.1, 0.1, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
A single column name will sort ascending by that column:
iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
-iex> Explorer.DataFrame.arrange_with(df, &(&1["a"]))
-#Explorer.DataFrame<
- Polars[3 x 2]
- a string ["a", "b", "c"]
- b integer [3, 1, 2]
->
You can also sort descending:
iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
-iex> Explorer.DataFrame.arrange_with(df, &[desc: &1["a"]])
-#Explorer.DataFrame<
- Polars[3 x 2]
- a string ["c", "b", "a"]
- b integer [2, 1, 3]
->
Sorting by more than one column sorts them in the order they are entered:
iex> df = Explorer.DataFrame.new(a: [3, 1, 3], b: [2, 1, 3])
-iex> Explorer.DataFrame.arrange_with(df, &[desc: &1["a"], asc: &1["b"]])
-#Explorer.DataFrame<
- Polars[3 x 2]
- a integer [3, 3, 1]
- b integer [2, 3, 1]
->
A single column name will sort ascending by that column:
iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
+iex> Explorer.DataFrame.arrange_with(df, &(&1["a"]))
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a string ["a", "b", "c"]
+ b integer [3, 1, 2]
+>
You can also sort descending:
iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
+iex> Explorer.DataFrame.arrange_with(df, &[desc: &1["a"]])
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a string ["c", "b", "a"]
+ b integer [2, 1, 3]
+>
Sorting by more than one column sorts them in the order they are entered:
iex> df = Explorer.DataFrame.new(a: [3, 1, 3], b: [2, 1, 3])
+iex> Explorer.DataFrame.arrange_with(df, &[desc: &1["a"], asc: &1["b"]])
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a integer [3, 3, 1]
+ b integer [2, 3, 1]
+>
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.arrange_with(grouped, &[desc: &1["species"], asc: &1["sepal_width"]])
-#Explorer.DataFrame<
- Polars[150 x 5]
- Groups: ["species"]
- sepal_length float [4.5, 4.4, 4.9, 4.8, 4.3, ...]
- sepal_width float [2.3, 2.9, 3.0, 3.0, 3.0, ...]
- petal_length float [1.3, 1.4, 1.4, 1.4, 1.1, ...]
- petal_width float [0.3, 0.2, 0.2, 0.1, 0.1, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
+iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.arrange_with(grouped, &[desc: &1["species"], asc: &1["sepal_width"]])
+#Explorer.DataFrame<
+ Polars[150 x 5]
+ Groups: ["species"]
+ sepal_length float [4.5, 4.4, 4.9, 4.8, 4.3, ...]
+ sepal_width float [2.3, 2.9, 3.0, 3.0, 3.0, ...]
+ petal_length float [1.3, 1.4, 1.4, 1.4, 1.1, ...]
+ petal_width float [0.3, 0.2, 0.2, 0.1, 0.1, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
iex> df = Explorer.DataFrame.new(a: ["d", nil, "f"], b: [1, 2, 3], c: ["a", "b", "c"])
-iex> Explorer.DataFrame.describe(df)
-#Explorer.DataFrame<
- Polars[9 x 4]
- describe string ["count", "null_count", "mean", "std", "min", ...]
- a string ["3", "1", nil, nil, "d", ...]
- b float [3.0, 0.0, 2.0, 1.0, 1.0, ...]
- c string ["3", "0", nil, nil, "a", ...]
->
-
-iex> df = Explorer.DataFrame.new(a: ["d", nil, "f"], b: [1, 2, 3], c: ["a", "b", "c"])
-iex> Explorer.DataFrame.describe(df, percentiles: [0.3, 0.5, 0.8])
-#Explorer.DataFrame<
- Polars[9 x 4]
- describe string ["count", "null_count", "mean", "std", "min", ...]
- a string ["3", "1", nil, nil, "d", ...]
- b float [3.0, 0.0, 2.0, 1.0, 1.0, ...]
- c string ["3", "0", nil, nil, "a", ...]
->
+iex> df = Explorer.DataFrame.new(a: ["d", nil, "f"], b: [1, 2, 3], c: ["a", "b", "c"])
+iex> Explorer.DataFrame.describe(df)
+#Explorer.DataFrame<
+ Polars[9 x 4]
+ describe string ["count", "null_count", "mean", "std", "min", ...]
+ a string ["3", "1", nil, nil, "d", ...]
+ b float [3.0, 0.0, 2.0, 1.0, 1.0, ...]
+ c string ["3", "0", nil, nil, "a", ...]
+>
+
+iex> df = Explorer.DataFrame.new(a: ["d", nil, "f"], b: [1, 2, 3], c: ["a", "b", "c"])
+iex> Explorer.DataFrame.describe(df, percentiles: [0.3, 0.5, 0.8])
+#Explorer.DataFrame<
+ Polars[9 x 4]
+ describe string ["count", "null_count", "mean", "std", "min", ...]
+ a string ["3", "1", nil, nil, "d", ...]
+ b float [3.0, 0.0, 2.0, 1.0, 1.0, ...]
+ c string ["3", "0", nil, nil, "a", ...]
+>
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.discard(df, ["b"])
-#Explorer.DataFrame<
- Polars[3 x 1]
- a string ["a", "b", "c"]
->
-
-iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
-iex> Explorer.DataFrame.discard(df, ["a", "b"])
-#Explorer.DataFrame<
- Polars[3 x 1]
- c integer [4, 5, 6]
->
Ranges, regexes, and functions are also accepted in column names, as in select/2
.
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.discard(df, ["b"])
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a string ["a", "b", "c"]
+>
+
+iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
+iex> Explorer.DataFrame.discard(df, ["a", "b"])
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ c integer [4, 5, 6]
+>
Ranges, regexes, and functions are also accepted in column names, as in select/2
.
You cannot discard grouped columns. You need to ungroup before removing them:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.discard(grouped, ["species"])
-#Explorer.DataFrame<
- Polars[150 x 5]
- Groups: ["species"]
- sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
- sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
- petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
- petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
+You cannot discard grouped columns. You need to ungroup before removing them:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.discard(grouped, ["species"])
+#Explorer.DataFrame<
+ Polars[150 x 5]
+ Groups: ["species"]
+ sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
+ sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
+ petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
+ petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
By default will return unique values of the requested columns:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.distinct(df, ["year", "country"])
-#Explorer.DataFrame<
- Polars[1094 x 2]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
->
If keep_all
is set to true
, then the first value of each column not in the requested
-columns will be returned:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.distinct(df, ["year", "country"], keep_all: true)
-#Explorer.DataFrame<
- Polars[1094 x 10]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- total integer [2308, 1254, 32500, 141, 7924, ...]
- solid_fuel integer [627, 117, 332, 0, 0, ...]
- liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
- gas_fuel integer [74, 7, 14565, 0, 374, ...]
- cement integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
->
A callback on the dataframe's names can be passed instead of a list (like select/2
):
iex> df = Explorer.DataFrame.new(x1: [1, 3, 3], x2: ["a", "c", "c"], y1: [1, 2, 3])
-iex> Explorer.DataFrame.distinct(df, &String.starts_with?(&1, "x"))
-#Explorer.DataFrame<
- Polars[2 x 2]
- x1 integer [1, 3]
- x2 string ["a", "c"]
->
If the dataframe has groups, then the columns of each group will be added to the distinct columns:
iex> df = Explorer.DataFrame.new(x1: [1, 3, 3], x2: ["a", "c", "c"], y1: [1, 2, 3])
-iex> df = Explorer.DataFrame.group_by(df, "x1")
-iex> Explorer.DataFrame.distinct(df, ["x2"])
-#Explorer.DataFrame<
- Polars[2 x 2]
- Groups: ["x1"]
- x1 integer [1, 3]
- x2 string ["a", "c"]
->
+By default will return unique values of the requested columns:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.distinct(df, ["year", "country"])
+#Explorer.DataFrame<
+ Polars[1094 x 2]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+>
If keep_all
is set to true
, then the first value of each column not in the requested
+columns will be returned:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.distinct(df, ["year", "country"], keep_all: true)
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ total integer [2308, 1254, 32500, 141, 7924, ...]
+ solid_fuel integer [627, 117, 332, 0, 0, ...]
+ liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
+ gas_fuel integer [74, 7, 14565, 0, 374, ...]
+ cement integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+>
A callback on the dataframe's names can be passed instead of a list (like select/2
):
iex> df = Explorer.DataFrame.new(x1: [1, 3, 3], x2: ["a", "c", "c"], y1: [1, 2, 3])
+iex> Explorer.DataFrame.distinct(df, &String.starts_with?(&1, "x"))
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ x1 integer [1, 3]
+ x2 string ["a", "c"]
+>
If the dataframe has groups, then the columns of each group will be added to the distinct columns:
iex> df = Explorer.DataFrame.new(x1: [1, 3, 3], x2: ["a", "c", "c"], y1: [1, 2, 3])
+iex> df = Explorer.DataFrame.group_by(df, "x1")
+iex> Explorer.DataFrame.distinct(df, ["x2"])
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ Groups: ["x1"]
+ x1 integer [1, 3]
+ x2 string ["a", "c"]
+>
To drop nils on all columns:
iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3])
-iex> Explorer.DataFrame.drop_nil(df)
-#Explorer.DataFrame<
- Polars[1 x 2]
- a integer [1]
- b integer [1]
->
To drop nils on a single column:
iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3])
-iex> Explorer.DataFrame.drop_nil(df, :a)
-#Explorer.DataFrame<
- Polars[2 x 2]
- a integer [1, 2]
- b integer [1, nil]
->
To drop nils on some columns:
iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3], c: [nil, 5, 6])
-iex> Explorer.DataFrame.drop_nil(df, [:a, :c])
-#Explorer.DataFrame<
- Polars[1 x 3]
- a integer [2]
- b integer [nil]
- c integer [5]
->
Ranges, regexes, and functions are also accepted in column names, as in select/2
.
To drop nils on all columns:
iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3])
+iex> Explorer.DataFrame.drop_nil(df)
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ a integer [1]
+ b integer [1]
+>
To drop nils on a single column:
iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3])
+iex> Explorer.DataFrame.drop_nil(df, :a)
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ a integer [1, 2]
+ b integer [1, nil]
+>
To drop nils on some columns:
iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3], c: [nil, 5, 6])
+iex> Explorer.DataFrame.drop_nil(df, [:a, :c])
+#Explorer.DataFrame<
+ Polars[1 x 3]
+ a integer [2]
+ b integer [nil]
+ c integer [5]
+>
Ranges, regexes, and functions are also accepted in column names, as in select/2
.
To mark a single column as dummy:
iex> df = Explorer.DataFrame.new(col_x: ["a", "b", "a", "c"], col_y: ["b", "a", "b", "d"])
-iex> Explorer.DataFrame.dummies(df, "col_x")
-#Explorer.DataFrame<
- Polars[4 x 3]
- col_x_a integer [1, 0, 1, 0]
- col_x_b integer [0, 1, 0, 0]
- col_x_c integer [0, 0, 0, 1]
->
Or multiple columns:
iex> df = Explorer.DataFrame.new(col_x: ["a", "b", "a", "c"], col_y: ["b", "a", "b", "d"])
-iex> Explorer.DataFrame.dummies(df, ["col_x", "col_y"])
-#Explorer.DataFrame<
- Polars[4 x 6]
- col_x_a integer [1, 0, 1, 0]
- col_x_b integer [0, 1, 0, 0]
- col_x_c integer [0, 0, 0, 1]
- col_y_b integer [1, 0, 1, 0]
- col_y_a integer [0, 1, 0, 0]
- col_y_d integer [0, 0, 0, 1]
->
Or all string columns:
iex> df = Explorer.DataFrame.new(num: [1, 2, 3, 4], col_y: ["b", "a", "b", "d"])
-iex> Explorer.DataFrame.dummies(df, fn _name, type -> type == :string end)
-#Explorer.DataFrame<
- Polars[4 x 3]
- col_y_b integer [1, 0, 1, 0]
- col_y_a integer [0, 1, 0, 0]
- col_y_d integer [0, 0, 0, 1]
->
Ranges, regexes, and functions are also accepted in column names, as in select/2.
To mark a single column as dummy:
iex> df = Explorer.DataFrame.new(col_x: ["a", "b", "a", "c"], col_y: ["b", "a", "b", "d"])
+iex> Explorer.DataFrame.dummies(df, "col_x")
+#Explorer.DataFrame<
+ Polars[4 x 3]
+ col_x_a integer [1, 0, 1, 0]
+ col_x_b integer [0, 1, 0, 0]
+ col_x_c integer [0, 0, 0, 1]
+>
Or multiple columns:
iex> df = Explorer.DataFrame.new(col_x: ["a", "b", "a", "c"], col_y: ["b", "a", "b", "d"])
+iex> Explorer.DataFrame.dummies(df, ["col_x", "col_y"])
+#Explorer.DataFrame<
+ Polars[4 x 6]
+ col_x_a integer [1, 0, 1, 0]
+ col_x_b integer [0, 1, 0, 0]
+ col_x_c integer [0, 0, 0, 1]
+ col_y_b integer [1, 0, 1, 0]
+ col_y_a integer [0, 1, 0, 0]
+ col_y_d integer [0, 0, 0, 1]
+>
Or all string columns:
iex> df = Explorer.DataFrame.new(num: [1, 2, 3, 4], col_y: ["b", "a", "b", "d"])
+iex> Explorer.DataFrame.dummies(df, fn _name, type -> type == :string end)
+#Explorer.DataFrame<
+ Polars[4 x 3]
+ col_y_b integer [1, 0, 1, 0]
+ col_y_a integer [0, 1, 0, 0]
+ col_y_d integer [0, 0, 0, 1]
+>
Ranges, regexes, and functions are also accepted in column names, as in select/2.
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.filter(df, col2 > 2)
-#Explorer.DataFrame<
- Polars[1 x 2]
- col1 string ["c"]
- col2 integer [3]
->
-
-iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.filter(df, col1 == "b")
-#Explorer.DataFrame<
- Polars[1 x 2]
- col1 string ["b"]
- col2 integer [2]
->
-
-iex> df = Explorer.DataFrame.new(col1: [5, 4, 3], col2: [1, 2, 3])
-iex> Explorer.DataFrame.filter(df, [col1 > 3, col2 < 3])
-#Explorer.DataFrame<
- Polars[2 x 2]
- col1 integer [5, 4]
- col2 integer [1, 2]
->
Returning a non-boolean expression errors:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.filter(df, cumulative_max(col2))
-** (ArgumentError) expecting the function to return a boolean LazySeries, but instead it returned a LazySeries of type :integer
Which can be addressed by converting it to boolean:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.filter(df, cumulative_max(col2) == 1)
-#Explorer.DataFrame<
- Polars[1 x 2]
- col1 string ["a"]
- col2 integer [1]
->
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.filter(df, col2 > 2)
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ col1 string ["c"]
+ col2 integer [3]
+>
+
+iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.filter(df, col1 == "b")
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ col1 string ["b"]
+ col2 integer [2]
+>
+
+iex> df = Explorer.DataFrame.new(col1: [5, 4, 3], col2: [1, 2, 3])
+iex> Explorer.DataFrame.filter(df, [col1 > 3, col2 < 3])
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ col1 integer [5, 4]
+ col2 integer [1, 2]
+>
Returning a non-boolean expression errors:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.filter(df, cumulative_max(col2))
+** (ArgumentError) expecting the function to return a boolean LazySeries, but instead it returned a LazySeries of type :integer
Which can be addressed by converting it to boolean:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.filter(df, cumulative_max(col2) == 1)
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ col1 string ["a"]
+ col2 integer [1]
+>
In a grouped dataframe, the aggregation is calculated within each group.
In the following example we select the flowers of the Iris dataset that have the "petal length"
-above the average of each species group.
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.filter(grouped, petal_length > mean(petal_length))
-#Explorer.DataFrame<
- Polars[79 x 5]
- Groups: ["species"]
- sepal_length float [4.6, 5.4, 5.0, 4.9, 5.4, ...]
- sepal_width float [3.1, 3.9, 3.4, 3.1, 3.7, ...]
- petal_length float [1.5, 1.7, 1.5, 1.5, 1.5, ...]
- petal_width float [0.2, 0.4, 0.2, 0.1, 0.2, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
+above the average of each species group.
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.filter(grouped, petal_length > mean(petal_length))
+#Explorer.DataFrame<
+ Polars[79 x 5]
+ Groups: ["species"]
+ sepal_length float [4.6, 5.4, 5.0, 4.9, 5.4, ...]
+ sepal_width float [3.1, 3.9, 3.4, 3.1, 3.7, ...]
+ petal_length float [1.5, 1.7, 1.5, 1.5, 1.5, ...]
+ petal_width float [0.2, 0.4, 0.2, 0.1, 0.2, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.filter_with(df, &Explorer.Series.greater(&1["col2"], 2))
-#Explorer.DataFrame<
- Polars[1 x 2]
- col1 string ["c"]
- col2 integer [3]
->
-
-iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.filter_with(df, fn df -> Explorer.Series.equal(df["col1"], "b") end)
-#Explorer.DataFrame<
- Polars[1 x 2]
- col1 string ["b"]
- col2 integer [2]
->
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.filter_with(df, &Explorer.Series.greater(&1["col2"], 2))
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ col1 string ["c"]
+ col2 integer [3]
+>
+
+iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.filter_with(df, fn df -> Explorer.Series.equal(df["col1"], "b") end)
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ col1 string ["b"]
+ col2 integer [2]
+>
In a grouped dataframe, the aggregation is calculated within each group.
In the following example we select the flowers of the Iris dataset that have the "petal length"
-above the average of each species group.
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.filter_with(grouped, &Explorer.Series.greater(&1["petal_length"], Explorer.Series.mean(&1["petal_length"])))
-#Explorer.DataFrame<
- Polars[79 x 5]
- Groups: ["species"]
- sepal_length float [4.6, 5.4, 5.0, 4.9, 5.4, ...]
- sepal_width float [3.1, 3.9, 3.4, 3.1, 3.7, ...]
- petal_length float [1.5, 1.7, 1.5, 1.5, 1.5, ...]
- petal_width float [0.2, 0.4, 0.2, 0.1, 0.2, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
+above the average of each species group.
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.filter_with(grouped, &Explorer.Series.greater(&1["petal_length"], Explorer.Series.mean(&1["petal_length"])))
+#Explorer.DataFrame<
+ Polars[79 x 5]
+ Groups: ["species"]
+ sepal_length float [4.6, 5.4, 5.0, 4.9, 5.4, ...]
+ sepal_width float [3.1, 3.9, 3.4, 3.1, 3.7, ...]
+ petal_length float [1.5, 1.7, 1.5, 1.5, 1.5, ...]
+ petal_width float [0.2, 0.4, 0.2, 0.1, 0.2, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
iex> df = Explorer.DataFrame.new(a: ["a", "a", "b"], b: [1, 1, nil])
-iex> Explorer.DataFrame.frequencies(df, [:a, :b])
-#Explorer.DataFrame<
- Polars[2 x 3]
- a string ["a", "b"]
- b integer [1, nil]
- counts integer [2, 1]
->
+iex> df = Explorer.DataFrame.new(a: ["a", "a", "b"], b: [1, 1, nil])
+iex> Explorer.DataFrame.frequencies(df, [:a, :b])
+#Explorer.DataFrame<
+ Polars[2 x 3]
+ a string ["a", "b"]
+ b integer [1, nil]
+ counts integer [2, 1]
+>
You can group by a single variable:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.group_by(df, "country")
-#Explorer.DataFrame<
- Polars[1094 x 10]
- Groups: ["country"]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- total integer [2308, 1254, 32500, 141, 7924, ...]
- solid_fuel integer [627, 117, 332, 0, 0, ...]
- liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
- gas_fuel integer [74, 7, 14565, 0, 374, ...]
- cement integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
->
Or you can group by multiple columns in a given list:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.group_by(df, ["country", "year"])
-#Explorer.DataFrame<
- Polars[1094 x 10]
- Groups: ["country", "year"]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- total integer [2308, 1254, 32500, 141, 7924, ...]
- solid_fuel integer [627, 117, 332, 0, 0, ...]
- liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
- gas_fuel integer [74, 7, 14565, 0, 374, ...]
- cement integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
->
Or by a range:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.group_by(df, 0..1)
-#Explorer.DataFrame<
- Polars[1094 x 10]
- Groups: ["year", "country"]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- total integer [2308, 1254, 32500, 141, 7924, ...]
- solid_fuel integer [627, 117, 332, 0, 0, ...]
- liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
- gas_fuel integer [74, 7, 14565, 0, 374, ...]
- cement integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
->
Regexes and functions are also accepted in column names, as in select/2.
You can group by a single variable:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.group_by(df, "country")
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ Groups: ["country"]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ total integer [2308, 1254, 32500, 141, 7924, ...]
+ solid_fuel integer [627, 117, 332, 0, 0, ...]
+ liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
+ gas_fuel integer [74, 7, 14565, 0, 374, ...]
+ cement integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+>
Or you can group by multiple columns in a given list:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.group_by(df, ["country", "year"])
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ Groups: ["country", "year"]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ total integer [2308, 1254, 32500, 141, 7924, ...]
+ solid_fuel integer [627, 117, 332, 0, 0, ...]
+ liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
+ gas_fuel integer [74, 7, 14565, 0, 374, ...]
+ cement integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+>
Or by a range:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.group_by(df, 0..1)
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ Groups: ["year", "country"]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ total integer [2308, 1254, 32500, 141, 7924, ...]
+ solid_fuel integer [627, 117, 332, 0, 0, ...]
+ liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
+ gas_fuel integer [74, 7, 14565, 0, 374, ...]
+ cement integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+>
Regexes and functions are also accepted in column names, as in select/2.
This function must only be used when you need to select rows based on external values that are not available to the dataframe. For example,
-you can pass a list:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.mask(df, [false, true, false])
-#Explorer.DataFrame<
- Polars[1 x 2]
- col1 string ["b"]
- col2 integer [2]
->
You must avoid using masks when the masks themselves are computed from
-other columns. For example, DO NOT do this:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.mask(df, Explorer.Series.greater(df["col2"], 1))
-#Explorer.DataFrame<
- Polars[2 x 2]
- col1 string ["b", "c"]
- col2 integer [2, 3]
->
Instead, do this:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.filter_with(df, fn df -> Explorer.Series.greater(df["col2"], 1) end)
-#Explorer.DataFrame<
- Polars[2 x 2]
- col1 string ["b", "c"]
- col2 integer [2, 3]
->
The filter_with/2 version is much more efficient because it doesn't need
+you can pass a list:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.mask(df, [false, true, false])
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ col1 string ["b"]
+ col2 integer [2]
+>
You must avoid using masks when the masks themselves are computed from
+other columns. For example, DO NOT do this:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.mask(df, Explorer.Series.greater(df["col2"], 1))
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ col1 string ["b", "c"]
+ col2 integer [2, 3]
+>
Instead, do this:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.filter_with(df, fn df -> Explorer.Series.greater(df["col2"], 1) end)
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ col1 string ["b", "c"]
+ col2 integer [2, 3]
+>
The filter_with/2 version is much more efficient because it doesn't need
to create intermediate series representations to apply the mask.
Mutations are useful to add or modify columns in your dataframe:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.mutate(df, c: b + 1)
-#Explorer.DataFrame<
- Polars[3 x 3]
- a string ["a", "b", "c"]
- b integer [1, 2, 3]
- c integer [2, 3, 4]
->
It's also possible to overwrite existing columns:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.mutate(df, a: b * 2)
-#Explorer.DataFrame<
- Polars[3 x 2]
- a integer [2, 4, 6]
- b integer [1, 2, 3]
->
Scalar values are repeated to fill the series:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.mutate(df, a: 4)
-#Explorer.DataFrame<
- Polars[3 x 2]
- a integer [4, 4, 4]
- b integer [1, 2, 3]
->
It's also possible to use functions from the Series module, like Explorer.Series.window_sum/3:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
-iex> Explorer.DataFrame.mutate(df, b: window_sum(a, 2))
-#Explorer.DataFrame<
- Polars[3 x 2]
- a integer [1, 2, 3]
- b integer [1, 3, 5]
->
Alternatively, all of the above works with a map instead of a keyword list:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.mutate(df, %{"c" => cast(b, :float)})
-#Explorer.DataFrame<
- Polars[3 x 3]
- a string ["a", "b", "c"]
- b integer [1, 2, 3]
- c float [1.0, 2.0, 3.0]
->
Mutations are useful to add or modify columns in your dataframe:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.mutate(df, c: b + 1)
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ a string ["a", "b", "c"]
+ b integer [1, 2, 3]
+ c integer [2, 3, 4]
+>
It's also possible to overwrite existing columns:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.mutate(df, a: b * 2)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a integer [2, 4, 6]
+ b integer [1, 2, 3]
+>
Scalar values are repeated to fill the series:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.mutate(df, a: 4)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a integer [4, 4, 4]
+ b integer [1, 2, 3]
+>
It's also possible to use functions from the Series module, like Explorer.Series.window_sum/3:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
+iex> Explorer.DataFrame.mutate(df, b: window_sum(a, 2))
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a integer [1, 2, 3]
+ b integer [1, 3, 5]
+>
Alternatively, all of the above works with a map instead of a keyword list:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.mutate(df, %{"c" => cast(b, :float)})
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ a string ["a", "b", "c"]
+ b integer [1, 2, 3]
+ c float [1.0, 2.0, 3.0]
+>
summarise/2,
but repeating the results for each member in the group.
For example, if we want to count how many elements of a given group, we can add a new
-column with that aggregation:
iex> df = Explorer.DataFrame.new(id: ["a", "a", "b"], b: [1, 2, 3])
-iex> grouped = Explorer.DataFrame.group_by(df, :id)
-iex> Explorer.DataFrame.mutate(grouped, count: count(b))
-#Explorer.DataFrame<
- Polars[3 x 3]
- Groups: ["id"]
- id string ["a", "a", "b"]
- b integer [1, 2, 3]
- count integer [2, 2, 1]
->
In case we want to get the average size of the petal length from the Iris dataset, we can:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.mutate(grouped, petal_length_avg: mean(petal_length))
-#Explorer.DataFrame<
- Polars[150 x 6]
- Groups: ["species"]
- sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
- sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
- petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
- petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
- petal_length_avg float [1.4640000000000004, 1.4640000000000004, 1.4640000000000004, 1.4640000000000004, 1.4640000000000004, ...]
->
+column with that aggregation:
iex> df = Explorer.DataFrame.new(id: ["a", "a", "b"], b: [1, 2, 3])
+iex> grouped = Explorer.DataFrame.group_by(df, :id)
+iex> Explorer.DataFrame.mutate(grouped, count: count(b))
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ Groups: ["id"]
+ id string ["a", "a", "b"]
+ b integer [1, 2, 3]
+ count integer [2, 2, 1]
+>
In case we want to get the average size of the petal length from the Iris dataset, we can:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.mutate(grouped, petal_length_avg: mean(petal_length))
+#Explorer.DataFrame<
+ Polars[150 x 6]
+ Groups: ["species"]
+ sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
+ sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
+ petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
+ petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+ petal_length_avg float [1.4640000000000004, 1.4640000000000004, 1.4640000000000004, 1.4640000000000004, 1.4640000000000004, ...]
+>
Here is an example of a new column that sums the value of two other columns:
iex> df = Explorer.DataFrame.new(a: [4, 5, 6], b: [1, 2, 3])
-iex> Explorer.DataFrame.mutate_with(df, &[c: Explorer.Series.add(&1["a"], &1["b"])])
-#Explorer.DataFrame<
- Polars[3 x 3]
- a integer [4, 5, 6]
- b integer [1, 2, 3]
- c integer [5, 7, 9]
->
You can overwrite existing columns as well:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.mutate_with(df, &[b: Explorer.Series.pow(&1["b"], 2)])
-#Explorer.DataFrame<
- Polars[3 x 2]
- a string ["a", "b", "c"]
- b float [1.0, 4.0, 9.0]
->
It's possible to "reuse" a variable for different computations:
iex> df = Explorer.DataFrame.new(a: [4, 5, 6], b: [1, 2, 3])
-iex> Explorer.DataFrame.mutate_with(df, fn ldf ->
-iex> c = Explorer.Series.add(ldf["a"], ldf["b"])
-iex> [c: c, d: Explorer.Series.window_sum(c, 2)]
-iex> end)
-#Explorer.DataFrame<
- Polars[3 x 4]
- a integer [4, 5, 6]
- b integer [1, 2, 3]
- c integer [5, 7, 9]
- d integer [5, 12, 16]
->
Here is an example of a new column that sums the value of two other columns:
iex> df = Explorer.DataFrame.new(a: [4, 5, 6], b: [1, 2, 3])
+iex> Explorer.DataFrame.mutate_with(df, &[c: Explorer.Series.add(&1["a"], &1["b"])])
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ a integer [4, 5, 6]
+ b integer [1, 2, 3]
+ c integer [5, 7, 9]
+>
You can overwrite existing columns as well:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.mutate_with(df, &[b: Explorer.Series.pow(&1["b"], 2)])
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a string ["a", "b", "c"]
+ b float [1.0, 4.0, 9.0]
+>
It's possible to "reuse" a variable for different computations:
iex> df = Explorer.DataFrame.new(a: [4, 5, 6], b: [1, 2, 3])
+iex> Explorer.DataFrame.mutate_with(df, fn ldf ->
+iex> c = Explorer.Series.add(ldf["a"], ldf["b"])
+iex> [c: c, d: Explorer.Series.window_sum(c, 2)]
+iex> end)
+#Explorer.DataFrame<
+ Polars[3 x 4]
+ a integer [4, 5, 6]
+ b integer [1, 2, 3]
+ c integer [5, 7, 9]
+ d integer [5, 12, 16]
+>
Mutations in grouped dataframes takes the context of the group. For example, if we want to count how many elements of a given group,
-we can add a new column with that aggregation:
iex> df = Explorer.DataFrame.new(id: ["a", "a", "b"], b: [1, 2, 3])
-iex> grouped = Explorer.DataFrame.group_by(df, :id)
-iex> Explorer.DataFrame.mutate_with(grouped, &[count: Explorer.Series.count(&1["b"])])
-#Explorer.DataFrame<
- Polars[3 x 3]
- Groups: ["id"]
- id string ["a", "a", "b"]
- b integer [1, 2, 3]
- count integer [2, 2, 1]
->
+we can add a new column with that aggregation:
iex> df = Explorer.DataFrame.new(id: ["a", "a", "b"], b: [1, 2, 3])
+iex> grouped = Explorer.DataFrame.group_by(df, :id)
+iex> Explorer.DataFrame.mutate_with(grouped, &[count: Explorer.Series.count(&1["b"])])
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ Groups: ["id"]
+ id string ["a", "a", "b"]
+ b integer [1, 2, 3]
+ count integer [2, 2, 1]
+>
iex> df = Explorer.DataFrame.new(a: ["d", nil, "f"], b: [nil, 2, nil], c: ["a", "b", "c"])
-iex> Explorer.DataFrame.nil_count(df)
-#Explorer.DataFrame<
- Polars[1 x 3]
- a integer [1]
- b integer [2]
- c integer [0]
->
+iex> df = Explorer.DataFrame.new(a: ["d", nil, "f"], b: [nil, 2, nil], c: ["a", "b", "c"])
+iex> Explorer.DataFrame.nil_count(df)
+#Explorer.DataFrame<
+ Polars[1 x 3]
+ a integer [1]
+ b integer [2]
+ c integer [0]
+>
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.pivot_longer(df, &String.ends_with?(&1, "fuel"))
-#Explorer.DataFrame<
- Polars[3282 x 9]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- total integer [2308, 1254, 32500, 141, 7924, ...]
- cement integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
- variable string ["solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", ...]
- value integer [627, 117, 332, 0, 0, ...]
->
-
-iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.pivot_longer(df, &String.ends_with?(&1, "fuel"), select: ["year", "country"])
-#Explorer.DataFrame<
- Polars[3282 x 4]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- variable string ["solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", ...]
- value integer [627, 117, 332, 0, 0, ...]
->
-
-iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.pivot_longer(df, ["total"], select: ["year", "country"], discard: ["country"])
-#Explorer.DataFrame<
- Polars[1094 x 3]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- variable string ["total", "total", "total", "total", "total", ...]
- value integer [2308, 1254, 32500, 141, 7924, ...]
->
-
-iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.pivot_longer(df, ["total"], select: [], names_to: "my_var", values_to: "my_value")
-#Explorer.DataFrame<
- Polars[1094 x 2]
- my_var string ["total", "total", "total", "total", "total", ...]
- my_value integer [2308, 1254, 32500, 141, 7924, ...]
->
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.pivot_longer(df, &String.ends_with?(&1, "fuel"))
+#Explorer.DataFrame<
+ Polars[3282 x 9]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ total integer [2308, 1254, 32500, 141, 7924, ...]
+ cement integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+ variable string ["solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", ...]
+ value integer [627, 117, 332, 0, 0, ...]
+>
+
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.pivot_longer(df, &String.ends_with?(&1, "fuel"), select: ["year", "country"])
+#Explorer.DataFrame<
+ Polars[3282 x 4]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ variable string ["solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", ...]
+ value integer [627, 117, 332, 0, 0, ...]
+>
+
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.pivot_longer(df, ["total"], select: ["year", "country"], discard: ["country"])
+#Explorer.DataFrame<
+ Polars[1094 x 3]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ variable string ["total", "total", "total", "total", "total", ...]
+ value integer [2308, 1254, 32500, 141, 7924, ...]
+>
+
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.pivot_longer(df, ["total"], select: [], names_to: "my_var", values_to: "my_value")
+#Explorer.DataFrame<
+ Polars[1094 x 2]
+ my_var string ["total", "total", "total", "total", "total", ...]
+ my_value integer [2308, 1254, 32500, 141, 7924, ...]
+>
In the following example we want to take the Iris dataset and increase the number of rows by pivoting the "sepal_length" column. This dataset is grouped by "species", so the resultant
-dataframe is going to keep the "species" group:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.pivot_longer(grouped, ["sepal_length"])
-#Explorer.DataFrame<
- Polars[150 x 6]
- Groups: ["species"]
- sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
- petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
- petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
- variable string ["sepal_length", "sepal_length", "sepal_length", "sepal_length", "sepal_length", ...]
- value float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
->
Now we want to do something different: we want to pivot the "species" column that is also a group.
-This is going to remove the group in the resultant dataframe:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.pivot_longer(grouped, ["species"])
-#Explorer.DataFrame<
- Polars[150 x 6]
- sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
- sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
- petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
- petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
- variable string ["species", "species", "species", "species", "species", ...]
- value string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
+dataframe is going to keep the "species" group:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.pivot_longer(grouped, ["sepal_length"])
+#Explorer.DataFrame<
+ Polars[150 x 6]
+ Groups: ["species"]
+ sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
+ petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
+ petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+ variable string ["sepal_length", "sepal_length", "sepal_length", "sepal_length", "sepal_length", ...]
+ value float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
+>
Now we want to do something different: we want to pivot the "species" column that is also a group.
+This is going to remove the group in the resultant dataframe:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.pivot_longer(grouped, ["species"])
+#Explorer.DataFrame<
+ Polars[150 x 6]
+ sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
+ sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
+ petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
+ petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
+ variable string ["species", "species", "species", "species", "species", ...]
+ value string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
Suppose we have a basketball court and multiple teams that want to train in that court. They need to share a schedule with the hours each team is going to use it. Here is a dataframe representing
-that schedule:
iex> Explorer.DataFrame.new(
-iex> weekday: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
-iex> team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
-iex> hour: [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]
-iex> )
This dataframe is going to look like this - using table/2:
+----------------------------------------------+
- | Explorer DataFrame: [rows: 10, columns: 3] |
+that schedule:
iex> Explorer.DataFrame.new(
+iex> weekday: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
+iex> team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
+iex> hour: [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]
+iex> )
This dataframe is going to look like this - using table/2:
+----------------------------------------------+
+ | Explorer DataFrame: [rows: 10, columns: 3] |
+---------------+--------------+---------------+
| weekday | team | hour |
| <string> | <string> | <integer> |
@@ -3181,22 +3181,22 @@ pivot_wider(df, names_from, values_from, op
| Friday | A | 16 |
+---------------+--------------+---------------+
You can see that the "weekday" repeats, and it's not clear how free the agenda is.
We can solve that by pivoting the "weekday" column in multiple columns, making each weekday
-a new column in the resultant dataframe.
iex> df = Explorer.DataFrame.new(
-iex> weekday: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
-iex> team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
-iex> hour: [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]
-iex> )
-iex> Explorer.DataFrame.pivot_wider(df, "weekday", "hour")
-#Explorer.DataFrame<
- Polars[3 x 6]
- team string ["A", "B", "C"]
- Monday integer [10, nil, 15]
- Tuesday integer [14, 9, nil]
- Wednesday integer [nil, 16, 10]
- Thursday integer [10, nil, 14]
- Friday integer [16, 11, nil]
->
Now if we print that same dataframe with table/2, we get a better picture of the schedule:
+----------------------------------------------------------------------+
- | Explorer DataFrame: [rows: 3, columns: 6] |
+a new column in the resultant dataframe.
iex> df = Explorer.DataFrame.new(
+iex> weekday: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
+iex> team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
+iex> hour: [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]
+iex> )
+iex> Explorer.DataFrame.pivot_wider(df, "weekday", "hour")
+#Explorer.DataFrame<
+ Polars[3 x 6]
+ team string ["A", "B", "C"]
+ Monday integer [10, nil, 15]
+ Tuesday integer [14, 9, nil]
+ Wednesday integer [nil, 16, 10]
+ Thursday integer [10, nil, 14]
+ Friday integer [16, 11, nil]
+>
Now if we print that same dataframe with table/2, we get a better picture of the schedule:
+----------------------------------------------------------------------+
+ | Explorer DataFrame: [rows: 3, columns: 6] |
+----------+-----------+-----------+-----------+-----------+-----------+
| team | Monday | Tuesday | Wednesday | Thursday | Friday |
| <string> | <integer> | <integer> | <integer> | <integer> | <integer> |
@@ -3207,93 +3207,93 @@ pivot_wider(df, names_from, values_from, op
+----------+-----------+-----------+-----------+-----------+-----------+
| C | 15 | | 10 | 14 | |
+----------+-----------+-----------+-----------+-----------+-----------+
Pivot wider can create unpredictable column names, and sometimes they can conflict with ID columns.
-In that scenario, we add a number as suffix to duplicated column names. Here is an example:
iex> df = Explorer.DataFrame.new(
-iex> product_id: [1, 1, 1, 1, 2, 2, 2, 2],
-iex> property: ["product_id", "width_cm", "height_cm", "length_cm", "product_id", "width_cm", "height_cm", "length_cm"],
-iex> property_value: [1, 42, 40, 64, 2, 35, 20, 40]
-iex> )
-iex> Explorer.DataFrame.pivot_wider(df, "property", "property_value")
-#Explorer.DataFrame<
- Polars[2 x 5]
- product_id integer [1, 2]
- product_id_1 integer [1, 2]
- width_cm integer [42, 35]
- height_cm integer [40, 20]
- length_cm integer [64, 40]
->
But if the option :names_prefix is used, that suffix is not added:
iex> df = Explorer.DataFrame.new(
-iex> product_id: [1, 1, 1, 1, 2, 2, 2, 2],
-iex> property: ["product_id", "width_cm", "height_cm", "length_cm", "product_id", "width_cm", "height_cm", "length_cm"],
-iex> property_value: [1, 42, 40, 64, 2, 35, 20, 40]
-iex> )
-iex> Explorer.DataFrame.pivot_wider(df, "property", "property_value", names_prefix: "col_")
-#Explorer.DataFrame<
- Polars[2 x 5]
- product_id integer [1, 2]
- col_product_id integer [1, 2]
- col_width_cm integer [42, 35]
- col_height_cm integer [40, 20]
- col_length_cm integer [64, 40]
->
Multiple columns are accepted for the values_from parameter, but the behaviour is slightly
+In that scenario, we add a number as suffix to duplicated column names. Here is an example:
iex> df = Explorer.DataFrame.new(
+iex> product_id: [1, 1, 1, 1, 2, 2, 2, 2],
+iex> property: ["product_id", "width_cm", "height_cm", "length_cm", "product_id", "width_cm", "height_cm", "length_cm"],
+iex> property_value: [1, 42, 40, 64, 2, 35, 20, 40]
+iex> )
+iex> Explorer.DataFrame.pivot_wider(df, "property", "property_value")
+#Explorer.DataFrame<
+ Polars[2 x 5]
+ product_id integer [1, 2]
+ product_id_1 integer [1, 2]
+ width_cm integer [42, 35]
+ height_cm integer [40, 20]
+ length_cm integer [64, 40]
+>
But if the option :names_prefix is used, that suffix is not added:
iex> df = Explorer.DataFrame.new(
+iex> product_id: [1, 1, 1, 1, 2, 2, 2, 2],
+iex> property: ["product_id", "width_cm", "height_cm", "length_cm", "product_id", "width_cm", "height_cm", "length_cm"],
+iex> property_value: [1, 42, 40, 64, 2, 35, 20, 40]
+iex> )
+iex> Explorer.DataFrame.pivot_wider(df, "property", "property_value", names_prefix: "col_")
+#Explorer.DataFrame<
+ Polars[2 x 5]
+ product_id integer [1, 2]
+ col_product_id integer [1, 2]
+ col_width_cm integer [42, 35]
+ col_height_cm integer [40, 20]
+ col_length_cm integer [64, 40]
+>
Multiple columns are accepted for the values_from parameter, but the behaviour is slightly
different for the naming of new columns in the resultant dataframe. The new columns are going
to be prefixed by the name of the original value column, followed by an underscore and the
-original column name, followed by the name of the variable.
iex> df = Explorer.DataFrame.new(
-iex> product_id: [1, 1, 1, 1, 2, 2, 2, 2],
-iex> property: ["product_id", "width_cm", "height_cm", "length_cm", "product_id", "width_cm", "height_cm", "length_cm"],
-iex> property_value: [1, 42, 40, 64, 2, 35, 20, 40],
-iex> another_value: [1, 43, 41, 65, 2, 36, 21, 42]
-iex> )
-iex> Explorer.DataFrame.pivot_wider(df, "property", ["property_value", "another_value"])
-#Explorer.DataFrame<
- Polars[2 x 9]
- product_id integer [1, 2]
- property_value_property_product_id integer [1, 2]
- property_value_property_width_cm integer [42, 35]
- property_value_property_height_cm integer [40, 20]
- property_value_property_length_cm integer [64, 40]
- another_value_property_product_id integer [1, 2]
- another_value_property_width_cm integer [43, 36]
- another_value_property_height_cm integer [41, 21]
- another_value_property_length_cm integer [65, 42]
->
+original column name, followed by the name of the variable.
iex> df = Explorer.DataFrame.new(
+iex> product_id: [1, 1, 1, 1, 2, 2, 2, 2],
+iex> property: ["product_id", "width_cm", "height_cm", "length_cm", "product_id", "width_cm", "height_cm", "length_cm"],
+iex> property_value: [1, 42, 40, 64, 2, 35, 20, 40],
+iex> another_value: [1, 43, 41, 65, 2, 36, 21, 42]
+iex> )
+iex> Explorer.DataFrame.pivot_wider(df, "property", ["property_value", "another_value"])
+#Explorer.DataFrame<
+ Polars[2 x 9]
+ product_id integer [1, 2]
+ property_value_property_product_id integer [1, 2]
+ property_value_property_width_cm integer [42, 35]
+ property_value_property_height_cm integer [40, 20]
+ property_value_property_length_cm integer [64, 40]
+ another_value_property_product_id integer [1, 2]
+ another_value_property_width_cm integer [43, 36]
+ another_value_property_height_cm integer [41, 21]
+ another_value_property_length_cm integer [65, 42]
+>
Grouped examples
Now using the same idea, we can see that there is not much difference for grouped dataframes.
-The only detail is that groups that are not ID columns are discarded.
iex> df = Explorer.DataFrame.new(
-iex> weekday: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
-iex> team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
-iex> hour: [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]
-iex> )
-iex> grouped = Explorer.DataFrame.group_by(df, "team")
-iex> Explorer.DataFrame.pivot_wider(grouped, "weekday", "hour")
-#Explorer.DataFrame<
- Polars[3 x 6]
- Groups: ["team"]
- team string ["A", "B", "C"]
- Monday integer [10, nil, 15]
- Tuesday integer [14, 9, nil]
- Wednesday integer [nil, 16, 10]
- Thursday integer [10, nil, 14]
- Friday integer [16, 11, nil]
->
In the following example the group "weekday" is going to be removed, because the column is going
-to be pivoted in multiple columns:
iex> df = Explorer.DataFrame.new(
-iex> weekday: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
-iex> team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
-iex> hour: [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]
-iex> )
-iex> grouped = Explorer.DataFrame.group_by(df, "weekday")
-iex> Explorer.DataFrame.pivot_wider(grouped, "weekday", "hour")
-#Explorer.DataFrame<
- Polars[3 x 6]
- team string ["A", "B", "C"]
- Monday integer [10, nil, 15]
- Tuesday integer [14, 9, nil]
- Wednesday integer [nil, 16, 10]
- Thursday integer [10, nil, 14]
- Friday integer [16, 11, nil]
->
+The only detail is that groups that are not ID columns are discarded.
iex> df = Explorer.DataFrame.new(
+iex> weekday: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
+iex> team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
+iex> hour: [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]
+iex> )
+iex> grouped = Explorer.DataFrame.group_by(df, "team")
+iex> Explorer.DataFrame.pivot_wider(grouped, "weekday", "hour")
+#Explorer.DataFrame<
+ Polars[3 x 6]
+ Groups: ["team"]
+ team string ["A", "B", "C"]
+ Monday integer [10, nil, 15]
+ Tuesday integer [14, 9, nil]
+ Wednesday integer [nil, 16, 10]
+ Thursday integer [10, nil, 14]
+ Friday integer [16, 11, nil]
+>
In the following example the group "weekday" is going to be removed, because the column is going
+to be pivoted in multiple columns:
iex> df = Explorer.DataFrame.new(
+iex> weekday: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
+iex> team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
+iex> hour: [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]
+iex> )
+iex> grouped = Explorer.DataFrame.group_by(df, "weekday")
+iex> Explorer.DataFrame.pivot_wider(grouped, "weekday", "hour")
+#Explorer.DataFrame<
+ Polars[3 x 6]
+ team string ["A", "B", "C"]
+ Monday integer [10, nil, 15]
+ Tuesday integer [14, 9, nil]
+ Wednesday integer [nil, 16, 10]
+ Thursday integer [10, nil, 14]
+ Friday integer [16, 11, nil]
+>
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.pull(df, "total")
-#Explorer.Series<
- Polars[1094]
- integer [2308, 1254, 32500, 141, 7924, 41, 143, 51246, 1150, 684, 106589, 18408, 8366, 451, 7981, 16345, 403, 17192, 30222, 147, 1388, 166, 133, 5802, 1278, 114468, 47, 2237, 12030, 535, 58, 1367, 145806, 152, 152, 72, 141, 19703, 2393248, 20773, 44, 540, 19, 2064, 1900, 5501, 10465, 2102, 30428, 18122, ...]
->
-
-iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.pull(df, 2)
-#Explorer.Series<
- Polars[1094]
- integer [2308, 1254, 32500, 141, 7924, 41, 143, 51246, 1150, 684, 106589, 18408, 8366, 451, 7981, 16345, 403, 17192, 30222, 147, 1388, 166, 133, 5802, 1278, 114468, 47, 2237, 12030, 535, 58, 1367, 145806, 152, 152, 72, 141, 19703, 2393248, 20773, 44, 540, 19, 2064, 1900, 5501, 10465, 2102, 30428, 18122, ...]
->
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.pull(df, "total")
+#Explorer.Series<
+ Polars[1094]
+ integer [2308, 1254, 32500, 141, 7924, 41, 143, 51246, 1150, 684, 106589, 18408, 8366, 451, 7981, 16345, 403, 17192, 30222, 147, 1388, 166, 133, 5802, 1278, 114468, 47, 2237, 12030, 535, 58, 1367, 145806, 152, 152, 72, 141, 19703, 2393248, 20773, 44, 540, 19, 2064, 1900, 5501, 10465, 2102, 30428, 18122, ...]
+>
+
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.pull(df, 2)
+#Explorer.Series<
+ Polars[1094]
+ integer [2308, 1254, 32500, 141, 7924, 41, 143, 51246, 1150, 684, 106589, 18408, 8366, 451, 7981, 16345, 403, 17192, 30222, 147, 1388, 166, 133, 5802, 1278, 114468, 47, 2237, 12030, 535, 58, 1367, 145806, 152, 152, 72, 141, 19703, 2393248, 20773, 44, 540, 19, 2064, 1900, 5501, 10465, 2102, 30428, 18122, ...]
+>
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
-iex> Explorer.DataFrame.put(df, :b, Explorer.Series.transform(df[:a], fn n -> n * 2 end))
-#Explorer.DataFrame<
- Polars[3 x 2]
- a integer [1, 2, 3]
- b integer [2, 4, 6]
->
-
-iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
-iex> Explorer.DataFrame.put(df, :b, Explorer.Series.from_list([4, 5, 6]))
-#Explorer.DataFrame<
- Polars[3 x 2]
- a integer [1, 2, 3]
- b integer [4, 5, 6]
->
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
+iex> Explorer.DataFrame.put(df, :b, Explorer.Series.transform(df[:a], fn n -> n * 2 end))
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a integer [1, 2, 3]
+ b integer [2, 4, 6]
+>
+
+iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
+iex> Explorer.DataFrame.put(df, :b, Explorer.Series.from_list([4, 5, 6]))
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a integer [1, 2, 3]
+ b integer [4, 5, 6]
+>
If the dataframe is grouped, put/3 is going to ignore the groups.
-So the series must be of the same size of the entire dataframe.
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
-iex> grouped = Explorer.DataFrame.group_by(df, "a")
-iex> series = Explorer.Series.from_list([9, 8, 7])
-iex> Explorer.DataFrame.put(grouped, :b, series)
-#Explorer.DataFrame<
- Polars[3 x 2]
- Groups: ["a"]
- a integer [1, 2, 3]
- b integer [9, 8, 7]
->
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
+iex> grouped = Explorer.DataFrame.group_by(df, "a")
+iex> series = Explorer.Series.from_list([9, 8, 7])
+iex> Explorer.DataFrame.put(grouped, :b, series)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ Groups: ["a"]
+ a integer [1, 2, 3]
+ b integer [9, 8, 7]
+>
You can also put tensors into the dataframe:
iex> df = Explorer.DataFrame.new([])
-iex> Explorer.DataFrame.put(df, :a, Nx.tensor([1, 2, 3]))
-#Explorer.DataFrame<
- Polars[3 x 1]
- a integer [1, 2, 3]
->
You can specify which dtype the tensor represents. +
You can also put tensors into the dataframe:
iex> df = Explorer.DataFrame.new([])
+iex> Explorer.DataFrame.put(df, :a, Nx.tensor([1, 2, 3]))
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a integer [1, 2, 3]
+>
You can specify which dtype the tensor represents. For example, a tensor of s64 represents integers by default, but it may also represent timestamps
-in microseconds from the Unix epoch:
iex> df = Explorer.DataFrame.new([])
-iex> Explorer.DataFrame.put(df, :a, Nx.tensor([1, 2, 3]), dtype: :datetime)
-#Explorer.DataFrame<
- Polars[3 x 1]
- a datetime [1970-01-01 00:00:00.000001, 1970-01-01 00:00:00.000002, 1970-01-01 00:00:00.000003]
->
If there is already a column where we want to place the tensor,
+in microseconds from the Unix epoch:
iex> df = Explorer.DataFrame.new([])
+iex> Explorer.DataFrame.put(df, :a, Nx.tensor([1, 2, 3]), dtype: :datetime)
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a datetime [1970-01-01 00:00:00.000001, 1970-01-01 00:00:00.000002, 1970-01-01 00:00:00.000003]
+>
If there is already a column where we want to place the tensor, the column dtype will be automatically used, this means that updating dataframes in place while preserving their types is
-straight-forward:
iex> df = Explorer.DataFrame.new(a: [~N[1970-01-01 00:00:00]])
-iex> Explorer.DataFrame.put(df, :a, Nx.tensor(529550625987654))
-#Explorer.DataFrame<
- Polars[1 x 1]
- a datetime [1986-10-13 01:23:45.987654]
->
This is particularly useful for categorical columns:
iex> cat = Explorer.Series.from_list(["foo", "bar", "baz"], dtype: :category)
-iex> df = Explorer.DataFrame.new(a: cat)
-iex> Explorer.DataFrame.put(df, :a, Nx.tensor([2, 1, 0]))
-#Explorer.DataFrame<
- Polars[3 x 1]
- a category ["baz", "bar", "foo"]
->
On the other hand, if you try to put a floating tensor on
+straight-forward:
iex> df = Explorer.DataFrame.new(a: [~N[1970-01-01 00:00:00]])
+iex> Explorer.DataFrame.put(df, :a, Nx.tensor(529550625987654))
+#Explorer.DataFrame<
+ Polars[1 x 1]
+ a datetime [1986-10-13 01:23:45.987654]
+>
This is particularly useful for categorical columns:
iex> cat = Explorer.Series.from_list(["foo", "bar", "baz"], dtype: :category)
+iex> df = Explorer.DataFrame.new(a: cat)
+iex> Explorer.DataFrame.put(df, :a, Nx.tensor([2, 1, 0]))
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a category ["baz", "bar", "foo"]
+>
On the other hand, if you try to put a floating tensor on
an integer column, an error will be raised unless a dtype
-or dtype: :infer is given:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
-iex> Explorer.DataFrame.put(df, :a, Nx.tensor(1.0, type: :f64))
+or dtype: :infer is given:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
+iex> Explorer.DataFrame.put(df, :a, Nx.tensor(1.0, type: :f64))
** (ArgumentError) dtype integer expects a tensor of type {:s, 64} but got type {:f, 64}
-iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
-iex> Explorer.DataFrame.put(df, :a, Nx.tensor(1.0, type: :f64), dtype: :float)
-#Explorer.DataFrame<
- Polars[3 x 1]
- a float [1.0, 1.0, 1.0]
->
-
-iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
-iex> Explorer.DataFrame.put(df, :a, Nx.tensor(1.0, type: :f64), dtype: :infer)
-#Explorer.DataFrame<
- Polars[3 x 1]
- a float [1.0, 1.0, 1.0]
->
+
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
+iex> Explorer.DataFrame.put(df, :a, Nx.tensor(1.0, type: :f64), dtype: :float)
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a float [1.0, 1.0, 1.0]
+>
+
+iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
+iex> Explorer.DataFrame.put(df, :a, Nx.tensor(1.0, type: :f64), dtype: :infer)
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a float [1.0, 1.0, 1.0]
+>
Similar to tensors, we can also put lists in the dataframe:
iex> df = Explorer.DataFrame.new([])
-iex> Explorer.DataFrame.put(df, :a, [1, 2, 3])
-#Explorer.DataFrame<
- Polars[3 x 1]
- a integer [1, 2, 3]
->
The same considerations as above apply.
+Similar to tensors, we can also put lists in the dataframe:
iex> df = Explorer.DataFrame.new([])
+iex> Explorer.DataFrame.put(df, :a, [1, 2, 3])
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a integer [1, 2, 3]
+>
The same considerations as above apply.
Relocate a single column
iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1], c: [nil, 5, 6])
-iex> Explorer.DataFrame.relocate(df, "a", after: "c")
-#Explorer.DataFrame<
- Polars[3 x 3]
- b integer [1, 3, 1]
- c integer [nil, 5, 6]
- a string ["a", "b", "a"]
->
Relocate (and reorder) multiple columns to the beginning
iex> df = Explorer.DataFrame.new(a: [1, 2], b: [5.1, 5.2], c: [4, 5], d: ["yes", "no"])
-iex> Explorer.DataFrame.relocate(df, ["d", 1], before: 0)
-#Explorer.DataFrame<
- Polars[2 x 4]
- d string ["yes", "no"]
- b float [5.1, 5.2]
- a integer [1, 2]
- c integer [4, 5]
->
Relocate before another column
iex> df = Explorer.DataFrame.new(a: [1, 2], b: [5.1, 5.2], c: [4, 5], d: ["yes", "no"])
-iex> Explorer.DataFrame.relocate(df, ["a", "c"], before: "b")
-#Explorer.DataFrame<
- Polars[2 x 4]
- a integer [1, 2]
- c integer [4, 5]
- b float [5.1, 5.2]
- d string ["yes", "no"]
->
+Relocate a single column
iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1], c: [nil, 5, 6])
+iex> Explorer.DataFrame.relocate(df, "a", after: "c")
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ b integer [1, 3, 1]
+ c integer [nil, 5, 6]
+ a string ["a", "b", "a"]
+>
Relocate (and reorder) multiple columns to the beginning
iex> df = Explorer.DataFrame.new(a: [1, 2], b: [5.1, 5.2], c: [4, 5], d: ["yes", "no"])
+iex> Explorer.DataFrame.relocate(df, ["d", 1], before: 0)
+#Explorer.DataFrame<
+ Polars[2 x 4]
+ d string ["yes", "no"]
+ b float [5.1, 5.2]
+ a integer [1, 2]
+ c integer [4, 5]
+>
Relocate before another column
iex> df = Explorer.DataFrame.new(a: [1, 2], b: [5.1, 5.2], c: [4, 5], d: ["yes", "no"])
+iex> Explorer.DataFrame.relocate(df, ["a", "c"], before: "b")
+#Explorer.DataFrame<
+ Polars[2 x 4]
+ a integer [1, 2]
+ c integer [4, 5]
+ b float [5.1, 5.2]
+ d string ["yes", "no"]
+>
You can pass in a list of new names:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1])
-iex> Explorer.DataFrame.rename(df, ["c", "d"])
-#Explorer.DataFrame<
- Polars[3 x 2]
- c string ["a", "b", "a"]
- d integer [1, 3, 1]
->
Or you can rename individual columns using keyword args:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1])
-iex> Explorer.DataFrame.rename(df, a: "first")
-#Explorer.DataFrame<
- Polars[3 x 2]
- first string ["a", "b", "a"]
- b integer [1, 3, 1]
->
Or you can rename individual columns using a map:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1])
-iex> Explorer.DataFrame.rename(df, %{"a" => "first"})
-#Explorer.DataFrame<
- Polars[3 x 2]
- first string ["a", "b", "a"]
- b integer [1, 3, 1]
->
+You can pass in a list of new names:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1])
+iex> Explorer.DataFrame.rename(df, ["c", "d"])
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ c string ["a", "b", "a"]
+ d integer [1, 3, 1]
+>
Or you can rename individual columns using keyword args:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1])
+iex> Explorer.DataFrame.rename(df, a: "first")
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ first string ["a", "b", "a"]
+ b integer [1, 3, 1]
+>
Or you can rename individual columns using a map:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1])
+iex> Explorer.DataFrame.rename(df, %{"a" => "first"})
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ first string ["a", "b", "a"]
+ b integer [1, 3, 1]
+>
If no columns are specified, it will apply the function to all column names:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.rename_with(df, &String.upcase/1)
-#Explorer.DataFrame<
- Polars[1094 x 10]
- YEAR integer [2010, 2010, 2010, 2010, 2010, ...]
- COUNTRY string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- TOTAL integer [2308, 1254, 32500, 141, 7924, ...]
- SOLID_FUEL integer [627, 117, 332, 0, 0, ...]
- LIQUID_FUEL integer [1601, 953, 12381, 141, 3649, ...]
- GAS_FUEL integer [74, 7, 14565, 0, 374, ...]
- CEMENT integer [5, 177, 2598, 0, 204, ...]
- GAS_FLARING integer [0, 0, 2623, 0, 3697, ...]
- PER_CAPITA float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- BUNKER_FUELS integer [9, 7, 663, 0, 321, ...]
->
A callback can be used to filter the column names that will be renamed, similarly to select/2:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.rename_with(df, &String.ends_with?(&1, "_fuel"), &String.trim_trailing(&1, "_fuel"))
-#Explorer.DataFrame<
- Polars[1094 x 10]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- total integer [2308, 1254, 32500, 141, 7924, ...]
- solid integer [627, 117, 332, 0, 0, ...]
- liquid integer [1601, 953, 12381, 141, 3649, ...]
- gas integer [74, 7, 14565, 0, 374, ...]
- cement integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
->
Or you can just pass in the list of column names you'd like to apply the function to:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.rename_with(df, ["total", "cement"], &String.upcase/1)
-#Explorer.DataFrame<
- Polars[1094 x 10]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- TOTAL integer [2308, 1254, 32500, 141, 7924, ...]
- solid_fuel integer [627, 117, 332, 0, 0, ...]
- liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
- gas_fuel integer [74, 7, 14565, 0, 374, ...]
- CEMENT integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
->
Ranges, regexes, and functions are also accepted in column names, as in select/2.
If no columns are specified, it will apply the function to all column names:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.rename_with(df, &String.upcase/1)
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ YEAR integer [2010, 2010, 2010, 2010, 2010, ...]
+ COUNTRY string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ TOTAL integer [2308, 1254, 32500, 141, 7924, ...]
+ SOLID_FUEL integer [627, 117, 332, 0, 0, ...]
+ LIQUID_FUEL integer [1601, 953, 12381, 141, 3649, ...]
+ GAS_FUEL integer [74, 7, 14565, 0, 374, ...]
+ CEMENT integer [5, 177, 2598, 0, 204, ...]
+ GAS_FLARING integer [0, 0, 2623, 0, 3697, ...]
+ PER_CAPITA float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ BUNKER_FUELS integer [9, 7, 663, 0, 321, ...]
+>
A callback can be used to filter the column names that will be renamed, similarly to select/2:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.rename_with(df, &String.ends_with?(&1, "_fuel"), &String.trim_trailing(&1, "_fuel"))
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ total integer [2308, 1254, 32500, 141, 7924, ...]
+ solid integer [627, 117, 332, 0, 0, ...]
+ liquid integer [1601, 953, 12381, 141, 3649, ...]
+ gas integer [74, 7, 14565, 0, 374, ...]
+ cement integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+>
Or you can just pass in the list of column names you'd like to apply the function to:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.rename_with(df, ["total", "cement"], &String.upcase/1)
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ TOTAL integer [2308, 1254, 32500, 141, 7924, ...]
+ solid_fuel integer [627, 117, 332, 0, 0, ...]
+ liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
+ gas_fuel integer [74, 7, 14565, 0, 374, ...]
+ CEMENT integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+>
Ranges, regexes, and functions are also accepted in column names, as in select/2.
You can select a single column:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.select(df, "a")
-#Explorer.DataFrame<
- Polars[3 x 1]
- a string ["a", "b", "c"]
->
Or a list of names:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.select(df, ["a"])
-#Explorer.DataFrame<
- Polars[3 x 1]
- a string ["a", "b", "c"]
->
You can also use a range or a list of integers:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
-iex> Explorer.DataFrame.select(df, [0, 1])
-#Explorer.DataFrame<
- Polars[3 x 2]
- a string ["a", "b", "c"]
- b integer [1, 2, 3]
->
-
-iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
-iex> Explorer.DataFrame.select(df, 0..1)
-#Explorer.DataFrame<
- Polars[3 x 2]
- a string ["a", "b", "c"]
- b integer [1, 2, 3]
->
Or you can use a callback function that takes the dataframe's names as its first argument:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.select(df, &String.starts_with?(&1, "b"))
-#Explorer.DataFrame<
- Polars[3 x 1]
- b integer [1, 2, 3]
->
Or, if you prefer, a regex:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.select(df, ~r/^b$/)
-#Explorer.DataFrame<
- Polars[3 x 1]
- b integer [1, 2, 3]
->
Or a callback function that takes names and types:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.select(df, fn _name, type -> type == :integer end)
-#Explorer.DataFrame<
- Polars[3 x 1]
- b integer [1, 2, 3]
->
You can select a single column:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.select(df, "a")
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a string ["a", "b", "c"]
+>
Or a list of names:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.select(df, ["a"])
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a string ["a", "b", "c"]
+>
You can also use a range or a list of integers:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
+iex> Explorer.DataFrame.select(df, [0, 1])
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a string ["a", "b", "c"]
+ b integer [1, 2, 3]
+>
+
+iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
+iex> Explorer.DataFrame.select(df, 0..1)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a string ["a", "b", "c"]
+ b integer [1, 2, 3]
+>
Or you can use a callback function that takes the dataframe's names as its first argument:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.select(df, &String.starts_with?(&1, "b"))
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ b integer [1, 2, 3]
+>
Or, if you prefer, a regex:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.select(df, ~r/^b$/)
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ b integer [1, 2, 3]
+>
Or a callback function that takes names and types:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.select(df, fn _name, type -> type == :integer end)
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ b integer [1, 2, 3]
+>
Columns that are also groups cannot be removed,
-you need to ungroup before removing these columns.
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.select(grouped, ["sepal_width"])
-#Explorer.DataFrame<
- Polars[150 x 2]
- Groups: ["species"]
- sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
+you need to ungroup before removing these columns.
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.select(grouped, ["sepal_width"])
+#Explorer.DataFrame<
+ Polars[150 x 2]
+ Groups: ["species"]
+ sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
iex> df = Explorer.Datasets.fossil_fuels()
-iex> grouped_df = Explorer.DataFrame.group_by(df, "year")
-iex> Explorer.DataFrame.summarise(grouped_df, total_max: max(total), total_min: min(total))
-#Explorer.DataFrame<
- Polars[5 x 3]
- year integer [2010, 2011, 2012, 2013, 2014]
- total_max integer [2393248, 2654360, 2734817, 2797384, 2806634]
- total_min integer [1, 2, 2, 2, 3]
->
Suppose you want to get the mean petal length of each Iris species. You could do something
-like this:
iex> df = Explorer.Datasets.iris()
-iex> grouped_df = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.summarise(grouped_df, mean_petal_length: mean(petal_length))
-#Explorer.DataFrame<
- Polars[3 x 2]
- species string ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
- mean_petal_length float [1.464, 4.26, 5.552]
->
In case aggregations for all the dataframe is what you want, you can use ungrouped
-dataframes:
iex> df = Explorer.Datasets.iris()
-iex> Explorer.DataFrame.summarise(df, mean_petal_length: mean(petal_length))
-#Explorer.DataFrame<
- Polars[1 x 1]
- mean_petal_length float [3.758666666666667]
->
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> grouped_df = Explorer.DataFrame.group_by(df, "year")
+iex> Explorer.DataFrame.summarise(grouped_df, total_max: max(total), total_min: min(total))
+#Explorer.DataFrame<
+ Polars[5 x 3]
+ year integer [2010, 2011, 2012, 2013, 2014]
+ total_max integer [2393248, 2654360, 2734817, 2797384, 2806634]
+ total_min integer [1, 2, 2, 2, 3]
+>
Suppose you want to get the mean petal length of each Iris species. You could do something
+like this:
iex> df = Explorer.Datasets.iris()
+iex> grouped_df = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.summarise(grouped_df, mean_petal_length: mean(petal_length))
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ species string ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
+ mean_petal_length float [1.464, 4.26, 5.552]
+>
In case aggregations for all the dataframe is what you want, you can use ungrouped
+dataframes:
iex> df = Explorer.Datasets.iris()
+iex> Explorer.DataFrame.summarise(df, mean_petal_length: mean(petal_length))
+#Explorer.DataFrame<
+ Polars[1 x 1]
+ mean_petal_length float [3.758666666666667]
+>
iex> alias Explorer.{DataFrame, Series}
-iex> df = Explorer.Datasets.fossil_fuels() |> DataFrame.group_by("year")
-iex> DataFrame.summarise_with(df, &[total_max: Series.max(&1["total"]), countries: Series.n_distinct(&1["country"])])
-#Explorer.DataFrame<
- Polars[5 x 3]
- year integer [2010, 2011, 2012, 2013, 2014]
- total_max integer [2393248, 2654360, 2734817, 2797384, 2806634]
- countries integer [217, 217, 220, 220, 220]
->
-
-iex> alias Explorer.{DataFrame, Series}
-iex> df = Explorer.Datasets.fossil_fuels()
-iex> DataFrame.summarise_with(df, &[total_max: Series.max(&1["total"]), countries: Series.n_distinct(&1["country"])])
-#Explorer.DataFrame<
- Polars[1 x 2]
- total_max integer [2806634]
- countries integer [222]
->
+iex> alias Explorer.{DataFrame, Series}
+iex> df = Explorer.Datasets.fossil_fuels() |> DataFrame.group_by("year")
+iex> DataFrame.summarise_with(df, &[total_max: Series.max(&1["total"]), countries: Series.n_distinct(&1["country"])])
+#Explorer.DataFrame<
+ Polars[5 x 3]
+ year integer [2010, 2011, 2012, 2013, 2014]
+ total_max integer [2393248, 2654360, 2734817, 2797384, 2806634]
+ countries integer [217, 217, 220, 220, 220]
+>
+
+iex> alias Explorer.{DataFrame, Series}
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> DataFrame.summarise_with(df, &[total_max: Series.max(&1["total"]), countries: Series.n_distinct(&1["country"])])
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ total_max integer [2806634]
+ countries integer [222]
+>
Ungroups all by default:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> df = Explorer.DataFrame.group_by(df, ["country", "year"])
-iex> Explorer.DataFrame.ungroup(df)
-#Explorer.DataFrame<
- Polars[1094 x 10]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- total integer [2308, 1254, 32500, 141, 7924, ...]
- solid_fuel integer [627, 117, 332, 0, 0, ...]
- liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
- gas_fuel integer [74, 7, 14565, 0, 374, ...]
- cement integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
->
Ungrouping a single column:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> df = Explorer.DataFrame.group_by(df, ["country", "year"])
-iex> Explorer.DataFrame.ungroup(df, "country")
-#Explorer.DataFrame<
- Polars[1094 x 10]
- Groups: ["year"]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- total integer [2308, 1254, 32500, 141, 7924, ...]
- solid_fuel integer [627, 117, 332, 0, 0, ...]
- liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
- gas_fuel integer [74, 7, 14565, 0, 374, ...]
- cement integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
->
Lists, ranges, regexes, and functions are also accepted in column names, as in select/2.
Ungroups all by default:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> df = Explorer.DataFrame.group_by(df, ["country", "year"])
+iex> Explorer.DataFrame.ungroup(df)
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ total integer [2308, 1254, 32500, 141, 7924, ...]
+ solid_fuel integer [627, 117, 332, 0, 0, ...]
+ liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
+ gas_fuel integer [74, 7, 14565, 0, 374, ...]
+ cement integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+>
Ungrouping a single column:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> df = Explorer.DataFrame.group_by(df, ["country", "year"])
+iex> Explorer.DataFrame.ungroup(df, "country")
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ Groups: ["year"]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ total integer [2308, 1254, 32500, 141, 7924, ...]
+ solid_fuel integer [627, 117, 332, 0, 0, ...]
+ liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
+ gas_fuel integer [74, 7, 14565, 0, 374, ...]
+ cement integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+>
Lists, ranges, regexes, and functions are also accepted in column names, as in select/2.
iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
-iex> df2 = Explorer.DataFrame.new(z: [4, 5, 6], a: ["d", "e", "f"])
-iex> Explorer.DataFrame.concat_columns([df1, df2])
-#Explorer.DataFrame<
- Polars[3 x 4]
- x integer [1, 2, 3]
- y string ["a", "b", "c"]
- z integer [4, 5, 6]
- a string ["d", "e", "f"]
->
Conflicting names are suffixed with the index of the dataframe in the array:
iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
-iex> df2 = Explorer.DataFrame.new(x: [4, 5, 6], a: ["d", "e", "f"])
-iex> Explorer.DataFrame.concat_columns([df1, df2])
-#Explorer.DataFrame<
- Polars[3 x 4]
- x integer [1, 2, 3]
- y string ["a", "b", "c"]
- x_1 integer [4, 5, 6]
- a string ["d", "e", "f"]
->
+iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
+iex> df2 = Explorer.DataFrame.new(z: [4, 5, 6], a: ["d", "e", "f"])
+iex> Explorer.DataFrame.concat_columns([df1, df2])
+#Explorer.DataFrame<
+ Polars[3 x 4]
+ x integer [1, 2, 3]
+ y string ["a", "b", "c"]
+ z integer [4, 5, 6]
+ a string ["d", "e", "f"]
+>
Conflicting names are suffixed with the index of the dataframe in the array:
iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
+iex> df2 = Explorer.DataFrame.new(x: [4, 5, 6], a: ["d", "e", "f"])
+iex> Explorer.DataFrame.concat_columns([df1, df2])
+#Explorer.DataFrame<
+ Polars[3 x 4]
+ x integer [1, 2, 3]
+ y string ["a", "b", "c"]
+ x_1 integer [4, 5, 6]
+ a string ["d", "e", "f"]
+>
iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
-iex> df2 = Explorer.DataFrame.new(x: [4, 5, 6], y: ["d", "e", "f"])
-iex> Explorer.DataFrame.concat_rows([df1, df2])
-#Explorer.DataFrame<
- Polars[6 x 2]
- x integer [1, 2, 3, 4, 5, ...]
- y string ["a", "b", "c", "d", "e", ...]
->
-
-iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
-iex> df2 = Explorer.DataFrame.new(x: [4.2, 5.3, 6.4], y: ["d", "e", "f"])
-iex> Explorer.DataFrame.concat_rows([df1, df2])
-#Explorer.DataFrame<
- Polars[6 x 2]
- x float [1.0, 2.0, 3.0, 4.2, 5.3, ...]
- y string ["a", "b", "c", "d", "e", ...]
->
+iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
+iex> df2 = Explorer.DataFrame.new(x: [4, 5, 6], y: ["d", "e", "f"])
+iex> Explorer.DataFrame.concat_rows([df1, df2])
+#Explorer.DataFrame<
+ Polars[6 x 2]
+ x integer [1, 2, 3, 4, 5, ...]
+ y string ["a", "b", "c", "d", "e", ...]
+>
+
+iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
+iex> df2 = Explorer.DataFrame.new(x: [4.2, 5.3, 6.4], y: ["d", "e", "f"])
+iex> Explorer.DataFrame.concat_rows([df1, df2])
+#Explorer.DataFrame<
+ Polars[6 x 2]
+ x float [1.0, 2.0, 3.0, 4.2, 5.3, ...]
+ y string ["a", "b", "c", "d", "e", ...]
+>
Inner join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
-iex> Explorer.DataFrame.join(left, right)
-#Explorer.DataFrame<
- Polars[3 x 3]
- a integer [1, 2, 2]
- b string ["a", "b", "b"]
- c string ["d", "e", "f"]
->
Left join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
-iex> Explorer.DataFrame.join(left, right, how: :left)
-#Explorer.DataFrame<
- Polars[4 x 3]
- a integer [1, 2, 2, 3]
- b string ["a", "b", "b", "c"]
- c string ["d", "e", "f", nil]
->
Right join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
-iex> Explorer.DataFrame.join(left, right, how: :right)
-#Explorer.DataFrame<
- Polars[3 x 3]
- a integer [1, 2, 4]
- c string ["d", "e", "f"]
- b string ["a", "b", nil]
->
Outer join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
-iex> Explorer.DataFrame.join(left, right, how: :outer)
-#Explorer.DataFrame<
- Polars[4 x 3]
- a integer [1, 2, 4, 3]
- b string ["a", "b", nil, "c"]
- c string ["d", "e", "f", nil]
->
Cross join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
-iex> Explorer.DataFrame.join(left, right, how: :cross)
-#Explorer.DataFrame<
- Polars[9 x 4]
- a integer [1, 1, 1, 2, 2, ...]
- b string ["a", "a", "a", "b", "b", ...]
- a_right integer [1, 2, 4, 1, 2, ...]
- c string ["d", "e", "f", "d", "e", ...]
->
Inner join with different names:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(d: [1, 2, 2], c: ["d", "e", "f"])
-iex> Explorer.DataFrame.join(left, right, on: [{"a", "d"}])
-#Explorer.DataFrame<
- Polars[3 x 3]
- a integer [1, 2, 2]
- b string ["a", "b", "b"]
- c string ["d", "e", "f"]
->
Inner join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
+iex> Explorer.DataFrame.join(left, right)
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ a integer [1, 2, 2]
+ b string ["a", "b", "b"]
+ c string ["d", "e", "f"]
+>
Left join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
+iex> Explorer.DataFrame.join(left, right, how: :left)
+#Explorer.DataFrame<
+ Polars[4 x 3]
+ a integer [1, 2, 2, 3]
+ b string ["a", "b", "b", "c"]
+ c string ["d", "e", "f", nil]
+>
Right join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
+iex> Explorer.DataFrame.join(left, right, how: :right)
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ a integer [1, 2, 4]
+ c string ["d", "e", "f"]
+ b string ["a", "b", nil]
+>
Outer join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
+iex> Explorer.DataFrame.join(left, right, how: :outer)
+#Explorer.DataFrame<
+ Polars[4 x 3]
+ a integer [1, 2, 4, 3]
+ b string ["a", "b", nil, "c"]
+ c string ["d", "e", "f", nil]
+>
Cross join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
+iex> Explorer.DataFrame.join(left, right, how: :cross)
+#Explorer.DataFrame<
+ Polars[9 x 4]
+ a integer [1, 1, 1, 2, 2, ...]
+ b string ["a", "a", "a", "b", "b", ...]
+ a_right integer [1, 2, 4, 1, 2, ...]
+ c string ["d", "e", "f", "d", "e", ...]
+>
Inner join with different names:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(d: [1, 2, 2], c: ["d", "e", "f"])
+iex> Explorer.DataFrame.join(left, right, on: [{"a", "d"}])
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ a integer [1, 2, 2]
+ b string ["a", "b", "b"]
+ c string ["d", "e", "f"]
+>
When doing a join operation with grouped dataframes, the joined dataframe
-may keep the groups from only one side.
An inner join operation will keep the groups from the left-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
-iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
-iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
-iex> Explorer.DataFrame.join(grouped_left, grouped_right)
-#Explorer.DataFrame<
- Polars[3 x 3]
- Groups: ["b"]
- a integer [1, 2, 2]
- b string ["a", "b", "b"]
- c string ["d", "e", "f"]
->
A left join operation will keep the groups from the left-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
-iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
-iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
-iex> Explorer.DataFrame.join(grouped_left, grouped_right, how: :left)
-#Explorer.DataFrame<
- Polars[4 x 3]
- Groups: ["b"]
- a integer [1, 2, 2, 3]
- b string ["a", "b", "b", "c"]
- c string ["d", "e", "f", nil]
->
A right join operation will keep the groups from the right-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
-iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
-iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
-iex> Explorer.DataFrame.join(grouped_left, grouped_right, how: :right)
-#Explorer.DataFrame<
- Polars[3 x 3]
- Groups: ["c"]
- a integer [1, 2, 4]
- c string ["d", "e", "f"]
- b string ["a", "b", nil]
->
An outer join operation is going to keep the groups from the left-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
-iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
-iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
-iex> Explorer.DataFrame.join(grouped_left, grouped_right, how: :outer)
-#Explorer.DataFrame<
- Polars[4 x 3]
- Groups: ["b"]
- a integer [1, 2, 4, 3]
- b string ["a", "b", nil, "c"]
- c string ["d", "e", "f", nil]
->
A cross join operation is going to keep the groups from the left-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
-iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
-iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
-iex> Explorer.DataFrame.join(grouped_left, grouped_right, how: :cross)
-#Explorer.DataFrame<
- Polars[9 x 4]
- Groups: ["b"]
- a integer [1, 1, 1, 2, 2, ...]
- b string ["a", "a", "a", "b", "b", ...]
- a_right integer [1, 2, 4, 1, 2, ...]
- c string ["d", "e", "f", "d", "e", ...]
->
+may keep the groups from only one side.
An inner join operation will keep the groups from the left-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
+iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
+iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
+iex> Explorer.DataFrame.join(grouped_left, grouped_right)
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ Groups: ["b"]
+ a integer [1, 2, 2]
+ b string ["a", "b", "b"]
+ c string ["d", "e", "f"]
+>
A left join operation will keep the groups from the left-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
+iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
+iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
+iex> Explorer.DataFrame.join(grouped_left, grouped_right, how: :left)
+#Explorer.DataFrame<
+ Polars[4 x 3]
+ Groups: ["b"]
+ a integer [1, 2, 2, 3]
+ b string ["a", "b", "b", "c"]
+ c string ["d", "e", "f", nil]
+>
A right join operation will keep the groups from the right-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
+iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
+iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
+iex> Explorer.DataFrame.join(grouped_left, grouped_right, how: :right)
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ Groups: ["c"]
+ a integer [1, 2, 4]
+ c string ["d", "e", "f"]
+ b string ["a", "b", nil]
+>
An outer join operation is going to keep the groups from the left-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
+iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
+iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
+iex> Explorer.DataFrame.join(grouped_left, grouped_right, how: :outer)
+#Explorer.DataFrame<
+ Polars[4 x 3]
+ Groups: ["b"]
+ a integer [1, 2, 4, 3]
+ b string ["a", "b", nil, "c"]
+ c string ["d", "e", "f", nil]
+>
A cross join operation is going to keep the groups from the left-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
+iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
+iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
+iex> Explorer.DataFrame.join(grouped_left, grouped_right, how: :cross)
+#Explorer.DataFrame<
+ Polars[9 x 4]
+ Groups: ["b"]
+ a integer [1, 1, 1, 2, 2, ...]
+ b string ["a", "a", "a", "b", "b", ...]
+ a_right integer [1, 2, 4, 1, 2, ...]
+ c string ["d", "e", "f", "d", "e", ...]
+>
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.head(df)
-#Explorer.DataFrame<
- Polars[5 x 10]
- year integer [2010, 2010, 2010, 2010, 2010]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA"]
- total integer [2308, 1254, 32500, 141, 7924]
- solid_fuel integer [627, 117, 332, 0, 0]
- liquid_fuel integer [1601, 953, 12381, 141, 3649]
- gas_fuel integer [74, 7, 14565, 0, 374]
- cement integer [5, 177, 2598, 0, 204]
- gas_flaring integer [0, 0, 2623, 0, 3697]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37]
- bunker_fuels integer [9, 7, 663, 0, 321]
->
-
-iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.head(df, 2)
-#Explorer.DataFrame<
- Polars[2 x 10]
- year integer [2010, 2010]
- country string ["AFGHANISTAN", "ALBANIA"]
- total integer [2308, 1254]
- solid_fuel integer [627, 117]
- liquid_fuel integer [1601, 953]
- gas_fuel integer [74, 7]
- cement integer [5, 177]
- gas_flaring integer [0, 0]
- per_capita float [0.08, 0.43]
- bunker_fuels integer [9, 7]
->
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.head(df)
+#Explorer.DataFrame<
+ Polars[5 x 10]
+ year integer [2010, 2010, 2010, 2010, 2010]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA"]
+ total integer [2308, 1254, 32500, 141, 7924]
+ solid_fuel integer [627, 117, 332, 0, 0]
+ liquid_fuel integer [1601, 953, 12381, 141, 3649]
+ gas_fuel integer [74, 7, 14565, 0, 374]
+ cement integer [5, 177, 2598, 0, 204]
+ gas_flaring integer [0, 0, 2623, 0, 3697]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37]
+ bunker_fuels integer [9, 7, 663, 0, 321]
+>
+
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.head(df, 2)
+#Explorer.DataFrame<
+ Polars[2 x 10]
+ year integer [2010, 2010]
+ country string ["AFGHANISTAN", "ALBANIA"]
+ total integer [2308, 1254]
+ solid_fuel integer [627, 117]
+ liquid_fuel integer [1601, 953]
+ gas_fuel integer [74, 7]
+ cement integer [5, 177]
+ gas_flaring integer [0, 0]
+ per_capita float [0.08, 0.43]
+ bunker_fuels integer [9, 7]
+>
Using grouped dataframes makes head/2 return n rows from each group.
-Here is an example using the Iris dataset, and returning two rows from each group:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.head(grouped, 2)
-#Explorer.DataFrame<
- Polars[6 x 5]
- Groups: ["species"]
- sepal_length float [5.1, 4.9, 7.0, 6.4, 6.3, ...]
- sepal_width float [3.5, 3.0, 3.2, 3.2, 3.3, ...]
- petal_length float [1.4, 1.4, 4.7, 4.5, 6.0, ...]
- petal_width float [0.2, 0.2, 1.4, 1.5, 2.5, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", ...]
->
+Here is an example using the Iris dataset, and returning two rows from each group:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.head(grouped, 2)
+#Explorer.DataFrame<
+ Polars[6 x 5]
+ Groups: ["species"]
+ sepal_length float [5.1, 4.9, 7.0, 6.4, 6.3, ...]
+ sepal_width float [3.5, 3.0, 3.2, 3.2, 3.3, ...]
+ petal_length float [1.4, 1.4, 4.7, 4.5, 6.0, ...]
+ petal_width float [0.2, 0.2, 1.4, 1.5, 2.5, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", ...]
+>
You can sample N rows:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.sample(df, 3, seed: 100)
-#Explorer.DataFrame<
- Polars[3 x 10]
- year integer [2011, 2012, 2011]
- country string ["SERBIA", "FALKLAND ISLANDS (MALVINAS)", "SWAZILAND"]
- total integer [13422, 15, 286]
- solid_fuel integer [9355, 3, 102]
- liquid_fuel integer [2537, 12, 184]
- gas_fuel integer [1188, 0, 0]
- cement integer [342, 0, 0]
- gas_flaring integer [0, 0, 0]
- per_capita float [1.49, 5.21, 0.24]
- bunker_fuels integer [39, 0, 1]
->
Or you can sample a proportion of rows:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.sample(df, 0.03, seed: 100)
-#Explorer.DataFrame<
- Polars[32 x 10]
- year integer [2011, 2012, 2012, 2013, 2010, ...]
- country string ["URUGUAY", "FRENCH POLYNESIA", "ICELAND", "PERU", "TUNISIA", ...]
- total integer [2117, 222, 491, 15586, 7543, ...]
- solid_fuel integer [1, 0, 96, 784, 15, ...]
- liquid_fuel integer [1943, 222, 395, 7097, 3138, ...]
- gas_fuel integer [40, 0, 0, 3238, 3176, ...]
- cement integer [132, 0, 0, 1432, 1098, ...]
- gas_flaring integer [0, 0, 0, 3036, 116, ...]
- per_capita float [0.63, 0.81, 1.52, 0.51, 0.71, ...]
- bunker_fuels integer [401, 45, 170, 617, 219, ...]
->
You can sample N rows:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.sample(df, 3, seed: 100)
+#Explorer.DataFrame<
+ Polars[3 x 10]
+ year integer [2011, 2012, 2011]
+ country string ["SERBIA", "FALKLAND ISLANDS (MALVINAS)", "SWAZILAND"]
+ total integer [13422, 15, 286]
+ solid_fuel integer [9355, 3, 102]
+ liquid_fuel integer [2537, 12, 184]
+ gas_fuel integer [1188, 0, 0]
+ cement integer [342, 0, 0]
+ gas_flaring integer [0, 0, 0]
+ per_capita float [1.49, 5.21, 0.24]
+ bunker_fuels integer [39, 0, 1]
+>
Or you can sample a proportion of rows:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.sample(df, 0.03, seed: 100)
+#Explorer.DataFrame<
+ Polars[32 x 10]
+ year integer [2011, 2012, 2012, 2013, 2010, ...]
+ country string ["URUGUAY", "FRENCH POLYNESIA", "ICELAND", "PERU", "TUNISIA", ...]
+ total integer [2117, 222, 491, 15586, 7543, ...]
+ solid_fuel integer [1, 0, 96, 784, 15, ...]
+ liquid_fuel integer [1943, 222, 395, 7097, 3138, ...]
+ gas_fuel integer [40, 0, 0, 3238, 3176, ...]
+ cement integer [132, 0, 0, 1432, 1098, ...]
+ gas_flaring integer [0, 0, 0, 3036, 116, ...]
+ per_capita float [0.63, 0.81, 1.52, 0.51, 0.71, ...]
+ bunker_fuels integer [401, 45, 170, 617, 219, ...]
+>
In the following example we have the Iris dataset grouped by species, and we want to take a sample of two plants from each group. Since we have three species, the
-resultant dataframe is going to have six rows (2 * 3).
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.sample(grouped, 2, seed: 100)
-#Explorer.DataFrame<
- Polars[6 x 5]
- Groups: ["species"]
- sepal_length float [5.3, 5.1, 5.1, 5.6, 6.2, ...]
- sepal_width float [3.7, 3.8, 2.5, 2.7, 3.4, ...]
- petal_length float [1.5, 1.9, 3.0, 4.2, 5.4, ...]
- petal_width float [0.2, 0.4, 1.1, 1.3, 2.3, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", ...]
->
The behaviour is similar when you want to take a fraction of the rows from each group. The main
-difference is that each group can have more or less rows, depending on its size.
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.sample(grouped, 0.1, seed: 100)
-#Explorer.DataFrame<
- Polars[15 x 5]
- Groups: ["species"]
- sepal_length float [5.3, 5.1, 4.7, 5.7, 5.1, ...]
- sepal_width float [3.7, 3.8, 3.2, 3.8, 3.5, ...]
- petal_length float [1.5, 1.9, 1.3, 1.7, 1.4, ...]
- petal_width float [0.2, 0.4, 0.2, 0.3, 0.3, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
+resultant dataframe is going to have six rows (2 * 3).
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.sample(grouped, 2, seed: 100)
+#Explorer.DataFrame<
+ Polars[6 x 5]
+ Groups: ["species"]
+ sepal_length float [5.3, 5.1, 5.1, 5.6, 6.2, ...]
+ sepal_width float [3.7, 3.8, 2.5, 2.7, 3.4, ...]
+ petal_length float [1.5, 1.9, 3.0, 4.2, 5.4, ...]
+ petal_width float [0.2, 0.4, 1.1, 1.3, 2.3, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", ...]
+>
The behaviour is similar when you want to take a fraction of the rows from each group. The main
+difference is that each group can have more or less rows, depending on its size.
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.sample(grouped, 0.1, seed: 100)
+#Explorer.DataFrame<
+ Polars[15 x 5]
+ Groups: ["species"]
+ sepal_length float [5.3, 5.1, 4.7, 5.7, 5.1, ...]
+ sepal_width float [3.7, 3.8, 3.2, 3.8, 3.5, ...]
+ petal_length float [1.5, 1.9, 1.3, 1.7, 1.4, ...]
+ petal_width float [0.2, 0.4, 0.2, 0.3, 0.3, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.shuffle(df, seed: 100)
-#Explorer.DataFrame<
- Polars[1094 x 10]
- year integer [2014, 2014, 2014, 2012, 2010, ...]
- country string ["ISRAEL", "ARGENTINA", "NETHERLANDS", "YEMEN", "GRENADA", ...]
- total integer [17617, 55638, 45624, 5091, 71, ...]
- solid_fuel integer [6775, 1588, 9070, 129, 0, ...]
- liquid_fuel integer [6013, 25685, 18272, 4173, 71, ...]
- gas_fuel integer [3930, 26368, 18010, 414, 0, ...]
- cement integer [898, 1551, 272, 375, 0, ...]
- gas_flaring integer [0, 446, 0, 0, 0, ...]
- per_capita float [2.22, 1.29, 2.7, 0.2, 0.68, ...]
- bunker_fuels integer [1011, 2079, 14210, 111, 4, ...]
->
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.shuffle(df, seed: 100)
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ year integer [2014, 2014, 2014, 2012, 2010, ...]
+ country string ["ISRAEL", "ARGENTINA", "NETHERLANDS", "YEMEN", "GRENADA", ...]
+ total integer [17617, 55638, 45624, 5091, 71, ...]
+ solid_fuel integer [6775, 1588, 9070, 129, 0, ...]
+ liquid_fuel integer [6013, 25685, 18272, 4173, 71, ...]
+ gas_fuel integer [3930, 26368, 18010, 414, 0, ...]
+ cement integer [898, 1551, 272, 375, 0, ...]
+ gas_flaring integer [0, 446, 0, 0, 0, ...]
+ per_capita float [2.22, 1.29, 2.7, 0.2, 0.68, ...]
+ bunker_fuels integer [1011, 2079, 14210, 111, 4, ...]
+>
iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> Explorer.DataFrame.slice(df, [0, 2])
-#Explorer.DataFrame<
- Polars[2 x 2]
- a integer [1, 3]
- b string ["a", "c"]
->
With a series
iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> Explorer.DataFrame.slice(df, Explorer.Series.from_list([0, 2]))
-#Explorer.DataFrame<
- Polars[2 x 2]
- a integer [1, 3]
- b string ["a", "c"]
->
With a range:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> Explorer.DataFrame.slice(df, 1..2)
-#Explorer.DataFrame<
- Polars[2 x 2]
- a integer [2, 3]
- b string ["b", "c"]
->
With a range with negative first and last:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> Explorer.DataFrame.slice(df, -2..-1)
-#Explorer.DataFrame<
- Polars[2 x 2]
- a integer [2, 3]
- b string ["b", "c"]
->
iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> Explorer.DataFrame.slice(df, [0, 2])
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ a integer [1, 3]
+ b string ["a", "c"]
+>
With a series
iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> Explorer.DataFrame.slice(df, Explorer.Series.from_list([0, 2]))
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ a integer [1, 3]
+ b string ["a", "c"]
+>
With a range:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> Explorer.DataFrame.slice(df, 1..2)
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ a integer [2, 3]
+ b string ["b", "c"]
+>
With a range with negative first and last:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> Explorer.DataFrame.slice(df, -2..-1)
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ a integer [2, 3]
+ b string ["b", "c"]
+>
We are going to once again use the Iris dataset. In this example we want to take elements at indexes
-0 and 2:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.slice(grouped, [0, 2])
-#Explorer.DataFrame<
- Polars[6 x 5]
- Groups: ["species"]
- sepal_length float [5.1, 4.7, 7.0, 6.9, 6.3, ...]
- sepal_width float [3.5, 3.2, 3.2, 3.1, 3.3, ...]
- petal_length float [1.4, 1.3, 4.7, 4.9, 6.0, ...]
- petal_width float [0.2, 0.2, 1.4, 1.5, 2.5, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", ...]
->
Now we want to take the first 3 rows of each group.
-This is going to work with the range 0..2:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.slice(grouped, 0..2)
-#Explorer.DataFrame<
- Polars[9 x 5]
- Groups: ["species"]
- sepal_length float [5.1, 4.9, 4.7, 7.0, 6.4, ...]
- sepal_width float [3.5, 3.0, 3.2, 3.2, 3.2, ...]
- petal_length float [1.4, 1.4, 1.3, 4.7, 4.5, ...]
- petal_width float [0.2, 0.2, 0.2, 1.4, 1.5, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", ...]
->
+0 and 2:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.slice(grouped, [0, 2])
+#Explorer.DataFrame<
+ Polars[6 x 5]
+ Groups: ["species"]
+ sepal_length float [5.1, 4.7, 7.0, 6.9, 6.3, ...]
+ sepal_width float [3.5, 3.2, 3.2, 3.1, 3.3, ...]
+ petal_length float [1.4, 1.3, 4.7, 4.9, 6.0, ...]
+ petal_width float [0.2, 0.2, 1.4, 1.5, 2.5, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", ...]
+>
Now we want to take the first 3 rows of each group.
+This is going to work with the range 0..2:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.slice(grouped, 0..2)
+#Explorer.DataFrame<
+ Polars[9 x 5]
+ Groups: ["species"]
+ sepal_length float [5.1, 4.9, 4.7, 7.0, 6.4, ...]
+ sepal_width float [3.5, 3.0, 3.2, 3.2, 3.2, ...]
+ petal_length float [1.4, 1.4, 1.3, 4.7, 4.5, ...]
+ petal_width float [0.2, 0.2, 0.2, 1.4, 1.5, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", ...]
+>
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.slice(df, 1, 2)
-#Explorer.DataFrame<
- Polars[2 x 10]
- year integer [2010, 2010]
- country string ["ALBANIA", "ALGERIA"]
- total integer [1254, 32500]
- solid_fuel integer [117, 332]
- liquid_fuel integer [953, 12381]
- gas_fuel integer [7, 14565]
- cement integer [177, 2598]
- gas_flaring integer [0, 2623]
- per_capita float [0.43, 0.9]
- bunker_fuels integer [7, 663]
->
Negative offsets count from the end of the series:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.slice(df, -10, 2)
-#Explorer.DataFrame<
- Polars[2 x 10]
- year integer [2014, 2014]
- country string ["UNITED STATES OF AMERICA", "URUGUAY"]
- total integer [1432855, 1840]
- solid_fuel integer [450047, 2]
- liquid_fuel integer [576531, 1700]
- gas_fuel integer [390719, 25]
- cement integer [11314, 112]
- gas_flaring integer [4244, 0]
- per_capita float [4.43, 0.54]
- bunker_fuels integer [30722, 251]
->
If the length would run past the end of the dataframe, the result may be shorter than the length:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.slice(df, -10, 20)
-#Explorer.DataFrame<
- Polars[10 x 10]
- year integer [2014, 2014, 2014, 2014, 2014, ...]
- country string ["UNITED STATES OF AMERICA", "URUGUAY", "UZBEKISTAN", "VANUATU", "VENEZUELA", ...]
- total integer [1432855, 1840, 28692, 42, 50510, ...]
- solid_fuel integer [450047, 2, 1677, 0, 204, ...]
- liquid_fuel integer [576531, 1700, 2086, 42, 28445, ...]
- gas_fuel integer [390719, 25, 23929, 0, 12731, ...]
- cement integer [11314, 112, 1000, 0, 1088, ...]
- gas_flaring integer [4244, 0, 0, 0, 8042, ...]
- per_capita float [4.43, 0.54, 0.97, 0.16, 1.65, ...]
- bunker_fuels integer [30722, 251, 0, 10, 1256, ...]
->
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.slice(df, 1, 2)
+#Explorer.DataFrame<
+ Polars[2 x 10]
+ year integer [2010, 2010]
+ country string ["ALBANIA", "ALGERIA"]
+ total integer [1254, 32500]
+ solid_fuel integer [117, 332]
+ liquid_fuel integer [953, 12381]
+ gas_fuel integer [7, 14565]
+ cement integer [177, 2598]
+ gas_flaring integer [0, 2623]
+ per_capita float [0.43, 0.9]
+ bunker_fuels integer [7, 663]
+>
Negative offsets count from the end of the series:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.slice(df, -10, 2)
+#Explorer.DataFrame<
+ Polars[2 x 10]
+ year integer [2014, 2014]
+ country string ["UNITED STATES OF AMERICA", "URUGUAY"]
+ total integer [1432855, 1840]
+ solid_fuel integer [450047, 2]
+ liquid_fuel integer [576531, 1700]
+ gas_fuel integer [390719, 25]
+ cement integer [11314, 112]
+ gas_flaring integer [4244, 0]
+ per_capita float [4.43, 0.54]
+ bunker_fuels integer [30722, 251]
+>
If the length would run past the end of the dataframe, the result may be shorter than the length:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.slice(df, -10, 20)
+#Explorer.DataFrame<
+ Polars[10 x 10]
+ year integer [2014, 2014, 2014, 2014, 2014, ...]
+ country string ["UNITED STATES OF AMERICA", "URUGUAY", "UZBEKISTAN", "VANUATU", "VENEZUELA", ...]
+ total integer [1432855, 1840, 28692, 42, 50510, ...]
+ solid_fuel integer [450047, 2, 1677, 0, 204, ...]
+ liquid_fuel integer [576531, 1700, 2086, 42, 28445, ...]
+ gas_fuel integer [390719, 25, 23929, 0, 12731, ...]
+ cement integer [11314, 112, 1000, 0, 1088, ...]
+ gas_flaring integer [4244, 0, 0, 0, 8042, ...]
+ per_capita float [4.43, 0.54, 0.97, 0.16, 1.65, ...]
+ bunker_fuels integer [30722, 251, 0, 10, 1256, ...]
+>
We want to take the first 3 rows of each group. We need the offset 0 and the length 3:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.slice(grouped, 0, 3)
-#Explorer.DataFrame<
- Polars[9 x 5]
- Groups: ["species"]
- sepal_length float [5.1, 4.9, 4.7, 7.0, 6.4, ...]
- sepal_width float [3.5, 3.0, 3.2, 3.2, 3.2, ...]
- petal_length float [1.4, 1.4, 1.3, 4.7, 4.5, ...]
- petal_width float [0.2, 0.2, 0.2, 1.4, 1.5, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", ...]
->
We can also pass a negative offset:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.slice(grouped, -6, 3)
-#Explorer.DataFrame<
- Polars[9 x 5]
- Groups: ["species"]
- sepal_length float [5.1, 4.8, 5.1, 5.6, 5.7, ...]
- sepal_width float [3.8, 3.0, 3.8, 2.7, 3.0, ...]
- petal_length float [1.9, 1.4, 1.6, 4.2, 4.2, ...]
- petal_width float [0.4, 0.3, 0.2, 1.3, 1.2, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", ...]
->
+We want to take the first 3 rows of each group. We need the offset 0 and the length 3:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.slice(grouped, 0, 3)
+#Explorer.DataFrame<
+ Polars[9 x 5]
+ Groups: ["species"]
+ sepal_length float [5.1, 4.9, 4.7, 7.0, 6.4, ...]
+ sepal_width float [3.5, 3.0, 3.2, 3.2, 3.2, ...]
+ petal_length float [1.4, 1.4, 1.3, 4.7, 4.5, ...]
+ petal_width float [0.2, 0.2, 0.2, 1.4, 1.5, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", ...]
+>
We can also pass a negative offset:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.slice(grouped, -6, 3)
+#Explorer.DataFrame<
+ Polars[9 x 5]
+ Groups: ["species"]
+ sepal_length float [5.1, 4.8, 5.1, 5.6, 5.7, ...]
+ sepal_width float [3.8, 3.0, 3.8, 2.7, 3.0, ...]
+ petal_length float [1.9, 1.4, 1.6, 4.2, 4.2, ...]
+ petal_width float [0.4, 0.3, 0.2, 1.3, 1.2, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", ...]
+>
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.tail(df)
-#Explorer.DataFrame<
- Polars[5 x 10]
- year integer [2014, 2014, 2014, 2014, 2014]
- country string ["VIET NAM", "WALLIS AND FUTUNA ISLANDS", "YEMEN", "ZAMBIA", "ZIMBABWE"]
- total integer [45517, 6, 6190, 1228, 3278]
- solid_fuel integer [19246, 0, 137, 132, 2097]
- liquid_fuel integer [12694, 6, 5090, 797, 1005]
- gas_fuel integer [5349, 0, 581, 0, 0]
- cement integer [8229, 0, 381, 299, 177]
- gas_flaring integer [0, 0, 0, 0, 0]
- per_capita float [0.49, 0.44, 0.24, 0.08, 0.22]
- bunker_fuels integer [761, 1, 153, 33, 9]
->
-
-iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.tail(df, 2)
-#Explorer.DataFrame<
- Polars[2 x 10]
- year integer [2014, 2014]
- country string ["ZAMBIA", "ZIMBABWE"]
- total integer [1228, 3278]
- solid_fuel integer [132, 2097]
- liquid_fuel integer [797, 1005]
- gas_fuel integer [0, 0]
- cement integer [299, 177]
- gas_flaring integer [0, 0]
- per_capita float [0.08, 0.22]
- bunker_fuels integer [33, 9]
->
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.tail(df)
+#Explorer.DataFrame<
+ Polars[5 x 10]
+ year integer [2014, 2014, 2014, 2014, 2014]
+ country string ["VIET NAM", "WALLIS AND FUTUNA ISLANDS", "YEMEN", "ZAMBIA", "ZIMBABWE"]
+ total integer [45517, 6, 6190, 1228, 3278]
+ solid_fuel integer [19246, 0, 137, 132, 2097]
+ liquid_fuel integer [12694, 6, 5090, 797, 1005]
+ gas_fuel integer [5349, 0, 581, 0, 0]
+ cement integer [8229, 0, 381, 299, 177]
+ gas_flaring integer [0, 0, 0, 0, 0]
+ per_capita float [0.49, 0.44, 0.24, 0.08, 0.22]
+ bunker_fuels integer [761, 1, 153, 33, 9]
+>
+
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.tail(df, 2)
+#Explorer.DataFrame<
+ Polars[2 x 10]
+ year integer [2014, 2014]
+ country string ["ZAMBIA", "ZIMBABWE"]
+ total integer [1228, 3278]
+ solid_fuel integer [132, 2097]
+ liquid_fuel integer [797, 1005]
+ gas_fuel integer [0, 0]
+ cement integer [299, 177]
+ gas_flaring integer [0, 0]
+ per_capita float [0.08, 0.22]
+ bunker_fuels integer [33, 9]
+>
Using grouped dataframes makes tail/2
return n rows from each group.
-Here is an example using the Iris dataset, and returning two rows from each group:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.tail(grouped, 2)
-#Explorer.DataFrame<
- Polars[6 x 5]
- Groups: ["species"]
- sepal_length float [5.3, 5.0, 5.1, 5.7, 6.2, ...]
- sepal_width float [3.7, 3.3, 2.5, 2.8, 3.4, ...]
- petal_length float [1.5, 1.4, 3.0, 4.1, 5.4, ...]
- petal_width float [0.2, 0.2, 1.1, 1.3, 2.3, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", ...]
->
+Here is an example using the Iris dataset, and returning two rows from each group:iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.tail(grouped, 2)
+#Explorer.DataFrame<
+ Polars[6 x 5]
+ Groups: ["species"]
+ sepal_length float [5.3, 5.0, 5.1, 5.7, 6.2, ...]
+ sepal_width float [3.7, 3.3, 2.5, 2.8, 3.4, ...]
+ petal_length float [1.5, 1.4, 3.0, 4.1, 5.4, ...]
+ petal_width float [0.2, 0.2, 1.1, 1.3, 2.3, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", ...]
+>
iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, 2])
-iex> Explorer.DataFrame.dtypes(df)
-%{"floats" => :float, "ints" => :integer}
+iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, 2])
+iex> Explorer.DataFrame.dtypes(df)
+%{"floats" => :float, "ints" => :integer}
iex> df = Explorer.Datasets.fossil_fuels()
-iex> df = Explorer.DataFrame.group_by(df, "country")
-iex> Explorer.DataFrame.groups(df)
-["country"]
-
-iex> df = Explorer.Datasets.iris()
-iex> Explorer.DataFrame.groups(df)
-[]
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> df = Explorer.DataFrame.group_by(df, "country")
+iex> Explorer.DataFrame.groups(df)
+["country"]
+
+iex> df = Explorer.Datasets.iris()
+iex> Explorer.DataFrame.groups(df)
+[]
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.n_columns(df)
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.n_columns(df)
10
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.n_rows(df)
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.n_rows(df)
1094
@@ -5015,9 +5015,9 @@ names(df)
Examples
-iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, 2])
-iex> Explorer.DataFrame.names(df)
-["floats", "ints"]
+iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, 2])
+iex> Explorer.DataFrame.names(df)
+["floats", "ints"]
@@ -5052,9 +5052,9 @@ shape(df)
Examples
-iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0, 3.0], ints: [1, 2, 3])
-iex> Explorer.DataFrame.shape(df)
-{3, 2}
+iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0, 3.0], ints: [1, 2, 3])
+iex> Explorer.DataFrame.shape(df)
+{3, 2}
@@ -5147,9 +5147,9 @@ dump_csv(df, opts \\ [])
Examples
-iex> df = Explorer.Datasets.fossil_fuels() |> Explorer.DataFrame.head(2)
-iex> Explorer.DataFrame.dump_csv(df)
-{:ok, "year,country,total,solid_fuel,liquid_fuel,gas_fuel,cement,gas_flaring,per_capita,bunker_fuels\n2010,AFGHANISTAN,2308,627,1601,74,5,0,0.08,9\n2010,ALBANIA,1254,117,953,7,177,0,0.43,7\n"}
+iex> df = Explorer.Datasets.fossil_fuels() |> Explorer.DataFrame.head(2)
+iex> Explorer.DataFrame.dump_csv(df)
+{:ok, "year,country,total,solid_fuel,liquid_fuel,gas_fuel,cement,gas_flaring,per_capita,bunker_fuels\n2010,AFGHANISTAN,2308,627,1601,74,5,0,0.08,9\n2010,ALBANIA,1254,117,953,7,177,0,0.43,7\n"}
@@ -5348,9 +5348,9 @@ dump_ndjson(df)
Examples
-iex> df = Explorer.DataFrame.new(col_a: [1, 2], col_b: [5.1, 5.2])
-iex> Explorer.DataFrame.dump_ndjson(df)
-{:ok, ~s({"col_a":1,"col_b":5.1}\n{"col_a":2,"col_b":5.2}\n)}
+iex> df = Explorer.DataFrame.new(col_a: [1, 2], col_b: [5.1, 5.2])
+iex> Explorer.DataFrame.dump_ndjson(df)
+{:ok, ~s({"col_a":1,"col_b":5.1}\n{"col_a":2,"col_b":5.2}\n)}
@@ -5841,24 +5841,24 @@ from_query(conn, query, params, opts \\ [])
In order to read data from a database, you must list :adbc
as a dependency,
download the relevant driver, and start both database and connection processes
-in your supervision tree.
First, add :adbc
as a dependency in your mix.exs
:
{:adbc, "~> 0.1"}
Now, in your config/config.exs, configure the drivers you are going to use
-(see Adbc
module docs for more information on supported drivers):
config :adbc, :drivers, [:sqlite]
If you are using a notebook or scripting, you can also use Adbc.download_driver!/1
+in your supervision tree.
First, add :adbc
as a dependency in your mix.exs
:
{:adbc, "~> 0.1"}
Now, in your config/config.exs, configure the drivers you are going to use
+(see Adbc
module docs for more information on supported drivers):
config :adbc, :drivers, [:sqlite]
If you are using a notebook or scripting, you can also use Adbc.download_driver!/1
to dynamically download one.
Then start the database and the relevant connection processes in your
-supervision tree:
children = [
- {Adbc.Database,
+supervision tree:children = [
+ {Adbc.Database,
driver: :sqlite,
- process_options: [name: MyApp.DB]},
- {Adbc.Connection,
+ process_options: [name: MyApp.DB]},
+ {Adbc.Connection,
database: MyApp.DB,
- process_options: [name: MyApp.Conn]}
-]
+ process_options: [name: MyApp.Conn]}
+]
-Supervisor.start_link(children, strategy: :one_for_one)
In a notebook, the above would look like this:
db = Kino.start_child!({Adbc.Database, driver: :sqlite})
-conn = Kino.start_child!({Adbc.Connection, database: db})
And now you can make queries with:
# For named connections
-{:ok, _} = Explorer.DataFrame.from_query(MyApp.Conn, "SELECT 123")
+Supervisor.start_link(children, strategy: :one_for_one)
In a notebook, the above would look like this:
db = Kino.start_child!({Adbc.Database, driver: :sqlite})
+conn = Kino.start_child!({Adbc.Connection, database: db})
And now you can make queries with:
# For named connections
+{:ok, _} = Explorer.DataFrame.from_query(MyApp.Conn, "SELECT 123")
# When using the conn PID directly
-{:ok, _} = Explorer.DataFrame.from_query(conn, "SELECT 123")
+{:ok, _} = Explorer.DataFrame.from_query(conn, "SELECT 123")
Options
@@ -6178,12 +6178,12 @@ load_ndjson!(contents, opts \\ [])
iex> contents = ~s({"col_a":1,"col_b":5.1}\n{"col_a":2,"col_b":5.2}\n)
-iex> Explorer.DataFrame.load_ndjson!(contents)
-#Explorer.DataFrame<
- Polars[2 x 2]
- col_a integer [1, 2]
- col_b float [5.1, 5.2]
->
+iex> Explorer.DataFrame.load_ndjson!(contents)
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ col_a integer [1, 2]
+ col_b float [5.1, 5.2]
+>
priv
directory and load them yourself.
-For example:Explorer.DataFrame.from_csv!(Application.app_dir(:my_app, "priv/iris.csv"))
+For example:Explorer.DataFrame.from_csv!(Application.app_dir(:my_app, "priv/iris.csv"))
Fisher,R. A.. (1988). Iris. UCI Machine Learning Repository. https://doi.org/10.24432/C56C76.
+Fisher,R. A.. (1988). Iris. UCI Machine Learning Repository. https://doi.org/10.24432/C56C76.
Aeberhard,Stefan and Forina,M.. (1991). Wine. UCI Machine Learning Repository. https://doi.org/10.24432/C5PC7J.
+Aeberhard,Stefan and Forina,M.. (1991). Wine. UCI Machine Learning Repository. https://doi.org/10.24432/C5PC7J.
Explorer.DataFrame
to DF
as shown below:alias Explorer.DataFrame, as: DF
Queries convert regular Elixir code, which compiles to efficient
dataframe operations. Inside a query, only a limited set of
Series operations is available, and identifiers, such as strs
-and nums
, represent dataframe column names:
iex> df = DF.new(strs: ["a", "b", "c"], nums: [1, 2, 3])
-iex> DF.filter(df, nums > 2)
-#Explorer.DataFrame<
- Polars[1 x 2]
- strs string ["c"]
- nums integer [3]
->
If a column has an unusual format, you can either rename it beforehand,
-or use col/1
inside queries:
iex> df = DF.new("unusual nums": [1, 2, 3])
-iex> DF.filter(df, col("unusual nums") > 2)
-#Explorer.DataFrame<
- Polars[1 x 1]
- unusual nums integer [3]
->
All operations from Explorer.Series
are imported inside queries.
+and nums
, represent dataframe column names:
iex> df = DF.new(strs: ["a", "b", "c"], nums: [1, 2, 3])
+iex> DF.filter(df, nums > 2)
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ strs string ["c"]
+ nums integer [3]
+>
If a column has an unusual format, you can either rename it beforehand,
+or use col/1
inside queries:
iex> df = DF.new("unusual nums": [1, 2, 3])
+iex> DF.filter(df, col("unusual nums") > 2)
+#Explorer.DataFrame<
+ Polars[1 x 1]
+ unusual nums integer [3]
+>
All operations from Explorer.Series
are imported inside queries.
This module also provides operators to use in queries, which are
also imported into queries.
If you want to access variables defined outside of the query
or get access to all Elixir constructs, you must use ^
:
iex> min = 2
-iex> df = DF.new(strs: ["a", "b", "c"], nums: [1, 2, 3])
-iex> DF.filter(df, nums > ^min)
-#Explorer.DataFrame<
- Polars[1 x 2]
- strs string ["c"]
- nums integer [3]
->
+iex> df = DF.new(strs: ["a", "b", "c"], nums: [1, 2, 3])
+iex> DF.filter(df, nums > ^min)
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ strs string ["c"]
+ nums integer [3]
+>
iex> min = 2
-iex> df = DF.new(strs: ["a", "b", "c"], nums: [1, 2, 3])
-iex> DF.filter(df, nums < ^if(min > 0, do: 10, else: -10))
-#Explorer.DataFrame<
- Polars[3 x 2]
- strs string ["a", "b", "c"]
- nums integer [1, 2, 3]
->
^
can be used with col
to access columns dynamically:
iex> df = DF.new("unusual nums": [1, 2, 3])
+iex> df = DF.new(strs: ["a", "b", "c"], nums: [1, 2, 3])
+iex> DF.filter(df, nums < ^if(min > 0, do: 10, else: -10))
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ strs string ["a", "b", "c"]
+ nums integer [1, 2, 3]
+>
^
can be used with col
to access columns dynamically:
iex> df = DF.new("unusual nums": [1, 2, 3])
iex> name = "unusual nums"
-iex> DF.filter(df, col(^name) > 2)
-#Explorer.DataFrame<
- Polars[1 x 1]
- unusual nums integer [3]
->
Explorer.Query
leverages the power behind Elixir for-comprehensions
to provide a powerful syntax for traversing several columns in a dataframe
at once. For example, imagine you want to standardize the data in the
-iris dataset, you could write this:
iex> iris = Explorer.Datasets.iris()
-iex> DF.mutate(iris,
-...> sepal_width: (sepal_width - mean(sepal_width)) / variance(sepal_width),
-...> sepal_length: (sepal_length - mean(sepal_length)) / variance(sepal_length),
-...> petal_length: (petal_length - mean(petal_length)) / variance(petal_length),
-...> petal_width: (petal_width - mean(petal_width)) / variance(petal_width)
-...> )
-#Explorer.DataFrame<
- Polars[150 x 5]
- sepal_length float [-1.0840606189132314, -1.3757361217598396, -1.6674116246064494, -1.8132493760297548, -1.2298983703365356, ...]
- sepal_width float [2.372289612531505, -0.28722789030650403, 0.7765791108287006, 0.24467561026109824, 2.904193113099107, ...]
- petal_length float [-0.7576391687443842, -0.7576391687443842, -0.7897606710936372, -0.725517666395131, -0.7576391687443842, ...]
- petal_width float [-1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
While the code above does its job, it is quite repetitive. With across and for-comprehensions,
-we could instead write:
iex> iris = Explorer.Datasets.iris()
-iex> DF.mutate(iris,
-...> for col <- across(["sepal_width", "sepal_length", "petal_length", "petal_width"]) do
-...> {col.name, (col - mean(col)) / variance(col)}
-...> end
-...> )
-#Explorer.DataFrame<
- Polars[150 x 5]
- sepal_length float [-1.0840606189132314, -1.3757361217598396, -1.6674116246064494, -1.8132493760297548, -1.2298983703365356, ...]
- sepal_width float [2.372289612531505, -0.28722789030650403, 0.7765791108287006, 0.24467561026109824, 2.904193113099107, ...]
- petal_length float [-0.7576391687443842, -0.7576391687443842, -0.7897606710936372, -0.725517666395131, -0.7576391687443842, ...]
- petal_width float [-1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
Which achieves the same result in a more concise and maintainable way.
+iris dataset, you could write this:
iex> iris = Explorer.Datasets.iris()
+iex> DF.mutate(iris,
+...> sepal_width: (sepal_width - mean(sepal_width)) / variance(sepal_width),
+...> sepal_length: (sepal_length - mean(sepal_length)) / variance(sepal_length),
+...> petal_length: (petal_length - mean(petal_length)) / variance(petal_length),
+...> petal_width: (petal_width - mean(petal_width)) / variance(petal_width)
+...> )
+#Explorer.DataFrame<
+ Polars[150 x 5]
+ sepal_length float [-1.0840606189132314, -1.3757361217598396, -1.6674116246064494, -1.8132493760297548, -1.2298983703365356, ...]
+ sepal_width float [2.372289612531505, -0.28722789030650403, 0.7765791108287006, 0.24467561026109824, 2.904193113099107, ...]
+ petal_length float [-0.7576391687443842, -0.7576391687443842, -0.7897606710936372, -0.725517666395131, -0.7576391687443842, ...]
+ petal_width float [-1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
While the code above does its job, it is quite repetitive. With across and for-comprehensions,
+we could instead write:
iex> iris = Explorer.Datasets.iris()
+iex> DF.mutate(iris,
+...> for col <- across(["sepal_width", "sepal_length", "petal_length", "petal_width"]) do
+...> {col.name, (col - mean(col)) / variance(col)}
+...> end
+...> )
+#Explorer.DataFrame<
+ Polars[150 x 5]
+ sepal_length float [-1.0840606189132314, -1.3757361217598396, -1.6674116246064494, -1.8132493760297548, -1.2298983703365356, ...]
+ sepal_width float [2.372289612531505, -0.28722789030650403, 0.7765791108287006, 0.24467561026109824, 2.904193113099107, ...]
+ petal_length float [-0.7576391687443842, -0.7576391687443842, -0.7897606710936372, -0.725517666395131, -0.7576391687443842, ...]
+ petal_width float [-1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
Which achieves the same result in a more concise and maintainable way.
across/1
may receive any of the following input as arguments:
a list of column indexes or names, given as atoms or strings
a range
a regex that keeps only the names matching the regex
For example, since we know the width and length columns are the first four,
-we could also have written (remember ranges in Elixir are inclusive):
DF.mutate(iris,
- for col <- across(0..3) do
- {col.name, (col - mean(col)) / variance(col)}
- end
-)
Or using a regex:
DF.mutate(iris,
- for col <- across(~r/(sepal|petal)_(length|width)/) do
- {col.name, (col - mean(col)) / variance(col)}
- end
-)
For those new to Elixir, for-comprehensions have the following format:
for PATTERN <- GENERATOR, FILTER do
+we could also have written (remember ranges in Elixir are inclusive):DF.mutate(iris,
+ for col <- across(0..3) do
+ {col.name, (col - mean(col)) / variance(col)}
+ end
+)
Or using a regex:
DF.mutate(iris,
+ for col <- across(~r/(sepal|petal)_(length|width)/) do
+ {col.name, (col - mean(col)) / variance(col)}
+ end
+)
For those new to Elixir, for-comprehensions have the following format:
for PATTERN <- GENERATOR, FILTER do
EXPR
-end
A comprehension filter is a mechanism that allows us to keep only columns
+end
A comprehension filter is a mechanism that allows us to keep only columns
based on additional properties, such as their dtype
. A for-comprehension can
have multiple generators and filters. For instance, if you want to apply
standardization to all float columns, we can use across/0
to access all
-columns and then use a filter to keep only the float ones:
iex> iris = Explorer.Datasets.iris()
-iex> DF.mutate(iris,
-...> for col <- across(), col.dtype == :float do
-...> {col.name, (col - mean(col)) / variance(col)}
-...> end
-...> )
-#Explorer.DataFrame<
- Polars[150 x 5]
- sepal_length float [-1.0840606189132314, -1.3757361217598396, -1.6674116246064494, -1.8132493760297548, -1.2298983703365356, ...]
- sepal_width float [2.372289612531505, -0.28722789030650403, 0.7765791108287006, 0.24467561026109824, 2.904193113099107, ...]
- petal_length float [-0.7576391687443842, -0.7576391687443842, -0.7897606710936372, -0.725517666395131, -0.7576391687443842, ...]
- petal_width float [-1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
For-comprehensions work with all dataframe verbs. As we have seen
+columns and then use a filter to keep only the float ones:
iex> iris = Explorer.Datasets.iris()
+iex> DF.mutate(iris,
+...> for col <- across(), col.dtype == :float do
+...> {col.name, (col - mean(col)) / variance(col)}
+...> end
+...> )
+#Explorer.DataFrame<
+ Polars[150 x 5]
+ sepal_length float [-1.0840606189132314, -1.3757361217598396, -1.6674116246064494, -1.8132493760297548, -1.2298983703365356, ...]
+ sepal_width float [2.372289612531505, -0.28722789030650403, 0.7765791108287006, 0.24467561026109824, 2.904193113099107, ...]
+ petal_length float [-0.7576391687443842, -0.7576391687443842, -0.7897606710936372, -0.725517666395131, -0.7576391687443842, ...]
+ petal_width float [-1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
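For readers new to Elixir, the generator-and-filter shape used above can be tried on ordinary data, without Explorer. The sketch below is only an analogy: the `columns` list of maps and the `_std` suffix are hypothetical stand-ins for what `across/0` yields inside a query, not part of the Explorer API.

```elixir
# Plain-Elixir analogy: each map stands in for one column's metadata
# (hypothetical data, not produced by Explorer).
columns = [
  %{name: "sepal_length", dtype: :float},
  %{name: "sepal_width", dtype: :float},
  %{name: "species", dtype: :string}
]

# PATTERN <- GENERATOR, FILTER: iterate over the columns, keep only the
# float ones, and return {name, value} pairs like the ones mutate expects.
float_pairs =
  for col <- columns, col.dtype == :float do
    {col.name <> "_std", col.dtype}
  end

IO.inspect(float_pairs)
```

The filter runs once per column and sees only the column's metadata, which is why it can select on names and dtypes but not on the values inside a column.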
For-comprehensions work with all dataframe verbs. As we have seen
above, for mutations we must return tuples that pair the mutation
name and its value. summarise
works similarly. Note in both cases
the name could also be generated dynamically. For example, to compute
-the mean per species, you could write:
iex> Explorer.Datasets.iris()
-...> |> DF.group_by("species")
-...> |> DF.summarise(
-...> for col <- across(), col.dtype == :float do
-...> {"#{col.name}_mean", mean(col)}
-...> end
-...> )
-#Explorer.DataFrame<
- Polars[3 x 5]
- species string ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
- sepal_length_mean float [5.005999999999999, 5.936, 6.587999999999998]
- sepal_width_mean float [3.4180000000000006, 2.7700000000000005, 2.9739999999999998]
- petal_length_mean float [1.464, 4.26, 5.552]
- petal_width_mean float [0.2439999999999999, 1.3259999999999998, 2.026]
->
arrange
expects a list of columns to sort by, while for-comprehensions
+the mean per species, you could write:
iex> Explorer.Datasets.iris()
+...> |> DF.group_by("species")
+...> |> DF.summarise(
+...> for col <- across(), col.dtype == :float do
+...> {"#{col.name}_mean", mean(col)}
+...> end
+...> )
+#Explorer.DataFrame<
+ Polars[3 x 5]
+ species string ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
+ sepal_length_mean float [5.005999999999999, 5.936, 6.587999999999998]
+ sepal_width_mean float [3.4180000000000006, 2.7700000000000005, 2.9739999999999998]
+ petal_length_mean float [1.464, 4.26, 5.552]
+ petal_width_mean float [0.2439999999999999, 1.3259999999999998, 2.026]
+>
arrange
expects a list of columns to sort by, while for-comprehensions
in filter
generate a list of conditions, which are joined using and
.
For example, to keep all entries that have both sepal and petal length above
-average, using a filter on the column name, one could write:
iex> iris = Explorer.Datasets.iris()
-iex> DF.filter(iris,
-...> for col <- across(), String.ends_with?(col.name, "_length") do
-...> col > mean(col)
-...> end
-...> )
-#Explorer.DataFrame<
- Polars[70 x 5]
- sepal_length float [7.0, 6.4, 6.9, 6.5, 6.3, ...]
- sepal_width float [3.2, 3.2, 3.1, 2.8, 3.3, ...]
- petal_length float [4.7, 4.5, 4.9, 4.6, 4.7, ...]
- petal_width float [1.4, 1.5, 1.5, 1.5, 1.6, ...]
- species string ["Iris-versicolor", "Iris-versicolor", "Iris-versicolor", "Iris-versicolor", "Iris-versicolor", ...]
->
Do not mix comprehension and queries
The filter inside a for-comprehension works at the meta level:
+average, using a filter on the column name, one could write:
iex> iris = Explorer.Datasets.iris()
+iex> DF.filter(iris,
+...> for col <- across(), String.ends_with?(col.name, "_length") do
+...> col > mean(col)
+...> end
+...> )
+#Explorer.DataFrame<
+ Polars[70 x 5]
+ sepal_length float [7.0, 6.4, 6.9, 6.5, 6.3, ...]
+ sepal_width float [3.2, 3.2, 3.1, 2.8, 3.3, ...]
+ petal_length float [4.7, 4.5, 4.9, 4.6, 4.7, ...]
+ petal_width float [1.4, 1.5, 1.5, 1.5, 1.6, ...]
+ species string ["Iris-versicolor", "Iris-versicolor", "Iris-versicolor", "Iris-versicolor", "Iris-versicolor", ...]
+>
Do not mix comprehension and queries
The filter inside a for-comprehension works at the meta level:
it can only filter columns based on their names and dtypes, but
not on their values. For example, this code does not make any
-sense and it will fail to compile:
|> DF.filter(
- for col <- across(), col > mean(col) do
+sense and it will fail to compile:
|> DF.filter(
+ for col <- across(), col > mean(col) do
col
- end
-end)
Another way to think about it, the comprehensions traverse on the
+ end
+end)Another way to think about it, the comprehensions traverse on the
columns themselves, the contents inside the comprehension do-block
traverse on the values inside the columns.
@@ -281,7 +281,7 @@
Queries simply become lazy dataframe operations at runtime.
-For example, the following query
Explorer.DataFrame.filter(df, nums > 2)
is equivalent to
Explorer.DataFrame.filter_with(df, fn df -> Explorer.Series.greater(df["nums"], 2) end)
This means that, whenever you want to generate queries programmatically,
+For example, the following query
Explorer.DataFrame.filter(df, nums > 2)
is equivalent to
Explorer.DataFrame.filter_with(df, fn df -> Explorer.Series.greater(df["nums"], 2) end)
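The equivalence above also suggests how a filter might be assembled from values known only at runtime. The following is a minimal sketch rather than a canonical recipe: the `column` and `threshold` variables are hypothetical, and the `"~> 0.7"` version requirement passed to `Mix.install/1` is an assumption to adjust for your setup.

```elixir
# Assumes network access so Mix.install can fetch Explorer; the version
# requirement is an assumption, not taken from the documentation above.
Mix.install([{:explorer, "~> 0.7"}])

alias Explorer.DataFrame, as: DF
alias Explorer.Series

df = DF.new(strs: ["a", "b", "c"], nums: [1, 2, 3])

# Values decided at runtime (hypothetical names, for illustration only).
column = "nums"
threshold = 2

# filter_with/2 passes a lazy dataframe to the callback; ldf[column]
# returns a lazy series that Series.greater/2 turns into a condition.
filtered =
  DF.filter_with(df, fn ldf -> Series.greater(ldf[column], threshold) end)

IO.inspect(DF.to_columns(filtered))
```

Because the callback is ordinary Elixir, the column name and comparison can come from configuration or user input without writing a macro-based query.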
This means that, whenever you want to generate queries programmatically,
you can fall back to the regular _with APIs.
@@ -738,9 +738,9 @@ left <> right
Examples
DF.mutate(df, name: first_name <> " " <> last_name)
If you want to concatenate non-string
+DF.mutate(df, name: first_name <> " " <> last_name)
If you want to concatenate non-string
series, you can explicitly cast them to string
-before:
DF.mutate(df, name: cast(year, :string) <> "-" <> cast(month, :string))
Or use format:
DF.mutate(df, name: format([year, "-", month]))
+before:DF.mutate(df, name: cast(year, :string) <> "-" <> cast(month, :string))
Or use format:
DF.mutate(df, name: format([year, "-", month]))
Accesses a column by name.
If your column name contains whitespace or starts with uppercase letters, you can still access its name by
-using this macro:
iex> df = Explorer.DataFrame.new("unusual nums": [1, 2, 3])
-iex> Explorer.DataFrame.filter(df, col("unusual nums") > 2)
-#Explorer.DataFrame<
- Polars[1 x 1]
- unusual nums integer [3]
->
name
must be an atom, a string, or an integer.
+using this macro:
iex> df = Explorer.DataFrame.new("unusual nums": [1, 2, 3])
+iex> Explorer.DataFrame.filter(df, col("unusual nums") > 2)
+#Explorer.DataFrame<
+ Polars[1 x 1]
+ unusual nums integer [3]
+>
name
must be an atom, a string, or an integer.
It is equivalent to df[name]
but inside a query.
This can also be used if you want to access a column
-programmatically, for example:
iex> df = Explorer.DataFrame.new(nums: [1, 2, 3])
+programmatically, for example:iex> df = Explorer.DataFrame.new(nums: [1, 2, 3])
iex> name = :nums
-iex> Explorer.DataFrame.filter(df, col(^name) > 2)
-#Explorer.DataFrame<
- Polars[1 x 1]
- nums integer [3]
->
For traversing multiple columns programmatically,
+iex> Explorer.DataFrame.filter(df, col(^name) > 2)
+#Explorer.DataFrame<
+ Polars[1 x 1]
+ nums integer [3]
+>
For traversing multiple columns programmatically,
see across/0
and across/1
.
Series can be created using from_list/2
, from_binary/3
, and friends:
Series can be made of numbers:
iex> Explorer.Series.from_list([1, 2, 3])
-#Explorer.Series<
- Polars[3]
- integer [1, 2, 3]
->
Series are nullable, so you may also include nils:
iex> Explorer.Series.from_list([1.0, nil, 2.5, 3.1])
-#Explorer.Series<
- Polars[4]
- float [1.0, nil, 2.5, 3.1]
->
Any of the dtypes above are supported, such as strings:
iex> Explorer.Series.from_list(["foo", "bar", "baz"])
-#Explorer.Series<
- Polars[3]
- string ["foo", "bar", "baz"]
->
+Series can be created using from_list/2
, from_binary/3
, and friends:
Series can be made of numbers:
iex> Explorer.Series.from_list([1, 2, 3])
+#Explorer.Series<
+ Polars[3]
+ integer [1, 2, 3]
+>
Series are nullable, so you may also include nils:
iex> Explorer.Series.from_list([1.0, nil, 2.5, 3.1])
+#Explorer.Series<
+ Polars[4]
+ float [1.0, nil, 2.5, 3.1]
+>
Any of the dtypes above are supported, such as strings:
iex> Explorer.Series.from_list(["foo", "bar", "baz"])
+#Explorer.Series<
+ Polars[3]
+ string ["foo", "bar", "baz"]
+>
@@ -1754,36 +1754,36 @@ Integers and floats follow their native encoding:
iex> Explorer.Series.from_binary(<<1.0::float-64-native, 2.0::float-64-native>>, :float)
-#Explorer.Series<
- Polars[2]
- float [1.0, 2.0]
->
-
-iex> Explorer.Series.from_binary(<<-1::signed-64-native, 1::signed-64-native>>, :integer)
-#Explorer.Series<
- Polars[2]
- integer [-1, 1]
->
Booleans are unsigned integers:
iex> Explorer.Series.from_binary(<<1, 0, 1>>, :boolean)
-#Explorer.Series<
- Polars[3]
- boolean [true, false, true]
->
Dates are encoded as i32 representing days from the Unix epoch (1970-01-01):
iex> binary = <<-719162::signed-32-native, 0::signed-32-native, 6129::signed-32-native>>
-iex> Explorer.Series.from_binary(binary, :date)
-#Explorer.Series<
- Polars[3]
- date [0001-01-01, 1970-01-01, 1986-10-13]
->
Times are encoded as i64 representing nanoseconds from midnight:
iex> binary = <<0::signed-64-native, 86399999999000::signed-64-native>>
-iex> Explorer.Series.from_binary(binary, :time)
-#Explorer.Series<
- Polars[2]
- time [00:00:00.000000, 23:59:59.999999]
->
Datetimes are encoded as i64 representing microseconds from the Unix epoch (1970-01-01):
iex> binary = <<0::signed-64-native, 529550625987654::signed-64-native>>
-iex> Explorer.Series.from_binary(binary, :datetime)
-#Explorer.Series<
- Polars[2]
- datetime [1970-01-01 00:00:00.000000, 1986-10-13 01:23:45.987654]
->
+Integers and floats follow their native encoding:
iex> Explorer.Series.from_binary(<<1.0::float-64-native, 2.0::float-64-native>>, :float)
+#Explorer.Series<
+ Polars[2]
+ float [1.0, 2.0]
+>
+
+iex> Explorer.Series.from_binary(<<-1::signed-64-native, 1::signed-64-native>>, :integer)
+#Explorer.Series<
+ Polars[2]
+ integer [-1, 1]
+>
Booleans are unsigned integers:
iex> Explorer.Series.from_binary(<<1, 0, 1>>, :boolean)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, false, true]
+>
Dates are encoded as i32 representing days from the Unix epoch (1970-01-01):
iex> binary = <<-719162::signed-32-native, 0::signed-32-native, 6129::signed-32-native>>
+iex> Explorer.Series.from_binary(binary, :date)
+#Explorer.Series<
+ Polars[3]
+ date [0001-01-01, 1970-01-01, 1986-10-13]
+>
Times are encoded as i64 representing nanoseconds from midnight:
iex> binary = <<0::signed-64-native, 86399999999000::signed-64-native>>
+iex> Explorer.Series.from_binary(binary, :time)
+#Explorer.Series<
+ Polars[2]
+ time [00:00:00.000000, 23:59:59.999999]
+>
Datetimes are encoded as i64 representing microseconds from the Unix epoch (1970-01-01):
iex> binary = <<0::signed-64-native, 529550625987654::signed-64-native>>
+iex> Explorer.Series.from_binary(binary, :datetime)
+#Explorer.Series<
+ Polars[2]
+ datetime [1970-01-01 00:00:00.000000, 1986-10-13 01:23:45.987654]
+>
Explorer will infer the type from the values in the list:
iex> Explorer.Series.from_list([1, 2, 3])
-#Explorer.Series<
- Polars[3]
- integer [1, 2, 3]
->
Series are nullable, so you may also include nils:
iex> Explorer.Series.from_list([1.0, nil, 2.5, 3.1])
-#Explorer.Series<
- Polars[4]
- float [1.0, nil, 2.5, 3.1]
->
A mix of integers and floats will be cast to a float:
iex> Explorer.Series.from_list([1, 2.0])
-#Explorer.Series<
- Polars[2]
- float [1.0, 2.0]
->
Float series can accept NaN, Inf, and -Inf values:
iex> Explorer.Series.from_list([1.0, 2.0, :nan, 4.0])
-#Explorer.Series<
- Polars[4]
- float [1.0, 2.0, NaN, 4.0]
->
-
-iex> Explorer.Series.from_list([1.0, 2.0, :infinity, 4.0])
-#Explorer.Series<
- Polars[4]
- float [1.0, 2.0, Inf, 4.0]
->
-
-iex> Explorer.Series.from_list([1.0, 2.0, :neg_infinity, 4.0])
-#Explorer.Series<
- Polars[4]
- float [1.0, 2.0, -Inf, 4.0]
->
Trying to create a "nil" series will, by default, result in a series of floats:
iex> Explorer.Series.from_list([nil, nil])
-#Explorer.Series<
- Polars[2]
- float [nil, nil]
->
You can specify the desired dtype
for a series with the :dtype
option.
iex> Explorer.Series.from_list([nil, nil], dtype: :integer)
-#Explorer.Series<
- Polars[2]
- integer [nil, nil]
->
-
-iex> Explorer.Series.from_list([1, nil], dtype: :string)
-#Explorer.Series<
- Polars[2]
- string ["1", nil]
->
The dtype
option is particularly important if a :binary
series is desired, because
-by default binary series will have the dtype of :string
:
iex> Explorer.Series.from_list([<<228, 146, 51>>, <<42, 209, 236>>], dtype: :binary)
-#Explorer.Series<
- Polars[2]
- binary [<<228, 146, 51>>, <<42, 209, 236>>]
->
A series mixing UTF8 strings and binaries is possible:
iex> Explorer.Series.from_list([<<228, 146, 51>>, "Elixir"], dtype: :binary)
-#Explorer.Series<
- Polars[2]
- binary [<<228, 146, 51>>, "Elixir"]
->
Another option is to create a categorical series from a list of strings:
iex> Explorer.Series.from_list(["EUA", "Brazil", "Poland"], dtype: :category)
-#Explorer.Series<
- Polars[3]
- category ["EUA", "Brazil", "Poland"]
->
It is possible to create a series of :datetime
from a list of microseconds since Unix Epoch.
iex> Explorer.Series.from_list([1649883642 * 1_000 * 1_000], dtype: :datetime)
-#Explorer.Series<
- Polars[1]
- datetime [2022-04-13 21:00:42.000000]
->
It is possible to create a series of :time
from a list of nanoseconds since midnight.
iex> Explorer.Series.from_list([123 * 1_000 * 1_000 * 1_000], dtype: :time)
-#Explorer.Series<
- Polars[1]
- time [00:02:03.000000]
->
Mixing non-numeric data types will raise an ArgumentError:
iex> Explorer.Series.from_list([1, "a"])
+Explorer will infer the type from the values in the list:
iex> Explorer.Series.from_list([1, 2, 3])
+#Explorer.Series<
+ Polars[3]
+ integer [1, 2, 3]
+>
Series are nullable, so you may also include nils:
iex> Explorer.Series.from_list([1.0, nil, 2.5, 3.1])
+#Explorer.Series<
+ Polars[4]
+ float [1.0, nil, 2.5, 3.1]
+>
A mix of integers and floats will be cast to a float:
iex> Explorer.Series.from_list([1, 2.0])
+#Explorer.Series<
+ Polars[2]
+ float [1.0, 2.0]
+>
Float series can accept NaN, Inf, and -Inf values:
iex> Explorer.Series.from_list([1.0, 2.0, :nan, 4.0])
+#Explorer.Series<
+ Polars[4]
+ float [1.0, 2.0, NaN, 4.0]
+>
+
+iex> Explorer.Series.from_list([1.0, 2.0, :infinity, 4.0])
+#Explorer.Series<
+ Polars[4]
+ float [1.0, 2.0, Inf, 4.0]
+>
+
+iex> Explorer.Series.from_list([1.0, 2.0, :neg_infinity, 4.0])
+#Explorer.Series<
+ Polars[4]
+ float [1.0, 2.0, -Inf, 4.0]
+>
Trying to create a "nil" series will, by default, result in a series of floats:
iex> Explorer.Series.from_list([nil, nil])
+#Explorer.Series<
+ Polars[2]
+ float [nil, nil]
+>
You can specify the desired dtype
for a series with the :dtype
option.
iex> Explorer.Series.from_list([nil, nil], dtype: :integer)
+#Explorer.Series<
+ Polars[2]
+ integer [nil, nil]
+>
+
+iex> Explorer.Series.from_list([1, nil], dtype: :string)
+#Explorer.Series<
+ Polars[2]
+ string ["1", nil]
+>
The dtype
option is particularly important if a :binary
series is desired, because
+by default binary series will have the dtype of :string
:
iex> Explorer.Series.from_list([<<228, 146, 51>>, <<42, 209, 236>>], dtype: :binary)
+#Explorer.Series<
+ Polars[2]
+ binary [<<228, 146, 51>>, <<42, 209, 236>>]
+>
A series mixing UTF8 strings and binaries is possible:
iex> Explorer.Series.from_list([<<228, 146, 51>>, "Elixir"], dtype: :binary)
+#Explorer.Series<
+ Polars[2]
+ binary [<<228, 146, 51>>, "Elixir"]
+>
Another option is to create a categorical series from a list of strings:
iex> Explorer.Series.from_list(["EUA", "Brazil", "Poland"], dtype: :category)
+#Explorer.Series<
+ Polars[3]
+ category ["EUA", "Brazil", "Poland"]
+>
It is possible to create a series of :datetime
from a list of microseconds since Unix Epoch.
iex> Explorer.Series.from_list([1649883642 * 1_000 * 1_000], dtype: :datetime)
+#Explorer.Series<
+ Polars[1]
+ datetime [2022-04-13 21:00:42.000000]
+>
It is possible to create a series of :time
from a list of nanoseconds since midnight.
iex> Explorer.Series.from_list([123 * 1_000 * 1_000 * 1_000], dtype: :time)
+#Explorer.Series<
+ Polars[1]
+ time [00:02:03.000000]
+>
Mixing non-numeric data types will raise an ArgumentError:
iex> Explorer.Series.from_list([1, "a"])
** (ArgumentError) the value "a" does not match the inferred series dtype :integer
Integers and floats:
iex> tensor = Nx.tensor([1, 2, 3])
-iex> Explorer.Series.from_tensor(tensor)
-#Explorer.Series<
- Polars[3]
- integer [1, 2, 3]
->
-
-iex> tensor = Nx.tensor([1.0, 2.0, 3.0], type: :f64)
-iex> Explorer.Series.from_tensor(tensor)
-#Explorer.Series<
- Polars[3]
- float [1.0, 2.0, 3.0]
->
Unsigned 8-bit tensors are assumed to be booleans:
iex> tensor = Nx.tensor([1, 0, 1], type: :u8)
-iex> Explorer.Series.from_tensor(tensor)
-#Explorer.Series<
- Polars[3]
- boolean [true, false, true]
->
Signed 32-bit tensors are assumed to be dates:
iex> tensor = Nx.tensor([-719162, 0, 6129], type: :s32)
-iex> Explorer.Series.from_tensor(tensor)
-#Explorer.Series<
- Polars[3]
- date [0001-01-01, 1970-01-01, 1986-10-13]
->
Times are signed 64-bit representing nanoseconds from midnight and
-therefore must have their dtype explicitly given:
iex> tensor = Nx.tensor([0, 86399999999000])
-iex> Explorer.Series.from_tensor(tensor, dtype: :time)
-#Explorer.Series<
- Polars[2]
- time [00:00:00.000000, 23:59:59.999999]
->
Datetimes are signed 64-bit and therefore must have their dtype explicitly given:
iex> tensor = Nx.tensor([0, 529550625987654])
-iex> Explorer.Series.from_tensor(tensor, dtype: :datetime)
-#Explorer.Series<
- Polars[2]
- datetime [1970-01-01 00:00:00.000000, 1986-10-13 01:23:45.987654]
->
+Integers and floats:
iex> tensor = Nx.tensor([1, 2, 3])
+iex> Explorer.Series.from_tensor(tensor)
+#Explorer.Series<
+ Polars[3]
+ integer [1, 2, 3]
+>
+
+iex> tensor = Nx.tensor([1.0, 2.0, 3.0], type: :f64)
+iex> Explorer.Series.from_tensor(tensor)
+#Explorer.Series<
+ Polars[3]
+ float [1.0, 2.0, 3.0]
+>
Unsigned 8-bit tensors are assumed to be booleans:
iex> tensor = Nx.tensor([1, 0, 1], type: :u8)
+iex> Explorer.Series.from_tensor(tensor)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, false, true]
+>
Signed 32-bit tensors are assumed to be dates:
iex> tensor = Nx.tensor([-719162, 0, 6129], type: :s32)
+iex> Explorer.Series.from_tensor(tensor)
+#Explorer.Series<
+ Polars[3]
+ date [0001-01-01, 1970-01-01, 1986-10-13]
+>
Times are signed 64-bit representing nanoseconds from midnight and
+therefore must have their dtype explicitly given:
iex> tensor = Nx.tensor([0, 86399999999000])
+iex> Explorer.Series.from_tensor(tensor, dtype: :time)
+#Explorer.Series<
+ Polars[2]
+ time [00:00:00.000000, 23:59:59.999999]
+>
Datetimes are signed 64-bit and therefore must have their dtype explicitly given:
iex> tensor = Nx.tensor([0, 529550625987654])
+iex> Explorer.Series.from_tensor(tensor, dtype: :datetime)
+#Explorer.Series<
+ Polars[2]
+ datetime [1970-01-01 00:00:00.000000, 1986-10-13 01:23:45.987654]
+>
iex> s = Explorer.Series.from_list([0, 1, 2])
-iex> Explorer.Series.replace(s, Nx.tensor([1, 2, 3]))
-#Explorer.Series<
- Polars[3]
- integer [1, 2, 3]
->
This is particularly useful for categorical columns:
iex> s = Explorer.Series.from_list(["foo", "bar", "baz"], dtype: :category)
-iex> Explorer.Series.replace(s, Nx.tensor([2, 1, 0]))
-#Explorer.Series<
- Polars[3]
- category ["baz", "bar", "foo"]
->
iex> s = Explorer.Series.from_list([0, 1, 2])
+iex> Explorer.Series.replace(s, Nx.tensor([1, 2, 3]))
+#Explorer.Series<
+ Polars[3]
+ integer [1, 2, 3]
+>
This is particularly useful for categorical columns:
iex> s = Explorer.Series.from_list(["foo", "bar", "baz"], dtype: :category)
+iex> Explorer.Series.replace(s, Nx.tensor([2, 1, 0]))
+#Explorer.Series<
+ Polars[3]
+ category ["baz", "bar", "foo"]
+>
Similar to tensors, we can also replace by lists:
iex> s = Explorer.Series.from_list([0, 1, 2])
-iex> Explorer.Series.replace(s, [1, 2, 3, 4, 5])
-#Explorer.Series<
- Polars[5]
- integer [1, 2, 3, 4, 5]
->
The same considerations as above apply.
+Similar to tensors, we can also replace by lists:
iex> s = Explorer.Series.from_list([0, 1, 2])
+iex> Explorer.Series.replace(s, [1, 2, 3, 4, 5])
+#Explorer.Series<
+ Polars[5]
+ integer [1, 2, 3, 4, 5]
+>
The same considerations as above apply.
iex> series = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.to_binary(series)
-<<1::signed-64-native, 2::signed-64-native, 3::signed-64-native>>
+iex> series = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.to_binary(series)
+<<1::signed-64-native, 2::signed-64-native, 3::signed-64-native>>
-iex> series = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.to_binary(series)
-<<1, 0, 1>>
+iex> series = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.to_binary(series)
+<<1, 0, 1>>
iex> series = Explorer.Series.from_list([1, 2, 3])
-iex> series |> Explorer.Series.to_enum() |> Enum.to_list()
-[1, 2, 3]
+iex> series = Explorer.Series.from_list([1, 2, 3])
+iex> series |> Explorer.Series.to_enum() |> Enum.to_list()
+[1, 2, 3]
Integers and floats follow their native encoding:
iex> series = Explorer.Series.from_list([-1, 0, 1])
-iex> Explorer.Series.to_iovec(series)
-[<<-1::signed-64-native, 0::signed-64-native, 1::signed-64-native>>]
-
-iex> series = Explorer.Series.from_list([1.0, 2.0, 3.0])
-iex> Explorer.Series.to_iovec(series)
-[<<1.0::float-64-native, 2.0::float-64-native, 3.0::float-64-native>>]
Booleans are encoded as 0 and 1:
iex> series = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.to_iovec(series)
-[<<1, 0, 1>>]
Dates are encoded as i32 representing days from the Unix epoch (1970-01-01):
iex> series = Explorer.Series.from_list([~D[0001-01-01], ~D[1970-01-01], ~D[1986-10-13]])
-iex> Explorer.Series.to_iovec(series)
-[<<-719162::signed-32-native, 0::signed-32-native, 6129::signed-32-native>>]
Times are encoded as i64 representing nanoseconds from midnight:
iex> series = Explorer.Series.from_list([~T[00:00:00.000000], ~T[23:59:59.999999]])
-iex> Explorer.Series.to_iovec(series)
-[<<0::signed-64-native, 86399999999000::signed-64-native>>]
Datetimes are encoded as i64 representing microseconds from the Unix epoch (1970-01-01):
iex> series = Explorer.Series.from_list([~N[0001-01-01 00:00:00], ~N[1970-01-01 00:00:00], ~N[1986-10-13 01:23:45.987654]])
-iex> Explorer.Series.to_iovec(series)
-[<<-62135596800000000::signed-64-native, 0::signed-64-native, 529550625987654::signed-64-native>>]
The operation raises for binaries and strings, as they do not provide a fixed-width
-binary representation:
iex> s = Explorer.Series.from_list(["a", "b", "c", "b"])
-iex> Explorer.Series.to_iovec(s)
+Integers and floats follow their native encoding:
iex> series = Explorer.Series.from_list([-1, 0, 1])
+iex> Explorer.Series.to_iovec(series)
+[<<-1::signed-64-native, 0::signed-64-native, 1::signed-64-native>>]
+
+iex> series = Explorer.Series.from_list([1.0, 2.0, 3.0])
+iex> Explorer.Series.to_iovec(series)
+[<<1.0::float-64-native, 2.0::float-64-native, 3.0::float-64-native>>]
Booleans are encoded as 0 and 1:
iex> series = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.to_iovec(series)
+[<<1, 0, 1>>]
Dates are encoded as i32 representing days from the Unix epoch (1970-01-01):
iex> series = Explorer.Series.from_list([~D[0001-01-01], ~D[1970-01-01], ~D[1986-10-13]])
+iex> Explorer.Series.to_iovec(series)
+[<<-719162::signed-32-native, 0::signed-32-native, 6129::signed-32-native>>]
Times are encoded as i64 representing nanoseconds from midnight:
iex> series = Explorer.Series.from_list([~T[00:00:00.000000], ~T[23:59:59.999999]])
+iex> Explorer.Series.to_iovec(series)
+[<<0::signed-64-native, 86399999999000::signed-64-native>>]
Datetimes are encoded as i64 representing microseconds from the Unix epoch (1970-01-01):
iex> series = Explorer.Series.from_list([~N[0001-01-01 00:00:00], ~N[1970-01-01 00:00:00], ~N[1986-10-13 01:23:45.987654]])
+iex> Explorer.Series.to_iovec(series)
+[<<-62135596800000000::signed-64-native, 0::signed-64-native, 529550625987654::signed-64-native>>]
The operation raises for binaries and strings, as they do not provide a fixed-width
+binary representation:
iex> s = Explorer.Series.from_list(["a", "b", "c", "b"])
+iex> Explorer.Series.to_iovec(s)
** (ArgumentError) cannot convert series of dtype :string into iovec
However, if appropriate, you can convert them to categorical types,
-which will then return the index of each category:
iex> series = Explorer.Series.from_list(["a", "b", "c", "b"], dtype: :category)
-iex> Explorer.Series.to_iovec(series)
-[<<0::unsigned-32-native, 1::unsigned-32-native, 2::unsigned-32-native, 1::unsigned-32-native>>]
+which will then return the index of each category:
iex> series = Explorer.Series.from_list(["a", "b", "c", "b"], dtype: :category)
+iex> Explorer.Series.to_iovec(series)
+[<<0::unsigned-32-native, 1::unsigned-32-native, 2::unsigned-32-native, 1::unsigned-32-native>>]
iex> series = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.to_list(series)
-[1, 2, 3]
+iex> series = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.to_list(series)
+[1, 2, 3]
iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.to_tensor(s)
-#Nx.Tensor<
- s64[3]
- [1, 2, 3]
->
+iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.to_tensor(s)
+#Nx.Tensor<
+ s64[3]
+ [1, 2, 3]
+>
-iex> s = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.to_tensor(s)
-#Nx.Tensor<
- u8[3]
- [1, 0, 1]
->
+iex> s = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.to_tensor(s)
+#Nx.Tensor<
+ u8[3]
+ [1, 0, 1]
+>
iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.argmax(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.argmax(s)
3
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.argmax(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.argmax(s)
3
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.argmax(s)
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.argmax(s)
0
-iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
-iex> Explorer.Series.argmax(s)
+iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
+iex> Explorer.Series.argmax(s)
0
-iex> s = Explorer.Series.from_list([~T[00:02:03.000212], ~T[00:05:04.000456]])
-iex> Explorer.Series.argmax(s)
+iex> s = Explorer.Series.from_list([~T[00:02:03.000212], ~T[00:05:04.000456]])
+iex> Explorer.Series.argmax(s)
1
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.argmax(s)
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.argmax(s)
** (ArgumentError) Explorer.Series.argmax/1 not implemented for dtype :string. Valid dtypes are [:integer, :float, :date, :time, :datetime]
@@ -2371,28 +2371,28 @@ argmin(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.argmin(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.argmin(s)
2
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.argmin(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.argmin(s)
2
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.argmin(s)
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.argmin(s)
1
-iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
-iex> Explorer.Series.argmin(s)
+iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
+iex> Explorer.Series.argmin(s)
1
-iex> s = Explorer.Series.from_list([~T[00:02:03.000212], ~T[00:05:04.000456]])
-iex> Explorer.Series.argmin(s)
+iex> s = Explorer.Series.from_list([~T[00:02:03.000212], ~T[00:05:04.000456]])
+iex> Explorer.Series.argmin(s)
0
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.argmin(s)
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.argmin(s)
** (ArgumentError) Explorer.Series.argmin/1 not implemented for dtype :string. Valid dtypes are [:integer, :float, :date, :time, :datetime]
@@ -2437,9 +2437,9 @@ correlation(left, right, ddof \\ 1)
Examples
-iex> s1 = Series.from_list([1, 8, 3])
-iex> s2 = Series.from_list([4, 5, 2])
-iex> Series.correlation(s1, s2)
+iex> s1 = Series.from_list([1, 8, 3])
+iex> s2 = Series.from_list([4, 5, 2])
+iex> Series.correlation(s1, s2)
0.5447047794019223
@@ -2471,8 +2471,8 @@ count(series)
Examples
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.count(s)
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.count(s)
3
@@ -2513,9 +2513,9 @@ covariance(left, right)
Examples
-iex> s1 = Series.from_list([1, 8, 3])
-iex> s2 = Series.from_list([4, 5, 2])
-iex> Series.covariance(s1, s2)
+iex> s1 = Series.from_list([1, 8, 3])
+iex> s2 = Series.from_list([4, 5, 2])
+iex> Series.covariance(s1, s2)
3.0
@@ -2556,14 +2556,14 @@ cut(series, bins, opts \\ [])
Examples
-iex> s = Explorer.Series.from_list([1.0, 2.0, 3.0])
-iex> Explorer.Series.cut(s, [1.5, 2.5])
-#Explorer.DataFrame<
- Polars[3 x 3]
- values float [1.0, 2.0, 3.0]
- break_point float [1.5, 2.5, Inf]
- category category ["(-inf, 1.5]", "(1.5, 2.5]", "(2.5, inf]"]
->
+iex> s = Explorer.Series.from_list([1.0, 2.0, 3.0])
+iex> Explorer.Series.cut(s, [1.5, 2.5])
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ values float [1.0, 2.0, 3.0]
+ break_point float [1.5, 2.5, Inf]
+ category category ["(-inf, 1.5]", "(1.5, 2.5]", "(2.5, inf]"]
+>
@@ -2591,13 +2591,13 @@ frequencies(series)
Examples
-iex> s = Explorer.Series.from_list(["a", "a", "b", "c", "c", "c"])
-iex> Explorer.Series.frequencies(s)
-#Explorer.DataFrame<
- Polars[3 x 2]
- values string ["c", "a", "b"]
- counts integer [3, 2, 1]
->
+iex> s = Explorer.Series.from_list(["a", "a", "b", "c", "c", "c"])
+iex> Explorer.Series.frequencies(s)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ values string ["c", "a", "b"]
+ counts integer [3, 2, 1]
+>
@@ -2638,28 +2638,28 @@ max(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.max(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.max(s)
3
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.max(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.max(s)
3.0
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.max(s)
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.max(s)
~D[2021-01-01]
-iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
-iex> Explorer.Series.max(s)
+iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
+iex> Explorer.Series.max(s)
~N[2021-01-01 00:00:00.000000]
-iex> s = Explorer.Series.from_list([~T[00:02:03.000212], ~T[00:05:04.000456]])
-iex> Explorer.Series.max(s)
+iex> s = Explorer.Series.from_list([~T[00:02:03.000212], ~T[00:05:04.000456]])
+iex> Explorer.Series.max(s)
~T[00:05:04.000456]
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.max(s)
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.max(s)
** (ArgumentError) Explorer.Series.max/1 not implemented for dtype :string. Valid dtypes are [:integer, :float, :date, :time, :datetime]
@@ -2700,16 +2700,16 @@ mean(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.mean(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.mean(s)
2.0
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.mean(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.mean(s)
2.0
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.mean(s)
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.mean(s)
** (ArgumentError) Explorer.Series.mean/1 not implemented for dtype :date. Valid dtypes are [:integer, :float]
@@ -2750,16 +2750,16 @@ median(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.median(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.median(s)
2.0
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.median(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.median(s)
2.0
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.median(s)
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.median(s)
** (ArgumentError) Explorer.Series.median/1 not implemented for dtype :date. Valid dtypes are [:integer, :float]
@@ -2801,28 +2801,28 @@ min(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.min(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.min(s)
1
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.min(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.min(s)
1.0
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.min(s)
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.min(s)
~D[1999-12-31]
-iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
-iex> Explorer.Series.min(s)
+iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
+iex> Explorer.Series.min(s)
~N[1999-12-31 00:00:00.000000]
-iex> s = Explorer.Series.from_list([~T[00:02:03.000451], ~T[00:05:04.000134]])
-iex> Explorer.Series.min(s)
+iex> s = Explorer.Series.from_list([~T[00:02:03.000451], ~T[00:05:04.000134]])
+iex> Explorer.Series.min(s)
~T[00:02:03.000451]
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.min(s)
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.min(s)
** (ArgumentError) Explorer.Series.min/1 not implemented for dtype :string. Valid dtypes are [:integer, :float, :date, :time, :datetime]
@@ -2851,8 +2851,8 @@ n_distinct(series)
Examples
-iex> s = Explorer.Series.from_list(["a", "b", "a", "b"])
-iex> Explorer.Series.n_distinct(s)
+iex> s = Explorer.Series.from_list(["a", "b", "a", "b"])
+iex> Explorer.Series.n_distinct(s)
2
@@ -2881,8 +2881,8 @@ nil_count(series)
Examples
-iex> s = Explorer.Series.from_list(["a", nil, "c", nil, nil])
-iex> Explorer.Series.nil_count(s)
+iex> s = Explorer.Series.from_list(["a", nil, "c", nil, nil])
+iex> Explorer.Series.nil_count(s)
3
@@ -2923,12 +2923,12 @@ product(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.product(s)
+iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.product(s)
6
-iex> s = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.product(s)
+iex> s = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.product(s)
** (ArgumentError) Explorer.Series.product/1 not implemented for dtype :boolean. Valid dtypes are [:integer, :float]
@@ -2970,14 +2970,14 @@ qcut(series, quantiles, opts \\ [])
Examples
-iex> s = Explorer.Series.from_list([1.0, 2.0, 3.0, 4.0, 5.0])
-iex> Explorer.Series.qcut(s, [0.25, 0.75])
-#Explorer.DataFrame<
- Polars[5 x 3]
- values float [1.0, 2.0, 3.0, 4.0, 5.0]
- break_point float [2.0, 2.0, 4.0, 4.0, Inf]
- category category ["(-inf, 2]", "(-inf, 2]", "(2, 4]", "(2, 4]", "(4, inf]"]
->
+iex> s = Explorer.Series.from_list([1.0, 2.0, 3.0, 4.0, 5.0])
+iex> Explorer.Series.qcut(s, [0.25, 0.75])
+#Explorer.DataFrame<
+ Polars[5 x 3]
+ values float [1.0, 2.0, 3.0, 4.0, 5.0]
+ break_point float [2.0, 2.0, 4.0, 4.0, Inf]
+ category category ["(-inf, 2]", "(-inf, 2]", "(2, 4]", "(2, 4]", "(4, inf]"]
+>
@@ -3017,28 +3017,28 @@ quantile(series, quantile)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.quantile(s, 0.2)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.quantile(s, 0.2)
1
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.quantile(s, 0.5)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.quantile(s, 0.5)
2.0
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.quantile(s, 0.5)
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.quantile(s, 0.5)
~D[2021-01-01]
-iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
-iex> Explorer.Series.quantile(s, 0.5)
+iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
+iex> Explorer.Series.quantile(s, 0.5)
~N[2021-01-01 00:00:00.000000]
-iex> s = Explorer.Series.from_list([~T[01:55:00], ~T[15:35:00], ~T[23:00:00]])
-iex> Explorer.Series.quantile(s, 0.5)
+iex> s = Explorer.Series.from_list([~T[01:55:00], ~T[15:35:00], ~T[23:00:00]])
+iex> Explorer.Series.quantile(s, 0.5)
~T[15:35:00]
-iex> s = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.quantile(s, 0.5)
+iex> s = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.quantile(s, 0.5)
** (ArgumentError) Explorer.Series.quantile/2 not implemented for dtype :boolean. Valid dtypes are [:integer, :float, :date, :time, :datetime]
@@ -3083,24 +3083,24 @@ skew(series, opts \\ [])
Examples
-iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5, 23])
-iex> Explorer.Series.skew(s)
+iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5, 23])
+iex> Explorer.Series.skew(s)
1.6727687946848508
-iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5, 23])
-iex> Explorer.Series.skew(s, bias: false)
+iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5, 23])
+iex> Explorer.Series.skew(s, bias: false)
2.2905330058490514
-iex> s = Explorer.Series.from_list([1, 2, 3, nil, 1])
-iex> Explorer.Series.skew(s, bias: false)
+iex> s = Explorer.Series.from_list([1, 2, 3, nil, 1])
+iex> Explorer.Series.skew(s, bias: false)
0.8545630383279712
-iex> s = Explorer.Series.from_list([1, 2, 3, nil, 1])
-iex> Explorer.Series.skew(s)
+iex> s = Explorer.Series.from_list([1, 2, 3, nil, 1])
+iex> Explorer.Series.skew(s)
0.49338220021815865
-iex> s = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.skew(s, false)
+iex> s = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.skew(s, false)
** (ArgumentError) Explorer.Series.skew/2 not implemented for dtype :boolean. Valid dtypes are [:integer, :float]
@@ -3141,16 +3141,16 @@ standard_deviation(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.standard_deviation(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.standard_deviation(s)
1.0
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.standard_deviation(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.standard_deviation(s)
1.0
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.standard_deviation(s)
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.standard_deviation(s)
** (ArgumentError) Explorer.Series.standard_deviation/1 not implemented for dtype :string. Valid dtypes are [:integer, :float]
@@ -3191,20 +3191,20 @@ sum(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.sum(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.sum(s)
6
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.sum(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.sum(s)
6.0
-iex> s = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.sum(s)
+iex> s = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.sum(s)
2
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.sum(s)
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.sum(s)
** (ArgumentError) Explorer.Series.sum/1 not implemented for dtype :date. Valid dtypes are [:integer, :float, :boolean]
@@ -3245,16 +3245,16 @@ variance(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.variance(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.variance(s)
1.0
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.variance(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.variance(s)
1.0
-iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
-iex> Explorer.Series.variance(s)
+iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
+iex> Explorer.Series.variance(s)
** (ArgumentError) Explorer.Series.variance/1 not implemented for dtype :datetime. Valid dtypes are [:integer, :float]
@@ -3308,29 +3308,29 @@ abs(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, -1, -3])
-iex> Explorer.Series.abs(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 1, 3]
->
-
-iex> s = Explorer.Series.from_list([1.0, 2.0, -1.0, -3.0])
-iex> Explorer.Series.abs(s)
-#Explorer.Series<
- Polars[4]
- float [1.0, 2.0, 1.0, 3.0]
->
-
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, -3.0])
-iex> Explorer.Series.abs(s)
-#Explorer.Series<
- Polars[4]
- float [1.0, 2.0, nil, 3.0]
->
-
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.abs(s)
+iex> s = Explorer.Series.from_list([1, 2, -1, -3])
+iex> Explorer.Series.abs(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 1, 3]
+>
+
+iex> s = Explorer.Series.from_list([1.0, 2.0, -1.0, -3.0])
+iex> Explorer.Series.abs(s)
+#Explorer.Series<
+ Polars[4]
+ float [1.0, 2.0, 1.0, 3.0]
+>
+
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, -3.0])
+iex> Explorer.Series.abs(s)
+#Explorer.Series<
+ Polars[4]
+ float [1.0, 2.0, nil, 3.0]
+>
+
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.abs(s)
** (ArgumentError) Explorer.Series.abs/1 not implemented for dtype :string. Valid dtypes are [:integer, :float]
@@ -3373,25 +3373,25 @@ add(left, right)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([4, 5, 6])
-iex> Explorer.Series.add(s1, s2)
-#Explorer.Series<
- Polars[3]
- integer [5, 7, 9]
->
You can also use scalar values on both sides:
iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.add(s1, 2)
-#Explorer.Series<
- Polars[3]
- integer [3, 4, 5]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([4, 5, 6])
+iex> Explorer.Series.add(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ integer [5, 7, 9]
+>
You can also use scalar values on both sides:
iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.add(s1, 2)
+#Explorer.Series<
+ Polars[3]
+ integer [3, 4, 5]
+>
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.add(2, s1)
-#Explorer.Series<
- Polars[3]
- integer [3, 4, 5]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.add(2, s1)
+#Explorer.Series<
+ Polars[3]
+ integer [3, 4, 5]
+>
@@ -3419,19 +3419,19 @@ all_equal(left, right)
Examples
-iex> s1 = Explorer.Series.from_list(["a", "b"])
-iex> s2 = Explorer.Series.from_list(["a", "b"])
-iex> Explorer.Series.all_equal(s1, s2)
+iex> s1 = Explorer.Series.from_list(["a", "b"])
+iex> s2 = Explorer.Series.from_list(["a", "b"])
+iex> Explorer.Series.all_equal(s1, s2)
true
-iex> s1 = Explorer.Series.from_list(["a", "b"])
-iex> s2 = Explorer.Series.from_list(["a", "c"])
-iex> Explorer.Series.all_equal(s1, s2)
+iex> s1 = Explorer.Series.from_list(["a", "b"])
+iex> s2 = Explorer.Series.from_list(["a", "c"])
+iex> Explorer.Series.all_equal(s1, s2)
false
-iex> s1 = Explorer.Series.from_list(["a", "b"])
-iex> s2 = Explorer.Series.from_list([1, 2])
-iex> Explorer.Series.all_equal(s1, s2)
+iex> s1 = Explorer.Series.from_list(["a", "b"])
+iex> s2 = Explorer.Series.from_list([1, 2])
+iex> Explorer.Series.all_equal(s1, s2)
false
@@ -3461,14 +3461,14 @@ left and right
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> mask1 = Explorer.Series.greater(s1, 1)
-iex> mask2 = Explorer.Series.less(s1, 3)
-iex> Explorer.Series.and(mask1, mask2)
-#Explorer.Series<
- Polars[3]
- boolean [false, true, false]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> mask1 = Explorer.Series.greater(s1, 1)
+iex> mask2 = Explorer.Series.less(s1, 3)
+iex> Explorer.Series.and(mask1, mask2)
+#Explorer.Series<
+ Polars[3]
+ boolean [false, true, false]
+>
@@ -3502,62 +3502,62 @@ cast(series, dtype)
Examples
-iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.cast(s, :string)
-#Explorer.Series<
- Polars[3]
- string ["1", "2", "3"]
->
-
-iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.cast(s, :float)
-#Explorer.Series<
- Polars[3]
- float [1.0, 2.0, 3.0]
->
-
-iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.cast(s, :date)
-#Explorer.Series<
- Polars[3]
- date [1970-01-02, 1970-01-03, 1970-01-04]
->
Note that time is represented as an integer of nanoseconds since midnight.
+
iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.cast(s, :string)
+#Explorer.Series<
+ Polars[3]
+ string ["1", "2", "3"]
+>
+
+iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.cast(s, :float)
+#Explorer.Series<
+ Polars[3]
+ float [1.0, 2.0, 3.0]
+>
+
+iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.cast(s, :date)
+#Explorer.Series<
+ Polars[3]
+ date [1970-01-02, 1970-01-03, 1970-01-04]
+>
Note that time is represented as an integer of nanoseconds since midnight.
In Elixir we can't represent nanoseconds, only microseconds. So be aware that
-information can be lost if a conversion is needed (e.g. calling to_list/1).
iex> s = Explorer.Series.from_list([1_000, 2_000, 3_000])
-iex> Explorer.Series.cast(s, :time)
-#Explorer.Series<
- Polars[3]
- time [00:00:00.000001, 00:00:00.000002, 00:00:00.000003]
->
-
-iex> s = Explorer.Series.from_list([86399 * 1_000 * 1_000 * 1_000])
-iex> Explorer.Series.cast(s, :time)
-#Explorer.Series<
- Polars[1]
- time [23:59:59.000000]
->
Note that datetime is represented as an integer of microseconds since Unix Epoch (1970-01-01 00:00:00).
iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.cast(s, :datetime)
-#Explorer.Series<
- Polars[3]
- datetime [1970-01-01 00:00:00.000001, 1970-01-01 00:00:00.000002, 1970-01-01 00:00:00.000003]
->
-
-iex> s = Explorer.Series.from_list([1649883642 * 1_000 * 1_000])
-iex> Explorer.Series.cast(s, :datetime)
-#Explorer.Series<
- Polars[1]
- datetime [2022-04-13 21:00:42.000000]
->
You can also use cast/2 to categorise a string:
iex> s = Explorer.Series.from_list(["apple", "banana", "apple", "lemon"])
-iex> Explorer.Series.cast(s, :category)
-#Explorer.Series<
- Polars[4]
- category ["apple", "banana", "apple", "lemon"]
->
cast/2 will return the series as a no-op if you try to cast to the same dtype.
iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.cast(s, :integer)
-#Explorer.Series<
- Polars[3]
- integer [1, 2, 3]
->
+information can be lost if a conversion is needed (e.g. calling to_list/1).
iex> s = Explorer.Series.from_list([1_000, 2_000, 3_000])
+iex> Explorer.Series.cast(s, :time)
+#Explorer.Series<
+ Polars[3]
+ time [00:00:00.000001, 00:00:00.000002, 00:00:00.000003]
+>
+
+iex> s = Explorer.Series.from_list([86399 * 1_000 * 1_000 * 1_000])
+iex> Explorer.Series.cast(s, :time)
+#Explorer.Series<
+ Polars[1]
+ time [23:59:59.000000]
+>
Note that datetime is represented as an integer of microseconds since Unix Epoch (1970-01-01 00:00:00).
iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.cast(s, :datetime)
+#Explorer.Series<
+ Polars[3]
+ datetime [1970-01-01 00:00:00.000001, 1970-01-01 00:00:00.000002, 1970-01-01 00:00:00.000003]
+>
+
+iex> s = Explorer.Series.from_list([1649883642 * 1_000 * 1_000])
+iex> Explorer.Series.cast(s, :datetime)
+#Explorer.Series<
+ Polars[1]
+ datetime [2022-04-13 21:00:42.000000]
+>
You can also use cast/2 to categorise a string:
iex> s = Explorer.Series.from_list(["apple", "banana", "apple", "lemon"])
+iex> Explorer.Series.cast(s, :category)
+#Explorer.Series<
+ Polars[4]
+ category ["apple", "banana", "apple", "lemon"]
+>
cast/2 will return the series as a no-op if you try to cast to the same dtype.
iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.cast(s, :integer)
+#Explorer.Series<
+ Polars[3]
+ integer [1, 2, 3]
+>
@@ -3591,32 +3591,32 @@ categorise(series, categories)
If a categorical series is given as second argument, we will extract its
-categories and map the integers into it:
iex> categories = Explorer.Series.from_list(["a", "b", "c", nil, "a"], dtype: :category)
-iex> indexes = Explorer.Series.from_list([0, 2, 1, 0, 2])
-iex> Explorer.Series.categorise(indexes, categories)
-#Explorer.Series<
- Polars[5]
- category ["a", "c", "b", "a", "c"]
->
Otherwise, if a list of strings or a series of strings is given, they are
-considered to be the categories series itself:
iex> categories = Explorer.Series.from_list(["a", "b", "c"])
-iex> indexes = Explorer.Series.from_list([0, 2, 1, 0, 2])
-iex> Explorer.Series.categorise(indexes, categories)
-#Explorer.Series<
- Polars[5]
- category ["a", "c", "b", "a", "c"]
->
-
-iex> indexes = Explorer.Series.from_list([0, 2, 1, 0, 2])
-iex> Explorer.Series.categorise(indexes, ["a", "b", "c"])
-#Explorer.Series<
- Polars[5]
- category ["a", "c", "b", "a", "c"]
->
Elements that are not mapped to a category will become nil:
iex> indexes = Explorer.Series.from_list([0, 2, nil, 0, 2, 7])
-iex> Explorer.Series.categorise(indexes, ["a", "b", "c"])
-#Explorer.Series<
- Polars[6]
- category ["a", "c", nil, "a", "c", nil]
->
+categories and map the integers into it:
iex> categories = Explorer.Series.from_list(["a", "b", "c", nil, "a"], dtype: :category)
+iex> indexes = Explorer.Series.from_list([0, 2, 1, 0, 2])
+iex> Explorer.Series.categorise(indexes, categories)
+#Explorer.Series<
+ Polars[5]
+ category ["a", "c", "b", "a", "c"]
+>
Otherwise, if a list of strings or a series of strings is given, they are
+considered to be the categories series itself:
iex> categories = Explorer.Series.from_list(["a", "b", "c"])
+iex> indexes = Explorer.Series.from_list([0, 2, 1, 0, 2])
+iex> Explorer.Series.categorise(indexes, categories)
+#Explorer.Series<
+ Polars[5]
+ category ["a", "c", "b", "a", "c"]
+>
+
+iex> indexes = Explorer.Series.from_list([0, 2, 1, 0, 2])
+iex> Explorer.Series.categorise(indexes, ["a", "b", "c"])
+#Explorer.Series<
+ Polars[5]
+ category ["a", "c", "b", "a", "c"]
+>
Elements that are not mapped to a category will become nil:
iex> indexes = Explorer.Series.from_list([0, 2, nil, 0, 2, 7])
+iex> Explorer.Series.categorise(indexes, ["a", "b", "c"])
+#Explorer.Series<
+ Polars[6]
+ category ["a", "c", nil, "a", "c", nil]
+>
@@ -3657,19 +3657,19 @@ clip(series, min, max)
Examples
-iex> s = Explorer.Series.from_list([-50, 5, nil, 50])
-iex> Explorer.Series.clip(s, 1, 10)
-#Explorer.Series<
- Polars[4]
- integer [1, 5, nil, 10]
->
+iex> s = Explorer.Series.from_list([-50, 5, nil, 50])
+iex> Explorer.Series.clip(s, 1, 10)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 5, nil, 10]
+>
-iex> s = Explorer.Series.from_list([-50, 5, nil, 50])
-iex> Explorer.Series.clip(s, 1.5, 10.5)
-#Explorer.Series<
- Polars[4]
- float [1.5, 5.0, nil, 10.5]
->
+iex> s = Explorer.Series.from_list([-50, 5, nil, 50])
+iex> Explorer.Series.clip(s, 1.5, 10.5)
+#Explorer.Series<
+ Polars[4]
+ float [1.5, 5.0, nil, 10.5]
+>
@@ -3703,14 +3703,14 @@ coalesce(list)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, nil, nil])
-iex> s2 = Explorer.Series.from_list([1, 2, nil, 4])
-iex> s3 = Explorer.Series.from_list([nil, nil, 3, 4])
-iex> Explorer.Series.coalesce([s1, s2, s3])
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 3, 4]
->
+iex> s1 = Explorer.Series.from_list([1, 2, nil, nil])
+iex> s2 = Explorer.Series.from_list([1, 2, nil, 4])
+iex> s3 = Explorer.Series.from_list([nil, nil, 3, 4])
+iex> Explorer.Series.coalesce([s1, s2, s3])
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 3, 4]
+>
@@ -3744,17 +3744,17 @@ coalesce(s1, s2)
Examples
-iex> s1 = Explorer.Series.from_list([1, nil, 3, nil])
-iex> s2 = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.coalesce(s1, s2)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 3, 4]
->
+iex> s1 = Explorer.Series.from_list([1, nil, 3, nil])
+iex> s2 = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.coalesce(s1, s2)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 3, 4]
+>
-iex> s1 = Explorer.Series.from_list(["foo", nil, "bar", nil])
-iex> s2 = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.coalesce(s1, s2)
+iex> s1 = Explorer.Series.from_list(["foo", nil, "bar", nil])
+iex> s2 = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.coalesce(s1, s2)
** (ArgumentError) cannot invoke Explorer.Series.coalesce/2 with mismatched dtypes: :string and :integer
@@ -3797,35 +3797,35 @@ divide(left, right)
Examples
-iex> s1 = [10, 10, 10] |> Explorer.Series.from_list()
-iex> s2 = [2, 2, 2] |> Explorer.Series.from_list()
-iex> Explorer.Series.divide(s1, s2)
-#Explorer.Series<
- Polars[3]
- float [5.0, 5.0, 5.0]
->
-
-iex> s1 = [10, 10, 10] |> Explorer.Series.from_list()
-iex> Explorer.Series.divide(s1, 2)
-#Explorer.Series<
- Polars[3]
- float [5.0, 5.0, 5.0]
->
-
-iex> s1 = [10, 52 ,10] |> Explorer.Series.from_list()
-iex> Explorer.Series.divide(s1, 2.5)
-#Explorer.Series<
- Polars[3]
- float [4.0, 20.8, 4.0]
->
-
-iex> s1 = [10, 10, 10] |> Explorer.Series.from_list()
-iex> s2 = [2, 0, 2] |> Explorer.Series.from_list()
-iex> Explorer.Series.divide(s1, s2)
-#Explorer.Series<
- Polars[3]
- float [5.0, Inf, 5.0]
->
+iex> s1 = [10, 10, 10] |> Explorer.Series.from_list()
+iex> s2 = [2, 2, 2] |> Explorer.Series.from_list()
+iex> Explorer.Series.divide(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ float [5.0, 5.0, 5.0]
+>
+
+iex> s1 = [10, 10, 10] |> Explorer.Series.from_list()
+iex> Explorer.Series.divide(s1, 2)
+#Explorer.Series<
+ Polars[3]
+ float [5.0, 5.0, 5.0]
+>
+
+iex> s1 = [10, 52 ,10] |> Explorer.Series.from_list()
+iex> Explorer.Series.divide(s1, 2.5)
+#Explorer.Series<
+ Polars[3]
+ float [4.0, 20.8, 4.0]
+>
+
+iex> s1 = [10, 10, 10] |> Explorer.Series.from_list()
+iex> s2 = [2, 0, 2] |> Explorer.Series.from_list()
+iex> Explorer.Series.divide(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ float [5.0, Inf, 5.0]
+>
@@ -3866,51 +3866,51 @@ equal(left, right)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([1, 2, 4])
-iex> Explorer.Series.equal(s1, s2)
-#Explorer.Series<
- Polars[3]
- boolean [true, true, false]
->
-
-iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.equal(s, 1)
-#Explorer.Series<
- Polars[3]
- boolean [true, false, false]
->
-
-iex> s = Explorer.Series.from_list([true, true, false])
-iex> Explorer.Series.equal(s, true)
-#Explorer.Series<
- Polars[3]
- boolean [true, true, false]
->
-
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.equal(s, "a")
-#Explorer.Series<
- Polars[3]
- boolean [true, false, false]
->
-
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.equal(s, ~D[1999-12-31])
-#Explorer.Series<
- Polars[2]
- boolean [false, true]
->
-
-iex> s = Explorer.Series.from_list([~N[2022-01-01 00:00:00], ~N[2022-01-01 23:00:00]])
-iex> Explorer.Series.equal(s, ~N[2022-01-01 00:00:00])
-#Explorer.Series<
- Polars[2]
- boolean [true, false]
->
-
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.equal(s, false)
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([1, 2, 4])
+iex> Explorer.Series.equal(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, true, false]
+>
+
+iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.equal(s, 1)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, false, false]
+>
+
+iex> s = Explorer.Series.from_list([true, true, false])
+iex> Explorer.Series.equal(s, true)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, true, false]
+>
+
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.equal(s, "a")
+#Explorer.Series<
+ Polars[3]
+ boolean [true, false, false]
+>
+
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.equal(s, ~D[1999-12-31])
+#Explorer.Series<
+ Polars[2]
+ boolean [false, true]
+>
+
+iex> s = Explorer.Series.from_list([~N[2022-01-01 00:00:00], ~N[2022-01-01 23:00:00]])
+iex> Explorer.Series.equal(s, ~N[2022-01-01 00:00:00])
+#Explorer.Series<
+ Polars[2]
+ boolean [true, false]
+>
+
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.equal(s, false)
** (ArgumentError) cannot invoke Explorer.Series.equal/2 with mismatched dtypes: :string and false
@@ -3984,13 +3984,13 @@ greater(left, right)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([1, 2, 4])
-iex> Explorer.Series.greater(s1, s2)
-#Explorer.Series<
- Polars[3]
- boolean [false, false, false]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([1, 2, 4])
+iex> Explorer.Series.greater(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ boolean [false, false, false]
+>
@@ -4035,13 +4035,13 @@ greater_equal(left, right)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([1, 2, 4])
-iex> Explorer.Series.greater_equal(s1, s2)
-#Explorer.Series<
- Polars[3]
- boolean [true, true, false]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([1, 2, 4])
+iex> Explorer.Series.greater_equal(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, true, false]
+>
@@ -4069,21 +4069,21 @@ left in right
Examples
-iex> left = Explorer.Series.from_list([1, 2, 3])
-iex> right = Explorer.Series.from_list([1, 2])
-iex> Series.in(left, right)
-#Explorer.Series<
- Polars[3]
- boolean [true, true, false]
->
+iex> left = Explorer.Series.from_list([1, 2, 3])
+iex> right = Explorer.Series.from_list([1, 2])
+iex> Series.in(left, right)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, true, false]
+>
-iex> left = Explorer.Series.from_list([~D[1970-01-01], ~D[2000-01-01], ~D[2010-04-17]])
-iex> right = Explorer.Series.from_list([~D[1970-01-01], ~D[2010-04-17]])
-iex> Series.in(left, right)
-#Explorer.Series<
- Polars[3]
- boolean [true, false, true]
->
+iex> left = Explorer.Series.from_list([~D[1970-01-01], ~D[2000-01-01], ~D[2010-04-17]])
+iex> right = Explorer.Series.from_list([~D[1970-01-01], ~D[2010-04-17]])
+iex> Series.in(left, right)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, false, true]
+>
@@ -4117,12 +4117,12 @@ is_nil(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.is_nil(s)
-#Explorer.Series<
- Polars[4]
- boolean [false, false, true, false]
->
+iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.is_nil(s)
+#Explorer.Series<
+ Polars[4]
+ boolean [false, false, true, false]
+>
@@ -4156,12 +4156,12 @@ is_not_nil(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.is_not_nil(s)
-#Explorer.Series<
- Polars[4]
- boolean [true, true, false, true]
->
+iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.is_not_nil(s)
+#Explorer.Series<
+ Polars[4]
+ boolean [true, true, false, true]
+>
@@ -4206,13 +4206,13 @@ less(left, right)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([1, 2, 4])
-iex> Explorer.Series.less(s1, s2)
-#Explorer.Series<
- Polars[3]
- boolean [false, false, true]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([1, 2, 4])
+iex> Explorer.Series.less(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ boolean [false, false, true]
+>
@@ -4257,13 +4257,13 @@ less_equal(left, right)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([1, 2, 4])
-iex> Explorer.Series.less_equal(s1, s2)
-#Explorer.Series<
- Polars[3]
- boolean [true, true, true]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([1, 2, 4])
+iex> Explorer.Series.less_equal(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, true, true]
+>
@@ -4304,12 +4304,12 @@ log(s)
Examples
-iex> s = Explorer.Series.from_list([1, 2, 3, nil, 4])
-iex> Explorer.Series.log(s)
-#Explorer.Series<
- Polars[5]
- float [0.0, 0.6931471805599453, 1.0986122886681098, nil, 1.3862943611198906]
->
+iex> s = Explorer.Series.from_list([1, 2, 3, nil, 4])
+iex> Explorer.Series.log(s)
+#Explorer.Series<
+ Polars[5]
+ float [0.0, 0.6931471805599453, 1.0986122886681098, nil, 1.3862943611198906]
+>
@@ -4349,12 +4349,12 @@ log(argument, base)
Examples
-iex> s = Explorer.Series.from_list([8, 16, 32])
-iex> Explorer.Series.log(s, 2)
-#Explorer.Series<
- Polars[3]
- float [3.0, 4.0, 5.0]
->
+iex> s = Explorer.Series.from_list([8, 16, 32])
+iex> Explorer.Series.log(s, 2)
+#Explorer.Series<
+ Polars[3]
+ float [3.0, 4.0, 5.0]
+>
@@ -4388,13 +4388,13 @@ mask(series, mask)
Examples
-iex> s1 = Explorer.Series.from_list([1,2,3])
-iex> s2 = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.mask(s1, s2)
-#Explorer.Series<
- Polars[2]
- integer [1, 3]
->
+iex> s1 = Explorer.Series.from_list([1,2,3])
+iex> s2 = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.mask(s1, s2)
+#Explorer.Series<
+ Polars[2]
+ integer [1, 3]
+>
@@ -4436,20 +4436,20 @@ multiply(left, right)
Examples
-iex> s1 = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> s2 = 11..20 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.multiply(s1, s2)
-#Explorer.Series<
- Polars[10]
- integer [11, 24, 39, 56, 75, 96, 119, 144, 171, 200]
->
+iex> s1 = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> s2 = 11..20 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.multiply(s1, s2)
+#Explorer.Series<
+ Polars[10]
+ integer [11, 24, 39, 56, 75, 96, 119, 144, 171, 200]
+>
-iex> s1 = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.multiply(s1, 2)
-#Explorer.Series<
- Polars[5]
- integer [2, 4, 6, 8, 10]
->
+iex> s1 = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.multiply(s1, 2)
+#Explorer.Series<
+ Polars[5]
+ integer [2, 4, 6, 8, 10]
+>
@@ -4477,12 +4477,12 @@ not series
Examples
-iex> s1 = Explorer.Series.from_list([true, false, false])
-iex> Explorer.Series.not(s1)
-#Explorer.Series<
- Polars[3]
- boolean [false, true, true]
->
+iex> s1 = Explorer.Series.from_list([true, false, false])
+iex> Explorer.Series.not(s1)
+#Explorer.Series<
+ Polars[3]
+ boolean [false, true, true]
+>
@@ -4523,51 +4523,51 @@ not_equal(left, right)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([1, 2, 4])
-iex> Explorer.Series.not_equal(s1, s2)
-#Explorer.Series<
- Polars[3]
- boolean [false, false, true]
->
-
-iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.not_equal(s, 1)
-#Explorer.Series<
- Polars[3]
- boolean [false, true, true]
->
-
-iex> s = Explorer.Series.from_list([true, true, false])
-iex> Explorer.Series.not_equal(s, true)
-#Explorer.Series<
- Polars[3]
- boolean [false, false, true]
->
-
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.not_equal(s, "a")
-#Explorer.Series<
- Polars[3]
- boolean [false, true, true]
->
-
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.not_equal(s, ~D[1999-12-31])
-#Explorer.Series<
- Polars[2]
- boolean [true, false]
->
-
-iex> s = Explorer.Series.from_list([~N[2022-01-01 00:00:00], ~N[2022-01-01 23:00:00]])
-iex> Explorer.Series.not_equal(s, ~N[2022-01-01 00:00:00])
-#Explorer.Series<
- Polars[2]
- boolean [false, true]
->
-
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.not_equal(s, false)
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([1, 2, 4])
+iex> Explorer.Series.not_equal(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ boolean [false, false, true]
+>
+
+iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.not_equal(s, 1)
+#Explorer.Series<
+ Polars[3]
+ boolean [false, true, true]
+>
+
+iex> s = Explorer.Series.from_list([true, true, false])
+iex> Explorer.Series.not_equal(s, true)
+#Explorer.Series<
+ Polars[3]
+ boolean [false, false, true]
+>
+
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.not_equal(s, "a")
+#Explorer.Series<
+ Polars[3]
+ boolean [false, true, true]
+>
+
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.not_equal(s, ~D[1999-12-31])
+#Explorer.Series<
+ Polars[2]
+ boolean [true, false]
+>
+
+iex> s = Explorer.Series.from_list([~N[2022-01-01 00:00:00], ~N[2022-01-01 23:00:00]])
+iex> Explorer.Series.not_equal(s, ~N[2022-01-01 00:00:00])
+#Explorer.Series<
+ Polars[2]
+ boolean [false, true]
+>
+
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.not_equal(s, false)
** (ArgumentError) cannot invoke Explorer.Series.not_equal/2 with mismatched dtypes: :string and false
@@ -4597,14 +4597,14 @@ left or right
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> mask1 = Explorer.Series.less(s1, 2)
-iex> mask2 = Explorer.Series.greater(s1, 2)
-iex> Explorer.Series.or(mask1, mask2)
-#Explorer.Series<
- Polars[3]
- boolean [true, false, true]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> mask1 = Explorer.Series.less(s1, 2)
+iex> mask2 = Explorer.Series.greater(s1, 2)
+iex> Explorer.Series.or(mask1, mask2)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, false, true]
+>
@@ -4646,19 +4646,19 @@ peaks(series, max_or_min \\ :max)
Examples
-iex> s = Explorer.Series.from_list([1, 2, 4, 1, 4])
-iex> Explorer.Series.peaks(s)
-#Explorer.Series<
- Polars[5]
- boolean [false, false, true, false, true]
->
+iex> s = Explorer.Series.from_list([1, 2, 4, 1, 4])
+iex> Explorer.Series.peaks(s)
+#Explorer.Series<
+ Polars[5]
+ boolean [false, false, true, false, true]
+>
-iex> s = [~T[03:00:02.000000], ~T[13:24:56.000000], ~T[02:04:19.000000]] |> Explorer.Series.from_list()
-iex> Explorer.Series.peaks(s)
-#Explorer.Series<
- Polars[3]
- boolean [false, true, false]
->
+iex> s = [~T[03:00:02.000000], ~T[13:24:56.000000], ~T[02:04:19.000000]] |> Explorer.Series.from_list()
+iex> Explorer.Series.peaks(s)
+#Explorer.Series<
+ Polars[3]
+ boolean [false, true, false]
+>
@@ -4700,40 +4700,40 @@ pow(left, right)
Examples
-iex> s = [8, 16, 32] |> Explorer.Series.from_list()
-iex> Explorer.Series.pow(s, 2.0)
-#Explorer.Series<
- Polars[3]
- float [64.0, 256.0, 1024.0]
->
-
-iex> s = [2, 4, 6] |> Explorer.Series.from_list()
-iex> Explorer.Series.pow(s, 3)
-#Explorer.Series<
- Polars[3]
- integer [8, 64, 216]
->
-
-iex> s = [2, 4, 6] |> Explorer.Series.from_list()
-iex> Explorer.Series.pow(s, -3.0)
-#Explorer.Series<
- Polars[3]
- float [0.125, 0.015625, 0.004629629629629629]
->
-
-iex> s = [1.0, 2.0, 3.0] |> Explorer.Series.from_list()
-iex> Explorer.Series.pow(s, 3.0)
-#Explorer.Series<
- Polars[3]
- float [1.0, 8.0, 27.0]
->
-
-iex> s = [2.0, 4.0, 6.0] |> Explorer.Series.from_list()
-iex> Explorer.Series.pow(s, 2)
-#Explorer.Series<
- Polars[3]
- float [4.0, 16.0, 36.0]
->
+iex> s = [8, 16, 32] |> Explorer.Series.from_list()
+iex> Explorer.Series.pow(s, 2.0)
+#Explorer.Series<
+ Polars[3]
+ float [64.0, 256.0, 1024.0]
+>
+
+iex> s = [2, 4, 6] |> Explorer.Series.from_list()
+iex> Explorer.Series.pow(s, 3)
+#Explorer.Series<
+ Polars[3]
+ integer [8, 64, 216]
+>
+
+iex> s = [2, 4, 6] |> Explorer.Series.from_list()
+iex> Explorer.Series.pow(s, -3.0)
+#Explorer.Series<
+ Polars[3]
+ float [0.125, 0.015625, 0.004629629629629629]
+>
+
+iex> s = [1.0, 2.0, 3.0] |> Explorer.Series.from_list()
+iex> Explorer.Series.pow(s, 3.0)
+#Explorer.Series<
+ Polars[3]
+ float [1.0, 8.0, 27.0]
+>
+
+iex> s = [2.0, 4.0, 6.0] |> Explorer.Series.from_list()
+iex> Explorer.Series.pow(s, 2)
+#Explorer.Series<
+ Polars[3]
+ float [4.0, 16.0, 36.0]
+>
@@ -4775,28 +4775,28 @@ quotient(left, right)
Examples
-iex> s1 = [10, 11, 10] |> Explorer.Series.from_list()
-iex> s2 = [2, 2, 2] |> Explorer.Series.from_list()
-iex> Explorer.Series.quotient(s1, s2)
-#Explorer.Series<
- Polars[3]
- integer [5, 5, 5]
->
+iex> s1 = [10, 11, 10] |> Explorer.Series.from_list()
+iex> s2 = [2, 2, 2] |> Explorer.Series.from_list()
+iex> Explorer.Series.quotient(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ integer [5, 5, 5]
+>
-iex> s1 = [10, 11, 10] |> Explorer.Series.from_list()
-iex> s2 = [2, 2, 0] |> Explorer.Series.from_list()
-iex> Explorer.Series.quotient(s1, s2)
-#Explorer.Series<
- Polars[3]
- integer [5, 5, nil]
->
+iex> s1 = [10, 11, 10] |> Explorer.Series.from_list()
+iex> s2 = [2, 2, 0] |> Explorer.Series.from_list()
+iex> Explorer.Series.quotient(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ integer [5, 5, nil]
+>
-iex> s1 = [10, 12, 15] |> Explorer.Series.from_list()
-iex> Explorer.Series.quotient(s1, 3)
-#Explorer.Series<
- Polars[3]
- integer [3, 4, 5]
->
+iex> s1 = [10, 12, 15] |> Explorer.Series.from_list()
+iex> Explorer.Series.quotient(s1, 3)
+#Explorer.Series<
+ Polars[3]
+ integer [3, 4, 5]
+>
@@ -4838,48 +4838,48 @@ rank(series, opts \\ [])
Examples
-iex> s = Explorer.Series.from_list([3, 6, 1, 1, 6])
-iex> Explorer.Series.rank(s)
-#Explorer.Series<
- Polars[5]
- float [3.0, 4.5, 1.5, 1.5, 4.5]
->
-
-iex> s = Explorer.Series.from_list([1.1, 2.4, 3.2])
-iex> Explorer.Series.rank(s, method: "ordinal")
-#Explorer.Series<
- Polars[3]
- integer [1, 2, 3]
->
-
-iex> s = Explorer.Series.from_list([ ~N[2022-07-07 17:44:13.020548], ~N[2022-07-07 17:43:08.473561], ~N[2022-07-07 17:45:00.116337] ])
-iex> Explorer.Series.rank(s, method: "average")
-#Explorer.Series<
- Polars[3]
- float [2.0, 1.0, 3.0]
->
-
-iex> s = Explorer.Series.from_list([3, 6, 1, 1, 6])
-iex> Explorer.Series.rank(s, method: "min")
-#Explorer.Series<
- Polars[5]
- integer [3, 4, 1, 1, 4]
->
-
-iex> s = Explorer.Series.from_list([3, 6, 1, 1, 6])
-iex> Explorer.Series.rank(s, method: "dense")
-#Explorer.Series<
- Polars[5]
- integer [2, 3, 1, 1, 3]
->
-
-
-iex> s = Explorer.Series.from_list([3, 6, 1, 1, 6])
-iex> Explorer.Series.rank(s, method: "random", seed: 42)
-#Explorer.Series<
- Polars[5]
- integer [3, 4, 2, 1, 5]
->
+iex> s = Explorer.Series.from_list([3, 6, 1, 1, 6])
+iex> Explorer.Series.rank(s)
+#Explorer.Series<
+ Polars[5]
+ float [3.0, 4.5, 1.5, 1.5, 4.5]
+>
+
+iex> s = Explorer.Series.from_list([1.1, 2.4, 3.2])
+iex> Explorer.Series.rank(s, method: "ordinal")
+#Explorer.Series<
+ Polars[3]
+ integer [1, 2, 3]
+>
+
+iex> s = Explorer.Series.from_list([ ~N[2022-07-07 17:44:13.020548], ~N[2022-07-07 17:43:08.473561], ~N[2022-07-07 17:45:00.116337] ])
+iex> Explorer.Series.rank(s, method: "average")
+#Explorer.Series<
+ Polars[3]
+ float [2.0, 1.0, 3.0]
+>
+
+iex> s = Explorer.Series.from_list([3, 6, 1, 1, 6])
+iex> Explorer.Series.rank(s, method: "min")
+#Explorer.Series<
+ Polars[5]
+ integer [3, 4, 1, 1, 4]
+>
+
+iex> s = Explorer.Series.from_list([3, 6, 1, 1, 6])
+iex> Explorer.Series.rank(s, method: "dense")
+#Explorer.Series<
+ Polars[5]
+ integer [2, 3, 1, 1, 3]
+>
+
+
+iex> s = Explorer.Series.from_list([3, 6, 1, 1, 6])
+iex> Explorer.Series.rank(s, method: "random", seed: 42)
+#Explorer.Series<
+ Polars[5]
+ integer [3, 4, 2, 1, 5]
+>
@@ -4921,28 +4921,28 @@ remainder(left, right)
Examples
-iex> s1 = [10, 11, 10] |> Explorer.Series.from_list()
-iex> s2 = [2, 2, 2] |> Explorer.Series.from_list()
-iex> Explorer.Series.remainder(s1, s2)
-#Explorer.Series<
- Polars[3]
- integer [0, 1, 0]
->
+iex> s1 = [10, 11, 10] |> Explorer.Series.from_list()
+iex> s2 = [2, 2, 2] |> Explorer.Series.from_list()
+iex> Explorer.Series.remainder(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ integer [0, 1, 0]
+>
-iex> s1 = [10, 11, 10] |> Explorer.Series.from_list()
-iex> s2 = [2, 2, 0] |> Explorer.Series.from_list()
-iex> Explorer.Series.remainder(s1, s2)
-#Explorer.Series<
- Polars[3]
- integer [0, 1, nil]
->
+iex> s1 = [10, 11, 10] |> Explorer.Series.from_list()
+iex> s2 = [2, 2, 0] |> Explorer.Series.from_list()
+iex> Explorer.Series.remainder(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ integer [0, 1, nil]
+>
-iex> s1 = [10, 11, 9] |> Explorer.Series.from_list()
-iex> Explorer.Series.remainder(s1, 3)
-#Explorer.Series<
- Polars[3]
- integer [1, 2, 0]
->
+iex> s1 = [10, 11, 9] |> Explorer.Series.from_list()
+iex> Explorer.Series.remainder(s1, 3)
+#Explorer.Series<
+ Polars[3]
+ integer [1, 2, 0]
+>
@@ -5012,12 +5012,12 @@ strftime(series, format_string)
Examples
-iex> s = Explorer.Series.from_list([~N[2023-01-05 12:34:56], nil])
-iex> Explorer.Series.strftime(s, "%Y/%m/%d %H:%M:%S")
-#Explorer.Series<
- Polars[2]
- string ["2023/01/05 12:34:56", nil]
->
+iex> s = Explorer.Series.from_list([~N[2023-01-05 12:34:56], nil])
+iex> Explorer.Series.strftime(s, "%Y/%m/%d %H:%M:%S")
+#Explorer.Series<
+ Polars[2]
+ string ["2023/01/05 12:34:56", nil]
+>
@@ -5052,12 +5052,12 @@ strptime(series, format_string)
Examples
-iex> s = Explorer.Series.from_list(["2023-01-05 12:34:56", "XYZ", nil])
-iex> Explorer.Series.strptime(s, "%Y-%m-%d %H:%M:%S")
-#Explorer.Series<
- Polars[3]
- datetime [2023-01-05 12:34:56.000000, nil, nil]
->
+iex> s = Explorer.Series.from_list(["2023-01-05 12:34:56", "XYZ", nil])
+iex> Explorer.Series.strptime(s, "%Y-%m-%d %H:%M:%S")
+#Explorer.Series<
+ Polars[3]
+ datetime [2023-01-05 12:34:56.000000, nil, nil]
+>
@@ -5099,25 +5099,25 @@ subtract(left, right)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([4, 5, 6])
-iex> Explorer.Series.subtract(s1, s2)
-#Explorer.Series<
- Polars[3]
- integer [-3, -3, -3]
->
You can also use scalar values on both sides:
iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.subtract(s1, 2)
-#Explorer.Series<
- Polars[3]
- integer [-1, 0, 1]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([4, 5, 6])
+iex> Explorer.Series.subtract(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ integer [-3, -3, -3]
+>
You can also use scalar values on both sides:
iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.subtract(s1, 2)
+#Explorer.Series<
+ Polars[3]
+ integer [-1, 0, 1]
+>
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.subtract(2, s1)
-#Explorer.Series<
- Polars[3]
- integer [1, 0, -1]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.subtract(2, s1)
+#Explorer.Series<
+ Polars[3]
+ integer [1, 0, -1]
+>
@@ -5149,19 +5149,19 @@ transform(series, fun)
Examples
-iex> s = Explorer.Series.from_list(["this ", " is", "great "])
-iex> Explorer.Series.transform(s, &String.trim/1)
-#Explorer.Series<
- Polars[3]
- string ["this", "is", "great"]
->
+iex> s = Explorer.Series.from_list(["this ", " is", "great "])
+iex> Explorer.Series.transform(s, &String.trim/1)
+#Explorer.Series<
+ Polars[3]
+ string ["this", "is", "great"]
+>
-iex> s = Explorer.Series.from_list(["this", "is", "great"])
-iex> Explorer.Series.transform(s, &String.length/1)
-#Explorer.Series<
- Polars[3]
- integer [4, 2, 5]
->
+iex> s = Explorer.Series.from_list(["this", "is", "great"])
+iex> Explorer.Series.transform(s, &String.length/1)
+#Explorer.Series<
+ Polars[3]
+ integer [4, 2, 5]
+>
@@ -5207,17 +5207,17 @@ day_of_week(series)
Examples
-iex> s = Explorer.Series.from_list([~D[2023-01-15], ~D[2023-01-16], ~D[2023-01-20], nil])
-iex> Explorer.Series.day_of_week(s)
-#Explorer.Series<
- Polars[4]
- integer [7, 1, 5, nil]
->
It can also be called on a datetime series.
iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2023-01-16 23:59:59.999999], ~N[2023-01-20 12:00:00], nil])
-iex> Explorer.Series.day_of_week(s)
-#Explorer.Series<
- Polars[4]
- integer [7, 1, 5, nil]
->
+iex> s = Explorer.Series.from_list([~D[2023-01-15], ~D[2023-01-16], ~D[2023-01-20], nil])
+iex> Explorer.Series.day_of_week(s)
+#Explorer.Series<
+ Polars[4]
+ integer [7, 1, 5, nil]
+>
It can also be called on a datetime series.
iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2023-01-16 23:59:59.999999], ~N[2023-01-20 12:00:00], nil])
+iex> Explorer.Series.day_of_week(s)
+#Explorer.Series<
+ Polars[4]
+ integer [7, 1, 5, nil]
+>
@@ -5251,12 +5251,12 @@ hour(series)
Examples
-iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2022-02-16 23:59:59.999999], ~N[2021-03-20 12:00:00], nil])
-iex> Explorer.Series.hour(s)
-#Explorer.Series<
- Polars[4]
- integer [0, 23, 12, nil]
->
+iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2022-02-16 23:59:59.999999], ~N[2021-03-20 12:00:00], nil])
+iex> Explorer.Series.hour(s)
+#Explorer.Series<
+ Polars[4]
+ integer [0, 23, 12, nil]
+>
@@ -5290,12 +5290,12 @@ minute(series)
Examples
-iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2022-02-16 23:59:59.999999], ~N[2021-03-20 12:00:00], nil])
-iex> Explorer.Series.minute(s)
-#Explorer.Series<
- Polars[4]
- integer [0, 59, 0, nil]
->
+iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2022-02-16 23:59:59.999999], ~N[2021-03-20 12:00:00], nil])
+iex> Explorer.Series.minute(s)
+#Explorer.Series<
+ Polars[4]
+ integer [0, 59, 0, nil]
+>
@@ -5329,17 +5329,17 @@ month(series)
Examples
-iex> s = Explorer.Series.from_list([~D[2023-01-15], ~D[2023-02-16], ~D[2023-03-20], nil])
-iex> Explorer.Series.month(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 3, nil]
->
It can also be called on a datetime series.
iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2023-02-16 23:59:59.999999], ~N[2023-03-20 12:00:00], nil])
-iex> Explorer.Series.month(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 3, nil]
->
+iex> s = Explorer.Series.from_list([~D[2023-01-15], ~D[2023-02-16], ~D[2023-03-20], nil])
+iex> Explorer.Series.month(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 3, nil]
+>
It can also be called on a datetime series.
iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2023-02-16 23:59:59.999999], ~N[2023-03-20 12:00:00], nil])
+iex> Explorer.Series.month(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 3, nil]
+>
@@ -5373,12 +5373,12 @@ second(series)
Examples
-iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2022-02-16 23:59:59.999999], ~N[2021-03-20 12:00:00], nil])
-iex> Explorer.Series.second(s)
-#Explorer.Series<
- Polars[4]
- integer [0, 59, 0, nil]
->
+iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2022-02-16 23:59:59.999999], ~N[2021-03-20 12:00:00], nil])
+iex> Explorer.Series.second(s)
+#Explorer.Series<
+ Polars[4]
+ integer [0, 59, 0, nil]
+>
@@ -5412,17 +5412,17 @@ year(series)
Examples
-iex> s = Explorer.Series.from_list([~D[2023-01-15], ~D[2022-02-16], ~D[2021-03-20], nil])
-iex> Explorer.Series.year(s)
-#Explorer.Series<
- Polars[4]
- integer [2023, 2022, 2021, nil]
->
It can also be called on a datetime series.
iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2022-02-16 23:59:59.999999], ~N[2021-03-20 12:00:00], nil])
-iex> Explorer.Series.year(s)
-#Explorer.Series<
- Polars[4]
- integer [2023, 2022, 2021, nil]
->
+iex> s = Explorer.Series.from_list([~D[2023-01-15], ~D[2022-02-16], ~D[2021-03-20], nil])
+iex> Explorer.Series.year(s)
+#Explorer.Series<
+ Polars[4]
+ integer [2023, 2022, 2021, nil]
+>
It can also be called on a datetime series.
iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2022-02-16 23:59:59.999999], ~N[2021-03-20 12:00:00], nil])
+iex> Explorer.Series.year(s)
+#Explorer.Series<
+ Polars[4]
+ integer [2023, 2022, 2021, nil]
+>
@@ -5475,12 +5475,12 @@ acos(series)
Examples
-iex> s = [1.0, 0.0, -1.0, -0.7071067811865475, 0.7071067811865475] |> Explorer.Series.from_list()
-iex> Explorer.Series.acos(s)
-#Explorer.Series<
- Polars[5]
- float [0.0, 1.5707963267948966, 3.141592653589793, 2.356194490192345, 0.7853981633974484]
->
+iex> s = [1.0, 0.0, -1.0, -0.7071067811865475, 0.7071067811865475] |> Explorer.Series.from_list()
+iex> Explorer.Series.acos(s)
+#Explorer.Series<
+ Polars[5]
+ float [0.0, 1.5707963267948966, 3.141592653589793, 2.356194490192345, 0.7853981633974484]
+>
@@ -5521,12 +5521,12 @@ asin(series)
Examples
-iex> s = [1.0, 0.0, -1.0, -0.7071067811865475, 0.7071067811865475] |> Explorer.Series.from_list()
-iex> Explorer.Series.asin(s)
-#Explorer.Series<
- Polars[5]
- float [1.5707963267948966, 0.0, -1.5707963267948966, -0.7853981633974482, 0.7853981633974482]
->
+iex> s = [1.0, 0.0, -1.0, -0.7071067811865475, 0.7071067811865475] |> Explorer.Series.from_list()
+iex> Explorer.Series.asin(s)
+#Explorer.Series<
+ Polars[5]
+ float [1.5707963267948966, 0.0, -1.5707963267948966, -0.7853981633974482, 0.7853981633974482]
+>
@@ -5567,12 +5567,12 @@ atan(series)
Examples
-iex> s = [1.0, 0.0, -1.0, -0.7071067811865475, 0.7071067811865475] |> Explorer.Series.from_list()
-iex> Explorer.Series.atan(s)
-#Explorer.Series<
- Polars[5]
- float [0.7853981633974483, 0.0, -0.7853981633974483, -0.6154797086703873, 0.6154797086703873]
->
+iex> s = [1.0, 0.0, -1.0, -0.7071067811865475, 0.7071067811865475] |> Explorer.Series.from_list()
+iex> Explorer.Series.atan(s)
+#Explorer.Series<
+ Polars[5]
+ float [0.7853981633974483, 0.0, -0.7853981633974483, -0.6154797086703873, 0.6154797086703873]
+>
@@ -5606,12 +5606,12 @@ ceil(series)
Examples
-iex> s = Explorer.Series.from_list([1.124993, 2.555321, 3.995001])
-iex> Explorer.Series.ceil(s)
-#Explorer.Series<
- Polars[3]
- float [2.0, 3.0, 4.0]
->
+iex> s = Explorer.Series.from_list([1.124993, 2.555321, 3.995001])
+iex> Explorer.Series.ceil(s)
+#Explorer.Series<
+ Polars[3]
+ float [2.0, 3.0, 4.0]
+>
@@ -5652,13 +5652,13 @@ cos(series)
Examples
-iex> pi = :math.pi()
-iex> s = [-pi * 3/2, -pi, -pi / 2, -pi / 4, 0, pi / 4, pi / 2, pi, pi * 3/2] |> Explorer.Series.from_list()
-iex> Explorer.Series.cos(s)
-#Explorer.Series<
- Polars[9]
- float [-1.8369701987210297e-16, -1.0, 6.123233995736766e-17, 0.7071067811865476, 1.0, 0.7071067811865476, 6.123233995736766e-17, -1.0, -1.8369701987210297e-16]
->
+iex> pi = :math.pi()
+iex> s = [-pi * 3/2, -pi, -pi / 2, -pi / 4, 0, pi / 4, pi / 2, pi, pi * 3/2] |> Explorer.Series.from_list()
+iex> Explorer.Series.cos(s)
+#Explorer.Series<
+ Polars[9]
+ float [-1.8369701987210297e-16, -1.0, 6.123233995736766e-17, 0.7071067811865476, 1.0, 0.7071067811865476, 6.123233995736766e-17, -1.0, -1.8369701987210297e-16]
+>
@@ -5692,12 +5692,12 @@ floor(series)
Examples
-iex> s = Explorer.Series.from_list([1.124993, 2.555321, 3.995001])
-iex> Explorer.Series.floor(s)
-#Explorer.Series<
- Polars[3]
- float [1.0, 2.0, 3.0]
->
+iex> s = Explorer.Series.from_list([1.124993, 2.555321, 3.995001])
+iex> Explorer.Series.floor(s)
+#Explorer.Series<
+ Polars[3]
+ float [1.0, 2.0, 3.0]
+>
@@ -5731,14 +5731,14 @@ is_finite(series)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 0, nil])
-iex> s2 = Explorer.Series.from_list([0, 2, 0, nil])
-iex> s3 = Explorer.Series.divide(s1, s2)
-iex> Explorer.Series.is_finite(s3)
-#Explorer.Series<
- Polars[4]
- boolean [false, true, false, nil]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 0, nil])
+iex> s2 = Explorer.Series.from_list([0, 2, 0, nil])
+iex> s3 = Explorer.Series.divide(s1, s2)
+iex> Explorer.Series.is_finite(s3)
+#Explorer.Series<
+ Polars[4]
+ boolean [false, true, false, nil]
+>
@@ -5772,14 +5772,14 @@ is_infinite(series)
Examples
-iex> s1 = Explorer.Series.from_list([1, -1, 2, 0, nil])
-iex> s2 = Explorer.Series.from_list([0, 0, 2, 0, nil])
-iex> s3 = Explorer.Series.divide(s1, s2)
-iex> Explorer.Series.is_infinite(s3)
-#Explorer.Series<
- Polars[5]
- boolean [true, true, false, false, nil]
->
+iex> s1 = Explorer.Series.from_list([1, -1, 2, 0, nil])
+iex> s2 = Explorer.Series.from_list([0, 0, 2, 0, nil])
+iex> s3 = Explorer.Series.divide(s1, s2)
+iex> Explorer.Series.is_infinite(s3)
+#Explorer.Series<
+ Polars[5]
+ boolean [true, true, false, false, nil]
+>
@@ -5813,14 +5813,14 @@ is_nan(series)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 0, nil])
-iex> s2 = Explorer.Series.from_list([0, 2, 0, nil])
-iex> s3 = Explorer.Series.divide(s1, s2)
-iex> Explorer.Series.is_nan(s3)
-#Explorer.Series<
- Polars[4]
- boolean [false, false, true, nil]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 0, nil])
+iex> s2 = Explorer.Series.from_list([0, 2, 0, nil])
+iex> s3 = Explorer.Series.divide(s1, s2)
+iex> Explorer.Series.is_nan(s3)
+#Explorer.Series<
+ Polars[4]
+ boolean [false, false, true, nil]
+>
@@ -5854,12 +5854,12 @@ round(series, decimals)
Examples
-iex> s = Explorer.Series.from_list([1.124993, 2.555321, 3.995001])
-iex> Explorer.Series.round(s, 2)
-#Explorer.Series<
- Polars[3]
- float [1.12, 2.56, 4.0]
->
+iex> s = Explorer.Series.from_list([1.124993, 2.555321, 3.995001])
+iex> Explorer.Series.round(s, 2)
+#Explorer.Series<
+ Polars[3]
+ float [1.12, 2.56, 4.0]
+>
@@ -5900,13 +5900,13 @@ sin(series)
Examples
-iex> pi = :math.pi()
-iex> s = [-pi * 3/2, -pi, -pi / 2, -pi / 4, 0, pi / 4, pi / 2, pi, pi * 3/2] |> Explorer.Series.from_list()
-iex> Explorer.Series.sin(s)
-#Explorer.Series<
- Polars[9]
- float [1.0, -1.2246467991473532e-16, -1.0, -0.7071067811865475, 0.0, 0.7071067811865475, 1.0, 1.2246467991473532e-16, -1.0]
->
+iex> pi = :math.pi()
+iex> s = [-pi * 3/2, -pi, -pi / 2, -pi / 4, 0, pi / 4, pi / 2, pi, pi * 3/2] |> Explorer.Series.from_list()
+iex> Explorer.Series.sin(s)
+#Explorer.Series<
+ Polars[9]
+ float [1.0, -1.2246467991473532e-16, -1.0, -0.7071067811865475, 0.0, 0.7071067811865475, 1.0, 1.2246467991473532e-16, -1.0]
+>
@@ -5947,13 +5947,13 @@ tan(series)
Examples
-iex> pi = :math.pi()
-iex> s = [-pi * 3/2, -pi, -pi / 2, -pi / 4, 0, pi / 4, pi / 2, pi, pi * 3/2] |> Explorer.Series.from_list()
-iex> Explorer.Series.tan(s)
-#Explorer.Series<
- Polars[9]
- float [-5443746451065123.0, 1.2246467991473532e-16, -1.633123935319537e16, -0.9999999999999999, 0.0, 0.9999999999999999, 1.633123935319537e16, -1.2246467991473532e-16, 5443746451065123.0]
->
+iex> pi = :math.pi()
+iex> s = [-pi * 3/2, -pi, -pi / 2, -pi / 4, 0, pi / 4, pi / 2, pi, pi * 3/2] |> Explorer.Series.from_list()
+iex> Explorer.Series.tan(s)
+#Explorer.Series<
+ Polars[9]
+ float [-5443746451065123.0, 1.2246467991473532e-16, -1.633123935319537e16, -0.9999999999999999, 0.0, 0.9999999999999999, 1.633123935319537e16, -1.2246467991473532e-16, 5443746451065123.0]
+>
@@ -5999,12 +5999,12 @@ contains(series, pattern)
Examples
-iex> s = Explorer.Series.from_list(["abc", "def", "bcd"])
-iex> Explorer.Series.contains(s, "bc")
-#Explorer.Series<
- Polars[3]
- boolean [true, false, true]
->
+iex> s = Explorer.Series.from_list(["abc", "def", "bcd"])
+iex> Explorer.Series.contains(s, "bc")
+#Explorer.Series<
+ Polars[3]
+ boolean [true, false, true]
+>
@@ -6038,12 +6038,12 @@ downcase(series)
Examples
-iex> s = Explorer.Series.from_list(["ABC", "DEF", "BCD"])
-iex> Explorer.Series.downcase(s)
-#Explorer.Series<
- Polars[3]
- string ["abc", "def", "bcd"]
->
+iex> s = Explorer.Series.from_list(["ABC", "DEF", "BCD"])
+iex> Explorer.Series.downcase(s)
+#Explorer.Series<
+ Polars[3]
+ string ["abc", "def", "bcd"]
+>
@@ -6077,12 +6077,12 @@ trim(series)
Examples
-iex> s = Explorer.Series.from_list([" abc", "def ", " bcd "])
-iex> Explorer.Series.trim(s)
-#Explorer.Series<
- Polars[3]
- string ["abc", "def", "bcd"]
->
+iex> s = Explorer.Series.from_list([" abc", "def ", " bcd "])
+iex> Explorer.Series.trim(s)
+#Explorer.Series<
+ Polars[3]
+ string ["abc", "def", "bcd"]
+>
@@ -6117,19 +6117,19 @@ trim(series, string)
Examples
-iex> s = Explorer.Series.from_list(["abc", "adefa", "bcda"])
-iex> Explorer.Series.trim(s, "a")
-#Explorer.Series<
- Polars[3]
- string ["bc", "def", "bcd"]
->
+iex> s = Explorer.Series.from_list(["abc", "adefa", "bcda"])
+iex> Explorer.Series.trim(s, "a")
+#Explorer.Series<
+ Polars[3]
+ string ["bc", "def", "bcd"]
+>
-iex> s = Explorer.Series.from_list(["£123", "1.00£", "£1.00£"])
-iex> Explorer.Series.trim(s, "£")
-#Explorer.Series<
- Polars[3]
- string ["123", "1.00", "1.00"]
->
+iex> s = Explorer.Series.from_list(["£123", "1.00£", "£1.00£"])
+iex> Explorer.Series.trim(s, "£")
+#Explorer.Series<
+ Polars[3]
+ string ["123", "1.00", "1.00"]
+>
@@ -6163,12 +6163,12 @@ trim_leading(series)
Examples
-iex> s = Explorer.Series.from_list([" abc", "def ", " bcd"])
-iex> Explorer.Series.trim_leading(s)
-#Explorer.Series<
- Polars[3]
- string ["abc", "def ", "bcd"]
->
+iex> s = Explorer.Series.from_list([" abc", "def ", " bcd"])
+iex> Explorer.Series.trim_leading(s)
+#Explorer.Series<
+ Polars[3]
+ string ["abc", "def ", "bcd"]
+>
@@ -6202,12 +6202,12 @@ trim_trailing(series)
Examples
-iex> s = Explorer.Series.from_list([" abc", "def ", " bcd"])
-iex> Explorer.Series.trim_trailing(s)
-#Explorer.Series<
- Polars[3]
- string [" abc", "def", " bcd"]
->
+iex> s = Explorer.Series.from_list([" abc", "def ", " bcd"])
+iex> Explorer.Series.trim_trailing(s)
+#Explorer.Series<
+ Polars[3]
+ string [" abc", "def", " bcd"]
+>
@@ -6241,12 +6241,12 @@ upcase(series)
Examples
-iex> s = Explorer.Series.from_list(["abc", "def", "bcd"])
-iex> Explorer.Series.upcase(s)
-#Explorer.Series<
- Polars[3]
- string ["ABC", "DEF", "BCD"]
->
+iex> s = Explorer.Series.from_list(["abc", "def", "bcd"])
+iex> Explorer.Series.upcase(s)
+#Explorer.Series<
+ Polars[3]
+ string ["ABC", "DEF", "BCD"]
+>
@@ -6293,19 +6293,19 @@ categories(series)
Examples
-iex> s = Explorer.Series.from_list(["a", "b", "c", nil, "a", "c"], dtype: :category)
-iex> Explorer.Series.categories(s)
-#Explorer.Series<
- Polars[3]
- string ["a", "b", "c"]
->
+iex> s = Explorer.Series.from_list(["a", "b", "c", nil, "a", "c"], dtype: :category)
+iex> Explorer.Series.categories(s)
+#Explorer.Series<
+ Polars[3]
+ string ["a", "b", "c"]
+>
-iex> s = Explorer.Series.from_list(["c", "a", "b"], dtype: :category)
-iex> Explorer.Series.categories(s)
-#Explorer.Series<
- Polars[3]
- string ["c", "a", "b"]
->
+iex> s = Explorer.Series.from_list(["c", "a", "b"], dtype: :category)
+iex> Explorer.Series.categories(s)
+#Explorer.Series<
+ Polars[3]
+ string ["c", "a", "b"]
+>
@@ -6339,12 +6339,12 @@ dtype(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.dtype(s)
+iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.dtype(s)
:integer
-iex> s = Explorer.Series.from_list(["a", nil, "b", "c"])
-iex> Explorer.Series.dtype(s)
+iex> s = Explorer.Series.from_list(["a", nil, "b", "c"])
+iex> Explorer.Series.dtype(s)
:string
@@ -6380,31 +6380,31 @@ iotype(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, 3, 4])
-iex> Explorer.Series.iotype(s)
-{:s, 64}
+iex> s = Explorer.Series.from_list([1, 2, 3, 4])
+iex> Explorer.Series.iotype(s)
+{:s, 64}
-iex> s = Explorer.Series.from_list([~D[1999-12-31], ~D[1989-01-01]])
-iex> Explorer.Series.iotype(s)
-{:s, 32}
+iex> s = Explorer.Series.from_list([~D[1999-12-31], ~D[1989-01-01]])
+iex> Explorer.Series.iotype(s)
+{:s, 32}
-iex> s = Explorer.Series.from_list([~T[00:00:00.000000], ~T[23:59:59.999999]])
-iex> Explorer.Series.iotype(s)
-{:s, 64}
+iex> s = Explorer.Series.from_list([~T[00:00:00.000000], ~T[23:59:59.999999]])
+iex> Explorer.Series.iotype(s)
+{:s, 64}
-iex> s = Explorer.Series.from_list([1.2, 2.3, 3.5, 4.5])
-iex> Explorer.Series.iotype(s)
-{:f, 64}
+iex> s = Explorer.Series.from_list([1.2, 2.3, 3.5, 4.5])
+iex> Explorer.Series.iotype(s)
+{:f, 64}
-iex> s = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.iotype(s)
-{:u, 8}
The operation returns :none
for strings and binaries, as they do not
-provide a fixed-width binary representation:
iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.iotype(s)
+iex> s = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.iotype(s)
+{:u, 8}
The operation returns :none
for strings and binaries, as they do not
+provide a fixed-width binary representation:
iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.iotype(s)
:none
However, if appropriate, you can convert them to categorical types,
-which will then return the index of each category:
iex> s = Explorer.Series.from_list(["a", "b", "c"], dtype: :category)
-iex> Explorer.Series.iotype(s)
-{:u, 32}
+which will then return the index of each category:
iex> s = Explorer.Series.from_list(["a", "b", "c"], dtype: :category)
+iex> Explorer.Series.iotype(s)
+{:u, 32}
@@ -6438,8 +6438,8 @@ size(series)
Examples
-iex> s = Explorer.Series.from_list([~D[1999-12-31], ~D[1989-01-01]])
-iex> Explorer.Series.size(s)
+iex> s = Explorer.Series.from_list([~D[1999-12-31], ~D[1989-01-01]])
+iex> Explorer.Series.size(s)
2
@@ -6490,19 +6490,19 @@ argsort(series, opts \\ [])
Examples
-iex> s = Explorer.Series.from_list([9, 3, 7, 1])
-iex> Explorer.Series.argsort(s)
-#Explorer.Series<
- Polars[4]
- integer [3, 1, 2, 0]
->
+iex> s = Explorer.Series.from_list([9, 3, 7, 1])
+iex> Explorer.Series.argsort(s)
+#Explorer.Series<
+ Polars[4]
+ integer [3, 1, 2, 0]
+>
-iex> s = Explorer.Series.from_list([9, 3, 7, 1])
-iex> Explorer.Series.argsort(s, direction: :desc)
-#Explorer.Series<
- Polars[4]
- integer [0, 2, 1, 3]
->
+iex> s = Explorer.Series.from_list([9, 3, 7, 1])
+iex> Explorer.Series.argsort(s, direction: :desc)
+#Explorer.Series<
+ Polars[4]
+ integer [0, 2, 1, 3]
+>
@@ -6537,12 +6537,12 @@ at(series, idx)
Examples
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.at(s, 2)
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.at(s, 2)
"c"
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.at(s, 4)
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.at(s, 4)
** (ArgumentError) index 4 out of bounds for series of size 3
@@ -6577,17 +6577,17 @@ at_every(series, every_n)
Examples
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.at_every(s, 2)
-#Explorer.Series<
- Polars[5]
- integer [1, 3, 5, 7, 9]
->
If n is bigger than the size of the series, the result is a new series with only the first value of the supplied series.
iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.at_every(s, 20)
-#Explorer.Series<
- Polars[1]
- integer [1]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.at_every(s, 2)
+#Explorer.Series<
+ Polars[5]
+ integer [1, 3, 5, 7, 9]
+>
If n is bigger than the size of the series, the result is a new series with only the first value of the supplied series.
iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.at_every(s, 20)
+#Explorer.Series<
+ Polars[1]
+ integer [1]
+>
@@ -6621,21 +6621,21 @@ concat(series)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([4, 5, 6])
-iex> Explorer.Series.concat([s1, s2])
-#Explorer.Series<
- Polars[6]
- integer [1, 2, 3, 4, 5, 6]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([4, 5, 6])
+iex> Explorer.Series.concat([s1, s2])
+#Explorer.Series<
+ Polars[6]
+ integer [1, 2, 3, 4, 5, 6]
+>
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([4.0, 5.0, 6.4])
-iex> Explorer.Series.concat([s1, s2])
-#Explorer.Series<
- Polars[6]
- float [1.0, 2.0, 3.0, 4.0, 5.0, 6.4]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([4.0, 5.0, 6.4])
+iex> Explorer.Series.concat([s1, s2])
+#Explorer.Series<
+ Polars[6]
+ float [1.0, 2.0, 3.0, 4.0, 5.0, 6.4]
+>
@@ -6691,12 +6691,12 @@ distinct(series)
Examples
-iex> s = [1, 1, 2, 2, 3, 3] |> Explorer.Series.from_list()
-iex> Explorer.Series.distinct(s)
-#Explorer.Series<
- Polars[3]
- integer [1, 2, 3]
->
+iex> s = [1, 1, 2, 2, 3, 3] |> Explorer.Series.from_list()
+iex> Explorer.Series.distinct(s)
+#Explorer.Series<
+ Polars[3]
+ integer [1, 2, 3]
+>
@@ -6730,8 +6730,8 @@ first(series)
Examples
-iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.first(s)
+iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.first(s)
1
@@ -6766,27 +6766,27 @@ format(list)
Examples
-iex> s1 = Explorer.Series.from_list(["a", "b", "c"])
-iex> s2 = Explorer.Series.from_list(["d", "e", "f"])
-iex> s3 = Explorer.Series.from_list(["g", "h", "i"])
-iex> Explorer.Series.format([s1, s2, s3])
-#Explorer.Series<
- Polars[3]
- string ["adg", "beh", "cfi"]
->
-
-iex> s1 = Explorer.Series.from_list(["a", "b", "c", "d"])
-iex> s2 = Explorer.Series.from_list([1, 2, 3, 4])
-iex> s3 = Explorer.Series.from_list([1.5, :nan, :infinity, :neg_infinity])
-iex> Explorer.Series.format([s1, "/", s2, "/", s3])
-#Explorer.Series<
- Polars[4]
- string ["a/1/1.5", "b/2/NaN", "c/3/inf", "d/4/-inf"]
->
-
-iex> s1 = Explorer.Series.from_list([<<1>>, <<239, 191, 19>>], dtype: :binary)
-iex> s2 = Explorer.Series.from_list([<<3>>, <<4>>], dtype: :binary)
-iex> Explorer.Series.format([s1, s2])
+iex> s1 = Explorer.Series.from_list(["a", "b", "c"])
+iex> s2 = Explorer.Series.from_list(["d", "e", "f"])
+iex> s3 = Explorer.Series.from_list(["g", "h", "i"])
+iex> Explorer.Series.format([s1, s2, s3])
+#Explorer.Series<
+ Polars[3]
+ string ["adg", "beh", "cfi"]
+>
+
+iex> s1 = Explorer.Series.from_list(["a", "b", "c", "d"])
+iex> s2 = Explorer.Series.from_list([1, 2, 3, 4])
+iex> s3 = Explorer.Series.from_list([1.5, :nan, :infinity, :neg_infinity])
+iex> Explorer.Series.format([s1, "/", s2, "/", s3])
+#Explorer.Series<
+ Polars[4]
+ string ["a/1/1.5", "b/2/NaN", "c/3/inf", "d/4/-inf"]
+>
+
+iex> s1 = Explorer.Series.from_list([<<1>>, <<239, 191, 19>>], dtype: :binary)
+iex> s2 = Explorer.Series.from_list([<<3>>, <<4>>], dtype: :binary)
+iex> Explorer.Series.format([s1, s2])
** (RuntimeError) Polars Error: External error: invalid utf-8 sequence
@@ -6823,12 +6823,12 @@ head(series, n_elements \\ 10)
Examples
-iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.head(s)
-#Explorer.Series<
- Polars[10]
- integer [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
->
+iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.head(s)
+#Explorer.Series<
+ Polars[10]
+ integer [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+>
@@ -6862,8 +6862,8 @@ last(series)
Examples
-iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.last(s)
+iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.last(s)
100
@@ -6892,12 +6892,12 @@ reverse(series)
Example
-iex> s = [1, 2, 3] |> Explorer.Series.from_list()
-iex> Explorer.Series.reverse(s)
-#Explorer.Series<
- Polars[3]
- integer [3, 2, 1]
->
+iex> s = [1, 2, 3] |> Explorer.Series.from_list()
+iex> Explorer.Series.reverse(s)
+#Explorer.Series<
+ Polars[3]
+ integer [3, 2, 1]
+>
@@ -6942,47 +6942,47 @@ sample(series, n_or_frac, opts \\ [])
Examples
-iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.sample(s, 10, seed: 100)
-#Explorer.Series<
- Polars[10]
- integer [55, 51, 33, 26, 5, 32, 62, 31, 9, 25]
->
-
-iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.sample(s, 0.05, seed: 100)
-#Explorer.Series<
- Polars[5]
- integer [49, 77, 96, 19, 18]
->
-
-iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.sample(s, 7, seed: 100, replace: true)
-#Explorer.Series<
- Polars[7]
- integer [4, 1, 3, 4, 3, 4, 2]
->
-
-iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.sample(s, 1.2, seed: 100, replace: true)
-#Explorer.Series<
- Polars[6]
- integer [4, 1, 3, 4, 3, 4]
->
-
-iex> s = 0..9 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.sample(s, 1.0, seed: 100, shuffle: false)
-#Explorer.Series<
- Polars[10]
- integer [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
->
-
-iex> s = 0..9 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.sample(s, 1.0, seed: 100, shuffle: true)
-#Explorer.Series<
- Polars[10]
- integer [7, 9, 2, 0, 4, 1, 3, 8, 5, 6]
->
+iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.sample(s, 10, seed: 100)
+#Explorer.Series<
+ Polars[10]
+ integer [55, 51, 33, 26, 5, 32, 62, 31, 9, 25]
+>
+
+iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.sample(s, 0.05, seed: 100)
+#Explorer.Series<
+ Polars[5]
+ integer [49, 77, 96, 19, 18]
+>
+
+iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.sample(s, 7, seed: 100, replace: true)
+#Explorer.Series<
+ Polars[7]
+ integer [4, 1, 3, 4, 3, 4, 2]
+>
+
+iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.sample(s, 1.2, seed: 100, replace: true)
+#Explorer.Series<
+ Polars[6]
+ integer [4, 1, 3, 4, 3, 4]
+>
+
+iex> s = 0..9 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.sample(s, 1.0, seed: 100, shuffle: false)
+#Explorer.Series<
+ Polars[10]
+ integer [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+>
+
+iex> s = 0..9 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.sample(s, 1.0, seed: 100, shuffle: true)
+#Explorer.Series<
+ Polars[10]
+ integer [7, 9, 2, 0, 4, 1, 3, 8, 5, 6]
+>
@@ -7016,19 +7016,19 @@ shift(series, offset)
Examples
-iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.shift(s, 2)
-#Explorer.Series<
- Polars[5]
- integer [nil, nil, 1, 2, 3]
->
+iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.shift(s, 2)
+#Explorer.Series<
+ Polars[5]
+ integer [nil, nil, 1, 2, 3]
+>
-iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.shift(s, -2)
-#Explorer.Series<
- Polars[5]
- integer [3, 4, 5, nil, nil]
->
+iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.shift(s, -2)
+#Explorer.Series<
+ Polars[5]
+ integer [3, 4, 5, nil, nil]
+>
@@ -7071,12 +7071,12 @@ shuffle(series, opts \\ [])
Examples
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.shuffle(s, seed: 100)
-#Explorer.Series<
- Polars[10]
- integer [8, 10, 3, 1, 5, 2, 4, 9, 6, 7]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.shuffle(s, seed: 100)
+#Explorer.Series<
+ Polars[10]
+ integer [8, 10, 3, 1, 5, 2, 4, 9, 6, 7]
+>
@@ -7113,33 +7113,33 @@ slice(series, indices)
Examples
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.slice(s, [0, 2])
-#Explorer.Series<
- Polars[2]
- string ["a", "c"]
->
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.slice(s, [0, 2])
+#Explorer.Series<
+ Polars[2]
+ string ["a", "c"]
+>
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.slice(s, 1..2)
-#Explorer.Series<
- Polars[2]
- string ["b", "c"]
->
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.slice(s, 1..2)
+#Explorer.Series<
+ Polars[2]
+ string ["b", "c"]
+>
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.slice(s, -2..-1)
-#Explorer.Series<
- Polars[2]
- string ["b", "c"]
->
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.slice(s, -2..-1)
+#Explorer.Series<
+ Polars[2]
+ string ["b", "c"]
+>
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.slice(s, 3..2//1)
-#Explorer.Series<
- Polars[0]
- string []
->
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.slice(s, 3..2//1)
+#Explorer.Series<
+ Polars[0]
+ string []
+>
@@ -7173,29 +7173,29 @@ slice(series, offset, size)
Examples
-iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5])
-iex> Explorer.Series.slice(s, 1, 2)
-#Explorer.Series<
- Polars[2]
- integer [2, 3]
->
Negative offsets count from the end of the series:
iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5])
-iex> Explorer.Series.slice(s, -3, 2)
-#Explorer.Series<
- Polars[2]
- integer [3, 4]
->
If the offset runs past the end of the series,
-the series is empty:
iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5])
-iex> Explorer.Series.slice(s, 10, 3)
-#Explorer.Series<
- Polars[0]
- integer []
->
If the size runs past the end of the series,
-the result may be shorter than the size:
iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5])
-iex> Explorer.Series.slice(s, -3, 4)
-#Explorer.Series<
- Polars[3]
- integer [3, 4, 5]
->
+iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5])
+iex> Explorer.Series.slice(s, 1, 2)
+#Explorer.Series<
+ Polars[2]
+ integer [2, 3]
+>
Negative offsets count from the end of the series:
iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5])
+iex> Explorer.Series.slice(s, -3, 2)
+#Explorer.Series<
+ Polars[2]
+ integer [3, 4]
+>
If the offset runs past the end of the series,
+the series is empty:
iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5])
+iex> Explorer.Series.slice(s, 10, 3)
+#Explorer.Series<
+ Polars[0]
+ integer []
+>
If the size runs past the end of the series,
+the result may be shorter than the size:
iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5])
+iex> Explorer.Series.slice(s, -3, 4)
+#Explorer.Series<
+ Polars[3]
+ integer [3, 4, 5]
+>
@@ -7233,19 +7233,19 @@ sort(series, opts \\ [])
Examples
-iex> s = Explorer.Series.from_list([9, 3, 7, 1])
-iex> Explorer.Series.sort(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 3, 7, 9]
->
+iex> s = Explorer.Series.from_list([9, 3, 7, 1])
+iex> Explorer.Series.sort(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 3, 7, 9]
+>
-iex> s = Explorer.Series.from_list([9, 3, 7, 1])
-iex> Explorer.Series.sort(s, direction: :desc)
-#Explorer.Series<
- Polars[4]
- integer [9, 7, 3, 1]
->
+iex> s = Explorer.Series.from_list([9, 3, 7, 1])
+iex> Explorer.Series.sort(s, direction: :desc)
+#Explorer.Series<
+ Polars[4]
+ integer [9, 7, 3, 1]
+>
@@ -7281,12 +7281,12 @@ tail(series, n_elements \\ 10)
Examples
-iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.tail(s)
-#Explorer.Series<
- Polars[10]
- integer [91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
->
+iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.tail(s)
+#Explorer.Series<
+ Polars[10]
+ integer [91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
+>
@@ -7314,8 +7314,8 @@ unordered_distinct(series)
Examples
-iex> s = [1, 1, 2, 2, 3, 3] |> Explorer.Series.from_list()
-iex> Explorer.Series.unordered_distinct(s)
+iex> s = [1, 1, 2, 2, 3, 3] |> Explorer.Series.from_list()
+iex> Explorer.Series.unordered_distinct(s)
@@ -7369,26 +7369,26 @@ cumulative_max(series, opts \\ [])
Examples
-iex> s = [1, 2, 3, 4] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_max(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 3, 4]
->
+iex> s = [1, 2, 3, 4] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_max(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 3, 4]
+>
-iex> s = [1, 2, nil, 4] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_max(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, nil, 4]
->
+iex> s = [1, 2, nil, 4] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_max(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, nil, 4]
+>
-iex> s = [~T[03:00:02.000000], ~T[02:04:19.000000], nil, ~T[13:24:56.000000]] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_max(s)
-#Explorer.Series<
- Polars[4]
- time [03:00:02.000000, 03:00:02.000000, nil, 13:24:56.000000]
->
+iex> s = [~T[03:00:02.000000], ~T[02:04:19.000000], nil, ~T[13:24:56.000000]] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_max(s)
+#Explorer.Series<
+ Polars[4]
+ time [03:00:02.000000, 03:00:02.000000, nil, 13:24:56.000000]
+>
@@ -7430,26 +7430,26 @@ cumulative_min(series, opts \\ [])
Examples
-iex> s = [1, 2, 3, 4] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_min(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 1, 1, 1]
->
+iex> s = [1, 2, 3, 4] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_min(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 1, 1, 1]
+>
-iex> s = [1, 2, nil, 4] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_min(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 1, nil, 1]
->
+iex> s = [1, 2, nil, 4] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_min(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 1, nil, 1]
+>
-iex> s = [~T[03:00:02.000000], ~T[02:04:19.000000], nil, ~T[13:24:56.000000]] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_min(s)
-#Explorer.Series<
- Polars[4]
- time [03:00:02.000000, 02:04:19.000000, nil, 02:04:19.000000]
->
+iex> s = [~T[03:00:02.000000], ~T[02:04:19.000000], nil, ~T[13:24:56.000000]] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_min(s)
+#Explorer.Series<
+ Polars[4]
+ time [03:00:02.000000, 02:04:19.000000, nil, 02:04:19.000000]
+>
@@ -7491,19 +7491,19 @@ cumulative_product(series, opts \\ [])
Examples
-iex> s = [1, 2, 3, 2] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_product(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 6, 12]
->
+iex> s = [1, 2, 3, 2] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_product(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 6, 12]
+>
-iex> s = [1, 2, nil, 4] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_product(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, nil, 8]
->
+iex> s = [1, 2, nil, 4] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_product(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, nil, 8]
+>
@@ -7545,19 +7545,19 @@ cumulative_sum(series, opts \\ [])
Examples
-iex> s = [1, 2, 3, 4] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_sum(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 3, 6, 10]
->
+iex> s = [1, 2, 3, 4] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_sum(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 3, 6, 10]
+>
-iex> s = [1, 2, nil, 4] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_sum(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 3, nil, 7]
->
+iex> s = [1, 2, nil, 4] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_sum(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 3, nil, 7]
+>
@@ -7596,19 +7596,19 @@ ewm_mean(series, opts \\ [])
Examples
-iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.ewm_mean(s)
-#Explorer.Series<
- Polars[5]
- float [1.0, 1.6666666666666667, 2.4285714285714284, 3.2666666666666666, 4.161290322580645]
->
+iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.ewm_mean(s)
+#Explorer.Series<
+ Polars[5]
+ float [1.0, 1.6666666666666667, 2.4285714285714284, 3.2666666666666666, 4.161290322580645]
+>
-iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.ewm_mean(s, alpha: 0.1)
-#Explorer.Series<
- Polars[5]
- float [1.0, 1.5263157894736843, 2.070110701107011, 2.6312881651642916, 3.2097140484969833]
->
+iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.ewm_mean(s, alpha: 0.1)
+#Explorer.Series<
+ Polars[5]
+ float [1.0, 1.5263157894736843, 2.070110701107011, 2.6312881651642916, 3.2097140484969833]
+>
@@ -7660,73 +7660,73 @@ fill_missing(series, value)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.fill_missing(s, :forward)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 2, 4]
->
-
-iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.fill_missing(s, :backward)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 4, 4]
->
-
-iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.fill_missing(s, :max)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 4, 4]
->
-
-iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.fill_missing(s, :min)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 1, 4]
->
-
-iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.fill_missing(s, :mean)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 2, 4]
->
Values that belong to the series itself can also be added as missing:
iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.fill_missing(s, 3)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 3, 4]
->
-
-iex> s = Explorer.Series.from_list(["a", "b", nil, "d"])
-iex> Explorer.Series.fill_missing(s, "c")
-#Explorer.Series<
- Polars[4]
- string ["a", "b", "c", "d"]
->
Mismatched types will raise:
iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.fill_missing(s, "foo")
-** (ArgumentError) cannot invoke Explorer.Series.fill_missing/2 with mismatched dtypes: :integer and "foo"
Floats in particular accept missing values to be set to NaN, Inf, and -Inf:
iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 4.0])
-iex> Explorer.Series.fill_missing(s, :nan)
-#Explorer.Series<
- Polars[4]
- float [1.0, 2.0, NaN, 4.0]
->
-
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 4.0])
-iex> Explorer.Series.fill_missing(s, :infinity)
-#Explorer.Series<
- Polars[4]
- float [1.0, 2.0, Inf, 4.0]
->
-
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 4.0])
-iex> Explorer.Series.fill_missing(s, :neg_infinity)
-#Explorer.Series<
- Polars[4]
- float [1.0, 2.0, -Inf, 4.0]
->
+iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.fill_missing(s, :forward)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 2, 4]
+>
+
+iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.fill_missing(s, :backward)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 4, 4]
+>
+
+iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.fill_missing(s, :max)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 4, 4]
+>
+
+iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.fill_missing(s, :min)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 1, 4]
+>
+
+iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.fill_missing(s, :mean)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 2, 4]
+>
Values that belong to the series itself can also be added as missing:
iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.fill_missing(s, 3)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 3, 4]
+>
+
+iex> s = Explorer.Series.from_list(["a", "b", nil, "d"])
+iex> Explorer.Series.fill_missing(s, "c")
+#Explorer.Series<
+ Polars[4]
+ string ["a", "b", "c", "d"]
+>
Mismatched types will raise:
iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.fill_missing(s, "foo")
+** (ArgumentError) cannot invoke Explorer.Series.fill_missing/2 with mismatched dtypes: :integer and "foo"
Floats in particular accept missing values to be set to NaN, Inf, and -Inf:
iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 4.0])
+iex> Explorer.Series.fill_missing(s, :nan)
+#Explorer.Series<
+ Polars[4]
+ float [1.0, 2.0, NaN, 4.0]
+>
+
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 4.0])
+iex> Explorer.Series.fill_missing(s, :infinity)
+#Explorer.Series<
+ Polars[4]
+ float [1.0, 2.0, Inf, 4.0]
+>
+
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 4.0])
+iex> Explorer.Series.fill_missing(s, :neg_infinity)
+#Explorer.Series<
+ Polars[4]
+ float [1.0, 2.0, -Inf, 4.0]
+>
@@ -7764,19 +7764,19 @@ window_max(series, window_size, opts \\ [])
Examples
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_max(s, 4)
-#Explorer.Series<
- Polars[10]
- integer [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_max(s, 4)
+#Explorer.Series<
+ Polars[10]
+ integer [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+>
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_max(s, 2, weights: [1.0, 2.0])
-#Explorer.Series<
- Polars[10]
- float [1.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_max(s, 2, weights: [1.0, 2.0])
+#Explorer.Series<
+ Polars[10]
+ float [1.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
+>
@@ -7814,26 +7814,26 @@ window_mean(series, window_size, opts \\ []
Examples
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_mean(s, 4)
-#Explorer.Series<
- Polars[10]
- float [1.0, 1.5, 2.0, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_mean(s, 4)
+#Explorer.Series<
+ Polars[10]
+ float [1.0, 1.5, 2.0, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]
+>
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_mean(s, 2, weights: [0.25, 0.75])
-#Explorer.Series<
- Polars[10]
- float [0.25, 1.75, 2.75, 3.75, 4.75, 5.75, 6.75, 7.75, 8.75, 9.75]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_mean(s, 2, weights: [0.25, 0.75])
+#Explorer.Series<
+ Polars[10]
+ float [0.25, 1.75, 2.75, 3.75, 4.75, 5.75, 6.75, 7.75, 8.75, 9.75]
+>
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_mean(s, 2, weights: [0.25, 0.75], min_periods: nil)
-#Explorer.Series<
- Polars[10]
- float [nil, 1.75, 2.75, 3.75, 4.75, 5.75, 6.75, 7.75, 8.75, 9.75]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_mean(s, 2, weights: [0.25, 0.75], min_periods: nil)
+#Explorer.Series<
+ Polars[10]
+ float [nil, 1.75, 2.75, 3.75, 4.75, 5.75, 6.75, 7.75, 8.75, 9.75]
+>
@@ -7871,19 +7871,19 @@ window_min(series, window_size, opts \\ [])
Examples
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_min(s, 4)
-#Explorer.Series<
- Polars[10]
- integer [1, 1, 1, 1, 2, 3, 4, 5, 6, 7]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_min(s, 4)
+#Explorer.Series<
+ Polars[10]
+ integer [1, 1, 1, 1, 2, 3, 4, 5, 6, 7]
+>
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_min(s, 2, weights: [1.0, 2.0])
-#Explorer.Series<
- Polars[10]
- float [1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_min(s, 2, weights: [1.0, 2.0])
+#Explorer.Series<
+ Polars[10]
+ float [1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
+>
@@ -7921,19 +7921,19 @@ window_standard_deviation(series, window_si
Examples
-iex> s = Explorer.Series.from_list([1, 2, 3, 4, 1])
-iex> Explorer.Series.window_standard_deviation(s, 2)
-#Explorer.Series<
- Polars[5]
- float [0.0, 0.7071067811865476, 0.7071067811865476, 0.7071067811865476, 2.1213203435596424]
->
+iex> s = Explorer.Series.from_list([1, 2, 3, 4, 1])
+iex> Explorer.Series.window_standard_deviation(s, 2)
+#Explorer.Series<
+ Polars[5]
+ float [0.0, 0.7071067811865476, 0.7071067811865476, 0.7071067811865476, 2.1213203435596424]
+>
-iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5, 6])
-iex> Explorer.Series.window_standard_deviation(s, 2, weights: [0.25, 0.75])
-#Explorer.Series<
- Polars[6]
- float [0.4330127018922193, 0.4330127018922193, 0.4330127018922193, 0.4330127018922193, 0.4330127018922193, 0.4330127018922193]
->
+iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5, 6])
+iex> Explorer.Series.window_standard_deviation(s, 2, weights: [0.25, 0.75])
+#Explorer.Series<
+ Polars[6]
+ float [0.4330127018922193, 0.4330127018922193, 0.4330127018922193, 0.4330127018922193, 0.4330127018922193, 0.4330127018922193]
+>
@@ -7971,19 +7971,19 @@ window_sum(series, window_size, opts \\ [])
Examples
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_sum(s, 4)
-#Explorer.Series<
- Polars[10]
- integer [1, 3, 6, 10, 14, 18, 22, 26, 30, 34]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_sum(s, 4)
+#Explorer.Series<
+ Polars[10]
+ integer [1, 3, 6, 10, 14, 18, 22, 26, 30, 34]
+>
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_sum(s, 2, weights: [1.0, 2.0])
-#Explorer.Series<
- Polars[10]
- float [1.0, 5.0, 8.0, 11.0, 14.0, 17.0, 20.0, 23.0, 26.0, 29.0]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_sum(s, 2, weights: [1.0, 2.0])
+#Explorer.Series<
+ Polars[10]
+ float [1.0, 5.0, 8.0, 11.0, 14.0, 17.0, 20.0, 23.0, 26.0, 29.0]
+>
diff --git a/Explorer.TensorFrame.html b/Explorer.TensorFrame.html
index 6043782f9..573a5d9ee 100644
--- a/Explorer.TensorFrame.html
+++ b/Explorer.TensorFrame.html
@@ -113,13 +113,13 @@
TensorFrame is a representation of Explorer.DataFrame
-that is designed to work inside Nx's defn
expressions.
For example, imagine the following defn:
defn add_columns(tf) do
- tf[:a] + tf[:b]
-end
We can now pass a DataFrame as argument:
iex> add_columns(Explorer.DataFrame.new(a: [11, 12], b: [21, 22]))
-#Nx.Tensor<
- s64[2]
- [32, 34]
->
Passing an Explorer.DataFrame
to a defn
will automatically
+that is designed to work inside Nx's defn
expressions.
For example, imagine the following defn:
defn add_columns(tf) do
+ tf[:a] + tf[:b]
+end
We can now pass a DataFrame as argument:
iex> add_columns(Explorer.DataFrame.new(a: [11, 12], b: [21, 22]))
+#Nx.Tensor<
+ s64[2]
+ [32, 34]
+>
Passing an Explorer.DataFrame
to a defn
will automatically
convert it to a TensorFrame. The TensorFrame will lazily
build tensors out of the used dataframe fields.
@@ -130,29 +130,29 @@
Due to the integration with Nx, you can also pass dataframes
into Nx.stack/2
and Nx.concatenate
and they will be automatically
converted to tensors. This makes it easy to pass dataframes into
-neural networks and other computationally intensive algorithms:
iex> Nx.concatenate(Explorer.DataFrame.new(a: [11, 12], b: [21, 22]))
-#Nx.Tensor<
- s64[4]
- [11, 12, 21, 22]
->
-
-iex> Nx.stack(Explorer.DataFrame.new(a: [11, 12], b: [21, 22]))
-#Nx.Tensor<
- s64[2][2]
- [
- [11, 12],
- [21, 22]
- ]
->
-
-iex> Nx.stack(Explorer.DataFrame.new(a: [11, 12], b: [21, 22]), axis: -1)
-#Nx.Tensor<
- s64[2][2]
- [
- [11, 21],
- [12, 22]
- ]
->
+neural networks and other computationally intensive algorithms:
iex> Nx.concatenate(Explorer.DataFrame.new(a: [11, 12], b: [21, 22]))
+#Nx.Tensor<
+ s64[4]
+ [11, 12, 21, 22]
+>
+
+iex> Nx.stack(Explorer.DataFrame.new(a: [11, 12], b: [21, 22]))
+#Nx.Tensor<
+ s64[2][2]
+ [
+ [11, 12],
+ [21, 22]
+ ]
+>
+
+iex> Nx.stack(Explorer.DataFrame.new(a: [11, 12], b: [21, 22]), axis: -1)
+#Nx.Tensor<
+ s64[2][2]
+ [
+ [11, 21],
+ [12, 22]
+ ]
+>
Warning: returning TensorFrames
@@ -165,14 +165,14 @@
above we used Nx
to add two columns, if you want to
put the result of the computation back into a DataFrame,
you can use Explorer.DataFrame.put/4, which also accepts
-tensors:
iex> df = Explorer.DataFrame.new(a: [11, 12], b: [21, 22])
-iex> Explorer.DataFrame.put(df, "result", add_columns(df))
-#Explorer.DataFrame<
- Polars[2 x 3]
- a integer [11, 12]
- b integer [21, 22]
- result integer [32, 34]
->
One benefit of using Explorer.DataFrame.put/4
is that it will
+tensors:
iex> df = Explorer.DataFrame.new(a: [11, 12], b: [21, 22])
+iex> Explorer.DataFrame.put(df, "result", add_columns(df))
+#Explorer.DataFrame<
+ Polars[2 x 3]
+ a integer [11, 12]
+ b integer [21, 22]
+ result integer [32, 34]
+>
One benefit of using Explorer.DataFrame.put/4
is that it will
preserve the type of the column if one already exists. Alternatively,
use Explorer.Series.from_tensor/1
to explicitly convert a tensor
back to a series.
@@ -311,7 +311,7 @@ pull(tf, name)
Examples
-Explorer.TensorFrame.pull(tf, "some_column")
+Explorer.TensorFrame.pull(tf, "some_column")
@@ -339,7 +339,7 @@ put(tf, name, tensor)
Examples
-Explorer.TensorFrame.put(tf, "result", some_tensor)
+Explorer.TensorFrame.put(tf, "result", some_tensor)
diff --git a/Explorer.epub b/Explorer.epub
index ca61593c55ed7046d3223981a05739ce5ea92b1d..6e6e94620f5f89c194e7751ab2db691ae1a239ab 100644
GIT binary patch
delta 263944
za?(_NF*I+xf64yK>T2EqWbFnQmOzCZg7ZE8XF>s4Mvs|W42qz_aMwp?iRKtrPYi;=
zU{Wt@CrAV36ZG-@*9ptDY@Rd~F9dNNx_5feT5nSk1cwRAh+AV%AuHPq-$
zQBdvcq`4H>Xe-y84dv+fXcF$&`n_XQ+5YSJ5Nhzcs0p#-<;Re1VbT7*la){apI>*T
zIH#VTMq9^`b-bQsSat3)RN=a?vq5vCoM+Kcz5LNhXg3Is2N*~pW9aB}OM61O-E}Ps
z$Z;>CiQBfbIABZc)pt!v4fEG!2vX2kG=CdC(I9w<%Yn_bx3pBuu}=g;aOUU+Df}w8
z@ZUt6EX7MKo!CNDVU0a(9JrWtWw)itOI3EU