diff --git a/Explorer.Backend.html b/Explorer.Backend.html
index e7189d4a3..df8d08953 100644
--- a/Explorer.Backend.html
+++ b/Explorer.Backend.html
@@ -216,9 +216,9 @@
iex> Explorer.Backend.put(Lib.CustomBackend)
+iex> Explorer.Backend.put(Lib.CustomBackend)
Explorer.PolarsBackend
-iex> Explorer.Backend.get()
+iex> Explorer.Backend.get()
Lib.CustomBackend
diff --git a/Explorer.DataFrame.html b/Explorer.DataFrame.html
index f8597967a..af9e26687 100644
--- a/Explorer.DataFrame.html
+++ b/Explorer.DataFrame.html
@@ -113,38 +113,38 @@
The DataFrame struct and API.
Dataframes are two-dimensional tabular data structures similar to a spreadsheet.
-For example, the Iris dataset:
iex> Explorer.Datasets.iris()
-#Explorer.DataFrame<
- Polars[150 x 5]
- sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
- sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
- petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
- petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
This dataframe has 150 rows and five columns. Each column is an Explorer.Series
-of the same size (150):
iex> df = Explorer.Datasets.iris()
-iex> df["sepal_length"]
-#Explorer.Series<
- Polars[150]
- float [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5.0, 5.0, 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5.0, 5.5, 4.9, 4.4, 5.1, 5.0, 4.5, 4.4, 5.0, 5.1, 4.8, 5.1, 4.6, 5.3, 5.0, ...]
->
+For example, the Iris dataset:
iex> Explorer.Datasets.iris()
+#Explorer.DataFrame<
+ Polars[150 x 5]
+ sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
+ sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
+ petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
+ petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
This dataframe has 150 rows and five columns. Each column is an Explorer.Series
+of the same size (150):
iex> df = Explorer.Datasets.iris()
+iex> df["sepal_length"]
+#Explorer.Series<
+ Polars[150]
+ float [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5.0, 5.0, 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5.0, 5.5, 4.9, 4.4, 5.1, 5.0, 4.5, 4.4, 5.0, 5.1, 4.8, 5.1, 4.6, 5.3, 5.0, ...]
+>
Creating dataframes
Dataframes can be created from normal Elixir terms. The main way you might do this is
-with the new/1 function. For example:
iex> Explorer.DataFrame.new(a: ["a", "b"], b: [1, 2])
-#Explorer.DataFrame<
- Polars[2 x 2]
- a string ["a", "b"]
- b integer [1, 2]
->
Or with a list of maps:
iex> Explorer.DataFrame.new([%{"col1" => "a", "col2" => 1}, %{"col1" => "b", "col2" => 2}])
-#Explorer.DataFrame<
- Polars[2 x 2]
- col1 string ["a", "b"]
- col2 integer [1, 2]
->
+with the new/1 function. For example:
iex> Explorer.DataFrame.new(a: ["a", "b"], b: [1, 2])
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ a string ["a", "b"]
+ b integer [1, 2]
+>
Or with a list of maps:
iex> Explorer.DataFrame.new([%{"col1" => "a", "col2" => 1}, %{"col1" => "b", "col2" => 2}])
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ col1 string ["a", "b"]
+ col2 integer [1, 2]
+>
Verbs
@@ -182,16 +182,16 @@
Explorer supports reading and writing of:
- delimited files (such as CSV or TSV)
- Parquet
- Arrow IPC
- Arrow Streaming IPC
- Newline Delimited JSON
- Databases via Adbc in from_query/3
The convention Explorer uses is to have from_* and to_* functions to read and write
to files in the formats above. load_* and dump_* versions are also available to read
and write those formats directly in memory.
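A minimal sketch of the in-memory variants. It assumes Explorer is pulled in through Mix.install with the version constraint "~> 0.7" (an assumption; use your project's version). It round-trips a small dataframe through dump_csv!/2 and load_csv!/2 without touching the file system.

```elixir
# Assumption: Explorer is installed via Mix.install; adjust the version to your project.
Mix.install([{:explorer, "~> 0.7"}])

alias Explorer.DataFrame, as: DF

df = DF.new(a: ["x", "y"], b: [1, 2])

# dump_csv!/2 serializes the dataframe to a CSV binary held in memory (no file is written)
csv = DF.dump_csv!(df)

# load_csv!/2 parses that binary back into a new dataframe
df_back = DF.load_csv!(csv)

IO.inspect(DF.to_columns(df_back))
```

The same from_*/to_* vs load_*/dump_* split applies to the other formats listed above.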
Files can be fetched from a local or remote file system, such as S3, using the following formats:
# path to a file on disk
-Explorer.DataFrame.from_parquet("/path/to/file.parquet")
+Explorer.DataFrame.from_parquet("/path/to/file.parquet")
# path with a URL scheme (with optional configuration)
-Explorer.DataFrame.from_parquet("s3://bucket/file.parquet", config: FSS.S3.config_from_system_env())
+Explorer.DataFrame.from_parquet("s3://bucket/file.parquet", config: FSS.S3.config_from_system_env())
# it's possible to configure using keyword lists
-Explorer.DataFrame.from_parquet("s3://bucket/file.parquet", config: [access_key_id: "my-key", secret_access_key: "my-secret"])
+Explorer.DataFrame.from_parquet("s3://bucket/file.parquet", config: [access_key_id: "my-key", secret_access_key: "my-secret"])
# an FSS entry (it already includes its config)
-Explorer.DataFrame.from_parquet(FSS.S3.parse("s3://bucket/file.parquet"))
The :config option of from_* functions is only required if the filename is a path
+
Explorer.DataFrame.from_parquet(FSS.S3.parse("s3://bucket/file.parquet"))
The :config option of from_* functions is only required if the filename is a path
to a remote resource. If it is an FSS entry, the config must be passed inside the entry struct.
Explorer.DataFrame also implements the Access behaviour (also known as the brackets
syntax). This should be familiar to users coming from other languages with dataframes
-such as R or Python. For example:
iex> df = Explorer.Datasets.wine()
-iex> df["class"]
-#Explorer.Series<
- Polars[178]
- integer [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
->
Accessing the dataframe with a column name, either as a string or an atom, will return
-the column. You can also pass an integer representing the column order:
iex> df = Explorer.Datasets.wine()
-iex> df[0]
-#Explorer.Series<
- Polars[178]
- integer [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
->
You can also pass a list, a range, or a regex to return a dataframe containing
-only the matching columns. For example, by passing a list:
iex> df = Explorer.Datasets.wine()
-iex> df[["class", "hue"]]
-#Explorer.DataFrame<
- Polars[178 x 2]
- class integer [1, 1, 1, 1, 1, ...]
- hue float [1.04, 1.05, 1.03, 0.86, 1.04, ...]
->
Or a range for the given positions:
iex> df = Explorer.Datasets.wine()
-iex> df[0..2]
-#Explorer.DataFrame<
- Polars[178 x 3]
- class integer [1, 1, 1, 1, 1, ...]
- alcohol float [14.23, 13.2, 13.16, 14.37, 13.24, ...]
- malic_acid float [1.71, 1.78, 2.36, 1.95, 2.59, ...]
->
Or a regex to keep only columns matching a given pattern:
iex> df = Explorer.Datasets.wine()
-iex> df[~r/(class|hue)/]
-#Explorer.DataFrame<
- Polars[178 x 2]
- class integer [1, 1, 1, 1, 1, ...]
- hue float [1.04, 1.05, 1.03, 0.86, 1.04, ...]
->
Given you can also access a series using its index, you can use
-multiple accesses to select a column and row at the same time:
iex> df = Explorer.Datasets.wine()
-iex> df["class"][3]
+such as R or Python. For example:
iex> df = Explorer.Datasets.wine()
+iex> df["class"]
+#Explorer.Series<
+ Polars[178]
+ integer [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
+>
Accessing the dataframe with a column name, either as a string or an atom, will return
+the column. You can also pass an integer representing the column order:
iex> df = Explorer.Datasets.wine()
+iex> df[0]
+#Explorer.Series<
+ Polars[178]
+ integer [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
+>
You can also pass a list, a range, or a regex to return a dataframe containing
+only the matching columns. For example, by passing a list:
iex> df = Explorer.Datasets.wine()
+iex> df[["class", "hue"]]
+#Explorer.DataFrame<
+ Polars[178 x 2]
+ class integer [1, 1, 1, 1, 1, ...]
+ hue float [1.04, 1.05, 1.03, 0.86, 1.04, ...]
+>
Or a range for the given positions:
iex> df = Explorer.Datasets.wine()
+iex> df[0..2]
+#Explorer.DataFrame<
+ Polars[178 x 3]
+ class integer [1, 1, 1, 1, 1, ...]
+ alcohol float [14.23, 13.2, 13.16, 14.37, 13.24, ...]
+ malic_acid float [1.71, 1.78, 2.36, 1.95, 2.59, ...]
+>
Or a regex to keep only columns matching a given pattern:
iex> df = Explorer.Datasets.wine()
+iex> df[~r/(class|hue)/]
+#Explorer.DataFrame<
+ Polars[178 x 2]
+ class integer [1, 1, 1, 1, 1, ...]
+ hue float [1.04, 1.05, 1.03, 0.86, 1.04, ...]
+>
Given you can also access a series using its index, you can use
+multiple accesses to select a column and row at the same time:
iex> df = Explorer.Datasets.wine()
+iex> df["class"][3]
1
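The text above says a column can be fetched by an atom as well as a string, but the examples only use strings. A small sketch on a hypothetical two-column dataframe (instead of the wine dataset), assuming Explorer "~> 0.7" is installed via Mix.install:

```elixir
# Assumption: Explorer is installed via Mix.install; adjust the version to your project.
Mix.install([{:explorer, "~> 0.7"}])

# Hypothetical data shaped like the first rows of the wine dataset
df = Explorer.DataFrame.new(class: [1, 1, 2], hue: [1.04, 1.05, 1.03])

# An atom key selects the same column as the equivalent string key
by_atom = df[:class]
by_string = df["class"]

# Chaining a second access picks a single row (index 2) from the selected series
third_class = df[:class][2]

IO.inspect({Explorer.Series.to_list(by_atom), third_class})
```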
@@ -1603,15 +1603,15 @@
Series can be given either as keyword lists or maps where the keys are the names and the values are series:
iex> Explorer.DataFrame.new(%{
-...> floats: Explorer.Series.from_list([1.0, 2.0]),
-...> ints: Explorer.Series.from_list([1, nil])
-...> })
-#Explorer.DataFrame<
- Polars[2 x 2]
- floats float [1.0, 2.0]
- ints integer [1, nil]
->
iex> Explorer.DataFrame.new(%{
+...> floats: Explorer.Series.from_list([1.0, 2.0]),
+...> ints: Explorer.Series.from_list([1, nil])
+...> })
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ floats float [1.0, 2.0]
+ ints integer [1, nil]
+>
To create a dataframe from tensors, you can pass a matrix as an argument. Each matrix column becomes a dataframe column named x1, x2, x3, etc.:
iex> Explorer.DataFrame.new(Nx.tensor([
-...> [1, 2, 3],
-...> [4, 5, 6]
-...> ]))
-#Explorer.DataFrame<
- Polars[2 x 3]
- x1 integer [1, 4]
- x2 integer [2, 5]
- x3 integer [3, 6]
->
Explorer expects tensors to have certain types, so you may need to cast
-the data accordingly. See Explorer.Series.from_tensor/2
for more info.
You can also pass a keyword list or a map of vectors (rank 1 tensors):
iex> Explorer.DataFrame.new(%{
-...> floats: Nx.tensor([1.0, 2.0], type: :f64),
-...> ints: Nx.tensor([3, 4])
-...> })
-#Explorer.DataFrame<
- Polars[2 x 2]
- floats float [1.0, 2.0]
- ints integer [3, 4]
->
Use dtypes to force a particular representation:
iex> Explorer.DataFrame.new([
-...> floats: Nx.tensor([1.0, 2.0], type: :f64),
-...> times: Nx.tensor([3_000, 4_000])
-...> ], dtypes: [times: :time])
-#Explorer.DataFrame<
- Polars[2 x 2]
- floats float [1.0, 2.0]
- times time [00:00:00.000003, 00:00:00.000004]
->
iex> Explorer.DataFrame.new(Nx.tensor([
+...> [1, 2, 3],
+...> [4, 5, 6]
+...> ]))
+#Explorer.DataFrame<
+ Polars[2 x 3]
+ x1 integer [1, 4]
+ x2 integer [2, 5]
+ x3 integer [3, 6]
+>
Explorer expects tensors to have certain types, so you may need to cast
+the data accordingly. See Explorer.Series.from_tensor/2
for more info.
You can also pass a keyword list or a map of vectors (rank 1 tensors):
iex> Explorer.DataFrame.new(%{
+...> floats: Nx.tensor([1.0, 2.0], type: :f64),
+...> ints: Nx.tensor([3, 4])
+...> })
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ floats float [1.0, 2.0]
+ ints integer [3, 4]
+>
Use dtypes to force a particular representation:
iex> Explorer.DataFrame.new([
+...> floats: Nx.tensor([1.0, 2.0], type: :f64),
+...> times: Nx.tensor([3_000, 4_000])
+...> ], dtypes: [times: :time])
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ floats float [1.0, 2.0]
+ times time [00:00:00.000003, 00:00:00.000004]
+>
Tabular data can be either columnar or row-based.
-Let's start with column data:
iex> Explorer.DataFrame.new(%{floats: [1.0, 2.0], ints: [1, nil]})
-#Explorer.DataFrame<
- Polars[2 x 2]
- floats float [1.0, 2.0]
- ints integer [1, nil]
->
-
-iex> Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
-#Explorer.DataFrame<
- Polars[2 x 2]
- floats float [1.0, 2.0]
- ints integer [1, nil]
->
-
-iex> Explorer.DataFrame.new([floats: [1.0, 2.0], ints: [1, nil], binaries: [<<239, 191, 19>>, nil]], dtypes: [{:binaries, :binary}])
-#Explorer.DataFrame<
- Polars[2 x 3]
- floats float [1.0, 2.0]
- ints integer [1, nil]
- binaries binary [<<239, 191, 19>>, nil]
->
-
-iex> Explorer.DataFrame.new(%{floats: [1.0, 2.0], ints: [1, "wrong"]})
-** (ArgumentError) cannot create series "ints": the value "wrong" does not match the inferred series dtype :integer
From row data:
iex> rows = [%{id: 1, name: "José"}, %{id: 2, name: "Christopher"}, %{id: 3, name: "Cristine"}]
-iex> Explorer.DataFrame.new(rows)
-#Explorer.DataFrame<
- Polars[3 x 2]
- id integer [1, 2, 3]
- name string ["José", "Christopher", "Cristine"]
->
-
-iex> rows = [[id: 1, name: "José"], [id: 2, name: "Christopher"], [id: 3, name: "Cristine"]]
-iex> Explorer.DataFrame.new(rows)
-#Explorer.DataFrame<
- Polars[3 x 2]
- id integer [1, 2, 3]
- name string ["José", "Christopher", "Cristine"]
->
+Let's start with column data:
iex> Explorer.DataFrame.new(%{floats: [1.0, 2.0], ints: [1, nil]})
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ floats float [1.0, 2.0]
+ ints integer [1, nil]
+>
+
+iex> Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ floats float [1.0, 2.0]
+ ints integer [1, nil]
+>
+
+iex> Explorer.DataFrame.new([floats: [1.0, 2.0], ints: [1, nil], binaries: [<<239, 191, 19>>, nil]], dtypes: [{:binaries, :binary}])
+#Explorer.DataFrame<
+ Polars[2 x 3]
+ floats float [1.0, 2.0]
+ ints integer [1, nil]
+ binaries binary [<<239, 191, 19>>, nil]
+>
+
+iex> Explorer.DataFrame.new(%{floats: [1.0, 2.0], ints: [1, "wrong"]})
+** (ArgumentError) cannot create series "ints": the value "wrong" does not match the inferred series dtype :integer
From row data:
iex> rows = [%{id: 1, name: "José"}, %{id: 2, name: "Christopher"}, %{id: 3, name: "Cristine"}]
+iex> Explorer.DataFrame.new(rows)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ id integer [1, 2, 3]
+ name string ["José", "Christopher", "Cristine"]
+>
+
+iex> rows = [[id: 1, name: "José"], [id: 2, name: "Christopher"], [id: 3, name: "Cristine"]]
+iex> Explorer.DataFrame.new(rows)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ id integer [1, 2, 3]
+ name string ["José", "Christopher", "Cristine"]
+>
iex> df = Explorer.DataFrame.new(ints: [1, nil], floats: [1.0, 2.0])
-iex> Explorer.DataFrame.to_columns(df)
-%{"floats" => [1.0, 2.0], "ints" => [1, nil]}
+iex> df = Explorer.DataFrame.new(ints: [1, nil], floats: [1.0, 2.0])
+iex> Explorer.DataFrame.to_columns(df)
+%{"floats" => [1.0, 2.0], "ints" => [1, nil]}
-iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
-iex> Explorer.DataFrame.to_columns(df, atom_keys: true)
-%{floats: [1.0, 2.0], ints: [1, nil]}
+iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
+iex> Explorer.DataFrame.to_columns(df, atom_keys: true)
+%{floats: [1.0, 2.0], ints: [1, nil]}
iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
-iex> Explorer.DataFrame.to_rows(df)
-[%{"floats" => 1.0, "ints" => 1}, %{"floats" => 2.0, "ints" => nil}]
+iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
+iex> Explorer.DataFrame.to_rows(df)
+[%{"floats" => 1.0, "ints" => 1}, %{"floats" => 2.0, "ints" => nil}]
-iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
-iex> Explorer.DataFrame.to_rows(df, atom_keys: true)
-[%{floats: 1.0, ints: 1}, %{floats: 2.0, ints: nil}]
+iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
+iex> Explorer.DataFrame.to_rows(df, atom_keys: true)
+[%{floats: 1.0, ints: 1}, %{floats: 2.0, ints: nil}]
iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
-iex> Explorer.DataFrame.to_rows_stream(df) |> Enum.map(& &1)
-[%{"floats" => 1.0, "ints" => 1}, %{"floats" => 2.0, "ints" => nil}]
+iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
+iex> Explorer.DataFrame.to_rows_stream(df) |> Enum.map(& &1)
+[%{"floats" => 1.0, "ints" => 1}, %{"floats" => 2.0, "ints" => nil}]
-iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
-iex> Explorer.DataFrame.to_rows_stream(df, atom_keys: true) |> Enum.map(& &1)
-[%{floats: 1.0, ints: 1}, %{floats: 2.0, ints: nil}]
+iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
+iex> Explorer.DataFrame.to_rows_stream(df, atom_keys: true) |> Enum.map(& &1)
+[%{floats: 1.0, ints: 1}, %{floats: 2.0, ints: nil}]
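The examples above consume the whole stream with Enum.map(& &1). Because to_rows_stream/2 returns an enumerable, a consumer can also stop early. A minimal sketch, assuming Explorer "~> 0.7" is installed via Mix.install, that takes only the first row:

```elixir
# Assumption: Explorer is installed via Mix.install; adjust the version to your project.
Mix.install([{:explorer, "~> 0.7"}])

df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])

# Enum.take/2 stops consuming the enumerable after the first element
first_rows =
  df
  |> Explorer.DataFrame.to_rows_stream(atom_keys: true)
  |> Enum.take(1)

IO.inspect(first_rows)
```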
iex> df = Explorer.DataFrame.new(ints: [1, nil], floats: [1.0, 2.0])
-iex> map = Explorer.DataFrame.to_series(df)
-iex> Explorer.Series.to_list(map["floats"])
-[1.0, 2.0]
-iex> Explorer.Series.to_list(map["ints"])
-[1, nil]
+iex> df = Explorer.DataFrame.new(ints: [1, nil], floats: [1.0, 2.0])
+iex> map = Explorer.DataFrame.to_series(df)
+iex> Explorer.Series.to_list(map["floats"])
+[1.0, 2.0]
+iex> Explorer.Series.to_list(map["ints"])
+[1, nil]
A single column name will sort ascending by that column:
iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
-iex> Explorer.DataFrame.arrange(df, a)
-#Explorer.DataFrame<
- Polars[3 x 2]
- a string ["a", "b", "c"]
- b integer [3, 1, 2]
->
You can also sort descending:
iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
-iex> Explorer.DataFrame.arrange(df, desc: a)
-#Explorer.DataFrame<
- Polars[3 x 2]
- a string ["c", "b", "a"]
- b integer [2, 1, 3]
->
Sorting by more than one column sorts them in the order they are entered:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.arrange(df, asc: total, desc: country)
-#Explorer.DataFrame<
- Polars[1094 x 10]
- year integer [2010, 2010, 2011, 2011, 2012, ...]
- country string ["NIUE", "TUVALU", "TUVALU", "NIUE", "NIUE", ...]
- total integer [1, 2, 2, 2, 2, ...]
- solid_fuel integer [0, 0, 0, 0, 0, ...]
- liquid_fuel integer [1, 2, 2, 2, 2, ...]
- gas_fuel integer [0, 0, 0, 0, 0, ...]
- cement integer [0, 0, 0, 0, 0, ...]
- gas_flaring integer [0, 0, 0, 0, 0, ...]
- per_capita float [0.52, 0.0, 0.0, 1.04, 1.04, ...]
- bunker_fuels integer [0, 0, 0, 0, 0, ...]
->
A single column name will sort ascending by that column:
iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
+iex> Explorer.DataFrame.arrange(df, a)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a string ["a", "b", "c"]
+ b integer [3, 1, 2]
+>
You can also sort descending:
iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
+iex> Explorer.DataFrame.arrange(df, desc: a)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a string ["c", "b", "a"]
+ b integer [2, 1, 3]
+>
Sorting by more than one column sorts them in the order they are entered:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.arrange(df, asc: total, desc: country)
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ year integer [2010, 2010, 2011, 2011, 2012, ...]
+ country string ["NIUE", "TUVALU", "TUVALU", "NIUE", "NIUE", ...]
+ total integer [1, 2, 2, 2, 2, ...]
+ solid_fuel integer [0, 0, 0, 0, 0, ...]
+ liquid_fuel integer [1, 2, 2, 2, 2, ...]
+ gas_fuel integer [0, 0, 0, 0, 0, ...]
+ cement integer [0, 0, 0, 0, 0, ...]
+ gas_flaring integer [0, 0, 0, 0, 0, ...]
+ per_capita float [0.52, 0.0, 0.0, 1.04, 1.04, ...]
+ bunker_fuels integer [0, 0, 0, 0, 0, ...]
+>
Here is an example using the Iris dataset. We group by species and then try to sort the dataframe by species and sepal width, but only "sepal width" is taken into account
-because "species" is a group.
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.arrange(grouped, desc: species, asc: sepal_width)
-#Explorer.DataFrame<
- Polars[150 x 5]
- Groups: ["species"]
- sepal_length float [4.5, 4.4, 4.9, 4.8, 4.3, ...]
- sepal_width float [2.3, 2.9, 3.0, 3.0, 3.0, ...]
- petal_length float [1.3, 1.4, 1.4, 1.4, 1.1, ...]
- petal_width float [0.3, 0.2, 0.2, 0.1, 0.1, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
+because "species" is a group.
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.arrange(grouped, desc: species, asc: sepal_width)
+#Explorer.DataFrame<
+ Polars[150 x 5]
+ Groups: ["species"]
+ sepal_length float [4.5, 4.4, 4.9, 4.8, 4.3, ...]
+ sepal_width float [2.3, 2.9, 3.0, 3.0, 3.0, ...]
+ petal_length float [1.3, 1.4, 1.4, 1.4, 1.1, ...]
+ petal_width float [0.3, 0.2, 0.2, 0.1, 0.1, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
A single column name will sort ascending by that column:
iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
-iex> Explorer.DataFrame.arrange_with(df, &(&1["a"]))
-#Explorer.DataFrame<
- Polars[3 x 2]
- a string ["a", "b", "c"]
- b integer [3, 1, 2]
->
You can also sort descending:
iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
-iex> Explorer.DataFrame.arrange_with(df, &[desc: &1["a"]])
-#Explorer.DataFrame<
- Polars[3 x 2]
- a string ["c", "b", "a"]
- b integer [2, 1, 3]
->
Sorting by more than one column sorts them in the order they are entered:
iex> df = Explorer.DataFrame.new(a: [3, 1, 3], b: [2, 1, 3])
-iex> Explorer.DataFrame.arrange_with(df, &[desc: &1["a"], asc: &1["b"]])
-#Explorer.DataFrame<
- Polars[3 x 2]
- a integer [3, 3, 1]
- b integer [2, 3, 1]
->
A single column name will sort ascending by that column:
iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
+iex> Explorer.DataFrame.arrange_with(df, &(&1["a"]))
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a string ["a", "b", "c"]
+ b integer [3, 1, 2]
+>
You can also sort descending:
iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
+iex> Explorer.DataFrame.arrange_with(df, &[desc: &1["a"]])
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a string ["c", "b", "a"]
+ b integer [2, 1, 3]
+>
Sorting by more than one column sorts them in the order they are entered:
iex> df = Explorer.DataFrame.new(a: [3, 1, 3], b: [2, 1, 3])
+iex> Explorer.DataFrame.arrange_with(df, &[desc: &1["a"], asc: &1["b"]])
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a integer [3, 3, 1]
+ b integer [2, 3, 1]
+>
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.arrange_with(grouped, &[desc: &1["species"], asc: &1["sepal_width"]])
-#Explorer.DataFrame<
- Polars[150 x 5]
- Groups: ["species"]
- sepal_length float [4.5, 4.4, 4.9, 4.8, 4.3, ...]
- sepal_width float [2.3, 2.9, 3.0, 3.0, 3.0, ...]
- petal_length float [1.3, 1.4, 1.4, 1.4, 1.1, ...]
- petal_width float [0.3, 0.2, 0.2, 0.1, 0.1, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
+iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.arrange_with(grouped, &[desc: &1["species"], asc: &1["sepal_width"]])
+#Explorer.DataFrame<
+ Polars[150 x 5]
+ Groups: ["species"]
+ sepal_length float [4.5, 4.4, 4.9, 4.8, 4.3, ...]
+ sepal_width float [2.3, 2.9, 3.0, 3.0, 3.0, ...]
+ petal_length float [1.3, 1.4, 1.4, 1.4, 1.1, ...]
+ petal_width float [0.3, 0.2, 0.2, 0.1, 0.1, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
iex> df = Explorer.DataFrame.new(a: ["d", nil, "f"], b: [1, 2, 3], c: ["a", "b", "c"])
-iex> Explorer.DataFrame.describe(df)
-#Explorer.DataFrame<
- Polars[9 x 4]
- describe string ["count", "null_count", "mean", "std", "min", ...]
- a string ["3", "1", nil, nil, "d", ...]
- b float [3.0, 0.0, 2.0, 1.0, 1.0, ...]
- c string ["3", "0", nil, nil, "a", ...]
->
-
-iex> df = Explorer.DataFrame.new(a: ["d", nil, "f"], b: [1, 2, 3], c: ["a", "b", "c"])
-iex> Explorer.DataFrame.describe(df, percentiles: [0.3, 0.5, 0.8])
-#Explorer.DataFrame<
- Polars[9 x 4]
- describe string ["count", "null_count", "mean", "std", "min", ...]
- a string ["3", "1", nil, nil, "d", ...]
- b float [3.0, 0.0, 2.0, 1.0, 1.0, ...]
- c string ["3", "0", nil, nil, "a", ...]
->
+iex> df = Explorer.DataFrame.new(a: ["d", nil, "f"], b: [1, 2, 3], c: ["a", "b", "c"])
+iex> Explorer.DataFrame.describe(df)
+#Explorer.DataFrame<
+ Polars[9 x 4]
+ describe string ["count", "null_count", "mean", "std", "min", ...]
+ a string ["3", "1", nil, nil, "d", ...]
+ b float [3.0, 0.0, 2.0, 1.0, 1.0, ...]
+ c string ["3", "0", nil, nil, "a", ...]
+>
+
+iex> df = Explorer.DataFrame.new(a: ["d", nil, "f"], b: [1, 2, 3], c: ["a", "b", "c"])
+iex> Explorer.DataFrame.describe(df, percentiles: [0.3, 0.5, 0.8])
+#Explorer.DataFrame<
+ Polars[9 x 4]
+ describe string ["count", "null_count", "mean", "std", "min", ...]
+ a string ["3", "1", nil, nil, "d", ...]
+ b float [3.0, 0.0, 2.0, 1.0, 1.0, ...]
+ c string ["3", "0", nil, nil, "a", ...]
+>
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.discard(df, ["b"])
-#Explorer.DataFrame<
- Polars[3 x 1]
- a string ["a", "b", "c"]
->
-
-iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
-iex> Explorer.DataFrame.discard(df, ["a", "b"])
-#Explorer.DataFrame<
- Polars[3 x 1]
- c integer [4, 5, 6]
->
Ranges, regexes, and functions are also accepted in column names, as in select/2.
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.discard(df, ["b"])
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a string ["a", "b", "c"]
+>
+
+iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
+iex> Explorer.DataFrame.discard(df, ["a", "b"])
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ c integer [4, 5, 6]
+>
Ranges, regexes, and functions are also accepted in column names, as in select/2.
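A short sketch of discard/2 with a regex and with a range instead of a list of names, assuming Explorer "~> 0.7" is installed via Mix.install. The dataframe is hypothetical; both calls should drop the columns "b" and "bb".

```elixir
# Assumption: Explorer is installed via Mix.install; adjust the version to your project.
Mix.install([{:explorer, "~> 0.7"}])

df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3], bb: [4, 5, 6])

# A regex discards every column whose name matches it ("b" and "bb" here)
by_regex = Explorer.DataFrame.discard(df, ~r/^b/)

# A range discards columns by position (positions 1 and 2 are "b" and "bb")
by_range = Explorer.DataFrame.discard(df, 1..2)

IO.inspect({Explorer.DataFrame.names(by_regex), Explorer.DataFrame.names(by_range)})
```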
You cannot discard grouped columns. You need to ungroup before removing them:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.discard(grouped, ["species"])
-#Explorer.DataFrame<
- Polars[150 x 5]
- Groups: ["species"]
- sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
- sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
- petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
- petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
+You cannot discard grouped columns. You need to ungroup before removing them:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.discard(grouped, ["species"])
+#Explorer.DataFrame<
+ Polars[150 x 5]
+ Groups: ["species"]
+ sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
+ sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
+ petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
+ petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
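The example above shows that discarding a group column is ignored. A minimal sketch of the suggested workaround (ungroup first, then discard), assuming Explorer "~> 0.7" is installed via Mix.install and using a small hypothetical dataframe instead of the Iris dataset:

```elixir
# Assumption: Explorer is installed via Mix.install; adjust the version to your project.
Mix.install([{:explorer, "~> 0.7"}])

alias Explorer.DataFrame, as: DF

df = DF.new(species: ["setosa", "virginica"], sepal_width: [3.5, 3.0])
grouped = DF.group_by(df, "species")

# Remove the grouping first; only then can the former group column be discarded
result =
  grouped
  |> DF.ungroup()
  |> DF.discard(["species"])

IO.inspect(DF.names(result))
```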
By default, it returns the unique values of the requested columns:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.distinct(df, ["year", "country"])
-#Explorer.DataFrame<
- Polars[1094 x 2]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
->
If keep_all is set to true, then the first value of each column not in the requested
-columns will be returned:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.distinct(df, ["year", "country"], keep_all: true)
-#Explorer.DataFrame<
- Polars[1094 x 10]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- total integer [2308, 1254, 32500, 141, 7924, ...]
- solid_fuel integer [627, 117, 332, 0, 0, ...]
- liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
- gas_fuel integer [74, 7, 14565, 0, 374, ...]
- cement integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
->
A callback on the dataframe's names can be passed instead of a list (like select/2):
iex> df = Explorer.DataFrame.new(x1: [1, 3, 3], x2: ["a", "c", "c"], y1: [1, 2, 3])
-iex> Explorer.DataFrame.distinct(df, &String.starts_with?(&1, "x"))
-#Explorer.DataFrame<
- Polars[2 x 2]
- x1 integer [1, 3]
- x2 string ["a", "c"]
->
If the dataframe has groups, then the columns of each group will be added to the distinct columns:
iex> df = Explorer.DataFrame.new(x1: [1, 3, 3], x2: ["a", "c", "c"], y1: [1, 2, 3])
-iex> df = Explorer.DataFrame.group_by(df, "x1")
-iex> Explorer.DataFrame.distinct(df, ["x2"])
-#Explorer.DataFrame<
- Polars[2 x 2]
- Groups: ["x1"]
- x1 integer [1, 3]
- x2 string ["a", "c"]
->
+By default, it returns the unique values of the requested columns:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.distinct(df, ["year", "country"])
+#Explorer.DataFrame<
+ Polars[1094 x 2]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+>
If keep_all is set to true, then the first value of each column not in the requested
+columns will be returned:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.distinct(df, ["year", "country"], keep_all: true)
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ total integer [2308, 1254, 32500, 141, 7924, ...]
+ solid_fuel integer [627, 117, 332, 0, 0, ...]
+ liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
+ gas_fuel integer [74, 7, 14565, 0, 374, ...]
+ cement integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+>
A callback on the dataframe's names can be passed instead of a list (like select/2):
iex> df = Explorer.DataFrame.new(x1: [1, 3, 3], x2: ["a", "c", "c"], y1: [1, 2, 3])
+iex> Explorer.DataFrame.distinct(df, &String.starts_with?(&1, "x"))
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ x1 integer [1, 3]
+ x2 string ["a", "c"]
+>
If the dataframe has groups, then the columns of each group will be added to the distinct columns:
iex> df = Explorer.DataFrame.new(x1: [1, 3, 3], x2: ["a", "c", "c"], y1: [1, 2, 3])
+iex> df = Explorer.DataFrame.group_by(df, "x1")
+iex> Explorer.DataFrame.distinct(df, ["x2"])
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ Groups: ["x1"]
+ x1 integer [1, 3]
+ x2 string ["a", "c"]
+>
To drop nils on all columns:
iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3])
-iex> Explorer.DataFrame.drop_nil(df)
-#Explorer.DataFrame<
- Polars[1 x 2]
- a integer [1]
- b integer [1]
->
To drop nils on a single column:
iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3])
-iex> Explorer.DataFrame.drop_nil(df, :a)
-#Explorer.DataFrame<
- Polars[2 x 2]
- a integer [1, 2]
- b integer [1, nil]
->
To drop nils on some columns:
iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3], c: [nil, 5, 6])
-iex> Explorer.DataFrame.drop_nil(df, [:a, :c])
-#Explorer.DataFrame<
- Polars[1 x 3]
- a integer [2]
- b integer [nil]
- c integer [5]
->
Ranges, regexes, and functions are also accepted in column names, as in select/2.
To drop nils on all columns:
iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3])
+iex> Explorer.DataFrame.drop_nil(df)
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ a integer [1]
+ b integer [1]
+>
To drop nils on a single column:
iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3])
+iex> Explorer.DataFrame.drop_nil(df, :a)
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ a integer [1, 2]
+ b integer [1, nil]
+>
To drop nils on some columns:
iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3], c: [nil, 5, 6])
+iex> Explorer.DataFrame.drop_nil(df, [:a, :c])
+#Explorer.DataFrame<
+ Polars[1 x 3]
+ a integer [2]
+ b integer [nil]
+ c integer [5]
+>
Ranges, regexes, and functions are also accepted in column names, as in select/2.
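A sketch of drop_nil/2 with a range of column positions, assuming Explorer "~> 0.7" is installed via Mix.install. Only the selected columns are checked, so a nil in an unselected column survives:

```elixir
# Assumption: Explorer is installed via Mix.install; adjust the version to your project.
Mix.install([{:explorer, "~> 0.7"}])

df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3], c: [nil, 5, 6])

# Only columns at positions 0 and 1 ("a" and "b") are checked for nils;
# the nil in column "c" does not cause its row to be dropped
kept = Explorer.DataFrame.drop_nil(df, 0..1)

IO.inspect(Explorer.DataFrame.to_columns(kept, atom_keys: true))
```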
To mark a single column as dummy:
iex> df = Explorer.DataFrame.new(col_x: ["a", "b", "a", "c"], col_y: ["b", "a", "b", "d"])
-iex> Explorer.DataFrame.dummies(df, "col_x")
-#Explorer.DataFrame<
- Polars[4 x 3]
- col_x_a integer [1, 0, 1, 0]
- col_x_b integer [0, 1, 0, 0]
- col_x_c integer [0, 0, 0, 1]
->
Or multiple columns:
iex> df = Explorer.DataFrame.new(col_x: ["a", "b", "a", "c"], col_y: ["b", "a", "b", "d"])
-iex> Explorer.DataFrame.dummies(df, ["col_x", "col_y"])
-#Explorer.DataFrame<
- Polars[4 x 6]
- col_x_a integer [1, 0, 1, 0]
- col_x_b integer [0, 1, 0, 0]
- col_x_c integer [0, 0, 0, 1]
- col_y_b integer [1, 0, 1, 0]
- col_y_a integer [0, 1, 0, 0]
- col_y_d integer [0, 0, 0, 1]
->
Or all string columns:
iex> df = Explorer.DataFrame.new(num: [1, 2, 3, 4], col_y: ["b", "a", "b", "d"])
-iex> Explorer.DataFrame.dummies(df, fn _name, type -> type == :string end)
-#Explorer.DataFrame<
- Polars[4 x 3]
- col_y_b integer [1, 0, 1, 0]
- col_y_a integer [0, 1, 0, 0]
- col_y_d integer [0, 0, 0, 1]
->
Ranges, regexes, and functions are also accepted in column names, as in select/2.
To mark a single column as dummy:
iex> df = Explorer.DataFrame.new(col_x: ["a", "b", "a", "c"], col_y: ["b", "a", "b", "d"])
+iex> Explorer.DataFrame.dummies(df, "col_x")
+#Explorer.DataFrame<
+ Polars[4 x 3]
+ col_x_a integer [1, 0, 1, 0]
+ col_x_b integer [0, 1, 0, 0]
+ col_x_c integer [0, 0, 0, 1]
+>
Or multiple columns:
iex> df = Explorer.DataFrame.new(col_x: ["a", "b", "a", "c"], col_y: ["b", "a", "b", "d"])
+iex> Explorer.DataFrame.dummies(df, ["col_x", "col_y"])
+#Explorer.DataFrame<
+ Polars[4 x 6]
+ col_x_a integer [1, 0, 1, 0]
+ col_x_b integer [0, 1, 0, 0]
+ col_x_c integer [0, 0, 0, 1]
+ col_y_b integer [1, 0, 1, 0]
+ col_y_a integer [0, 1, 0, 0]
+ col_y_d integer [0, 0, 0, 1]
+>
Or all string columns:
iex> df = Explorer.DataFrame.new(num: [1, 2, 3, 4], col_y: ["b", "a", "b", "d"])
+iex> Explorer.DataFrame.dummies(df, fn _name, type -> type == :string end)
+#Explorer.DataFrame<
+ Polars[4 x 3]
+ col_y_b integer [1, 0, 1, 0]
+ col_y_a integer [0, 1, 0, 0]
+ col_y_d integer [0, 0, 0, 1]
+>
Ranges, regexes, and functions are also accepted in column names, as in select/2.
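The dummies examples build one 0/1 column per distinct value and name it `<column>_<value>`. The sketch below shows that one-hot encoding in plain Elixir for a single column, using the hypothetical DummiesSketch module. The categories are emitted in order of first appearance. That order matches the outputs above, but it is an assumption of this sketch, not a documented guarantee.

```elixir
defmodule DummiesSketch do
  # Hypothetical one-hot encoder for a single column.
  # Returns a list of {column_name, 0/1 values} tuples, one per distinct value,
  # in order of first appearance in `values`.
  def dummies(name, values) do
    values
    |> Enum.uniq()
    |> Enum.map(fn category ->
      indicator = Enum.map(values, fn v -> if v == category, do: 1, else: 0 end)
      {"#{name}_#{category}", indicator}
    end)
  end
end
```

For example, `DummiesSketch.dummies("col_x", ["a", "b", "a", "c"])` yields the same three columns as the first dummies example.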
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.filter(df, col2 > 2)
-#Explorer.DataFrame<
- Polars[1 x 2]
- col1 string ["c"]
- col2 integer [3]
->
-
-iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.filter(df, col1 == "b")
-#Explorer.DataFrame<
- Polars[1 x 2]
- col1 string ["b"]
- col2 integer [2]
->
-
-iex> df = Explorer.DataFrame.new(col1: [5, 4, 3], col2: [1, 2, 3])
-iex> Explorer.DataFrame.filter(df, [col1 > 3, col2 < 3])
-#Explorer.DataFrame<
- Polars[2 x 2]
- col1 integer [5, 4]
- col2 integer [1, 2]
->
Returning a non-boolean expression errors:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.filter(df, cumulative_max(col2))
-** (ArgumentError) expecting the function to return a boolean LazySeries, but instead it returned a LazySeries of type :integer
Which can be addressed by converting it to boolean:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.filter(df, cumulative_max(col2) == 1)
-#Explorer.DataFrame<
- Polars[1 x 2]
- col1 string ["a"]
- col2 integer [1]
->
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.filter(df, col2 > 2)
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ col1 string ["c"]
+ col2 integer [3]
+>
+
+iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.filter(df, col1 == "b")
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ col1 string ["b"]
+ col2 integer [2]
+>
+
+iex> df = Explorer.DataFrame.new(col1: [5, 4, 3], col2: [1, 2, 3])
+iex> Explorer.DataFrame.filter(df, [col1 > 3, col2 < 3])
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ col1 integer [5, 4]
+ col2 integer [1, 2]
+>
Returning a non-boolean expression errors:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.filter(df, cumulative_max(col2))
+** (ArgumentError) expecting the function to return a boolean LazySeries, but instead it returned a LazySeries of type :integer
Which can be addressed by converting it to boolean:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.filter(df, cumulative_max(col2) == 1)
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ col1 string ["a"]
+ col2 integer [1]
+>
In a grouped dataframe, the aggregation is calculated within each group.
In the following example we select the flowers of the Iris dataset that have the "petal length"
-above the average of each species group.
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.filter(grouped, petal_length > mean(petal_length))
-#Explorer.DataFrame<
- Polars[79 x 5]
- Groups: ["species"]
- sepal_length float [4.6, 5.4, 5.0, 4.9, 5.4, ...]
- sepal_width float [3.1, 3.9, 3.4, 3.1, 3.7, ...]
- petal_length float [1.5, 1.7, 1.5, 1.5, 1.5, ...]
- petal_width float [0.2, 0.4, 0.2, 0.1, 0.2, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
+above the average of each species group.
+iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.filter(grouped, petal_length > mean(petal_length))
+#Explorer.DataFrame<
+ Polars[79 x 5]
+ Groups: ["species"]
+ sepal_length float [4.6, 5.4, 5.0, 4.9, 5.4, ...]
+ sepal_width float [3.1, 3.9, 3.4, 3.1, 3.7, ...]
+ petal_length float [1.5, 1.7, 1.5, 1.5, 1.5, ...]
+ petal_width float [0.2, 0.4, 0.2, 0.1, 0.2, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
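In a grouped dataframe, `mean(petal_length)` is computed separately for each group, and each row is then compared with its own group's mean. The sketch below spells out that two-step logic in plain Elixir over row maps, using the hypothetical GroupedFilterSketch module. It illustrates the semantics only; it does not reflect how the Polars backend evaluates the expression.

```elixir
defmodule GroupedFilterSketch do
  # Hypothetical model of filter(grouped, value > mean(value)).
  # Step 1: compute the mean of `value_key` per group.
  # Step 2: keep rows whose value is strictly greater than their group's mean.
  def filter_above_group_mean(rows, group_key, value_key) do
    means =
      rows
      |> Enum.group_by(&Map.fetch!(&1, group_key), &Map.fetch!(&1, value_key))
      |> Map.new(fn {group, values} -> {group, Enum.sum(values) / length(values)} end)

    Enum.filter(rows, fn row ->
      Map.fetch!(row, value_key) > Map.fetch!(means, Map.fetch!(row, group_key))
    end)
  end
end
```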
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.filter_with(df, &Explorer.Series.greater(&1["col2"], 2))
-#Explorer.DataFrame<
- Polars[1 x 2]
- col1 string ["c"]
- col2 integer [3]
->
-
-iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.filter_with(df, fn df -> Explorer.Series.equal(df["col1"], "b") end)
-#Explorer.DataFrame<
- Polars[1 x 2]
- col1 string ["b"]
- col2 integer [2]
->
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.filter_with(df, &Explorer.Series.greater(&1["col2"], 2))
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ col1 string ["c"]
+ col2 integer [3]
+>
+
+iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.filter_with(df, fn df -> Explorer.Series.equal(df["col1"], "b") end)
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ col1 string ["b"]
+ col2 integer [2]
+>
In a grouped dataframe, the aggregation is calculated within each group.
In the following example we select the flowers of the Iris dataset that have the "petal length"
-above the average of each species group.
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.filter_with(grouped, &Explorer.Series.greater(&1["petal_length"], Explorer.Series.mean(&1["petal_length"])))
-#Explorer.DataFrame<
- Polars[79 x 5]
- Groups: ["species"]
- sepal_length float [4.6, 5.4, 5.0, 4.9, 5.4, ...]
- sepal_width float [3.1, 3.9, 3.4, 3.1, 3.7, ...]
- petal_length float [1.5, 1.7, 1.5, 1.5, 1.5, ...]
- petal_width float [0.2, 0.4, 0.2, 0.1, 0.2, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
+above the average of each species group.
+iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.filter_with(grouped, &Explorer.Series.greater(&1["petal_length"], Explorer.Series.mean(&1["petal_length"])))
+#Explorer.DataFrame<
+ Polars[79 x 5]
+ Groups: ["species"]
+ sepal_length float [4.6, 5.4, 5.0, 4.9, 5.4, ...]
+ sepal_width float [3.1, 3.9, 3.4, 3.1, 3.7, ...]
+ petal_length float [1.5, 1.7, 1.5, 1.5, 1.5, ...]
+ petal_width float [0.2, 0.4, 0.2, 0.1, 0.2, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
iex> df = Explorer.DataFrame.new(a: ["a", "a", "b"], b: [1, 1, nil])
-iex> Explorer.DataFrame.frequencies(df, [:a, :b])
-#Explorer.DataFrame<
- Polars[2 x 3]
- a string ["a", "b"]
- b integer [1, nil]
- counts integer [2, 1]
->
+iex> df = Explorer.DataFrame.new(a: ["a", "a", "b"], b: [1, 1, nil])
+iex> Explorer.DataFrame.frequencies(df, [:a, :b])
+#Explorer.DataFrame<
+ Polars[2 x 3]
+ a string ["a", "b"]
+ b integer [1, nil]
+ counts integer [2, 1]
+>
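The frequencies example counts how often each combination of the selected columns occurs. The sketch below reproduces that counting in plain Elixir with `Enum.frequencies_by/2`, using the hypothetical FrequenciesSketch module. It sorts with the most frequent combination first, as in the output above. The sketch does not specify how ties are ordered.

```elixir
defmodule FrequenciesSketch do
  # Hypothetical model of frequencies/2: count each combination of the values
  # under `keys`, then sort so the most frequent combination comes first.
  def frequencies(rows, keys) do
    rows
    |> Enum.frequencies_by(fn row -> Enum.map(keys, &Map.fetch!(row, &1)) end)
    |> Enum.sort_by(fn {_combination, count} -> count end, :desc)
  end
end
```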
You can group by a single variable:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.group_by(df, "country")
-#Explorer.DataFrame<
- Polars[1094 x 10]
- Groups: ["country"]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- total integer [2308, 1254, 32500, 141, 7924, ...]
- solid_fuel integer [627, 117, 332, 0, 0, ...]
- liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
- gas_fuel integer [74, 7, 14565, 0, 374, ...]
- cement integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
->
Or you can group by multiple columns in a given list:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.group_by(df, ["country", "year"])
-#Explorer.DataFrame<
- Polars[1094 x 10]
- Groups: ["country", "year"]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- total integer [2308, 1254, 32500, 141, 7924, ...]
- solid_fuel integer [627, 117, 332, 0, 0, ...]
- liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
- gas_fuel integer [74, 7, 14565, 0, 374, ...]
- cement integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
->
Or by a range:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.group_by(df, 0..1)
-#Explorer.DataFrame<
- Polars[1094 x 10]
- Groups: ["year", "country"]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- total integer [2308, 1254, 32500, 141, 7924, ...]
- solid_fuel integer [627, 117, 332, 0, 0, ...]
- liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
- gas_fuel integer [74, 7, 14565, 0, 374, ...]
- cement integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
->
Regexes and functions are also accepted in column names, as in select/2.
You can group by a single variable:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.group_by(df, "country")
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ Groups: ["country"]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ total integer [2308, 1254, 32500, 141, 7924, ...]
+ solid_fuel integer [627, 117, 332, 0, 0, ...]
+ liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
+ gas_fuel integer [74, 7, 14565, 0, 374, ...]
+ cement integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+>
Or you can group by multiple columns in a given list:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.group_by(df, ["country", "year"])
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ Groups: ["country", "year"]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ total integer [2308, 1254, 32500, 141, 7924, ...]
+ solid_fuel integer [627, 117, 332, 0, 0, ...]
+ liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
+ gas_fuel integer [74, 7, 14565, 0, 374, ...]
+ cement integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+>
Or by a range:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.group_by(df, 0..1)
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ Groups: ["year", "country"]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ total integer [2308, 1254, 32500, 141, 7924, ...]
+ solid_fuel integer [627, 117, 332, 0, 0, ...]
+ liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
+ gas_fuel integer [74, 7, 14565, 0, 374, ...]
+ cement integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+>
Regexes and functions are also accepted in column names, as in select/2.
This function must only be used when you need to select rows based on external values that are not available to the dataframe. For example,
-you can pass a list:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.mask(df, [false, true, false])
-#Explorer.DataFrame<
- Polars[1 x 2]
- col1 string ["b"]
- col2 integer [2]
->
You must avoid using masks when the masks themselves are computed from
-other columns. For example, DO NOT do this:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.mask(df, Explorer.Series.greater(df["col2"], 1))
-#Explorer.DataFrame<
- Polars[2 x 2]
- col1 string ["b", "c"]
- col2 integer [2, 3]
->
Instead, do this:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
-iex> Explorer.DataFrame.filter_with(df, fn df -> Explorer.Series.greater(df["col2"], 1) end)
-#Explorer.DataFrame<
- Polars[2 x 2]
- col1 string ["b", "c"]
- col2 integer [2, 3]
->
The filter_with/2 version is much more efficient because it doesn't need
+you can pass a list:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.mask(df, [false, true, false])
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ col1 string ["b"]
+ col2 integer [2]
+>
You must avoid using masks when the masks themselves are computed from
+other columns. For example, DO NOT do this:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.mask(df, Explorer.Series.greater(df["col2"], 1))
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ col1 string ["b", "c"]
+ col2 integer [2, 3]
+>
Instead, do this:
iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
+iex> Explorer.DataFrame.filter_with(df, fn df -> Explorer.Series.greater(df["col2"], 1) end)
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ col1 string ["b", "c"]
+ col2 integer [2, 3]
+>
The filter_with/2 version is much more efficient because it doesn't need to create intermediate series representations to apply the mask.
Mutations are useful to add or modify columns in your dataframe:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.mutate(df, c: b + 1)
-#Explorer.DataFrame<
- Polars[3 x 3]
- a string ["a", "b", "c"]
- b integer [1, 2, 3]
- c integer [2, 3, 4]
->
It's also possible to overwrite existing columns:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.mutate(df, a: b * 2)
-#Explorer.DataFrame<
- Polars[3 x 2]
- a integer [2, 4, 6]
- b integer [1, 2, 3]
->
Scalar values are repeated to fill the series:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.mutate(df, a: 4)
-#Explorer.DataFrame<
- Polars[3 x 2]
- a integer [4, 4, 4]
- b integer [1, 2, 3]
->
It's also possible to use functions from the Series module, like Explorer.Series.window_sum/3:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
-iex> Explorer.DataFrame.mutate(df, b: window_sum(a, 2))
-#Explorer.DataFrame<
- Polars[3 x 2]
- a integer [1, 2, 3]
- b integer [1, 3, 5]
->
Alternatively, all of the above works with a map instead of a keyword list:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.mutate(df, %{"c" => cast(b, :float)})
-#Explorer.DataFrame<
- Polars[3 x 3]
- a string ["a", "b", "c"]
- b integer [1, 2, 3]
- c float [1.0, 2.0, 3.0]
->
Mutations are useful to add or modify columns in your dataframe:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.mutate(df, c: b + 1)
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ a string ["a", "b", "c"]
+ b integer [1, 2, 3]
+ c integer [2, 3, 4]
+>
It's also possible to overwrite existing columns:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.mutate(df, a: b * 2)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a integer [2, 4, 6]
+ b integer [1, 2, 3]
+>
Scalar values are repeated to fill the series:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.mutate(df, a: 4)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a integer [4, 4, 4]
+ b integer [1, 2, 3]
+>
It's also possible to use functions from the Series module, like Explorer.Series.window_sum/3:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
+iex> Explorer.DataFrame.mutate(df, b: window_sum(a, 2))
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a integer [1, 2, 3]
+ b integer [1, 3, 5]
+>
Alternatively, all of the above works with a map instead of a keyword list:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.mutate(df, %{"c" => cast(b, :float)})
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ a string ["a", "b", "c"]
+ b integer [1, 2, 3]
+ c float [1.0, 2.0, 3.0]
+>
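In the `window_sum(a, 2)` example, each value is summed with the value just before it, and the first row only sees itself. That is how `[1, 2, 3]` becomes `[1, 3, 5]`. The sketch below shows that arithmetic in plain Elixir using the hypothetical WindowSumSketch module. It ignores the optional third argument of Explorer.Series.window_sum/3 and only models the partial window at the start seen in these outputs. The same logic turns `[5, 7, 9]` into `[5, 12, 16]`.

```elixir
defmodule WindowSumSketch do
  # Hypothetical model of a trailing window sum of width `size`.
  # Position i sums values[max(i - size + 1, 0)..i], so the first
  # positions use a shorter (partial) window.
  def window_sum(values, size) do
    values
    |> Enum.with_index()
    |> Enum.map(fn {_value, index} ->
      start = max(index - size + 1, 0)
      values |> Enum.slice(start..index) |> Enum.sum()
    end)
  end
end
```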
summarise/2, but repeating the results for each member in the group.
For example, to count how many elements each group has, we can add a new
-column with that aggregation:
-iex> df = Explorer.DataFrame.new(id: ["a", "a", "b"], b: [1, 2, 3])
-iex> grouped = Explorer.DataFrame.group_by(df, :id)
-iex> Explorer.DataFrame.mutate(grouped, count: count(b))
-#Explorer.DataFrame<
- Polars[3 x 3]
- Groups: ["id"]
- id string ["a", "a", "b"]
- b integer [1, 2, 3]
- count integer [2, 2, 1]
->
To get the average petal length of each species in the Iris dataset, we can:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.mutate(grouped, petal_length_avg: mean(petal_length))
-#Explorer.DataFrame<
- Polars[150 x 6]
- Groups: ["species"]
- sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
- sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
- petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
- petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
- petal_length_avg float [1.4640000000000004, 1.4640000000000004, 1.4640000000000004, 1.4640000000000004, 1.4640000000000004, ...]
->
+column with that aggregation:
+iex> df = Explorer.DataFrame.new(id: ["a", "a", "b"], b: [1, 2, 3])
+iex> grouped = Explorer.DataFrame.group_by(df, :id)
+iex> Explorer.DataFrame.mutate(grouped, count: count(b))
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ Groups: ["id"]
+ id string ["a", "a", "b"]
+ b integer [1, 2, 3]
+ count integer [2, 2, 1]
+>
To get the average petal length of each species in the Iris dataset, we can:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.mutate(grouped, petal_length_avg: mean(petal_length))
+#Explorer.DataFrame<
+ Polars[150 x 6]
+ Groups: ["species"]
+ sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
+ sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
+ petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
+ petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+ petal_length_avg float [1.4640000000000004, 1.4640000000000004, 1.4640000000000004, 1.4640000000000004, 1.4640000000000004, ...]
+>
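A grouped mutate computes the aggregate once per group and then repeats it on every row of that group. The `count: count(b)` example shows this, producing `[2, 2, 1]`. The sketch below models that broadcast in plain Elixir with the hypothetical GroupedCountSketch module. It counts rows, which matches `count(b)` here only because `b` has no nils.

```elixir
defmodule GroupedCountSketch do
  # Hypothetical model of mutate(grouped, count: count(b)):
  # compute one count per group, then copy it onto every row of that group.
  def add_group_count(rows, group_key) do
    counts = Enum.frequencies_by(rows, &Map.fetch!(&1, group_key))

    Enum.map(rows, fn row ->
      Map.put(row, :count, Map.fetch!(counts, Map.fetch!(row, group_key)))
    end)
  end
end
```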
Here is an example of a new column that sums the value of two other columns:
iex> df = Explorer.DataFrame.new(a: [4, 5, 6], b: [1, 2, 3])
-iex> Explorer.DataFrame.mutate_with(df, &[c: Explorer.Series.add(&1["a"], &1["b"])])
-#Explorer.DataFrame<
- Polars[3 x 3]
- a integer [4, 5, 6]
- b integer [1, 2, 3]
- c integer [5, 7, 9]
->
You can overwrite existing columns as well:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.mutate_with(df, &[b: Explorer.Series.pow(&1["b"], 2)])
-#Explorer.DataFrame<
- Polars[3 x 2]
- a string ["a", "b", "c"]
- b float [1.0, 4.0, 9.0]
->
It's possible to "reuse" a variable for different computations:
iex> df = Explorer.DataFrame.new(a: [4, 5, 6], b: [1, 2, 3])
-iex> Explorer.DataFrame.mutate_with(df, fn ldf ->
-iex> c = Explorer.Series.add(ldf["a"], ldf["b"])
-iex> [c: c, d: Explorer.Series.window_sum(c, 2)]
-iex> end)
-#Explorer.DataFrame<
- Polars[3 x 4]
- a integer [4, 5, 6]
- b integer [1, 2, 3]
- c integer [5, 7, 9]
- d integer [5, 12, 16]
->
Here is an example of a new column that sums the value of two other columns:
iex> df = Explorer.DataFrame.new(a: [4, 5, 6], b: [1, 2, 3])
+iex> Explorer.DataFrame.mutate_with(df, &[c: Explorer.Series.add(&1["a"], &1["b"])])
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ a integer [4, 5, 6]
+ b integer [1, 2, 3]
+ c integer [5, 7, 9]
+>
You can overwrite existing columns as well:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.mutate_with(df, &[b: Explorer.Series.pow(&1["b"], 2)])
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a string ["a", "b", "c"]
+ b float [1.0, 4.0, 9.0]
+>
It's possible to "reuse" a variable for different computations:
iex> df = Explorer.DataFrame.new(a: [4, 5, 6], b: [1, 2, 3])
+iex> Explorer.DataFrame.mutate_with(df, fn ldf ->
+iex> c = Explorer.Series.add(ldf["a"], ldf["b"])
+iex> [c: c, d: Explorer.Series.window_sum(c, 2)]
+iex> end)
+#Explorer.DataFrame<
+ Polars[3 x 4]
+ a integer [4, 5, 6]
+ b integer [1, 2, 3]
+ c integer [5, 7, 9]
+ d integer [5, 12, 16]
+>
Mutations in grouped dataframes take the context of the group. For example, to count how many elements each group has,
-we can add a new column with that aggregation:
iex> df = Explorer.DataFrame.new(id: ["a", "a", "b"], b: [1, 2, 3])
-iex> grouped = Explorer.DataFrame.group_by(df, :id)
-iex> Explorer.DataFrame.mutate_with(grouped, &[count: Explorer.Series.count(&1["b"])])
-#Explorer.DataFrame<
- Polars[3 x 3]
- Groups: ["id"]
- id string ["a", "a", "b"]
- b integer [1, 2, 3]
- count integer [2, 2, 1]
->
+we can add a new column with that aggregation:
+iex> df = Explorer.DataFrame.new(id: ["a", "a", "b"], b: [1, 2, 3])
+iex> grouped = Explorer.DataFrame.group_by(df, :id)
+iex> Explorer.DataFrame.mutate_with(grouped, &[count: Explorer.Series.count(&1["b"])])
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ Groups: ["id"]
+ id string ["a", "a", "b"]
+ b integer [1, 2, 3]
+ count integer [2, 2, 1]
+>
iex> df = Explorer.DataFrame.new(a: ["d", nil, "f"], b: [nil, 2, nil], c: ["a", "b", "c"])
-iex> Explorer.DataFrame.nil_count(df)
-#Explorer.DataFrame<
- Polars[1 x 3]
- a integer [1]
- b integer [2]
- c integer [0]
->
+iex> df = Explorer.DataFrame.new(a: ["d", nil, "f"], b: [nil, 2, nil], c: ["a", "b", "c"])
+iex> Explorer.DataFrame.nil_count(df)
+#Explorer.DataFrame<
+ Polars[1 x 3]
+ a integer [1]
+ b integer [2]
+ c integer [0]
+>
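The nil_count example returns, for each column, how many entries are nil. The sketch below shows the same per-column count in plain Elixir, taking columns as a keyword list of lists. The NilCountSketch module is hypothetical and returns a keyword list instead of a one-row dataframe.

```elixir
defmodule NilCountSketch do
  # Hypothetical model of nil_count/1: count nil entries in each column.
  # `columns` is a keyword list of column name => list of values.
  def nil_count(columns) do
    Enum.map(columns, fn {name, values} -> {name, Enum.count(values, &is_nil(&1))} end)
  end
end
```

Called with the columns from the example, `NilCountSketch.nil_count(a: ["d", nil, "f"], b: [nil, 2, nil], c: ["a", "b", "c"])` gives `[a: 1, b: 2, c: 0]`.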
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.pivot_longer(df, &String.ends_with?(&1, "fuel"))
-#Explorer.DataFrame<
- Polars[3282 x 9]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- total integer [2308, 1254, 32500, 141, 7924, ...]
- cement integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
- variable string ["solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", ...]
- value integer [627, 117, 332, 0, 0, ...]
->
-
-iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.pivot_longer(df, &String.ends_with?(&1, "fuel"), select: ["year", "country"])
-#Explorer.DataFrame<
- Polars[3282 x 4]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- variable string ["solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", ...]
- value integer [627, 117, 332, 0, 0, ...]
->
-
-iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.pivot_longer(df, ["total"], select: ["year", "country"], discard: ["country"])
-#Explorer.DataFrame<
- Polars[1094 x 3]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- variable string ["total", "total", "total", "total", "total", ...]
- value integer [2308, 1254, 32500, 141, 7924, ...]
->
-
-iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.pivot_longer(df, ["total"], select: [], names_to: "my_var", values_to: "my_value")
-#Explorer.DataFrame<
- Polars[1094 x 2]
- my_var string ["total", "total", "total", "total", "total", ...]
- my_value integer [2308, 1254, 32500, 141, 7924, ...]
->
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.pivot_longer(df, &String.ends_with?(&1, "fuel"))
+#Explorer.DataFrame<
+ Polars[3282 x 9]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ total integer [2308, 1254, 32500, 141, 7924, ...]
+ cement integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+ variable string ["solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", ...]
+ value integer [627, 117, 332, 0, 0, ...]
+>
+
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.pivot_longer(df, &String.ends_with?(&1, "fuel"), select: ["year", "country"])
+#Explorer.DataFrame<
+ Polars[3282 x 4]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ variable string ["solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", ...]
+ value integer [627, 117, 332, 0, 0, ...]
+>
+
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.pivot_longer(df, ["total"], select: ["year", "country"], discard: ["country"])
+#Explorer.DataFrame<
+ Polars[1094 x 3]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ variable string ["total", "total", "total", "total", "total", ...]
+ value integer [2308, 1254, 32500, 141, 7924, ...]
+>
+
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.pivot_longer(df, ["total"], select: [], names_to: "my_var", values_to: "my_value")
+#Explorer.DataFrame<
+ Polars[1094 x 2]
+ my_var string ["total", "total", "total", "total", "total", ...]
+ my_value integer [2308, 1254, 32500, 141, 7924, ...]
+>
In the following example we want to take the Iris dataset and increase the number of rows by pivoting the "sepal_length" column. This dataset is grouped by "species", so the resultant
-dataframe is going to keep the "species" group:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.pivot_longer(grouped, ["sepal_length"])
-#Explorer.DataFrame<
- Polars[150 x 6]
- Groups: ["species"]
- sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
- petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
- petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
- variable string ["sepal_length", "sepal_length", "sepal_length", "sepal_length", "sepal_length", ...]
- value float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
->
Now we want to do something different: we want to pivot the "species" column that is also a group.
-This is going to remove the group in the resultant dataframe:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.pivot_longer(grouped, ["species"])
-#Explorer.DataFrame<
- Polars[150 x 6]
- sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
- sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
- petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
- petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
- variable string ["species", "species", "species", "species", "species", ...]
- value string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
+dataframe is going to keep the "species" group:
+iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.pivot_longer(grouped, ["sepal_length"])
+#Explorer.DataFrame<
+ Polars[150 x 6]
+ Groups: ["species"]
+ sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
+ petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
+ petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+ variable string ["sepal_length", "sepal_length", "sepal_length", "sepal_length", "sepal_length", ...]
+ value float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
+>
Now we want to do something different: we want to pivot the "species" column that is also a group.
+This is going to remove the group in the resultant dataframe:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.pivot_longer(grouped, ["species"])
+#Explorer.DataFrame<
+ Polars[150 x 6]
+ sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
+ sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
+ petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
+ petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
+ variable string ["species", "species", "species", "species", "species", ...]
+ value string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
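pivot_longer turns each pivoted column into rows. Every input row yields one output row per pivoted column, carrying that column's name in "variable" and its value in "value". This is why 1094 rows times three "fuel" columns gives 3282 rows. The sketch below models that reshaping on row maps with the hypothetical PivotLongerSketch module. It emits all rows for the first pivoted column before the next, matching the ordering in the output above. It ignores the `select`, `discard`, `names_to`, and `values_to` options.

```elixir
defmodule PivotLongerSketch do
  # Hypothetical model of pivot_longer: for each pivoted column (outer loop)
  # and each row (inner loop), drop the pivoted columns from the row and add
  # "variable" (the column name) and "value" (that column's value).
  def pivot_longer(rows, pivot_keys) do
    for key <- pivot_keys, row <- rows do
      row
      |> Map.drop(pivot_keys)
      |> Map.put("variable", key)
      |> Map.put("value", Map.fetch!(row, key))
    end
  end
end
```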
Suppose we have a basketball court and multiple teams that want to train in that court. They need to share a schedule with the hours each team is going to use it. Here is a dataframe representing
-that schedule:
iex> Explorer.DataFrame.new(
-iex> weekday: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
-iex> team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
-iex> hour: [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]
-iex> )
This dataframe is going to look like this when printed with table/2:
+----------------------------------------------+
- | Explorer DataFrame: [rows: 10, columns: 3] |
+that schedule:
+iex> Explorer.DataFrame.new(
+iex> weekday: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
+iex> team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
+iex> hour: [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]
+iex> )
This dataframe is going to look like this when printed with table/2:
+----------------------------------------------+
+ | Explorer DataFrame: [rows: 10, columns: 3] |
+---------------+--------------+---------------+
| weekday | team | hour |
| <string> | <string> | <integer> |
@@ -3181,22 +3181,22 @@ pivot_wider(df, names_from, values_from, op
| Friday | A | 16 |
+---------------+--------------+---------------+
You can see that the "weekday" values repeat, and it's not clear how free the agenda is.
We can solve that by pivoting the "weekday" column into multiple columns, making each weekday
-a new column in the resultant dataframe.
iex> df = Explorer.DataFrame.new(
-iex> weekday: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
-iex> team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
-iex> hour: [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]
-iex> )
-iex> Explorer.DataFrame.pivot_wider(df, "weekday", "hour")
-#Explorer.DataFrame<
- Polars[3 x 6]
- team string ["A", "B", "C"]
- Monday integer [10, nil, 15]
- Tuesday integer [14, 9, nil]
- Wednesday integer [nil, 16, 10]
- Thursday integer [10, nil, 14]
- Friday integer [16, 11, nil]
->
Now if we print that same dataframe with table/2, we get a better picture of the schedule:
+----------------------------------------------------------------------+
- | Explorer DataFrame: [rows: 3, columns: 6] |
+a new column in the resultant dataframe.
+iex> df = Explorer.DataFrame.new(
+iex> weekday: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
+iex> team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
+iex> hour: [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]
+iex> )
+iex> Explorer.DataFrame.pivot_wider(df, "weekday", "hour")
+#Explorer.DataFrame<
+ Polars[3 x 6]
+ team string ["A", "B", "C"]
+ Monday integer [10, nil, 15]
+ Tuesday integer [14, 9, nil]
+ Wednesday integer [nil, 16, 10]
+ Thursday integer [10, nil, 14]
+ Friday integer [16, 11, nil]
+>
Now if we print that same dataframe with table/2, we get a better picture of the schedule:
+----------------------------------------------------------------------+
+ | Explorer DataFrame: [rows: 3, columns: 6] |
+----------+-----------+-----------+-----------+-----------+-----------+
| team | Monday | Tuesday | Wednesday | Thursday | Friday |
| <string> | <integer> | <integer> | <integer> | <integer> | <integer> |
@@ -3207,93 +3207,93 @@ pivot_wider(df, names_from, values_from, op
+----------+-----------+-----------+-----------+-----------+-----------+
| C | 15 | | 10 | 14 | |
+----------+-----------+-----------+-----------+-----------+-----------+
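To make the reshaping concrete, here is a minimal sketch in plain Elixir, assuming the schedule is held as a list of row maps instead of an Explorer dataframe. PivotSketch.pivot_wider/4 is a hypothetical helper written only for illustration, not Explorer's implementation: it groups rows by the ID column, turns each distinct weekday into a key, and leaves missing team/weekday combinations as nil, mirroring the empty cells in the table above.

```elixir
defmodule PivotSketch do
  # Hypothetical helper, not part of Explorer: reshapes a list of row maps
  # the way pivot_wider/3 reshapes the schedule dataframe above.
  def pivot_wider(rows, id_key, names_key, values_key) do
    # Every distinct value of the names column becomes a new column.
    new_columns = rows |> Enum.map(& &1[names_key]) |> Enum.uniq()

    rows
    |> Enum.group_by(& &1[id_key])
    |> Enum.sort_by(fn {id, _group} -> id end)
    |> Enum.map(fn {id, group} ->
      # Start every new column as nil so missing combinations stay empty.
      empty = Map.new(new_columns, &{&1, nil})

      Enum.reduce(group, Map.put(empty, id_key, id), fn row, acc ->
        Map.put(acc, row[names_key], row[values_key])
      end)
    end)
  end
end

weekdays = ~w(Monday Tuesday Wednesday Thursday Friday Monday Tuesday Wednesday Thursday Friday)
teams = ~w(A B C A B C A B C A)
hours = [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]

# Build one map per row from the three column lists.
rows =
  [weekdays, teams, hours]
  |> Enum.zip()
  |> Enum.map(fn {weekday, team, hour} -> %{weekday: weekday, team: team, hour: hour} end)

schedule = PivotSketch.pivot_wider(rows, :team, :weekday, :hour)
IO.inspect(schedule)
```

Unlike a real dataframe operation, this sketch simply keeps the last value when the same team and weekday appear more than once.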
Pivot wider can create unpredictable column names, and sometimes they can conflict with ID columns.
-In that scenario, we add a number as a suffix to duplicated column names. Here is an example:
iex> df = Explorer.DataFrame.new(
-iex> product_id: [1, 1, 1, 1, 2, 2, 2, 2],
-iex> property: ["product_id", "width_cm", "height_cm", "length_cm", "product_id", "width_cm", "height_cm", "length_cm"],
-iex> property_value: [1, 42, 40, 64, 2, 35, 20, 40]
-iex> )
-iex> Explorer.DataFrame.pivot_wider(df, "property", "property_value")
-#Explorer.DataFrame<
- Polars[2 x 5]
- product_id integer [1, 2]
- product_id_1 integer [1, 2]
- width_cm integer [42, 35]
- height_cm integer [40, 20]
- length_cm integer [64, 40]
->
But if the option :names_prefix is used, that suffix is not added:
iex> df = Explorer.DataFrame.new(
-iex> product_id: [1, 1, 1, 1, 2, 2, 2, 2],
-iex> property: ["product_id", "width_cm", "height_cm", "length_cm", "product_id", "width_cm", "height_cm", "length_cm"],
-iex> property_value: [1, 42, 40, 64, 2, 35, 20, 40]
-iex> )
-iex> Explorer.DataFrame.pivot_wider(df, "property", "property_value", names_prefix: "col_")
-#Explorer.DataFrame<
- Polars[2 x 5]
- product_id integer [1, 2]
- col_product_id integer [1, 2]
- col_width_cm integer [42, 35]
- col_height_cm integer [40, 20]
- col_length_cm integer [64, 40]
->
Multiple columns are accepted for the values_from parameter, but the behaviour is slightly
+In that scenario, we add a number as a suffix to duplicated column names. Here is an example:
iex> df = Explorer.DataFrame.new(
+iex> product_id: [1, 1, 1, 1, 2, 2, 2, 2],
+iex> property: ["product_id", "width_cm", "height_cm", "length_cm", "product_id", "width_cm", "height_cm", "length_cm"],
+iex> property_value: [1, 42, 40, 64, 2, 35, 20, 40]
+iex> )
+iex> Explorer.DataFrame.pivot_wider(df, "property", "property_value")
+#Explorer.DataFrame<
+ Polars[2 x 5]
+ product_id integer [1, 2]
+ product_id_1 integer [1, 2]
+ width_cm integer [42, 35]
+ height_cm integer [40, 20]
+ length_cm integer [64, 40]
+>
But if the option :names_prefix is used, that suffix is not added:
iex> df = Explorer.DataFrame.new(
+iex> product_id: [1, 1, 1, 1, 2, 2, 2, 2],
+iex> property: ["product_id", "width_cm", "height_cm", "length_cm", "product_id", "width_cm", "height_cm", "length_cm"],
+iex> property_value: [1, 42, 40, 64, 2, 35, 20, 40]
+iex> )
+iex> Explorer.DataFrame.pivot_wider(df, "property", "property_value", names_prefix: "col_")
+#Explorer.DataFrame<
+ Polars[2 x 5]
+ product_id integer [1, 2]
+ col_product_id integer [1, 2]
+ col_width_cm integer [42, 35]
+ col_height_cm integer [40, 20]
+ col_length_cm integer [64, 40]
+>
Multiple columns are accepted for the values_from parameter, but the behaviour is slightly
different for the naming of new columns in the resultant dataframe. The new columns are going
to be prefixed by the name of the original value column, followed by an underscore and the
-original column name, followed by another underscore and the name of the variable.
iex> df = Explorer.DataFrame.new(
-iex> product_id: [1, 1, 1, 1, 2, 2, 2, 2],
-iex> property: ["product_id", "width_cm", "height_cm", "length_cm", "product_id", "width_cm", "height_cm", "length_cm"],
-iex> property_value: [1, 42, 40, 64, 2, 35, 20, 40],
-iex> another_value: [1, 43, 41, 65, 2, 36, 21, 42]
-iex> )
-iex> Explorer.DataFrame.pivot_wider(df, "property", ["property_value", "another_value"])
-#Explorer.DataFrame<
- Polars[2 x 9]
- product_id integer [1, 2]
- property_value_property_product_id integer [1, 2]
- property_value_property_width_cm integer [42, 35]
- property_value_property_height_cm integer [40, 20]
- property_value_property_length_cm integer [64, 40]
- another_value_property_product_id integer [1, 2]
- another_value_property_width_cm integer [43, 36]
- another_value_property_height_cm integer [41, 21]
- another_value_property_length_cm integer [65, 42]
->
+original column name, followed by another underscore and the name of the variable.
+iex> df = Explorer.DataFrame.new(
+iex> product_id: [1, 1, 1, 1, 2, 2, 2, 2],
+iex> property: ["product_id", "width_cm", "height_cm", "length_cm", "product_id", "width_cm", "height_cm", "length_cm"],
+iex> property_value: [1, 42, 40, 64, 2, 35, 20, 40],
+iex> another_value: [1, 43, 41, 65, 2, 36, 21, 42]
+iex> )
+iex> Explorer.DataFrame.pivot_wider(df, "property", ["property_value", "another_value"])
+#Explorer.DataFrame<
+ Polars[2 x 9]
+ product_id integer [1, 2]
+ property_value_property_product_id integer [1, 2]
+ property_value_property_width_cm integer [42, 35]
+ property_value_property_height_cm integer [40, 20]
+ property_value_property_length_cm integer [64, 40]
+ another_value_property_product_id integer [1, 2]
+ another_value_property_width_cm integer [43, 36]
+ another_value_property_height_cm integer [41, 21]
+ another_value_property_length_cm integer [65, 42]
+>
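The naming rule for multiple values_from columns can be checked with a short comprehension. This sketch assumes the pattern visible in the output above (value column, underscore, names_from column, underscore, variable); the variable names in it are illustrative only.

```elixir
# Illustrative inputs taken from the example above.
values_from = ["property_value", "another_value"]
names_from = "property"
variables = ["product_id", "width_cm", "height_cm", "length_cm"]

# One new column per (value column, variable) pair, value columns first.
new_names =
  for value_column <- values_from, variable <- variables do
    "#{value_column}_#{names_from}_#{variable}"
  end

IO.inspect(new_names)
```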
Grouped examples
Now using the same idea, we can see that there is not much difference for grouped dataframes.
-The only detail is that groups that are not ID columns are discarded.
iex> df = Explorer.DataFrame.new(
-iex> weekday: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
-iex> team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
-iex> hour: [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]
-iex> )
-iex> grouped = Explorer.DataFrame.group_by(df, "team")
-iex> Explorer.DataFrame.pivot_wider(grouped, "weekday", "hour")
-#Explorer.DataFrame<
- Polars[3 x 6]
- Groups: ["team"]
- team string ["A", "B", "C"]
- Monday integer [10, nil, 15]
- Tuesday integer [14, 9, nil]
- Wednesday integer [nil, 16, 10]
- Thursday integer [10, nil, 14]
- Friday integer [16, 11, nil]
->
In the following example the group "weekday" is going to be removed, because the column is going
-to be pivoted into multiple columns:
iex> df = Explorer.DataFrame.new(
-iex> weekday: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
-iex> team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
-iex> hour: [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]
-iex> )
-iex> grouped = Explorer.DataFrame.group_by(df, "weekday")
-iex> Explorer.DataFrame.pivot_wider(grouped, "weekday", "hour")
-#Explorer.DataFrame<
- Polars[3 x 6]
- team string ["A", "B", "C"]
- Monday integer [10, nil, 15]
- Tuesday integer [14, 9, nil]
- Wednesday integer [nil, 16, 10]
- Thursday integer [10, nil, 14]
- Friday integer [16, 11, nil]
->
+The only detail is that groups that are not ID columns are discarded.
+iex> df = Explorer.DataFrame.new(
+iex> weekday: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
+iex> team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
+iex> hour: [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]
+iex> )
+iex> grouped = Explorer.DataFrame.group_by(df, "team")
+iex> Explorer.DataFrame.pivot_wider(grouped, "weekday", "hour")
+#Explorer.DataFrame<
+ Polars[3 x 6]
+ Groups: ["team"]
+ team string ["A", "B", "C"]
+ Monday integer [10, nil, 15]
+ Tuesday integer [14, 9, nil]
+ Wednesday integer [nil, 16, 10]
+ Thursday integer [10, nil, 14]
+ Friday integer [16, 11, nil]
+>
In the following example the group "weekday" is going to be removed, because the column is going
+to be pivoted into multiple columns:
iex> df = Explorer.DataFrame.new(
+iex> weekday: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
+iex> team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
+iex> hour: [10, 9, 10, 10, 11, 15, 14, 16, 14, 16]
+iex> )
+iex> grouped = Explorer.DataFrame.group_by(df, "weekday")
+iex> Explorer.DataFrame.pivot_wider(grouped, "weekday", "hour")
+#Explorer.DataFrame<
+ Polars[3 x 6]
+ team string ["A", "B", "C"]
+ Monday integer [10, nil, 15]
+ Tuesday integer [14, 9, nil]
+ Wednesday integer [nil, 16, 10]
+ Thursday integer [10, nil, 14]
+ Friday integer [16, 11, nil]
+>
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.pull(df, "total")
-#Explorer.Series<
- Polars[1094]
- integer [2308, 1254, 32500, 141, 7924, 41, 143, 51246, 1150, 684, 106589, 18408, 8366, 451, 7981, 16345, 403, 17192, 30222, 147, 1388, 166, 133, 5802, 1278, 114468, 47, 2237, 12030, 535, 58, 1367, 145806, 152, 152, 72, 141, 19703, 2393248, 20773, 44, 540, 19, 2064, 1900, 5501, 10465, 2102, 30428, 18122, ...]
->
-
-iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.pull(df, 2)
-#Explorer.Series<
- Polars[1094]
- integer [2308, 1254, 32500, 141, 7924, 41, 143, 51246, 1150, 684, 106589, 18408, 8366, 451, 7981, 16345, 403, 17192, 30222, 147, 1388, 166, 133, 5802, 1278, 114468, 47, 2237, 12030, 535, 58, 1367, 145806, 152, 152, 72, 141, 19703, 2393248, 20773, 44, 540, 19, 2064, 1900, 5501, 10465, 2102, 30428, 18122, ...]
->
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.pull(df, "total")
+#Explorer.Series<
+ Polars[1094]
+ integer [2308, 1254, 32500, 141, 7924, 41, 143, 51246, 1150, 684, 106589, 18408, 8366, 451, 7981, 16345, 403, 17192, 30222, 147, 1388, 166, 133, 5802, 1278, 114468, 47, 2237, 12030, 535, 58, 1367, 145806, 152, 152, 72, 141, 19703, 2393248, 20773, 44, 540, 19, 2064, 1900, 5501, 10465, 2102, 30428, 18122, ...]
+>
+
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.pull(df, 2)
+#Explorer.Series<
+ Polars[1094]
+ integer [2308, 1254, 32500, 141, 7924, 41, 143, 51246, 1150, 684, 106589, 18408, 8366, 451, 7981, 16345, 403, 17192, 30222, 147, 1388, 166, 133, 5802, 1278, 114468, 47, 2237, 12030, 535, 58, 1367, 145806, 152, 152, 72, 141, 19703, 2393248, 20773, 44, 540, 19, 2064, 1900, 5501, 10465, 2102, 30428, 18122, ...]
+>
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
-iex> Explorer.DataFrame.put(df, :b, Explorer.Series.transform(df[:a], fn n -> n * 2 end))
-#Explorer.DataFrame<
- Polars[3 x 2]
- a integer [1, 2, 3]
- b integer [2, 4, 6]
->
-
-iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
-iex> Explorer.DataFrame.put(df, :b, Explorer.Series.from_list([4, 5, 6]))
-#Explorer.DataFrame<
- Polars[3 x 2]
- a integer [1, 2, 3]
- b integer [4, 5, 6]
->
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
+iex> Explorer.DataFrame.put(df, :b, Explorer.Series.transform(df[:a], fn n -> n * 2 end))
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a integer [1, 2, 3]
+ b integer [2, 4, 6]
+>
+
+iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
+iex> Explorer.DataFrame.put(df, :b, Explorer.Series.from_list([4, 5, 6]))
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a integer [1, 2, 3]
+ b integer [4, 5, 6]
+>
If the dataframe is grouped, put/3 is going to ignore the groups.
-So the series must be the same size as the entire dataframe.
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
-iex> grouped = Explorer.DataFrame.group_by(df, "a")
-iex> series = Explorer.Series.from_list([9, 8, 7])
-iex> Explorer.DataFrame.put(grouped, :b, series)
-#Explorer.DataFrame<
- Polars[3 x 2]
- Groups: ["a"]
- a integer [1, 2, 3]
- b integer [9, 8, 7]
->
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
+iex> grouped = Explorer.DataFrame.group_by(df, "a")
+iex> series = Explorer.Series.from_list([9, 8, 7])
+iex> Explorer.DataFrame.put(grouped, :b, series)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ Groups: ["a"]
+ a integer [1, 2, 3]
+ b integer [9, 8, 7]
+>
You can also put tensors into the dataframe:
iex> df = Explorer.DataFrame.new([])
-iex> Explorer.DataFrame.put(df, :a, Nx.tensor([1, 2, 3]))
-#Explorer.DataFrame<
- Polars[3 x 1]
- a integer [1, 2, 3]
->
You can specify which dtype the tensor represents.
+You can also put tensors into the dataframe:
iex> df = Explorer.DataFrame.new([])
+iex> Explorer.DataFrame.put(df, :a, Nx.tensor([1, 2, 3]))
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a integer [1, 2, 3]
+>
You can specify which dtype the tensor represents. For example, a tensor of s64 represents integers by default, but it may also represent timestamps
-in microseconds from the Unix epoch:
iex> df = Explorer.DataFrame.new([])
-iex> Explorer.DataFrame.put(df, :a, Nx.tensor([1, 2, 3]), dtype: :datetime)
-#Explorer.DataFrame<
- Polars[3 x 1]
- a datetime [1970-01-01 00:00:00.000001, 1970-01-01 00:00:00.000002, 1970-01-01 00:00:00.000003]
->
If there is already a column where we want to place the tensor,
+in microseconds from the Unix epoch:
iex> df = Explorer.DataFrame.new([])
+iex> Explorer.DataFrame.put(df, :a, Nx.tensor([1, 2, 3]), dtype: :datetime)
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a datetime [1970-01-01 00:00:00.000001, 1970-01-01 00:00:00.000002, 1970-01-01 00:00:00.000003]
+>
If there is already a column where we want to place the tensor, the column dtype will be used automatically. This means that updating dataframes in place while preserving their types is
-straightforward:
iex> df = Explorer.DataFrame.new(a: [~N[1970-01-01 00:00:00]])
-iex> Explorer.DataFrame.put(df, :a, Nx.tensor(529550625987654))
-#Explorer.DataFrame<
- Polars[1 x 1]
- a datetime [1986-10-13 01:23:45.987654]
->
This is particularly useful for categorical columns:
iex> cat = Explorer.Series.from_list(["foo", "bar", "baz"], dtype: :category)
-iex> df = Explorer.DataFrame.new(a: cat)
-iex> Explorer.DataFrame.put(df, :a, Nx.tensor([2, 1, 0]))
-#Explorer.DataFrame<
- Polars[3 x 1]
- a category ["baz", "bar", "foo"]
->
On the other hand, if you try to put a floating tensor on
+straightforward:
iex> df = Explorer.DataFrame.new(a: [~N[1970-01-01 00:00:00]])
+iex> Explorer.DataFrame.put(df, :a, Nx.tensor(529550625987654))
+#Explorer.DataFrame<
+ Polars[1 x 1]
+ a datetime [1986-10-13 01:23:45.987654]
+>
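The datetime values in these examples can be reproduced with Elixir's standard library, assuming (as the outputs suggest) that each integer counts microseconds since 1970-01-01 00:00:00 UTC. DateTime.from_unix!/2 with the :microsecond unit performs that conversion; this is a standalone check, not how Explorer converts tensors internally.

```elixir
# Integers interpreted as microseconds since the Unix epoch.
small = Enum.map([1, 2, 3], &DateTime.from_unix!(&1, :microsecond))
big = DateTime.from_unix!(529_550_625_987_654, :microsecond)

# Drop the UTC zone to compare with the naive datetimes shown above.
IO.puts(DateTime.to_naive(big))
IO.inspect(Enum.map(small, &DateTime.to_naive/1))
```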
This is particularly useful for categorical columns:
iex> cat = Explorer.Series.from_list(["foo", "bar", "baz"], dtype: :category)
+iex> df = Explorer.DataFrame.new(a: cat)
+iex> Explorer.DataFrame.put(df, :a, Nx.tensor([2, 1, 0]))
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a category ["baz", "bar", "foo"]
+>
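Judging from the example above, each integer put into a category column is read as a position in the column's list of categories ("foo", "bar", "baz"). A plain-Elixir sketch of that lookup, assuming this index-based interpretation:

```elixir
# Assumed model: a category column stores its categories in order,
# and each integer code is a zero-based position in that list.
categories = ["foo", "bar", "baz"]
codes = [2, 1, 0]

decoded = Enum.map(codes, &Enum.at(categories, &1))
IO.inspect(decoded)
```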
On the other hand, if you try to put a floating tensor on
an integer column, an error will be raised unless a dtype
-or dtype: :infer is given:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
-iex> Explorer.DataFrame.put(df, :a, Nx.tensor(1.0, type: :f64))
+or dtype: :infer is given:
+iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
+iex> Explorer.DataFrame.put(df, :a, Nx.tensor(1.0, type: :f64))
** (ArgumentError) dtype integer expects a tensor of type {:s, 64} but got type {:f, 64}
-iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
-iex> Explorer.DataFrame.put(df, :a, Nx.tensor(1.0, type: :f64), dtype: :float)
-#Explorer.DataFrame<
- Polars[3 x 1]
- a float [1.0, 1.0, 1.0]
->
-
-iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
-iex> Explorer.DataFrame.put(df, :a, Nx.tensor(1.0, type: :f64), dtype: :infer)
-#Explorer.DataFrame<
- Polars[3 x 1]
- a float [1.0, 1.0, 1.0]
->
+
iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
+iex> Explorer.DataFrame.put(df, :a, Nx.tensor(1.0, type: :f64), dtype: :float)
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a float [1.0, 1.0, 1.0]
+>
+
+iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
+iex> Explorer.DataFrame.put(df, :a, Nx.tensor(1.0, type: :f64), dtype: :infer)
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a float [1.0, 1.0, 1.0]
+>
Similar to tensors, we can also put lists in the dataframe:
iex> df = Explorer.DataFrame.new([])
-iex> Explorer.DataFrame.put(df, :a, [1, 2, 3])
-#Explorer.DataFrame<
- Polars[3 x 1]
- a integer [1, 2, 3]
->
The same considerations as above apply.
+Similar to tensors, we can also put lists in the dataframe:
iex> df = Explorer.DataFrame.new([])
+iex> Explorer.DataFrame.put(df, :a, [1, 2, 3])
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a integer [1, 2, 3]
+>
The same considerations as above apply.
Relocate a single column
iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1], c: [nil, 5, 6])
-iex> Explorer.DataFrame.relocate(df, "a", after: "c")
-#Explorer.DataFrame<
- Polars[3 x 3]
- b integer [1, 3, 1]
- c integer [nil, 5, 6]
- a string ["a", "b", "a"]
->
Relocate (and reorder) multiple columns to the beginning
iex> df = Explorer.DataFrame.new(a: [1, 2], b: [5.1, 5.2], c: [4, 5], d: ["yes", "no"])
-iex> Explorer.DataFrame.relocate(df, ["d", 1], before: 0)
-#Explorer.DataFrame<
- Polars[2 x 4]
- d string ["yes", "no"]
- b float [5.1, 5.2]
- a integer [1, 2]
- c integer [4, 5]
->
Relocate before another column
iex> df = Explorer.DataFrame.new(a: [1, 2], b: [5.1, 5.2], c: [4, 5], d: ["yes", "no"])
-iex> Explorer.DataFrame.relocate(df, ["a", "c"], before: "b")
-#Explorer.DataFrame<
- Polars[2 x 4]
- a integer [1, 2]
- c integer [4, 5]
- b float [5.1, 5.2]
- d string ["yes", "no"]
->
+Relocate a single column
iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1], c: [nil, 5, 6])
+iex> Explorer.DataFrame.relocate(df, "a", after: "c")
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ b integer [1, 3, 1]
+ c integer [nil, 5, 6]
+ a string ["a", "b", "a"]
+>
Relocate (and reorder) multiple columns to the beginning
iex> df = Explorer.DataFrame.new(a: [1, 2], b: [5.1, 5.2], c: [4, 5], d: ["yes", "no"])
+iex> Explorer.DataFrame.relocate(df, ["d", 1], before: 0)
+#Explorer.DataFrame<
+ Polars[2 x 4]
+ d string ["yes", "no"]
+ b float [5.1, 5.2]
+ a integer [1, 2]
+ c integer [4, 5]
+>
Relocate before another column
iex> df = Explorer.DataFrame.new(a: [1, 2], b: [5.1, 5.2], c: [4, 5], d: ["yes", "no"])
+iex> Explorer.DataFrame.relocate(df, ["a", "c"], before: "b")
+#Explorer.DataFrame<
+ Polars[2 x 4]
+ a integer [1, 2]
+ c integer [4, 5]
+ b float [5.1, 5.2]
+ d string ["yes", "no"]
+>
You can pass in a list of new names:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1])
-iex> Explorer.DataFrame.rename(df, ["c", "d"])
-#Explorer.DataFrame<
- Polars[3 x 2]
- c string ["a", "b", "a"]
- d integer [1, 3, 1]
->
Or you can rename individual columns using keyword args:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1])
-iex> Explorer.DataFrame.rename(df, a: "first")
-#Explorer.DataFrame<
- Polars[3 x 2]
- first string ["a", "b", "a"]
- b integer [1, 3, 1]
->
Or you can rename individual columns using a map:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1])
-iex> Explorer.DataFrame.rename(df, %{"a" => "first"})
-#Explorer.DataFrame<
- Polars[3 x 2]
- first string ["a", "b", "a"]
- b integer [1, 3, 1]
->
+You can pass in a list of new names:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1])
+iex> Explorer.DataFrame.rename(df, ["c", "d"])
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ c string ["a", "b", "a"]
+ d integer [1, 3, 1]
+>
Or you can rename individual columns using keyword args:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1])
+iex> Explorer.DataFrame.rename(df, a: "first")
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ first string ["a", "b", "a"]
+ b integer [1, 3, 1]
+>
Or you can rename individual columns using a map:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1])
+iex> Explorer.DataFrame.rename(df, %{"a" => "first"})
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ first string ["a", "b", "a"]
+ b integer [1, 3, 1]
+>
If no columns are specified, it will apply the function to all column names:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.rename_with(df, &String.upcase/1)
-#Explorer.DataFrame<
- Polars[1094 x 10]
- YEAR integer [2010, 2010, 2010, 2010, 2010, ...]
- COUNTRY string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- TOTAL integer [2308, 1254, 32500, 141, 7924, ...]
- SOLID_FUEL integer [627, 117, 332, 0, 0, ...]
- LIQUID_FUEL integer [1601, 953, 12381, 141, 3649, ...]
- GAS_FUEL integer [74, 7, 14565, 0, 374, ...]
- CEMENT integer [5, 177, 2598, 0, 204, ...]
- GAS_FLARING integer [0, 0, 2623, 0, 3697, ...]
- PER_CAPITA float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- BUNKER_FUELS integer [9, 7, 663, 0, 321, ...]
->
A callback can be used to filter the column names that will be renamed, similarly to select/2:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.rename_with(df, &String.ends_with?(&1, "_fuel"), &String.trim_trailing(&1, "_fuel"))
-#Explorer.DataFrame<
- Polars[1094 x 10]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- total integer [2308, 1254, 32500, 141, 7924, ...]
- solid integer [627, 117, 332, 0, 0, ...]
- liquid integer [1601, 953, 12381, 141, 3649, ...]
- gas integer [74, 7, 14565, 0, 374, ...]
- cement integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
->
Or you can just pass in the list of column names you'd like to apply the function to:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.rename_with(df, ["total", "cement"], &String.upcase/1)
-#Explorer.DataFrame<
- Polars[1094 x 10]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- TOTAL integer [2308, 1254, 32500, 141, 7924, ...]
- solid_fuel integer [627, 117, 332, 0, 0, ...]
- liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
- gas_fuel integer [74, 7, 14565, 0, 374, ...]
- CEMENT integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
->
Ranges, regexes, and functions are also accepted in column names, as in select/2.
If no columns are specified, it will apply the function to all column names:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.rename_with(df, &String.upcase/1)
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ YEAR integer [2010, 2010, 2010, 2010, 2010, ...]
+ COUNTRY string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ TOTAL integer [2308, 1254, 32500, 141, 7924, ...]
+ SOLID_FUEL integer [627, 117, 332, 0, 0, ...]
+ LIQUID_FUEL integer [1601, 953, 12381, 141, 3649, ...]
+ GAS_FUEL integer [74, 7, 14565, 0, 374, ...]
+ CEMENT integer [5, 177, 2598, 0, 204, ...]
+ GAS_FLARING integer [0, 0, 2623, 0, 3697, ...]
+ PER_CAPITA float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ BUNKER_FUELS integer [9, 7, 663, 0, 321, ...]
+>
A callback can be used to filter the column names that will be renamed, similarly to select/2:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.rename_with(df, &String.ends_with?(&1, "_fuel"), &String.trim_trailing(&1, "_fuel"))
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ total integer [2308, 1254, 32500, 141, 7924, ...]
+ solid integer [627, 117, 332, 0, 0, ...]
+ liquid integer [1601, 953, 12381, 141, 3649, ...]
+ gas integer [74, 7, 14565, 0, 374, ...]
+ cement integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+>
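The filter-then-rename behaviour of rename_with/3 can be mimicked on a plain list of names. This sketch only models the name transformation from the call above; it does not touch any dataframe.

```elixir
names = ~w(year country total solid_fuel liquid_fuel gas_fuel cement gas_flaring per_capita bunker_fuels)

# Rename only the names that pass the filter; keep the others unchanged.
renamed =
  Enum.map(names, fn name ->
    if String.ends_with?(name, "_fuel"), do: String.trim_trailing(name, "_fuel"), else: name
  end)

IO.inspect(renamed)
```

Note that bunker_fuels is left alone because it ends in "_fuels", not "_fuel", which matches the output above.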
Or you can just pass in the list of column names you'd like to apply the function to:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.rename_with(df, ["total", "cement"], &String.upcase/1)
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ TOTAL integer [2308, 1254, 32500, 141, 7924, ...]
+ solid_fuel integer [627, 117, 332, 0, 0, ...]
+ liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
+ gas_fuel integer [74, 7, 14565, 0, 374, ...]
+ CEMENT integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+>
Ranges, regexes, and functions are also accepted in column names, as in select/2.
You can select a single column:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.select(df, "a")
-#Explorer.DataFrame<
- Polars[3 x 1]
- a string ["a", "b", "c"]
->
Or a list of names:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.select(df, ["a"])
-#Explorer.DataFrame<
- Polars[3 x 1]
- a string ["a", "b", "c"]
->
You can also use a range or a list of integers:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
-iex> Explorer.DataFrame.select(df, [0, 1])
-#Explorer.DataFrame<
- Polars[3 x 2]
- a string ["a", "b", "c"]
- b integer [1, 2, 3]
->
-
-iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
-iex> Explorer.DataFrame.select(df, 0..1)
-#Explorer.DataFrame<
- Polars[3 x 2]
- a string ["a", "b", "c"]
- b integer [1, 2, 3]
->
Or you can use a callback function that receives each column name as its argument:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.select(df, &String.starts_with?(&1, "b"))
-#Explorer.DataFrame<
- Polars[3 x 1]
- b integer [1, 2, 3]
->
Or, if you prefer, a regex:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.select(df, ~r/^b$/)
-#Explorer.DataFrame<
- Polars[3 x 1]
- b integer [1, 2, 3]
->
Or a callback function that takes names and types:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
-iex> Explorer.DataFrame.select(df, fn _name, type -> type == :integer end)
-#Explorer.DataFrame<
- Polars[3 x 1]
- b integer [1, 2, 3]
->
You can select a single column:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.select(df, "a")
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a string ["a", "b", "c"]
+>
Or a list of names:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.select(df, ["a"])
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ a string ["a", "b", "c"]
+>
You can also use a range or a list of integers:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
+iex> Explorer.DataFrame.select(df, [0, 1])
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a string ["a", "b", "c"]
+ b integer [1, 2, 3]
+>
+
+iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
+iex> Explorer.DataFrame.select(df, 0..1)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ a string ["a", "b", "c"]
+ b integer [1, 2, 3]
+>
Or you can use a callback function that receives each column name as its argument:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.select(df, &String.starts_with?(&1, "b"))
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ b integer [1, 2, 3]
+>
Or, if you prefer, a regex:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.select(df, ~r/^b$/)
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ b integer [1, 2, 3]
+>
Or a callback function that takes names and types:
iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
+iex> Explorer.DataFrame.select(df, fn _name, type -> type == :integer end)
+#Explorer.DataFrame<
+ Polars[3 x 1]
+ b integer [1, 2, 3]
+>
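To summarise how the different selector forms map to column names, here is a hypothetical resolver in plain Elixir. SelectSketch.resolve/2 is an illustration only, not Explorer's code; it assumes columns are described as {name, dtype} tuples and that integer positions are zero-based, as in the [0, 1] and 0..1 examples above.

```elixir
defmodule SelectSketch do
  # Hypothetical resolver, not Explorer's implementation: maps each selector
  # form shown above to a list of column names.
  def resolve(columns, selector) do
    names = Enum.map(columns, fn {name, _dtype} -> name end)

    cond do
      is_binary(selector) ->
        [selector]

      is_struct(selector, Regex) ->
        Enum.filter(names, &Regex.match?(selector, &1))

      is_struct(selector, Range) ->
        Enum.map(selector, &Enum.at(names, &1))

      is_function(selector, 1) ->
        # One-argument callbacks see each column name.
        Enum.filter(names, selector)

      is_function(selector, 2) ->
        # Two-argument callbacks see each name together with its dtype.
        for {name, dtype} <- columns, selector.(name, dtype), do: name

      is_list(selector) ->
        # Lists may mix names and integer positions.
        Enum.map(selector, fn
          index when is_integer(index) -> Enum.at(names, index)
          name -> name
        end)
    end
  end
end

columns = [{"a", :string}, {"b", :integer}, {"c", :integer}]
IO.inspect(SelectSketch.resolve(columns, fn _name, type -> type == :integer end))
```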
Columns that are also groups cannot be removed;
-you need to ungroup before removing these columns.
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.select(grouped, ["sepal_width"])
-#Explorer.DataFrame<
- Polars[150 x 2]
- Groups: ["species"]
- sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
+you need to ungroup before removing these columns.
+iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.select(grouped, ["sepal_width"])
+#Explorer.DataFrame<
+ Polars[150 x 2]
+ Groups: ["species"]
+ sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
iex> df = Explorer.Datasets.fossil_fuels()
-iex> grouped_df = Explorer.DataFrame.group_by(df, "year")
-iex> Explorer.DataFrame.summarise(grouped_df, total_max: max(total), total_min: min(total))
-#Explorer.DataFrame<
- Polars[5 x 3]
- year integer [2010, 2011, 2012, 2013, 2014]
- total_max integer [2393248, 2654360, 2734817, 2797384, 2806634]
- total_min integer [1, 2, 2, 2, 3]
->
Suppose you want to get the mean petal length of each Iris species. You could do something
-like this:
iex> df = Explorer.Datasets.iris()
-iex> grouped_df = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.summarise(grouped_df, mean_petal_length: mean(petal_length))
-#Explorer.DataFrame<
- Polars[3 x 2]
- species string ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
- mean_petal_length float [1.464, 4.26, 5.552]
->
If you want aggregations over the whole dataframe, you can use ungrouped
-dataframes:
iex> df = Explorer.Datasets.iris()
-iex> Explorer.DataFrame.summarise(df, mean_petal_length: mean(petal_length))
-#Explorer.DataFrame<
- Polars[1 x 1]
- mean_petal_length float [3.758666666666667]
->
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> grouped_df = Explorer.DataFrame.group_by(df, "year")
+iex> Explorer.DataFrame.summarise(grouped_df, total_max: max(total), total_min: min(total))
+#Explorer.DataFrame<
+ Polars[5 x 3]
+ year integer [2010, 2011, 2012, 2013, 2014]
+ total_max integer [2393248, 2654360, 2734817, 2797384, 2806634]
+ total_min integer [1, 2, 2, 2, 3]
+>
Suppose you want to get the mean petal length of each Iris species. You could do something
+like this:
iex> df = Explorer.Datasets.iris()
+iex> grouped_df = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.summarise(grouped_df, mean_petal_length: mean(petal_length))
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ species string ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
+ mean_petal_length float [1.464, 4.26, 5.552]
+>
If you want aggregations over the whole dataframe, you can use ungrouped
+dataframes:
iex> df = Explorer.Datasets.iris()
+iex> Explorer.DataFrame.summarise(df, mean_petal_length: mean(petal_length))
+#Explorer.DataFrame<
+ Polars[1 x 1]
+ mean_petal_length float [3.758666666666667]
+>
iex> alias Explorer.{DataFrame, Series}
-iex> df = Explorer.Datasets.fossil_fuels() |> DataFrame.group_by("year")
-iex> DataFrame.summarise_with(df, &[total_max: Series.max(&1["total"]), countries: Series.n_distinct(&1["country"])])
-#Explorer.DataFrame<
- Polars[5 x 3]
- year integer [2010, 2011, 2012, 2013, 2014]
- total_max integer [2393248, 2654360, 2734817, 2797384, 2806634]
- countries integer [217, 217, 220, 220, 220]
->
-
-iex> alias Explorer.{DataFrame, Series}
-iex> df = Explorer.Datasets.fossil_fuels()
-iex> DataFrame.summarise_with(df, &[total_max: Series.max(&1["total"]), countries: Series.n_distinct(&1["country"])])
-#Explorer.DataFrame<
- Polars[1 x 2]
- total_max integer [2806634]
- countries integer [222]
->
+iex> alias Explorer.{DataFrame, Series}
+iex> df = Explorer.Datasets.fossil_fuels() |> DataFrame.group_by("year")
+iex> DataFrame.summarise_with(df, &[total_max: Series.max(&1["total"]), countries: Series.n_distinct(&1["country"])])
+#Explorer.DataFrame<
+ Polars[5 x 3]
+ year integer [2010, 2011, 2012, 2013, 2014]
+ total_max integer [2393248, 2654360, 2734817, 2797384, 2806634]
+ countries integer [217, 217, 220, 220, 220]
+>
+
+iex> alias Explorer.{DataFrame, Series}
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> DataFrame.summarise_with(df, &[total_max: Series.max(&1["total"]), countries: Series.n_distinct(&1["country"])])
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ total_max integer [2806634]
+ countries integer [222]
+>
Ungroups all by default:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> df = Explorer.DataFrame.group_by(df, ["country", "year"])
-iex> Explorer.DataFrame.ungroup(df)
-#Explorer.DataFrame<
- Polars[1094 x 10]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- total integer [2308, 1254, 32500, 141, 7924, ...]
- solid_fuel integer [627, 117, 332, 0, 0, ...]
- liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
- gas_fuel integer [74, 7, 14565, 0, 374, ...]
- cement integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
->
Ungrouping a single column:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> df = Explorer.DataFrame.group_by(df, ["country", "year"])
-iex> Explorer.DataFrame.ungroup(df, "country")
-#Explorer.DataFrame<
- Polars[1094 x 10]
- Groups: ["year"]
- year integer [2010, 2010, 2010, 2010, 2010, ...]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
- total integer [2308, 1254, 32500, 141, 7924, ...]
- solid_fuel integer [627, 117, 332, 0, 0, ...]
- liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
- gas_fuel integer [74, 7, 14565, 0, 374, ...]
- cement integer [5, 177, 2598, 0, 204, ...]
- gas_flaring integer [0, 0, 2623, 0, 3697, ...]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
- bunker_fuels integer [9, 7, 663, 0, 321, ...]
->
Lists, ranges, regexes, and functions are also accepted in column names, as in select/2.
Ungroups all by default:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> df = Explorer.DataFrame.group_by(df, ["country", "year"])
+iex> Explorer.DataFrame.ungroup(df)
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ total integer [2308, 1254, 32500, 141, 7924, ...]
+ solid_fuel integer [627, 117, 332, 0, 0, ...]
+ liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
+ gas_fuel integer [74, 7, 14565, 0, 374, ...]
+ cement integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+>
Ungrouping a single column:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> df = Explorer.DataFrame.group_by(df, ["country", "year"])
+iex> Explorer.DataFrame.ungroup(df, "country")
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ Groups: ["year"]
+ year integer [2010, 2010, 2010, 2010, 2010, ...]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
+ total integer [2308, 1254, 32500, 141, 7924, ...]
+ solid_fuel integer [627, 117, 332, 0, 0, ...]
+ liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
+ gas_fuel integer [74, 7, 14565, 0, 374, ...]
+ cement integer [5, 177, 2598, 0, 204, ...]
+ gas_flaring integer [0, 0, 2623, 0, 3697, ...]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
+ bunker_fuels integer [9, 7, 663, 0, 321, ...]
+>
Lists, ranges, regexes, and functions are also accepted in column names, as in select/2.
iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
-iex> df2 = Explorer.DataFrame.new(z: [4, 5, 6], a: ["d", "e", "f"])
-iex> Explorer.DataFrame.concat_columns([df1, df2])
-#Explorer.DataFrame<
- Polars[3 x 4]
- x integer [1, 2, 3]
- y string ["a", "b", "c"]
- z integer [4, 5, 6]
- a string ["d", "e", "f"]
->
Conflicting names are suffixed with the index of the dataframe in the array:
iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
-iex> df2 = Explorer.DataFrame.new(x: [4, 5, 6], a: ["d", "e", "f"])
-iex> Explorer.DataFrame.concat_columns([df1, df2])
-#Explorer.DataFrame<
- Polars[3 x 4]
- x integer [1, 2, 3]
- y string ["a", "b", "c"]
- x_1 integer [4, 5, 6]
- a string ["d", "e", "f"]
->
+iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
+iex> df2 = Explorer.DataFrame.new(z: [4, 5, 6], a: ["d", "e", "f"])
+iex> Explorer.DataFrame.concat_columns([df1, df2])
+#Explorer.DataFrame<
+ Polars[3 x 4]
+ x integer [1, 2, 3]
+ y string ["a", "b", "c"]
+ z integer [4, 5, 6]
+ a string ["d", "e", "f"]
+>
Conflicting names are suffixed with the index of the dataframe in the array:
iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
+iex> df2 = Explorer.DataFrame.new(x: [4, 5, 6], a: ["d", "e", "f"])
+iex> Explorer.DataFrame.concat_columns([df1, df2])
+#Explorer.DataFrame<
+ Polars[3 x 4]
+ x integer [1, 2, 3]
+ y string ["a", "b", "c"]
+ x_1 integer [4, 5, 6]
+ a string ["d", "e", "f"]
+>
iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
-iex> df2 = Explorer.DataFrame.new(x: [4, 5, 6], y: ["d", "e", "f"])
-iex> Explorer.DataFrame.concat_rows([df1, df2])
-#Explorer.DataFrame<
- Polars[6 x 2]
- x integer [1, 2, 3, 4, 5, ...]
- y string ["a", "b", "c", "d", "e", ...]
->
-
-iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
-iex> df2 = Explorer.DataFrame.new(x: [4.2, 5.3, 6.4], y: ["d", "e", "f"])
-iex> Explorer.DataFrame.concat_rows([df1, df2])
-#Explorer.DataFrame<
- Polars[6 x 2]
- x float [1.0, 2.0, 3.0, 4.2, 5.3, ...]
- y string ["a", "b", "c", "d", "e", ...]
->
+iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
+iex> df2 = Explorer.DataFrame.new(x: [4, 5, 6], y: ["d", "e", "f"])
+iex> Explorer.DataFrame.concat_rows([df1, df2])
+#Explorer.DataFrame<
+ Polars[6 x 2]
+ x integer [1, 2, 3, 4, 5, ...]
+ y string ["a", "b", "c", "d", "e", ...]
+>
+
+iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
+iex> df2 = Explorer.DataFrame.new(x: [4.2, 5.3, 6.4], y: ["d", "e", "f"])
+iex> Explorer.DataFrame.concat_rows([df1, df2])
+#Explorer.DataFrame<
+ Polars[6 x 2]
+ x float [1.0, 2.0, 3.0, 4.2, 5.3, ...]
+ y string ["a", "b", "c", "d", "e", ...]
+>
Inner join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
-iex> Explorer.DataFrame.join(left, right)
-#Explorer.DataFrame<
- Polars[3 x 3]
- a integer [1, 2, 2]
- b string ["a", "b", "b"]
- c string ["d", "e", "f"]
->
Left join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
-iex> Explorer.DataFrame.join(left, right, how: :left)
-#Explorer.DataFrame<
- Polars[4 x 3]
- a integer [1, 2, 2, 3]
- b string ["a", "b", "b", "c"]
- c string ["d", "e", "f", nil]
->
Right join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
-iex> Explorer.DataFrame.join(left, right, how: :right)
-#Explorer.DataFrame<
- Polars[3 x 3]
- a integer [1, 2, 4]
- c string ["d", "e", "f"]
- b string ["a", "b", nil]
->
Outer join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
-iex> Explorer.DataFrame.join(left, right, how: :outer)
-#Explorer.DataFrame<
- Polars[4 x 3]
- a integer [1, 2, 4, 3]
- b string ["a", "b", nil, "c"]
- c string ["d", "e", "f", nil]
->
Cross join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
-iex> Explorer.DataFrame.join(left, right, how: :cross)
-#Explorer.DataFrame<
- Polars[9 x 4]
- a integer [1, 1, 1, 2, 2, ...]
- b string ["a", "a", "a", "b", "b", ...]
- a_right integer [1, 2, 4, 1, 2, ...]
- c string ["d", "e", "f", "d", "e", ...]
->
Inner join with different names:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(d: [1, 2, 2], c: ["d", "e", "f"])
-iex> Explorer.DataFrame.join(left, right, on: [{"a", "d"}])
-#Explorer.DataFrame<
- Polars[3 x 3]
- a integer [1, 2, 2]
- b string ["a", "b", "b"]
- c string ["d", "e", "f"]
->
Inner join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
+iex> Explorer.DataFrame.join(left, right)
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ a integer [1, 2, 2]
+ b string ["a", "b", "b"]
+ c string ["d", "e", "f"]
+>
Left join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
+iex> Explorer.DataFrame.join(left, right, how: :left)
+#Explorer.DataFrame<
+ Polars[4 x 3]
+ a integer [1, 2, 2, 3]
+ b string ["a", "b", "b", "c"]
+ c string ["d", "e", "f", nil]
+>
Right join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
+iex> Explorer.DataFrame.join(left, right, how: :right)
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ a integer [1, 2, 4]
+ c string ["d", "e", "f"]
+ b string ["a", "b", nil]
+>
Outer join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
+iex> Explorer.DataFrame.join(left, right, how: :outer)
+#Explorer.DataFrame<
+ Polars[4 x 3]
+ a integer [1, 2, 4, 3]
+ b string ["a", "b", nil, "c"]
+ c string ["d", "e", "f", nil]
+>
Cross join:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
+iex> Explorer.DataFrame.join(left, right, how: :cross)
+#Explorer.DataFrame<
+ Polars[9 x 4]
+ a integer [1, 1, 1, 2, 2, ...]
+ b string ["a", "a", "a", "b", "b", ...]
+ a_right integer [1, 2, 4, 1, 2, ...]
+ c string ["d", "e", "f", "d", "e", ...]
+>
Inner join with different names:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(d: [1, 2, 2], c: ["d", "e", "f"])
+iex> Explorer.DataFrame.join(left, right, on: [{"a", "d"}])
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ a integer [1, 2, 2]
+ b string ["a", "b", "b"]
+ c string ["d", "e", "f"]
+>
When doing a join operation with grouped dataframes, the joined dataframe
-may keep the groups from only one side.
An inner join operation will keep the groups from the left-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
-iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
-iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
-iex> Explorer.DataFrame.join(grouped_left, grouped_right)
-#Explorer.DataFrame<
- Polars[3 x 3]
- Groups: ["b"]
- a integer [1, 2, 2]
- b string ["a", "b", "b"]
- c string ["d", "e", "f"]
->
A left join operation will keep the groups from the left-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
-iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
-iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
-iex> Explorer.DataFrame.join(grouped_left, grouped_right, how: :left)
-#Explorer.DataFrame<
- Polars[4 x 3]
- Groups: ["b"]
- a integer [1, 2, 2, 3]
- b string ["a", "b", "b", "c"]
- c string ["d", "e", "f", nil]
->
A right join operation will keep the groups from the right-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
-iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
-iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
-iex> Explorer.DataFrame.join(grouped_left, grouped_right, how: :right)
-#Explorer.DataFrame<
- Polars[3 x 3]
- Groups: ["c"]
- a integer [1, 2, 4]
- c string ["d", "e", "f"]
- b string ["a", "b", nil]
->
An outer join operation is going to keep the groups from the left-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
-iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
-iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
-iex> Explorer.DataFrame.join(grouped_left, grouped_right, how: :outer)
-#Explorer.DataFrame<
- Polars[4 x 3]
- Groups: ["b"]
- a integer [1, 2, 4, 3]
- b string ["a", "b", nil, "c"]
- c string ["d", "e", "f", nil]
->
A cross join operation is going to keep the groups from the left-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
-iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
-iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
-iex> Explorer.DataFrame.join(grouped_left, grouped_right, how: :cross)
-#Explorer.DataFrame<
- Polars[9 x 4]
- Groups: ["b"]
- a integer [1, 1, 1, 2, 2, ...]
- b string ["a", "a", "a", "b", "b", ...]
- a_right integer [1, 2, 4, 1, 2, ...]
- c string ["d", "e", "f", "d", "e", ...]
->
+may keep the groups from only one side.
An inner join operation will keep the groups from the left-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
+iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
+iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
+iex> Explorer.DataFrame.join(grouped_left, grouped_right)
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ Groups: ["b"]
+ a integer [1, 2, 2]
+ b string ["a", "b", "b"]
+ c string ["d", "e", "f"]
+>
A left join operation will keep the groups from the left-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
+iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
+iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
+iex> Explorer.DataFrame.join(grouped_left, grouped_right, how: :left)
+#Explorer.DataFrame<
+ Polars[4 x 3]
+ Groups: ["b"]
+ a integer [1, 2, 2, 3]
+ b string ["a", "b", "b", "c"]
+ c string ["d", "e", "f", nil]
+>
A right join operation will keep the groups from the right-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
+iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
+iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
+iex> Explorer.DataFrame.join(grouped_left, grouped_right, how: :right)
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ Groups: ["c"]
+ a integer [1, 2, 4]
+ c string ["d", "e", "f"]
+ b string ["a", "b", nil]
+>
An outer join operation is going to keep the groups from the left-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
+iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
+iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
+iex> Explorer.DataFrame.join(grouped_left, grouped_right, how: :outer)
+#Explorer.DataFrame<
+ Polars[4 x 3]
+ Groups: ["b"]
+ a integer [1, 2, 4, 3]
+ b string ["a", "b", nil, "c"]
+ c string ["d", "e", "f", nil]
+>
A cross join operation is going to keep the groups from the left-hand side dataframe:
iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
+iex> grouped_left = Explorer.DataFrame.group_by(left, "b")
+iex> grouped_right = Explorer.DataFrame.group_by(right, "c")
+iex> Explorer.DataFrame.join(grouped_left, grouped_right, how: :cross)
+#Explorer.DataFrame<
+ Polars[9 x 4]
+ Groups: ["b"]
+ a integer [1, 1, 1, 2, 2, ...]
+ b string ["a", "a", "a", "b", "b", ...]
+ a_right integer [1, 2, 4, 1, 2, ...]
+ c string ["d", "e", "f", "d", "e", ...]
+>
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.head(df)
-#Explorer.DataFrame<
- Polars[5 x 10]
- year integer [2010, 2010, 2010, 2010, 2010]
- country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA"]
- total integer [2308, 1254, 32500, 141, 7924]
- solid_fuel integer [627, 117, 332, 0, 0]
- liquid_fuel integer [1601, 953, 12381, 141, 3649]
- gas_fuel integer [74, 7, 14565, 0, 374]
- cement integer [5, 177, 2598, 0, 204]
- gas_flaring integer [0, 0, 2623, 0, 3697]
- per_capita float [0.08, 0.43, 0.9, 1.68, 0.37]
- bunker_fuels integer [9, 7, 663, 0, 321]
->
-
-iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.head(df, 2)
-#Explorer.DataFrame<
- Polars[2 x 10]
- year integer [2010, 2010]
- country string ["AFGHANISTAN", "ALBANIA"]
- total integer [2308, 1254]
- solid_fuel integer [627, 117]
- liquid_fuel integer [1601, 953]
- gas_fuel integer [74, 7]
- cement integer [5, 177]
- gas_flaring integer [0, 0]
- per_capita float [0.08, 0.43]
- bunker_fuels integer [9, 7]
->
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.head(df)
+#Explorer.DataFrame<
+ Polars[5 x 10]
+ year integer [2010, 2010, 2010, 2010, 2010]
+ country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA"]
+ total integer [2308, 1254, 32500, 141, 7924]
+ solid_fuel integer [627, 117, 332, 0, 0]
+ liquid_fuel integer [1601, 953, 12381, 141, 3649]
+ gas_fuel integer [74, 7, 14565, 0, 374]
+ cement integer [5, 177, 2598, 0, 204]
+ gas_flaring integer [0, 0, 2623, 0, 3697]
+ per_capita float [0.08, 0.43, 0.9, 1.68, 0.37]
+ bunker_fuels integer [9, 7, 663, 0, 321]
+>
+
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.head(df, 2)
+#Explorer.DataFrame<
+ Polars[2 x 10]
+ year integer [2010, 2010]
+ country string ["AFGHANISTAN", "ALBANIA"]
+ total integer [2308, 1254]
+ solid_fuel integer [627, 117]
+ liquid_fuel integer [1601, 953]
+ gas_fuel integer [74, 7]
+ cement integer [5, 177]
+ gas_flaring integer [0, 0]
+ per_capita float [0.08, 0.43]
+ bunker_fuels integer [9, 7]
+>
Using grouped dataframes makes head/2 return n rows from each group.
-Here is an example using the Iris dataset, and returning two rows from each group:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.head(grouped, 2)
-#Explorer.DataFrame<
- Polars[6 x 5]
- Groups: ["species"]
- sepal_length float [5.1, 4.9, 7.0, 6.4, 6.3, ...]
- sepal_width float [3.5, 3.0, 3.2, 3.2, 3.3, ...]
- petal_length float [1.4, 1.4, 4.7, 4.5, 6.0, ...]
- petal_width float [0.2, 0.2, 1.4, 1.5, 2.5, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", ...]
->
+Here is an example using the Iris dataset, and returning two rows from each group:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.head(grouped, 2)
+#Explorer.DataFrame<
+ Polars[6 x 5]
+ Groups: ["species"]
+ sepal_length float [5.1, 4.9, 7.0, 6.4, 6.3, ...]
+ sepal_width float [3.5, 3.0, 3.2, 3.2, 3.3, ...]
+ petal_length float [1.4, 1.4, 4.7, 4.5, 6.0, ...]
+ petal_width float [0.2, 0.2, 1.4, 1.5, 2.5, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", ...]
+>
You can sample N rows:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.sample(df, 3, seed: 100)
-#Explorer.DataFrame<
- Polars[3 x 10]
- year integer [2011, 2012, 2011]
- country string ["SERBIA", "FALKLAND ISLANDS (MALVINAS)", "SWAZILAND"]
- total integer [13422, 15, 286]
- solid_fuel integer [9355, 3, 102]
- liquid_fuel integer [2537, 12, 184]
- gas_fuel integer [1188, 0, 0]
- cement integer [342, 0, 0]
- gas_flaring integer [0, 0, 0]
- per_capita float [1.49, 5.21, 0.24]
- bunker_fuels integer [39, 0, 1]
->
Or you can sample a proportion of rows:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.sample(df, 0.03, seed: 100)
-#Explorer.DataFrame<
- Polars[32 x 10]
- year integer [2011, 2012, 2012, 2013, 2010, ...]
- country string ["URUGUAY", "FRENCH POLYNESIA", "ICELAND", "PERU", "TUNISIA", ...]
- total integer [2117, 222, 491, 15586, 7543, ...]
- solid_fuel integer [1, 0, 96, 784, 15, ...]
- liquid_fuel integer [1943, 222, 395, 7097, 3138, ...]
- gas_fuel integer [40, 0, 0, 3238, 3176, ...]
- cement integer [132, 0, 0, 1432, 1098, ...]
- gas_flaring integer [0, 0, 0, 3036, 116, ...]
- per_capita float [0.63, 0.81, 1.52, 0.51, 0.71, ...]
- bunker_fuels integer [401, 45, 170, 617, 219, ...]
->
You can sample N rows:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.sample(df, 3, seed: 100)
+#Explorer.DataFrame<
+ Polars[3 x 10]
+ year integer [2011, 2012, 2011]
+ country string ["SERBIA", "FALKLAND ISLANDS (MALVINAS)", "SWAZILAND"]
+ total integer [13422, 15, 286]
+ solid_fuel integer [9355, 3, 102]
+ liquid_fuel integer [2537, 12, 184]
+ gas_fuel integer [1188, 0, 0]
+ cement integer [342, 0, 0]
+ gas_flaring integer [0, 0, 0]
+ per_capita float [1.49, 5.21, 0.24]
+ bunker_fuels integer [39, 0, 1]
+>
Or you can sample a proportion of rows:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.sample(df, 0.03, seed: 100)
+#Explorer.DataFrame<
+ Polars[32 x 10]
+ year integer [2011, 2012, 2012, 2013, 2010, ...]
+ country string ["URUGUAY", "FRENCH POLYNESIA", "ICELAND", "PERU", "TUNISIA", ...]
+ total integer [2117, 222, 491, 15586, 7543, ...]
+ solid_fuel integer [1, 0, 96, 784, 15, ...]
+ liquid_fuel integer [1943, 222, 395, 7097, 3138, ...]
+ gas_fuel integer [40, 0, 0, 3238, 3176, ...]
+ cement integer [132, 0, 0, 1432, 1098, ...]
+ gas_flaring integer [0, 0, 0, 3036, 116, ...]
+ per_capita float [0.63, 0.81, 1.52, 0.51, 0.71, ...]
+ bunker_fuels integer [401, 45, 170, 617, 219, ...]
+>
In the following example we have the Iris dataset grouped by species, and we want to take a sample of two plants from each group. Since we have three species, the
-resultant dataframe is going to have six rows (2 * 3).
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.sample(grouped, 2, seed: 100)
-#Explorer.DataFrame<
- Polars[6 x 5]
- Groups: ["species"]
- sepal_length float [5.3, 5.1, 5.1, 5.6, 6.2, ...]
- sepal_width float [3.7, 3.8, 2.5, 2.7, 3.4, ...]
- petal_length float [1.5, 1.9, 3.0, 4.2, 5.4, ...]
- petal_width float [0.2, 0.4, 1.1, 1.3, 2.3, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", ...]
->
The behaviour is similar when you want to take a fraction of the rows from each group. The main
-difference is that each group can have more or fewer rows, depending on its size.
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.sample(grouped, 0.1, seed: 100)
-#Explorer.DataFrame<
- Polars[15 x 5]
- Groups: ["species"]
- sepal_length float [5.3, 5.1, 4.7, 5.7, 5.1, ...]
- sepal_width float [3.7, 3.8, 3.2, 3.8, 3.5, ...]
- petal_length float [1.5, 1.9, 1.3, 1.7, 1.4, ...]
- petal_width float [0.2, 0.4, 0.2, 0.3, 0.3, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
+resultant dataframe is going to have six rows (2 * 3).
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.sample(grouped, 2, seed: 100)
+#Explorer.DataFrame<
+ Polars[6 x 5]
+ Groups: ["species"]
+ sepal_length float [5.3, 5.1, 5.1, 5.6, 6.2, ...]
+ sepal_width float [3.7, 3.8, 2.5, 2.7, 3.4, ...]
+ petal_length float [1.5, 1.9, 3.0, 4.2, 5.4, ...]
+ petal_width float [0.2, 0.4, 1.1, 1.3, 2.3, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", ...]
+>
The behaviour is similar when you want to take a fraction of the rows from each group. The main
+difference is that each group can have more or fewer rows, depending on its size.
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.sample(grouped, 0.1, seed: 100)
+#Explorer.DataFrame<
+ Polars[15 x 5]
+ Groups: ["species"]
+ sepal_length float [5.3, 5.1, 4.7, 5.7, 5.1, ...]
+ sepal_width float [3.7, 3.8, 3.2, 3.8, 3.5, ...]
+ petal_length float [1.5, 1.9, 1.3, 1.7, 1.4, ...]
+ petal_width float [0.2, 0.4, 0.2, 0.3, 0.3, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.shuffle(df, seed: 100)
-#Explorer.DataFrame<
- Polars[1094 x 10]
- year integer [2014, 2014, 2014, 2012, 2010, ...]
- country string ["ISRAEL", "ARGENTINA", "NETHERLANDS", "YEMEN", "GRENADA", ...]
- total integer [17617, 55638, 45624, 5091, 71, ...]
- solid_fuel integer [6775, 1588, 9070, 129, 0, ...]
- liquid_fuel integer [6013, 25685, 18272, 4173, 71, ...]
- gas_fuel integer [3930, 26368, 18010, 414, 0, ...]
- cement integer [898, 1551, 272, 375, 0, ...]
- gas_flaring integer [0, 446, 0, 0, 0, ...]
- per_capita float [2.22, 1.29, 2.7, 0.2, 0.68, ...]
- bunker_fuels integer [1011, 2079, 14210, 111, 4, ...]
->
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.shuffle(df, seed: 100)
+#Explorer.DataFrame<
+ Polars[1094 x 10]
+ year integer [2014, 2014, 2014, 2012, 2010, ...]
+ country string ["ISRAEL", "ARGENTINA", "NETHERLANDS", "YEMEN", "GRENADA", ...]
+ total integer [17617, 55638, 45624, 5091, 71, ...]
+ solid_fuel integer [6775, 1588, 9070, 129, 0, ...]
+ liquid_fuel integer [6013, 25685, 18272, 4173, 71, ...]
+ gas_fuel integer [3930, 26368, 18010, 414, 0, ...]
+ cement integer [898, 1551, 272, 375, 0, ...]
+ gas_flaring integer [0, 446, 0, 0, 0, ...]
+ per_capita float [2.22, 1.29, 2.7, 0.2, 0.68, ...]
+ bunker_fuels integer [1011, 2079, 14210, 111, 4, ...]
+>
iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> Explorer.DataFrame.slice(df, [0, 2])
-#Explorer.DataFrame<
- Polars[2 x 2]
- a integer [1, 3]
- b string ["a", "c"]
->
With a series:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> Explorer.DataFrame.slice(df, Explorer.Series.from_list([0, 2]))
-#Explorer.DataFrame<
- Polars[2 x 2]
- a integer [1, 3]
- b string ["a", "c"]
->
With a range:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> Explorer.DataFrame.slice(df, 1..2)
-#Explorer.DataFrame<
- Polars[2 x 2]
- a integer [2, 3]
- b string ["b", "c"]
->
With a range with negative first and last:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
-iex> Explorer.DataFrame.slice(df, -2..-1)
-#Explorer.DataFrame<
- Polars[2 x 2]
- a integer [2, 3]
- b string ["b", "c"]
->
iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> Explorer.DataFrame.slice(df, [0, 2])
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ a integer [1, 3]
+ b string ["a", "c"]
+>
With a series:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> Explorer.DataFrame.slice(df, Explorer.Series.from_list([0, 2]))
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ a integer [1, 3]
+ b string ["a", "c"]
+>
With a range:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> Explorer.DataFrame.slice(df, 1..2)
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ a integer [2, 3]
+ b string ["b", "c"]
+>
With a range with negative first and last:
iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
+iex> Explorer.DataFrame.slice(df, -2..-1)
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ a integer [2, 3]
+ b string ["b", "c"]
+>
We are going to once again use the Iris dataset. In this example we want to take elements at indexes
-0 and 2:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.slice(grouped, [0, 2])
-#Explorer.DataFrame<
- Polars[6 x 5]
- Groups: ["species"]
- sepal_length float [5.1, 4.7, 7.0, 6.9, 6.3, ...]
- sepal_width float [3.5, 3.2, 3.2, 3.1, 3.3, ...]
- petal_length float [1.4, 1.3, 4.7, 4.9, 6.0, ...]
- petal_width float [0.2, 0.2, 1.4, 1.5, 2.5, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", ...]
->
Now we want to take the first 3 rows of each group.
-This is going to work with the range 0..2
:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.slice(grouped, 0..2)
-#Explorer.DataFrame<
- Polars[9 x 5]
- Groups: ["species"]
- sepal_length float [5.1, 4.9, 4.7, 7.0, 6.4, ...]
- sepal_width float [3.5, 3.0, 3.2, 3.2, 3.2, ...]
- petal_length float [1.4, 1.4, 1.3, 4.7, 4.5, ...]
- petal_width float [0.2, 0.2, 0.2, 1.4, 1.5, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", ...]
->
+0 and 2:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.slice(grouped, [0, 2])
+#Explorer.DataFrame<
+ Polars[6 x 5]
+ Groups: ["species"]
+ sepal_length float [5.1, 4.7, 7.0, 6.9, 6.3, ...]
+ sepal_width float [3.5, 3.2, 3.2, 3.1, 3.3, ...]
+ petal_length float [1.4, 1.3, 4.7, 4.9, 6.0, ...]
+ petal_width float [0.2, 0.2, 1.4, 1.5, 2.5, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", ...]
+>
Now we want to take the first 3 rows of each group.
+This is going to work with the range 0..2
:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.slice(grouped, 0..2)
+#Explorer.DataFrame<
+ Polars[9 x 5]
+ Groups: ["species"]
+ sepal_length float [5.1, 4.9, 4.7, 7.0, 6.4, ...]
+ sepal_width float [3.5, 3.0, 3.2, 3.2, 3.2, ...]
+ petal_length float [1.4, 1.4, 1.3, 4.7, 4.5, ...]
+ petal_width float [0.2, 0.2, 0.2, 1.4, 1.5, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", ...]
+>
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.slice(df, 1, 2)
-#Explorer.DataFrame<
- Polars[2 x 10]
- year integer [2010, 2010]
- country string ["ALBANIA", "ALGERIA"]
- total integer [1254, 32500]
- solid_fuel integer [117, 332]
- liquid_fuel integer [953, 12381]
- gas_fuel integer [7, 14565]
- cement integer [177, 2598]
- gas_flaring integer [0, 2623]
- per_capita float [0.43, 0.9]
- bunker_fuels integer [7, 663]
->
Negative offsets count from the end of the dataframe:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.slice(df, -10, 2)
-#Explorer.DataFrame<
- Polars[2 x 10]
- year integer [2014, 2014]
- country string ["UNITED STATES OF AMERICA", "URUGUAY"]
- total integer [1432855, 1840]
- solid_fuel integer [450047, 2]
- liquid_fuel integer [576531, 1700]
- gas_fuel integer [390719, 25]
- cement integer [11314, 112]
- gas_flaring integer [4244, 0]
- per_capita float [4.43, 0.54]
- bunker_fuels integer [30722, 251]
->
If the length would run past the end of the dataframe, the result may be shorter than the length:
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.slice(df, -10, 20)
-#Explorer.DataFrame<
- Polars[10 x 10]
- year integer [2014, 2014, 2014, 2014, 2014, ...]
- country string ["UNITED STATES OF AMERICA", "URUGUAY", "UZBEKISTAN", "VANUATU", "VENEZUELA", ...]
- total integer [1432855, 1840, 28692, 42, 50510, ...]
- solid_fuel integer [450047, 2, 1677, 0, 204, ...]
- liquid_fuel integer [576531, 1700, 2086, 42, 28445, ...]
- gas_fuel integer [390719, 25, 23929, 0, 12731, ...]
- cement integer [11314, 112, 1000, 0, 1088, ...]
- gas_flaring integer [4244, 0, 0, 0, 8042, ...]
- per_capita float [4.43, 0.54, 0.97, 0.16, 1.65, ...]
- bunker_fuels integer [30722, 251, 0, 10, 1256, ...]
->
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.slice(df, 1, 2)
+#Explorer.DataFrame<
+ Polars[2 x 10]
+ year integer [2010, 2010]
+ country string ["ALBANIA", "ALGERIA"]
+ total integer [1254, 32500]
+ solid_fuel integer [117, 332]
+ liquid_fuel integer [953, 12381]
+ gas_fuel integer [7, 14565]
+ cement integer [177, 2598]
+ gas_flaring integer [0, 2623]
+ per_capita float [0.43, 0.9]
+ bunker_fuels integer [7, 663]
+>
Negative offsets count from the end of the dataframe:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.slice(df, -10, 2)
+#Explorer.DataFrame<
+ Polars[2 x 10]
+ year integer [2014, 2014]
+ country string ["UNITED STATES OF AMERICA", "URUGUAY"]
+ total integer [1432855, 1840]
+ solid_fuel integer [450047, 2]
+ liquid_fuel integer [576531, 1700]
+ gas_fuel integer [390719, 25]
+ cement integer [11314, 112]
+ gas_flaring integer [4244, 0]
+ per_capita float [4.43, 0.54]
+ bunker_fuels integer [30722, 251]
+>
If the length would run past the end of the dataframe, the result may be shorter than the length:
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.slice(df, -10, 20)
+#Explorer.DataFrame<
+ Polars[10 x 10]
+ year integer [2014, 2014, 2014, 2014, 2014, ...]
+ country string ["UNITED STATES OF AMERICA", "URUGUAY", "UZBEKISTAN", "VANUATU", "VENEZUELA", ...]
+ total integer [1432855, 1840, 28692, 42, 50510, ...]
+ solid_fuel integer [450047, 2, 1677, 0, 204, ...]
+ liquid_fuel integer [576531, 1700, 2086, 42, 28445, ...]
+ gas_fuel integer [390719, 25, 23929, 0, 12731, ...]
+ cement integer [11314, 112, 1000, 0, 1088, ...]
+ gas_flaring integer [4244, 0, 0, 0, 8042, ...]
+ per_capita float [4.43, 0.54, 0.97, 0.16, 1.65, ...]
+ bunker_fuels integer [30722, 251, 0, 10, 1256, ...]
+>
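The offset and length rules above mirror how Elixir's own `Enum.slice/3` treats lists. The sketch below is plain Elixir, not Explorer: it assumes a list of row positions as a stand-in for the 1094-row `fossil_fuels` dataframe and only illustrates the same arithmetic.

```elixir
# Hypothetical stand-in: the row positions 0..1093 of a 1094-row dataframe
# such as fossil_fuels. Enum.slice/3 is used only to illustrate the rules.
rows = Enum.to_list(0..1093)

# Offset 1, length 2 selects the second and third rows.
second_and_third = Enum.slice(rows, 1, 2)
# => [1, 2]

# A negative offset counts from the end: -10 starts ten rows before the end.
from_end = Enum.slice(rows, -10, 2)
# => [1084, 1085]

# A length that runs past the end is truncated instead of raising.
overrun = Enum.slice(rows, -10, 20)
length(overrun)
# => 10
```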
We want to take the first 3 rows of each group. We need the offset 0 and the length 3:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.slice(grouped, 0, 3)
-#Explorer.DataFrame<
- Polars[9 x 5]
- Groups: ["species"]
- sepal_length float [5.1, 4.9, 4.7, 7.0, 6.4, ...]
- sepal_width float [3.5, 3.0, 3.2, 3.2, 3.2, ...]
- petal_length float [1.4, 1.4, 1.3, 4.7, 4.5, ...]
- petal_width float [0.2, 0.2, 0.2, 1.4, 1.5, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", ...]
->
We can also pass a negative offset:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.slice(grouped, -6, 3)
-#Explorer.DataFrame<
- Polars[9 x 5]
- Groups: ["species"]
- sepal_length float [5.1, 4.8, 5.1, 5.6, 5.7, ...]
- sepal_width float [3.8, 3.0, 3.8, 2.7, 3.0, ...]
- petal_length float [1.9, 1.4, 1.6, 4.2, 4.2, ...]
- petal_width float [0.4, 0.3, 0.2, 1.3, 1.2, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", ...]
->
+We want to take the first 3 rows of each group. We need the offset 0 and the length 3:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.slice(grouped, 0, 3)
+#Explorer.DataFrame<
+ Polars[9 x 5]
+ Groups: ["species"]
+ sepal_length float [5.1, 4.9, 4.7, 7.0, 6.4, ...]
+ sepal_width float [3.5, 3.0, 3.2, 3.2, 3.2, ...]
+ petal_length float [1.4, 1.4, 1.3, 4.7, 4.5, ...]
+ petal_width float [0.2, 0.2, 0.2, 1.4, 1.5, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", ...]
+>
We can also pass a negative offset:
iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.slice(grouped, -6, 3)
+#Explorer.DataFrame<
+ Polars[9 x 5]
+ Groups: ["species"]
+ sepal_length float [5.1, 4.8, 5.1, 5.6, 5.7, ...]
+ sepal_width float [3.8, 3.0, 3.8, 2.7, 3.0, ...]
+ petal_length float [1.9, 1.4, 1.6, 4.2, 4.2, ...]
+ petal_width float [0.4, 0.3, 0.2, 1.3, 1.2, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", ...]
+>
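For grouped dataframes, the offset and length are applied inside every group and the per-group results are concatenated, which matches the 9 rows returned for the three Iris species above. A minimal plain-Elixir sketch of that behavior, assuming rows are represented as maps; `GroupedSliceSketch` is a hypothetical helper, not part of Explorer:

```elixir
defmodule GroupedSliceSketch do
  # Hypothetical helper: applies the same offset/count to every group
  # and concatenates the results, keeping groups in first-appearance order.
  def slice(rows, key, offset, count) do
    groups = Enum.group_by(rows, &Map.fetch!(&1, key))

    rows
    |> Enum.map(&Map.fetch!(&1, key))
    |> Enum.uniq()
    |> Enum.flat_map(fn group_key ->
      groups |> Map.fetch!(group_key) |> Enum.slice(offset, count)
    end)
  end
end

# Three hypothetical groups of five rows each.
rows =
  for species <- ["setosa", "versicolor", "virginica"], i <- 1..5 do
    %{species: species, i: i}
  end

# First three rows of each group: 3 groups x 3 rows = 9 rows.
first_three = GroupedSliceSketch.slice(rows, :species, 0, 3)

# A negative offset is resolved inside each group: -2 is the 4th of 5 rows.
fourth_of_each = GroupedSliceSketch.slice(rows, :species, -2, 1)
```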
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.tail(df)
-#Explorer.DataFrame<
- Polars[5 x 10]
- year integer [2014, 2014, 2014, 2014, 2014]
- country string ["VIET NAM", "WALLIS AND FUTUNA ISLANDS", "YEMEN", "ZAMBIA", "ZIMBABWE"]
- total integer [45517, 6, 6190, 1228, 3278]
- solid_fuel integer [19246, 0, 137, 132, 2097]
- liquid_fuel integer [12694, 6, 5090, 797, 1005]
- gas_fuel integer [5349, 0, 581, 0, 0]
- cement integer [8229, 0, 381, 299, 177]
- gas_flaring integer [0, 0, 0, 0, 0]
- per_capita float [0.49, 0.44, 0.24, 0.08, 0.22]
- bunker_fuels integer [761, 1, 153, 33, 9]
->
-
-iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.tail(df, 2)
-#Explorer.DataFrame<
- Polars[2 x 10]
- year integer [2014, 2014]
- country string ["ZAMBIA", "ZIMBABWE"]
- total integer [1228, 3278]
- solid_fuel integer [132, 2097]
- liquid_fuel integer [797, 1005]
- gas_fuel integer [0, 0]
- cement integer [299, 177]
- gas_flaring integer [0, 0]
- per_capita float [0.08, 0.22]
- bunker_fuels integer [33, 9]
->
iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.tail(df)
+#Explorer.DataFrame<
+ Polars[5 x 10]
+ year integer [2014, 2014, 2014, 2014, 2014]
+ country string ["VIET NAM", "WALLIS AND FUTUNA ISLANDS", "YEMEN", "ZAMBIA", "ZIMBABWE"]
+ total integer [45517, 6, 6190, 1228, 3278]
+ solid_fuel integer [19246, 0, 137, 132, 2097]
+ liquid_fuel integer [12694, 6, 5090, 797, 1005]
+ gas_fuel integer [5349, 0, 581, 0, 0]
+ cement integer [8229, 0, 381, 299, 177]
+ gas_flaring integer [0, 0, 0, 0, 0]
+ per_capita float [0.49, 0.44, 0.24, 0.08, 0.22]
+ bunker_fuels integer [761, 1, 153, 33, 9]
+>
+
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.tail(df, 2)
+#Explorer.DataFrame<
+ Polars[2 x 10]
+ year integer [2014, 2014]
+ country string ["ZAMBIA", "ZIMBABWE"]
+ total integer [1228, 3278]
+ solid_fuel integer [132, 2097]
+ liquid_fuel integer [797, 1005]
+ gas_fuel integer [0, 0]
+ cement integer [299, 177]
+ gas_flaring integer [0, 0]
+ per_capita float [0.08, 0.22]
+ bunker_fuels integer [33, 9]
+>
Using grouped dataframes makes tail/2
return n rows from each group.
-Here is an example using the Iris dataset, and returning two rows from each group:
iex> df = Explorer.Datasets.iris()
-iex> grouped = Explorer.DataFrame.group_by(df, "species")
-iex> Explorer.DataFrame.tail(grouped, 2)
-#Explorer.DataFrame<
- Polars[6 x 5]
- Groups: ["species"]
- sepal_length float [5.3, 5.0, 5.1, 5.7, 6.2, ...]
- sepal_width float [3.7, 3.3, 2.5, 2.8, 3.4, ...]
- petal_length float [1.5, 1.4, 3.0, 4.1, 5.4, ...]
- petal_width float [0.2, 0.2, 1.1, 1.3, 2.3, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", ...]
->
+Here is an example using the Iris dataset, and returning two rows from each group:
+iex> df = Explorer.Datasets.iris()
+iex> grouped = Explorer.DataFrame.group_by(df, "species")
+iex> Explorer.DataFrame.tail(grouped, 2)
+#Explorer.DataFrame<
+ Polars[6 x 5]
+ Groups: ["species"]
+ sepal_length float [5.3, 5.0, 5.1, 5.7, 6.2, ...]
+ sepal_width float [3.7, 3.3, 2.5, 2.8, 3.4, ...]
+ petal_length float [1.5, 1.4, 3.0, 4.1, 5.4, ...]
+ petal_width float [0.2, 0.2, 1.1, 1.3, 2.3, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", ...]
+>
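Grouped `tail/2` follows the same per-group pattern, taking the last n rows of each group. A plain-Elixir sketch under the same assumptions (rows as maps, hypothetical `GroupedTailSketch` module); the default of 5 is chosen here only to mirror the ungrouped `tail(df)` example above:

```elixir
defmodule GroupedTailSketch do
  # Hypothetical helper: returns the last n rows of every group, keeping
  # groups in first-appearance order. The default n = 5 is an assumption.
  def tail(rows, key, n \\ 5) do
    groups = Enum.group_by(rows, &Map.fetch!(&1, key))

    rows
    |> Enum.map(&Map.fetch!(&1, key))
    |> Enum.uniq()
    # Enum.take/2 with a negative count takes elements from the end.
    |> Enum.flat_map(fn group_key -> Enum.take(Map.fetch!(groups, group_key), -n) end)
  end
end

# Three hypothetical groups of 50 rows, like the Iris species.
rows =
  for species <- ["setosa", "versicolor", "virginica"], i <- 1..50 do
    %{species: species, i: i}
  end

# Two rows from each of the three groups: six rows in total.
last_two = GroupedTailSketch.tail(rows, :species, 2)
```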
iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, 2])
-iex> Explorer.DataFrame.dtypes(df)
-%{"floats" => :float, "ints" => :integer}
+iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, 2])
+iex> Explorer.DataFrame.dtypes(df)
+%{"floats" => :float, "ints" => :integer}
iex> df = Explorer.Datasets.fossil_fuels()
-iex> df = Explorer.DataFrame.group_by(df, "country")
-iex> Explorer.DataFrame.groups(df)
-["country"]
-
-iex> df = Explorer.Datasets.iris()
-iex> Explorer.DataFrame.groups(df)
-[]
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> df = Explorer.DataFrame.group_by(df, "country")
+iex> Explorer.DataFrame.groups(df)
+["country"]
+
+iex> df = Explorer.Datasets.iris()
+iex> Explorer.DataFrame.groups(df)
+[]
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.n_columns(df)
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.n_columns(df)
10
iex> df = Explorer.Datasets.fossil_fuels()
-iex> Explorer.DataFrame.n_rows(df)
+iex> df = Explorer.Datasets.fossil_fuels()
+iex> Explorer.DataFrame.n_rows(df)
1094
@@ -5015,9 +5015,9 @@ names(df)
Examples
-iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, 2])
-iex> Explorer.DataFrame.names(df)
-["floats", "ints"]
+iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, 2])
+iex> Explorer.DataFrame.names(df)
+["floats", "ints"]
@@ -5052,9 +5052,9 @@ shape(df)
Examples
-iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0, 3.0], ints: [1, 2, 3])
-iex> Explorer.DataFrame.shape(df)
-{3, 2}
+iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0, 3.0], ints: [1, 2, 3])
+iex> Explorer.DataFrame.shape(df)
+{3, 2}
@@ -5147,9 +5147,9 @@ dump_csv(df, opts \\ [])
Examples
-iex> df = Explorer.Datasets.fossil_fuels() |> Explorer.DataFrame.head(2)
-iex> Explorer.DataFrame.dump_csv(df)
-{:ok, "year,country,total,solid_fuel,liquid_fuel,gas_fuel,cement,gas_flaring,per_capita,bunker_fuels\n2010,AFGHANISTAN,2308,627,1601,74,5,0,0.08,9\n2010,ALBANIA,1254,117,953,7,177,0,0.43,7\n"}
+iex> df = Explorer.Datasets.fossil_fuels() |> Explorer.DataFrame.head(2)
+iex> Explorer.DataFrame.dump_csv(df)
+{:ok, "year,country,total,solid_fuel,liquid_fuel,gas_fuel,cement,gas_flaring,per_capita,bunker_fuels\n2010,AFGHANISTAN,2308,627,1601,74,5,0,0.08,9\n2010,ALBANIA,1254,117,953,7,177,0,0.43,7\n"}
@@ -5348,9 +5348,9 @@ dump_ndjson(df)
Examples
-iex> df = Explorer.DataFrame.new(col_a: [1, 2], col_b: [5.1, 5.2])
-iex> Explorer.DataFrame.dump_ndjson(df)
-{:ok, ~s({"col_a":1,"col_b":5.1}\n{"col_a":2,"col_b":5.2}\n)}
+iex> df = Explorer.DataFrame.new(col_a: [1, 2], col_b: [5.1, 5.2])
+iex> Explorer.DataFrame.dump_ndjson(df)
+{:ok, ~s({"col_a":1,"col_b":5.1}\n{"col_a":2,"col_b":5.2}\n)}
@@ -5841,24 +5841,24 @@ from_query(conn, query, params, opts \\ [])
In order to read data from a database, you must list :adbc
as a dependency,
download the relevant driver, and start both database and connection processes
-in your supervision tree.
First, add :adbc
as a dependency in your mix.exs
:
{:adbc, "~> 0.1"}
Now, in your config/config.exs, configure the drivers you are going to use
-(see Adbc
module docs for more information on supported drivers):
config :adbc, :drivers, [:sqlite]
If you are using a notebook or scripting, you can also use Adbc.download_driver!/1
+in your supervision tree.
First, add :adbc
as a dependency in your mix.exs
:
{:adbc, "~> 0.1"}
Now, in your config/config.exs, configure the drivers you are going to use
+(see Adbc
module docs for more information on supported drivers):
config :adbc, :drivers, [:sqlite]
If you are using a notebook or scripting, you can also use Adbc.download_driver!/1
to dynamically download one.
Then start the database and the relevant connection processes in your
-supervision tree:
children = [
- {Adbc.Database,
+supervision tree:
+children = [
+ {Adbc.Database,
driver: :sqlite,
- process_options: [name: MyApp.DB]},
- {Adbc.Connection,
+ process_options: [name: MyApp.DB]},
+ {Adbc.Connection,
database: MyApp.DB,
- process_options: [name: MyApp.Conn]}
-]
+ process_options: [name: MyApp.Conn]}
+]
-Supervisor.start_link(children, strategy: :one_for_one)
In a notebook, the above would look like this:
db = Kino.start_child!({Adbc.Database, driver: :sqlite})
-conn = Kino.start_child!({Adbc.Connection, database: db})
And now you can make queries with:
# For named connections
-{:ok, _} = Explorer.DataFrame.from_query(MyApp.Conn, "SELECT 123")
+Supervisor.start_link(children, strategy: :one_for_one)
In a notebook, the above would look like this:
db = Kino.start_child!({Adbc.Database, driver: :sqlite})
+conn = Kino.start_child!({Adbc.Connection, database: db})
And now you can make queries with:
# For named connections
+{:ok, _} = Explorer.DataFrame.from_query(MyApp.Conn, "SELECT 123")
# When using the conn PID directly
-{:ok, _} = Explorer.DataFrame.from_query(conn, "SELECT 123")
+
{:ok, _} = Explorer.DataFrame.from_query(conn, "SELECT 123")
Options
@@ -6178,12 +6178,12 @@ load_ndjson!(contents, opts \\ [])
iex> contents = ~s({"col_a":1,"col_b":5.1}\n{"col_a":2,"col_b":5.2}\n)
-iex> Explorer.DataFrame.load_ndjson!(contents)
-#Explorer.DataFrame<
- Polars[2 x 2]
- col_a integer [1, 2]
- col_b float [5.1, 5.2]
->
+ iex> Explorer.DataFrame.load_ndjson!(contents)
+#Explorer.DataFrame<
+ Polars[2 x 2]
+ col_a integer [1, 2]
+ col_b float [5.1, 5.2]
+>
priv
directory and load them yourself.
-For example:
-Explorer.DataFrame.from_csv!(Application.app_dir(:my_app, "priv/iris.csv"))
+For example:
+Explorer.DataFrame.from_csv!(Application.app_dir(:my_app, "priv/iris.csv"))
Fisher, R. A. (1988). Iris. UCI Machine Learning Repository. https://doi.org/10.24432/C56C76.
+Fisher, R. A. (1988). Iris. UCI Machine Learning Repository. https://doi.org/10.24432/C56C76.
Aeberhard, Stefan and Forina, M. (1991). Wine. UCI Machine Learning Repository. https://doi.org/10.24432/C5PC7J.
+Aeberhard, Stefan and Forina, M. (1991). Wine. UCI Machine Learning Repository. https://doi.org/10.24432/C5PC7J.
Explorer.DataFrame
to DF
as shown below:
alias Explorer.DataFrame, as: DF
Queries convert regular Elixir code into efficient
dataframe operations. Inside a query, only the limited set of
Series operations are available and identifiers, such as strs
-and nums
, represent dataframe column names:
iex> df = DF.new(strs: ["a", "b", "c"], nums: [1, 2, 3])
-iex> DF.filter(df, nums > 2)
-#Explorer.DataFrame<
- Polars[1 x 2]
- strs string ["c"]
- nums integer [3]
->
If a column name has an unusual format, you can either rename it beforehand,
-or use col/1
inside queries:
iex> df = DF.new("unusual nums": [1, 2, 3])
-iex> DF.filter(df, col("unusual nums") > 2)
-#Explorer.DataFrame<
- Polars[1 x 1]
- unusual nums integer [3]
->
All operations from Explorer.Series
are imported inside queries.
+and nums
, represent dataframe column names:
iex> df = DF.new(strs: ["a", "b", "c"], nums: [1, 2, 3])
+iex> DF.filter(df, nums > 2)
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ strs string ["c"]
+ nums integer [3]
+>
If a column name has an unusual format, you can either rename it beforehand,
+or use col/1
inside queries:
iex> df = DF.new("unusual nums": [1, 2, 3])
+iex> DF.filter(df, col("unusual nums") > 2)
+#Explorer.DataFrame<
+ Polars[1 x 1]
+ unusual nums integer [3]
+>
All operations from Explorer.Series
are imported inside queries.
This module also provides operators to use in queries, which are
also imported into queries.
If you want to access variables defined outside of the query
or get access to all Elixir constructs, you must use ^
:
iex> min = 2
-iex> df = DF.new(strs: ["a", "b", "c"], nums: [1, 2, 3])
-iex> DF.filter(df, nums > ^min)
-#Explorer.DataFrame<
- Polars[1 x 2]
- strs string ["c"]
- nums integer [3]
->
+iex> df = DF.new(strs: ["a", "b", "c"], nums: [1, 2, 3])
+iex> DF.filter(df, nums > ^min)
+#Explorer.DataFrame<
+ Polars[1 x 2]
+ strs string ["c"]
+ nums integer [3]
+>
iex> min = 2
-iex> df = DF.new(strs: ["a", "b", "c"], nums: [1, 2, 3])
-iex> DF.filter(df, nums < ^if(min > 0, do: 10, else: -10))
-#Explorer.DataFrame<
- Polars[3 x 2]
- strs string ["a", "b", "c"]
- nums integer [1, 2, 3]
->
^
can be used with col
to access columns dynamically:
iex> df = DF.new("unusual nums": [1, 2, 3])
+iex> df = DF.new(strs: ["a", "b", "c"], nums: [1, 2, 3])
+iex> DF.filter(df, nums < ^if(min > 0, do: 10, else: -10))
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ strs string ["a", "b", "c"]
+ nums integer [1, 2, 3]
+>
^
can be used with col
to access columns dynamically:
iex> df = DF.new("unusual nums": [1, 2, 3])
iex> name = "unusual nums"
-iex> DF.filter(df, col(^name) > 2)
-#Explorer.DataFrame<
- Polars[1 x 1]
- unusual nums integer [3]
->
Explorer.Query
leverages the power behind Elixir for-comprehensions
to provide a powerful syntax for traversing several columns in a dataframe
at once. For example, imagine you want to standardize the data on the
-iris dataset, you could write this:
iex> iris = Explorer.Datasets.iris()
-iex> DF.mutate(iris,
-...> sepal_width: (sepal_width - mean(sepal_width)) / variance(sepal_width),
-...> sepal_length: (sepal_length - mean(sepal_length)) / variance(sepal_length),
-...> petal_length: (petal_length - mean(petal_length)) / variance(petal_length),
-...> petal_width: (petal_width - mean(petal_width)) / variance(petal_width)
-...> )
-#Explorer.DataFrame<
- Polars[150 x 5]
- sepal_length float [-1.0840606189132314, -1.3757361217598396, -1.6674116246064494, -1.8132493760297548, -1.2298983703365356, ...]
- sepal_width float [2.372289612531505, -0.28722789030650403, 0.7765791108287006, 0.24467561026109824, 2.904193113099107, ...]
- petal_length float [-0.7576391687443842, -0.7576391687443842, -0.7897606710936372, -0.725517666395131, -0.7576391687443842, ...]
- petal_width float [-1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
While the code above does its job, it is quite repetitive. With across and for-comprehensions,
-we could instead write:
iex> iris = Explorer.Datasets.iris()
-iex> DF.mutate(iris,
-...> for col <- across(["sepal_width", "sepal_length", "petal_length", "petal_width"]) do
-...> {col.name, (col - mean(col)) / variance(col)}
-...> end
-...> )
-#Explorer.DataFrame<
- Polars[150 x 5]
- sepal_length float [-1.0840606189132314, -1.3757361217598396, -1.6674116246064494, -1.8132493760297548, -1.2298983703365356, ...]
- sepal_width float [2.372289612531505, -0.28722789030650403, 0.7765791108287006, 0.24467561026109824, 2.904193113099107, ...]
- petal_length float [-0.7576391687443842, -0.7576391687443842, -0.7897606710936372, -0.725517666395131, -0.7576391687443842, ...]
- petal_width float [-1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
This achieves the same result in a more concise and maintainable way.
+iris dataset, you could write this:
iex> iris = Explorer.Datasets.iris()
+iex> DF.mutate(iris,
+...> sepal_width: (sepal_width - mean(sepal_width)) / variance(sepal_width),
+...> sepal_length: (sepal_length - mean(sepal_length)) / variance(sepal_length),
+...> petal_length: (petal_length - mean(petal_length)) / variance(petal_length),
+...> petal_width: (petal_width - mean(petal_width)) / variance(petal_width)
+...> )
+#Explorer.DataFrame<
+ Polars[150 x 5]
+ sepal_length float [-1.0840606189132314, -1.3757361217598396, -1.6674116246064494, -1.8132493760297548, -1.2298983703365356, ...]
+ sepal_width float [2.372289612531505, -0.28722789030650403, 0.7765791108287006, 0.24467561026109824, 2.904193113099107, ...]
+ petal_length float [-0.7576391687443842, -0.7576391687443842, -0.7897606710936372, -0.725517666395131, -0.7576391687443842, ...]
+ petal_width float [-1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
While the code above does its job, it is quite repetitive. With across and for-comprehensions,
+we could instead write:
iex> iris = Explorer.Datasets.iris()
+iex> DF.mutate(iris,
+...> for col <- across(["sepal_width", "sepal_length", "petal_length", "petal_width"]) do
+...> {col.name, (col - mean(col)) / variance(col)}
+...> end
+...> )
+#Explorer.DataFrame<
+ Polars[150 x 5]
+ sepal_length float [-1.0840606189132314, -1.3757361217598396, -1.6674116246064494, -1.8132493760297548, -1.2298983703365356, ...]
+ sepal_width float [2.372289612531505, -0.28722789030650403, 0.7765791108287006, 0.24467561026109824, 2.904193113099107, ...]
+ petal_length float [-0.7576391687443842, -0.7576391687443842, -0.7897606710936372, -0.725517666395131, -0.7576391687443842, ...]
+ petal_width float [-1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
This achieves the same result in a more concise and maintainable way.
across/1
may receive any of the following input as arguments:
a list of column indexes, or of column names as atoms or strings
a range
a regex that keeps only the names matching the regex
For example, since we know the width and length columns are the first four,
-we could also have written (remember ranges in Elixir are inclusive):
DF.mutate(iris,
- for col <- across(0..3) do
- {col.name, (col - mean(col)) / variance(col)}
- end
-)
Or using a regex:
DF.mutate(iris,
- for col <- across(~r/(sepal|petal)_(length|width)/) do
- {col.name, (col - mean(col)) / variance(col)}
- end
-)
For those new to Elixir, for-comprehensions have the following format:
for PATTERN <- GENERATOR, FILTER do
+we could also have written (remember ranges in Elixir are inclusive):
+DF.mutate(iris,
+ for col <- across(0..3) do
+ {col.name, (col - mean(col)) / variance(col)}
+ end
+)
Or using a regex:
DF.mutate(iris,
+ for col <- across(~r/(sepal|petal)_(length|width)/) do
+ {col.name, (col - mean(col)) / variance(col)}
+ end
+)
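Selection by list, range, or regex can be pictured as ordinary list operations over the column names. This plain-Elixir sketch assumes the five Iris column names in dataset order and does not call Explorer; it only shows which names each kind of selector would keep:

```elixir
# Assumed column names of the Iris dataset, in dataset order.
names = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

# A list of indexes picks columns by position.
by_indexes = Enum.map([0, 2], &Enum.at(names, &1))
# => ["sepal_length", "petal_length"]

# A range also selects by position; Elixir ranges are inclusive,
# so 0..3 covers four columns.
by_range = Enum.slice(names, 0..3)
# => ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# A regex keeps only the names it matches ("species" does not match).
by_regex = Enum.filter(names, &Regex.match?(~r/(sepal|petal)_(length|width)/, &1))
# => ["sepal_length", "sepal_width", "petal_length", "petal_width"]
```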
For those new to Elixir, for-comprehensions have the following format:
for PATTERN <- GENERATOR, FILTER do
EXPR
-end
A comprehension filter is a mechanism that allows us to keep only columns
+
end
A comprehension filter is a mechanism that allows us to keep only columns
based on additional properties, such as its dtype
. A for-comprehension can
have multiple generators and filters. For instance, if you want to apply
standardization to all float columns, you can use across/0
to access all
-columns and then use a filter to keep only the float ones:
iex> iris = Explorer.Datasets.iris()
-iex> DF.mutate(iris,
-...> for col <- across(), col.dtype == :float do
-...> {col.name, (col - mean(col)) / variance(col)}
-...> end
-...> )
-#Explorer.DataFrame<
- Polars[150 x 5]
- sepal_length float [-1.0840606189132314, -1.3757361217598396, -1.6674116246064494, -1.8132493760297548, -1.2298983703365356, ...]
- sepal_width float [2.372289612531505, -0.28722789030650403, 0.7765791108287006, 0.24467561026109824, 2.904193113099107, ...]
- petal_length float [-0.7576391687443842, -0.7576391687443842, -0.7897606710936372, -0.725517666395131, -0.7576391687443842, ...]
- petal_width float [-1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, ...]
- species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
->
For-comprehensions work with all dataframe verbs. As we have seen
+columns and then use a filter to keep only the float ones:
iex> iris = Explorer.Datasets.iris()
+iex> DF.mutate(iris,
+...> for col <- across(), col.dtype == :float do
+...> {col.name, (col - mean(col)) / variance(col)}
+...> end
+...> )
+#Explorer.DataFrame<
+ Polars[150 x 5]
+ sepal_length float [-1.0840606189132314, -1.3757361217598396, -1.6674116246064494, -1.8132493760297548, -1.2298983703365356, ...]
+ sepal_width float [2.372289612531505, -0.28722789030650403, 0.7765791108287006, 0.24467561026109824, 2.904193113099107, ...]
+ petal_length float [-0.7576391687443842, -0.7576391687443842, -0.7897606710936372, -0.725517666395131, -0.7576391687443842, ...]
+ petal_width float [-1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, -1.7147014356654704, ...]
+ species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
+>
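Because comprehension filters run at the meta level, they behave like filters over a list of `{name, dtype}` pairs. The sketch below is plain Elixir, with hypothetical metadata standing in for the Iris columns; inside a query the generator is `across/0` and each element is a column, but the filtering idea is the same:

```elixir
# Hypothetical column metadata: {name, dtype} pairs for the Iris columns.
columns = [
  {"sepal_length", :float},
  {"sepal_width", :float},
  {"petal_length", :float},
  {"petal_width", :float},
  {"species", :string}
]

# The generator walks over the columns; the filter sees only metadata.
float_names = for {name, dtype} <- columns, dtype == :float, do: name

# Several filters can be combined; a filter may also test the name.
length_names =
  for {name, dtype} <- columns, dtype == :float, String.ends_with?(name, "_length"), do: name
```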
For-comprehensions work with all dataframe verbs. As we have seen
above, for mutations we must return tuples pairing the mutation
name with its value. summarise
works similarly. Note in both cases
the name could also be generated dynamically. For example, to compute
-the mean per species, you could write:
iex> Explorer.Datasets.iris()
-...> |> DF.group_by("species")
-...> |> DF.summarise(
-...> for col <- across(), col.dtype == :float do
-...> {"#{col.name}_mean", mean(col)}
-...> end
-...> )
-#Explorer.DataFrame<
- Polars[3 x 5]
- species string ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
- sepal_length_mean float [5.005999999999999, 5.936, 6.587999999999998]
- sepal_width_mean float [3.4180000000000006, 2.7700000000000005, 2.9739999999999998]
- petal_length_mean float [1.464, 4.26, 5.552]
- petal_width_mean float [0.2439999999999999, 1.3259999999999998, 2.026]
->
arrange
expects a list of columns to sort by, while for-comprehensions
+the mean per species, you could write:
iex> Explorer.Datasets.iris()
+...> |> DF.group_by("species")
+...> |> DF.summarise(
+...> for col <- across(), col.dtype == :float do
+...> {"#{col.name}_mean", mean(col)}
+...> end
+...> )
+#Explorer.DataFrame<
+ Polars[3 x 5]
+ species string ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
+ sepal_length_mean float [5.005999999999999, 5.936, 6.587999999999998]
+ sepal_width_mean float [3.4180000000000006, 2.7700000000000005, 2.9739999999999998]
+ petal_length_mean float [1.464, 4.26, 5.552]
+ petal_width_mean float [0.2439999999999999, 1.3259999999999998, 2.026]
+>
arrange
expects a list of columns to sort by, while for-comprehensions
in filter
generate a list of conditions, which are joined using and
.
For example, to keep all entries that have both sepal and petal length above
-average, using a filter on the column name, one could write:
iex> iris = Explorer.Datasets.iris()
-iex> DF.filter(iris,
-...> for col <- across(), String.ends_with?(col.name, "_length") do
-...> col > mean(col)
-...> end
-...> )
-#Explorer.DataFrame<
- Polars[70 x 5]
- sepal_length float [7.0, 6.4, 6.9, 6.5, 6.3, ...]
- sepal_width float [3.2, 3.2, 3.1, 2.8, 3.3, ...]
- petal_length float [4.7, 4.5, 4.9, 4.6, 4.7, ...]
- petal_width float [1.4, 1.5, 1.5, 1.5, 1.6, ...]
- species string ["Iris-versicolor", "Iris-versicolor", "Iris-versicolor", "Iris-versicolor", "Iris-versicolor", ...]
->
Do not mix comprehension and queries
The filter inside a for-comprehension works at the meta level:
+average, using a filter on the column name, one could write:
iex> iris = Explorer.Datasets.iris()
+iex> DF.filter(iris,
+...> for col <- across(), String.ends_with?(col.name, "_length") do
+...> col > mean(col)
+...> end
+...> )
+#Explorer.DataFrame<
+ Polars[70 x 5]
+ sepal_length float [7.0, 6.4, 6.9, 6.5, 6.3, ...]
+ sepal_width float [3.2, 3.2, 3.1, 2.8, 3.3, ...]
+ petal_length float [4.7, 4.5, 4.9, 4.6, 4.7, ...]
+ petal_width float [1.4, 1.5, 1.5, 1.5, 1.6, ...]
+ species string ["Iris-versicolor", "Iris-versicolor", "Iris-versicolor", "Iris-versicolor", "Iris-versicolor", ...]
+>
Do not mix comprehension and queries
The filter inside a for-comprehension works at the meta level: it can only filter columns based on their names and dtypes, but not on their values. For example, this code does not make any
-sense and it will fail to compile:
|> DF.filter(
- for col <- across(), col > mean(col) do
+sense and it will fail to compile:
|> DF.filter(
+ for col <- across(), col > mean(col) do
col
- end
-end)
Another way to think about it, the comprehensions traverse on the
+
end
+end)
+Another way to think about it: the comprehensions traverse the columns themselves, while the contents inside the comprehension do-block traverse the values inside the columns.
@@ -281,7 +281,7 @@
Queries simply become lazy dataframe operations at runtime.
-For example, the following query
Explorer.DataFrame.filter(df, nums > 2)
is equivalent to
Explorer.DataFrame.filter_with(df, fn df -> Explorer.Series.greater(df["nums"], 2) end)
This means that, whenever you want to generate queries programmatically,
+For example, the following query
Explorer.DataFrame.filter(df, nums > 2)
is equivalent to
Explorer.DataFrame.filter_with(df, fn df -> Explorer.Series.greater(df["nums"], 2) end)
This means that, whenever you want to generate queries programmatically, you can fall back to the regular
@@ -738,9 +738,9 @@_with
APIs.
left <> right
Examples
DF.mutate(df, name: first_name <> " " <> last_name)
If you want to concatenate non-string
+
DF.mutate(df, name: first_name <> " " <> last_name)
If you want to concatenate non-string series, you can explicitly cast them to string
-before:
DF.mutate(df, name: cast(year, :string) <> "-" <> cast(month, :string))
Or use format:
DF.mutate(df, name: format([year, "-", month]))
+before:
+DF.mutate(df, name: cast(year, :string) <> "-" <> cast(month, :string))
Or use format:
DF.mutate(df, name: format([year, "-", month]))
Accesses a column by name.
If your column name contains whitespace or starts with uppercase letters, you can still access its name by
-using this macro:
iex> df = Explorer.DataFrame.new("unusual nums": [1, 2, 3])
-iex> Explorer.DataFrame.filter(df, col("unusual nums") > 2)
-#Explorer.DataFrame<
- Polars[1 x 1]
- unusual nums integer [3]
->
name
must be an atom, a string, or an integer.
+using this macro:
iex> df = Explorer.DataFrame.new("unusual nums": [1, 2, 3])
+iex> Explorer.DataFrame.filter(df, col("unusual nums") > 2)
+#Explorer.DataFrame<
+ Polars[1 x 1]
+ unusual nums integer [3]
+>
name
must be an atom, a string, or an integer.
It is equivalent to df[name]
but inside a query.
This can also be used if you want to access a column
-programmatically, for example:
iex> df = Explorer.DataFrame.new(nums: [1, 2, 3])
+programmatically, for example:
+iex> df = Explorer.DataFrame.new(nums: [1, 2, 3])
iex> name = :nums
-iex> Explorer.DataFrame.filter(df, col(^name) > 2)
-#Explorer.DataFrame<
- Polars[1 x 1]
- nums integer [3]
->
For traversing multiple columns programmatically,
+
iex> Explorer.DataFrame.filter(df, col(^name) > 2)
+#Explorer.DataFrame<
+ Polars[1 x 1]
+ nums integer [3]
+>
For traversing multiple columns programmatically,
see across/0
and across/1
.
Series can be created using from_list/2
, from_binary/3
, and friends:
Series can be made of numbers:
iex> Explorer.Series.from_list([1, 2, 3])
-#Explorer.Series<
- Polars[3]
- integer [1, 2, 3]
->
Series are nullable, so you may also include nils:
iex> Explorer.Series.from_list([1.0, nil, 2.5, 3.1])
-#Explorer.Series<
- Polars[4]
- float [1.0, nil, 2.5, 3.1]
->
Any of the dtypes above are supported, such as strings:
iex> Explorer.Series.from_list(["foo", "bar", "baz"])
-#Explorer.Series<
- Polars[3]
- string ["foo", "bar", "baz"]
->
+Series can be created using from_list/2, from_binary/3, and friends:
Series can be made of numbers:
iex> Explorer.Series.from_list([1, 2, 3])
+#Explorer.Series<
+ Polars[3]
+ integer [1, 2, 3]
+>
Series are nullable, so you may also include nils:
iex> Explorer.Series.from_list([1.0, nil, 2.5, 3.1])
+#Explorer.Series<
+ Polars[4]
+ float [1.0, nil, 2.5, 3.1]
+>
Any of the dtypes above are supported, such as strings:
iex> Explorer.Series.from_list(["foo", "bar", "baz"])
+#Explorer.Series<
+ Polars[3]
+ string ["foo", "bar", "baz"]
+>
@@ -1743,36 +1743,36 @@ Integers and floats follow their native encoding:
iex> Explorer.Series.from_binary(<<1.0::float-64-native, 2.0::float-64-native>>, :float)
-#Explorer.Series<
- Polars[2]
- float [1.0, 2.0]
->
-
-iex> Explorer.Series.from_binary(<<-1::signed-64-native, 1::signed-64-native>>, :integer)
-#Explorer.Series<
- Polars[2]
- integer [-1, 1]
->
Booleans are unsigned integers:
iex> Explorer.Series.from_binary(<<1, 0, 1>>, :boolean)
-#Explorer.Series<
- Polars[3]
- boolean [true, false, true]
->
Dates are encoded as i32 representing days from the Unix epoch (1970-01-01):
iex> binary = <<-719162::signed-32-native, 0::signed-32-native, 6129::signed-32-native>>
-iex> Explorer.Series.from_binary(binary, :date)
-#Explorer.Series<
- Polars[3]
- date [0001-01-01, 1970-01-01, 1986-10-13]
->
Times are encoded as i64 representing nanoseconds from midnight:
iex> binary = <<0::signed-64-native, 86399999999000::signed-64-native>>
-iex> Explorer.Series.from_binary(binary, :time)
-#Explorer.Series<
- Polars[2]
- time [00:00:00.000000, 23:59:59.999999]
->
Datetimes are encoded as i64 representing microseconds from the Unix epoch (1970-01-01):
iex> binary = <<0::signed-64-native, 529550625987654::signed-64-native>>
-iex> Explorer.Series.from_binary(binary, :datetime)
-#Explorer.Series<
- Polars[2]
- datetime [1970-01-01 00:00:00.000000, 1986-10-13 01:23:45.987654]
->
+Integers and floats follow their native encoding:
iex> Explorer.Series.from_binary(<<1.0::float-64-native, 2.0::float-64-native>>, :float)
+#Explorer.Series<
+ Polars[2]
+ float [1.0, 2.0]
+>
+
+iex> Explorer.Series.from_binary(<<-1::signed-64-native, 1::signed-64-native>>, :integer)
+#Explorer.Series<
+ Polars[2]
+ integer [-1, 1]
+>
Booleans are unsigned integers:
iex> Explorer.Series.from_binary(<<1, 0, 1>>, :boolean)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, false, true]
+>
Dates are encoded as i32 representing days from the Unix epoch (1970-01-01):
iex> binary = <<-719162::signed-32-native, 0::signed-32-native, 6129::signed-32-native>>
+iex> Explorer.Series.from_binary(binary, :date)
+#Explorer.Series<
+ Polars[3]
+ date [0001-01-01, 1970-01-01, 1986-10-13]
+>
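As a cross-check of the date layout, the same binary can be built and decoded with the Elixir standard library alone. This is a minimal sketch that does not call Explorer and is not necessarily how Explorer produces the binary; it relies only on `Date.diff/2`, `Date.add/2`, and bitstring comprehensions.

```elixir
# Sketch: rebuild the :date layout described above (signed 32-bit,
# native-endian days since 1970-01-01) using only the standard library.
epoch = ~D[1970-01-01]
dates = [~D[0001-01-01], ~D[1970-01-01], ~D[1986-10-13]]

# Encode: one 4-byte day offset per date, concatenated into a binary.
date_binary =
  for date <- dates, into: <<>> do
    <<Date.diff(date, epoch)::signed-32-native>>
  end

# Decode: walk the binary 4 bytes at a time, then turn offsets back into dates.
day_offsets = for <<days::signed-32-native <- date_binary>>, do: days
decoded = Enum.map(day_offsets, &Date.add(epoch, &1))

IO.inspect(day_offsets)
# => [-719162, 0, 6129]
IO.inspect(decoded)
# => [~D[0001-01-01], ~D[1970-01-01], ~D[1986-10-13]]
```

The decoder must use the same 32-bit width and endianness as the encoder; otherwise the offsets come out misaligned.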
Times are encoded as i64 representing nanoseconds from midnight:
iex> binary = <<0::signed-64-native, 86399999999000::signed-64-native>>
+iex> Explorer.Series.from_binary(binary, :time)
+#Explorer.Series<
+ Polars[2]
+ time [00:00:00.000000, 23:59:59.999999]
+>
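The nanosecond offsets in the time example can likewise be derived with `Time.diff/3`. This is a minimal stdlib-only sketch, assuming microsecond-precision inputs like the ones above; it does not call Explorer.

```elixir
# Sketch: rebuild the :time layout (signed 64-bit, native-endian
# nanoseconds since midnight) using only the standard library.
times = [~T[00:00:00.000000], ~T[23:59:59.999999]]

time_binary =
  for time <- times, into: <<>> do
    <<Time.diff(time, ~T[00:00:00], :nanosecond)::signed-64-native>>
  end

# Decode 8 bytes at a time back into nanosecond counts.
nanos = for <<ns::signed-64-native <- time_binary>>, do: ns
IO.inspect(nanos)
# => [0, 86399999999000]
```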
Datetimes are encoded as i64 representing microseconds from the Unix epoch (1970-01-01):
iex> binary = <<0::signed-64-native, 529550625987654::signed-64-native>>
+iex> Explorer.Series.from_binary(binary, :datetime)
+#Explorer.Series<
+ Polars[2]
+ datetime [1970-01-01 00:00:00.000000, 1986-10-13 01:23:45.987654]
+>
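For datetimes, `NaiveDateTime.diff/3` with the `:microsecond` unit yields the same i64 values. This minimal sketch uses only the standard library (no Explorer call) and also includes `~N[0001-01-01 00:00:00]` to show how a date before the epoch becomes a negative offset.

```elixir
# Sketch: rebuild the :datetime layout (signed 64-bit, native-endian
# microseconds since 1970-01-01 00:00:00) using only the standard library.
epoch = ~N[1970-01-01 00:00:00]
datetimes = [~N[0001-01-01 00:00:00], ~N[1970-01-01 00:00:00], ~N[1986-10-13 01:23:45.987654]]

datetime_binary =
  for ndt <- datetimes, into: <<>> do
    <<NaiveDateTime.diff(ndt, epoch, :microsecond)::signed-64-native>>
  end

# Decode 8 bytes at a time back into microsecond counts.
micros = for <<us::signed-64-native <- datetime_binary>>, do: us
IO.inspect(micros)
# => [-62135596800000000, 0, 529550625987654]
```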
Explorer will infer the type from the values in the list:
iex> Explorer.Series.from_list([1, 2, 3])
-#Explorer.Series<
- Polars[3]
- integer [1, 2, 3]
->
Series are nullable, so you may also include nils:
iex> Explorer.Series.from_list([1.0, nil, 2.5, 3.1])
-#Explorer.Series<
- Polars[4]
- float [1.0, nil, 2.5, 3.1]
->
A mix of integers and floats will be cast to a float:
iex> Explorer.Series.from_list([1, 2.0])
-#Explorer.Series<
- Polars[2]
- float [1.0, 2.0]
->
Float series can accept NaN, Inf, and -Inf values:
iex> Explorer.Series.from_list([1.0, 2.0, :nan, 4.0])
-#Explorer.Series<
- Polars[4]
- float [1.0, 2.0, NaN, 4.0]
->
-
-iex> Explorer.Series.from_list([1.0, 2.0, :infinity, 4.0])
-#Explorer.Series<
- Polars[4]
- float [1.0, 2.0, Inf, 4.0]
->
-
-iex> Explorer.Series.from_list([1.0, 2.0, :neg_infinity, 4.0])
-#Explorer.Series<
- Polars[4]
- float [1.0, 2.0, -Inf, 4.0]
->
Trying to create a "nil" series will, by default, result in a series of floats:
iex> Explorer.Series.from_list([nil, nil])
-#Explorer.Series<
- Polars[2]
- float [nil, nil]
->
You can specify the desired dtype for a series with the :dtype option.
iex> Explorer.Series.from_list([nil, nil], dtype: :integer)
-#Explorer.Series<
- Polars[2]
- integer [nil, nil]
->
-
-iex> Explorer.Series.from_list([1, nil], dtype: :string)
-#Explorer.Series<
- Polars[2]
- string ["1", nil]
->
The dtype option is particularly important if a :binary series is desired, because
-by default binary series will have the dtype of :string
:
iex> Explorer.Series.from_list([<<228, 146, 51>>, <<42, 209, 236>>], dtype: :binary)
-#Explorer.Series<
- Polars[2]
- binary [<<228, 146, 51>>, <<42, 209, 236>>]
->
A series mixing UTF8 strings and binaries is possible:
iex> Explorer.Series.from_list([<<228, 146, 51>>, "Elixir"], dtype: :binary)
-#Explorer.Series<
- Polars[2]
- binary [<<228, 146, 51>>, "Elixir"]
->
Another option is to create a categorical series from a list of strings:
iex> Explorer.Series.from_list(["EUA", "Brazil", "Poland"], dtype: :category)
-#Explorer.Series<
- Polars[3]
- category ["EUA", "Brazil", "Poland"]
->
It is possible to create a series of :datetime from a list of microseconds since the Unix epoch.
iex> Explorer.Series.from_list([1649883642 * 1_000 * 1_000], dtype: :datetime)
-#Explorer.Series<
- Polars[1]
- datetime [2022-04-13 21:00:42.000000]
->
It is possible to create a series of :time from a list of nanoseconds since midnight.
iex> Explorer.Series.from_list([123 * 1_000 * 1_000 * 1_000], dtype: :time)
-#Explorer.Series<
- Polars[1]
- time [00:02:03.000000]
->
Mixing non-numeric data types will raise an ArgumentError:
iex> Explorer.Series.from_list([1, "a"])
+Explorer will infer the type from the values in the list:
iex> Explorer.Series.from_list([1, 2, 3])
+#Explorer.Series<
+ Polars[3]
+ integer [1, 2, 3]
+>
Series are nullable, so you may also include nils:
iex> Explorer.Series.from_list([1.0, nil, 2.5, 3.1])
+#Explorer.Series<
+ Polars[4]
+ float [1.0, nil, 2.5, 3.1]
+>
A mix of integers and floats will be cast to a float:
iex> Explorer.Series.from_list([1, 2.0])
+#Explorer.Series<
+ Polars[2]
+ float [1.0, 2.0]
+>
Float series can accept NaN, Inf, and -Inf values:
iex> Explorer.Series.from_list([1.0, 2.0, :nan, 4.0])
+#Explorer.Series<
+ Polars[4]
+ float [1.0, 2.0, NaN, 4.0]
+>
+
+iex> Explorer.Series.from_list([1.0, 2.0, :infinity, 4.0])
+#Explorer.Series<
+ Polars[4]
+ float [1.0, 2.0, Inf, 4.0]
+>
+
+iex> Explorer.Series.from_list([1.0, 2.0, :neg_infinity, 4.0])
+#Explorer.Series<
+ Polars[4]
+ float [1.0, 2.0, -Inf, 4.0]
+>
Trying to create a "nil" series will, by default, result in a series of floats:
iex> Explorer.Series.from_list([nil, nil])
+#Explorer.Series<
+ Polars[2]
+ float [nil, nil]
+>
You can specify the desired dtype for a series with the :dtype option.
iex> Explorer.Series.from_list([nil, nil], dtype: :integer)
+#Explorer.Series<
+ Polars[2]
+ integer [nil, nil]
+>
+
+iex> Explorer.Series.from_list([1, nil], dtype: :string)
+#Explorer.Series<
+ Polars[2]
+ string ["1", nil]
+>
The dtype option is particularly important if a :binary series is desired, because
+by default binary series will have the dtype of :string
:
iex> Explorer.Series.from_list([<<228, 146, 51>>, <<42, 209, 236>>], dtype: :binary)
+#Explorer.Series<
+ Polars[2]
+ binary [<<228, 146, 51>>, <<42, 209, 236>>]
+>
A series mixing UTF8 strings and binaries is possible:
iex> Explorer.Series.from_list([<<228, 146, 51>>, "Elixir"], dtype: :binary)
+#Explorer.Series<
+ Polars[2]
+ binary [<<228, 146, 51>>, "Elixir"]
+>
Another option is to create a categorical series from a list of strings:
iex> Explorer.Series.from_list(["EUA", "Brazil", "Poland"], dtype: :category)
+#Explorer.Series<
+ Polars[3]
+ category ["EUA", "Brazil", "Poland"]
+>
It is possible to create a series of :datetime from a list of microseconds since the Unix epoch.
iex> Explorer.Series.from_list([1649883642 * 1_000 * 1_000], dtype: :datetime)
+#Explorer.Series<
+ Polars[1]
+ datetime [2022-04-13 21:00:42.000000]
+>
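To see which instant a raw microsecond count refers to before passing it to `from_list/2`, the standard library's `DateTime.from_unix!/2` can decode it. This is a minimal sketch independent of Explorer; note that it returns a UTC `DateTime`, whereas the series above displays a naive datetime.

```elixir
# Sketch: interpret 1649883642 * 1_000 * 1_000 as microseconds since the
# Unix epoch, using only the standard library.
micros = 1649883642 * 1_000 * 1_000
datetime = DateTime.from_unix!(micros, :microsecond)

IO.inspect(datetime)
# => ~U[2022-04-13 21:00:42.000000Z]
```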
It is possible to create a series of :time from a list of nanoseconds since midnight.
iex> Explorer.Series.from_list([123 * 1_000 * 1_000 * 1_000], dtype: :time)
+#Explorer.Series<
+ Polars[1]
+ time [00:02:03.000000]
+>
Mixing non-numeric data types will raise an ArgumentError:
iex> Explorer.Series.from_list([1, "a"])
** (ArgumentError) the value "a" does not match the inferred series dtype :integer
Integers and floats:
iex> tensor = Nx.tensor([1, 2, 3])
-iex> Explorer.Series.from_tensor(tensor)
-#Explorer.Series<
- Polars[3]
- integer [1, 2, 3]
->
-
-iex> tensor = Nx.tensor([1.0, 2.0, 3.0], type: :f64)
-iex> Explorer.Series.from_tensor(tensor)
-#Explorer.Series<
- Polars[3]
- float [1.0, 2.0, 3.0]
->
Unsigned 8-bit tensors are assumed to be booleans:
iex> tensor = Nx.tensor([1, 0, 1], type: :u8)
-iex> Explorer.Series.from_tensor(tensor)
-#Explorer.Series<
- Polars[3]
- boolean [true, false, true]
->
Signed 32-bit tensors are assumed to be dates:
iex> tensor = Nx.tensor([-719162, 0, 6129], type: :s32)
-iex> Explorer.Series.from_tensor(tensor)
-#Explorer.Series<
- Polars[3]
- date [0001-01-01, 1970-01-01, 1986-10-13]
->
Times are signed 64-bit integers representing nanoseconds from midnight and
-therefore must have their dtype explicitly given:
iex> tensor = Nx.tensor([0, 86399999999000])
-iex> Explorer.Series.from_tensor(tensor, dtype: :time)
-#Explorer.Series<
- Polars[2]
- time [00:00:00.000000, 23:59:59.999999]
->
Datetimes are signed 64-bit integers and therefore must have their dtype explicitly given:
iex> tensor = Nx.tensor([0, 529550625987654])
-iex> Explorer.Series.from_tensor(tensor, dtype: :datetime)
-#Explorer.Series<
- Polars[2]
- datetime [1970-01-01 00:00:00.000000, 1986-10-13 01:23:45.987654]
->
+Integers and floats:
iex> tensor = Nx.tensor([1, 2, 3])
+iex> Explorer.Series.from_tensor(tensor)
+#Explorer.Series<
+ Polars[3]
+ integer [1, 2, 3]
+>
+
+iex> tensor = Nx.tensor([1.0, 2.0, 3.0], type: :f64)
+iex> Explorer.Series.from_tensor(tensor)
+#Explorer.Series<
+ Polars[3]
+ float [1.0, 2.0, 3.0]
+>
Unsigned 8-bit tensors are assumed to be booleans:
iex> tensor = Nx.tensor([1, 0, 1], type: :u8)
+iex> Explorer.Series.from_tensor(tensor)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, false, true]
+>
Signed 32-bit tensors are assumed to be dates:
iex> tensor = Nx.tensor([-719162, 0, 6129], type: :s32)
+iex> Explorer.Series.from_tensor(tensor)
+#Explorer.Series<
+ Polars[3]
+ date [0001-01-01, 1970-01-01, 1986-10-13]
+>
Times are signed 64-bit integers representing nanoseconds from midnight and
+therefore must have their dtype explicitly given:
iex> tensor = Nx.tensor([0, 86399999999000])
+iex> Explorer.Series.from_tensor(tensor, dtype: :time)
+#Explorer.Series<
+ Polars[2]
+ time [00:00:00.000000, 23:59:59.999999]
+>
Datetimes are signed 64-bit integers and therefore must have their dtype explicitly given:
iex> tensor = Nx.tensor([0, 529550625987654])
+iex> Explorer.Series.from_tensor(tensor, dtype: :datetime)
+#Explorer.Series<
+ Polars[2]
+ datetime [1970-01-01 00:00:00.000000, 1986-10-13 01:23:45.987654]
+>
iex> s = Explorer.Series.from_list([0, 1, 2])
-iex> Explorer.Series.replace(s, Nx.tensor([1, 2, 3]))
-#Explorer.Series<
- Polars[3]
- integer [1, 2, 3]
->
This is particularly useful for categorical columns:
iex> s = Explorer.Series.from_list(["foo", "bar", "baz"], dtype: :category)
-iex> Explorer.Series.replace(s, Nx.tensor([2, 1, 0]))
-#Explorer.Series<
- Polars[3]
- category ["baz", "bar", "foo"]
->
iex> s = Explorer.Series.from_list([0, 1, 2])
+iex> Explorer.Series.replace(s, Nx.tensor([1, 2, 3]))
+#Explorer.Series<
+ Polars[3]
+ integer [1, 2, 3]
+>
This is particularly useful for categorical columns:
iex> s = Explorer.Series.from_list(["foo", "bar", "baz"], dtype: :category)
+iex> Explorer.Series.replace(s, Nx.tensor([2, 1, 0]))
+#Explorer.Series<
+ Polars[3]
+ category ["baz", "bar", "foo"]
+>
Similar to tensors, we can also replace by lists:
iex> s = Explorer.Series.from_list([0, 1, 2])
-iex> Explorer.Series.replace(s, [1, 2, 3, 4, 5])
-#Explorer.Series<
- Polars[5]
- integer [1, 2, 3, 4, 5]
->
The same considerations as above apply.
+Similar to tensors, we can also replace by lists:
iex> s = Explorer.Series.from_list([0, 1, 2])
+iex> Explorer.Series.replace(s, [1, 2, 3, 4, 5])
+#Explorer.Series<
+ Polars[5]
+ integer [1, 2, 3, 4, 5]
+>
The same considerations as above apply.
iex> series = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.to_binary(series)
-<<1::signed-64-native, 2::signed-64-native, 3::signed-64-native>>
+iex> series = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.to_binary(series)
+<<1::signed-64-native, 2::signed-64-native, 3::signed-64-native>>
-iex> series = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.to_binary(series)
-<<1, 0, 1>>
+iex> series = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.to_binary(series)
+<<1, 0, 1>>
iex> series = Explorer.Series.from_list([1, 2, 3])
-iex> series |> Explorer.Series.to_enum() |> Enum.to_list()
-[1, 2, 3]
+iex> series = Explorer.Series.from_list([1, 2, 3])
+iex> series |> Explorer.Series.to_enum() |> Enum.to_list()
+[1, 2, 3]
Integers and floats follow their native encoding:
iex> series = Explorer.Series.from_list([-1, 0, 1])
-iex> Explorer.Series.to_iovec(series)
-[<<-1::signed-64-native, 0::signed-64-native, 1::signed-64-native>>]
-
-iex> series = Explorer.Series.from_list([1.0, 2.0, 3.0])
-iex> Explorer.Series.to_iovec(series)
-[<<1.0::float-64-native, 2.0::float-64-native, 3.0::float-64-native>>]
Booleans are encoded as 0 and 1:
iex> series = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.to_iovec(series)
-[<<1, 0, 1>>]
Dates are encoded as i32 representing days from the Unix epoch (1970-01-01):
iex> series = Explorer.Series.from_list([~D[0001-01-01], ~D[1970-01-01], ~D[1986-10-13]])
-iex> Explorer.Series.to_iovec(series)
-[<<-719162::signed-32-native, 0::signed-32-native, 6129::signed-32-native>>]
Times are encoded as i64 representing nanoseconds from midnight:
iex> series = Explorer.Series.from_list([~T[00:00:00.000000], ~T[23:59:59.999999]])
-iex> Explorer.Series.to_iovec(series)
-[<<0::signed-64-native, 86399999999000::signed-64-native>>]
Datetimes are encoded as i64 representing microseconds from the Unix epoch (1970-01-01):
iex> series = Explorer.Series.from_list([~N[0001-01-01 00:00:00], ~N[1970-01-01 00:00:00], ~N[1986-10-13 01:23:45.987654]])
-iex> Explorer.Series.to_iovec(series)
-[<<-62135596800000000::signed-64-native, 0::signed-64-native, 529550625987654::signed-64-native>>]
The operation raises for binaries and strings, as they do not provide a fixed-width
-binary representation:
iex> s = Explorer.Series.from_list(["a", "b", "c", "b"])
-iex> Explorer.Series.to_iovec(s)
+Integers and floats follow their native encoding:
iex> series = Explorer.Series.from_list([-1, 0, 1])
+iex> Explorer.Series.to_iovec(series)
+[<<-1::signed-64-native, 0::signed-64-native, 1::signed-64-native>>]
+
+iex> series = Explorer.Series.from_list([1.0, 2.0, 3.0])
+iex> Explorer.Series.to_iovec(series)
+[<<1.0::float-64-native, 2.0::float-64-native, 3.0::float-64-native>>]
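"Native encoding" here means 8 bytes per value in the machine's byte order, so a binary produced this way can be read back with matching bitstring segments. This is a minimal stdlib-only sketch of that round trip; it does not call Explorer.

```elixir
# Sketch: encode integers and floats as 64-bit native-endian values,
# then decode them back, using only Elixir bitstring syntax.
ints = [-1, 0, 1]
floats = [1.0, 2.0, 3.0]

int_binary = for i <- ints, into: <<>>, do: <<i::signed-64-native>>
float_binary = for f <- floats, into: <<>>, do: <<f::float-64-native>>

# Each value occupies exactly 8 bytes.
IO.inspect(byte_size(int_binary))
# => 24

decoded_ints = for <<i::signed-64-native <- int_binary>>, do: i
decoded_floats = for <<f::float-64-native <- float_binary>>, do: f
```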
Booleans are encoded as 0 and 1:
iex> series = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.to_iovec(series)
+[<<1, 0, 1>>]
Dates are encoded as i32 representing days from the Unix epoch (1970-01-01):
iex> series = Explorer.Series.from_list([~D[0001-01-01], ~D[1970-01-01], ~D[1986-10-13]])
+iex> Explorer.Series.to_iovec(series)
+[<<-719162::signed-32-native, 0::signed-32-native, 6129::signed-32-native>>]
Times are encoded as i64 representing nanoseconds from midnight:
iex> series = Explorer.Series.from_list([~T[00:00:00.000000], ~T[23:59:59.999999]])
+iex> Explorer.Series.to_iovec(series)
+[<<0::signed-64-native, 86399999999000::signed-64-native>>]
Datetimes are encoded as i64 representing microseconds from the Unix epoch (1970-01-01):
iex> series = Explorer.Series.from_list([~N[0001-01-01 00:00:00], ~N[1970-01-01 00:00:00], ~N[1986-10-13 01:23:45.987654]])
+iex> Explorer.Series.to_iovec(series)
+[<<-62135596800000000::signed-64-native, 0::signed-64-native, 529550625987654::signed-64-native>>]
The operation raises for binaries and strings, as they do not provide a fixed-width
+binary representation:
iex> s = Explorer.Series.from_list(["a", "b", "c", "b"])
+iex> Explorer.Series.to_iovec(s)
** (ArgumentError) cannot convert series of dtype :string into iovec
However, if appropriate, you can convert them to categorical types,
-which will then return the index of each category:
iex> series = Explorer.Series.from_list(["a", "b", "c", "b"], dtype: :category)
-iex> Explorer.Series.to_iovec(series)
-[<<0::unsigned-32-native, 1::unsigned-32-native, 2::unsigned-32-native, 1::unsigned-32-native>>]
+which will then return the index of each category:
+iex> series = Explorer.Series.from_list(["a", "b", "c", "b"], dtype: :category)
+iex> Explorer.Series.to_iovec(series)
+[<<0::unsigned-32-native, 1::unsigned-32-native, 2::unsigned-32-native, 1::unsigned-32-native>>]
iex> series = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.to_list(series)
-[1, 2, 3]
+iex> series = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.to_list(series)
+[1, 2, 3]
iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.to_tensor(s)
-#Nx.Tensor<
- s64[3]
- [1, 2, 3]
->
+iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.to_tensor(s)
+#Nx.Tensor<
+ s64[3]
+ [1, 2, 3]
+>
-iex> s = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.to_tensor(s)
-#Nx.Tensor<
- u8[3]
- [1, 0, 1]
->
+iex> s = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.to_tensor(s)
+#Nx.Tensor<
+ u8[3]
+ [1, 0, 1]
+>
iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.argmax(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.argmax(s)
3
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.argmax(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.argmax(s)
3
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.argmax(s)
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.argmax(s)
0
-iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
-iex> Explorer.Series.argmax(s)
+iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
+iex> Explorer.Series.argmax(s)
0
-iex> s = Explorer.Series.from_list([~T[00:02:03.000212], ~T[00:05:04.000456]])
-iex> Explorer.Series.argmax(s)
+iex> s = Explorer.Series.from_list([~T[00:02:03.000212], ~T[00:05:04.000456]])
+iex> Explorer.Series.argmax(s)
1
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.argmax(s)
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.argmax(s)
** (ArgumentError) Explorer.Series.argmax/1 not implemented for dtype :string. Valid dtypes are [:integer, :float, :date, :time, :datetime]
@@ -2360,28 +2360,28 @@ argmin(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.argmin(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.argmin(s)
2
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.argmin(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.argmin(s)
2
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.argmin(s)
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.argmin(s)
1
-iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
-iex> Explorer.Series.argmin(s)
+iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
+iex> Explorer.Series.argmin(s)
1
-iex> s = Explorer.Series.from_list([~T[00:02:03.000212], ~T[00:05:04.000456]])
-iex> Explorer.Series.argmin(s)
+iex> s = Explorer.Series.from_list([~T[00:02:03.000212], ~T[00:05:04.000456]])
+iex> Explorer.Series.argmin(s)
0
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.argmin(s)
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.argmin(s)
** (ArgumentError) Explorer.Series.argmin/1 not implemented for dtype :string. Valid dtypes are [:integer, :float, :date, :time, :datetime]
@@ -2426,9 +2426,9 @@ correlation(left, right, ddof \\ 1)
Examples
-iex> s1 = Series.from_list([1, 8, 3])
-iex> s2 = Series.from_list([4, 5, 2])
-iex> Series.correlation(s1, s2)
+iex> s1 = Series.from_list([1, 8, 3])
+iex> s2 = Series.from_list([4, 5, 2])
+iex> Series.correlation(s1, s2)
0.5447047794019223
@@ -2460,8 +2460,8 @@ count(series)
Examples
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.count(s)
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.count(s)
3
@@ -2502,9 +2502,9 @@ covariance(left, right)
Examples
-iex> s1 = Series.from_list([1, 8, 3])
-iex> s2 = Series.from_list([4, 5, 2])
-iex> Series.covariance(s1, s2)
+iex> s1 = Series.from_list([1, 8, 3])
+iex> s2 = Series.from_list([4, 5, 2])
+iex> Series.covariance(s1, s2)
3.0
@@ -2545,14 +2545,14 @@ cut(series, bins, opts \\ [])
Examples
-iex> s = Explorer.Series.from_list([1.0, 2.0, 3.0])
-iex> Explorer.Series.cut(s, [1.5, 2.5])
-#Explorer.DataFrame<
- Polars[3 x 3]
- values float [1.0, 2.0, 3.0]
- break_point float [1.5, 2.5, Inf]
- category category ["(-inf, 1.5]", "(1.5, 2.5]", "(2.5, inf]"]
->
+iex> s = Explorer.Series.from_list([1.0, 2.0, 3.0])
+iex> Explorer.Series.cut(s, [1.5, 2.5])
+#Explorer.DataFrame<
+ Polars[3 x 3]
+ values float [1.0, 2.0, 3.0]
+ break_point float [1.5, 2.5, Inf]
+ category category ["(-inf, 1.5]", "(1.5, 2.5]", "(2.5, inf]"]
+>
@@ -2580,13 +2580,13 @@ frequencies(series)
Examples
-iex> s = Explorer.Series.from_list(["a", "a", "b", "c", "c", "c"])
-iex> Explorer.Series.frequencies(s)
-#Explorer.DataFrame<
- Polars[3 x 2]
- values string ["c", "a", "b"]
- counts integer [3, 2, 1]
->
+iex> s = Explorer.Series.from_list(["a", "a", "b", "c", "c", "c"])
+iex> Explorer.Series.frequencies(s)
+#Explorer.DataFrame<
+ Polars[3 x 2]
+ values string ["c", "a", "b"]
+ counts integer [3, 2, 1]
+>
@@ -2627,28 +2627,28 @@ max(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.max(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.max(s)
3
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.max(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.max(s)
3.0
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.max(s)
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.max(s)
~D[2021-01-01]
-iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
-iex> Explorer.Series.max(s)
+iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
+iex> Explorer.Series.max(s)
~N[2021-01-01 00:00:00.000000]
-iex> s = Explorer.Series.from_list([~T[00:02:03.000212], ~T[00:05:04.000456]])
-iex> Explorer.Series.max(s)
+iex> s = Explorer.Series.from_list([~T[00:02:03.000212], ~T[00:05:04.000456]])
+iex> Explorer.Series.max(s)
~T[00:05:04.000456]
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.max(s)
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.max(s)
** (ArgumentError) Explorer.Series.max/1 not implemented for dtype :string. Valid dtypes are [:integer, :float, :date, :time, :datetime]
@@ -2689,16 +2689,16 @@ mean(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.mean(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.mean(s)
2.0
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.mean(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.mean(s)
2.0
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.mean(s)
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.mean(s)
** (ArgumentError) Explorer.Series.mean/1 not implemented for dtype :date. Valid dtypes are [:integer, :float]
@@ -2739,16 +2739,16 @@ median(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.median(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.median(s)
2.0
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.median(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.median(s)
2.0
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.median(s)
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.median(s)
** (ArgumentError) Explorer.Series.median/1 not implemented for dtype :date. Valid dtypes are [:integer, :float]
@@ -2790,28 +2790,28 @@ min(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.min(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.min(s)
1
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.min(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.min(s)
1.0
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.min(s)
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.min(s)
~D[1999-12-31]
-iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
-iex> Explorer.Series.min(s)
+iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
+iex> Explorer.Series.min(s)
~N[1999-12-31 00:00:00.000000]
-iex> s = Explorer.Series.from_list([~T[00:02:03.000451], ~T[00:05:04.000134]])
-iex> Explorer.Series.min(s)
+iex> s = Explorer.Series.from_list([~T[00:02:03.000451], ~T[00:05:04.000134]])
+iex> Explorer.Series.min(s)
~T[00:02:03.000451]
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.min(s)
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.min(s)
** (ArgumentError) Explorer.Series.min/1 not implemented for dtype :string. Valid dtypes are [:integer, :float, :date, :time, :datetime]
@@ -2840,8 +2840,8 @@ n_distinct(series)
Examples
-iex> s = Explorer.Series.from_list(["a", "b", "a", "b"])
-iex> Explorer.Series.n_distinct(s)
+iex> s = Explorer.Series.from_list(["a", "b", "a", "b"])
+iex> Explorer.Series.n_distinct(s)
2
@@ -2870,8 +2870,8 @@ nil_count(series)
Examples
-iex> s = Explorer.Series.from_list(["a", nil, "c", nil, nil])
-iex> Explorer.Series.nil_count(s)
+iex> s = Explorer.Series.from_list(["a", nil, "c", nil, nil])
+iex> Explorer.Series.nil_count(s)
3
@@ -2912,12 +2912,12 @@ product(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.product(s)
+iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.product(s)
6
-iex> s = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.product(s)
+iex> s = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.product(s)
** (ArgumentError) Explorer.Series.product/1 not implemented for dtype :boolean. Valid dtypes are [:integer, :float]
@@ -2959,14 +2959,14 @@ qcut(series, quantiles, opts \\ [])
Examples
-iex> s = Explorer.Series.from_list([1.0, 2.0, 3.0, 4.0, 5.0])
-iex> Explorer.Series.qcut(s, [0.25, 0.75])
-#Explorer.DataFrame<
- Polars[5 x 3]
- values float [1.0, 2.0, 3.0, 4.0, 5.0]
- break_point float [2.0, 2.0, 4.0, 4.0, Inf]
- category category ["(-inf, 2]", "(-inf, 2]", "(2, 4]", "(2, 4]", "(4, inf]"]
->
+iex> s = Explorer.Series.from_list([1.0, 2.0, 3.0, 4.0, 5.0])
+iex> Explorer.Series.qcut(s, [0.25, 0.75])
+#Explorer.DataFrame<
+ Polars[5 x 3]
+ values float [1.0, 2.0, 3.0, 4.0, 5.0]
+ break_point float [2.0, 2.0, 4.0, 4.0, Inf]
+ category category ["(-inf, 2]", "(-inf, 2]", "(2, 4]", "(2, 4]", "(4, inf]"]
+>
@@ -3006,28 +3006,28 @@ quantile(series, quantile)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.quantile(s, 0.2)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.quantile(s, 0.2)
1
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.quantile(s, 0.5)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.quantile(s, 0.5)
2.0
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.quantile(s, 0.5)
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.quantile(s, 0.5)
~D[2021-01-01]
-iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
-iex> Explorer.Series.quantile(s, 0.5)
+iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
+iex> Explorer.Series.quantile(s, 0.5)
~N[2021-01-01 00:00:00.000000]
-iex> s = Explorer.Series.from_list([~T[01:55:00], ~T[15:35:00], ~T[23:00:00]])
-iex> Explorer.Series.quantile(s, 0.5)
+iex> s = Explorer.Series.from_list([~T[01:55:00], ~T[15:35:00], ~T[23:00:00]])
+iex> Explorer.Series.quantile(s, 0.5)
~T[15:35:00]
-iex> s = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.quantile(s, 0.5)
+iex> s = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.quantile(s, 0.5)
** (ArgumentError) Explorer.Series.quantile/2 not implemented for dtype :boolean. Valid dtypes are [:integer, :float, :date, :time, :datetime]
@@ -3072,24 +3072,24 @@ skew(series, opts \\ [])
Examples
-iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5, 23])
-iex> Explorer.Series.skew(s)
+iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5, 23])
+iex> Explorer.Series.skew(s)
1.6727687946848508
-iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5, 23])
-iex> Explorer.Series.skew(s, bias: false)
+iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5, 23])
+iex> Explorer.Series.skew(s, bias: false)
2.2905330058490514
-iex> s = Explorer.Series.from_list([1, 2, 3, nil, 1])
-iex> Explorer.Series.skew(s, bias: false)
+iex> s = Explorer.Series.from_list([1, 2, 3, nil, 1])
+iex> Explorer.Series.skew(s, bias: false)
0.8545630383279712
-iex> s = Explorer.Series.from_list([1, 2, 3, nil, 1])
-iex> Explorer.Series.skew(s)
+iex> s = Explorer.Series.from_list([1, 2, 3, nil, 1])
+iex> Explorer.Series.skew(s)
0.49338220021815865
-iex> s = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.skew(s, bias: false)
+iex> s = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.skew(s, bias: false)
** (ArgumentError) Explorer.Series.skew/2 not implemented for dtype :boolean. Valid dtypes are [:integer, :float]
@@ -3130,16 +3130,16 @@ standard_deviation(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.standard_deviation(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.standard_deviation(s)
1.0
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.standard_deviation(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.standard_deviation(s)
1.0
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.standard_deviation(s)
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.standard_deviation(s)
** (ArgumentError) Explorer.Series.standard_deviation/1 not implemented for dtype :string. Valid dtypes are [:integer, :float]
@@ -3180,20 +3180,20 @@ sum(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.sum(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.sum(s)
6
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.sum(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.sum(s)
6.0
-iex> s = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.sum(s)
+iex> s = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.sum(s)
2
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.sum(s)
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.sum(s)
** (ArgumentError) Explorer.Series.sum/1 not implemented for dtype :date. Valid dtypes are [:integer, :float, :boolean]
@@ -3234,16 +3234,16 @@ variance(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 3])
-iex> Explorer.Series.variance(s)
+iex> s = Explorer.Series.from_list([1, 2, nil, 3])
+iex> Explorer.Series.variance(s)
1.0
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
-iex> Explorer.Series.variance(s)
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 3.0])
+iex> Explorer.Series.variance(s)
1.0
-iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
-iex> Explorer.Series.variance(s)
+iex> s = Explorer.Series.from_list([~N[2021-01-01 00:00:00], ~N[1999-12-31 00:00:00]])
+iex> Explorer.Series.variance(s)
** (ArgumentError) Explorer.Series.variance/1 not implemented for dtype :datetime. Valid dtypes are [:integer, :float]
@@ -3297,29 +3297,29 @@ abs(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, -1, -3])
-iex> Explorer.Series.abs(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 1, 3]
->
-
-iex> s = Explorer.Series.from_list([1.0, 2.0, -1.0, -3.0])
-iex> Explorer.Series.abs(s)
-#Explorer.Series<
- Polars[4]
- float [1.0, 2.0, 1.0, 3.0]
->
-
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, -3.0])
-iex> Explorer.Series.abs(s)
-#Explorer.Series<
- Polars[4]
- float [1.0, 2.0, nil, 3.0]
->
-
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.abs(s)
+iex> s = Explorer.Series.from_list([1, 2, -1, -3])
+iex> Explorer.Series.abs(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 1, 3]
+>
+
+iex> s = Explorer.Series.from_list([1.0, 2.0, -1.0, -3.0])
+iex> Explorer.Series.abs(s)
+#Explorer.Series<
+ Polars[4]
+ float [1.0, 2.0, 1.0, 3.0]
+>
+
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, -3.0])
+iex> Explorer.Series.abs(s)
+#Explorer.Series<
+ Polars[4]
+ float [1.0, 2.0, nil, 3.0]
+>
+
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.abs(s)
** (ArgumentError) Explorer.Series.abs/1 not implemented for dtype :string. Valid dtypes are [:integer, :float]
@@ -3362,25 +3362,25 @@ add(left, right)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([4, 5, 6])
-iex> Explorer.Series.add(s1, s2)
-#Explorer.Series<
- Polars[3]
- integer [5, 7, 9]
->
You can also use scalar values on both sides:
iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.add(s1, 2)
-#Explorer.Series<
- Polars[3]
- integer [3, 4, 5]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([4, 5, 6])
+iex> Explorer.Series.add(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ integer [5, 7, 9]
+>
You can also use scalar values on both sides:
iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.add(s1, 2)
+#Explorer.Series<
+ Polars[3]
+ integer [3, 4, 5]
+>
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.add(2, s1)
-#Explorer.Series<
- Polars[3]
- integer [3, 4, 5]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.add(2, s1)
+#Explorer.Series<
+ Polars[3]
+ integer [3, 4, 5]
+>
@@ -3408,19 +3408,19 @@ all_equal(left, right)
Examples
-iex> s1 = Explorer.Series.from_list(["a", "b"])
-iex> s2 = Explorer.Series.from_list(["a", "b"])
-iex> Explorer.Series.all_equal(s1, s2)
+iex> s1 = Explorer.Series.from_list(["a", "b"])
+iex> s2 = Explorer.Series.from_list(["a", "b"])
+iex> Explorer.Series.all_equal(s1, s2)
true
-iex> s1 = Explorer.Series.from_list(["a", "b"])
-iex> s2 = Explorer.Series.from_list(["a", "c"])
-iex> Explorer.Series.all_equal(s1, s2)
+iex> s1 = Explorer.Series.from_list(["a", "b"])
+iex> s2 = Explorer.Series.from_list(["a", "c"])
+iex> Explorer.Series.all_equal(s1, s2)
false
-iex> s1 = Explorer.Series.from_list(["a", "b"])
-iex> s2 = Explorer.Series.from_list([1, 2])
-iex> Explorer.Series.all_equal(s1, s2)
+iex> s1 = Explorer.Series.from_list(["a", "b"])
+iex> s2 = Explorer.Series.from_list([1, 2])
+iex> Explorer.Series.all_equal(s1, s2)
false
@@ -3450,14 +3450,14 @@ left and right
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> mask1 = Explorer.Series.greater(s1, 1)
-iex> mask2 = Explorer.Series.less(s1, 3)
-iex> Explorer.Series.and(mask1, mask2)
-#Explorer.Series<
- Polars[3]
- boolean [false, true, false]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> mask1 = Explorer.Series.greater(s1, 1)
+iex> mask2 = Explorer.Series.less(s1, 3)
+iex> Explorer.Series.and(mask1, mask2)
+#Explorer.Series<
+ Polars[3]
+ boolean [false, true, false]
+>
@@ -3491,62 +3491,62 @@ cast(series, dtype)
Examples
-iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.cast(s, :string)
-#Explorer.Series<
- Polars[3]
- string ["1", "2", "3"]
->
-
-iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.cast(s, :float)
-#Explorer.Series<
- Polars[3]
- float [1.0, 2.0, 3.0]
->
-
-iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.cast(s, :date)
-#Explorer.Series<
- Polars[3]
- date [1970-01-02, 1970-01-03, 1970-01-04]
->
Note that time is represented as an integer of nanoseconds since midnight.
+
iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.cast(s, :string)
+#Explorer.Series<
+ Polars[3]
+ string ["1", "2", "3"]
+>
+
+iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.cast(s, :float)
+#Explorer.Series<
+ Polars[3]
+ float [1.0, 2.0, 3.0]
+>
+
+iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.cast(s, :date)
+#Explorer.Series<
+ Polars[3]
+ date [1970-01-02, 1970-01-03, 1970-01-04]
+>
Note that time is represented as an integer of nanoseconds since midnight.
In Elixir we can't represent nanoseconds, only microseconds. So be aware that
-information can be lost if a conversion is needed (e.g. calling to_list/1).
iex> s = Explorer.Series.from_list([1_000, 2_000, 3_000])
-iex> Explorer.Series.cast(s, :time)
-#Explorer.Series<
- Polars[3]
- time [00:00:00.000001, 00:00:00.000002, 00:00:00.000003]
->
-
-iex> s = Explorer.Series.from_list([86399 * 1_000 * 1_000 * 1_000])
-iex> Explorer.Series.cast(s, :time)
-#Explorer.Series<
- Polars[1]
- time [23:59:59.000000]
->
Note that datetime is represented as an integer of microseconds since Unix Epoch (1970-01-01 00:00:00).
iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.cast(s, :datetime)
-#Explorer.Series<
- Polars[3]
- datetime [1970-01-01 00:00:00.000001, 1970-01-01 00:00:00.000002, 1970-01-01 00:00:00.000003]
->
-
-iex> s = Explorer.Series.from_list([1649883642 * 1_000 * 1_000])
-iex> Explorer.Series.cast(s, :datetime)
-#Explorer.Series<
- Polars[1]
- datetime [2022-04-13 21:00:42.000000]
->
You can also use cast/2 to categorise a string:
iex> s = Explorer.Series.from_list(["apple", "banana", "apple", "lemon"])
-iex> Explorer.Series.cast(s, :category)
-#Explorer.Series<
- Polars[4]
- category ["apple", "banana", "apple", "lemon"]
->
cast/2 will return the series as a no-op if you try to cast to the same dtype.
iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.cast(s, :integer)
-#Explorer.Series<
- Polars[3]
- integer [1, 2, 3]
->
+information can be lost if a conversion is needed (e.g. calling to_list/1).
+iex> s = Explorer.Series.from_list([1_000, 2_000, 3_000])
+iex> Explorer.Series.cast(s, :time)
+#Explorer.Series<
+ Polars[3]
+ time [00:00:00.000001, 00:00:00.000002, 00:00:00.000003]
+>
+
+iex> s = Explorer.Series.from_list([86399 * 1_000 * 1_000 * 1_000])
+iex> Explorer.Series.cast(s, :time)
+#Explorer.Series<
+ Polars[1]
+ time [23:59:59.000000]
+>
Note that datetime is represented as an integer of microseconds since Unix Epoch (1970-01-01 00:00:00).
iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.cast(s, :datetime)
+#Explorer.Series<
+ Polars[3]
+ datetime [1970-01-01 00:00:00.000001, 1970-01-01 00:00:00.000002, 1970-01-01 00:00:00.000003]
+>
+
+iex> s = Explorer.Series.from_list([1649883642 * 1_000 * 1_000])
+iex> Explorer.Series.cast(s, :datetime)
+#Explorer.Series<
+ Polars[1]
+ datetime [2022-04-13 21:00:42.000000]
+>
You can also use cast/2 to categorise a string:
iex> s = Explorer.Series.from_list(["apple", "banana", "apple", "lemon"])
+iex> Explorer.Series.cast(s, :category)
+#Explorer.Series<
+ Polars[4]
+ category ["apple", "banana", "apple", "lemon"]
+>
cast/2 will return the series as a no-op if you try to cast to the same dtype.
iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.cast(s, :integer)
+#Explorer.Series<
+ Polars[3]
+ integer [1, 2, 3]
+>
@@ -3580,32 +3580,32 @@ categorise(series, categories)
If a categorical series is given as second argument, we will extract its
-categories and map the integers into it:
iex> categories = Explorer.Series.from_list(["a", "b", "c", nil, "a"], dtype: :category)
-iex> indexes = Explorer.Series.from_list([0, 2, 1, 0, 2])
-iex> Explorer.Series.categorise(indexes, categories)
-#Explorer.Series<
- Polars[5]
- category ["a", "c", "b", "a", "c"]
->
Otherwise, if a list of strings or a series of strings is given, they are
-considered to be the categories series itself:
iex> categories = Explorer.Series.from_list(["a", "b", "c"])
-iex> indexes = Explorer.Series.from_list([0, 2, 1, 0, 2])
-iex> Explorer.Series.categorise(indexes, categories)
-#Explorer.Series<
- Polars[5]
- category ["a", "c", "b", "a", "c"]
->
-
-iex> indexes = Explorer.Series.from_list([0, 2, 1, 0, 2])
-iex> Explorer.Series.categorise(indexes, ["a", "b", "c"])
-#Explorer.Series<
- Polars[5]
- category ["a", "c", "b", "a", "c"]
->
Elements that are not mapped to a category will become nil:
iex> indexes = Explorer.Series.from_list([0, 2, nil, 0, 2, 7])
-iex> Explorer.Series.categorise(indexes, ["a", "b", "c"])
-#Explorer.Series<
- Polars[6]
- category ["a", "c", nil, "a", "c", nil]
->
+categories and map the integers into it:
+iex> categories = Explorer.Series.from_list(["a", "b", "c", nil, "a"], dtype: :category)
+iex> indexes = Explorer.Series.from_list([0, 2, 1, 0, 2])
+iex> Explorer.Series.categorise(indexes, categories)
+#Explorer.Series<
+ Polars[5]
+ category ["a", "c", "b", "a", "c"]
+>
Otherwise, if a list of strings or a series of strings is given, they are
+considered to be the categories series itself:
iex> categories = Explorer.Series.from_list(["a", "b", "c"])
+iex> indexes = Explorer.Series.from_list([0, 2, 1, 0, 2])
+iex> Explorer.Series.categorise(indexes, categories)
+#Explorer.Series<
+ Polars[5]
+ category ["a", "c", "b", "a", "c"]
+>
+
+iex> indexes = Explorer.Series.from_list([0, 2, 1, 0, 2])
+iex> Explorer.Series.categorise(indexes, ["a", "b", "c"])
+#Explorer.Series<
+ Polars[5]
+ category ["a", "c", "b", "a", "c"]
+>
Elements that are not mapped to a category will become nil:
iex> indexes = Explorer.Series.from_list([0, 2, nil, 0, 2, 7])
+iex> Explorer.Series.categorise(indexes, ["a", "b", "c"])
+#Explorer.Series<
+ Polars[6]
+ category ["a", "c", nil, "a", "c", nil]
+>
@@ -3646,19 +3646,19 @@ clip(series, min, max)
Examples
-iex> s = Explorer.Series.from_list([-50, 5, nil, 50])
-iex> Explorer.Series.clip(s, 1, 10)
-#Explorer.Series<
- Polars[4]
- integer [1, 5, nil, 10]
->
+iex> s = Explorer.Series.from_list([-50, 5, nil, 50])
+iex> Explorer.Series.clip(s, 1, 10)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 5, nil, 10]
+>
-iex> s = Explorer.Series.from_list([-50, 5, nil, 50])
-iex> Explorer.Series.clip(s, 1.5, 10.5)
-#Explorer.Series<
- Polars[4]
- float [1.5, 5.0, nil, 10.5]
->
+iex> s = Explorer.Series.from_list([-50, 5, nil, 50])
+iex> Explorer.Series.clip(s, 1.5, 10.5)
+#Explorer.Series<
+ Polars[4]
+ float [1.5, 5.0, nil, 10.5]
+>
@@ -3692,14 +3692,14 @@ coalesce(list)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, nil, nil])
-iex> s2 = Explorer.Series.from_list([1, 2, nil, 4])
-iex> s3 = Explorer.Series.from_list([nil, nil, 3, 4])
-iex> Explorer.Series.coalesce([s1, s2, s3])
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 3, 4]
->
+iex> s1 = Explorer.Series.from_list([1, 2, nil, nil])
+iex> s2 = Explorer.Series.from_list([1, 2, nil, 4])
+iex> s3 = Explorer.Series.from_list([nil, nil, 3, 4])
+iex> Explorer.Series.coalesce([s1, s2, s3])
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 3, 4]
+>
@@ -3733,17 +3733,17 @@ coalesce(s1, s2)
Examples
-iex> s1 = Explorer.Series.from_list([1, nil, 3, nil])
-iex> s2 = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.coalesce(s1, s2)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 3, 4]
->
+iex> s1 = Explorer.Series.from_list([1, nil, 3, nil])
+iex> s2 = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.coalesce(s1, s2)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 3, 4]
+>
-iex> s1 = Explorer.Series.from_list(["foo", nil, "bar", nil])
-iex> s2 = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.coalesce(s1, s2)
+iex> s1 = Explorer.Series.from_list(["foo", nil, "bar", nil])
+iex> s2 = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.coalesce(s1, s2)
** (ArgumentError) cannot invoke Explorer.Series.coalesce/2 with mismatched dtypes: :string and :integer
@@ -3786,35 +3786,35 @@ divide(left, right)
Examples
-iex> s1 = [10, 10, 10] |> Explorer.Series.from_list()
-iex> s2 = [2, 2, 2] |> Explorer.Series.from_list()
-iex> Explorer.Series.divide(s1, s2)
-#Explorer.Series<
- Polars[3]
- float [5.0, 5.0, 5.0]
->
-
-iex> s1 = [10, 10, 10] |> Explorer.Series.from_list()
-iex> Explorer.Series.divide(s1, 2)
-#Explorer.Series<
- Polars[3]
- float [5.0, 5.0, 5.0]
->
-
-iex> s1 = [10, 52 ,10] |> Explorer.Series.from_list()
-iex> Explorer.Series.divide(s1, 2.5)
-#Explorer.Series<
- Polars[3]
- float [4.0, 20.8, 4.0]
->
-
-iex> s1 = [10, 10, 10] |> Explorer.Series.from_list()
-iex> s2 = [2, 0, 2] |> Explorer.Series.from_list()
-iex> Explorer.Series.divide(s1, s2)
-#Explorer.Series<
- Polars[3]
- float [5.0, Inf, 5.0]
->
+iex> s1 = [10, 10, 10] |> Explorer.Series.from_list()
+iex> s2 = [2, 2, 2] |> Explorer.Series.from_list()
+iex> Explorer.Series.divide(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ float [5.0, 5.0, 5.0]
+>
+
+iex> s1 = [10, 10, 10] |> Explorer.Series.from_list()
+iex> Explorer.Series.divide(s1, 2)
+#Explorer.Series<
+ Polars[3]
+ float [5.0, 5.0, 5.0]
+>
+
+iex> s1 = [10, 52, 10] |> Explorer.Series.from_list()
+iex> Explorer.Series.divide(s1, 2.5)
+#Explorer.Series<
+ Polars[3]
+ float [4.0, 20.8, 4.0]
+>
+
+iex> s1 = [10, 10, 10] |> Explorer.Series.from_list()
+iex> s2 = [2, 0, 2] |> Explorer.Series.from_list()
+iex> Explorer.Series.divide(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ float [5.0, Inf, 5.0]
+>
@@ -3855,51 +3855,51 @@ equal(left, right)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([1, 2, 4])
-iex> Explorer.Series.equal(s1, s2)
-#Explorer.Series<
- Polars[3]
- boolean [true, true, false]
->
-
-iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.equal(s, 1)
-#Explorer.Series<
- Polars[3]
- boolean [true, false, false]
->
-
-iex> s = Explorer.Series.from_list([true, true, false])
-iex> Explorer.Series.equal(s, true)
-#Explorer.Series<
- Polars[3]
- boolean [true, true, false]
->
-
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.equal(s, "a")
-#Explorer.Series<
- Polars[3]
- boolean [true, false, false]
->
-
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.equal(s, ~D[1999-12-31])
-#Explorer.Series<
- Polars[2]
- boolean [false, true]
->
-
-iex> s = Explorer.Series.from_list([~N[2022-01-01 00:00:00], ~N[2022-01-01 23:00:00]])
-iex> Explorer.Series.equal(s, ~N[2022-01-01 00:00:00])
-#Explorer.Series<
- Polars[2]
- boolean [true, false]
->
-
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.equal(s, false)
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([1, 2, 4])
+iex> Explorer.Series.equal(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, true, false]
+>
+
+iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.equal(s, 1)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, false, false]
+>
+
+iex> s = Explorer.Series.from_list([true, true, false])
+iex> Explorer.Series.equal(s, true)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, true, false]
+>
+
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.equal(s, "a")
+#Explorer.Series<
+ Polars[3]
+ boolean [true, false, false]
+>
+
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.equal(s, ~D[1999-12-31])
+#Explorer.Series<
+ Polars[2]
+ boolean [false, true]
+>
+
+iex> s = Explorer.Series.from_list([~N[2022-01-01 00:00:00], ~N[2022-01-01 23:00:00]])
+iex> Explorer.Series.equal(s, ~N[2022-01-01 00:00:00])
+#Explorer.Series<
+ Polars[2]
+ boolean [true, false]
+>
+
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.equal(s, false)
** (ArgumentError) cannot invoke Explorer.Series.equal/2 with mismatched dtypes: :string and false
@@ -3973,13 +3973,13 @@ greater(left, right)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([1, 2, 4])
-iex> Explorer.Series.greater(s1, s2)
-#Explorer.Series<
- Polars[3]
- boolean [false, false, false]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([1, 2, 4])
+iex> Explorer.Series.greater(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ boolean [false, false, false]
+>
@@ -4024,13 +4024,13 @@ greater_equal(left, right)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([1, 2, 4])
-iex> Explorer.Series.greater_equal(s1, s2)
-#Explorer.Series<
- Polars[3]
- boolean [true, true, false]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([1, 2, 4])
+iex> Explorer.Series.greater_equal(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, true, false]
+>
@@ -4058,21 +4058,21 @@ left in right
Examples
-iex> left = Explorer.Series.from_list([1, 2, 3])
-iex> right = Explorer.Series.from_list([1, 2])
-iex> Series.in(left, right)
-#Explorer.Series<
- Polars[3]
- boolean [true, true, false]
->
+iex> left = Explorer.Series.from_list([1, 2, 3])
+iex> right = Explorer.Series.from_list([1, 2])
+iex> Explorer.Series.in(left, right)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, true, false]
+>
-iex> left = Explorer.Series.from_list([~D[1970-01-01], ~D[2000-01-01], ~D[2010-04-17]])
-iex> right = Explorer.Series.from_list([~D[1970-01-01], ~D[2010-04-17]])
-iex> Series.in(left, right)
-#Explorer.Series<
- Polars[3]
- boolean [true, false, true]
->
+iex> left = Explorer.Series.from_list([~D[1970-01-01], ~D[2000-01-01], ~D[2010-04-17]])
+iex> right = Explorer.Series.from_list([~D[1970-01-01], ~D[2010-04-17]])
+iex> Explorer.Series.in(left, right)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, false, true]
+>
@@ -4106,12 +4106,12 @@ is_nil(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.is_nil(s)
-#Explorer.Series<
- Polars[4]
- boolean [false, false, true, false]
->
+iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.is_nil(s)
+#Explorer.Series<
+ Polars[4]
+ boolean [false, false, true, false]
+>
@@ -4145,12 +4145,12 @@ is_not_nil(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.is_not_nil(s)
-#Explorer.Series<
- Polars[4]
- boolean [true, true, false, true]
->
+iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.is_not_nil(s)
+#Explorer.Series<
+ Polars[4]
+ boolean [true, true, false, true]
+>
@@ -4195,13 +4195,13 @@ less(left, right)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([1, 2, 4])
-iex> Explorer.Series.less(s1, s2)
-#Explorer.Series<
- Polars[3]
- boolean [false, false, true]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([1, 2, 4])
+iex> Explorer.Series.less(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ boolean [false, false, true]
+>
@@ -4246,13 +4246,13 @@ less_equal(left, right)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([1, 2, 4])
-iex> Explorer.Series.less_equal(s1, s2)
-#Explorer.Series<
- Polars[3]
- boolean [true, true, true]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([1, 2, 4])
+iex> Explorer.Series.less_equal(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, true, true]
+>
@@ -4293,12 +4293,12 @@ log(s)
Examples
-iex> s = Explorer.Series.from_list([1, 2, 3, nil, 4])
-iex> Explorer.Series.log(s)
-#Explorer.Series<
- Polars[5]
- float [0.0, 0.6931471805599453, 1.0986122886681098, nil, 1.3862943611198906]
->
+iex> s = Explorer.Series.from_list([1, 2, 3, nil, 4])
+iex> Explorer.Series.log(s)
+#Explorer.Series<
+ Polars[5]
+ float [0.0, 0.6931471805599453, 1.0986122886681098, nil, 1.3862943611198906]
+>
@@ -4338,12 +4338,12 @@ log(argument, base)
Examples
-iex> s = Explorer.Series.from_list([8, 16, 32])
-iex> Explorer.Series.log(s, 2)
-#Explorer.Series<
- Polars[3]
- float [3.0, 4.0, 5.0]
->
+iex> s = Explorer.Series.from_list([8, 16, 32])
+iex> Explorer.Series.log(s, 2)
+#Explorer.Series<
+ Polars[3]
+ float [3.0, 4.0, 5.0]
+>
@@ -4377,13 +4377,13 @@ mask(series, mask)
Examples
-iex> s1 = Explorer.Series.from_list([1,2,3])
-iex> s2 = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.mask(s1, s2)
-#Explorer.Series<
- Polars[2]
- integer [1, 3]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.mask(s1, s2)
+#Explorer.Series<
+ Polars[2]
+ integer [1, 3]
+>
@@ -4425,20 +4425,20 @@ multiply(left, right)
Examples
-iex> s1 = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> s2 = 11..20 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.multiply(s1, s2)
-#Explorer.Series<
- Polars[10]
- integer [11, 24, 39, 56, 75, 96, 119, 144, 171, 200]
->
+iex> s1 = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> s2 = 11..20 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.multiply(s1, s2)
+#Explorer.Series<
+ Polars[10]
+ integer [11, 24, 39, 56, 75, 96, 119, 144, 171, 200]
+>
-iex> s1 = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.multiply(s1, 2)
-#Explorer.Series<
- Polars[5]
- integer [2, 4, 6, 8, 10]
->
+iex> s1 = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.multiply(s1, 2)
+#Explorer.Series<
+ Polars[5]
+ integer [2, 4, 6, 8, 10]
+>
@@ -4466,12 +4466,12 @@ not series
Examples
-iex> s1 = Explorer.Series.from_list([true, false, false])
-iex> Explorer.Series.not(s1)
-#Explorer.Series<
- Polars[3]
- boolean [false, true, true]
->
+iex> s1 = Explorer.Series.from_list([true, false, false])
+iex> Explorer.Series.not(s1)
+#Explorer.Series<
+ Polars[3]
+ boolean [false, true, true]
+>
@@ -4512,51 +4512,51 @@ not_equal(left, right)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([1, 2, 4])
-iex> Explorer.Series.not_equal(s1, s2)
-#Explorer.Series<
- Polars[3]
- boolean [false, false, true]
->
-
-iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.not_equal(s, 1)
-#Explorer.Series<
- Polars[3]
- boolean [false, true, true]
->
-
-iex> s = Explorer.Series.from_list([true, true, false])
-iex> Explorer.Series.not_equal(s, true)
-#Explorer.Series<
- Polars[3]
- boolean [false, false, true]
->
-
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.not_equal(s, "a")
-#Explorer.Series<
- Polars[3]
- boolean [false, true, true]
->
-
-iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
-iex> Explorer.Series.not_equal(s, ~D[1999-12-31])
-#Explorer.Series<
- Polars[2]
- boolean [true, false]
->
-
-iex> s = Explorer.Series.from_list([~N[2022-01-01 00:00:00], ~N[2022-01-01 23:00:00]])
-iex> Explorer.Series.not_equal(s, ~N[2022-01-01 00:00:00])
-#Explorer.Series<
- Polars[2]
- boolean [false, true]
->
-
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.not_equal(s, false)
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([1, 2, 4])
+iex> Explorer.Series.not_equal(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ boolean [false, false, true]
+>
+
+iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.not_equal(s, 1)
+#Explorer.Series<
+ Polars[3]
+ boolean [false, true, true]
+>
+
+iex> s = Explorer.Series.from_list([true, true, false])
+iex> Explorer.Series.not_equal(s, true)
+#Explorer.Series<
+ Polars[3]
+ boolean [false, false, true]
+>
+
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.not_equal(s, "a")
+#Explorer.Series<
+ Polars[3]
+ boolean [false, true, true]
+>
+
+iex> s = Explorer.Series.from_list([~D[2021-01-01], ~D[1999-12-31]])
+iex> Explorer.Series.not_equal(s, ~D[1999-12-31])
+#Explorer.Series<
+ Polars[2]
+ boolean [true, false]
+>
+
+iex> s = Explorer.Series.from_list([~N[2022-01-01 00:00:00], ~N[2022-01-01 23:00:00]])
+iex> Explorer.Series.not_equal(s, ~N[2022-01-01 00:00:00])
+#Explorer.Series<
+ Polars[2]
+ boolean [false, true]
+>
+
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.not_equal(s, false)
** (ArgumentError) cannot invoke Explorer.Series.not_equal/2 with mismatched dtypes: :string and false
@@ -4586,14 +4586,14 @@ left or right
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> mask1 = Explorer.Series.less(s1, 2)
-iex> mask2 = Explorer.Series.greater(s1, 2)
-iex> Explorer.Series.or(mask1, mask2)
-#Explorer.Series<
- Polars[3]
- boolean [true, false, true]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> mask1 = Explorer.Series.less(s1, 2)
+iex> mask2 = Explorer.Series.greater(s1, 2)
+iex> Explorer.Series.or(mask1, mask2)
+#Explorer.Series<
+ Polars[3]
+ boolean [true, false, true]
+>
@@ -4635,19 +4635,19 @@ peaks(series, max_or_min \\ :max)
Examples
-iex> s = Explorer.Series.from_list([1, 2, 4, 1, 4])
-iex> Explorer.Series.peaks(s)
-#Explorer.Series<
- Polars[5]
- boolean [false, false, true, false, true]
->
+iex> s = Explorer.Series.from_list([1, 2, 4, 1, 4])
+iex> Explorer.Series.peaks(s)
+#Explorer.Series<
+ Polars[5]
+ boolean [false, false, true, false, true]
+>
-iex> s = [~T[03:00:02.000000], ~T[13:24:56.000000], ~T[02:04:19.000000]] |> Explorer.Series.from_list()
-iex> Explorer.Series.peaks(s)
-#Explorer.Series<
- Polars[3]
- boolean [false, true, false]
->
+iex> s = [~T[03:00:02.000000], ~T[13:24:56.000000], ~T[02:04:19.000000]] |> Explorer.Series.from_list()
+iex> Explorer.Series.peaks(s)
+#Explorer.Series<
+ Polars[3]
+ boolean [false, true, false]
+>
@@ -4689,40 +4689,40 @@ pow(left, right)
Examples
-iex> s = [8, 16, 32] |> Explorer.Series.from_list()
-iex> Explorer.Series.pow(s, 2.0)
-#Explorer.Series<
- Polars[3]
- float [64.0, 256.0, 1024.0]
->
-
-iex> s = [2, 4, 6] |> Explorer.Series.from_list()
-iex> Explorer.Series.pow(s, 3)
-#Explorer.Series<
- Polars[3]
- integer [8, 64, 216]
->
-
-iex> s = [2, 4, 6] |> Explorer.Series.from_list()
-iex> Explorer.Series.pow(s, -3.0)
-#Explorer.Series<
- Polars[3]
- float [0.125, 0.015625, 0.004629629629629629]
->
-
-iex> s = [1.0, 2.0, 3.0] |> Explorer.Series.from_list()
-iex> Explorer.Series.pow(s, 3.0)
-#Explorer.Series<
- Polars[3]
- float [1.0, 8.0, 27.0]
->
-
-iex> s = [2.0, 4.0, 6.0] |> Explorer.Series.from_list()
-iex> Explorer.Series.pow(s, 2)
-#Explorer.Series<
- Polars[3]
- float [4.0, 16.0, 36.0]
->
+iex> s = [8, 16, 32] |> Explorer.Series.from_list()
+iex> Explorer.Series.pow(s, 2.0)
+#Explorer.Series<
+ Polars[3]
+ float [64.0, 256.0, 1024.0]
+>
+
+iex> s = [2, 4, 6] |> Explorer.Series.from_list()
+iex> Explorer.Series.pow(s, 3)
+#Explorer.Series<
+ Polars[3]
+ integer [8, 64, 216]
+>
+
+iex> s = [2, 4, 6] |> Explorer.Series.from_list()
+iex> Explorer.Series.pow(s, -3.0)
+#Explorer.Series<
+ Polars[3]
+ float [0.125, 0.015625, 0.004629629629629629]
+>
+
+iex> s = [1.0, 2.0, 3.0] |> Explorer.Series.from_list()
+iex> Explorer.Series.pow(s, 3.0)
+#Explorer.Series<
+ Polars[3]
+ float [1.0, 8.0, 27.0]
+>
+
+iex> s = [2.0, 4.0, 6.0] |> Explorer.Series.from_list()
+iex> Explorer.Series.pow(s, 2)
+#Explorer.Series<
+ Polars[3]
+ float [4.0, 16.0, 36.0]
+>
@@ -4764,28 +4764,28 @@ quotient(left, right)
Examples
-iex> s1 = [10, 11, 10] |> Explorer.Series.from_list()
-iex> s2 = [2, 2, 2] |> Explorer.Series.from_list()
-iex> Explorer.Series.quotient(s1, s2)
-#Explorer.Series<
- Polars[3]
- integer [5, 5, 5]
->
+iex> s1 = [10, 11, 10] |> Explorer.Series.from_list()
+iex> s2 = [2, 2, 2] |> Explorer.Series.from_list()
+iex> Explorer.Series.quotient(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ integer [5, 5, 5]
+>
-iex> s1 = [10, 11, 10] |> Explorer.Series.from_list()
-iex> s2 = [2, 2, 0] |> Explorer.Series.from_list()
-iex> Explorer.Series.quotient(s1, s2)
-#Explorer.Series<
- Polars[3]
- integer [5, 5, nil]
->
+iex> s1 = [10, 11, 10] |> Explorer.Series.from_list()
+iex> s2 = [2, 2, 0] |> Explorer.Series.from_list()
+iex> Explorer.Series.quotient(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ integer [5, 5, nil]
+>
-iex> s1 = [10, 12, 15] |> Explorer.Series.from_list()
-iex> Explorer.Series.quotient(s1, 3)
-#Explorer.Series<
- Polars[3]
- integer [3, 4, 5]
->
+iex> s1 = [10, 12, 15] |> Explorer.Series.from_list()
+iex> Explorer.Series.quotient(s1, 3)
+#Explorer.Series<
+ Polars[3]
+ integer [3, 4, 5]
+>
@@ -4827,48 +4827,48 @@ rank(series, opts \\ [])
Examples
-iex> s = Explorer.Series.from_list([3, 6, 1, 1, 6])
-iex> Explorer.Series.rank(s)
-#Explorer.Series<
- Polars[5]
- float [3.0, 4.5, 1.5, 1.5, 4.5]
->
-
-iex> s = Explorer.Series.from_list([1.1, 2.4, 3.2])
-iex> Explorer.Series.rank(s, method: "ordinal")
-#Explorer.Series<
- Polars[3]
- integer [1, 2, 3]
->
-
-iex> s = Explorer.Series.from_list([ ~N[2022-07-07 17:44:13.020548], ~N[2022-07-07 17:43:08.473561], ~N[2022-07-07 17:45:00.116337] ])
-iex> Explorer.Series.rank(s, method: "average")
-#Explorer.Series<
- Polars[3]
- float [2.0, 1.0, 3.0]
->
-
-iex> s = Explorer.Series.from_list([3, 6, 1, 1, 6])
-iex> Explorer.Series.rank(s, method: "min")
-#Explorer.Series<
- Polars[5]
- integer [3, 4, 1, 1, 4]
->
-
-iex> s = Explorer.Series.from_list([3, 6, 1, 1, 6])
-iex> Explorer.Series.rank(s, method: "dense")
-#Explorer.Series<
- Polars[5]
- integer [2, 3, 1, 1, 3]
->
-
-
-iex> s = Explorer.Series.from_list([3, 6, 1, 1, 6])
-iex> Explorer.Series.rank(s, method: "random", seed: 42)
-#Explorer.Series<
- Polars[5]
- integer [3, 4, 2, 1, 5]
->
+iex> s = Explorer.Series.from_list([3, 6, 1, 1, 6])
+iex> Explorer.Series.rank(s)
+#Explorer.Series<
+ Polars[5]
+ float [3.0, 4.5, 1.5, 1.5, 4.5]
+>
+
+iex> s = Explorer.Series.from_list([1.1, 2.4, 3.2])
+iex> Explorer.Series.rank(s, method: "ordinal")
+#Explorer.Series<
+ Polars[3]
+ integer [1, 2, 3]
+>
+
+iex> s = Explorer.Series.from_list([ ~N[2022-07-07 17:44:13.020548], ~N[2022-07-07 17:43:08.473561], ~N[2022-07-07 17:45:00.116337] ])
+iex> Explorer.Series.rank(s, method: "average")
+#Explorer.Series<
+ Polars[3]
+ float [2.0, 1.0, 3.0]
+>
+
+iex> s = Explorer.Series.from_list([3, 6, 1, 1, 6])
+iex> Explorer.Series.rank(s, method: "min")
+#Explorer.Series<
+ Polars[5]
+ integer [3, 4, 1, 1, 4]
+>
+
+iex> s = Explorer.Series.from_list([3, 6, 1, 1, 6])
+iex> Explorer.Series.rank(s, method: "dense")
+#Explorer.Series<
+ Polars[5]
+ integer [2, 3, 1, 1, 3]
+>
+
+
+iex> s = Explorer.Series.from_list([3, 6, 1, 1, 6])
+iex> Explorer.Series.rank(s, method: "random", seed: 42)
+#Explorer.Series<
+ Polars[5]
+ integer [3, 4, 2, 1, 5]
+>
@@ -4910,28 +4910,28 @@ remainder(left, right)
Examples
-iex> s1 = [10, 11, 10] |> Explorer.Series.from_list()
-iex> s2 = [2, 2, 2] |> Explorer.Series.from_list()
-iex> Explorer.Series.remainder(s1, s2)
-#Explorer.Series<
- Polars[3]
- integer [0, 1, 0]
->
+iex> s1 = [10, 11, 10] |> Explorer.Series.from_list()
+iex> s2 = [2, 2, 2] |> Explorer.Series.from_list()
+iex> Explorer.Series.remainder(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ integer [0, 1, 0]
+>
-iex> s1 = [10, 11, 10] |> Explorer.Series.from_list()
-iex> s2 = [2, 2, 0] |> Explorer.Series.from_list()
-iex> Explorer.Series.remainder(s1, s2)
-#Explorer.Series<
- Polars[3]
- integer [0, 1, nil]
->
+iex> s1 = [10, 11, 10] |> Explorer.Series.from_list()
+iex> s2 = [2, 2, 0] |> Explorer.Series.from_list()
+iex> Explorer.Series.remainder(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ integer [0, 1, nil]
+>
-iex> s1 = [10, 11, 9] |> Explorer.Series.from_list()
-iex> Explorer.Series.remainder(s1, 3)
-#Explorer.Series<
- Polars[3]
- integer [1, 2, 0]
->
+iex> s1 = [10, 11, 9] |> Explorer.Series.from_list()
+iex> Explorer.Series.remainder(s1, 3)
+#Explorer.Series<
+ Polars[3]
+ integer [1, 2, 0]
+>
@@ -5001,12 +5001,12 @@ strftime(series, format_string)
Examples
-iex> s = Explorer.Series.from_list([~N[2023-01-05 12:34:56], nil])
-iex> Explorer.Series.strftime(s, "%Y/%m/%d %H:%M:%S")
-#Explorer.Series<
- Polars[2]
- string ["2023/01/05 12:34:56", nil]
->
+iex> s = Explorer.Series.from_list([~N[2023-01-05 12:34:56], nil])
+iex> Explorer.Series.strftime(s, "%Y/%m/%d %H:%M:%S")
+#Explorer.Series<
+ Polars[2]
+ string ["2023/01/05 12:34:56", nil]
+>
@@ -5041,12 +5041,12 @@ strptime(series, format_string)
Examples
-iex> s = Explorer.Series.from_list(["2023-01-05 12:34:56", "XYZ", nil])
-iex> Explorer.Series.strptime(s, "%Y-%m-%d %H:%M:%S")
-#Explorer.Series<
- Polars[3]
- datetime [2023-01-05 12:34:56.000000, nil, nil]
->
+iex> s = Explorer.Series.from_list(["2023-01-05 12:34:56", "XYZ", nil])
+iex> Explorer.Series.strptime(s, "%Y-%m-%d %H:%M:%S")
+#Explorer.Series<
+ Polars[3]
+ datetime [2023-01-05 12:34:56.000000, nil, nil]
+>
@@ -5088,25 +5088,25 @@ subtract(left, right)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([4, 5, 6])
-iex> Explorer.Series.subtract(s1, s2)
-#Explorer.Series<
- Polars[3]
- integer [-3, -3, -3]
->
You can also use scalar values on both sides:
iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.subtract(s1, 2)
-#Explorer.Series<
- Polars[3]
- integer [-1, 0, 1]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([4, 5, 6])
+iex> Explorer.Series.subtract(s1, s2)
+#Explorer.Series<
+ Polars[3]
+ integer [-3, -3, -3]
+>
You can also use scalar values on both sides:
iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.subtract(s1, 2)
+#Explorer.Series<
+ Polars[3]
+ integer [-1, 0, 1]
+>
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.subtract(2, s1)
-#Explorer.Series<
- Polars[3]
- integer [1, 0, -1]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.subtract(2, s1)
+#Explorer.Series<
+ Polars[3]
+ integer [1, 0, -1]
+>
@@ -5138,19 +5138,19 @@ transform(series, fun)
Examples
-iex> s = Explorer.Series.from_list(["this ", " is", "great "])
-iex> Explorer.Series.transform(s, &String.trim/1)
-#Explorer.Series<
- Polars[3]
- string ["this", "is", "great"]
->
+iex> s = Explorer.Series.from_list(["this ", " is", "great "])
+iex> Explorer.Series.transform(s, &String.trim/1)
+#Explorer.Series<
+ Polars[3]
+ string ["this", "is", "great"]
+>
-iex> s = Explorer.Series.from_list(["this", "is", "great"])
-iex> Explorer.Series.transform(s, &String.length/1)
-#Explorer.Series<
- Polars[3]
- integer [4, 2, 5]
->
+iex> s = Explorer.Series.from_list(["this", "is", "great"])
+iex> Explorer.Series.transform(s, &String.length/1)
+#Explorer.Series<
+ Polars[3]
+ integer [4, 2, 5]
+>
@@ -5196,17 +5196,17 @@ day_of_week(series)
Examples
-iex> s = Explorer.Series.from_list([~D[2023-01-15], ~D[2023-01-16], ~D[2023-01-20], nil])
-iex> Explorer.Series.day_of_week(s)
-#Explorer.Series<
- Polars[4]
- integer [7, 1, 5, nil]
->
It can also be called on a datetime series.
iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2023-01-16 23:59:59.999999], ~N[2023-01-20 12:00:00], nil])
-iex> Explorer.Series.day_of_week(s)
-#Explorer.Series<
- Polars[4]
- integer [7, 1, 5, nil]
->
+iex> s = Explorer.Series.from_list([~D[2023-01-15], ~D[2023-01-16], ~D[2023-01-20], nil])
+iex> Explorer.Series.day_of_week(s)
+#Explorer.Series<
+ Polars[4]
+ integer [7, 1, 5, nil]
+>
It can also be called on a datetime series.
iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2023-01-16 23:59:59.999999], ~N[2023-01-20 12:00:00], nil])
+iex> Explorer.Series.day_of_week(s)
+#Explorer.Series<
+ Polars[4]
+ integer [7, 1, 5, nil]
+>
@@ -5240,12 +5240,12 @@ hour(series)
Examples
-iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2022-02-16 23:59:59.999999], ~N[2021-03-20 12:00:00], nil])
-iex> Explorer.Series.hour(s)
-#Explorer.Series<
- Polars[4]
- integer [0, 23, 12, nil]
->
+iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2022-02-16 23:59:59.999999], ~N[2021-03-20 12:00:00], nil])
+iex> Explorer.Series.hour(s)
+#Explorer.Series<
+ Polars[4]
+ integer [0, 23, 12, nil]
+>
@@ -5279,12 +5279,12 @@ minute(series)
Examples
-iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2022-02-16 23:59:59.999999], ~N[2021-03-20 12:00:00], nil])
-iex> Explorer.Series.minute(s)
-#Explorer.Series<
- Polars[4]
- integer [0, 59, 0, nil]
->
+iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2022-02-16 23:59:59.999999], ~N[2021-03-20 12:00:00], nil])
+iex> Explorer.Series.minute(s)
+#Explorer.Series<
+ Polars[4]
+ integer [0, 59, 0, nil]
+>
@@ -5318,17 +5318,17 @@ month(series)
Examples
-iex> s = Explorer.Series.from_list([~D[2023-01-15], ~D[2023-02-16], ~D[2023-03-20], nil])
-iex> Explorer.Series.month(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 3, nil]
->
It can also be called on a datetime series.
iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2023-02-16 23:59:59.999999], ~N[2023-03-20 12:00:00], nil])
-iex> Explorer.Series.month(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 3, nil]
->
+iex> s = Explorer.Series.from_list([~D[2023-01-15], ~D[2023-02-16], ~D[2023-03-20], nil])
+iex> Explorer.Series.month(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 3, nil]
+>
It can also be called on a datetime series.
iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2023-02-16 23:59:59.999999], ~N[2023-03-20 12:00:00], nil])
+iex> Explorer.Series.month(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 3, nil]
+>
@@ -5362,12 +5362,12 @@ second(series)
Examples
-iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2022-02-16 23:59:59.999999], ~N[2021-03-20 12:00:00], nil])
-iex> Explorer.Series.second(s)
-#Explorer.Series<
- Polars[4]
- integer [0, 59, 0, nil]
->
+iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2022-02-16 23:59:59.999999], ~N[2021-03-20 12:00:00], nil])
+iex> Explorer.Series.second(s)
+#Explorer.Series<
+ Polars[4]
+ integer [0, 59, 0, nil]
+>
@@ -5401,17 +5401,17 @@ year(series)
Examples
-iex> s = Explorer.Series.from_list([~D[2023-01-15], ~D[2022-02-16], ~D[2021-03-20], nil])
-iex> Explorer.Series.year(s)
-#Explorer.Series<
- Polars[4]
- integer [2023, 2022, 2021, nil]
->
It can also be called on a datetime series.
iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2022-02-16 23:59:59.999999], ~N[2021-03-20 12:00:00], nil])
-iex> Explorer.Series.year(s)
-#Explorer.Series<
- Polars[4]
- integer [2023, 2022, 2021, nil]
->
+iex> s = Explorer.Series.from_list([~D[2023-01-15], ~D[2022-02-16], ~D[2021-03-20], nil])
+iex> Explorer.Series.year(s)
+#Explorer.Series<
+ Polars[4]
+ integer [2023, 2022, 2021, nil]
+>
It can also be called on a datetime series.
iex> s = Explorer.Series.from_list([~N[2023-01-15 00:00:00], ~N[2022-02-16 23:59:59.999999], ~N[2021-03-20 12:00:00], nil])
+iex> Explorer.Series.year(s)
+#Explorer.Series<
+ Polars[4]
+ integer [2023, 2022, 2021, nil]
+>
@@ -5464,12 +5464,12 @@ acos(series)
Examples
-iex> s = [1.0, 0.0, -1.0, -0.7071067811865475, 0.7071067811865475] |> Explorer.Series.from_list()
-iex> Explorer.Series.acos(s)
-#Explorer.Series<
- Polars[5]
- float [0.0, 1.5707963267948966, 3.141592653589793, 2.356194490192345, 0.7853981633974484]
->
+iex> s = [1.0, 0.0, -1.0, -0.7071067811865475, 0.7071067811865475] |> Explorer.Series.from_list()
+iex> Explorer.Series.acos(s)
+#Explorer.Series<
+ Polars[5]
+ float [0.0, 1.5707963267948966, 3.141592653589793, 2.356194490192345, 0.7853981633974484]
+>
@@ -5510,12 +5510,12 @@ asin(series)
Examples
-iex> s = [1.0, 0.0, -1.0, -0.7071067811865475, 0.7071067811865475] |> Explorer.Series.from_list()
-iex> Explorer.Series.asin(s)
-#Explorer.Series<
- Polars[5]
- float [1.5707963267948966, 0.0, -1.5707963267948966, -0.7853981633974482, 0.7853981633974482]
->
+iex> s = [1.0, 0.0, -1.0, -0.7071067811865475, 0.7071067811865475] |> Explorer.Series.from_list()
+iex> Explorer.Series.asin(s)
+#Explorer.Series<
+ Polars[5]
+ float [1.5707963267948966, 0.0, -1.5707963267948966, -0.7853981633974482, 0.7853981633974482]
+>
@@ -5556,12 +5556,12 @@ atan(series)
Examples
-iex> s = [1.0, 0.0, -1.0, -0.7071067811865475, 0.7071067811865475] |> Explorer.Series.from_list()
-iex> Explorer.Series.atan(s)
-#Explorer.Series<
- Polars[5]
- float [0.7853981633974483, 0.0, -0.7853981633974483, -0.6154797086703873, 0.6154797086703873]
->
+iex> s = [1.0, 0.0, -1.0, -0.7071067811865475, 0.7071067811865475] |> Explorer.Series.from_list()
+iex> Explorer.Series.atan(s)
+#Explorer.Series<
+ Polars[5]
+ float [0.7853981633974483, 0.0, -0.7853981633974483, -0.6154797086703873, 0.6154797086703873]
+>
@@ -5595,12 +5595,12 @@ ceil(series)
Examples
-iex> s = Explorer.Series.from_list([1.124993, 2.555321, 3.995001])
-iex> Explorer.Series.ceil(s)
-#Explorer.Series<
- Polars[3]
- float [2.0, 3.0, 4.0]
->
+iex> s = Explorer.Series.from_list([1.124993, 2.555321, 3.995001])
+iex> Explorer.Series.ceil(s)
+#Explorer.Series<
+ Polars[3]
+ float [2.0, 3.0, 4.0]
+>
@@ -5641,13 +5641,13 @@ cos(series)
Examples
-iex> pi = :math.pi()
-iex> s = [-pi * 3/2, -pi, -pi / 2, -pi / 4, 0, pi / 4, pi / 2, pi, pi * 3/2] |> Explorer.Series.from_list()
-iex> Explorer.Series.cos(s)
-#Explorer.Series<
- Polars[9]
- float [-1.8369701987210297e-16, -1.0, 6.123233995736766e-17, 0.7071067811865476, 1.0, 0.7071067811865476, 6.123233995736766e-17, -1.0, -1.8369701987210297e-16]
->
+iex> pi = :math.pi()
+iex> s = [-pi * 3/2, -pi, -pi / 2, -pi / 4, 0, pi / 4, pi / 2, pi, pi * 3/2] |> Explorer.Series.from_list()
+iex> Explorer.Series.cos(s)
+#Explorer.Series<
+ Polars[9]
+ float [-1.8369701987210297e-16, -1.0, 6.123233995736766e-17, 0.7071067811865476, 1.0, 0.7071067811865476, 6.123233995736766e-17, -1.0, -1.8369701987210297e-16]
+>
@@ -5681,12 +5681,12 @@ floor(series)
Examples
-iex> s = Explorer.Series.from_list([1.124993, 2.555321, 3.995001])
-iex> Explorer.Series.floor(s)
-#Explorer.Series<
- Polars[3]
- float [1.0, 2.0, 3.0]
->
+iex> s = Explorer.Series.from_list([1.124993, 2.555321, 3.995001])
+iex> Explorer.Series.floor(s)
+#Explorer.Series<
+ Polars[3]
+ float [1.0, 2.0, 3.0]
+>
@@ -5720,14 +5720,14 @@ is_finite(series)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 0, nil])
-iex> s2 = Explorer.Series.from_list([0, 2, 0, nil])
-iex> s3 = Explorer.Series.divide(s1, s2)
-iex> Explorer.Series.is_finite(s3)
-#Explorer.Series<
- Polars[4]
- boolean [false, true, false, nil]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 0, nil])
+iex> s2 = Explorer.Series.from_list([0, 2, 0, nil])
+iex> s3 = Explorer.Series.divide(s1, s2)
+iex> Explorer.Series.is_finite(s3)
+#Explorer.Series<
+ Polars[4]
+ boolean [false, true, false, nil]
+>
@@ -5761,14 +5761,14 @@ is_infinite(series)
Examples
-iex> s1 = Explorer.Series.from_list([1, -1, 2, 0, nil])
-iex> s2 = Explorer.Series.from_list([0, 0, 2, 0, nil])
-iex> s3 = Explorer.Series.divide(s1, s2)
-iex> Explorer.Series.is_infinite(s3)
-#Explorer.Series<
- Polars[5]
- boolean [true, true, false, false, nil]
->
+iex> s1 = Explorer.Series.from_list([1, -1, 2, 0, nil])
+iex> s2 = Explorer.Series.from_list([0, 0, 2, 0, nil])
+iex> s3 = Explorer.Series.divide(s1, s2)
+iex> Explorer.Series.is_infinite(s3)
+#Explorer.Series<
+ Polars[5]
+ boolean [true, true, false, false, nil]
+>
@@ -5802,14 +5802,14 @@ is_nan(series)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 0, nil])
-iex> s2 = Explorer.Series.from_list([0, 2, 0, nil])
-iex> s3 = Explorer.Series.divide(s1, s2)
-iex> Explorer.Series.is_nan(s3)
-#Explorer.Series<
- Polars[4]
- boolean [false, false, true, nil]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 0, nil])
+iex> s2 = Explorer.Series.from_list([0, 2, 0, nil])
+iex> s3 = Explorer.Series.divide(s1, s2)
+iex> Explorer.Series.is_nan(s3)
+#Explorer.Series<
+ Polars[4]
+ boolean [false, false, true, nil]
+>
@@ -5843,12 +5843,12 @@ round(series, decimals)
Examples
-iex> s = Explorer.Series.from_list([1.124993, 2.555321, 3.995001])
-iex> Explorer.Series.round(s, 2)
-#Explorer.Series<
- Polars[3]
- float [1.12, 2.56, 4.0]
->
+iex> s = Explorer.Series.from_list([1.124993, 2.555321, 3.995001])
+iex> Explorer.Series.round(s, 2)
+#Explorer.Series<
+ Polars[3]
+ float [1.12, 2.56, 4.0]
+>
@@ -5889,13 +5889,13 @@ sin(series)
Examples
-iex> pi = :math.pi()
-iex> s = [-pi * 3/2, -pi, -pi / 2, -pi / 4, 0, pi / 4, pi / 2, pi, pi * 3/2] |> Explorer.Series.from_list()
-iex> Explorer.Series.sin(s)
-#Explorer.Series<
- Polars[9]
- float [1.0, -1.2246467991473532e-16, -1.0, -0.7071067811865475, 0.0, 0.7071067811865475, 1.0, 1.2246467991473532e-16, -1.0]
->
+iex> pi = :math.pi()
+iex> s = [-pi * 3/2, -pi, -pi / 2, -pi / 4, 0, pi / 4, pi / 2, pi, pi * 3/2] |> Explorer.Series.from_list()
+iex> Explorer.Series.sin(s)
+#Explorer.Series<
+ Polars[9]
+ float [1.0, -1.2246467991473532e-16, -1.0, -0.7071067811865475, 0.0, 0.7071067811865475, 1.0, 1.2246467991473532e-16, -1.0]
+>
@@ -5936,13 +5936,13 @@ tan(series)
Examples
-iex> pi = :math.pi()
-iex> s = [-pi * 3/2, -pi, -pi / 2, -pi / 4, 0, pi / 4, pi / 2, pi, pi * 3/2] |> Explorer.Series.from_list()
-iex> Explorer.Series.tan(s)
-#Explorer.Series<
- Polars[9]
- float [-5443746451065123.0, 1.2246467991473532e-16, -1.633123935319537e16, -0.9999999999999999, 0.0, 0.9999999999999999, 1.633123935319537e16, -1.2246467991473532e-16, 5443746451065123.0]
->
+iex> pi = :math.pi()
+iex> s = [-pi * 3/2, -pi, -pi / 2, -pi / 4, 0, pi / 4, pi / 2, pi, pi * 3/2] |> Explorer.Series.from_list()
+iex> Explorer.Series.tan(s)
+#Explorer.Series<
+ Polars[9]
+ float [-5443746451065123.0, 1.2246467991473532e-16, -1.633123935319537e16, -0.9999999999999999, 0.0, 0.9999999999999999, 1.633123935319537e16, -1.2246467991473532e-16, 5443746451065123.0]
+>
@@ -5988,12 +5988,12 @@ contains(series, pattern)
Examples
-iex> s = Explorer.Series.from_list(["abc", "def", "bcd"])
-iex> Explorer.Series.contains(s, "bc")
-#Explorer.Series<
- Polars[3]
- boolean [true, false, true]
->
+iex> s = Explorer.Series.from_list(["abc", "def", "bcd"])
+iex> Explorer.Series.contains(s, "bc")
+#Explorer.Series<
+ Polars[3]
+ boolean [true, false, true]
+>
@@ -6027,12 +6027,12 @@ downcase(series)
Examples
-iex> s = Explorer.Series.from_list(["ABC", "DEF", "BCD"])
-iex> Explorer.Series.downcase(s)
-#Explorer.Series<
- Polars[3]
- string ["abc", "def", "bcd"]
->
+iex> s = Explorer.Series.from_list(["ABC", "DEF", "BCD"])
+iex> Explorer.Series.downcase(s)
+#Explorer.Series<
+ Polars[3]
+ string ["abc", "def", "bcd"]
+>
@@ -6066,12 +6066,12 @@ trim(series)
Examples
-iex> s = Explorer.Series.from_list([" abc", "def ", " bcd"])
-iex> Explorer.Series.trim(s)
-#Explorer.Series<
- Polars[3]
- string ["abc", "def", "bcd"]
->
+iex> s = Explorer.Series.from_list([" abc", "def ", " bcd"])
+iex> Explorer.Series.trim(s)
+#Explorer.Series<
+ Polars[3]
+ string ["abc", "def", "bcd"]
+>
@@ -6105,12 +6105,12 @@ trim_leading(series)
Examples
-iex> s = Explorer.Series.from_list([" abc", "def ", " bcd"])
-iex> Explorer.Series.trim_leading(s)
-#Explorer.Series<
- Polars[3]
- string ["abc", "def ", "bcd"]
->
+iex> s = Explorer.Series.from_list([" abc", "def ", " bcd"])
+iex> Explorer.Series.trim_leading(s)
+#Explorer.Series<
+ Polars[3]
+ string ["abc", "def ", "bcd"]
+>
@@ -6144,12 +6144,12 @@ trim_trailing(series)
Examples
-iex> s = Explorer.Series.from_list([" abc", "def ", " bcd"])
-iex> Explorer.Series.trim_trailing(s)
-#Explorer.Series<
- Polars[3]
- string [" abc", "def", " bcd"]
->
+iex> s = Explorer.Series.from_list([" abc", "def ", " bcd"])
+iex> Explorer.Series.trim_trailing(s)
+#Explorer.Series<
+ Polars[3]
+ string [" abc", "def", " bcd"]
+>
@@ -6183,12 +6183,12 @@ upcase(series)
Examples
-iex> s = Explorer.Series.from_list(["abc", "def", "bcd"])
-iex> Explorer.Series.upcase(s)
-#Explorer.Series<
- Polars[3]
- string ["ABC", "DEF", "BCD"]
->
+iex> s = Explorer.Series.from_list(["abc", "def", "bcd"])
+iex> Explorer.Series.upcase(s)
+#Explorer.Series<
+ Polars[3]
+ string ["ABC", "DEF", "BCD"]
+>
@@ -6235,19 +6235,19 @@ categories(series)
Examples
-iex> s = Explorer.Series.from_list(["a", "b", "c", nil, "a", "c"], dtype: :category)
-iex> Explorer.Series.categories(s)
-#Explorer.Series<
- Polars[3]
- string ["a", "b", "c"]
->
+iex> s = Explorer.Series.from_list(["a", "b", "c", nil, "a", "c"], dtype: :category)
+iex> Explorer.Series.categories(s)
+#Explorer.Series<
+ Polars[3]
+ string ["a", "b", "c"]
+>
-iex> s = Explorer.Series.from_list(["c", "a", "b"], dtype: :category)
-iex> Explorer.Series.categories(s)
-#Explorer.Series<
- Polars[3]
- string ["c", "a", "b"]
->
+iex> s = Explorer.Series.from_list(["c", "a", "b"], dtype: :category)
+iex> Explorer.Series.categories(s)
+#Explorer.Series<
+ Polars[3]
+ string ["c", "a", "b"]
+>
@@ -6281,12 +6281,12 @@ dtype(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, 3])
-iex> Explorer.Series.dtype(s)
+iex> s = Explorer.Series.from_list([1, 2, 3])
+iex> Explorer.Series.dtype(s)
:integer
-iex> s = Explorer.Series.from_list(["a", nil, "b", "c"])
-iex> Explorer.Series.dtype(s)
+iex> s = Explorer.Series.from_list(["a", nil, "b", "c"])
+iex> Explorer.Series.dtype(s)
:string
@@ -6322,31 +6322,31 @@ iotype(series)
Examples
-iex> s = Explorer.Series.from_list([1, 2, 3, 4])
-iex> Explorer.Series.iotype(s)
-{:s, 64}
+iex> s = Explorer.Series.from_list([1, 2, 3, 4])
+iex> Explorer.Series.iotype(s)
+{:s, 64}
-iex> s = Explorer.Series.from_list([~D[1999-12-31], ~D[1989-01-01]])
-iex> Explorer.Series.iotype(s)
-{:s, 32}
+iex> s = Explorer.Series.from_list([~D[1999-12-31], ~D[1989-01-01]])
+iex> Explorer.Series.iotype(s)
+{:s, 32}
-iex> s = Explorer.Series.from_list([~T[00:00:00.000000], ~T[23:59:59.999999]])
-iex> Explorer.Series.iotype(s)
-{:s, 64}
+iex> s = Explorer.Series.from_list([~T[00:00:00.000000], ~T[23:59:59.999999]])
+iex> Explorer.Series.iotype(s)
+{:s, 64}
-iex> s = Explorer.Series.from_list([1.2, 2.3, 3.5, 4.5])
-iex> Explorer.Series.iotype(s)
-{:f, 64}
+iex> s = Explorer.Series.from_list([1.2, 2.3, 3.5, 4.5])
+iex> Explorer.Series.iotype(s)
+{:f, 64}
-iex> s = Explorer.Series.from_list([true, false, true])
-iex> Explorer.Series.iotype(s)
-{:u, 8}
The operation returns :none
for strings and binaries, as they do not
-provide a fixed-width binary representation:
iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.iotype(s)
+iex> s = Explorer.Series.from_list([true, false, true])
+iex> Explorer.Series.iotype(s)
+{:u, 8}
The operation returns :none
for strings and binaries, as they do not
+provide a fixed-width binary representation:
iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.iotype(s)
:none
However, if appropriate, you can convert them to categorical types,
-which will then return the index of each category:
iex> s = Explorer.Series.from_list(["a", "b", "c"], dtype: :category)
-iex> Explorer.Series.iotype(s)
-{:u, 32}
+which will then return the index of each category:
+iex> s = Explorer.Series.from_list(["a", "b", "c"], dtype: :category)
+iex> Explorer.Series.iotype(s)
+{:u, 32}
@@ -6380,8 +6380,8 @@ size(series)
Examples
-iex> s = Explorer.Series.from_list([~D[1999-12-31], ~D[1989-01-01]])
-iex> Explorer.Series.size(s)
+iex> s = Explorer.Series.from_list([~D[1999-12-31], ~D[1989-01-01]])
+iex> Explorer.Series.size(s)
2
@@ -6432,19 +6432,19 @@ argsort(series, opts \\ [])
Examples
-iex> s = Explorer.Series.from_list([9, 3, 7, 1])
-iex> Explorer.Series.argsort(s)
-#Explorer.Series<
- Polars[4]
- integer [3, 1, 2, 0]
->
+iex> s = Explorer.Series.from_list([9, 3, 7, 1])
+iex> Explorer.Series.argsort(s)
+#Explorer.Series<
+ Polars[4]
+ integer [3, 1, 2, 0]
+>
-iex> s = Explorer.Series.from_list([9, 3, 7, 1])
-iex> Explorer.Series.argsort(s, direction: :desc)
-#Explorer.Series<
- Polars[4]
- integer [0, 2, 1, 3]
->
+iex> s = Explorer.Series.from_list([9, 3, 7, 1])
+iex> Explorer.Series.argsort(s, direction: :desc)
+#Explorer.Series<
+ Polars[4]
+ integer [0, 2, 1, 3]
+>
@@ -6479,12 +6479,12 @@ at(series, idx)
Examples
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.at(s, 2)
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.at(s, 2)
"c"
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.at(s, 4)
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.at(s, 4)
** (ArgumentError) index 4 out of bounds for series of size 3
@@ -6519,17 +6519,17 @@ at_every(series, every_n)
Examples
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.at_every(s, 2)
-#Explorer.Series<
- Polars[5]
- integer [1, 3, 5, 7, 9]
->
If n is bigger than the size of the series, the result is a new series with only the first value of the supplied series.
iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.at_every(s, 20)
-#Explorer.Series<
- Polars[1]
- integer [1]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.at_every(s, 2)
+#Explorer.Series<
+ Polars[5]
+ integer [1, 3, 5, 7, 9]
+>
If n is bigger than the size of the series, the result is a new series with only the first value of the supplied series.
iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.at_every(s, 20)
+#Explorer.Series<
+ Polars[1]
+ integer [1]
+>
@@ -6563,21 +6563,21 @@ concat(series)
Examples
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([4, 5, 6])
-iex> Explorer.Series.concat([s1, s2])
-#Explorer.Series<
- Polars[6]
- integer [1, 2, 3, 4, 5, 6]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([4, 5, 6])
+iex> Explorer.Series.concat([s1, s2])
+#Explorer.Series<
+ Polars[6]
+ integer [1, 2, 3, 4, 5, 6]
+>
-iex> s1 = Explorer.Series.from_list([1, 2, 3])
-iex> s2 = Explorer.Series.from_list([4.0, 5.0, 6.4])
-iex> Explorer.Series.concat([s1, s2])
-#Explorer.Series<
- Polars[6]
- float [1.0, 2.0, 3.0, 4.0, 5.0, 6.4]
->
+iex> s1 = Explorer.Series.from_list([1, 2, 3])
+iex> s2 = Explorer.Series.from_list([4.0, 5.0, 6.4])
+iex> Explorer.Series.concat([s1, s2])
+#Explorer.Series<
+ Polars[6]
+ float [1.0, 2.0, 3.0, 4.0, 5.0, 6.4]
+>
@@ -6633,12 +6633,12 @@ distinct(series)
Examples
-iex> s = [1, 1, 2, 2, 3, 3] |> Explorer.Series.from_list()
-iex> Explorer.Series.distinct(s)
-#Explorer.Series<
- Polars[3]
- integer [1, 2, 3]
->
+iex> s = [1, 1, 2, 2, 3, 3] |> Explorer.Series.from_list()
+iex> Explorer.Series.distinct(s)
+#Explorer.Series<
+ Polars[3]
+ integer [1, 2, 3]
+>
@@ -6672,8 +6672,8 @@ first(series)
Examples
-iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.first(s)
+iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.first(s)
1
@@ -6708,27 +6708,27 @@ format(list)
Examples
-iex> s1 = Explorer.Series.from_list(["a", "b", "c"])
-iex> s2 = Explorer.Series.from_list(["d", "e", "f"])
-iex> s3 = Explorer.Series.from_list(["g", "h", "i"])
-iex> Explorer.Series.format([s1, s2, s3])
-#Explorer.Series<
- Polars[3]
- string ["adg", "beh", "cfi"]
->
-
-iex> s1 = Explorer.Series.from_list(["a", "b", "c", "d"])
-iex> s2 = Explorer.Series.from_list([1, 2, 3, 4])
-iex> s3 = Explorer.Series.from_list([1.5, :nan, :infinity, :neg_infinity])
-iex> Explorer.Series.format([s1, "/", s2, "/", s3])
-#Explorer.Series<
- Polars[4]
- string ["a/1/1.5", "b/2/NaN", "c/3/inf", "d/4/-inf"]
->
-
-iex> s1 = Explorer.Series.from_list([<<1>>, <<239, 191, 19>>], dtype: :binary)
-iex> s2 = Explorer.Series.from_list([<<3>>, <<4>>], dtype: :binary)
-iex> Explorer.Series.format([s1, s2])
+iex> s1 = Explorer.Series.from_list(["a", "b", "c"])
+iex> s2 = Explorer.Series.from_list(["d", "e", "f"])
+iex> s3 = Explorer.Series.from_list(["g", "h", "i"])
+iex> Explorer.Series.format([s1, s2, s3])
+#Explorer.Series<
+ Polars[3]
+ string ["adg", "beh", "cfi"]
+>
+
+iex> s1 = Explorer.Series.from_list(["a", "b", "c", "d"])
+iex> s2 = Explorer.Series.from_list([1, 2, 3, 4])
+iex> s3 = Explorer.Series.from_list([1.5, :nan, :infinity, :neg_infinity])
+iex> Explorer.Series.format([s1, "/", s2, "/", s3])
+#Explorer.Series<
+ Polars[4]
+ string ["a/1/1.5", "b/2/NaN", "c/3/inf", "d/4/-inf"]
+>
+
+iex> s1 = Explorer.Series.from_list([<<1>>, <<239, 191, 19>>], dtype: :binary)
+iex> s2 = Explorer.Series.from_list([<<3>>, <<4>>], dtype: :binary)
+iex> Explorer.Series.format([s1, s2])
** (RuntimeError) Polars Error: External error: invalid utf-8 sequence
@@ -6765,12 +6765,12 @@ head(series, n_elements \\ 10)
Examples
-iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.head(s)
-#Explorer.Series<
- Polars[10]
- integer [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
->
+iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.head(s)
+#Explorer.Series<
+ Polars[10]
+ integer [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+>
@@ -6804,8 +6804,8 @@ last(series)
Examples
-iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.last(s)
+iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.last(s)
100
@@ -6834,12 +6834,12 @@ reverse(series)
Example
-iex> s = [1, 2, 3] |> Explorer.Series.from_list()
-iex> Explorer.Series.reverse(s)
-#Explorer.Series<
- Polars[3]
- integer [3, 2, 1]
->
+iex> s = [1, 2, 3] |> Explorer.Series.from_list()
+iex> Explorer.Series.reverse(s)
+#Explorer.Series<
+ Polars[3]
+ integer [3, 2, 1]
+>
@@ -6884,47 +6884,47 @@ sample(series, n_or_frac, opts \\ [])
Examples
-iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.sample(s, 10, seed: 100)
-#Explorer.Series<
- Polars[10]
- integer [55, 51, 33, 26, 5, 32, 62, 31, 9, 25]
->
-
-iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.sample(s, 0.05, seed: 100)
-#Explorer.Series<
- Polars[5]
- integer [49, 77, 96, 19, 18]
->
-
-iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.sample(s, 7, seed: 100, replace: true)
-#Explorer.Series<
- Polars[7]
- integer [4, 1, 3, 4, 3, 4, 2]
->
-
-iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.sample(s, 1.2, seed: 100, replace: true)
-#Explorer.Series<
- Polars[6]
- integer [4, 1, 3, 4, 3, 4]
->
-
-iex> s = 0..9 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.sample(s, 1.0, seed: 100, shuffle: false)
-#Explorer.Series<
- Polars[10]
- integer [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
->
-
-iex> s = 0..9 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.sample(s, 1.0, seed: 100, shuffle: true)
-#Explorer.Series<
- Polars[10]
- integer [7, 9, 2, 0, 4, 1, 3, 8, 5, 6]
->
+iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.sample(s, 10, seed: 100)
+#Explorer.Series<
+ Polars[10]
+ integer [55, 51, 33, 26, 5, 32, 62, 31, 9, 25]
+>
+
+iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.sample(s, 0.05, seed: 100)
+#Explorer.Series<
+ Polars[5]
+ integer [49, 77, 96, 19, 18]
+>
+
+iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.sample(s, 7, seed: 100, replace: true)
+#Explorer.Series<
+ Polars[7]
+ integer [4, 1, 3, 4, 3, 4, 2]
+>
+
+iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.sample(s, 1.2, seed: 100, replace: true)
+#Explorer.Series<
+ Polars[6]
+ integer [4, 1, 3, 4, 3, 4]
+>
+
+iex> s = 0..9 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.sample(s, 1.0, seed: 100, shuffle: false)
+#Explorer.Series<
+ Polars[10]
+ integer [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+>
+
+iex> s = 0..9 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.sample(s, 1.0, seed: 100, shuffle: true)
+#Explorer.Series<
+ Polars[10]
+ integer [7, 9, 2, 0, 4, 1, 3, 8, 5, 6]
+>
@@ -6958,19 +6958,19 @@ shift(series, offset)
Examples
-iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.shift(s, 2)
-#Explorer.Series<
- Polars[5]
- integer [nil, nil, 1, 2, 3]
->
+iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.shift(s, 2)
+#Explorer.Series<
+ Polars[5]
+ integer [nil, nil, 1, 2, 3]
+>
-iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.shift(s, -2)
-#Explorer.Series<
- Polars[5]
- integer [3, 4, 5, nil, nil]
->
+iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.shift(s, -2)
+#Explorer.Series<
+ Polars[5]
+ integer [3, 4, 5, nil, nil]
+>
@@ -7013,12 +7013,12 @@ shuffle(series, opts \\ [])
Examples
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.shuffle(s, seed: 100)
-#Explorer.Series<
- Polars[10]
- integer [8, 10, 3, 1, 5, 2, 4, 9, 6, 7]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.shuffle(s, seed: 100)
+#Explorer.Series<
+ Polars[10]
+ integer [8, 10, 3, 1, 5, 2, 4, 9, 6, 7]
+>
@@ -7055,33 +7055,33 @@ slice(series, indices)
Examples
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.slice(s, [0, 2])
-#Explorer.Series<
- Polars[2]
- string ["a", "c"]
->
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.slice(s, [0, 2])
+#Explorer.Series<
+ Polars[2]
+ string ["a", "c"]
+>
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.slice(s, 1..2)
-#Explorer.Series<
- Polars[2]
- string ["b", "c"]
->
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.slice(s, 1..2)
+#Explorer.Series<
+ Polars[2]
+ string ["b", "c"]
+>
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.slice(s, -2..-1)
-#Explorer.Series<
- Polars[2]
- string ["b", "c"]
->
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.slice(s, -2..-1)
+#Explorer.Series<
+ Polars[2]
+ string ["b", "c"]
+>
-iex> s = Explorer.Series.from_list(["a", "b", "c"])
-iex> Explorer.Series.slice(s, 3..2//1)
-#Explorer.Series<
- Polars[0]
- string []
->
+iex> s = Explorer.Series.from_list(["a", "b", "c"])
+iex> Explorer.Series.slice(s, 3..2//1)
+#Explorer.Series<
+ Polars[0]
+ string []
+>
@@ -7115,29 +7115,29 @@ slice(series, offset, size)
Examples
-iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5])
-iex> Explorer.Series.slice(s, 1, 2)
-#Explorer.Series<
- Polars[2]
- integer [2, 3]
->
Negative offsets count from the end of the series:
iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5])
-iex> Explorer.Series.slice(s, -3, 2)
-#Explorer.Series<
- Polars[2]
- integer [3, 4]
->
If the offset runs past the end of the series,
-the series is empty:
iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5])
-iex> Explorer.Series.slice(s, 10, 3)
-#Explorer.Series<
- Polars[0]
- integer []
->
If the size runs past the end of the series,
-the result may be shorter than the size:
iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5])
-iex> Explorer.Series.slice(s, -3, 4)
-#Explorer.Series<
- Polars[3]
- integer [3, 4, 5]
->
+iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5])
+iex> Explorer.Series.slice(s, 1, 2)
+#Explorer.Series<
+ Polars[2]
+ integer [2, 3]
+>
Negative offsets count from the end of the series:
iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5])
+iex> Explorer.Series.slice(s, -3, 2)
+#Explorer.Series<
+ Polars[2]
+ integer [3, 4]
+>
If the offset runs past the end of the series,
+the series is empty:
iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5])
+iex> Explorer.Series.slice(s, 10, 3)
+#Explorer.Series<
+ Polars[0]
+ integer []
+>
If the size runs past the end of the series,
+the result may be shorter than the size:
iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5])
+iex> Explorer.Series.slice(s, -3, 4)
+#Explorer.Series<
+ Polars[3]
+ integer [3, 4, 5]
+>
@@ -7175,19 +7175,19 @@ sort(series, opts \\ [])
Examples
-iex> s = Explorer.Series.from_list([9, 3, 7, 1])
-iex> Explorer.Series.sort(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 3, 7, 9]
->
+iex> s = Explorer.Series.from_list([9, 3, 7, 1])
+iex> Explorer.Series.sort(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 3, 7, 9]
+>
-iex> s = Explorer.Series.from_list([9, 3, 7, 1])
-iex> Explorer.Series.sort(s, direction: :desc)
-#Explorer.Series<
- Polars[4]
- integer [9, 7, 3, 1]
->
+iex> s = Explorer.Series.from_list([9, 3, 7, 1])
+iex> Explorer.Series.sort(s, direction: :desc)
+#Explorer.Series<
+ Polars[4]
+ integer [9, 7, 3, 1]
+>
@@ -7223,12 +7223,12 @@ tail(series, n_elements \\ 10)
Examples
-iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.tail(s)
-#Explorer.Series<
- Polars[10]
- integer [91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
->
+iex> s = 1..100 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.tail(s)
+#Explorer.Series<
+ Polars[10]
+ integer [91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
+>
@@ -7256,8 +7256,8 @@ unordered_distinct(series)
Examples
-iex> s = [1, 1, 2, 2, 3, 3] |> Explorer.Series.from_list()
-iex> Explorer.Series.unordered_distinct(s)
+iex> s = [1, 1, 2, 2, 3, 3] |> Explorer.Series.from_list()
+iex> Explorer.Series.unordered_distinct(s)
@@ -7311,26 +7311,26 @@ cumulative_max(series, opts \\ [])
Examples
-iex> s = [1, 2, 3, 4] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_max(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 3, 4]
->
+iex> s = [1, 2, 3, 4] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_max(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 3, 4]
+>
-iex> s = [1, 2, nil, 4] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_max(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, nil, 4]
->
+iex> s = [1, 2, nil, 4] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_max(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, nil, 4]
+>
-iex> s = [~T[03:00:02.000000], ~T[02:04:19.000000], nil, ~T[13:24:56.000000]] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_max(s)
-#Explorer.Series<
- Polars[4]
- time [03:00:02.000000, 03:00:02.000000, nil, 13:24:56.000000]
->
+iex> s = [~T[03:00:02.000000], ~T[02:04:19.000000], nil, ~T[13:24:56.000000]] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_max(s)
+#Explorer.Series<
+ Polars[4]
+ time [03:00:02.000000, 03:00:02.000000, nil, 13:24:56.000000]
+>
@@ -7372,26 +7372,26 @@ cumulative_min(series, opts \\ [])
Examples
-iex> s = [1, 2, 3, 4] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_min(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 1, 1, 1]
->
+iex> s = [1, 2, 3, 4] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_min(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 1, 1, 1]
+>
-iex> s = [1, 2, nil, 4] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_min(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 1, nil, 1]
->
+iex> s = [1, 2, nil, 4] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_min(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 1, nil, 1]
+>
-iex> s = [~T[03:00:02.000000], ~T[02:04:19.000000], nil, ~T[13:24:56.000000]] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_min(s)
-#Explorer.Series<
- Polars[4]
- time [03:00:02.000000, 02:04:19.000000, nil, 02:04:19.000000]
->
+iex> s = [~T[03:00:02.000000], ~T[02:04:19.000000], nil, ~T[13:24:56.000000]] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_min(s)
+#Explorer.Series<
+ Polars[4]
+ time [03:00:02.000000, 02:04:19.000000, nil, 02:04:19.000000]
+>
@@ -7433,19 +7433,19 @@ cumulative_product(series, opts \\ [])
Examples
-iex> s = [1, 2, 3, 2] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_product(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 6, 12]
->
+iex> s = [1, 2, 3, 2] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_product(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 6, 12]
+>
-iex> s = [1, 2, nil, 4] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_product(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, nil, 8]
->
+iex> s = [1, 2, nil, 4] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_product(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, nil, 8]
+>
@@ -7487,19 +7487,19 @@ cumulative_sum(series, opts \\ [])
Examples
-iex> s = [1, 2, 3, 4] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_sum(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 3, 6, 10]
->
+iex> s = [1, 2, 3, 4] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_sum(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 3, 6, 10]
+>
-iex> s = [1, 2, nil, 4] |> Explorer.Series.from_list()
-iex> Explorer.Series.cumulative_sum(s)
-#Explorer.Series<
- Polars[4]
- integer [1, 3, nil, 7]
->
+iex> s = [1, 2, nil, 4] |> Explorer.Series.from_list()
+iex> Explorer.Series.cumulative_sum(s)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 3, nil, 7]
+>
@@ -7538,19 +7538,19 @@ ewm_mean(series, opts \\ [])
Examples
-iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.ewm_mean(s)
-#Explorer.Series<
- Polars[5]
- float [1.0, 1.6666666666666667, 2.4285714285714284, 3.2666666666666666, 4.161290322580645]
->
+iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.ewm_mean(s)
+#Explorer.Series<
+ Polars[5]
+ float [1.0, 1.6666666666666667, 2.4285714285714284, 3.2666666666666666, 4.161290322580645]
+>
-iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.ewm_mean(s, alpha: 0.1)
-#Explorer.Series<
- Polars[5]
- float [1.0, 1.5263157894736843, 2.070110701107011, 2.6312881651642916, 3.2097140484969833]
->
+iex> s = 1..5 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.ewm_mean(s, alpha: 0.1)
+#Explorer.Series<
+ Polars[5]
+ float [1.0, 1.5263157894736843, 2.070110701107011, 2.6312881651642916, 3.2097140484969833]
+>
@@ -7602,73 +7602,73 @@ fill_missing(series, value)
Examples
-iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.fill_missing(s, :forward)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 2, 4]
->
-
-iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.fill_missing(s, :backward)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 4, 4]
->
-
-iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.fill_missing(s, :max)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 4, 4]
->
-
-iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.fill_missing(s, :min)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 1, 4]
->
-
-iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.fill_missing(s, :mean)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 2, 4]
->
Values that belong to the series itself can also be added as missing:
iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.fill_missing(s, 3)
-#Explorer.Series<
- Polars[4]
- integer [1, 2, 3, 4]
->
-
-iex> s = Explorer.Series.from_list(["a", "b", nil, "d"])
-iex> Explorer.Series.fill_missing(s, "c")
-#Explorer.Series<
- Polars[4]
- string ["a", "b", "c", "d"]
->
Mismatched types will raise:
iex> s = Explorer.Series.from_list([1, 2, nil, 4])
-iex> Explorer.Series.fill_missing(s, "foo")
-** (ArgumentError) cannot invoke Explorer.Series.fill_missing/2 with mismatched dtypes: :integer and "foo"
Floats in particular accept missing values to be set to NaN, Inf, and -Inf:
iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 4.0])
-iex> Explorer.Series.fill_missing(s, :nan)
-#Explorer.Series<
- Polars[4]
- float [1.0, 2.0, NaN, 4.0]
->
-
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 4.0])
-iex> Explorer.Series.fill_missing(s, :infinity)
-#Explorer.Series<
- Polars[4]
- float [1.0, 2.0, Inf, 4.0]
->
-
-iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 4.0])
-iex> Explorer.Series.fill_missing(s, :neg_infinity)
-#Explorer.Series<
- Polars[4]
- float [1.0, 2.0, -Inf, 4.0]
->
+iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.fill_missing(s, :forward)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 2, 4]
+>
+
+iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.fill_missing(s, :backward)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 4, 4]
+>
+
+iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.fill_missing(s, :max)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 4, 4]
+>
+
+iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.fill_missing(s, :min)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 1, 4]
+>
+
+iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.fill_missing(s, :mean)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 2, 4]
+>
Values that belong to the series itself can also be added as missing:
iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.fill_missing(s, 3)
+#Explorer.Series<
+ Polars[4]
+ integer [1, 2, 3, 4]
+>
+
+iex> s = Explorer.Series.from_list(["a", "b", nil, "d"])
+iex> Explorer.Series.fill_missing(s, "c")
+#Explorer.Series<
+ Polars[4]
+ string ["a", "b", "c", "d"]
+>
Mismatched types will raise:
iex> s = Explorer.Series.from_list([1, 2, nil, 4])
+iex> Explorer.Series.fill_missing(s, "foo")
+** (ArgumentError) cannot invoke Explorer.Series.fill_missing/2 with mismatched dtypes: :integer and "foo"
Floats in particular accept missing values to be set to NaN, Inf, and -Inf:
iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 4.0])
+iex> Explorer.Series.fill_missing(s, :nan)
+#Explorer.Series<
+ Polars[4]
+ float [1.0, 2.0, NaN, 4.0]
+>
+
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 4.0])
+iex> Explorer.Series.fill_missing(s, :infinity)
+#Explorer.Series<
+ Polars[4]
+ float [1.0, 2.0, Inf, 4.0]
+>
+
+iex> s = Explorer.Series.from_list([1.0, 2.0, nil, 4.0])
+iex> Explorer.Series.fill_missing(s, :neg_infinity)
+#Explorer.Series<
+ Polars[4]
+ float [1.0, 2.0, -Inf, 4.0]
+>
@@ -7706,19 +7706,19 @@ window_max(series, window_size, opts \\ [])
Examples
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_max(s, 4)
-#Explorer.Series<
- Polars[10]
- integer [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_max(s, 4)
+#Explorer.Series<
+ Polars[10]
+ integer [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+>
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_max(s, 2, weights: [1.0, 2.0])
-#Explorer.Series<
- Polars[10]
- float [1.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_max(s, 2, weights: [1.0, 2.0])
+#Explorer.Series<
+ Polars[10]
+ float [1.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
+>
@@ -7756,26 +7756,26 @@ window_mean(series, window_size, opts \\ []
Examples
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_mean(s, 4)
-#Explorer.Series<
- Polars[10]
- float [1.0, 1.5, 2.0, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_mean(s, 4)
+#Explorer.Series<
+ Polars[10]
+ float [1.0, 1.5, 2.0, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]
+>
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_mean(s, 2, weights: [0.25, 0.75])
-#Explorer.Series<
- Polars[10]
- float [0.25, 1.75, 2.75, 3.75, 4.75, 5.75, 6.75, 7.75, 8.75, 9.75]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_mean(s, 2, weights: [0.25, 0.75])
+#Explorer.Series<
+ Polars[10]
+ float [0.25, 1.75, 2.75, 3.75, 4.75, 5.75, 6.75, 7.75, 8.75, 9.75]
+>
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_mean(s, 2, weights: [0.25, 0.75], min_periods: nil)
-#Explorer.Series<
- Polars[10]
- float [nil, 1.75, 2.75, 3.75, 4.75, 5.75, 6.75, 7.75, 8.75, 9.75]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_mean(s, 2, weights: [0.25, 0.75], min_periods: nil)
+#Explorer.Series<
+ Polars[10]
+ float [nil, 1.75, 2.75, 3.75, 4.75, 5.75, 6.75, 7.75, 8.75, 9.75]
+>
@@ -7813,19 +7813,19 @@ window_min(series, window_size, opts \\ [])
Examples
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_min(s, 4)
-#Explorer.Series<
- Polars[10]
- integer [1, 1, 1, 1, 2, 3, 4, 5, 6, 7]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_min(s, 4)
+#Explorer.Series<
+ Polars[10]
+ integer [1, 1, 1, 1, 2, 3, 4, 5, 6, 7]
+>
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_min(s, 2, weights: [1.0, 2.0])
-#Explorer.Series<
- Polars[10]
- float [1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_min(s, 2, weights: [1.0, 2.0])
+#Explorer.Series<
+ Polars[10]
+ float [1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
+>
@@ -7863,19 +7863,19 @@ window_standard_deviation(series, window_si
Examples
-iex> s = Explorer.Series.from_list([1, 2, 3, 4, 1])
-iex> Explorer.Series.window_standard_deviation(s, 2)
-#Explorer.Series<
- Polars[5]
- float [0.0, 0.7071067811865476, 0.7071067811865476, 0.7071067811865476, 2.1213203435596424]
->
+iex> s = Explorer.Series.from_list([1, 2, 3, 4, 1])
+iex> Explorer.Series.window_standard_deviation(s, 2)
+#Explorer.Series<
+ Polars[5]
+ float [0.0, 0.7071067811865476, 0.7071067811865476, 0.7071067811865476, 2.1213203435596424]
+>
-iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5, 6])
-iex> Explorer.Series.window_standard_deviation(s, 2, weights: [0.25, 0.75])
-#Explorer.Series<
- Polars[6]
- float [0.4330127018922193, 0.4330127018922193, 0.4330127018922193, 0.4330127018922193, 0.4330127018922193, 0.4330127018922193]
->
+iex> s = Explorer.Series.from_list([1, 2, 3, 4, 5, 6])
+iex> Explorer.Series.window_standard_deviation(s, 2, weights: [0.25, 0.75])
+#Explorer.Series<
+ Polars[6]
+ float [0.4330127018922193, 0.4330127018922193, 0.4330127018922193, 0.4330127018922193, 0.4330127018922193, 0.4330127018922193]
+>
@@ -7913,19 +7913,19 @@ window_sum(series, window_size, opts \\ [])
Examples
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_sum(s, 4)
-#Explorer.Series<
- Polars[10]
- integer [1, 3, 6, 10, 14, 18, 22, 26, 30, 34]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_sum(s, 4)
+#Explorer.Series<
+ Polars[10]
+ integer [1, 3, 6, 10, 14, 18, 22, 26, 30, 34]
+>
-iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
-iex> Explorer.Series.window_sum(s, 2, weights: [1.0, 2.0])
-#Explorer.Series<
- Polars[10]
- float [1.0, 5.0, 8.0, 11.0, 14.0, 17.0, 20.0, 23.0, 26.0, 29.0]
->
+iex> s = 1..10 |> Enum.to_list() |> Explorer.Series.from_list()
+iex> Explorer.Series.window_sum(s, 2, weights: [1.0, 2.0])
+#Explorer.Series<
+ Polars[10]
+ float [1.0, 5.0, 8.0, 11.0, 14.0, 17.0, 20.0, 23.0, 26.0, 29.0]
+>
diff --git a/Explorer.TensorFrame.html b/Explorer.TensorFrame.html
index b2307c875..d84bc00ac 100644
--- a/Explorer.TensorFrame.html
+++ b/Explorer.TensorFrame.html
@@ -113,13 +113,13 @@
TensorFrame is a representation of Explorer.DataFrame
-that is designed to work inside Nx's defn
expressions.
For example, imagine the following defn
:
defn add_columns(tf) do
- tf[:a] + tf[:b]
-end
We can now pass a DataFrame as argument:
iex> add_columns(Explorer.DataFrame.new(a: [11, 12], b: [21, 22]))
-#Nx.Tensor<
- s64[2]
- [32, 34]
->
Passing an Explorer.DataFrame
to a defn
will automatically
+that is designed to work inside Nx's defn
expressions.
For example, imagine the following defn
:
defn add_columns(tf) do
+ tf[:a] + tf[:b]
+end
We can now pass a DataFrame as argument:
iex> add_columns(Explorer.DataFrame.new(a: [11, 12], b: [21, 22]))
+#Nx.Tensor<
+ s64[2]
+ [32, 34]
+>
Passing an Explorer.DataFrame
to a defn
will automatically
convert it to a TensorFrame. The TensorFrame will lazily
build tensors out of the used dataframe fields.
@@ -130,29 +130,29 @@
Due to the integration with Nx, you can also pass dataframes
into Nx.stack/2
and Nx.concatenate
and they will be automatically
converted to tensors. This makes it easy to pass dataframes into
-neural networks and other computationally intensive algorithms:
iex> Nx.concatenate(Explorer.DataFrame.new(a: [11, 12], b: [21, 22]))
-#Nx.Tensor<
- s64[4]
- [11, 12, 21, 22]
->
-
-iex> Nx.stack(Explorer.DataFrame.new(a: [11, 12], b: [21, 22]))
-#Nx.Tensor<
- s64[2][2]
- [
- [11, 12],
- [21, 22]
- ]
->
-
-iex> Nx.stack(Explorer.DataFrame.new(a: [11, 12], b: [21, 22]), axis: -1)
-#Nx.Tensor<
- s64[2][2]
- [
- [11, 21],
- [12, 22]
- ]
->
+neural networks and other computationally intensive algorithms:
iex> Nx.concatenate(Explorer.DataFrame.new(a: [11, 12], b: [21, 22]))
+#Nx.Tensor<
+ s64[4]
+ [11, 12, 21, 22]
+>
+
+iex> Nx.stack(Explorer.DataFrame.new(a: [11, 12], b: [21, 22]))
+#Nx.Tensor<
+ s64[2][2]
+ [
+ [11, 12],
+ [21, 22]
+ ]
+>
+
+iex> Nx.stack(Explorer.DataFrame.new(a: [11, 12], b: [21, 22]), axis: -1)
+#Nx.Tensor<
+ s64[2][2]
+ [
+ [11, 21],
+ [12, 22]
+ ]
+>
Warning: returning TensorFrames
@@ -165,14 +165,14 @@
above we used Nx
to add two columns, if you want to
put the result of the computation back into a DataFrame,
you can use Explorer.DataFrame.put/4
, which also accepts
-tensors:
iex> df = Explorer.DataFrame.new(a: [11, 12], b: [21, 22])
-iex> Explorer.DataFrame.put(df, "result", add_columns(df))
-#Explorer.DataFrame<
- Polars[2 x 3]
- a integer [11, 12]
- b integer [21, 22]
- result integer [32, 34]
->
One benefit of using Explorer.DataFrame.put/4
is that it will
+tensors:
iex> df = Explorer.DataFrame.new(a: [11, 12], b: [21, 22])
+iex> Explorer.DataFrame.put(df, "result", add_columns(df))
+#Explorer.DataFrame<
+ Polars[2 x 3]
+ a integer [11, 12]
+ b integer [21, 22]
+ result integer [32, 34]
+>
One benefit of using Explorer.DataFrame.put/4
is that it will
preserve the type of the column if one already exists. Alternatively,
use Explorer.Series.from_tensor/1
to explicitly convert a tensor
back to a series.
@@ -311,7 +311,7 @@ pull(tf, name)
Examples
-Explorer.TensorFrame.pull(tf, "some_column")
+Explorer.TensorFrame.pull(tf, "some_column")
@@ -339,7 +339,7 @@ put(tf, name, tensor)
Examples
-Explorer.TensorFrame.put(tf, "result", some_tensor)
+Explorer.TensorFrame.put(tf, "result", some_tensor)
diff --git a/Explorer.epub b/Explorer.epub
index d7d7fdff6..b4348bbbe 100644
Binary files a/Explorer.epub and b/Explorer.epub differ
diff --git a/Explorer.html b/Explorer.html
index f831d0560..be48f2a69 100644
--- a/Explorer.html
+++ b/Explorer.html
@@ -119,13 +119,13 @@
Getting started
-
Inside an Elixir script or Livebook:
Mix.install([
- {:explorer, "~> 0.6.0"}
-])
Or in the mix.exs
file of your application:
def deps do
- [
- {:explorer, "~> 0.6.0"}
- ]
-end
+
Inside an Elixir script or Livebook:
Mix.install([
+ {:explorer, "~> 0.6.0"}
+])
Or in the mix.exs
file of your application:
def deps do
+ [
+ {:explorer, "~> 0.6.0"}
+ ]
+end
A glimpse of the API
@@ -135,13 +135,13 @@
of one data type only - or one dtype for short. Notice that nil values are
permitted in series of any dtype.
using a dataframe, that is just a way to represent one or more series together,
and work with them as a whole. The only restriction is that all the series share
-the same size.
A series can be created from a list:
fruits = Explorer.Series.from_list(["apple", "mango", "banana", "orange"])
Your newly created series is going to look like:
#Explorer.Series<
- Polars[4]
- string ["apple", "mango", "banana", "orange"]
->
And you can, for example, sort that series:
Explorer.Series.sort(fruits)
Resulting in the following:
#Explorer.Series<
- Polars[4]
- string ["apple", "banana", "mango", "orange"]
->
+the same size.
A series can be created from a list:
fruits = Explorer.Series.from_list(["apple", "mango", "banana", "orange"])
Your newly created series is going to look like:
#Explorer.Series<
+ Polars[4]
+ string ["apple", "mango", "banana", "orange"]
+>
And you can, for example, sort that series:
Explorer.Series.sort(fruits)
Resulting in the following:
#Explorer.Series<
+ Polars[4]
+ string ["apple", "banana", "mango", "orange"]
+>
Dataframes
@@ -151,13 +151,13 @@
IO functions.
This is by far the most common way to load dataframes in Explorer.
We accept Parquet, IPC, CSV, and NDJSON files.
by using the Explorer.DataFrame.new/2
function, which is neat for small experiments.
-We are going to use this function here.
You can pass either series or lists to it:
mountains = Explorer.DataFrame.new(name: ["Everest", "K2", "Aconcagua"], elevation: [8848, 8611, 6962])
Your dataframe is going to look like this:
#Explorer.DataFrame<
- Polars[3 x 2]
- name string ["Everest", "K2", "Aconcagua"]
- elevation integer [8848, 8611, 6962]
->
It's also possible to see a dataframe like a table, using the Explorer.DataFrame.table/2
-function:
Explorer.DataFrame.table(mountains)
Prints:
+-------------------------------------------+
-| Explorer DataFrame: [rows: 3, columns: 2] |
+We are going to use this function here.
You can pass either series or lists to it:
mountains = Explorer.DataFrame.new(name: ["Everest", "K2", "Aconcagua"], elevation: [8848, 8611, 6962])
Your dataframe is going to look like this:
#Explorer.DataFrame<
+ Polars[3 x 2]
+ name string ["Everest", "K2", "Aconcagua"]
+ elevation integer [8848, 8611, 6962]
+>
It's also possible to see a dataframe like a table, using the Explorer.DataFrame.table/2
+function:
Explorer.DataFrame.table(mountains)
Prints:
+-------------------------------------------+
+| Explorer DataFrame: [rows: 3, columns: 2] |
+---------------------+---------------------+
| name | elevation |
| <string> | <integer> |
@@ -170,13 +170,13 @@
+---------------------+---------------------+
And now I want to show you how to filter our dataframe. But first, let's require
the Explorer.DataFrame
module and give a short name to it:
require Explorer.DataFrame, as: DF
The "require" is needed to load the macro features of that module.
We give it a shorter name to simplify our examples.
Now let's go to the filter. I want to filter the mountains that are above
-the mean elevation in our dataframe:
DF.filter(mountains, elevation > mean(elevation))
You can see that we can refer to the columns using their names, and use functions
+the mean elevation in our dataframe:
DF.filter(mountains, elevation > mean(elevation))
You can see that we can refer to the columns using their names, and use functions
without defining them. This is possible due to the powerful Explorer.Query
features,
-and it's the main reason we need to "require" the Explorer.DataFrame
module.
The result is going to look like this:
#Explorer.DataFrame<
- Polars[2 x 2]
- name string ["Everest", "K2"]
- elevation integer [8848, 8611]
->
There is an extensive guide that you can explore interactively in Livebook:
+and it's the main reason we need to "require" the Explorer.DataFrame
module.
The result is going to look like this:
#Explorer.DataFrame<
+ Polars[2 x 2]
+ name string ["Everest", "K2"]
+ elevation integer [8848, 8611]
+>
There is an extensive guide that you can explore interactively in Livebook:
Ten Minutes to Explorer
You can also check the Explorer.DataFrame
and Explorer.Series
docs for further
details.
diff --git a/changelog.html b/changelog.html
index b6bbbbc60..66661cbb9 100644
--- a/changelog.html
+++ b/changelog.html
@@ -307,13 +307,13 @@
Add across
and comprehensions to Explorer.Query
. These features allow a
-more flexible and elegant way to work with multiple columns at once. Example:
iris = Explorer.Datasets.iris()
-Explorer.DataFrame.mutate(iris,
- for col <- across(["sepal_width", "sepal_length", "petal_length", "petal_width"]) do
- {col.name, (col - mean(col)) / variance(col)}
- end
-)
See the Explorer.Query
documentation for further details.
Add support for regexes to select columns of a dataframe. Example:
df = Explorer.Datasets.wine()
-df[~r/(class|hue)/]
Add the :max_rows
and :columns
options to Explorer.DataFrame.from_parquet/2
. This mirrors
+more flexible and elegant way to work with multiple columns at once. Example:
iris = Explorer.Datasets.iris()
+Explorer.DataFrame.mutate(iris,
+ for col <- across(["sepal_width", "sepal_length", "petal_length", "petal_width"]) do
+ {col.name, (col - mean(col)) / variance(col)}
+ end
+)
See the Explorer.Query
documentation for further details.
Add support for regexes to select columns of a dataframe. Example:
df = Explorer.Datasets.wine()
+df[~r/(class|hue)/]
Add the :max_rows
and :columns
options to Explorer.DataFrame.from_parquet/2
. This mirrors
the from_csv/2
function.
Allow Explorer.Series
functions that accept floats to work with :nan
, :infinity
and :neg_infinity
values.
Add Explorer.DataFrame.shuffle/2
and Explorer.Series.shuffle/2
.
Add support for a list of filters in Explorer.DataFrame.filter/2
. These filters are
joined as and
expressions.
@@ -368,7 +368,7 @@
Add DataFrame.describe/2
to gather some statistics from a dataframe.
Add Series.nil_count/1
to count nil values.
Add Series.in/2
to check if a given value is inside a series.
Add Series
float predicates: is_finite/1
, is_infinite/1
and is_nan/1
.
Add Series
string functions: contains/2
, trim/1
, trim_leading/1
, trim_trailing/1
,
upcase/1
and downcase/1
.
Enable slicing of lazy frames (LazyFrame
).
Add IO operations "from/load" to the lazy frame implementation.
Add support for the :lazy
option in the DataFrame.new/2
function.
Add Series
float rounding methods: round/2
, floor/1
and ceil/1
.
Add support for precompiling to Linux running on RISCV CPUs.
Add support for precompiling to Linux - with musl - running on AARCH64 computers.
Allow DataFrame.new/1
to receive the :dtypes
option.
Accept :nan
as an option for Series.fill_missing/2
with float series.
Add basic support for the categorical dtype - the :category
dtype.
Add Series.categories/1
to return categories from a categorical series.
Add Series.categorise/2
to categorise a series of integers using predefined categories.
Add Series.replace/2
to replace the contents of a series.
Support selecting columns with unusual names (such as names containing spaces) inside Explorer.Query
-with col/1
.
The usage is like this:
Explorer.DataFrame.filter(df, col("my col") > 42)
+with col/1
.
The usage is like this:
Explorer.DataFrame.filter(df, col("my col") > 42)
Fixed
@@ -396,7 +396,7 @@
Add Series.quotient/2
and Series.remainder/2
to work with integer division.
Add Series.iotype/1
to return the underlying representation type.
Allow series on both sides of binary operations, like: add(series, 1)
and add(1, series)
.
Allow comparison, concat and coalesce operations on "(series, lazy series)".
Add lazy version of Series.sample/3
and Series.size/1
.
Add support for Arrow IPC Stream files.
Add Explorer.Query
and the macros that allow a simplified query API.
This is a huge improvement to some of the main functions, and allows referring to
-columns as if they were variables.
Before this change we would need to write a filter like this:
Explorer.DataFrame.filter_with(df, &Explorer.Series.greater(&1["col1"], 42))
But now it's also possible to write this operation like this:
Explorer.DataFrame.filter(df, col1 > 42)
This operation is going to use filter_with/2
underneath, which means that
+columns as if they were variables.
Before this change we would need to write a filter like this:
Explorer.DataFrame.filter_with(df, &Explorer.Series.greater(&1["col1"], 42))
But now it's also possible to write this operation like this:
Explorer.DataFrame.filter(df, col1 > 42)
This operation is going to use filter_with/2
underneath, which means that
it is going to use lazy series and compute the results at once.
Notice that it is mandatory to "require" the DataFrame module, since these operations
are implemented as macros.
The following new macros were added:
filter/2
mutate/2
summarise/2
arrange/2
They replace older versions that did not accept the new query syntax.
Add DataFrame.put/3
to enable adding or replacing columns in an eager manner.
diff --git a/exploring_explorer.html b/exploring_explorer.html
index 15f595f84..7130328d9 100644
--- a/exploring_explorer.html
+++ b/exploring_explorer.html
@@ -115,10 +115,10 @@
-Mix.install([
- {:explorer, "~> 0.6.0"},
- {:kino, "~> 0.9.0"}
-])
+Mix.install([
+ {:explorer, "~> 0.6.0"},
+ {:kino, "~> 0.9.0"}
+])
Introduction
@@ -131,29 +131,29 @@
Reading and writing data
Data can be read from delimited files (like CSV), NDJSON, Parquet, and the Arrow IPC (feather) format. You can also load in data from a map or keyword list of columns with Explorer.DataFrame.new/1.
For CSV, your 'usual suspects' of options are available:
delimiter - A single character used to separate fields within a record. (default: ",")
dtypes - A keyword list of [column_name: dtype]. If a type is not specified for a column, it is imputed from the first 1000 rows. (default: [])
header - Does the file have a header of column names as the first row or not? (default: true)
max_rows - Maximum number of lines to read. (default: nil)
null_character - The string that should be interpreted as a nil value. (default: "NA")
skip_rows - The number of lines to skip at the beginning of the file. (default: 0)
columns - A list of column names to keep. If present, only these columns are read into the dataframe. (default: nil)
Explorer also has multiple example datasets built in, which you can load from the Explorer.Datasets module like so:
df = Explorer.Datasets.fossil_fuels()
You'll notice that the output looks slightly different from many dataframe libraries. Explorer takes inspiration on this front from glimpse in R. A benefit of this approach is that you will rarely need to elide columns.
If you'd like to see a table with your data, we've got you covered there too.
Explorer.DataFrame.table(df)
Writing files is very similar to reading them. The options are a little more limited:
header - Should the column names be written as the first line of the file? (default: true)
delimiter - A single character used to separate fields within a record. (default: ",")
First, let's add some useful aliases:
alias Explorer.DataFrame
alias Explorer.Series
And then write to a file of your choosing:
input = Kino.Input.text("Filename")
filename = Kino.Input.read(input)
DataFrame.to_csv(df, filename)
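To see the reading and writing options work together, here is a small round trip. This is a minimal sketch that assumes Explorer ~> 0.6 is available from the Mix.install cell above. The dataframe, the temporary file name explorer_roundtrip.csv, and the semicolon delimiter are all made up for illustration.

```elixir
alias Explorer.DataFrame, as: DF
alias Explorer.Series

# A tiny, hypothetical dataframe to write out.
df = DF.new(year: [2010, 2011, 2012], country: ["BRAZIL", "ALGERIA", "QATAR"])

# Write it to a temporary file using ";" instead of the default "," delimiter.
path = Path.join(System.tmp_dir!(), "explorer_roundtrip.csv")
DF.to_csv(df, path, delimiter: ";")

# Read it back with the same delimiter, keeping only the "country" column.
roundtrip = DF.from_csv!(path, delimiter: ";", columns: ["country"])

DF.names(roundtrip)
Series.to_list(roundtrip["country"])
```

If the reader is given a different delimiter than the writer used, each line is read as a single field, so keeping the two in sync matters.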
Working with Series
Explorer, like Polars, works up from the concept of a Series. In many ways, you can think of a dataframe as a row-aligned map of Series. These are like vectors in R or series in Pandas.
For simplicity, Explorer uses the following Series dtypes:
:float - 64-bit floating point number
:integer - 64-bit signed integer
:boolean - Boolean
:string - UTF-8 encoded binary
:date - Date type that unwraps to Elixir.Date
:datetime - DateTime type that unwraps to Elixir.NaiveDateTime
Series can be constructed from Elixir basic types. For example:
s1 = Series.from_list([1, 2, 3])
s2 = Series.from_list(["a", "b", "c"])
s3 = Series.from_list([~D[2011-01-01], ~D[1965-01-21]])
You'll notice that the dtype and size of the Series are at the top of the printed value. You can get those programmatically as well.
Series.dtype(s3)
Series.size(s3)
And the printed values max out at 50:
1..100 |> Enum.to_list() |> Series.from_list()
Series are also nullable.
s = Series.from_list([1.0, 2.0, nil, nil, 5.0])
And you can fill in those missing values using one of the following strategies:
:forward - replace nil with the previous value
:backward - replace nil with the next value
:max - replace nil with the series maximum
:min - replace nil with the series minimum
:mean - replace nil with the series mean
Series.fill_missing(s, :forward)
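To compare the strategies side by side, the sketch below applies four of them to the same nullable series. It assumes Explorer ~> 0.6 from the Mix.install cell, and the expected values in the comments follow directly from the strategy descriptions above.

```elixir
alias Explorer.Series

s = Series.from_list([1.0, 2.0, nil, nil, 5.0])

# :forward copies the last non-nil value (2.0) into the gaps.
forward = s |> Series.fill_missing(:forward) |> Series.to_list()
# → [1.0, 2.0, 2.0, 2.0, 5.0]

# :backward copies the next non-nil value (5.0) into the gaps.
backward = s |> Series.fill_missing(:backward) |> Series.to_list()
# → [1.0, 2.0, 5.0, 5.0, 5.0]

# :max and :min use a single summary value for every gap.
maxed = s |> Series.fill_missing(:max) |> Series.to_list()
# → [1.0, 2.0, 5.0, 5.0, 5.0]
mined = s |> Series.fill_missing(:min) |> Series.to_list()
# → [1.0, 2.0, 1.0, 1.0, 5.0]
```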
In the case of mixed numeric types (i.e. integers and floats), Series will upcast the integers to floats:
Series.from_list([1, 2.0])
In all other cases, every value in a Series must have the same dtype, or else you'll get an ArgumentError.
Series.from_list([1, 2, 3, "a"])
One of the goals of Explorer is useful error messages. If you look at the error above, you get:
Cannot make a series from mismatched types. Type of "a" does not match inferred dtype integer.
Hopefully this makes abundantly clear what's going on.
Series also implements the Access protocol. You can slice and dice in many ways:
s = 1..10 |> Enum.to_list() |> Series.from_list()
s[1]
s[-1]
s[0..4]
s[[0, 4, 4]]
And of course, you can convert back to an Elixir list.
Series.to_list(s)
Explorer supports comparisons.
s = 1..11 |> Enum.to_list() |> Series.from_list()
s1 = 11..1 |> Enum.to_list() |> Series.from_list()
Series.equal(s, s1)
Series.equal(s, 5)
Series.not_equal(s, 10)
Series.greater_equal(s, 4)
And arithmetic.
Series.add(s, s1)
Series.subtract(s, 4)
Series.multiply(s, s1)
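Comparisons return boolean series, which pair naturally with Series.mask/2 to keep only matching elements, and arithmetic also accepts a plain number on one side. A minimal sketch assuming Explorer ~> 0.6; the threshold 8 and the factor 2 are arbitrary.

```elixir
alias Explorer.Series

s = 1..11 |> Enum.to_list() |> Series.from_list()

# A boolean series aligned element by element with `s`.
mask = Series.greater(s, 8)

# Keep only the elements where the mask is true.
kept = s |> Series.mask(mask) |> Series.to_list()
# → [9, 10, 11]

# A scalar operand is applied to every element.
doubled = s |> Series.multiply(2) |> Series.to_list()
# → [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]
```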
Remember those helpful errors? We've tried to add those throughout. So if you try to do arithmetic with mismatched dtypes:
s = Series.from_list([1, 2, 3])
s1 = Series.from_list([1.0, 2.0, 3.0])
Series.add(s, s1)
Just kidding! Integers and floats are upcast to floats. Let's try again:
s = Series.from_list([1, 2, 3])
s1 = Series.from_list(["a", "b", "c"])
Series.add(s, s1)
You can flip them around.
s = Series.from_list([1, 2, 3, 4])
Series.reverse(s)
And sort.
1..100 |> Enum.to_list() |> Enum.shuffle() |> Series.from_list() |> Series.sort()
Or argsort.
s = 1..100 |> Enum.to_list() |> Enum.shuffle() |> Series.from_list()
ids = Series.argsort(s) |> Series.to_list()
Which you can pass to Explorer.Series.slice/2 if you want the sorted values.
Series.slice(s, ids)
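The point of argsort is that slicing by its indices reproduces a plain sort. The sketch below checks that on a shuffled series. It assumes Explorer ~> 0.6, and the shuffled 1..100 input mirrors the example above.

```elixir
alias Explorer.Series

s = 1..100 |> Enum.to_list() |> Enum.shuffle() |> Series.from_list()

# argsort/1 gives the positions that would put `s` in ascending order.
ids = s |> Series.argsort() |> Series.to_list()

# Slicing by those positions should match sort/1 exactly.
by_argsort = s |> Series.slice(ids) |> Series.to_list()
by_sort = s |> Series.sort() |> Series.to_list()

by_argsort == by_sort
```

Keeping the indices around is useful when you need to reorder several series the same way.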
You can calculate cumulative values.
s = 1..100 |> Enum.to_list() |> Series.from_list()
Series.cumulative_sum(s)
Or rolling ones.
Series.window_sum(s, 4)
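On a short series it is easier to see how the cumulative and rolling sums differ. A minimal sketch assuming Explorer ~> 0.6 and its default window behaviour, where the first positions use a partial window rather than producing nil.

```elixir
alias Explorer.Series

s = Series.from_list([1, 2, 3, 4, 5])

# Running total: each element is the sum of everything up to and including it.
running = s |> Series.cumulative_sum() |> Series.to_list()
# → [1, 3, 6, 10, 15]

# Rolling sum over a window of 2; the first element only sees itself.
rolling = s |> Series.window_sum(2) |> Series.to_list()
# → [1, 3, 5, 7, 9]
```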
You can count and list unique values.
s = Series.from_list(["a", "b", "b", "c", "c", "c"])
Series.distinct(s)
Series.n_distinct(s)
And you can even get a dataframe showing the frequencies for each distinct value.
Series.frequencies(s)
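The unique-value helpers can be cross-checked against plain Elixir. This sketch assumes Explorer ~> 0.6. The distinct values are sorted before comparing because this example does not rely on distinct/1 preserving order, and Enum.frequencies/1 is used only as an independent check on the counts.

```elixir
alias Explorer.Series

s = Series.from_list(["a", "b", "b", "c", "c", "c"])

# One copy of each value, sorted for a stable comparison.
uniques = s |> Series.distinct() |> Series.to_list() |> Enum.sort()
# → ["a", "b", "c"]

# The number of distinct values.
count = Series.n_distinct(s)
# → 3

# Plain-Elixir counts per value, to compare with Series.frequencies/1.
counts = s |> Series.to_list() |> Enum.frequencies()
# → %{"a" => 1, "b" => 2, "c" => 3}
```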
Working with DataFrames
A DataFrame is really just a collection of Series of the same size, which is why you can create a DataFrame from a Keyword list.
DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
Similarly to Series, the Inspect implementation prints some info at the top and to the left. At the top we see the shape of the dataframe (rows and columns) and then for each column we see the name, dtype, and first five values. We can see a bit more from that built-in dataset we loaded in earlier.
df
You will also see grouping information there, but we'll get to that later. You can get the info yourself directly:
DataFrame.names(df)
DataFrame.dtypes(df)
DataFrame.shape(df)
{DataFrame.n_rows(df), DataFrame.n_columns(df)}
We can grab the head.
DataFrame.head(df)
Or the tail. Let's get a few more values from the tail.
DataFrame.tail(df, 10)
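The same inspection functions work on the tiny keyword-list dataframe from the start of this section, where the results are easy to predict. A minimal sketch assuming Explorer ~> 0.6. DF.to_columns/1 is used here only to make the tail's contents visible as a plain map.

```elixir
alias Explorer.DataFrame, as: DF

df = DF.new(a: [1, 2, 3], b: ["a", "b", "c"])

# {rows, columns}, the same pair as {n_rows, n_columns}.
shape = DF.shape(df)
# → {3, 2}
names = DF.names(df)
# → ["a", "b"]

# head/2 and tail/2 take an optional row count.
head_rows = df |> DF.head(2) |> DF.n_rows()
# → 2
last_row = df |> DF.tail(1) |> DF.to_columns()
# → %{"a" => [3], "b" => ["c"]}
```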
Verbs and macros
Select
Let's jump right into it. We can select columns pretty simply.
DF.select(df, ["year", "country"])
But Elixir gives us some superpowers. In R there's tidy-select. I don't think we need that in Elixir. Anywhere in Explorer where you need to pass a list of column names, you can also execute a filtering callback on the column names. It's just an anonymous function passed to df |> DataFrame.names() |> Enum.filter(callback_here).
DF.select(df, &String.ends_with?(&1, "fuel"))
Want all but some columns? discard/2 performs the opposite of select/2.
DF.discard(df, &String.ends_with?(&1, "fuel"))
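On a small dataframe you can see exactly which columns the name callback keeps and drops. This sketch assumes Explorer ~> 0.6. The column names imitate the fossil fuels dataset, but the values are made up.

```elixir
alias Explorer.DataFrame, as: DF

df =
  DF.new(
    year: [2010, 2011],
    country: ["BRAZIL", "QATAR"],
    solid_fuel: [1, 2],
    liquid_fuel: [3, 4]
  )

# The callback receives each column name; names returning true are kept.
fuels = df |> DF.select(&String.ends_with?(&1, "fuel")) |> DF.names()
# → ["solid_fuel", "liquid_fuel"]

# discard/2 drops those same columns instead.
others = df |> DF.discard(&String.ends_with?(&1, "fuel")) |> DF.names()
# → ["year", "country"]
```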
Filter
The next verb we'll look at is filter.
This is implemented using a macro, so it's possible to use expressions
like you would if comparing variables in Elixir:
DF.filter(df, country == "BRAZIL")
Using complex filters is also possible:
DF.filter(df, country == "ALGERIA" and year > 2012)
You can also write the same filter without the macro by using the callback-based function filter_with/2:
DF.filter_with(df, fn ldf ->
  ldf["country"]
  |> Series.equal("ALGERIA")
  |> Series.and(Series.greater(ldf["year"], 2012))
end)
By the way, all the Explorer.DataFrame macros have a corresponding function that accepts a callback.
In fact, our macros are implemented using those functions.
The filter_with/2 function uses a virtual representation of the dataframe
that we call a "lazy frame". With lazy frames you can't access the
series contents, but every operation will be optimized and run only once.
Remember those helpful error messages?
DF.filter(df, cuontry == "BRAZIL")
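Since the macro and the callback are two spellings of the same query, they should select the same rows. The sketch below checks that on three made-up rows. It assumes Explorer ~> 0.6, and because filter/2 is a macro it needs `require` rather than just `alias`.

```elixir
require Explorer.DataFrame, as: DF
alias Explorer.Series

df = DF.new(country: ["ALGERIA", "ALGERIA", "BRAZIL"], year: [2011, 2013, 2013])

# Macro form: columns are referenced as bare names.
via_macro = DF.filter(df, country == "ALGERIA" and year > 2012)

# Callback form: the function gets a lazy frame and returns a boolean series.
via_callback =
  DF.filter_with(df, fn ldf ->
    ldf["country"]
    |> Series.equal("ALGERIA")
    |> Series.and(Series.greater(ldf["year"], 2012))
  end)

DF.to_columns(via_macro) == DF.to_columns(via_callback)
```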
Mutate
A common task in data analysis is to add columns or change existing ones. Mutate is a handy verb.
DF.mutate(df, new_column: solid_fuel + cement)
Did you catch that? You can pass in new columns as keyword arguments. It also works to transform existing columns.
DF.mutate(df,
  gas_fuel: Series.cast(gas_fuel, :float),
  gas_and_liquid_fuel: gas_fuel + liquid_fuel
)
DataFrame.mutate/2 is flexible, though. You may not always want to use keyword arguments. Given that column names are String.t(), it may make more sense to use a map.
DF.mutate(df, %{"gas_fuel" => gas_fuel - 10})
DF.transmute/2, which is DF.mutate/2 that only retains the specified columns, is forthcoming.
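Here are both mutate forms on a tiny made-up frame, where the new values are easy to verify by hand. A minimal sketch assuming Explorer ~> 0.6; `require` is needed because mutate/2 is a macro.

```elixir
require Explorer.DataFrame, as: DF
alias Explorer.Series

df = DF.new(solid_fuel: [1, 2, 3], cement: [10, 20, 30])

# Keyword form: adds a column computed from two existing ones.
with_total = DF.mutate(df, new_column: solid_fuel + cement)

# Map form with a string key: overwrites the existing "cement" column.
shifted = DF.mutate(df, %{"cement" => cement - 10})

Series.to_list(with_total["new_column"])
# → [11, 22, 33]
Series.to_list(shifted["cement"])
# → [0, 10, 20]
```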
Arrange
Sorting the dataframe is pretty straightforward.
DF.arrange(df, year)
But it comes with some tricks up its sleeve.
DF.arrange(df, asc: total, desc: year)
As the examples show, arrange/2 is a macro, and therefore you can use some functions to arrange your dataframe:
DF.arrange(df, asc: Series.window_sum(total, 2))
Sort operations happen left to right, and keyword list args let you specify the direction.
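To see the left-to-right rule in action, the sketch below sorts by total ascending and breaks ties with year descending. It assumes Explorer ~> 0.6, and the four rows are made up so that two of them tie on total.

```elixir
require Explorer.DataFrame, as: DF
alias Explorer.Series

df = DF.new(total: [5, 1, 5, 3], year: [2010, 2011, 2012, 2013])

# First key: total ascending. Second key (ties only): year descending.
sorted = DF.arrange(df, asc: total, desc: year)

Series.to_list(sorted["total"])
# → [1, 3, 5, 5]
Series.to_list(sorted["year"])
# → [2011, 2013, 2012, 2010]
```

The two rows with total 5 end up as 2012 before 2010 only because of the second key.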
Distinct
Okay, as expected here too. Very straightforward.
DF.distinct(df, ["year", "country"])
You can specify whether to keep the other columns as well, so the first row of each distinct value is kept:
DF.distinct(df, ["country"], keep_all: true)
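On a tiny frame, the difference between the two distinct calls shows up in the columns that come back. A sketch assuming Explorer ~> 0.6 and the keep_all option described above. The data is made up.

```elixir
alias Explorer.DataFrame, as: DF

df = DF.new(country: ["BRAZIL", "BRAZIL", "QATAR"], year: [2010, 2011, 2010])

# Without keep_all, only the listed column survives: one row per country.
one_col = DF.distinct(df, ["country"])
DF.names(one_col)
# → ["country"]

# With keep_all: true, the other columns are kept from the first row of each country.
kept_rows = DF.distinct(df, ["country"], keep_all: true)
DF.n_rows(kept_rows)
# → 2
```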
Rename
rename/2 can take a list of new names, or a map or keyword args to rename specific columns. rename_with/2 instead takes a callback that is passed to Enum.map/2 against the names.
DF.rename(df, year: "year_test")
DF.rename_with(df, &(&1 <> "_test"))
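The two rename calls differ in scope: the keyword form touches only the listed columns, while rename_with/2 touches every column. A minimal sketch assuming Explorer ~> 0.6, using a made-up one-row frame.

```elixir
alias Explorer.DataFrame, as: DF

df = DF.new(year: [2010], country: ["BRAZIL"])

# Keyword form: only "year" is renamed.
renamed = df |> DF.rename(year: "year_test") |> DF.names()
# → ["year_test", "country"]

# Callback form: the function is applied to every column name.
suffixed = df |> DF.rename_with(&(&1 <> "_test")) |> DF.names()
# → ["year_test", "country_test"]
```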
Dummies
This is fun! We can get dummy variables for unique values.
DF.dummies(df, ["year"])
DF.dummies(df, ["country"])
Sampling
Random samples can give us a fraction or a specific number of rows, with or without replacement, and the function is seedable.
DF.sample(df, 10)
DF.sample(df, 0.4)
Trying for those helpful error messages again.
DF.sample(df, 10000)
DF.sample(df, 10000, replacement: true)
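Here is a small sketch of the sampling modes on a ten-row frame, so the row counts are easy to check. It assumes Explorer ~> 0.6 and its :seed and :replacement options for sample/3. Comparing via DF.to_columns/1 is just a convenient way to check that two seeded samples match.

```elixir
alias Explorer.DataFrame, as: DF

df = DF.new(x: Enum.to_list(1..10))

# An integer asks for that many rows; a float asks for that fraction of rows.
n3 = df |> DF.sample(3) |> DF.n_rows()
# → 3
n_frac = df |> DF.sample(0.4) |> DF.n_rows()
# → 4

# The same seed should reproduce the same sample.
a = df |> DF.sample(5, seed: 42) |> DF.to_columns()
b = df |> DF.sample(5, seed: 42) |> DF.to_columns()

# With replacement you may ask for more rows than the frame has.
big = df |> DF.sample(20, replacement: true) |> DF.n_rows()
# → 20
```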
Pull and slice
Slicing and dicing can be done with the Access protocol or with explicit pull/slice/take functions.
df["year"]
DF.pull(df, "year")
df[["year", "country"]]
DF.slice(df, [1, 20, 50])
Negative offsets work for slice!
DF.slice(df, -10, 5)
DF.slice(df, 10, 5)
Slice also works with ranges:
DF.slice(df, 12..42)
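On ten consecutive years, you can read each slice variant's result straight off the index arithmetic. A minimal sketch assuming Explorer ~> 0.6; the year column is made up.

```elixir
alias Explorer.DataFrame, as: DF
alias Explorer.Series

df = DF.new(year: Enum.to_list(2001..2010))

# Offset -3 starts three rows from the end (index 7); take 2 rows from there.
from_end = df |> DF.slice(-3, 2) |> DF.pull("year") |> Series.to_list()
# → [2008, 2009]

# A list of indices picks exactly those rows.
picked = df |> DF.slice([0, 9]) |> DF.pull("year") |> Series.to_list()
# → [2001, 2010]

# Ranges are inclusive, like Elixir ranges.
ranged = df |> DF.slice(2..4) |> DF.pull("year") |> Series.to_list()
# → [2003, 2004, 2005]
```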
Pivot
We can pivot with pivot_longer/3 and pivot_wider/4. These are inspired by tidyr.
There are some shortcomings in pivot_wider/4 related to polars. The select option must select only columns of numeric type.
DF.pivot_longer(df, ["year", "country"], select: &String.ends_with?(&1, "fuel"))
DF.pivot_wider(df, "country", "total", id_columns: ["year"])
Let's make those names look nicer!
tidy_names = fn name ->
  name
  |> String.downcase()
  |> String.replace(~r/\s/, " ")
  |> String.replace(~r/[^A-Za-z\s]/, "")
  |> String.replace(" ", "_")
end
df
|> DF.pivot_wider("country", "total", id_columns: ["year"])
|> DF.rename_with(tidy_names)
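To see what pivot_wider/4 produces before and after the renaming, here is the same pipeline on four made-up rows. This is a sketch assuming Explorer ~> 0.6, where the values of the "country" column become the new column names. The tidy_names function from above is repeated so the block stands alone. Names are sorted before comparing to avoid relying on column order.

```elixir
alias Explorer.DataFrame, as: DF

df =
  DF.new(
    year: [2010, 2010, 2011, 2011],
    country: ["BRAZIL", "UNITED KINGDOM", "BRAZIL", "UNITED KINGDOM"],
    total: [1, 2, 3, 4]
  )

# Same cleanup as above: lowercase, drop non-letters, spaces become underscores.
tidy_names = fn name ->
  name
  |> String.downcase()
  |> String.replace(~r/\s/, " ")
  |> String.replace(~r/[^A-Za-z\s]/, "")
  |> String.replace(" ", "_")
end

# One row per year, one column per country.
wide =
  df
  |> DF.pivot_wider("country", "total", id_columns: ["year"])
  |> DF.rename_with(tidy_names)

wide |> DF.names() |> Enum.sort()
# → ["brazil", "united_kingdom", "year"]
```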
Joins
Joining is fast and easy. You can specify the columns to join on and how to join. Polars even supports cartesian (cross) joins, so Explorer does too.
df1 = DF.select(df, ["year", "country", "total"])
df2 = DF.select(df, ["year", "country", "cement"])
DF.join(df1, df2)
df3 = df |> DF.select(["year", "cement"]) |> DF.slice(0, 500)
DF.join(df1, df3, how: :left)
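On two tiny frames, you can count by hand which rows survive an inner join and a left join. A minimal sketch assuming Explorer ~> 0.6, where join/3 with no :on option joins on the columns the frames share (here just "year"). The data is made up.

```elixir
alias Explorer.DataFrame, as: DF
alias Explorer.Series

df1 = DF.new(year: [2010, 2011, 2012], total: [1, 2, 3])
df2 = DF.new(year: [2010, 2012], cement: [10, 30])

# Inner join (the default): only years present in both frames.
inner = DF.join(df1, df2)
DF.n_rows(inner)
# → 2

# Left join: every row of df1; 2011 has no match, so its cement is nil.
left = DF.join(df1, df2, how: :left)
DF.n_rows(left)
# → 3

left |> DF.pull("cement") |> Series.to_list() |> Enum.count(&is_nil/1)
# → 1
```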
Grouping
Explorer supports groupby operations. They're limited based on what's possible in Polars, but they do most of what you need to do.
grouped = DF.group_by(df, ["country"])
Notice that the Inspect call now shows groups as well as rows and columns. You can, of course, get them explicitly.
DF.groups(grouped)
And you can ungroup explicitly.
DF.ungroup(grouped)
But what we care about the most is aggregating! Let's see which country has the max per_capita value.
grouped
|> DF.summarise(max_per_capita: max(per_capita))
|> DF.arrange(desc: max_per_capita)
Qatar it is.
You may have noticed that we are using max/1 inside the summarise macro. This is possible because we expose all functions from the Series module. You can use the following aggregations inside summarise:
min/1 - Take the minimum value within the group. See Explorer.Series.min/1.
max/1 - Take the maximum value within the group. See Explorer.Series.max/1.
sum/1 - Take the sum of the series within the group. See Explorer.Series.sum/1.
mean/1 - Take the mean of the series within the group. See Explorer.Series.mean/1.
median/1 - Take the median of the series within the group. See Explorer.Series.median/1.
first/1 - Take the first value within the group. See Explorer.Series.first/1.
last/1 - Take the last value within the group. See Explorer.Series.last/1.
count/1 - Count the number of rows per group.
n_unique/1 - Count the number of unique rows per group.
The API is similar to mutate: you can use keyword args or a map and specify aggregations to use.
DF.summarise(grouped, min_per_capita: min(per_capita), min_total: min(total))
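On a small grouped frame, the aggregation results can be checked by hand. This is a sketch assuming Explorer ~> 0.6, where summarise/2 is a macro (hence `require`). The rows are made up, and the result is arranged by country so the group order is predictable.

```elixir
require Explorer.DataFrame, as: DF

df =
  DF.new(
    country: ["BRAZIL", "BRAZIL", "QATAR"],
    per_capita: [1.0, 2.5, 9.5],
    total: [10, 20, 30]
  )

# One output row per country: the largest per_capita and the summed total.
summary =
  df
  |> DF.group_by(["country"])
  |> DF.summarise(max_per_capita: max(per_capita), sum_total: sum(total))
  |> DF.arrange(country)

DF.to_columns(summary)
# → %{"country" => ["BRAZIL", "QATAR"], "max_per_capita" => [2.5, 9.5], "sum_total" => [30, 30]}
```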
Speaking of mutate, it's 'group-aware', as are arrange, distinct, and n_rows.
DF.mutate(grouped, total_window_sum: window_sum(total, 3), rows_in_group: count(country))
It's also possible to use aggregations inside other functions:
grouped
|> DF.summarise(greater_than_9: greater(max(per_capita), 9.0), per_capita_max: max(per_capita))
|> DataFrame.arrange(desc: per_capita_max)
That's it!