From b970d90a49df4b60e7a8250fb620d71dcb2257d3 Mon Sep 17 00:00:00 2001
From: "Documenter.jl"
Date: Sun, 25 Feb 2024 13:41:35 +0000
Subject: [PATCH] build based on 918e040

---
 dev/.documenter-siteinfo.json |  2 +-
 dev/datasets/index.html       | 14 +++++++-------
 dev/description/index.html    |  2 +-
 dev/filesystem/index.html     |  4 ++--
 dev/index.html                |  2 +-
 dev/manipulation/index.html   | 32 ++++++++++++++++----------------
 dev/utils/index.html          |  2 +-
 7 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index 27b5c46..79c809d 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2024-02-14T16:56:39","documenter_version":"1.2.1"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2024-02-25T13:41:32","documenter_version":"1.2.1"}}
\ No newline at end of file
diff --git a/dev/datasets/index.html b/dev/datasets/index.html
index 779541c..fd17b49 100644
--- a/dev/datasets/index.html
+++ b/dev/datasets/index.html
@@ -1,5 +1,5 @@
-Datasets · MultiData.jl

Datasets

A machine learning dataset is a collection of instances (or samples), each one described by a number of variables. In the case of tabular data, a dataset looks like a database table, where every column is a variable, and each row corresponds to a given instance. However, a dataset can also be non-tabular; for example, each instance can consist of a multivariate time-series, or an image.

When data is composed of different modalities, combining their statistical properties is non-trivial, since they may be quite different in nature from one another.

The abstract representation of a multimodal dataset provided by this package is the AbstractMultiDataset.

MultiData.AbstractMultiDatasetType

Abstract supertype for all multimodal datasets.

A concrete multimodal dataset should always provide accessors data, to access the underlying tabular structure (e.g., DataFrame) and grouped_variables, to access the grouping of variables (a vector of vectors of column indices).

source
MultiData.grouped_variablesFunction
grouped_variables(amd)::Vector{Vector{Int}}

Return the indices of the variables grouped by modality, of an AbstractMultiDataset. The grouping describes how the different modalities are composed from the underlying AbstractDataFrame structure.

See also data, AbstractMultiDataset.

source
SoleBase.dimensionalityFunction
dimensionality(df)

Return the dimensionality of a dataframe df.

If the dataframe has variables of various dimensionalities, :mixed is returned.

If the dataframe is empty (no instances), :empty is returned. The behavior for mixed dimensionalities can be controlled by setting the keyword argument force:

  • :no (default): return :mixed in case of mixed dimensionality
  • :max: return the greatest dimensionality
  • :min: return the lowest dimensionality
source

Unlabeled Datasets

In unlabeled datasets there is no labeling variable, and all of the variables (also called feature variables, or features) play an equal role in the representation. These datasets are used in unsupervised learning contexts, for discovering internal correlation patterns between the features. Multimodal unlabeled datasets can be instantiated with MultiDataset.

MultiData.MultiDatasetType
MultiDataset(df, grouped_variables)

Create a MultiDataset from an AbstractDataFrame df, initializing its modalities according to the grouping in grouped_variables.

grouped_variables is an AbstractVector of variable groupings, each of which is an AbstractVector of integers giving the indices of the variables selected for that modality.

Note that the order matters for both the modalities and the variables.

julia> df = DataFrame(
+Datasets · MultiData.jl

Datasets

A machine learning dataset is a collection of instances (or samples), each one described by a number of variables. In the case of tabular data, a dataset looks like a database table, where every column is a variable, and each row corresponds to a given instance. However, a dataset can also be non-tabular; for example, each instance can consist of a multivariate time-series, or an image.

When data is composed of different modalities, combining their statistical properties is non-trivial, since they may be quite different in nature from one another.

The abstract representation of a multimodal dataset provided by this package is the AbstractMultiDataset.

MultiData.AbstractMultiDatasetType

Abstract supertype for all multimodal datasets.

A concrete multimodal dataset should always provide accessors data, to access the underlying tabular structure (e.g., DataFrame) and grouped_variables, to access the grouping of variables (a vector of vectors of column indices).

source
MultiData.grouped_variablesFunction
grouped_variables(amd)::Vector{Vector{Int}}

Return the indices of the variables grouped by modality, of an AbstractMultiDataset. The grouping describes how the different modalities are composed from the underlying AbstractDataFrame structure.

See also data, AbstractMultiDataset.

source
SoleBase.dimensionalityFunction
dimensionality(df)

Return the dimensionality of a dataframe df.

If the dataframe has variables of various dimensionalities, :mixed is returned.

If the dataframe is empty (no instances), :empty is returned. The behavior for mixed dimensionalities can be controlled by setting the keyword argument force:

  • :no (default): return :mixed in case of mixed dimensionality
  • :max: return the greatest dimensionality
  • :min: return the lowest dimensionality
source

Unlabeled Datasets

In unlabeled datasets there is no labeling variable, and all of the variables (also called feature variables, or features) play an equal role in the representation. These datasets are used in unsupervised learning contexts, for discovering internal correlation patterns between the features. Multimodal unlabeled datasets can be instantiated with MultiDataset.

MultiData.MultiDatasetType
MultiDataset(df, grouped_variables)

Create a MultiDataset from an AbstractDataFrame df, initializing its modalities according to the grouping in grouped_variables.

grouped_variables is an AbstractVector of variable groupings, each of which is an AbstractVector of integers giving the indices of the variables selected for that modality.

Note that the order matters for both the modalities and the variables.

julia> df = DataFrame(
                   :age => [30, 9],
                   :name => ["Python", "Julia"],
                   :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]],
@@ -103,7 +103,7 @@
      │ Array…
 ─────┼───────────────────────────────────
    1 │ [0.540302, -0.416147, -0.989992,…
-   2 │ [0.841471, 0.909297, 0.14112, -0…
source
MultiData._emptyMethod
_empty(md)

Return a copy of a multimodal dataset with no instances.

Note: since the returned AbstractMultiDataset will be empty, its column types will be Any.

source

Labeled Datasets

In labeled datasets, one or more variables are considered to have special semantics with respect to the other variables; each of these labeling variables (or target variables) can be thought of as assigning a label to each instance, which is typically a categorical value (classification label) or a numerical value (regression label). Supervised learning methods can be applied to these datasets for modeling the target variables as a function of the feature variables.

As an extension of the AbstractMultiDataset, AbstractLabeledMultiDataset has an interface that can be implemented to represent multimodal labeled datasets.

MultiData.AbstractLabeledMultiDatasetType

Abstract supertype for all labeled multimodal datasets (used in supervised learning).

As any multimodal dataset, any concrete labeled multimodal dataset should always provide the accessors data, to access the underlying tabular structure (e.g., DataFrame) and grouped_variables, to access the grouping of variables. In addition to these, implementations are required for labeling_variables, to access the indices of the labeling variables.

See also AbstractMultiDataset.

source
Missing docstring.

Missing docstring for dataset. Check Documenter's build log for details.

Multimodal labeled datasets can be instantiated with LabeledMultiDataset.

MultiData.LabeledMultiDatasetType
LabeledMultiDataset(md, labeling_variables)

Create a LabeledMultiDataset by associating an AbstractMultiDataset with some labeling variables, specified as a column index (Int) or a vector of column indices (Vector{Int}).

Arguments

  • md is the original AbstractMultiDataset;
  • labeling_variables is an AbstractVector of integers indicating the indices of the variables that will be set as labels.

Examples

julia> lmd = LabeledMultiDataset(MultiDataset([[2],[4]], DataFrame(
+   2 │ [0.841471, 0.909297, 0.14112, -0…
source
MultiData._emptyMethod
_empty(md)

Return a copy of a multimodal dataset with no instances.

Note: since the returned AbstractMultiDataset will be empty, its column types will be Any.

source

Labeled Datasets

In labeled datasets, one or more variables are considered to have special semantics with respect to the other variables; each of these labeling variables (or target variables) can be thought of as assigning a label to each instance, which is typically a categorical value (classification label) or a numerical value (regression label). Supervised learning methods can be applied to these datasets for modeling the target variables as a function of the feature variables.

As an extension of the AbstractMultiDataset, AbstractLabeledMultiDataset has an interface that can be implemented to represent multimodal labeled datasets.

MultiData.AbstractLabeledMultiDatasetType

Abstract supertype for all labeled multimodal datasets (used in supervised learning).

As any multimodal dataset, any concrete labeled multimodal dataset should always provide the accessors data, to access the underlying tabular structure (e.g., DataFrame) and grouped_variables, to access the grouping of variables. In addition to these, implementations are required for labeling_variables, to access the indices of the labeling variables.

See also AbstractMultiDataset.

source
Missing docstring.

Missing docstring for dataset. Check Documenter's build log for details.

Multimodal labeled datasets can be instantiated with LabeledMultiDataset.

MultiData.LabeledMultiDatasetType
LabeledMultiDataset(md, labeling_variables)

Create a LabeledMultiDataset by associating an AbstractMultiDataset with some labeling variables, specified as a column index (Int) or a vector of column indices (Vector{Int}).

Arguments

  • md is the original AbstractMultiDataset;
  • labeling_variables is an AbstractVector of integers indicating the indices of the variables that will be set as labels.

Examples

julia> lmd = LabeledMultiDataset(MultiDataset([[2],[4]], DataFrame(
            :id => [1, 2],
            :age => [30, 9],
            :name => ["Python", "Julia"],
@@ -130,7 +130,7 @@
 ─────┼───────────────────────────────────
    1 │ [0.841471, 0.909297, 0.14112, -0…
    2 │ [0.540302, -0.416147, -0.989992,…
-
source
MultiData.joinlabels!Method
joinlabels!(lmd, [lbls...]; delim = "_")

On a labeled multimodal dataset, collapse the labeling variables identified by lbls into a single labeling variable of type String, by means of a join that uses delim as the string delimiter.

Unless specified otherwise, this function joins all labels.

lbls can be an Integer indicating the index of the label, or a Symbol indicating the name of the labeling variable.

Note

The resulting labels will always be of type String.

Note

The resulting labeling variable will always be added as the last column in the underlying DataFrame.

Examples

julia> lmd = LabeledMultiDataset(
+
source
MultiData.joinlabels!Method
joinlabels!(lmd, [lbls...]; delim = "_")

On a labeled multimodal dataset, collapse the labeling variables identified by lbls into a single labeling variable of type String, by means of a join that uses delim as the string delimiter.

Unless specified otherwise, this function joins all labels.

lbls can be an Integer indicating the index of the label, or a Symbol indicating the name of the labeling variable.

Note

The resulting labels will always be of type String.

Note

The resulting labeling variable will always be added as the last column in the underlying DataFrame.
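
The joining step itself can be sketched in plain Julia. This is only an illustration of what "join with delim" means for one instance's label values, assuming each value is converted with string first; join_labels is a hypothetical helper and not the package's implementation.

```julia
# Hypothetical helper (not part of MultiData.jl): collapse the label values
# of one instance into a single String, separated by `delim`.
join_labels(values; delim = "_") = join(string.(values), delim)

# Two instances, each with two label values (a name and an age).
instances = [("Python", 30), ("Julia", 9)]

joined = [join_labels(vals) for vals in instances]
joined_dash = [join_labels(vals; delim = "-") for vals in instances]
```

Whatever the original label types are, the joined values are Strings, which matches the note above.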

Examples

julia> lmd = LabeledMultiDataset(
            MultiDataset(
                [[2],[4]],
                DataFrame(
@@ -185,7 +185,7 @@
      │ Array…
 ─────┼───────────────────────────────────
    1 │ [0.841471, 0.909297, 0.14112, -0…
-   2 │ [0.540302, -0.416147, -0.989992,…
source
MultiData.labelMethod
label(lmd, j, i)

Return the value of the i-th labeling variable for the instance at index j in a labeled multimodal dataset.

source
MultiData.labelsMethod
labels(lmd, i_instance)
-labels(lmd)

Return the labels of the instance at index i_instance in a labeled multimodal dataset. A dictionary of type labelname => value is returned.

If only the first argument is passed then the labels for all instances are returned.

source
MultiData.setaslabeling!Method
setaslabeling!(lmd, i)
-setaslabeling!(lmd, var_name)

Set the i-th variable as a label.

The variable name can be passed as the second argument instead of its index.

source
MultiData.unsetaslabeling!Method
unsetaslabeling!(lmd, i)
-unsetaslabeling!(lmd, var_name)

Remove the i-th labeling variable from the list of labels.

The variable name can be passed as the second argument instead of its index.

source
+   2 │ [0.540302, -0.416147, -0.989992,…
source
MultiData.labelMethod
label(lmd, j, i)

Return the value of the i-th labeling variable for the instance at index j in a labeled multimodal dataset.

source
MultiData.labelsMethod
labels(lmd, i_instance)
+labels(lmd)

Return the labels of the instance at index i_instance in a labeled multimodal dataset. A dictionary of type labelname => value is returned.

If only the first argument is passed then the labels for all instances are returned.

source
MultiData.setaslabeling!Method
setaslabeling!(lmd, i)
+setaslabeling!(lmd, var_name)

Set the i-th variable as a label.

The variable name can be passed as the second argument instead of its index.

source
MultiData.unsetaslabeling!Method
unsetaslabeling!(lmd, i)
+unsetaslabeling!(lmd, var_name)

Remove the i-th labeling variable from the list of labels.

The variable name can be passed as the second argument instead of its index.

source
diff --git a/dev/description/index.html b/dev/description/index.html
index df1f83c..3adfcd0 100644
--- a/dev/description/index.html
+++ b/dev/description/index.html
@@ -27,4 +27,4 @@
    1 │ stat AbstractFloat[8.63372e-6; -2.848… AbstractFloat[-1.0; -1.0 ⋯
 5 columns omitted

The describe implementation for MultiDatasets tries to find the statistical measures best suited to the type of data each modality contains.

In the example, the 2nd modality, which contains variables (just one here) of type Vector{Float64}, was described by applying the well-known 22 features from the package Catch22.jl, plus maximum, minimum and mean, since the vectors are time series.

DataAPI.describeFunction
describe(md; t = fill([(1, 0, 0)], nmodalities(md)), kwargs...)

Return descriptive statistics for an AbstractMultiDataset as a Vector of new DataFrames where each row represents a variable and each column a summary statistic.

Arguments

  • md: the AbstractMultiDataset;
  • t: is a vector of nmodalities elements, where each element is a vector as long as the dimensionality of
the i-th modality. Each element of the innermost vector is a tuple
-of arguments for [`paa`](@ref).

For other keyword arguments, see the documentation of the DataFrames.describe function.

Examples

TODO: examples

source
+of arguments for [`paa`](@ref).

For other keyword arguments, see the documentation of the DataFrames.describe function.

Examples

TODO: examples

source
diff --git a/dev/filesystem/index.html b/dev/filesystem/index.html
index 16c6717..75763d3 100644
--- a/dev/filesystem/index.html
+++ b/dev/filesystem/index.html
@@ -1,5 +1,5 @@
-Filesystem · MultiData.jl

Filesystem

MultiData.datasetinfoMethod
datasetinfo(datasetpath; onlywithlabels = [], shufflelabels = [], rng = Random.GLOBAL_RNG)

Show the dataset size on disk and return a Tuple whose first element is a vector of selected IDs, whose second element is the labels DataFrame or nothing, and whose third element is the total size in bytes.

Arguments

  • onlywithlabels is used to select which portion of the Dataset to load, by specifying labels and their values to use as filters. See loaddataset for more info.
  • shufflelabels is an AbstractVector of names of labels to shuffle (default = [], means no shuffle).
  • rng is a random number generator to be used when shuffling (for reproducibility); can be either an Integer (used as seed for MersenneTwister) or an AbstractRNG.
source
MultiData.loaddatasetMethod
loaddataset(datasetpath; onlywithlabels = [], shufflelabels = [], rng = Random.GLOBAL_RNG)

Create a MultiDataset or a LabeledMultiDataset from a Dataset, based on the presence of file Labels.csv.

Arguments

  • datasetpath is an AbstractString that denotes the Dataset's location;
  • onlywithlabels is an AbstractVector{AbstractVector{Pair{AbstractString,AbstractVector{Any}}}} used to select which portion of the Dataset to load, by specifying labels and their values. Starting from the innermost type, each Pair{AbstractString,AbstractVector{Any}} must contain the label's name (the AbstractString) and the values for that label (the AbstractVector{Any}). Each Pair in one vector must refer to a different label, so if the Dataset has n labels in total, a vector of Pairs can contain at most n elements; this is because the elements of one vector combine with each other. Every vector of Pairs acts as a filter. Note that the same label can be used in different vectors of Pairs, as these do not combine with each other. If onlywithlabels is an empty vector (default), the function loads the entire Dataset.
  • shufflelabels is an AbstractVector of names of labels to shuffle (default = [], means no shuffle).
  • rng is a random number generator to be used when shuffling (for reproducibility); can be either an Integer (used as seed for MersenneTwister) or an AbstractRNG.

Examples

julia> df_data = DataFrame(
+Filesystem · MultiData.jl

Filesystem

MultiData.datasetinfoMethod
datasetinfo(datasetpath; onlywithlabels = [], shufflelabels = [], rng = Random.GLOBAL_RNG)

Show the dataset size on disk and return a Tuple whose first element is a vector of selected IDs, whose second element is the labels DataFrame or nothing, and whose third element is the total size in bytes.

Arguments

  • onlywithlabels is used to select which portion of the Dataset to load, by specifying labels and their values to use as filters. See loaddataset for more info.
  • shufflelabels is an AbstractVector of names of labels to shuffle (default = [], means no shuffle).
  • rng is a random number generator to be used when shuffling (for reproducibility); can be either an Integer (used as seed for MersenneTwister) or an AbstractRNG.
source
MultiData.loaddatasetMethod
loaddataset(datasetpath; onlywithlabels = [], shufflelabels = [], rng = Random.GLOBAL_RNG)

Create a MultiDataset or a LabeledMultiDataset from a Dataset, based on the presence of file Labels.csv.

Arguments

  • datasetpath is an AbstractString that denotes the Dataset's location;
  • onlywithlabels is an AbstractVector{AbstractVector{Pair{AbstractString,AbstractVector{Any}}}} used to select which portion of the Dataset to load, by specifying labels and their values. Starting from the innermost type, each Pair{AbstractString,AbstractVector{Any}} must contain the label's name (the AbstractString) and the values for that label (the AbstractVector{Any}). Each Pair in one vector must refer to a different label, so if the Dataset has n labels in total, a vector of Pairs can contain at most n elements; this is because the elements of one vector combine with each other. Every vector of Pairs acts as a filter. Note that the same label can be used in different vectors of Pairs, as these do not combine with each other. If onlywithlabels is an empty vector (default), the function loads the entire Dataset.
  • shufflelabels is an AbstractVector of names of labels to shuffle (default = [], means no shuffle).
  • rng is a random number generator to be used when shuffling (for reproducibility); can be either an Integer (used as seed for MersenneTwister) or an AbstractRNG.
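
The nesting of onlywithlabels can be easier to follow with a toy model. The sketch below uses only Base Julia and reads the description above as follows: all Pairs inside one vector must hold together, while separate vectors act as alternative filters. That reading, and the helper names matches_filter and select_ids, are assumptions made for illustration; they are not part of MultiData.jl.

```julia
# Toy model of the filter semantics; NOT the package's implementation.
# `matches_filter` and `select_ids` are hypothetical helper names.

# One filter is a vector of `label name => allowed values` Pairs.
# An instance passes the filter only if every Pair is satisfied.
matches_filter(row, flt) =
    all(haskey(row, name) && row[name] in allowed for (name, allowed) in flt)

# An instance is selected if at least one filter accepts it.
# An empty filter list selects everything (the documented default).
function select_ids(rows, filters)
    isempty(filters) && return sort(collect(keys(rows)))
    return sort([id for (id, row) in rows if any(f -> matches_filter(row, f), filters)])
end

# Labels of three instances, keyed by instance id (made-up data).
rows = Dict(
    1 => Dict("lang" => "Python", "age" => 30),
    2 => Dict("lang" => "Julia", "age" => 9),
    3 => Dict("lang" => "C", "age" => 30),
)

filters = [
    ["age" => [30], "lang" => ["Python"]],  # age is 30 AND lang is Python
    ["lang" => ["Julia"]],                  # a second, independent filter
]

selected = select_ids(rows, filters)
```

Under this reading, instance 3 is rejected: it satisfies the age condition of the first filter but not its lang condition, and it does not match the second filter either.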

Examples

julia> df_data = DataFrame(
            :id => [1, 2, 3, 4, 5],
            :age => [30, 9, 30, 40, 9],
            :name => ["Python", "Julia", "C", "Java", "R"],
@@ -125,4 +125,4 @@
      │ Int64
 ─────┼───────
    1 │     2
-   2 │     3
source
MultiData.savedatasetMethod
savedataset(datasetpath, md; instance_ids, name, force = false)

Save md AbstractMultiDataset on disk at path datasetpath in the following format:

datasetpath
├─ Example1
│    └─ Modality1.csv
│    └─ Modality2.csv
│    └─ ...
│    └─ Modalityn.csv
│    └─ Metadata.txt
├─ Example2
│    └─ Modality1.csv
│    └─ Modality2.csv
│    └─ ...
│    └─ Modalityn.csv
│    └─ Metadata.txt
├─ ...
├─ Example_n
├─ Metadata.txt
└─ Labels.csv

Arguments

  • instance_ids is an AbstractVector{Integer} that denotes the identifiers of the instances,
  • name is an AbstractString that denotes the name of the Dataset, which will be saved in the Metadata of the Dataset,
  • force is a Bool: if it is set to true and datasetpath already exists, it will be overwritten; otherwise the operation will be aborted (default = false),
  • labels_indices is an AbstractVector{Integer} and contains the indices of the labels' column (allowed only when passing a MultiDataset)

As an alternative to an AbstractMultiDataset, a DataFrame can be passed as the second argument. In this case a third positional argument is required, representing the grouped_variables of the dataset. See MultiDataset for the syntax of grouped_variables.

source
+   2 │     3
source
MultiData.savedatasetMethod
savedataset(datasetpath, md; instance_ids, name, force = false)

Save md AbstractMultiDataset on disk at path datasetpath in the following format:

datasetpath
├─ Example1
│    └─ Modality1.csv
│    └─ Modality2.csv
│    └─ ...
│    └─ Modalityn.csv
│    └─ Metadata.txt
├─ Example2
│    └─ Modality1.csv
│    └─ Modality2.csv
│    └─ ...
│    └─ Modalityn.csv
│    └─ Metadata.txt
├─ ...
├─ Example_n
├─ Metadata.txt
└─ Labels.csv

Arguments

  • instance_ids is an AbstractVector{Integer} that denotes the identifiers of the instances,
  • name is an AbstractString that denotes the name of the Dataset, which will be saved in the Metadata of the Dataset,
  • force is a Bool: if it is set to true and datasetpath already exists, it will be overwritten; otherwise the operation will be aborted (default = false),
  • labels_indices is an AbstractVector{Integer} and contains the indices of the labels' column (allowed only when passing a MultiDataset)

As an alternative to an AbstractMultiDataset, a DataFrame can be passed as the second argument. In this case a third positional argument is required, representing the grouped_variables of the dataset. See MultiDataset for the syntax of grouped_variables.

source
diff --git a/dev/index.html b/dev/index.html
index 723a23e..f7c5b05 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -72,4 +72,4 @@
    1 │ [0.841471, 0.909297, 0.14112, -0…
    2 │ [0.540302, -0.416147, -0.989992,…

Note that each element of a MultiDataset is a SubDataFrame:

julia> eltype(md)
 SubDataFrame
-
Spare variables

Spare variables will never be seen when accessing a MultiDataset through its iterator interface. To access them see sparevariables.

+
Spare variables

Spare variables will never be seen when accessing a MultiDataset through its iterator interface. To access them see sparevariables.

diff --git a/dev/manipulation/index.html b/dev/manipulation/index.html
index 208d4ba..fc4f6ef 100644
--- a/dev/manipulation/index.html
+++ b/dev/manipulation/index.html
@@ -94,7 +94,7 @@
      │ Int64
 ─────┼────────
    1 │    180
-   2 │    175
source
MultiData.addvariable_tomodality!Method
addvariable_tomodality!(md, i_modality, var_index)
+   2 │    175
source
MultiData.addvariable_tomodality!Method
addvariable_tomodality!(md, i_modality, var_index)
 addvariable_tomodality!(md, i_modality, var_indices)
 addvariable_tomodality!(md, i_modality, var_name)
 addvariable_tomodality!(md, i_modality, var_names)

Add the variable at index var_index to the modality at index i_modality in a multimodal dataset, and return the dataset. As an alternative to var_index, the variable name can be used. Multiple variables can be inserted into the multimodal dataset at once using var_indices or var_names.

Note: The function does not allow you to add a variable to a new modality, but only to add it to an existing modality. To add a new modality use addmodality! instead.

Arguments

  • md is a MultiDataset;
  • i_modality is an Integer indicating the modality to which the variable(s) will be added;
  • var_index is an Integer that indicates the index of the variable to add to a specific modality of the multimodal dataset;
  • var_indices is an AbstractVector{Integer} indicating the indices of the variables to add to a specific modality of the multimodal dataset;
  • var_name is a Symbol indicating the name of the variable to add to a specific modality of the multimodal dataset;
  • var_names is an AbstractVector{Symbol} indicating the name of the variables to add to a specific modality of the multimodal dataset;

Examples

julia> df = DataFrame(:name => ["Python", "Julia"],
@@ -176,7 +176,7 @@
      │ Char  String  Int64
 ─────┼──────────────────────
    1 │ M     Python      80
-   2 │ F     Julia       60
source
MultiData.dropmodalities!Method
dropmodalities!(md, indices)
+   2 │ F     Julia       60
source
MultiData.dropmodalities!Method
dropmodalities!(md, indices)
 dropmodalities!(md, index)

Remove the i-th modality from a multimodal dataset while dropping all variables in it, and return the dataset itself.

Note: if the dropped variables are contained in other modalities they will also be removed from them. This can lead to the removal of additional modalities other than the i-th.

If the intention is to remove a modality without dropping the variables use removemodality! instead.
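
The difference between dropping and removing a modality can be illustrated with a toy model in plain Julia, where a dataset is just a vector of column names plus a grouping of column indices. The names remove_modality and drop_modality are hypothetical, and the renumbering of the surviving indices is an assumption of this sketch rather than a statement about the package internals.

```julia
# Toy model only; these helpers are hypothetical and not the MultiData.jl API.

# "Remove": forget the grouping of modality i; every column stays in the table.
remove_modality(cols, groups, i) =
    (cols, [g for (k, g) in enumerate(groups) if k != i])

# "Drop": delete the columns of modality i from the table, then repair the
# remaining groups, discarding any group that ends up with no variables.
function drop_modality(cols, groups, i)
    dropped = Set(groups[i])
    kept = [c for c in eachindex(cols) if !(c in dropped)]
    # Map old column indices to their positions after deletion (assumption).
    newindex = Dict(old => new for (new, old) in enumerate(kept))
    newgroups = Vector{Vector{Int}}()
    for (k, g) in enumerate(groups)
        k == i && continue
        remaining = [newindex[c] for c in g if !(c in dropped)]
        isempty(remaining) || push!(newgroups, remaining)
    end
    return cols[kept], newgroups
end

cols = [:name, :age, :sex, :height]
groups = [[1, 2], [2], [3, 4]]   # column 2 is shared by modalities 1 and 2

removed = remove_modality(cols, groups, 1)
dropped = drop_modality(cols, groups, 1)
```

In this toy run, removing modality 1 leaves all four columns in place, while dropping it deletes :name and :age and, because the second group contained only :age, that group disappears as well.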

Arguments

  • md is a MultiDataset;
  • index is an Integer indicating the index of the modality to drop;
  • indices is an AbstractVector{Integer} indicating the indices of the modalities to drop.

Examples

julia> df = DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60])
 2×5 DataFrame
  Row │ name    age    sex   height  weight
@@ -254,7 +254,7 @@
      │ String
 ─────┼────────
    1 │ Python
-   2 │ Julia
source
MultiData.eachmodalityMethod
eachmodality(md)

Return a (lazy) iterator of the modalities of a multimodal dataset.

source
MultiData.insertmodality!Function
insertmodality!(md, col, new_modality, existing_variables)
+   2 │ Julia
source
MultiData.eachmodalityMethod
eachmodality(md)

Return a (lazy) iterator of the modalities of a multimodal dataset.

source
MultiData.insertmodality!Function
insertmodality!(md, col, new_modality, existing_variables)
 insertmodality!(md, new_modality, existing_variables)

Insert new_modality as a new modality into a multimodal dataset, and return the dataset. Existing variables can be added to the new modality while adding it to the dataset by passing the corresponding indices as existing_variables. If col is specified, the variables will be inserted starting at index col.

Arguments

  • md is a MultiDataset;
  • col is an Integer indicating the column in which to insert the columns of new_modality;
  • new_modality is an AbstractDataFrame which will be added to the multimodal dataset as a sub-dataframe of a new modality;
  • existing_variables is an AbstractVector{Integer} or AbstractVector{Symbol}. It indicates which variables of the multimodal dataset internal dataframe structure to insert in the new modality.

Examples

julia> df = DataFrame(
            :name => ["Python", "Julia"],
            :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]
@@ -435,7 +435,7 @@
      │ Int64  String
 ─────┼───────────────
    1 │    30  Python
-   2 │     9  Julia
source
MultiData.keeponlymodalities!Method

TODO

source
MultiData.modalityMethod
modality(md, i)

Return the i-th modality of a multimodal dataset.

modality(md, indices)

Return a Vector of modalities at indices of a multimodal dataset.

source
MultiData.nmodalitiesMethod
nmodalities(md)

Return the number of modalities of a multimodal dataset.

source
MultiData.removemodality!Method
removemodality!(md, indices)
+   2 │     9  Julia
source
MultiData.keeponlymodalities!Method

TODO

source
MultiData.modalityMethod
modality(md, i)

Return the i-th modality of a multimodal dataset.

modality(md, indices)

Return a Vector of modalities at indices of a multimodal dataset.

source
MultiData.nmodalitiesMethod
nmodalities(md)

Return the number of modalities of a multimodal dataset.

source
MultiData.removemodality!Method
removemodality!(md, indices)
 removemodality!(md, index)

Remove the i-th modality from a multimodal dataset, and return the dataset.

Note: to completely remove a modality and all variables in it use dropmodalities! instead.

Arguments

  • md is a MultiDataset;
  • index is an Integer that indicates which modality to remove from the multimodal dataset;
  • indices is an AbstractVector{Integer} that indicates the modalities to remove from the multimodal dataset;

Examples

julia> df = DataFrame(:name => ["Python", "Julia"],
                       :age => [25, 26],
                       :sex => ['M', 'F'],
@@ -540,7 +540,7 @@
 ─────┼─────────────────────────────
    1 │ Python     25  M        180
    2 │ Julia      26  F        175
-
source
MultiData.removevariable_frommodality!Method
removevariable_frommodality!(md, i_modality, var_indices)
+
source
MultiData.removevariable_frommodality!Method
removevariable_frommodality!(md, i_modality, var_indices)
 removevariable_frommodality!(md, i_modality, var_index)
 removevariable_frommodality!(md, i_modality, var_name)
 removevariable_frommodality!(md, i_modality, var_names)

Remove variable at index var_index from the modality at index i_modality in a multimodal dataset, and return the dataset itself.

Alternatively to var_index the variable name can be used. Multiple variables can be dropped from the multimodal dataset at once, by passing a Vector of Symbols (for names), or a Vector of integers (for indices) as a last argument.

Note: when all variables are dropped from a modality, it will be removed.

Arguments

  • md is a MultiDataset;
  • i_modality is an Integer indicating the modality from which the variable(s) will be dropped;
  • var_index is an Integer that indicates the index of the variable to drop from a specific modality of the multimodal dataset;
  • var_indices is an AbstractVector{Integer} indicating the indices of the variables to drop from a specific modality of the multimodal dataset;
  • var_name is a Symbol indicating the name of the variable to drop from a specific modality of the multimodal dataset;
  • var_names is an AbstractVector{Symbol} indicating the name of the variables to drop from a specific modality of the multimodal dataset;

Examples

julia> df = DataFrame(:name => ["Python", "Julia"],
@@ -688,7 +688,7 @@
      │ String  Char  Int64   Int64
 ─────┼──────────────────────────────
    1 │ Python  M        180      80
-   2 │ Julia   F        175      60
source

Variables

MultiData.dropsparevariables!Method
dropsparevariables!(md)

Drop all variables that are not contained in any of the modalities in a multimodal dataset.

Arguments

  • md is the MultiDataset from which spare variables will be dropped.

Examples

julia> md = MultiDataset([[1]], DataFrame(:age => [30, 9], :name => ["Python", "Julia"]))
+   2 │ Julia   F        175      60
source

Variables

MultiData.dropsparevariables!Method
dropsparevariables!(md)

Drop all variables that are not contained in any of the modalities in a multimodal dataset.

Arguments

  • md is a MultiDataset, that is, the structure from which spare variables will be dropped.

Examples

julia> md = MultiDataset([[1]], DataFrame(:age => [30, 9], :name => ["Python", "Julia"]))
 ● MultiDataset
    └─ dimensionalities: (0,)
 - Modality 1 / 1
@@ -715,7 +715,7 @@
      │ String
 ─────┼────────
    1 │ Python
-   2 │ Julia
source
MultiData.dropvariables!Method
dropvariables!(md, i)
+   2 │ Julia
source
MultiData.dropvariables!Method
dropvariables!(md, i)
 dropvariables!(md, variable_name)
 dropvariables!(md, indices)
 dropvariables!(md, variable_names)
@@ -791,7 +791,7 @@
      │ Char
 ─────┼──────
    1 │ M
-   2 │ F

TODO: To be reviewed

source
MultiData.hasvariablesMethod
hasvariables(df, variable_name)
+   2 │ F

TODO: To be reviewed

source
MultiData.hasvariablesMethod
hasvariables(df, variable_name)
 hasvariables(md, i_modality, variable_name)
 hasvariables(md, variable_name)
 hasvariables(df, variable_names)
@@ -859,7 +859,7 @@
 true
 
 julia> hasvariables(md.data, [:name, :sex])
-true
source
MultiData.insertvariables!Method
insertvariables!(md, col, index, values)
+true
source
MultiData.insertvariables!Method
insertvariables!(md, col, index, values)
 insertvariables!(md, index, values)
 insertvariables!(md, col, index, value)
 insertvariables!(md, index, value)

Insert a variable in a multimodal dataset with a given index.

Note

Each inserted variable will be added as a spare variable.

Arguments

  • md is an AbstractMultiDataset;
  • col is an Integer indicating in which position to insert the new variable. If no col is passed, the new variable will be placed last in the md's underlying dataframe structure;
  • index is a Symbol that denotes the name of the variable to insert. Duplicated variable names will be renamed to avoid conflicts: see the makeunique argument for insertcols! in the DataFrames documentation;
  • values is an AbstractVector that indicates the values for the newly inserted variable. The length of values should match ninstances(md);
  • value is a single value for the new variable. If a single value is passed as a last argument this will be copied and used for each instance in the dataset.

Examples

julia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
@@ -960,7 +960,7 @@
      │ Int64   Int64   String
 ─────┼────────────────────────
    1 │    180      80  brown
-   2 │    180      75  blonde
source
MultiData.keeponlyvariables!Method
keeponlyvariables!(md, indices)
+   2 │    180      75  blonde
source
MultiData.keeponlyvariables!Method
keeponlyvariables!(md, indices)
 keeponlyvariables!(md, variable_names)

Drop all variables that do not correspond to the indices in indices from a multimodal dataset.

Note: if the dropped variables are contained in some modality they will also be removed from them; as a side effect, this can lead to the removal of modalities.

Arguments

  • md is a MultiDataset;
  • indices is an AbstractVector{Integer} that indicates which indices to keep in the multimodal dataset;
  • variable_names is an AbstractVector{Symbol} that indicates which variables to keep in the multimodal dataset.

Examples

julia> md = MultiDataset([[1, 2],[3, 4, 5],[5]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60]))
 ● MultiDataset
    └─ dimensionalities: (0, 0, 0)
@@ -1028,7 +1028,7 @@
      │ Char
 ─────┼──────
    1 │ M
-   2 │ F

TODO: review

source
MultiData.nvariablesMethod
nvariables(md)
+   2 │ F

TODO: review

source
MultiData.nvariablesMethod
nvariables(md)
 nvariables(md, i)

Return the number of variables in a multimodal dataset.

If an index i is passed as second argument, then the number of variables of the i-th modality is returned.

Alternatively, nvariables can be called on a single modality.

Arguments

  • md is a MultiDataset;
  • i (optional) is an Integer indicating the modality of the multimodal dataset whose number of variables you want to know.

Examples

julia> md = MultiDataset([[1],[2]], DataFrame(:age => [25, 26], :sex => ['M', 'F']))
 ● MultiDataset
    └─ dimensionalities: (0, 0)
@@ -1102,7 +1102,7 @@
    2 │ F        175      60
 
 julia> nvariables(mod2)
-3
source
MultiData.sparevariablesMethod
sparevariables(md)

Return the indices of all the variables that are not contained in any of the modalities of a multimodal dataset.

Arguments

  • md is the MultiDataset whose spare variable indices are to be returned.

Examples

julia> md = MultiDataset([[1],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
+3
source
MultiData.sparevariablesMethod
sparevariables(md)

Return the indices of all the variables that are not contained in any of the modalities of a multimodal dataset.

Arguments

  • md is the MultiDataset whose spare variable indices are to be returned.

Examples

julia> md = MultiDataset([[1],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
 ● MultiDataset
    └─ dimensionalities: (0, 0)
 - Modality 1 / 2
@@ -1140,7 +1140,7 @@
 
 julia> sparevariables(md)
 1-element Vector{Int64}:
- 2
source
MultiData.variableindexMethod
variableindex(df, variable_name)
+ 2
source
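
As a rough illustration of what "spare" means here, the following sketch uses plain Julia rather than MultiData.jl itself. The names `grouped` and `ncols` are hypothetical stand-ins for the grouping of column indices and the number of columns of the underlying table; the library's internals are not assumed.

```julia
# Hypothetical stand-in for the grouping held by a multimodal dataset:
# each inner vector lists the column indices that belong to one modality.
grouped = [[1], [3]]
ncols = 3  # e.g. the columns :name, :age, :sex

# Spare variables are the columns that appear in no modality.
spare = setdiff(1:ncols, reduce(vcat, grouped; init = Int[]))
```

With this grouping, only column 2 is left outside every modality, which agrees with the `sparevariables(md)` example above.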
MultiData.variableindexMethod
variableindex(df, variable_name)
 variableindex(md, i_modality, variable_name)
 variableindex(md, variable_name)

Return the index of the variable. When i_modality is passed, the function returns the index of the variable in the sub-dataframe of the modality identified by i_modality. It returns 0 when the variable is not contained in the modality identified by i_modality.

Arguments

  • df is an AbstractDataFrame;
  • md is an AbstractMultiDataset;
  • variable_name is a Symbol indicating the variable whose index you want to know;
  • i_modality is an Integer indicating the modality in which to look up the index of the variable.

Examples

julia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
 ● MultiDataset
@@ -1186,7 +1186,7 @@
 1
 
 julia> variableindex(md.data, :age)
-2
source
MultiData.variablesMethod
variables(md, i)

Return the names as Symbols of the variables in a multimodal dataset.

When called on an object of type MultiDataset, a Dict is returned that maps each modality index to an AbstractVector{Symbol}.

Note: the order of the variable names is guaranteed to match the order of the variables in the modality.

If an index i is passed as second argument, then the names of the variables of the i-th modality are returned as an AbstractVector.

Alternatively, variables can be called on a single modality.

Arguments

  • md is a MultiDataset;
  • i is an Integer indicating from which modality of the multimodal dataset to get the names of the variables.

Examples

julia> md = MultiDataset([[2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
+2
source
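
The lookup rule described for variableindex (position inside the modality, or 0 when the variable is not part of it) can be sketched in plain Julia. This is only an illustration under stated assumptions: `variableindex_sketch`, `colnames` and `grouped` are hypothetical names, not part of the MultiData.jl API.

```julia
# Hypothetical stand-ins: column names of the table and, for each modality,
# the column indices it contains.
colnames = [:name, :age, :sex]
grouped = [[1, 2], [3]]

# Return the position of `name` inside modality `i_modality`, or 0 if the
# variable is missing from the table or from that modality.
function variableindex_sketch(colnames, grouped, i_modality, name)
    col = findfirst(==(name), colnames)
    col === nothing && return 0
    return something(findfirst(==(col), grouped[i_modality]), 0)
end

variableindex_sketch(colnames, grouped, 1, :age)  # position within modality 1
variableindex_sketch(colnames, grouped, 2, :age)  # :age is not in modality 2
```

Returning 0 instead of `nothing` keeps the result an integer in every case, which matches the behavior stated in the description above.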
MultiData.variablesMethod
variables(md, i)

Return the names as Symbols of the variables in a multimodal dataset.

When called on an object of type MultiDataset, a Dict is returned that maps each modality index to an AbstractVector{Symbol}.

Note: the order of the variable names is guaranteed to match the order of the variables in the modality.

If an index i is passed as second argument, then the names of the variables of the i-th modality are returned as an AbstractVector.

Alternatively, variables can be called on a single modality.

Arguments

  • md is a MultiDataset;
  • i is an Integer indicating from which modality of the multimodal dataset to get the names of the variables.

Examples

julia> md = MultiDataset([[2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
 ● MultiDataset
    └─ dimensionalities: (0, 0)
 - Modality 1 / 2
@@ -1237,7 +1237,7 @@
 
 julia> variables(mod2)
 1-element Vector{Symbol}:
- :sex
source

Instances

MultiData.deleteinstances!Method
deleteinstances!(md, i)

Remove the i-th instance in a multimodal dataset, and return the dataset itself.

deleteinstances!(md, i_instances)

Remove the instances at i_instances in a multimodal dataset, and return the dataset itself.

source
MultiData.instanceMethod
instance(md, i)

Return the i-th instance in a multimodal dataset.

instance(md, i_modality, i_instance)

Return the i_instance-th instance in a multimodal dataset with only variables from the i_modality-th modality.

instance(md, i_instances)

Return instances at i_instances in a multimodal dataset.

instance(md, i_modality, i_instances)

Return instances at i_instances in a multimodal dataset with only variables from the i_modality-th modality.

source
MultiData.keeponlyinstances!Method
keeponlyinstances!(md, i_instances)

Remove from a multimodal dataset all instances whose index does not appear in i_instances.

source
MultiData.pushinstances!Method
pushinstances!(md, instance)

Add an instance to a multimodal dataset, and return the dataset itself.

The instance can be a DataFrameRow or an AbstractVector, but in both cases the number and types of the variables should match those of the dataset.

source
SoleBase.ninstancesMethod
ninstances(md)

Return the number of instances in a multimodal dataset.

Examples

julia> md = MultiDataset([[1],[2]],DataFrame(:age => [25, 26], :sex => ['M', 'F']))
+ :sex
source

Instances

MultiData.deleteinstances!Method
deleteinstances!(md, i)

Remove the i-th instance in a multimodal dataset, and return the dataset itself.

deleteinstances!(md, i_instances)

Remove the instances at i_instances in a multimodal dataset, and return the dataset itself.

source
MultiData.instanceMethod
instance(md, i)

Return the i-th instance in a multimodal dataset.

instance(md, i_modality, i_instance)

Return the i_instance-th instance in a multimodal dataset with only variables from the i_modality-th modality.

instance(md, i_instances)

Return instances at i_instances in a multimodal dataset.

instance(md, i_modality, i_instances)

Return instances at i_instances in a multimodal dataset with only variables from the i_modality-th modality.

source
MultiData.keeponlyinstances!Method
keeponlyinstances!(md, i_instances)

Remove from a multimodal dataset all instances whose index does not appear in i_instances.

source
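
Keeping only some instances is the complement of deleting the others. The sketch below shows that relationship on a plain vector; `rows`, `keep` and `todelete` are hypothetical names, and nothing here claims to mirror how MultiData.jl implements keeponlyinstances!.

```julia
# Hypothetical data: five instances, of which only the 2nd and 4th are kept.
rows = ["a", "b", "c", "d", "e"]
keep = [2, 4]

# The indices to delete are all indices that do not appear in `keep`.
# setdiff on a range preserves ascending order, as deleteat! requires.
todelete = setdiff(1:length(rows), keep)
deleteat!(rows, todelete)
```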
MultiData.pushinstances!Method
pushinstances!(md, instance)

Add an instance to a multimodal dataset, and return the dataset itself.

The instance can be a DataFrameRow or an AbstractVector, but in both cases the number and types of the variables should match those of the dataset.

source
SoleBase.ninstancesMethod
ninstances(md)

Return the number of instances in a multimodal dataset.

Examples

julia> md = MultiDataset([[1],[2]],DataFrame(:age => [25, 26], :sex => ['M', 'F']))
 ● MultiDataset
    └─ dimensionalities: (0, 0)
 - Modality 1 / 2
@@ -1266,4 +1266,4 @@
    2 │ F
 
 julia> ninstances(md) == ninstances(mod2) == 2
-true
source
+true
source
diff --git a/dev/utils/index.html b/dev/utils/index.html
index 910710d..874e190 100644
--- a/dev/utils/index.html
+++ b/dev/utils/index.html
@@ -1,2 +1,2 @@
-Utils · MultiData.jl

Utils

MultiData.paaFunction
paa(x; f = identity, t = (1, 0, 0))

Piecewise Aggregate Approximation

Apply the function f to each dimensionality of the array x, dividing it into t[1] windows and taking t[2] extra points on the left and t[3] extra points on the right of each window.

Note: the first window always uses t[2] = 0 and the last window always uses t[3] = 0.

source
+Utils · MultiData.jl

Utils

MultiData.paaFunction
paa(x; f = identity, t = (1, 0, 0))

Piecewise Aggregate Approximation

Apply the function f to each dimensionality of the array x, dividing it into t[1] windows and taking t[2] extra points on the left and t[3] extra points on the right of each window.

Note: the first window always uses t[2] = 0 and the last window always uses t[3] = 0.

source
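
To make the windowing concrete, here is a minimal one-dimensional sketch of the idea in plain Julia. It is not the paa implementation: `paa_sketch` is a hypothetical helper, the equal-width window boundaries are an assumption, and the positional arguments `nwindows`, `left` and `right` only play the roles of t[1], t[2] and t[3].

```julia
using Statistics: mean

# Hypothetical helper (not MultiData.paa): split `x` into `nwindows`
# contiguous windows, widen each by `left`/`right` extra points, apply `f`.
function paa_sketch(x::AbstractVector, f, nwindows::Integer,
                    left::Integer = 0, right::Integer = 0)
    n = length(x)
    1 <= nwindows <= n || throw(ArgumentError("nwindows must be in 1:length(x)"))
    # Assumed boundaries: window w covers bounds[w]+1 : bounds[w+1].
    bounds = [div(w * n, nwindows) for w in 0:nwindows]
    return map(1:nwindows) do w
        # The first window takes no extra points on the left,
        # the last window takes none on the right.
        lo = w == 1 ? bounds[w] + 1 : max(1, bounds[w] + 1 - left)
        hi = w == nwindows ? bounds[w + 1] : min(n, bounds[w + 1] + right)
        f(view(x, lo:hi))
    end
end

x = collect(1.0:8.0)
plain = paa_sketch(x, mean, 4)          # four non-overlapping windows
widened = paa_sketch(x, mean, 4, 1, 1)  # one extra point on each side
```

Passing an aggregating function such as `mean` reduces each window to a single value; the extra points make neighboring windows overlap, which smooths the result near window edges.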