From 4c2a5c3b4703d88cb69431b8c8f18eda96a99e45 Mon Sep 17 00:00:00 2001
From: "Documenter.jl"
Date: Wed, 14 Feb 2024 16:09:52 +0000
Subject: [PATCH] build based on 438e43d

---
 dev/.documenter-siteinfo.json |  2 +-
 dev/datasets/index.html       | 14 +++++++-------
 dev/description/index.html    |  2 +-
 dev/filesystem/index.html     |  4 ++--
 dev/index.html                |  2 +-
 dev/manipulation/index.html   | 32 ++++++++++++++++----------------
 dev/search_index.js           |  2 +-
 dev/utils/index.html          |  2 +-
 8 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index 35814a8..7a6bbe1 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2024-02-02T14:14:46","documenter_version":"1.2.1"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2024-02-14T16:09:49","documenter_version":"1.2.1"}}
\ No newline at end of file
diff --git a/dev/datasets/index.html b/dev/datasets/index.html
index 312ccac..144e76a 100644
--- a/dev/datasets/index.html
+++ b/dev/datasets/index.html
@@ -1,5 +1,5 @@
-Datasets · MultiData.jl

Datasets

A machine learning dataset is a collection of instances (or samples), each one described by a number of variables. In the case of tabular data, a dataset looks like a database table, where every column is a variable, and each row corresponds to a given instance. However, a dataset can also be non-tabular; for example, each instance can consist of a multivariate time-series, or an image.

When data is composed of different modalities, combining their statistical properties is non-trivial, since the modalities may differ considerably in nature from one another.

The abstract representation of a multimodal dataset provided by this package is the AbstractMultiDataset.

MultiData.AbstractMultiDatasetType

Abstract supertype for all multimodal datasets.

A concrete multimodal dataset should always provide accessors data, to access the underlying tabular structure (e.g., DataFrame) and grouped_variables, to access the grouping of variables (a vector of vectors of column indices).

source
MultiData.grouped_variablesFunction
grouped_variables(amd)::Vector{Vector{Int}}

Return the indices of the variables grouped by modality, of an AbstractMultiDataset. The grouping describes how the different modalities are composed from the underlying AbstractDataFrame structure.

See also data, AbstractMultiDataset.

source
SoleBase.dimensionalityFunction
dimensionality(df)

Return the dimensionality of a dataframe df.

If the dataframe has variables of various dimensionalities, :mixed is returned.

If the dataframe is empty (no instances), :empty is returned. The handling of mixed dimensionalities can be controlled by setting the keyword argument force:

  • :no (default): return :mixed in case of mixed dimensionality
  • :max: return the greatest dimensionality
  • :min: return the lowest dimensionality
source

Unlabeled Datasets

In unlabeled datasets there is no labeling variable, and all of the variables (also called feature variables, or features) play an equal role in the representation. These datasets are used in unsupervised learning contexts, for discovering internal correlation patterns between the features. Multimodal unlabeled datasets can be instantiated with MultiDataset.

MultiData.MultiDatasetType
MultiDataset(df, grouped_variables)

Create a MultiDataset from an AbstractDataFrame df, initializing its modalities according to the grouping in grouped_variables.

grouped_variables is an AbstractVector of variable groupings, each of which is an AbstractVector of integers holding the indices of the variables selected for that modality.

Note that the order matters for both the modalities and the variables.

julia> df = DataFrame(
+Datasets · MultiData.jl

Datasets

A machine learning dataset is a collection of instances (or samples), each one described by a number of variables. In the case of tabular data, a dataset looks like a database table, where every column is a variable, and each row corresponds to a given instance. However, a dataset can also be non-tabular; for example, each instance can consist of a multivariate time-series, or an image.

When data is composed of different modalities, combining their statistical properties is non-trivial, since the modalities may differ considerably in nature from one another.

The abstract representation of a multimodal dataset provided by this package is the AbstractMultiDataset.

MultiData.AbstractMultiDatasetType

Abstract supertype for all multimodal datasets.

A concrete multimodal dataset should always provide accessors data, to access the underlying tabular structure (e.g., DataFrame) and grouped_variables, to access the grouping of variables (a vector of vectors of column indices).

source
MultiData.grouped_variablesFunction
grouped_variables(amd)::Vector{Vector{Int}}

Return the indices of the variables grouped by modality, of an AbstractMultiDataset. The grouping describes how the different modalities are composed from the underlying AbstractDataFrame structure.

See also data, AbstractMultiDataset.

source
SoleBase.dimensionalityFunction
dimensionality(df)

Return the dimensionality of a dataframe df.

If the dataframe has variables of various dimensionalities, :mixed is returned.

If the dataframe is empty (no instances), :empty is returned. The handling of mixed dimensionalities can be controlled by setting the keyword argument force:

  • :no (default): return :mixed in case of mixed dimensionality
  • :max: return the greatest dimensionality
  • :min: return the lowest dimensionality
source
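
The effect of the force options can be illustrated with a small plain-Julia sketch. It does not call SoleBase: `column_dimensionality` and `sketch_dimensionality` are hypothetical helpers, and taking a column's dimensionality to be `ndims` of its first element is an assumption made only for this illustration.

```julia
# Hypothetical helper, not the SoleBase implementation: the dimensionality of a
# column is assumed to be the number of array dimensions of its first element
# (0 for scalars, 1 for vectors such as time series, 2 for matrices such as images).
column_dimensionality(col::AbstractVector) = ndims(first(col))

function sketch_dimensionality(columns; force::Symbol = :no)
    # No columns, or no instances in any column: nothing to inspect.
    (isempty(columns) || all(isempty, columns)) && return :empty
    dims = unique(map(column_dimensionality, columns))
    length(dims) == 1 && return only(dims)   # all columns agree
    force == :max && return maximum(dims)
    force == :min && return minimum(dims)
    return :mixed                            # force == :no
end

age   = [30, 9]                  # scalar values
stat1 = [sin.(1:5), cos.(1:5)]   # one vector per instance

sketch_dimensionality([age])                        # 0
sketch_dimensionality([age, stat1])                 # :mixed
sketch_dimensionality([age, stat1]; force = :max)   # 1
```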

Unlabeled Datasets

In unlabeled datasets there is no labeling variable, and all of the variables (also called feature variables, or features) play an equal role in the representation. These datasets are used in unsupervised learning contexts, for discovering internal correlation patterns between the features. Multimodal unlabeled datasets can be instantiated with MultiDataset.

MultiData.MultiDatasetType
MultiDataset(df, grouped_variables)

Create a MultiDataset from an AbstractDataFrame df, initializing its modalities according to the grouping in grouped_variables.

grouped_variables is an AbstractVector of variable groupings, each of which is an AbstractVector of integers holding the indices of the variables selected for that modality.

Note that the order matters for both the modalities and the variables.
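
To make the shape of grouped_variables concrete, here is a minimal sketch in plain Julia. It uses a NamedTuple of vectors as a stand-in for the dataframe rather than DataFrames.jl, and `modality_columns` is a hypothetical helper, not part of MultiData.

```julia
# Column-oriented toy table standing in for the underlying dataframe
# (a NamedTuple of vectors, not a DataFrames.jl object).
table = (
    id    = [1, 2],
    age   = [30, 9],
    name  = ["Python", "Julia"],
    stat1 = [sin.(1:3), cos.(1:3)],
)

# One inner vector of column indices per modality.
grouped_variables = [[2, 3], [4]]

# Hypothetical helper: resolve each group of indices into column names,
# preserving the order of both the modalities and the variables.
modality_columns(table, groups) = [collect(keys(table))[g] for g in groups]

modality_columns(table, grouped_variables)   # [[:age, :name], [:stat1]]
modality_columns(table, [[4], [3, 2]])       # [[:stat1], [:name, :age]]
```

The second call shows why order matters: the same columns grouped in a different order yield different modalities.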

julia> df = DataFrame(
                   :age => [30, 9],
                   :name => ["Python", "Julia"],
                   :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]],
@@ -103,7 +103,7 @@
      │ Array…
 ─────┼───────────────────────────────────
    1 │ [0.540302, -0.416147, -0.989992,…
-   2 │ [0.841471, 0.909297, 0.14112, -0…
source
MultiData._emptyMethod
_empty(md)

Return a copy of a multimodal dataset with no instances.

Note: since the returned AbstractMultiDataset is empty, its column types will be Any.

source

Labeled Datasets

In labeled datasets, one or more variables are considered to have special semantics with respect to the other variables; each of these labeling variables (or target variables) can be thought of as assigning a label to each instance, which is typically a categorical value (classification label) or a numerical value (regression label). Supervised learning methods can be applied to these datasets for modeling the target variables as a function of the feature variables.

As an extension of the AbstractMultiDataset, AbstractLabeledMultiDataset has an interface that can be implemented to represent multimodal labeled datasets.

MultiData.AbstractLabeledMultiDatasetType

Abstract supertype for all labelled multimodal datasets (used in supervised learning).

Like any multimodal dataset, any concrete labeled multimodal dataset should always provide the accessors data, to access the underlying tabular structure (e.g., DataFrame), and grouped_variables, to access the grouping of variables. In addition to these, an implementation is required for labeling_variables, to access the indices of the labeling variables.

See also AbstractMultiDataset.

source
Missing docstring.

Missing docstring for dataset. Check Documenter's build log for details.

Multimodal labeled datasets can be instantiated with LabeledMultiDataset.

MultiData.LabeledMultiDatasetType
LabeledMultiDataset(md, labeling_variables)

Create a LabeledMultiDataset by associating an AbstractMultiDataset with some labeling variables, specified as a column index (Int) or a vector of column indices (Vector{Int}).

Arguments

  • md is the original AbstractMultiDataset;
  • labeling_variables is an AbstractVector of integers indicating the indices of the variables that will be set as labels.

Examples

julia> lmd = LabeledMultiDataset(MultiDataset([[2],[4]], DataFrame(
+   2 │ [0.841471, 0.909297, 0.14112, -0…
source
MultiData._emptyMethod
_empty(md)

Return a copy of a multimodal dataset with no instances.

Note: since the returned AbstractMultiDataset is empty, its column types will be Any.

source

Labeled Datasets

In labeled datasets, one or more variables are considered to have special semantics with respect to the other variables; each of these labeling variables (or target variables) can be thought of as assigning a label to each instance, which is typically a categorical value (classification label) or a numerical value (regression label). Supervised learning methods can be applied to these datasets for modeling the target variables as a function of the feature variables.

As an extension of the AbstractMultiDataset, AbstractLabeledMultiDataset has an interface that can be implemented to represent multimodal labeled datasets.

MultiData.AbstractLabeledMultiDatasetType

Abstract supertype for all labeled multimodal datasets (used in supervised learning).

Like any multimodal dataset, any concrete labeled multimodal dataset should always provide the accessors data, to access the underlying tabular structure (e.g., DataFrame), and grouped_variables, to access the grouping of variables. In addition to these, an implementation is required for labeling_variables, to access the indices of the labeling variables.

See also AbstractMultiDataset.

source
Missing docstring.

Missing docstring for dataset. Check Documenter's build log for details.

Multimodal labeled datasets can be instantiated with LabeledMultiDataset.

MultiData.LabeledMultiDatasetType
LabeledMultiDataset(md, labeling_variables)

Create a LabeledMultiDataset by associating an AbstractMultiDataset with some labeling variables, specified as a column index (Int) or a vector of column indices (Vector{Int}).

Arguments

  • md is the original AbstractMultiDataset;
  • labeling_variables is an AbstractVector of integers indicating the indices of the variables that will be set as labels.

Examples

julia> lmd = LabeledMultiDataset(MultiDataset([[2],[4]], DataFrame(
            :id => [1, 2],
            :age => [30, 9],
            :name => ["Python", "Julia"],
@@ -130,7 +130,7 @@
 ─────┼───────────────────────────────────
    1 │ [0.841471, 0.909297, 0.14112, -0…
    2 │ [0.540302, -0.416147, -0.989992,…
-
source
MultiData.joinlabels!Method
joinlabels!(lmd, [lbls...]; delim = "_")

On a labeled multimodal dataset, collapse the labeling variables identified by lbls into a single labeling variable of type String, by means of a join that uses delim as the string delimiter.

Unless specified otherwise, this function joins all labels.

lbls can be an Integer indicating the index of the label, or a Symbol indicating the name of the labeling variable.

Note

The resulting labels will always be of type String.

Note

The resulting labeling variable will always be added as the last column in the underlying DataFrame.

Examples

julia> lmd = LabeledMultiDataset(
+
source
MultiData.joinlabels!Method
joinlabels!(lmd, [lbls...]; delim = "_")

On a labeled multimodal dataset, collapse the labeling variables identified by lbls into a single labeling variable of type String, by means of a join that uses delim as the string delimiter.

Unless specified otherwise, this function joins all labels.

lbls can be an Integer indicating the index of the label, or a Symbol indicating the name of the labeling variable.

Note

The resulting labels will always be of type String.

Note

The resulting labeling variable will always be added as the last column in the underlying DataFrame.
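
The join itself can be sketched on plain vectors. This is only an illustration of the described behavior, not the joinlabels! implementation; `join_label_columns` is a hypothetical helper that works on bare label columns instead of a labeled multimodal dataset.

```julia
# Hypothetical helper mirroring the described behavior on plain vectors:
# for each instance, the label values are converted to strings and joined
# with `delim`, so the result is always a Vector{String}.
join_label_columns(cols...; delim = "_") =
    [join(string.(vals), delim) for vals in zip(cols...)]

name = ["Python", "Julia"]
age  = [30, 9]

join_label_columns(name, age)                # ["Python_30", "Julia_9"]
join_label_columns(name, age; delim = "-")   # ["Python-30", "Julia-9"]
```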

Examples

julia> lmd = LabeledMultiDataset(
            MultiDataset(
                [[2],[4]],
                DataFrame(
@@ -185,7 +185,7 @@
      │ Array…
 ─────┼───────────────────────────────────
    1 │ [0.841471, 0.909297, 0.14112, -0…
-   2 │ [0.540302, -0.416147, -0.989992,…
source
MultiData.labelMethod
label(lmd, j, i)

Return the value of the i-th labeling variable for the instance at index j in a labeled multimodal dataset.

source
MultiData.labelsMethod
labels(lmd, i_instance)
-labels(lmd)

Return the labels of the instance at index i_instance in a labeled multimodal dataset. A dictionary of type labelname => value is returned.

If only the first argument is passed then the labels for all instances are returned.

source
MultiData.setaslabeling!Method
setaslabeling!(lmd, i)
-setaslabeling!(lmd, var_name)

Set the i-th variable as a label.

The variable name can be passed as second argument instead of its index.

source
MultiData.unsetaslabeling!Method
unsetaslabeling!(lmd, i)
-unsetaslabeling!(lmd, var_name)

Remove the i-th labeling variable from the list of labels.

The variable name can be passed as second argument instead of its index.

source
+   2 │ [0.540302, -0.416147, -0.989992,…
source
MultiData.labelMethod
label(lmd, j, i)

Return the value of the i-th labeling variable for the instance at index j in a labeled multimodal dataset.

source
MultiData.labelsMethod
labels(lmd, i_instance)
+labels(lmd)

Return the labels of the instance at index i_instance in a labeled multimodal dataset. A dictionary of type labelname => value is returned.

If only the first argument is passed then the labels for all instances are returned.

source
MultiData.setaslabeling!Method
setaslabeling!(lmd, i)
+setaslabeling!(lmd, var_name)

Set the i-th variable as a label.

The variable name can be passed as second argument instead of its index.

source
MultiData.unsetaslabeling!Method
unsetaslabeling!(lmd, i)
+unsetaslabeling!(lmd, var_name)

Remove the i-th labeling variable from the list of labels.

The variable name can be passed as second argument instead of its index.

source
diff --git a/dev/description/index.html b/dev/description/index.html
index 636409d..f31a100 100644
--- a/dev/description/index.html
+++ b/dev/description/index.html
@@ -27,4 +27,4 @@
    1 │ stat AbstractFloat[8.63372e-6; -2.848… AbstractFloat[-1.0; -1.0 ⋯
 5 columns omitted

The describe implementation for MultiDatasets tries to pick the statistical measures best suited to the type of data each modality contains.

In the example, the 2nd modality contains variables (just one here) of type Vector{Float64}. Because these vectors are time series, the modality is described with the well-known 22 features from the package Catch22.jl, plus maximum, minimum and mean.

DataAPI.describeFunction
describe(md; t = fill([(1, 0, 0)], nmodalities(md)), kwargs...)

Return descriptive statistics for an AbstractMultiDataset as a Vector of new DataFrames where each row represents a variable and each column a summary statistic.

Arguments

  • md: the AbstractMultiDataset;
  • t: is a vector of nmodalities elements, where each element is a vector as long as the dimensionality of
the i-th modality. Each element of the innermost vector is a tuple
-of arguments for [`paa`](@ref).

For the other keyword arguments, see the documentation of the DataFrames.describe function.

Examples

TODO: examples

source
+of arguments for [`paa`](@ref).

For the other keyword arguments, see the documentation of the DataFrames.describe function.

Examples

TODO: examples

source
diff --git a/dev/filesystem/index.html b/dev/filesystem/index.html
index 35876aa..550fb3d 100644
--- a/dev/filesystem/index.html
+++ b/dev/filesystem/index.html
@@ -1,5 +1,5 @@
-Filesystem · MultiData.jl

Filesystem

MultiData.datasetinfoMethod
datasetinfo(datasetpath; onlywithlabels = [], shufflelabels = [], rng = Random.GLOBAL_RNG)

Show the dataset size on disk and return a Tuple whose first element is a vector of the selected IDs, whose second element is the labels DataFrame (or nothing), and whose third element is the total size in bytes.

Arguments

  • onlywithlabels is used to select which portion of the Dataset to load, by specifying labels and their values to use as filters. See loaddataset for more info.
  • shufflelabels is an AbstractVector of names of labels to shuffle (default = [], means no shuffle).
  • rng is a random number generator to be used when shuffling (for reproducibility); can be either an Integer (used as seed for MersenneTwister) or an AbstractRNG.
source
MultiData.loaddatasetMethod
loaddataset(datasetpath; onlywithlabels = [], shufflelabels = [], rng = Random.GLOBAL_RNG)

Create a MultiDataset or a LabeledMultiDataset from a Dataset, based on the presence of file Labels.csv.

Arguments

  • datasetpath is an AbstractString that denotes the Dataset's location;
  • onlywithlabels is an AbstractVector{AbstractVector{Pair{AbstractString,AbstractVector{Any}}}} used to select which portion of the Dataset to load, by specifying labels and their values. Starting from the innermost type, each Pair{AbstractString,AbstractVector{Any}} must contain the label's name (the AbstractString) and the values for that label (the AbstractVector{Any}). Each Pair in one vector must refer to a different label, so if the Dataset has n labels in total, a vector of Pairs can contain at most n elements; this is because the Pairs within one vector combine with each other. Every vector of Pairs acts as a filter. Note that the same label can be used in different vectors of Pairs, as they do not combine with each other. If onlywithlabels is an empty vector (default), the function loads the entire Dataset.
  • shufflelabels is an AbstractVector of names of labels to shuffle (default = [], means no shuffle).
  • rng is a random number generator to be used when shuffling (for reproducibility); can be either an Integer (used as seed for MersenneTwister) or an AbstractRNG.

Examples

julia> df_data = DataFrame(
+Filesystem · MultiData.jl

Filesystem

MultiData.datasetinfoMethod
datasetinfo(datasetpath; onlywithlabels = [], shufflelabels = [], rng = Random.GLOBAL_RNG)

Show the dataset size on disk and return a Tuple whose first element is a vector of the selected IDs, whose second element is the labels DataFrame (or nothing), and whose third element is the total size in bytes.

Arguments

  • onlywithlabels is used to select which portion of the Dataset to load, by specifying labels and their values to use as filters. See loaddataset for more info.
  • shufflelabels is an AbstractVector of names of labels to shuffle (default = [], means no shuffle).
  • rng is a random number generator to be used when shuffling (for reproducibility); can be either an Integer (used as seed for MersenneTwister) or an AbstractRNG.
source
MultiData.loaddatasetMethod
loaddataset(datasetpath; onlywithlabels = [], shufflelabels = [], rng = Random.GLOBAL_RNG)

Create a MultiDataset or a LabeledMultiDataset from a Dataset, based on the presence of file Labels.csv.

Arguments

  • datasetpath is an AbstractString that denotes the Dataset's location;
  • onlywithlabels is an AbstractVector{AbstractVector{Pair{AbstractString,AbstractVector{Any}}}} used to select which portion of the Dataset to load, by specifying labels and their values. Starting from the innermost type, each Pair{AbstractString,AbstractVector{Any}} must contain the label's name (the AbstractString) and the values for that label (the AbstractVector{Any}). Each Pair in one vector must refer to a different label, so if the Dataset has n labels in total, a vector of Pairs can contain at most n elements; this is because the Pairs within one vector combine with each other. Every vector of Pairs acts as a filter. Note that the same label can be used in different vectors of Pairs, as they do not combine with each other. If onlywithlabels is an empty vector (default), the function loads the entire Dataset.
  • shufflelabels is an AbstractVector of names of labels to shuffle (default = [], means no shuffle).
  • rng is a random number generator to be used when shuffling (for reproducibility); can be either an Integer (used as seed for MersenneTwister) or an AbstractRNG.
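
The nested structure of onlywithlabels may be easier to read as code. The sketch below is plain Julia and does not call loaddataset; `matches` and `select_ids` are hypothetical helpers. It assumes that the Pairs inside one filter must all match (they "combine"), and that an instance is kept when at least one filter matches; the second point is an interpretation of the text above, not something the docstring states explicitly.

```julia
# Toy label table: one NamedTuple per instance (illustrative data).
instances = [
    (id = 1, age = 30, name = "Python"),
    (id = 2, age = 9,  name = "Julia"),
    (id = 3, age = 30, name = "C"),
    (id = 4, age = 40, name = "Java"),
    (id = 5, age = 9,  name = "R"),
]

# A filter is a vector of `label name => allowed values` Pairs;
# every Pair in the filter must match the instance.
matches(inst, flt) =
    all((getproperty(inst, Symbol(lbl)) in vals) for (lbl, vals) in flt)

# Assumption: an instance is selected when at least one filter matches,
# and an empty list of filters selects everything.
select_ids(instances, filters) =
    [inst.id for inst in instances
     if isempty(filters) || any(f -> matches(inst, f), filters)]

onlywithlabels = [
    ["age" => [9], "name" => ["Julia"]],   # age is 9 AND name is "Julia"
    ["age" => [30]],                       # age is 30
]

select_ids(instances, onlywithlabels)   # [1, 2, 3]
select_ids(instances, [])               # [1, 2, 3, 4, 5]
```

Note how the label "age" appears in both filters without conflict, while no single filter mentions the same label twice.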

Examples

julia> df_data = DataFrame(
            :id => [1, 2, 3, 4, 5],
            :age => [30, 9, 30, 40, 9],
            :name => ["Python", "Julia", "C", "Java", "R"],
@@ -125,4 +125,4 @@
      │ Int64
 ─────┼───────
    1 │     2
-   2 │     3
source
MultiData.savedatasetMethod
savedataset(datasetpath, md; instance_ids, name, force = false)

Save the AbstractMultiDataset md to disk at path datasetpath, in the following format:

datasetpath
├─ Example1
│   └─ Modality1.csv
│   └─ Modality2.csv
│   └─ ...
│   └─ Modalityn.csv
│   └─ Metadata.txt
├─ Example2
│   └─ Modality1.csv
│   └─ Modality2.csv
│   └─ ...
│   └─ Modalityn.csv
│   └─ Metadata.txt
├─ ...
├─ Example_n
├─ Metadata.txt
└─ Labels.csv

Arguments

  • instance_ids is an AbstractVector{Integer} that denotes the identifiers of the instances;
  • name is an AbstractString that denotes the name of the Dataset, which will be saved in the Metadata of the Dataset;
  • force is a Bool (default = false): if it is true and datasetpath already exists, it will be overwritten; otherwise the operation will be aborted;
  • labels_indices is an AbstractVector{Integer} containing the indices of the label columns (allowed only when passing a MultiDataset).

Instead of an AbstractMultiDataset, a DataFrame can be passed as the second argument. In that case, a third positional argument is required, representing the grouped_variables of the dataset. See MultiDataset for the syntax of grouped_variables.

source
+   2 │     3
source
MultiData.savedatasetMethod
savedataset(datasetpath, md; instance_ids, name, force = false)

Save the AbstractMultiDataset md to disk at path datasetpath, in the following format:

datasetpath
├─ Example1
│   └─ Modality1.csv
│   └─ Modality2.csv
│   └─ ...
│   └─ Modalityn.csv
│   └─ Metadata.txt
├─ Example2
│   └─ Modality1.csv
│   └─ Modality2.csv
│   └─ ...
│   └─ Modalityn.csv
│   └─ Metadata.txt
├─ ...
├─ Example_n
├─ Metadata.txt
└─ Labels.csv

Arguments

  • instance_ids is an AbstractVector{Integer} that denotes the identifiers of the instances;
  • name is an AbstractString that denotes the name of the Dataset, which will be saved in the Metadata of the Dataset;
  • force is a Bool (default = false): if it is true and datasetpath already exists, it will be overwritten; otherwise the operation will be aborted;
  • labels_indices is an AbstractVector{Integer} containing the indices of the label columns (allowed only when passing a MultiDataset).

Instead of an AbstractMultiDataset, a DataFrame can be passed as the second argument. In that case, a third positional argument is required, representing the grouped_variables of the dataset. See MultiDataset for the syntax of grouped_variables.

source
diff --git a/dev/index.html b/dev/index.html
index 9489f49..f6afeee 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -72,4 +72,4 @@
    1 │ [0.841471, 0.909297, 0.14112, -0…
    2 │ [0.540302, -0.416147, -0.989992,…

Note that each element of a MultiDataset is a SubDataFrame:

julia> eltype(md)
 SubDataFrame
-
Spare variables

Spare variables will never be seen when accessing a MultiDataset through its iterator interface. To access them see sparevariables.

+
Spare variables

Spare variables will never be seen when accessing a MultiDataset through its iterator interface. To access them see sparevariables.
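
As a rough illustration, spare variables can be thought of as the columns that no modality references (the dropsparevariables! docstring describes them as variables not contained in any modality). The sketch below is plain Julia and does not use the package; `spare_indices` is a hypothetical helper, not the sparevariables function.

```julia
# Hypothetical helper: indices of the columns that appear in no modality.
# `ncols` is the number of columns of the underlying table and `groups`
# is the grouping of variables (one vector of column indices per modality).
spare_indices(ncols::Integer, groups) =
    setdiff(1:ncols, reduce(vcat, groups; init = Int[]))

spare_indices(4, [[2], [4]])         # [1, 3]
spare_indices(4, [[1, 2], [3, 4]])   # Int[] (no spare variables)
```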

diff --git a/dev/manipulation/index.html b/dev/manipulation/index.html
index 2e2d48a..6c943a7 100644
--- a/dev/manipulation/index.html
+++ b/dev/manipulation/index.html
@@ -94,7 +94,7 @@
      │ Int64
 ─────┼────────
    1 │    180
-   2 │    175
source
MultiData.addvariable_tomodality!Method
addvariable_tomodality!(md, i_modality, var_index)
+   2 │    175
source
MultiData.addvariable_tomodality!Method
addvariable_tomodality!(md, i_modality, var_index)
 addvariable_tomodality!(md, i_modality, var_indices)
 addvariable_tomodality!(md, i_modality, var_name)
 addvariable_tomodality!(md, i_modality, var_names)

Add the variable at index var_index to the modality at index i_modality in a multimodal dataset, and return the dataset. Instead of var_index, the variable name can be used. Multiple variables can be inserted into the multimodal dataset at once using var_indices or var_names.

Note: The function does not allow you to add a variable to a new modality, but only to add it to an existing modality. To add a new modality use addmodality! instead.

Arguments

  • md is a MultiDataset;
  • i_modality is an Integer indicating the modality to which the variable(s) will be added;
  • var_index is an Integer that indicates the index of the variable to add to a specific modality of the multimodal dataset;
  • var_indices is an AbstractVector{Integer} indicating the indices of the variables to add to a specific modality of the multimodal dataset;
  • var_name is a Symbol indicating the name of the variable to add to a specific modality of the multimodal dataset;
  • var_names is an AbstractVector{Symbol} indicating the names of the variables to add to a specific modality of the multimodal dataset.

Examples

julia> df = DataFrame(:name => ["Python", "Julia"],
@@ -176,7 +176,7 @@
      │ Char  String  Int64
 ─────┼──────────────────────
    1 │ M     Python      80
-   2 │ F     Julia       60
source
MultiData.dropmodalities!Method
dropmodalities!(md, indices)
+   2 │ F     Julia       60
source
MultiData.dropmodalities!Method
dropmodalities!(md, indices)
 dropmodalities!(md, index)

Remove the i-th modality from a multimodal dataset while dropping all variables in it, and return the dataset itself.

Note: if the dropped variables are contained in other modalities they will also be removed from them. This can lead to the removal of additional modalities other than the i-th.

If the intention is to remove a modality without dropping the variables use removemodality! instead.

Arguments

  • md is a MultiDataset;
  • index is an Integer indicating the index of the modality to drop;
  • indices is an AbstractVector{Integer} indicating the indices of the modalities to drop.

Examples

julia> df = DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60])
 2×5 DataFrame
  Row │ name    age    sex   height  weight
@@ -254,7 +254,7 @@
      │ String
 ─────┼────────
    1 │ Python
-   2 │ Julia
source
MultiData.eachmodalityMethod
eachmodality(md)

Return a (lazy) iterator of the modalities of a multimodal dataset.

source
MultiData.insertmodality!Function
insertmodality!(md, col, new_modality, existing_variables)
+   2 │ Julia
source
MultiData.eachmodalityMethod
eachmodality(md)

Return a (lazy) iterator of the modalities of a multimodal dataset.

source
MultiData.insertmodality!Function
insertmodality!(md, col, new_modality, existing_variables)
 insertmodality!(md, new_modality, existing_variables)

Insert new_modality as a new modality in a multimodal dataset, and return the dataset. Existing variables can be added to the new modality while adding it to the dataset by passing the corresponding indices as existing_variables. If col is specified, the variables will be inserted starting at index col.

Arguments

  • md is a MultiDataset;
  • col is an Integer indicating the column in which to insert the columns of new_modality;
  • new_modality is an AbstractDataFrame which will be added to the multimodal dataset as a sub-dataframe of a new modality;
  • existing_variables is an AbstractVector{Integer} or AbstractVector{Symbol}. It indicates which variables of the multimodal dataset internal dataframe structure to insert in the new modality.

Examples

julia> df = DataFrame(
            :name => ["Python", "Julia"],
            :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]
@@ -435,7 +435,7 @@
      │ Int64  String
 ─────┼───────────────
    1 │    30  Python
-   2 │     9  Julia
source
MultiData.keeponlymodalities!Method

TODO

source
MultiData.modalityMethod
modality(md, i)

Return the i-th modality of a multimodal dataset.

modality(md, indices)

Return a Vector of modalities at indices of a multimodal dataset.

source
MultiData.nmodalitiesMethod
nmodalities(md)

Return the number of modalities of a multimodal dataset.

source
MultiData.removemodality!Method
removemodality!(md, indices)
+   2 │     9  Julia
source
MultiData.keeponlymodalities!Method

TODO

source
MultiData.modalityMethod
modality(md, i)

Return the i-th modality of a multimodal dataset.

modality(md, indices)

Return a Vector of modalities at indices of a multimodal dataset.

source
MultiData.nmodalitiesMethod
nmodalities(md)

Return the number of modalities of a multimodal dataset.

source
MultiData.removemodality!Method
removemodality!(md, indices)
 removemodality!(md, index)

Remove the i-th modality from a multimodal dataset, and return the dataset.

Note: to completely remove a modality and all variables in it use dropmodalities! instead.

Arguments

  • md is a MultiDataset;
  • index is an Integer that indicates which modality to remove from the multimodal dataset;
  • indices is an AbstractVector{Integer} that indicates the modalities to remove from the multimodal dataset;

Examples

julia> df = DataFrame(:name => ["Python", "Julia"],
                       :age => [25, 26],
                       :sex => ['M', 'F'],
@@ -540,7 +540,7 @@
 ─────┼─────────────────────────────
    1 │ Python     25  M        180
    2 │ Julia      26  F        175
-
source
MultiData.removevariable_frommodality!Method
removevariable_frommodality!(md, i_modality, var_indices)
+
source
MultiData.removevariable_frommodality!Method
removevariable_frommodality!(md, i_modality, var_indices)
 removevariable_frommodality!(md, i_modality, var_index)
 removevariable_frommodality!(md, i_modality, var_name)
 removevariable_frommodality!(md, i_modality, var_names)

Remove variable at index var_index from the modality at index i_modality in a multimodal dataset, and return the dataset itself.

Alternatively to var_index the variable name can be used. Multiple variables can be dropped from the multimodal dataset at once, by passing a Vector of Symbols (for names), or a Vector of integers (for indices) as a last argument.

Note: when all variables are dropped from a modality, it will be removed.

Arguments

  • md is a MultiDataset;
  • i_modality is an Integer indicating the modality from which the variable(s) will be dropped;
  • var_index is an Integer that indicates the index of the variable to drop from a specific modality of the multimodal dataset;
  • var_indices is an AbstractVector{Integer} indicating the indices of the variables to drop from a specific modality of the multimodal dataset;
  • var_name is a Symbol indicating the name of the variable to drop from a specific modality of the multimodal dataset;
  • var_names is an AbstractVector{Symbol} indicating the names of the variables to drop from a specific modality of the multimodal dataset.

Examples

julia> df = DataFrame(:name => ["Python", "Julia"],
@@ -688,7 +688,7 @@
      │ String  Char  Int64   Int64
 ─────┼──────────────────────────────
    1 │ Python  M        180      80
-   2 │ Julia   F        175      60
source

Variables

MultiData.dropsparevariables!Method
dropsparevariables!(md)

Drop all variables that are not contained in any of the modalities in a multimodal dataset.

Arguments

  • md is the MultiDataset from which the spare variables will be dropped.

Examples

julia> md = MultiDataset([[1]], DataFrame(:age => [30, 9], :name => ["Python", "Julia"]))
+   2 │ Julia   F        175      60
source

Variables

MultiData.dropsparevariables!Method
dropsparevariables!(md)

Drop all variables that are not contained in any of the modalities in a multimodal dataset.

Arguments

  • md is the MultiDataset whose spare variables will be dropped.

Examples

julia> md = MultiDataset([[1]], DataFrame(:age => [30, 9], :name => ["Python", "Julia"]))
 ● MultiDataset
    └─ dimensionalities: (0,)
 - Modality 1 / 1
@@ -715,7 +715,7 @@
      │ String
 ─────┼────────
    1 │ Python
-   2 │ Julia
source
MultiData.dropvariables!Method
dropvariables!(md, i)
+   2 │ Julia
source
MultiData.dropvariables!Method
dropvariables!(md, i)
 dropvariables!(md, variable_name)
 dropvariables!(md, indices)
 dropvariables!(md, variable_names)
@@ -791,7 +791,7 @@
      │ Char
 ─────┼──────
    1 │ M
-   2 │ F

TODO: To be reviewed

source
MultiData.hasvariablesMethod
hasvariables(df, variable_name)
+   2 │ F

TODO: To be reviewed

source
MultiData.hasvariablesMethod
hasvariables(df, variable_name)
 hasvariables(md, i_modality, variable_name)
 hasvariables(md, variable_name)
 hasvariables(df, variable_names)
@@ -859,7 +859,7 @@
 true
 
 julia> hasvariables(md.data, [:name, :sex])
-true
source
MultiData.insertvariables!Method
insertvariables!(md, col, index, values)
+true
source
MultiData.insertvariables!Method
insertvariables!(md, col, index, values)
 insertvariables!(md, index, values)
 insertvariables!(md, col, index, value)
 insertvariables!(md, index, value)

Insert a variable in a multimodal dataset with a given index.

Note

Each inserted variable will be added as a spare variable.

Arguments

  • md is an AbstractMultiDataset;
  • col is an Integer indicating in which position to insert the new variable. If no col is passed, the new variable will be placed last in the md's underlying dataframe structure;
  • index is a Symbol denoting the name of the variable to insert. Duplicated variable names will be renamed to avoid conflicts: see the makeunique argument of insertcols! in the DataFrames documentation;
  • values is an AbstractVector containing the values of the newly inserted variable. The length of values should match ninstances(md);
  • value is a single value for the new variable. If a single value is passed as the last argument, it is copied and used for every instance in the dataset.

Examples

julia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
@@ -960,7 +960,7 @@
      │ Int64   Int64   String
 ─────┼────────────────────────
    1 │    180      80  brown
-   2 │    180      75  blonde
source
MultiData.keeponlyvariables!Method
keeponlyvariables!(md, indices)
+   2 │    180      75  blonde
source
MultiData.keeponlyvariables!Method
keeponlyvariables!(md, indices)
 keeponlyvariables!(md, variable_names)

Drop all variables that do not correspond to the indices in indices from a multimodal dataset.

Note: if the dropped variables are contained in some modality they will also be removed from them; as a side effect, this can lead to the removal of modalities.

Arguments

  • md is a MultiDataset;
  • indices is an AbstractVector{Integer} that indicates which indices to keep in the multimodal dataset;
  • variable_names is an AbstractVector{Symbol} that indicates which variables to keep in the multimodal dataset.

Examples

julia> md = MultiDataset([[1, 2],[3, 4, 5],[5]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60]))
 ● MultiDataset
    └─ dimensionalities: (0, 0, 0)
@@ -1028,7 +1028,7 @@
      │ Char
 ─────┼──────
    1 │ M
-   2 │ F

TODO: review

source
MultiData.nvariablesMethod
nvariables(md)
+   2 │ F

TODO: review

source
MultiData.nvariablesMethod
nvariables(md)
 nvariables(md, i)

Return the number of variables in a multimodal dataset.

If an index i is passed as second argument, then the number of variables of the i-th modality is returned.

Alternatively, nvariables can be called on a single modality.

Arguments

  • md is a MultiDataset;
  • i (optional) is an Integer indicating the modality of the multimodal dataset whose number of variables you want to know.

Examples

julia> md = MultiDataset([[1],[2]], DataFrame(:age => [25, 26], :sex => ['M', 'F']))
 ● MultiDataset
    └─ dimensionalities: (0, 0)
@@ -1102,7 +1102,7 @@
    2 │ F        175      60
 
 julia> nvariables(mod2)
-3
source
MultiData.sparevariablesMethod
sparevariables(md)

Return the indices of all the variables that are not contained in any of the modalities of a multimodal dataset.

Arguments

  • md is the MultiDataset whose spare-variable indices are returned.

Examples

julia> md = MultiDataset([[1],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
+3
source
MultiData.sparevariablesMethod
sparevariables(md)

Return the indices of all the variables that are not contained in any of the modalities of a multimodal dataset.

Arguments

  • md is the MultiDataset whose spare-variable indices are returned.
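
The idea of a spare variable can be sketched in plain Julia, without the package. The helper below is hypothetical (`spare_indices` is not part of MultiData.jl, and this is not necessarily how the package computes the result); it only assumes that the grouping is a vector of vectors of column indices, as described for grouped_variables.

```julia
# Hypothetical helper, not part of MultiData.jl: spare variables are the
# column indices that do not appear in any modality.
function spare_indices(grouped_variables::AbstractVector{<:AbstractVector{<:Integer}},
                       ncols::Integer)
    # Flatten the grouping into a single list of used column indices.
    used = reduce(vcat, grouped_variables; init = Int[])
    # Keep the column indices that no modality refers to.
    return setdiff(1:ncols, used)
end

# Same grouping as MultiDataset([[1],[3]], df) with a 3-column df:
spare_indices([[1], [3]], 3)   # returns [2]
```

With the grouping [[1],[3]] over three columns, only column 2 is unused, which agrees with the sparevariables(md) result shown in the example for this docstring.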

Examples

julia> md = MultiDataset([[1],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
 ● MultiDataset
    └─ dimensionalities: (0, 0)
 - Modality 1 / 2
@@ -1140,7 +1140,7 @@
 
 julia> sparevariables(md)
 1-element Vector{Int64}:
- 2
source
MultiData.variableindexMethod
variableindex(df, variable_name)
+ 2
source
MultiData.variableindexMethod
variableindex(df, variable_name)
 variableindex(md, i_modality, variable_name)
 variableindex(md, variable_name)

Return the index of the variable. When i_modality is passed, the function returns the index of the variable in the sub-dataframe of the modality identified by i_modality. It returns 0 when the variable is not contained in the modality identified by i_modality.

Arguments

  • df is an AbstractDataFrame;
  • md is an AbstractMultiDataset;
  • variable_name is a Symbol indicating the variable whose index you want to know;
  • i_modality is an Integer indicating the modality within which to look up the index of the variable.
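
The lookup rule described above (position inside the modality, or 0 when the variable is not part of it) can be illustrated with a small stand-alone sketch. The function `index_in_modality` and its arguments are assumptions made for illustration only; they are not MultiData.jl API and do not claim to reproduce its implementation.

```julia
# Hypothetical helper, not part of MultiData.jl: return the position of the
# variable `name` inside a modality, or 0 if the modality does not contain it.
function index_in_modality(colnames::AbstractVector{Symbol},
                           modality_cols::AbstractVector{<:Integer},
                           name::Symbol)
    # Index of the variable in the whole table (nothing if it does not exist).
    global_idx = findfirst(==(name), colnames)
    global_idx === nothing && return 0
    # Position of that column among the modality's columns, or 0 if absent.
    return something(findfirst(==(global_idx), modality_cols), 0)
end

colnames = [:name, :age, :sex]
index_in_modality(colnames, [1, 2], :age)   # 2: second column of the modality
index_in_modality(colnames, [3], :age)      # 0: :age is not in this modality
```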

Examples

julia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
 ● MultiDataset
@@ -1186,7 +1186,7 @@
 1
 
 julia> variableindex(md.data, :age)
-2
source
MultiData.variablesMethod
variables(md, i)

Return the names as Symbols of the variables in a multimodal dataset.

When called on an object of type MultiDataset, a Dict is returned that maps each modality index to an AbstractVector{Symbol}.

Note: the order of the variable names is guaranteed to match the order of the variables in the modality.

If an index i is passed as second argument, then the names of the variables of the i-th modality are returned as an AbstractVector.

Alternatively, variables can be called on a single modality.

Arguments

  • md is a MultiDataset;
  • i is an Integer indicating from which modality of the multimodal dataset to get the names of the variables.

Examples

julia> md = MultiDataset([[2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
+2
source
MultiData.variablesMethod
variables(md, i)

Return the names as Symbols of the variables in a multimodal dataset.

When called on an object of type MultiDataset, a Dict is returned that maps each modality index to an AbstractVector{Symbol}.

Note: the order of the variable names is guaranteed to match the order of the variables in the modality.

If an index i is passed as second argument, then the names of the variables of the i-th modality are returned as an AbstractVector.

Alternatively, variables can be called on a single modality.

Arguments

  • md is a MultiDataset;
  • i is an Integer indicating from which modality of the multimodal dataset to get the names of the variables.

Examples

julia> md = MultiDataset([[2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
 ● MultiDataset
    └─ dimensionalities: (0, 0)
 - Modality 1 / 2
@@ -1237,7 +1237,7 @@
 
 julia> variables(mod2)
 1-element Vector{Symbol}:
- :sex
source

Instances

MultiData.deleteinstances!Method
deleteinstances!(md, i)

Remove the i-th instance in a multimodal dataset, and return the dataset itself.

deleteinstances!(md, i_instances)

Remove the instances at i_instances in a multimodal dataset, and return the dataset itself.

source
MultiData.instanceMethod
instance(md, i)

Return the i-th instance in a multimodal dataset.

instance(md, i_modality, i_instance)

Return the i_instance-th instance in a multimodal dataset with only variables from the i_modality-th modality.

instance(md, i_instances)

Return instances at i_instances in a multimodal dataset.

instance(md, i_modality, i_instances)

Return the instances at i_instances in a multimodal dataset with only variables from the i_modality-th modality.

source
MultiData.keeponlyinstances!Method
keeponlyinstances!(md, i_instances)

Remove from a multimodal dataset all instances whose index does not appear in i_instances.

source
MultiData.pushinstances!Method
pushinstances!(md, instance)

Add an instance to a multimodal dataset, and return the dataset itself.

The instance can be a DataFrameRow or an AbstractVector but in both cases the number and type of variables should match those of the dataset.

source
SoleBase.ninstancesMethod
ninstances(md)

Return the number of instances in a multimodal dataset.

Examples

julia> md = MultiDataset([[1],[2]],DataFrame(:age => [25, 26], :sex => ['M', 'F']))
+ :sex
source

Instances

MultiData.deleteinstances!Method
deleteinstances!(md, i)

Remove the i-th instance in a multimodal dataset, and return the dataset itself.

deleteinstances!(md, i_instances)

Remove the instances at i_instances in a multimodal dataset, and return the dataset itself.

source
MultiData.instanceMethod
instance(md, i)

Return the i-th instance in a multimodal dataset.

instance(md, i_modality, i_instance)

Return the i_instance-th instance in a multimodal dataset with only variables from the i_modality-th modality.

instance(md, i_instances)

Return instances at i_instances in a multimodal dataset.

instance(md, i_modality, i_instances)

Return the instances at i_instances in a multimodal dataset with only variables from the i_modality-th modality.

source
MultiData.keeponlyinstances!Method
keeponlyinstances!(md, i_instances)

Remove from a multimodal dataset all instances whose index does not appear in i_instances.

source
MultiData.pushinstances!Method
pushinstances!(md, instance)

Add an instance to a multimodal dataset, and return the dataset itself.

The instance can be a DataFrameRow or an AbstractVector but in both cases the number and type of variables should match those of the dataset.

source
SoleBase.ninstancesMethod
ninstances(md)

Return the number of instances in a multimodal dataset.

Examples

julia> md = MultiDataset([[1],[2]],DataFrame(:age => [25, 26], :sex => ['M', 'F']))
 ● MultiDataset
    └─ dimensionalities: (0, 0)
 - Modality 1 / 2
@@ -1266,4 +1266,4 @@
    2 │ F
 
 julia> ninstances(md) == ninstances(mod2) == 2
-true
source
+truesource diff --git a/dev/search_index.js b/dev/search_index.js index f2ee4e5..929ae57 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"description/","page":"Description","title":"Description","text":"CurrentModule = MultiData","category":"page"},{"location":"description/#man-description","page":"Description","title":"Description","text":"","category":"section"},{"location":"description/","page":"Description","title":"Description","text":"Just like DataFrames, MultiDatasets can be described using the method describe:","category":"page"},{"location":"description/","page":"Description","title":"Description","text":"julia> ts_cos = [cos(i) for i in 1:50000];\n\njulia> ts_sin = [sin(i) for i in 1:50000];\n\njulia> df_data = DataFrame(\n :id => [1, 2],\n :age => [30, 9],\n :name => [\"Python\", \"Julia\"],\n :stat => [deepcopy(ts_sin), deepcopy(ts_cos)]\n );\n\njulia> md = MultiDataset([[2,3], [4]], df_data);\n\njulia> description = describe(md)\n2-element Vector{DataFrame}:\n 2×7 DataFrame\n Row │ variable mean min median max nmissing eltype \n │ Symbol Union… Any Union… Any Int64 DataType\n─────┼─────────────────────────────────────────────────────────────\n 1 │ age 19.5 9 19.5 30 0 Int64\n 2 │ name Julia Python 0 String\n 1×7 DataFrame\n Row │ Variables mean min ⋯\n │ Symbol Array… Array… ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ stat AbstractFloat[8.63372e-6; -2.848… AbstractFloat[-1.0; -1.0 ⋯\n 5 columns omitted\n","category":"page"},{"location":"description/","page":"Description","title":"Description","text":"the describe implementation for MultiDatasets will try to find the best statistical measures that can be used to the type of data the modality contains.","category":"page"},{"location":"description/","page":"Description","title":"Description","text":"In the example the 2nd modality, which contains variables (just one in the example) 
of data of type Vector{Float64}, was described by applying the well known 22 features from the package Catch22.jl plus maximum, minimum and mean as the vectors were time series.","category":"page"},{"location":"description/","page":"Description","title":"Description","text":"describe","category":"page"},{"location":"description/#DataAPI.describe","page":"Description","title":"DataAPI.describe","text":"describe(md; t = fill([(1, 0, 0)], nmodalities(md)), kwargs...)\n\nReturn descriptive statistics for an AbstractMultiDataset as a Vector of new DataFrames where each row represents a variable and each column a summary statistic.\n\nArguments\n\nmd: the AbstractMultiDataset;\nt: is a vector of nmodalities elements, where each element is a vector as long as the dimensionality of\n\nthe i-th modality. Each element of the innermost vector is a tuple\nof arguments for [`paa`](@ref).\n\nFor other see the documentation of DataFrames.describe function.\n\nExamples\n\nTODO: examples\n\n\n\n\n\n","category":"function"},{"location":"manipulation/","page":"Manipulation","title":"Manipulation","text":"CurrentModule = MultiData","category":"page"},{"location":"manipulation/#man-manipulation","page":"Manipulation","title":"Manipulation","text":"","category":"section"},{"location":"manipulation/","page":"Manipulation","title":"Manipulation","text":"Pages = [\"manipulation.md\"]","category":"page"},{"location":"manipulation/#man-modalities","page":"Manipulation","title":"Modalities","text":"","category":"section"},{"location":"manipulation/","page":"Manipulation","title":"Manipulation","text":"Modules = [MultiData]\nPages = [\"modalities.jl\"]","category":"page"},{"location":"manipulation/#MultiData.addmodality!-Tuple{AbstractMultiDataset, AbstractVector{<:Integer}}","page":"Manipulation","title":"MultiData.addmodality!","text":"addmodality!(md, indices)\naddmodality!(md, index)\naddmodality!(md, variable_names)\naddmodality!(md, variable_name)\n\nCreate a new modality in a multimodal 
dataset using variables at indices or index, and return the dataset itself.\n\nAlternatively to the indices and the index, the variable name(s) can be used.\n\nNote: to add a new modality with new variables see insertmodality!.\n\nArguments\n\nmd is a MultiDataset;\nindices is an AbstractVector{Integer} that indicates which indices of the multimodal dataset's corresponding dataframe to add to the new modality;\nindex is an Integer that indicates the index of the multimodal dataset's corresponding dataframe to add to the new modality;\nvariable_names is an AbstractVector{Symbol} that indicates which variables of the multimodal dataset's corresponding dataframe to add to the new modality;\nvariable_name is a Symbol that indicates the variable of the multimodal dataset's corresponding dataframe to add to the new modality;\n\nExamples\n\njulia> df = DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60])\n2×5 DataFrame\n Row │ name age sex height weight\n │ String Int64 Char Int64 Int64\n─────┼─────────────────────────────────────\n 1 │ Python 25 M 180 80\n 2 │ Julia 26 F 175 60\n\njulia> md = MultiDataset([[1]], df)\n● MultiDataset\n └─ dimensionalities: (0,)\n- Modality 1 / 1\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Spare variables\n └─ dimensionality: 0\n2×4 SubDataFrame\n Row │ age sex height weight\n │ Int64 Char Int64 Int64\n─────┼─────────────────────────────\n 1 │ 25 M 180 80\n 2 │ 26 F 175 60\n\n\njulia> addmodality!(md, [:age, :sex])\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Modality 2 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ age sex\n │ Int64 Char\n─────┼─────────────\n 1 │ 25 M\n 2 │ 26 F\n- Spare variables\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ height weight\n │ Int64 
Int64\n─────┼────────────────\n 1 │ 180 80\n 2 │ 175 60\n\n\njulia> addmodality!(md, 5)\n● MultiDataset\n └─ dimensionalities: (0, 0, 0)\n- Modality 1 / 3\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Modality 2 / 3\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ age sex\n │ Int64 Char\n─────┼─────────────\n 1 │ 25 M\n 2 │ 26 F\n- Modality 3 / 3\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ height\n │ Int64\n─────┼────────\n 1 │ 180\n 2 │ 175\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.addvariable_tomodality!-Tuple{AbstractMultiDataset, Integer, Integer}","page":"Manipulation","title":"MultiData.addvariable_tomodality!","text":"addvariable_tomodality!(md, i_modality, var_index)\naddvariable_tomodality!(md, i_modality, var_indices)\naddvariable_tomodality!(md, i_modality, var_name)\naddvariable_tomodality!(md, i_modality, var_names)\n\nAdd variable at index var_index to the modality at index i_modality in a multimodal dataset, and return the dataset. Alternatively to var_index the variable name can be used. Multiple variables can be inserted into the multimodal dataset at once using var_indices or var_inames.\n\nNote: The function does not allow you to add a variable to a new modality, but only to add it to an existing modality. To add a new modality use addmodality! 
instead.\n\nArguments\n\nmd is a MultiDataset;\ni_modality is an Integer indicating the modality in which the variable(s) will be added;\nvar_index is an Integer that indicates the index of the variable to add to a specific modality of the multimodal dataset;\nvar_indices is an AbstractVector{Integer} indicating the indices of the variables to add to a specific modality of the multimodal dataset;\nvar_name is a Symbol indicating the name of the variable to add to a specific modality of the multimodal dataset;\nvar_names is an AbstractVector{Symbol} indicating the name of the variables to add to a specific modality of the multimodal dataset;\n\nExamples\n\njulia> df = DataFrame(:name => [\"Python\", \"Julia\"],\n :age => [25, 26],\n :sex => ['M', 'F'],\n :height => [180, 175],\n :weight => [80, 60])\n )\n2×5 DataFrame\n Row │ name age sex height weight\n │ String Int64 Char Int64 Int64\n─────┼─────────────────────────────────────\n 1 │ Python 25 M 180 80\n 2 │ Julia 26 F 175 60\n\njulia> md = MultiDataset([[1, 2],[3]], df)\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n- Spare variables\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ height weight\n │ Int64 Int64\n─────┼────────────────\n 1 │ 180 80\n 2 │ 175 60\n\njulia> addvariable_tomodality!(md, 1, [4,5])\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×4 SubDataFrame\n Row │ name age height weight\n │ String Int64 Int64 Int64\n─────┼───────────────────────────────\n 1 │ Python 25 180 80\n 2 │ Julia 26 175 60\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> addvariable_tomodality!(md, 2, [:name,:weight])\n● MultiDataset\n └─ dimensionalities: (0, 
0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×4 SubDataFrame\n Row │ name age height weight\n │ String Int64 Int64 Int64\n─────┼───────────────────────────────\n 1 │ Python 25 180 80\n 2 │ Julia 26 175 60\n- Modality 2 / 2\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ sex name weight\n │ Char String Int64\n─────┼──────────────────────\n 1 │ M Python 80\n 2 │ F Julia 60\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.dropmodalities!-Tuple{AbstractMultiDataset, Integer}","page":"Manipulation","title":"MultiData.dropmodalities!","text":"dropmodalities!(md, indices)\ndropmodalities!(md, index)\n\nRemove the i-th modality from a multimodal dataset while dropping all variables in it, and return the dataset itself.\n\nNote: if the dropped variables are contained in other modalities they will also be removed from them. This can lead to the removal of additional modalities other than the i-th.\n\nIf the intention is to remove a modality without dropping the variables use removemodality! 
instead.\n\nArguments\n\nmd is a MultiDataset;\nindex is an Integer indicating the index of the modality to drop;\nindices is an AbstractVector{Integer} indicating the indices of the modalities to drop.\n\nExamples\n\njulia> df = DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60])\n2×5 DataFrame\n Row │ name age sex height weight\n │ String Int64 Char Int64 Int64\n─────┼─────────────────────────────────────\n 1 │ Python 25 M 180 80\n 2 │ Julia 26 F 175 60\n\njulia> md = MultiDataset([[1, 2],[3,4],[5],[2,3]], df)\n● MultiDataset\n └─ dimensionalities: (0, 0, 0, 0)\n- Modality 1 / 4\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 4\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ sex height\n │ Char Int64\n─────┼──────────────\n 1 │ M 180\n 2 │ F 175\n- Modality 3 / 4\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n- Modality 4 / 4\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ age sex\n │ Int64 Char\n─────┼─────────────\n 1 │ 25 M\n 2 │ 26 F\n\njulia> dropmodalities!(md, [2,3])\n[ Info: Variable 3 was last variable of modality 2: removing modality\n[ Info: Variable 3 was last variable of modality 2: removing modality\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 25\n 2 │ 26\n\njulia> dropmodalities!(md, 2)\n[ Info: Variable 2 was last variable of modality 2: removing modality\n● MultiDataset\n └─ dimensionalities: (0,)\n- Modality 1 / 1\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ 
Julia\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.eachmodality-Tuple{AbstractMultiDataset}","page":"Manipulation","title":"MultiData.eachmodality","text":"eachmodality(md)\n\nReturn a (lazy) iterator of the modalities of a multimodal dataset.\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.insertmodality!","page":"Manipulation","title":"MultiData.insertmodality!","text":"insertmodality!(md, col, new_modality, existing_variables)\ninsertmodality!(md, new_modality, existing_variables)\n\nInsert new_modality as new modality to multimodal dataset, and return the dataset. Existing variables can be added to the new modality while adding it to the dataset by passing the corresponding indices as existing_variables. If col is specified then the variables will be inserted starting at index col.\n\nArguments\n\nmd is a MultiDataset;\ncol is an Integer indicating the column in which to insert the columns of new_modality;\nnew_modality is an AbstractDataFrame which will be added to the multimodal dataset as a sub-dataframe of a new modality;\nexisting_variables is an AbstractVector{Integer} or AbstractVector{Symbol}. 
It indicates which variables of the multimodal dataset internal dataframe structure to insert in the new modality.\n\nExamples\n\njulia> df = DataFrame(\n :name => [\"Python\", \"Julia\"],\n :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]\n )\n2×2 DataFrame\n Row │ name stat1\n │ String Array…\n─────┼───────────────────────────────────────────\n 1 │ Python [0.841471, 0.909297, 0.14112, -0…\n 2 │ Julia [0.540302, -0.416147, -0.989992,…\n\njulia> md = MultiDataset(df; group = :all)\n● MultiDataset\n └─ dimensionalities: (0, 1)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Modality 2 / 2\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat1\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n\njulia> insertmodality!(md, DataFrame(:age => [30, 9]))\n● MultiDataset\n └─ dimensionalities: (0, 1, 0)\n- Modality 1 / 3\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Modality 2 / 3\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat1\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n- Modality 3 / 3\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 30\n 2 │ 9\n\njulia> md.data\n2×3 DataFrame\n Row │ name stat1 age\n │ String Array… Int64\n─────┼──────────────────────────────────────────────────\n 1 │ Python [0.841471, 0.909297, 0.14112, -0… 30\n 2 │ Julia [0.540302, -0.416147, -0.989992,… 9\n\nor, selecting the column\n\njulia> df = DataFrame(\n :name => [\"Python\", \"Julia\"],\n :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]\n )\n2×2 DataFrame\n Row │ name stat1\n │ String Array…\n─────┼───────────────────────────────────────────\n 1 │ Python [0.841471, 0.909297, 0.14112, -0…\n 2 │ Julia [0.540302, 
-0.416147, -0.989992,…\n\njulia> md = MultiDataset(df; group = :all)\n● MultiDataset\n └─ dimensionalities: (0, 1)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Modality 2 / 2\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat1\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n\njulia> insertmodality!(md, 2, DataFrame(:age => [30, 9]))\n● MultiDataset\n └─ dimensionalities: (1, 0)\n- Modality 1 / 2\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat1\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 30\n 2 │ 9\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n\njulia> md.data\n2×3 DataFrame\n Row │ name age stat1\n │ String Int64 Array…\n─────┼──────────────────────────────────────────────────\n 1 │ Python 30 [0.841471, 0.909297, 0.14112, -0…\n 2 │ Julia 9 [0.540302, -0.416147, -0.989992,…\n\nor, adding an existing variable:\n\njulia> df = DataFrame(\n :name => [\"Python\", \"Julia\"],\n :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]\n )\n2×2 DataFrame\n Row │ name stat1\n │ String Array…\n─────┼───────────────────────────────────────────\n 1 │ Python [0.841471, 0.909297, 0.14112, -0…\n 2 │ Julia [0.540302, -0.416147, -0.989992,…\n\njulia> md = MultiDataset([[2]], df)\n● MultiDataset\n └─ dimensionalities: (1,)\n- Modality 1 / 1\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat1\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ 
Python\n 2 │ Julia\n\n\njulia> insertmodality!(md, DataFrame(:age => [30, 9]); existing_variables = [1])\n● MultiDataset\n └─ dimensionalities: (1, 0)\n- Modality 1 / 2\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat1\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n- Modality 2 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ age name\n │ Int64 String\n─────┼───────────────\n 1 │ 30 Python\n 2 │ 9 Julia\n\n\n\n\n\n","category":"function"},{"location":"manipulation/#MultiData.keeponlymodalities!-Tuple{AbstractMultiDataset, AbstractVector{<:Integer}}","page":"Manipulation","title":"MultiData.keeponlymodalities!","text":"TODO\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.modality-Tuple{AbstractMultiDataset, Integer}","page":"Manipulation","title":"MultiData.modality","text":"modality(md, i)\n\nReturn the i-th modality of a multimodal dataset.\n\nmodality(md, indices)\n\nReturn a Vector of modalities at indices of a multimodal dataset.\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.nmodalities-Tuple{AbstractMultiDataset}","page":"Manipulation","title":"MultiData.nmodalities","text":"nmodalities(md)\n\nReturn the number of modalities of a multimodal dataset.\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.removemodality!-Tuple{AbstractMultiDataset, Integer}","page":"Manipulation","title":"MultiData.removemodality!","text":"removemodality!(md, indices)\nremovemodality!(md, index)\n\nRemove i-th modality from a multimodal dataset, and return the dataset.\n\nNote: to completely remove a modality and all variables in it use dropmodalities! 
instead.\n\nArguments\n\nmd is a MultiDataset;\nindex is an Integer that indicates which modality to remove from the multimodal dataset;\nindices is an AbstractVector{Integer} that indicates the modalities to remove from the multimodal dataset;\n\nExamples\n\njulia> df = DataFrame(:name => [\"Python\", \"Julia\"],\n :age => [25, 26],\n :sex => ['M', 'F'],\n :height => [180, 175],\n :weight => [80, 60])\n )\n2×5 DataFrame\n Row │ name age sex height weight\n │ String Int64 Char Int64 Int64\n─────┼─────────────────────────────────────\n 1 │ Python 25 M 180 80\n 2 │ Julia 26 F 175 60\n\njulia> md = MultiDataset([[1, 2],[3],[4],[5]], df)\n● MultiDataset\n └─ dimensionalities: (0, 0, 0, 0)\n- Modality 1 / 4\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 4\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n- Modality 3 / 4\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ height\n │ Int64\n─────┼────────\n 1 │ 180\n 2 │ 175\n- Modality 4 / 4\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n\njulia> removemodality!(md, [3])\n● MultiDataset\n └─ dimensionalities: (0, 0, 0)\n- Modality 1 / 3\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 3\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n- Modality 3 / 3\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ height\n │ Int64\n─────┼────────\n 1 │ 180\n 2 │ 175\n\njulia> removemodality!(md, [1,2])\n● MultiDataset\n └─ dimensionalities: (0,)\n- Modality 1 / 1\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n- Spare variables\n └─ 
dimensionality: 0\n2×4 SubDataFrame\n Row │ name age sex height\n │ String Int64 Char Int64\n─────┼─────────────────────────────\n 1 │ Python 25 M 180\n 2 │ Julia 26 F 175\n\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.removevariable_frommodality!-Tuple{AbstractMultiDataset, Integer, Integer}","page":"Manipulation","title":"MultiData.removevariable_frommodality!","text":"removevariable_frommodality!(md, i_modality, var_indices)\nremovevariable_frommodality!(md, i_modality, var_index)\nremovevariable_frommodality!(md, i_modality, var_name)\nremovevariable_frommodality!(md, i_modality, var_names)\n\nRemove the variable at index var_index from the modality at index i_modality in a multimodal dataset, and return the dataset itself.\n\nInstead of var_index, the variable name can be used. Multiple variables can be removed at once by passing a Vector of Symbols (for names) or a Vector of integers (for indices) as the last argument.\n\nNote: when all variables are dropped from a modality, it will be removed.\n\nArguments\n\nmd is a MultiDataset;\ni_modality is an Integer indicating the modality from which the variable(s) will be dropped;\nvar_index is an Integer that indicates the index of the variable to drop from a specific modality of the multimodal dataset;\nvar_indices is an AbstractVector{Integer} indicating the indices of the variables to drop from a specific modality of the multimodal dataset;\nvar_name is a Symbol indicating the name of the variable to drop from a specific modality of the multimodal dataset;\nvar_names is an AbstractVector{Symbol} indicating the names of the variables to drop from a specific modality of the multimodal dataset.\n\nExamples\n\njulia> df = DataFrame(:name => [\"Python\", \"Julia\"],\n :age => [25, 26],\n :sex => ['M', 'F'],\n :height => [180, 175],\n :weight => [80, 60])\n2×5 DataFrame\n Row │ name age sex height weight\n │ String Int64 Char Int64 
Int64\n─────┼─────────────────────────────────────\n 1 │ Python 25 M 180 80\n 2 │ Julia 26 F 175 60\n\njulia> md = MultiDataset([[1,2,4],[2,3,4],[5]], df)\n● MultiDataset\n └─ dimensionalities: (0, 0, 0)\n- Modality 1 / 3\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ name age height\n │ String Int64 Int64\n─────┼───────────────────────\n 1 │ Python 25 180\n 2 │ Julia 26 175\n- Modality 2 / 3\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ age sex height\n │ Int64 Char Int64\n─────┼─────────────────────\n 1 │ 25 M 180\n 2 │ 26 F 175\n- Modality 3 / 3\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n\njulia> removevariable_frommodality!(md, 3, 5)\n[ Info: Variable 5 was last variable of modality 3: removing modality\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ name age height\n │ String Int64 Int64\n─────┼───────────────────────\n 1 │ Python 25 180\n 2 │ Julia 26 175\n- Modality 2 / 2\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ age sex height\n │ Int64 Char Int64\n─────┼─────────────────────\n 1 │ 25 M 180\n 2 │ 26 F 175\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n\njulia> removevariable_frommodality!(md, 1, :age)\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name height\n │ String Int64\n─────┼────────────────\n 1 │ Python 180\n 2 │ Julia 175\n- Modality 2 / 2\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ age sex height\n │ Int64 Char Int64\n─────┼─────────────────────\n 1 │ 25 M 180\n 2 │ 26 F 175\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n\njulia> removevariable_frommodality!(md, 2, [3,4])\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name height\n │ 
String Int64\n─────┼────────────────\n 1 │ Python 180\n 2 │ Julia 175\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 25\n 2 │ 26\n- Spare variables\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ sex weight\n │ Char Int64\n─────┼──────────────\n 1 │ M 80\n 2 │ F 60\n\njulia> removevariable_frommodality!(md, 1, [:name,:height])\n[ Info: Variable 4 was last variable of modality 1: removing modality\n● MultiDataset\n └─ dimensionalities: (0,)\n- Modality 1 / 1\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 25\n 2 │ 26\n- Spare variables\n └─ dimensionality: 0\n2×4 SubDataFrame\n Row │ name sex height weight\n │ String Char Int64 Int64\n─────┼──────────────────────────────\n 1 │ Python M 180 80\n 2 │ Julia F 175 60\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#man-variables","page":"Manipulation","title":"Variables","text":"","category":"section"},{"location":"manipulation/","page":"Manipulation","title":"Manipulation","text":"Modules = [MultiData]\nPages = [\"variables.jl\"]","category":"page"},{"location":"manipulation/#MultiData.dropsparevariables!-Tuple{AbstractMultiDataset}","page":"Manipulation","title":"MultiData.dropsparevariables!","text":"dropsparevariables!(md)\n\nDrop all variables that are not contained in any of the modalities in a multimodal dataset.\n\nArguments\n\nmd is a MultiDataset, that is the structure at which sparevariables will be dropped.\n\nExamples\n\njulia> md = MultiDataset([[1]], DataFrame(:age => [30, 9], :name => [\"Python\", \"Julia\"]))\n● MultiDataset\n └─ dimensionalities: (0,)\n- Modality 1 / 1\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 30\n 2 │ 9\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n\n\njulia> dropsparevariables!(md)\n2×1 DataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ 
Julia\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.dropvariables!-Tuple{AbstractMultiDataset, Integer}","page":"Manipulation","title":"MultiData.dropvariables!","text":"dropvariables!(md, i)\ndropvariables!(md, variable_name)\ndropvariables!(md, indices)\ndropvariables!(md, variable_names)\ndropvariables!(md, i_modality, indices)\ndropvariables!(md, i_modality, variable_names)\n\nDrop the i-th variable from a multimodal dataset, and return the dataset itself.\n\nArguments\n\nmd is a MultiDataset;\ni is an Integer that indicates the index of the variable to drop;\nvariable_name is a Symbol that indicates the variable to drop;\nindices is an AbstractVector{Integer} that indicates the indices of the variables to drop;\nvariable_names is an AbstractVector{Symbol} that indicates the variables to drop;\ni_modality is the index of a modality; if this argument is specified, indices are considered relative to the i_modality-th modality.\n\nExamples\n\njulia> md = MultiDataset([[1, 2],[3, 4, 5]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60]))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ sex height weight\n │ Char Int64 Int64\n─────┼──────────────────────\n 1 │ M 180 80\n 2 │ F 175 60\n\njulia> dropvariables!(md, 4)\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ sex weight\n │ Char Int64\n─────┼──────────────\n 1 │ M 80\n 2 │ F 60\n\njulia> dropvariables!(md, :name)\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 
0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 25\n 2 │ 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ sex weight\n │ Char Int64\n─────┼──────────────\n 1 │ M 80\n 2 │ F 60\n\njulia> dropvariables!(md, [1,3])\n[ Info: Variable 1 was last variable of modality 1: removing modality\n● MultiDataset\n └─ dimensionalities: (0,)\n- Modality 1 / 1\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\nTODO: To be reviewed\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.hasvariables-Tuple{AbstractDataFrame, Symbol}","page":"Manipulation","title":"MultiData.hasvariables","text":"hasvariables(df, variable_name)\nhasvariables(md, i_modality, variable_name)\nhasvariables(md, variable_name)\nhasvariables(df, variable_names)\nhasvariables(md, i_modality, variable_names)\nhasvariables(md, variable_names)\n\nCheck whether a multimodal dataset contains a variable named variable_name.\n\nInstead of a single variable name a Vector of names can be passed. 
If this is the case, this function will return true only if md contains all the specified variables.\n\nArguments\n\ndf is an AbstractDataFrame, one of the two structures in which the presence of the variable can be checked;\nmd is an AbstractMultiDataset, the other structure in which the presence of the variable can be checked;\nvariable_name is a Symbol indicating the variable whose existence is to be verified;\ni_modality is an Integer indicating in which modality to look for the variable.\n\nExamples\n\njulia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F']))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> hasvariables(md, :age)\ntrue\n\njulia> hasvariables(md.data, :name)\ntrue\n\njulia> hasvariables(md, :height)\nfalse\n\njulia> hasvariables(md, 1, :sex)\nfalse\n\njulia> hasvariables(md, 2, :sex)\ntrue\n\njulia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F']))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> hasvariables(md, [:sex, :age])\ntrue\n\njulia> hasvariables(md, 1, [:sex])\nfalse\n\njulia> hasvariables(md, 2, [:sex])\ntrue\n\njulia> hasvariables(md.data, [:name, :sex])\ntrue\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.insertvariables!-Tuple{AbstractMultiDataset, Integer, Symbol, 
AbstractVector}","page":"Manipulation","title":"MultiData.insertvariables!","text":"insertvariables!(md, col, index, values)\ninsertvariables!(md, index, values)\ninsertvariables!(md, col, index, value)\ninsertvariables!(md, index, value)\n\nInsert a variable into a multimodal dataset under the name given by index.\n\nnote: Note\nEach inserted variable will be added as a spare variable.\n\nArguments\n\nmd is an AbstractMultiDataset;\ncol is an Integer indicating in which position to insert the new variable. If no col is passed, the new variable will be placed last in md's underlying dataframe structure;\nindex is a Symbol that denotes the name of the variable to insert. Duplicated variable names will be renamed to avoid conflicts: see the makeunique argument for insertcols! in the DataFrames documentation;\nvalues is an AbstractVector that indicates the values for the newly inserted variable. The length of values should match ninstances(md);\nvalue is a single value for the new variable. If a single value is passed as the last argument, it will be copied and used for each instance in the dataset.\n\nExamples\n\njulia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F']))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> insertvariables!(md, :weight, [80, 75])\n2×4 DataFrame\n Row │ name age sex weight\n │ String Int64 Char Int64\n─────┼─────────────────────────────\n 1 │ Python 25 M 80\n 2 │ Julia 26 F 75\n\njulia> md\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 
SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 75\n\njulia> insertvariables!(md, 2, :height, 180)\n2×5 DataFrame\n Row │ name height age sex weight\n │ String Int64 Int64 Char Int64\n─────┼─────────────────────────────────────\n 1 │ Python 180 25 M 80\n 2 │ Julia 180 26 F 75\n\njulia> insertvariables!(md, :hair, [\"brown\", \"blonde\"])\n2×6 DataFrame\n Row │ name height age sex weight hair\n │ String Int64 Int64 Char Int64 String\n─────┼─────────────────────────────────────────────\n 1 │ Python 180 25 M 80 brown\n 2 │ Julia 180 26 F 75 blonde\n\njulia> md\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n- Spare variables\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ height weight hair\n │ Int64 Int64 String\n─────┼────────────────────────\n 1 │ 180 80 brown\n 2 │ 180 75 blonde\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.keeponlyvariables!-Tuple{AbstractMultiDataset, AbstractVector{<:Integer}}","page":"Manipulation","title":"MultiData.keeponlyvariables!","text":"keeponlyvariables!(md, indices)\nkeeponlyvariables!(md, variable_names)\n\nDrop all variables that do not correspond to the indices in indices from a multimodal dataset.\n\nNote: if the dropped variables are contained in some modality they will also be removed from them; as a side effect, this can lead to the removal of modalities.\n\nArguments\n\nmd is a MultiDataset;\nindices is an AbstractVector{Integer} that indicates which indices to keep in the multimodal dataset;\nvariable_names is an AbstractVector{Symbol} that indicates which variables to keep in the multimodal 
dataset.\n\nExamples\n\njulia> md = MultiDataset([[1, 2],[3, 4, 5],[5]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60]))\n● MultiDataset\n └─ dimensionalities: (0, 0, 0)\n- Modality 1 / 3\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 3\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ sex height weight\n │ Char Int64 Int64\n─────┼──────────────────────\n 1 │ M 180 80\n 2 │ F 175 60\n- Modality 3 / 3\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n\njulia> keeponlyvariables!(md, [1,3,4])\n[ Info: Variable 5 was last variable of modality 3: removing modality\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Modality 2 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ sex height\n │ Char Int64\n─────┼──────────────\n 1 │ M 180\n 2 │ F 175\n\njulia> keeponlyvariables!(md, [:name, :sex])\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\nTODO: review\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.nvariables-Tuple{AbstractDataFrame}","page":"Manipulation","title":"MultiData.nvariables","text":"nvariables(md)\nnvariables(md, i)\n\nReturn the number of variables in a multimodal dataset.\n\nIf an index i is passed as second argument, then the number of variables of the i-th modality is returned.\n\nAlternatively, nvariables can be called on a single modality.\n\nArguments\n\nmd is a MultiDataset;\ni (optional) is an Integer indicating the modality of the multimodal dataset 
whose number of variables you want to know.\n\nExamples\n\njulia> md = MultiDataset([[1],[2]], DataFrame(:age => [25, 26], :sex => ['M', 'F']))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 25\n 2 │ 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\n\njulia> nvariables(md)\n2\n\njulia> nvariables(md, 2)\n1\n\njulia> mod2 = modality(md, 2)\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> nvariables(mod2)\n1\n\njulia> md = MultiDataset([[1, 2],[3, 4, 5]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60]))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ sex height weight\n │ Char Int64 Int64\n─────┼──────────────────────\n 1 │ M 180 80\n 2 │ F 175 60\n\njulia> nvariables(md)\n5\n\njulia> nvariables(md, 2)\n3\n\njulia> mod2 = modality(md,2)\n2×3 SubDataFrame\n Row │ sex height weight\n │ Char Int64 Int64\n─────┼──────────────────────\n 1 │ M 180 80\n 2 │ F 175 60\n\njulia> nvariables(mod2)\n3\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.sparevariables-Tuple{AbstractMultiDataset}","page":"Manipulation","title":"MultiData.sparevariables","text":"sparevariables(md)\n\nReturn the indices of all the variables that are not contained in any of the modalities of a multimodal dataset.\n\nArguments\n\nmd is a MultiDataset, which is the structure whose indices of the sparevariables are to be known.\n\nExamples\n\njulia> md = MultiDataset([[1],[3]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F']))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- 
Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 25\n 2 │ 26\n\njulia> md.data\n2×3 DataFrame\n Row │ name age sex\n │ String Int64 Char\n─────┼─────────────────────\n 1 │ Python 25 M\n 2 │ Julia 26 F\n\njulia> sparevariables(md)\n1-element Vector{Int64}:\n 2\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.variableindex-Tuple{AbstractDataFrame, Symbol}","page":"Manipulation","title":"MultiData.variableindex","text":"variableindex(df, variable_name)\nvariableindex(md, i_modality, variable_name)\nvariableindex(md, variable_name)\n\nReturn the index of the variable. When i_modality is passed, the function returns the index of the variable in the sub-dataframe of the modality identified by i_modality. 
It returns 0 when the variable is not contained in the modality identified by i_modality.\n\nArguments\n\ndf is an AbstractDataFrame;\nmd is an AbstractMultiDataset;\nvariable_name is a Symbol indicating the variable whose index you want to know;\ni_modality is an Integer indicating the modality in which to look up the index of the variable.\n\nExamples\n\njulia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F']))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> md.data\n2×3 DataFrame\n Row │ name age sex\n │ String Int64 Char\n─────┼─────────────────────\n 1 │ Python 25 M\n 2 │ Julia 26 F\n\njulia> variableindex(md, :age)\n2\n\njulia> variableindex(md, :sex)\n3\n\njulia> variableindex(md, 1, :name)\n1\n\njulia> variableindex(md, 2, :name)\n0\n\njulia> variableindex(md, 2, :sex)\n1\n\njulia> variableindex(md.data, :age)\n2\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.variables-Tuple{AbstractDataFrame}","page":"Manipulation","title":"MultiData.variables","text":"variables(md, i)\n\nReturn the names as Symbols of the variables in a multimodal dataset.\n\nWhen called on an object of type MultiDataset, a Dict is returned that maps each modality index to an AbstractVector{Symbol}.\n\nNote: the order of the variable names is guaranteed to match the order of the variables in the modality.\n\nIf an index i is passed as second argument, then the names of the variables of the i-th modality are returned as an AbstractVector.\n\nAlternatively, variables can be called on a single modality.\n\nArguments\n\nmd is a MultiDataset;\ni is an Integer indicating from which modality of the multimodal dataset to get the names of the 
variables.\n\nExamples\n\njulia> md = MultiDataset([[2],[3]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F']))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 25\n 2 │ 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n\njulia> variables(md)\nDict{Integer, AbstractVector{Symbol}} with 2 entries:\n 2 => [:sex]\n 1 => [:age]\n\njulia> variables(md, 2)\n1-element Vector{Symbol}:\n :sex\n\njulia> variables(md, 1)\n1-element Vector{Symbol}:\n :age\n\njulia> mod2 = modality(md, 2)\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> variables(mod2)\n1-element Vector{Symbol}:\n :sex\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#man-instances","page":"Manipulation","title":"Instances","text":"","category":"section"},{"location":"manipulation/","page":"Manipulation","title":"Manipulation","text":"Modules = [MultiData]\nPages = [\"instances.jl\"]","category":"page"},{"location":"manipulation/#MultiData.deleteinstances!-Tuple{AbstractMultiDataset, AbstractVector{<:Integer}}","page":"Manipulation","title":"MultiData.deleteinstances!","text":"deleteinstances!(md, i)\n\nRemove the i-th instance in a multimodal dataset, and return the dataset itself.\n\ndeleteinstances!(md, i_instances)\n\nRemove the instances at i_instances in a multimodal dataset, and return the dataset itself.\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.instance-Tuple{AbstractDataFrame, Integer}","page":"Manipulation","title":"MultiData.instance","text":"instance(md, i)\n\nReturn the i-th instance in a multimodal dataset.\n\ninstance(md, i_modality, i_instance)\n\nReturn the i_instance-th instance in a multimodal dataset with 
only variables from the i_modality-th modality.\n\ninstance(md, i_instances)\n\nReturn instances at i_instances in a multimodal dataset.\n\ninstance(md, i_modality, i_instances)\n\nReturn instances at i_instances in a multimodal dataset with only variables from the i_modality-th modality.\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.keeponlyinstances!-Tuple{AbstractMultiDataset, AbstractVector{<:Integer}}","page":"Manipulation","title":"MultiData.keeponlyinstances!","text":"keeponlyinstances!(md, i_instances)\n\nRemove from a multimodal dataset all instances whose index does not appear in i_instances.\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.pushinstances!-Tuple{AbstractMultiDataset, DataFrameRow}","page":"Manipulation","title":"MultiData.pushinstances!","text":"pushinstances!(md, instance)\n\nAdd an instance to a multimodal dataset, and return the dataset itself.\n\nThe instance can be a DataFrameRow or an AbstractVector, but in both cases the number and type of variables should match those of the dataset.\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#SoleBase.ninstances-Tuple{AbstractDataFrame}","page":"Manipulation","title":"SoleBase.ninstances","text":"ninstances(md)\n\nReturn the number of instances in a multimodal dataset.\n\nExamples\n\njulia> md = MultiDataset([[1],[2]],DataFrame(:age => [25, 26], :sex => ['M', 'F']))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 25\n 2 │ 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> mod2 = modality(md, 2)\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> ninstances(md) == ninstances(mod2) == 2\ntrue\n\n\n\n\n\n","category":"method"},{"location":"filesystem/","page":"Filesystem","title":"Filesystem","text":"CurrentModule = 
MultiData","category":"page"},{"location":"filesystem/#man-filesystem","page":"Filesystem","title":"Filesystem","text":"","category":"section"},{"location":"filesystem/","page":"Filesystem","title":"Filesystem","text":"Pages = [\"filesystem.md\"]","category":"page"},{"location":"filesystem/","page":"Filesystem","title":"Filesystem","text":"Modules = [MultiData]\nPages = [\"filesystem.jl\"]","category":"page"},{"location":"filesystem/#MultiData.datasetinfo-Tuple{AbstractString}","page":"Filesystem","title":"MultiData.datasetinfo","text":"datasetinfo(datasetpath; onlywithlabels = [], shufflelabels = [], rng = Random.GLOBAL_RNG)\n\nShow the dataset size on disk and return a Tuple whose first element is a vector of selected IDs, whose second element is the labels DataFrame or nothing, and whose third element is the total size in bytes.\n\nArguments\n\nonlywithlabels is used to select which portion of the Dataset to load, by specifying labels and their values to use as filters. See loaddataset for more info.\nshufflelabels is an AbstractVector of names of labels to shuffle (default = [], means no shuffle).\nrng is a random number generator to be used when shuffling (for reproducibility); can be either an Integer (used as seed for MersenneTwister) or an AbstractRNG.\n\n\n\n\n\n","category":"method"},{"location":"filesystem/#MultiData.loaddataset-Tuple{AbstractString}","page":"Filesystem","title":"MultiData.loaddataset","text":"loaddataset(datasetpath; onlywithlabels = [], shufflelabels = [], rng = Random.GLOBAL_RNG)\n\nCreate a MultiDataset or a LabeledMultiDataset from a Dataset, based on the presence of file Labels.csv.\n\nArguments\n\ndatasetpath is an AbstractString that denotes the Dataset's position;\nonlywithlabels is an AbstractVector{AbstractVector{Pair{AbstractString,AbstractVector{Any}}}} and it's used to select which portion of the Dataset to load, by specifying labels and their values. 
Starting from the innermost type, each Pair{AbstractString,AbstractVector{Any}} must contain the label's name as its AbstractString and the values for that label as its AbstractVector{Any}. Each Pair in one vector must refer to a different label, so if the Dataset has n labels in total, this vector of Pairs can contain at most n elements. That's because the Pairs in one vector are combined with each other, so an instance must match all of them. Every vector of Pairs acts as a separate filter. Note that the same label can be used in different vectors of Pairs, as these are not combined with each other: an instance is loaded if it matches at least one vector. If onlywithlabels is an empty vector (default) the function will load the entire Dataset.\nshufflelabels is an AbstractVector of names of labels to shuffle (default = [], means no shuffle).\nrng is a random number generator to be used when shuffling (for reproducibility); can be either an Integer (used as seed for MersenneTwister) or an AbstractRNG.\n\nExamples\n\njulia> df_data = DataFrame(\n :id => [1, 2, 3, 4, 5],\n :age => [30, 9, 30, 40, 9],\n :name => [\"Python\", \"Julia\", \"C\", \"Java\", \"R\"],\n :stat => [deepcopy(ts_sin), deepcopy(ts_cos), deepcopy(ts_sin), deepcopy(ts_cos), deepcopy(ts_sin)]\n )\n5×4 DataFrame\n Row │ id age name stat\n │ Int64 Int64 String Array…\n─────┼─────────────────────────────────────────────────────────\n 1 │ 1 30 Python [0.841471, 0.909297, 0.14112, -0…\n 2 │ 2 9 Julia [0.540302, -0.416147, -0.989992,…\n 3 │ 3 30 C [0.841471, 0.909297, 0.14112, -0…\n 4 │ 4 40 Java [0.540302, -0.416147, -0.989992,…\n 5 │ 5 9 R [0.841471, 0.909297, 0.14112, -0…\n\njulia> lmd = LabeledMultiDataset(\n MultiDataset([[4]], deepcopy(df_data)),\n [2,3],\n)\n● LabeledMultiDataset\n ├─ labels\n │ ├─ age: Set([9, 30, 40])\n │ └─ name: Set([\"C\", \"Julia\", \"Python\", \"Java\", \"R\"])\n └─ dimensionalities: (1,)\n- Modality 1 / 1\n └─ dimensionality: 1\n5×1 SubDataFrame\n Row │ stat\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n 3 │ [0.841471, 
0.909297, 0.14112, -0…\n 4 │ [0.540302, -0.416147, -0.989992,…\n 5 │ [0.841471, 0.909297, 0.14112, -0…\n- Spare variables\n └─ dimensionality: 0\n5×1 SubDataFrame\n Row │ id\n │ Int64\n─────┼───────\n 1 │ 1\n 2 │ 2\n 3 │ 3\n 4 │ 4\n 5 │ 5\n\njulia> savedataset(\"langs\", lmd, force = true)\n\njulia> loaddataset(\"langs\", onlywithlabels = [ [\"name\" => [\"Julia\"], \"age\" => [\"9\"]] ] )\nInstances count: 1\nTotal size: 981670 bytes\n● LabeledMultiDataset\n ├─ labels\n │ ├─ age: Set([\"9\"])\n │ └─ name: Set([\"Julia\"])\n └─ dimensionalities: (1,)\n- Modality 1 / 1\n └─ dimensionality: 1\n1×1 SubDataFrame\n Row │ stat\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.540302, -0.416147, -0.989992,…\n- Spare variables\n └─ dimensionality: 0\n1×1 SubDataFrame\n Row │ id\n │ Int64\n─────┼───────\n 1 │ 2\n\njulia> loaddataset(\"langs\", onlywithlabels = [ [\"name\" => [\"Julia\"], \"age\" => [\"30\"]] ] )\nInstances count: 0\nTotal size: 0 bytes\nERROR: AssertionError: No instance found\n\njulia> loaddataset(\"langs\", onlywithlabels = [ [\"name\" => [\"Julia\"]] , [\"age\" => [\"9\"]] ] )\nInstances count: 2\nTotal size: 1963537 bytes\n● LabeledMultiDataset\n ├─ labels\n │ ├─ age: Set([\"9\"])\n │ └─ name: Set([\"Julia\", \"R\"])\n └─ dimensionalities: (1,)\n- Modality 1 / 1\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.540302, -0.416147, -0.989992,…\n 2 │ [0.841471, 0.909297, 0.14112, -0…\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ id\n │ Int64\n─────┼───────\n 1 │ 2\n 2 │ 5\n\njulia> loaddataset(\"langs\", onlywithlabels = [ [\"name\" => [\"Julia\"]], [\"name\" => [\"C\"], \"age\" => [\"30\"]] ] )\nInstances count: 2\nTotal size: 1963537 bytes\n● LabeledMultiDataset\n ├─ labels\n │ ├─ age: Set([\"9\", \"30\"])\n │ └─ name: Set([\"C\", \"Julia\"])\n └─ dimensionalities: (1,)\n- Modality 1 / 1\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat\n │ 
Array…\n─────┼───────────────────────────────────\n 1 │ [0.540302, -0.416147, -0.989992,…\n 2 │ [0.841471, 0.909297, 0.14112, -0…\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ id\n │ Int64\n─────┼───────\n 1 │ 2\n 2 │ 3\n\n\n\n\n\n","category":"method"},{"location":"filesystem/#MultiData.savedataset-Tuple{AbstractString, AbstractMultiDataset}","page":"Filesystem","title":"MultiData.savedataset","text":"savedataset(datasetpath, md; instance_ids, name, force = false)\n\nSave the AbstractMultiDataset md on disk at path datasetpath in the following format:\n\ndatasetpath ├─ Example1 │ └─ Modality1.csv │ └─ Modality2.csv │ └─ ... │ └─ Modalityn.csv │ └─ Metadata.txt ├─ Example2 │ └─ Modality1.csv │ └─ Modality2.csv │ └─ ... │ └─ Modalityn.csv │ └─ Metadata.txt ├─ ... ├─ Example_n ├─ Metadata.txt └─ Labels.csv\n\nArguments\n\ninstance_ids is an AbstractVector{Integer} that denotes the identifiers of the instances;\nname is an AbstractString that denotes the name of the Dataset, which will be saved in the Metadata of the Dataset;\nforce is a Bool: if it is set to true and datasetpath already exists, it will be overwritten; otherwise the operation will be aborted (default = false);\nlabels_indices is an AbstractVector{Integer} that contains the indices of the labels' columns (allowed only when passing a MultiDataset).\n\nInstead of an AbstractMultiDataset, a DataFrame can be passed as the second argument. In this case, a third positional argument is required, representing the grouped_variables of the dataset. See MultiDataset for the syntax of grouped_variables.\n\n\n\n\n\n","category":"method"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = MultiData","category":"page"},{"location":"#MultiData","page":"Home","title":"MultiData","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The aim of this package is to provide a simple and comfortable interface for managing multimodal data. 
It is built on top of DataFrames.jl with Machine learning applications in mind.","category":"page"},{"location":"","page":"Home","title":"Home","text":"","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Currently this packages is still not registered so you need to run the following commands in a Julia REPL to install it:","category":"page"},{"location":"","page":"Home","title":"Home","text":"import Pkg\nPkg.add(\"MultiData\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"To install the developement version, run:","category":"page"},{"location":"","page":"Home","title":"Home","text":"import Pkg\nPkg.add(\"https://github.com/aclai-lab/MultiData.jl#dev\")","category":"page"},{"location":"#Usage","page":"Home","title":"Usage","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"To instantiate a multimodal dataset, use the MultiDataset constructor by providing: a) a DataFrame containing all variables from different modalities, and b) a Vector{Vector{Union{Symbol,String,Int64}}} object representing a grouping of some of the variables (identified by column index or name) into different modalities.","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> using MultiData\n\njulia> ts_cos = [cos(i) for i in 1:50000];\n\njulia> ts_sin = [sin(i) for i in 1:50000];\n\njulia> df_data = DataFrame(\n :id => [1, 2],\n :age => [30, 9],\n :name => [\"Python\", \"Julia\"],\n :stat => [deepcopy(ts_sin), deepcopy(ts_cos)]\n )\n2×4 DataFrame\n Row │ id age name stat \n │ Int64 Int64 String Array… \n─────┼─────────────────────────────────────────────────────────\n 1 │ 1 30 Python [0.841471, 0.909297, 0.14112, -0…\n 2 │ 2 9 Julia [0.540302, -0.416147, -0.989992,…\n\njulia> grouped_variables = [[2,3], [4]]; # group 2nd and 3rd variables in the first modality\n # the 4th variable in the second 
modality and\n # leave the first variable as a \"spare variable\"\n\njulia> md = MultiDataset(df_data, grouped_variables)\n● MultiDataset\n └─ dimensionalities: (0, 1)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ age name \n │ Int64 String\n─────┼───────────────\n 1 │ 30 Python\n 2 │ 9 Julia\n- Modality 2 / 2\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat \n │ Array… \n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ id \n │ Int64\n─────┼───────\n 1 │ 1\n 2 │ 2\n","category":"page"},{"location":"","page":"Home","title":"Home","text":"Now md holds a MultiDataset and all of its modalities can be conveniently iterated as elements of a Vector:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> for (i, f) in enumerate(md)\n println(\"Modality: \", i)\n println(f)\n println()\n end\nModality: 1\n2×2 SubDataFrame\n Row │ age name \n │ Int64 String\n─────┼───────────────\n 1 │ 30 Python\n 2 │ 9 Julia\n\nModality: 2\n2×1 SubDataFrame\n Row │ stat \n │ Array… \n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…","category":"page"},{"location":"","page":"Home","title":"Home","text":"Note that each element of a MultiDataset is a SubDataFrame:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> eltype(md)\nSubDataFrame\n","category":"page"},{"location":"","page":"Home","title":"Home","text":"note: Spare variables\nSpare variables will never be seen when accessing a MultiDataset through its iterator interface. 
To access them see sparevariables.","category":"page"},{"location":"utils/","page":"Utils","title":"Utils","text":"CurrentModule = MultiData","category":"page"},{"location":"utils/#man-utils","page":"Utils","title":"Utils","text":"","category":"section"},{"location":"utils/","page":"Utils","title":"Utils","text":"Pages = [\"utils.md\"]","category":"page"},{"location":"utils/","page":"Utils","title":"Utils","text":"paa\nlinearize_data\nunlinearize_data","category":"page"},{"location":"utils/#MultiData.paa","page":"Utils","title":"MultiData.paa","text":"paa(x; f = identity, t = (1, 0, 0))\n\nPiecewise Aggregate Approximation\n\nApply f function to each dimensionality of x array divinding it in t[1] windows taking t[2] extra points left and t[3] extra points right.\n\nNote: first window will always consider t[2] = 0 and last one will always consider t[3] = 0.\n\n\n\n\n\n","category":"function"},{"location":"utils/#MultiData.linearize_data","page":"Utils","title":"MultiData.linearize_data","text":"linearize_data(d)\n\nLinearize dimensional object d.\n\n\n\n\n\n","category":"function"},{"location":"utils/#MultiData.unlinearize_data","page":"Utils","title":"MultiData.unlinearize_data","text":"unlinearize_data(d, dims)\n\nUnlinearize a vector d to a shape dims.\n\n\n\n\n\n","category":"function"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"CurrentModule = MultiData","category":"page"},{"location":"datasets/#man-datasets","page":"Datasets","title":"Datasets","text":"","category":"section"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Pages = [\"datasets.md\"]","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"A machine learning dataset are a collection of instances (or samples), each one described by a number of variables. In the case of tabular data, a dataset looks like a database table, where every column is a variable, and each row corresponds to a given instance. 
However, a dataset can also be non-tabular; for example, each instance can consist of a multivariate time-series, or an image.","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"When data is composed of different modalities) combining their statistical properties is non-trivial, since they may be quite different in nature one another.","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"The abstract representation of a multimodal dataset provided by this package is the AbstractMultiDataset.","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"AbstractMultiDataset\ngrouped_variables\ndata\ndimensionality","category":"page"},{"location":"datasets/#MultiData.AbstractMultiDataset","page":"Datasets","title":"MultiData.AbstractMultiDataset","text":"Abstract supertype for all multimodal datasets.\n\nA concrete multimodal dataset should always provide accessors data, to access the underlying tabular structure (e.g., DataFrame) and grouped_variables, to access the grouping of variables (a vector of vectors of column indices).\n\n\n\n\n\n","category":"type"},{"location":"datasets/#MultiData.grouped_variables","page":"Datasets","title":"MultiData.grouped_variables","text":"grouped_variables(amd)::Vector{Vector{Int}}\n\nReturn the indices of the variables grouped by modality, of an AbstractMultiDataset. 
The grouping describes how the different modalities are composed from the underlying AbstractDataFrame structure.\n\nSee also data, AbstractMultiDataset.\n\n\n\n\n\n","category":"function"},{"location":"datasets/#MultiData.data","page":"Datasets","title":"MultiData.data","text":"data(amd)::AbstractDataFrame\n\nReturn the structure that underlies an AbstractMultiDataset.\n\nSee also grouped_variables, AbstractMultiDataset.\n\n\n\n\n\n","category":"function"},{"location":"datasets/#SoleBase.dimensionality","page":"Datasets","title":"SoleBase.dimensionality","text":"dimensionality(df)\n\nReturn the dimensionality of a dataframe df.\n\nIf the dataframe has variables of various dimensionalities :mixed is returned.\n\nIf the dataframe is empty (no instances) :empty is returned. This behavior can be controlled by setting the keyword argument force:\n\n:no (default): return :mixed in case of mixed dimensionality\n:max: return the greatest dimensionality\n:min: return the lowest dimensionality\n\n\n\n\n\n","category":"function"},{"location":"datasets/#man-unlabeled-datasets","page":"Datasets","title":"Unlabeled Datasets","text":"","category":"section"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"In unlabeled datasets there is no labeling variable, and all of the variables (also called feature variables, or features) have equal role in the representation. These datasets are used in unsupervised learning contexts, for discovering internal correlation patterns between the features. 
Multimodal unlabeled datasets can be instantiated with MultiDataset.","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Modules = [MultiData]\nPages = [\"src/MultiDataset.jl\"]","category":"page"},{"location":"datasets/#MultiData.MultiDataset","page":"Datasets","title":"MultiData.MultiDataset","text":"MultiDataset(df, grouped_variables)\n\nCreate a MultiDataset from an AbstractDataFrame df, initializing its modalities according to the grouping in grouped_variables.\n\ngrouped_variables is an AbstractVector of variable grouping which are AbstractVectors of integers representing the index of the variables selected for that modality.\n\nNote that the order matters for both the modalities and the variables.\n\njulia> df = DataFrame(\n :age => [30, 9],\n :name => [\"Python\", \"Julia\"],\n :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]],\n :stat2 => [[cos(i) for i in 1:50000], [sin(i) for i in 1:50000]]\n )\n2×4 DataFrame\n Row │ age name stat1 stat2 ⋯\n │ Int64 String Array… Array… ⋯\n─────┼──────────────────────────────────────────────────────────────────────────────────────\n 1 │ 30 Python [0.841471, 0.909297, 0.14112, -0… [0.540302, -0.416147, -0.989992,… ⋯\n 2 │ 9 Julia [0.540302, -0.416147, -0.989992,… [0.841471, 0.909297, 0.14112, -0…\n\njulia> md = MultiDataset([[2]], df)\n● MultiDataset\n └─ dimensionalities: (0,)\n- Modality 1 / 1\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Spare variables\n └─ dimensionality: mixed\n2×3 SubDataFrame\n Row │ age stat1 stat2\n │ Int64 Array… Array…\n─────┼─────────────────────────────────────────────────────────────────────────────\n 1 │ 30 [0.841471, 0.909297, 0.14112, -0… [0.540302, -0.416147, -0.989992,…\n 2 │ 9 [0.540302, -0.416147, -0.989992,… [0.841471, 0.909297, 0.14112, -0…\n\nMultiDataset(df; group = :none)\n\nCreate a MultiDataset from an AbstractDataFrame df, automatically selecting 
modalities.\n\nThe selection of modalities can be controlled by the group argument which can be:\n\n:none (default): no modality will be created\n:all: all variables will be grouped by their dimensionality\na list of dimensionalities which will be grouped.\n\nNote: :all and :none are the only Symbols accepted by group.\n\nTODO: fix passing a vector of Integer to group\n\nTODO: rewrite examples\n\nExamples\n\njulia> df = DataFrame(\n :age => [30, 9],\n :name => [\"Python\", \"Julia\"],\n :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]],\n :stat2 => [[cos(i) for i in 1:50000], [sin(i) for i in 1:50000]]\n )\n2×4 DataFrame\n Row │ age name stat1 stat2 ⋯\n │ Int64 String Array… Array… ⋯\n─────┼──────────────────────────────────────────────────────────────────────────────────────\n 1 │ 30 Python [0.841471, 0.909297, 0.14112, -0… [0.540302, -0.416147, -0.989992,… ⋯\n 2 │ 9 Julia [0.540302, -0.416147, -0.989992,… [0.841471, 0.909297, 0.14112, -0…\n\njulia> md = MultiDataset(df)\n● MultiDataset\n └─ dimensionalities: ()\n- Spare variables\n └─ dimensionality: mixed\n2×4 SubDataFrame\n Row │ age name stat1 stat2 ⋯\n │ Int64 String Array… Array… ⋯\n─────┼──────────────────────────────────────────────────────────────────────────────────────\n 1 │ 30 Python [0.841471, 0.909297, 0.14112, -0… [0.540302, -0.416147, -0.989992,… ⋯\n 2 │ 9 Julia [0.540302, -0.416147, -0.989992,… [0.841471, 0.909297, 0.14112, -0…\n\n\njulia> md = MultiDataset(df; group = :all)\n● MultiDataset\n └─ dimensionalities: (0, 1)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ age name\n │ Int64 String\n─────┼───────────────\n 1 │ 30 Python\n 2 │ 9 Julia\n- Modality 2 / 2\n └─ dimensionality: 1\n2×2 SubDataFrame\n Row │ stat1 stat2\n │ Array… Array…\n─────┼──────────────────────────────────────────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0… [0.540302, -0.416147, -0.989992,…\n 2 │ [0.540302, -0.416147, -0.989992,… [0.841471, 0.909297, 0.14112, 
-0…\n\n\njulia> md = MultiDataset(df; group = [0])\n● MultiDataset\n └─ dimensionalities: (0, 1, 1)\n- Modality 1 / 3\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ age name\n │ Int64 String\n─────┼───────────────\n 1 │ 30 Python\n 2 │ 9 Julia\n- Modality 2 / 3\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat1\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n- Modality 3 / 3\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat2\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.540302, -0.416147, -0.989992,…\n 2 │ [0.841471, 0.909297, 0.14112, -0…\n\n\n\n\n\n","category":"type"},{"location":"datasets/#MultiData._empty-Tuple{MultiDataset}","page":"Datasets","title":"MultiData._empty","text":"_empty(md)\n\nReturn a copy of a multimodal dataset with no instances.\n\nNote: since the returned AbstractMultiDataset will be empty its columns types will be Any.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#man-supervised-datasets","page":"Datasets","title":"Labeled Datasets","text":"","category":"section"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"In labeled datasets, one or more variables are considered to have special semantics with respect to the other variables; each of these labeling variables (or target variables) can be thought as assigning a label to each instance, which is typically a categorical value (classification label) or a numerical value (regression label). 
Supervised learning methods can be applied on these datasets for modeling the target variables as a function of the feature variables.","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"As an extension of the AbstractMultiDataset, AbstractLabeledMultiDataset has an interface that can be implemented to represent multimodal labeled datasets.","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"AbstractLabeledMultiDataset\nlabeling_variables\ndataset","category":"page"},{"location":"datasets/#MultiData.AbstractLabeledMultiDataset","page":"Datasets","title":"MultiData.AbstractLabeledMultiDataset","text":"Abstract supertype for all labelled multimodal datasets (used in supervised learning).\n\nAs any multimodal dataset, any concrete labeled multimodal dataset should always provide the accessors data, to access the underlying tabular structure (e.g., DataFrame) and grouped_variables, to access the grouping of variables. In addition to these, implementations are required for labeling_variables, to access the indices of the labeling variables.\n\nSee also AbstractMultiDataset.\n\n\n\n\n\n","category":"type"},{"location":"datasets/#MultiData.labeling_variables","page":"Datasets","title":"MultiData.labeling_variables","text":"labeling_variables(almd)::Vector{Int}\n\nReturn the indices of the labelling variables, of the AbstractLabeledMultiDataset. 
with respect to the underlying AbstractDataFrame structure (see data).\n\nSee also grouped_variables, AbstractLabeledMultiDataset.\n\n\n\n\n\n","category":"function"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Multimodal labeled datasets can be instantiated with LabeledMultiDataset.","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Modules = [MultiData]\nPages = [\"LabeledMultiDataset.jl\", \"labels.jl\"]","category":"page"},{"location":"datasets/#MultiData.LabeledMultiDataset","page":"Datasets","title":"MultiData.LabeledMultiDataset","text":"LabeledMultiDataset(md, labeling_variables)\n\nCreate a LabeledMultiDataset by associating an AbstractMultiDataset with some labeling variables, specified as a column index (Int) or a vector of column indices (Vector{Int}).\n\nArguments\n\nmd is the original AbstractMultiDataset;\nlabeling_variables is an AbstractVector of integers indicating the indices of the variables that will be set as labels.\n\nExamples\n\njulia> lmd = LabeledMultiDataset(MultiDataset([[2],[4]], DataFrame(\n :id => [1, 2],\n :age => [30, 9],\n :name => [\"Python\", \"Julia\"],\n :stat => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]\n )), [1, 3])\n● LabeledMultiDataset\n ├─ labels\n │ ├─ id: Set([2, 1])\n │ └─ name: Set([\"Julia\", \"Python\"])\n └─ dimensionalities: (0, 1)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 30\n 2 │ 9\n- Modality 2 / 2\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n\n\n\n\n\n\n","category":"type"},{"location":"datasets/#MultiData.joinlabels!-Tuple{AbstractLabeledMultiDataset, Vararg{Symbol}}","page":"Datasets","title":"MultiData.joinlabels!","text":"joinlabels!(lmd, [lbls...]; delim = \"_\")\n\nOn a labeled multimodal dataset, collapse the labeling variables 
identified by lbls into a single labeling variable of type String, by means of a join that uses delim for string delimiter.\n\nIf not specified differently this function will join all labels.\n\nlbls can be an Integer indicating the index of the label, or a Symbol indicating the name of the labeling variable.\n\n!!! note\n\nThe resulting labels will always be of type String.\n\nnote: Note\nThe resulting labeling variable will always be added as last column in the underlying DataFrame.\n\nExamples\n\njulia> lmd = LabeledMultiDataset(\n MultiDataset(\n [[2],[4]],\n DataFrame(\n :id => [1, 2],\n :age => [30, 9],\n :name => [\"Python\", \"Julia\"],\n :stat => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]\n )\n ),\n [1, 3],\n )\n● LabeledMultiDataset\n ├─ labels\n │ ├─ id: Set([2, 1])\n │ └─ name: Set([\"Julia\", \"Python\"])\n └─ dimensionalities: (0, 1)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 30\n 2 │ 9\n- Modality 2 / 2\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n\n\njulia> joinlabels!(lmd)\n● LabeledMultiDataset\n ├─ labels\n │ └─ id_name: Set([\"1_Python\", \"2_Julia\"])\n └─ dimensionalities: (0, 1)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 30\n 2 │ 9\n- Modality 2 / 2\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MultiData.label-Tuple{AbstractLabeledMultiDataset, Integer, Integer}","page":"Datasets","title":"MultiData.label","text":"label(lmd, j, i)\n\nReturn the value of the i-th labeling variable for instance at index i_instance in a labeled multimodal 
dataset.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MultiData.labeldomain-Tuple{AbstractLabeledMultiDataset, Integer}","page":"Datasets","title":"MultiData.labeldomain","text":"labeldomain(lmd, i)\n\nReturn the domain of i-th label of a labeled multimodal dataset.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MultiData.labels-Tuple{AbstractLabeledMultiDataset}","page":"Datasets","title":"MultiData.labels","text":"labels(lmd, i_instance)\nlabels(lmd)\n\nReturn the labels of instance at index i_instance in a labeled multimodal dataset. A dictionary of type labelname => value is returned.\n\nIf only the first argument is passed then the labels for all instances are returned.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MultiData.nlabelingvariables-Tuple{AbstractLabeledMultiDataset}","page":"Datasets","title":"MultiData.nlabelingvariables","text":"nlabelingvariables(lmd)\n\nReturn the number of labeling variables of a labeled multimodal dataset.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MultiData.setaslabeling!-Tuple{AbstractLabeledMultiDataset, Integer}","page":"Datasets","title":"MultiData.setaslabeling!","text":"setaslabeling!(lmd, i)\nsetaslabeling!(lmd, var_name)\n\nSet i-th variable as label.\n\nThe variable name can be passed as second argument instead of its index.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MultiData.unsetaslabeling!-Tuple{AbstractLabeledMultiDataset, Integer}","page":"Datasets","title":"MultiData.unsetaslabeling!","text":"unsetaslabeling!(lmd, i)\nunsetaslabeling!(lmd, var_name)\n\nRemove i-th labeling variable from labels list.\n\nThe variable name can be passed as second argument instead of its index.\n\n\n\n\n\n","category":"method"}] +[{"location":"description/","page":"Description","title":"Description","text":"CurrentModule = 
MultiData","category":"page"},{"location":"description/#man-description","page":"Description","title":"Description","text":"","category":"section"},{"location":"description/","page":"Description","title":"Description","text":"Just like DataFrames, MultiDatasets can be described using the method describe:","category":"page"},{"location":"description/","page":"Description","title":"Description","text":"julia> ts_cos = [cos(i) for i in 1:50000];\n\njulia> ts_sin = [sin(i) for i in 1:50000];\n\njulia> df_data = DataFrame(\n :id => [1, 2],\n :age => [30, 9],\n :name => [\"Python\", \"Julia\"],\n :stat => [deepcopy(ts_sin), deepcopy(ts_cos)]\n );\n\njulia> md = MultiDataset([[2,3], [4]], df_data);\n\njulia> description = describe(md)\n2-element Vector{DataFrame}:\n 2×7 DataFrame\n Row │ variable mean min median max nmissing eltype \n │ Symbol Union… Any Union… Any Int64 DataType\n─────┼─────────────────────────────────────────────────────────────\n 1 │ age 19.5 9 19.5 30 0 Int64\n 2 │ name Julia Python 0 String\n 1×7 DataFrame\n Row │ Variables mean min ⋯\n │ Symbol Array… Array… ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ stat AbstractFloat[8.63372e-6; -2.848… AbstractFloat[-1.0; -1.0 ⋯\n 5 columns omitted\n","category":"page"},{"location":"description/","page":"Description","title":"Description","text":"The describe implementation for MultiDatasets tries to find the statistical measures best suited to the type of data each modality contains.","category":"page"},{"location":"description/","page":"Description","title":"Description","text":"In the example, the 2nd modality, which contains variables (just one here) of type Vector{Float64}, was described by applying the well-known 22 features from the package Catch22.jl plus maximum, minimum and mean, since the vectors were time 
series.","category":"page"},{"location":"description/","page":"Description","title":"Description","text":"describe","category":"page"},{"location":"description/#DataAPI.describe","page":"Description","title":"DataAPI.describe","text":"describe(md; t = fill([(1, 0, 0)], nmodalities(md)), kwargs...)\n\nReturn descriptive statistics for an AbstractMultiDataset as a Vector of new DataFrames where each row represents a variable and each column a summary statistic.\n\nArguments\n\nmd: the AbstractMultiDataset;\nt: a vector of nmodalities elements, where each element is a vector as long as the dimensionality of the i-th modality. Each element of the innermost vector is a tuple of arguments for paa.\n\nFor the other keyword arguments, see the documentation of the DataFrames.describe function.\n\nExamples\n\nTODO: examples\n\n\n\n\n\n","category":"function"},{"location":"manipulation/","page":"Manipulation","title":"Manipulation","text":"CurrentModule = MultiData","category":"page"},{"location":"manipulation/#man-manipulation","page":"Manipulation","title":"Manipulation","text":"","category":"section"},{"location":"manipulation/","page":"Manipulation","title":"Manipulation","text":"Pages = [\"manipulation.md\"]","category":"page"},{"location":"manipulation/#man-modalities","page":"Manipulation","title":"Modalities","text":"","category":"section"},{"location":"manipulation/","page":"Manipulation","title":"Manipulation","text":"Modules = [MultiData]\nPages = [\"modalities.jl\"]","category":"page"},{"location":"manipulation/#MultiData.addmodality!-Tuple{AbstractMultiDataset, AbstractVector{<:Integer}}","page":"Manipulation","title":"MultiData.addmodality!","text":"addmodality!(md, indices)\naddmodality!(md, index)\naddmodality!(md, variable_names)\naddmodality!(md, variable_name)\n\nCreate a new modality in a multimodal dataset using variables at indices or index, and return the dataset itself.\n\nAlternatively to the indices and the index, the variable name(s) can be used.\n\nNote: to 
add a new modality with new variables see insertmodality!.\n\nArguments\n\nmd is a MultiDataset;\nindices is an AbstractVector{Integer} that indicates which indices of the multimodal dataset's corresponding dataframe to add to the new modality;\nindex is an Integer that indicates the index of the multimodal dataset's corresponding dataframe to add to the new modality;\nvariable_names is an AbstractVector{Symbol} that indicates which variables of the multimodal dataset's corresponding dataframe to add to the new modality;\nvariable_name is a Symbol that indicates the variable of the multimodal dataset's corresponding dataframe to add to the new modality;\n\nExamples\n\njulia> df = DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60])\n2×5 DataFrame\n Row │ name age sex height weight\n │ String Int64 Char Int64 Int64\n─────┼─────────────────────────────────────\n 1 │ Python 25 M 180 80\n 2 │ Julia 26 F 175 60\n\njulia> md = MultiDataset([[1]], df)\n● MultiDataset\n └─ dimensionalities: (0,)\n- Modality 1 / 1\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Spare variables\n └─ dimensionality: 0\n2×4 SubDataFrame\n Row │ age sex height weight\n │ Int64 Char Int64 Int64\n─────┼─────────────────────────────\n 1 │ 25 M 180 80\n 2 │ 26 F 175 60\n\n\njulia> addmodality!(md, [:age, :sex])\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Modality 2 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ age sex\n │ Int64 Char\n─────┼─────────────\n 1 │ 25 M\n 2 │ 26 F\n- Spare variables\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ height weight\n │ Int64 Int64\n─────┼────────────────\n 1 │ 180 80\n 2 │ 175 60\n\n\njulia> addmodality!(md, 5)\n● MultiDataset\n └─ dimensionalities: (0, 0, 0)\n- Modality 1 / 3\n └─ 
dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Modality 2 / 3\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ age sex\n │ Int64 Char\n─────┼─────────────\n 1 │ 25 M\n 2 │ 26 F\n- Modality 3 / 3\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ height\n │ Int64\n─────┼────────\n 1 │ 180\n 2 │ 175\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.addvariable_tomodality!-Tuple{AbstractMultiDataset, Integer, Integer}","page":"Manipulation","title":"MultiData.addvariable_tomodality!","text":"addvariable_tomodality!(md, i_modality, var_index)\naddvariable_tomodality!(md, i_modality, var_indices)\naddvariable_tomodality!(md, i_modality, var_name)\naddvariable_tomodality!(md, i_modality, var_names)\n\nAdd variable at index var_index to the modality at index i_modality in a multimodal dataset, and return the dataset. Instead of var_index, the variable name can be used. Multiple variables can be added to the modality at once using var_indices or var_names.\n\nNote: The function does not allow you to add a variable to a new modality, but only to add it to an existing modality. To add a new modality use addmodality! 
instead.\n\nArguments\n\nmd is a MultiDataset;\ni_modality is an Integer indicating the modality in which the variable(s) will be added;\nvar_index is an Integer that indicates the index of the variable to add to a specific modality of the multimodal dataset;\nvar_indices is an AbstractVector{Integer} indicating the indices of the variables to add to a specific modality of the multimodal dataset;\nvar_name is a Symbol indicating the name of the variable to add to a specific modality of the multimodal dataset;\nvar_names is an AbstractVector{Symbol} indicating the names of the variables to add to a specific modality of the multimodal dataset;\n\nExamples\n\njulia> df = DataFrame(:name => [\"Python\", \"Julia\"],\n :age => [25, 26],\n :sex => ['M', 'F'],\n :height => [180, 175],\n :weight => [80, 60]\n )\n2×5 DataFrame\n Row │ name age sex height weight\n │ String Int64 Char Int64 Int64\n─────┼─────────────────────────────────────\n 1 │ Python 25 M 180 80\n 2 │ Julia 26 F 175 60\n\njulia> md = MultiDataset([[1, 2],[3]], df)\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n- Spare variables\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ height weight\n │ Int64 Int64\n─────┼────────────────\n 1 │ 180 80\n 2 │ 175 60\n\njulia> addvariable_tomodality!(md, 1, [4,5])\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×4 SubDataFrame\n Row │ name age height weight\n │ String Int64 Int64 Int64\n─────┼───────────────────────────────\n 1 │ Python 25 180 80\n 2 │ Julia 26 175 60\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> addvariable_tomodality!(md, 2, [:name,:weight])\n● MultiDataset\n └─ dimensionalities: (0, 
0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×4 SubDataFrame\n Row │ name age height weight\n │ String Int64 Int64 Int64\n─────┼───────────────────────────────\n 1 │ Python 25 180 80\n 2 │ Julia 26 175 60\n- Modality 2 / 2\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ sex name weight\n │ Char String Int64\n─────┼──────────────────────\n 1 │ M Python 80\n 2 │ F Julia 60\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.dropmodalities!-Tuple{AbstractMultiDataset, Integer}","page":"Manipulation","title":"MultiData.dropmodalities!","text":"dropmodalities!(md, indices)\ndropmodalities!(md, index)\n\nRemove the i-th modality from a multimodal dataset while dropping all variables in it, and return the dataset itself.\n\nNote: if the dropped variables are contained in other modalities they will also be removed from them. This can lead to the removal of additional modalities other than the i-th.\n\nIf the intention is to remove a modality without dropping the variables use removemodality! 
instead.\n\nArguments\n\nmd is a MultiDataset;\nindex is an Integer indicating the index of the modality to drop;\nindices is an AbstractVector{Integer} indicating the indices of the modalities to drop.\n\nExamples\n\njulia> df = DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60])\n2×5 DataFrame\n Row │ name age sex height weight\n │ String Int64 Char Int64 Int64\n─────┼─────────────────────────────────────\n 1 │ Python 25 M 180 80\n 2 │ Julia 26 F 175 60\n\njulia> md = MultiDataset([[1, 2],[3,4],[5],[2,3]], df)\n● MultiDataset\n └─ dimensionalities: (0, 0, 0, 0)\n- Modality 1 / 4\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 4\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ sex height\n │ Char Int64\n─────┼──────────────\n 1 │ M 180\n 2 │ F 175\n- Modality 3 / 4\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n- Modality 4 / 4\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ age sex\n │ Int64 Char\n─────┼─────────────\n 1 │ 25 M\n 2 │ 26 F\n\njulia> dropmodalities!(md, [2,3])\n[ Info: Variable 3 was last variable of modality 2: removing modality\n[ Info: Variable 3 was last variable of modality 2: removing modality\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 25\n 2 │ 26\n\njulia> dropmodalities!(md, 2)\n[ Info: Variable 2 was last variable of modality 2: removing modality\n● MultiDataset\n └─ dimensionalities: (0,)\n- Modality 1 / 1\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ 
Julia\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.eachmodality-Tuple{AbstractMultiDataset}","page":"Manipulation","title":"MultiData.eachmodality","text":"eachmodality(md)\n\nReturn a (lazy) iterator of the modalities of a multimodal dataset.\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.insertmodality!","page":"Manipulation","title":"MultiData.insertmodality!","text":"insertmodality!(md, col, new_modality, existing_variables)\ninsertmodality!(md, new_modality, existing_variables)\n\nInsert new_modality as a new modality into a multimodal dataset, and return the dataset. Existing variables can be added to the new modality while adding it to the dataset by passing the corresponding indices as existing_variables. If col is specified then the variables will be inserted starting at index col.\n\nArguments\n\nmd is a MultiDataset;\ncol is an Integer indicating the column at which to insert the columns of new_modality;\nnew_modality is an AbstractDataFrame which will be added to the multimodal dataset as a sub-dataframe of a new modality;\nexisting_variables is an AbstractVector{Integer} or AbstractVector{Symbol}. 
It indicates which variables of the multimodal dataset internal dataframe structure to insert in the new modality.\n\nExamples\n\njulia> df = DataFrame(\n :name => [\"Python\", \"Julia\"],\n :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]\n )\n2×2 DataFrame\n Row │ name stat1\n │ String Array…\n─────┼───────────────────────────────────────────\n 1 │ Python [0.841471, 0.909297, 0.14112, -0…\n 2 │ Julia [0.540302, -0.416147, -0.989992,…\n\njulia> md = MultiDataset(df; group = :all)\n● MultiDataset\n └─ dimensionalities: (0, 1)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Modality 2 / 2\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat1\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n\njulia> insertmodality!(md, DataFrame(:age => [30, 9]))\n● MultiDataset\n └─ dimensionalities: (0, 1, 0)\n- Modality 1 / 3\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Modality 2 / 3\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat1\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n- Modality 3 / 3\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 30\n 2 │ 9\n\njulia> md.data\n2×3 DataFrame\n Row │ name stat1 age\n │ String Array… Int64\n─────┼──────────────────────────────────────────────────\n 1 │ Python [0.841471, 0.909297, 0.14112, -0… 30\n 2 │ Julia [0.540302, -0.416147, -0.989992,… 9\n\nor, selecting the column\n\njulia> df = DataFrame(\n :name => [\"Python\", \"Julia\"],\n :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]\n )\n2×2 DataFrame\n Row │ name stat1\n │ String Array…\n─────┼───────────────────────────────────────────\n 1 │ Python [0.841471, 0.909297, 0.14112, -0…\n 2 │ Julia [0.540302, 
-0.416147, -0.989992,…\n\njulia> md = MultiDataset(df; group = :all)\n● MultiDataset\n └─ dimensionalities: (0, 1)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Modality 2 / 2\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat1\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n\njulia> insertmodality!(md, 2, DataFrame(:age => [30, 9]))\n● MultiDataset\n └─ dimensionalities: (1, 0)\n- Modality 1 / 2\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat1\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 30\n 2 │ 9\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n\njulia> md.data\n2×3 DataFrame\n Row │ name age stat1\n │ String Int64 Array…\n─────┼──────────────────────────────────────────────────\n 1 │ Python 30 [0.841471, 0.909297, 0.14112, -0…\n 2 │ Julia 9 [0.540302, -0.416147, -0.989992,…\n\nor, adding an existing variable:\n\njulia> df = DataFrame(\n :name => [\"Python\", \"Julia\"],\n :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]\n )\n2×2 DataFrame\n Row │ name stat1\n │ String Array…\n─────┼───────────────────────────────────────────\n 1 │ Python [0.841471, 0.909297, 0.14112, -0…\n 2 │ Julia [0.540302, -0.416147, -0.989992,…\n\njulia> md = MultiDataset([[2]], df)\n● MultiDataset\n └─ dimensionalities: (1,)\n- Modality 1 / 1\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat1\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ 
Python\n 2 │ Julia\n\n\njulia> insertmodality!(md, DataFrame(:age => [30, 9]); existing_variables = [1])\n● MultiDataset\n └─ dimensionalities: (1, 0)\n- Modality 1 / 2\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat1\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n- Modality 2 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ age name\n │ Int64 String\n─────┼───────────────\n 1 │ 30 Python\n 2 │ 9 Julia\n\n\n\n\n\n","category":"function"},{"location":"manipulation/#MultiData.keeponlymodalities!-Tuple{AbstractMultiDataset, AbstractVector{<:Integer}}","page":"Manipulation","title":"MultiData.keeponlymodalities!","text":"TODO\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.modality-Tuple{AbstractMultiDataset, Integer}","page":"Manipulation","title":"MultiData.modality","text":"modality(md, i)\n\nReturn the i-th modality of a multimodal dataset.\n\nmodality(md, indices)\n\nReturn a Vector of modalities at indices of a multimodal dataset.\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.nmodalities-Tuple{AbstractMultiDataset}","page":"Manipulation","title":"MultiData.nmodalities","text":"nmodalities(md)\n\nReturn the number of modalities of a multimodal dataset.\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.removemodality!-Tuple{AbstractMultiDataset, Integer}","page":"Manipulation","title":"MultiData.removemodality!","text":"removemodality!(md, indices)\nremovemodality!(md, index)\n\nRemove i-th modality from a multimodal dataset, and return the dataset.\n\nNote: to completely remove a modality and all variables in it use dropmodalities! 
instead.\n\nArguments\n\nmd is a MultiDataset;\nindex is an Integer that indicates which modality to remove from the multimodal dataset;\nindices is an AbstractVector{Integer} that indicates the modalities to remove from the multimodal dataset;\n\nExamples\n\njulia> df = DataFrame(:name => [\"Python\", \"Julia\"],\n :age => [25, 26],\n :sex => ['M', 'F'],\n :height => [180, 175],\n :weight => [80, 60])\n )\n2×5 DataFrame\n Row │ name age sex height weight\n │ String Int64 Char Int64 Int64\n─────┼─────────────────────────────────────\n 1 │ Python 25 M 180 80\n 2 │ Julia 26 F 175 60\n\njulia> md = MultiDataset([[1, 2],[3],[4],[5]], df)\n● MultiDataset\n └─ dimensionalities: (0, 0, 0, 0)\n- Modality 1 / 4\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 4\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n- Modality 3 / 4\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ height\n │ Int64\n─────┼────────\n 1 │ 180\n 2 │ 175\n- Modality 4 / 4\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n\njulia> removemodality!(md, [3])\n● MultiDataset\n └─ dimensionalities: (0, 0, 0)\n- Modality 1 / 3\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 3\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n- Modality 3 / 3\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ height\n │ Int64\n─────┼────────\n 1 │ 180\n 2 │ 175\n\njulia> removemodality!(md, [1,2])\n● MultiDataset\n └─ dimensionalities: (0,)\n- Modality 1 / 1\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n- Spare variables\n └─ 
dimensionality: 0\n2×4 SubDataFrame\n Row │ name age sex height\n │ String Int64 Char Int64\n─────┼─────────────────────────────\n 1 │ Python 25 M 180\n 2 │ Julia 26 F 175\n\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.removevariable_frommodality!-Tuple{AbstractMultiDataset, Integer, Integer}","page":"Manipulation","title":"MultiData.removevariable_frommodality!","text":"removevariable_frommodality!(md, i_modality, var_indices)\nremovevariable_frommodality!(md, i_modality, var_index)\nremovevariable_frommodality!(md, i_modality, var_name)\nremovevariable_frommodality!(md, i_modality, var_names)\n\nRemove variable at index var_index from the modality at index i_modality in a multimodal dataset, and return the dataset itself.\n\nAlternatively to var_index the variable name can be used. Multiple variables can be dropped from the multimodal dataset at once, by passing a Vector of Symbols (for names), or a Vector of integers (for indices) as a last argument.\n\nNote: when all variables are dropped from a modality, it will be removed.\n\nArguments\n\nmd is a MultiDataset;\ni_modality is an Integer indicating the modality in which the variable(s) will be dropped;\nvar_index is an Integer that indicates the index of the variable to drop from a specific modality of the multimodal dataset;\nvar_indices is an AbstractVector{Integer} indicating the indices of the variables to drop from a specific modality of the multimodal dataset;\nvar_name is a Symbol indicating the name of the variable to drop from a specific modality of the multimodal dataset;\nvar_names is an AbstractVector{Symbol} indicating the name of the variables to drop from a specific modality of the multimodal dataset;\n\nExamples\n\njulia> df = DataFrame(:name => [\"Python\", \"Julia\"],\n :age => [25, 26],\n :sex => ['M', 'F'],\n :height => [180, 175],\n :weight => [80, 60])\n )\n2×5 DataFrame\n Row │ name age sex height weight\n │ String Int64 Char Int64 
Int64\n─────┼─────────────────────────────────────\n 1 │ Python 25 M 180 80\n 2 │ Julia 26 F 175 60\n\njulia> md = MultiDataset([[1,2,4],[2,3,4],[5]], df)\n● MultiDataset\n └─ dimensionalities: (0, 0, 0)\n- Modality 1 / 3\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ name age height\n │ String Int64 Int64\n─────┼───────────────────────\n 1 │ Python 25 180\n 2 │ Julia 26 175\n- Modality 2 / 3\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ age sex height\n │ Int64 Char Int64\n─────┼─────────────────────\n 1 │ 25 M 180\n 2 │ 26 F 175\n- Modality 3 / 3\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n\njulia> removevariable_frommodality!(md, 3, 5)\n[ Info: Variable 5 was last variable of modality 3: removing modality\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ name age height\n │ String Int64 Int64\n─────┼───────────────────────\n 1 │ Python 25 180\n 2 │ Julia 26 175\n- Modality 2 / 2\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ age sex height\n │ Int64 Char Int64\n─────┼─────────────────────\n 1 │ 25 M 180\n 2 │ 26 F 175\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n\njulia> removevariable_frommodality!(md, 1, :age)\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name height\n │ String Int64\n─────┼────────────────\n 1 │ Python 180\n 2 │ Julia 175\n- Modality 2 / 2\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ age sex height\n │ Int64 Char Int64\n─────┼─────────────────────\n 1 │ 25 M 180\n 2 │ 26 F 175\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n\njulia> removevariable_frommodality!(md, 2, [3,4])\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name height\n │ 
String Int64\n─────┼────────────────\n 1 │ Python 180\n 2 │ Julia 175\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 25\n 2 │ 26\n- Spare variables\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ sex weight\n │ Char Int64\n─────┼──────────────\n 1 │ M 80\n 2 │ F 60\n\njulia> removevariable_frommodality!(md, 1, [:name,:height])\n[ Info: Variable 4 was last variable of modality 1: removing modality\n● MultiDataset\n └─ dimensionalities: (0,)\n- Modality 1 / 1\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 25\n 2 │ 26\n- Spare variables\n └─ dimensionality: 0\n2×4 SubDataFrame\n Row │ name sex height weight\n │ String Char Int64 Int64\n─────┼──────────────────────────────\n 1 │ Python M 180 80\n 2 │ Julia F 175 60\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#man-variables","page":"Manipulation","title":"Variables","text":"","category":"section"},{"location":"manipulation/","page":"Manipulation","title":"Manipulation","text":"Modules = [MultiData]\nPages = [\"variables.jl\"]","category":"page"},{"location":"manipulation/#MultiData.dropsparevariables!-Tuple{AbstractMultiDataset}","page":"Manipulation","title":"MultiData.dropsparevariables!","text":"dropsparevariables!(md)\n\nDrop all variables that are not contained in any of the modalities in a multimodal dataset.\n\nArguments\n\nmd is a MultiDataset, that is the structure at which sparevariables will be dropped.\n\nExamples\n\njulia> md = MultiDataset([[1]], DataFrame(:age => [30, 9], :name => [\"Python\", \"Julia\"]))\n● MultiDataset\n └─ dimensionalities: (0,)\n- Modality 1 / 1\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 30\n 2 │ 9\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n\n\njulia> dropsparevariables!(md)\n2×1 DataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ 
Julia\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.dropvariables!-Tuple{AbstractMultiDataset, Integer}","page":"Manipulation","title":"MultiData.dropvariables!","text":"dropvariables!(md, i)\ndropvariables!(md, variable_name)\ndropvariables!(md, indices)\ndropvariables!(md, variable_names)\ndropvariables!(md, i_modality, indices)\ndropvariables!(md, i_modality, variable_names)\n\nDrop the i-th variable from a multimodal dataset, and return the dataset itself.\n\nArguments\n\nmd is a MultiDataset;\ni is an Integer that indicates the index of the variable to drop;\nvariable_name is a Symbol that indicates the variable to drop;\nindices is an AbstractVector{Integer} that indicates the indices of the variables to drop;\nvariable_names is an AbstractVector{Symbol} that indicates the variables to drop;\ni_modality is an Integer indicating the index of a modality; if this argument is specified, indices are interpreted relative to the i_modality-th modality.\n\nExamples\n\njulia> md = MultiDataset([[1, 2],[3, 4, 5]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60]))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ sex height weight\n │ Char Int64 Int64\n─────┼──────────────────────\n 1 │ M 180 80\n 2 │ F 175 60\n\njulia> dropvariables!(md, 4)\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ sex weight\n │ Char Int64\n─────┼──────────────\n 1 │ M 80\n 2 │ F 60\n\njulia> dropvariables!(md, :name)\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 
0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 25\n 2 │ 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ sex weight\n │ Char Int64\n─────┼──────────────\n 1 │ M 80\n 2 │ F 60\n\njulia> dropvariables!(md, [1,3])\n[ Info: Variable 1 was last variable of modality 1: removing modality\n● MultiDataset\n └─ dimensionalities: (0,)\n- Modality 1 / 1\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\nTODO: To be reviewed\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.hasvariables-Tuple{AbstractDataFrame, Symbol}","page":"Manipulation","title":"MultiData.hasvariables","text":"hasvariables(df, variable_name)\nhasvariables(md, i_modality, variable_name)\nhasvariables(md, variable_name)\nhasvariables(df, variable_names)\nhasvariables(md, i_modality, variable_names)\nhasvariables(md, variable_names)\n\nCheck whether a multimodal dataset contains a variable named variable_name.\n\nInstead of a single variable name a Vector of names can be passed. 
If this is the case, this function will return true only if md contains all the specified variables.\n\nArguments\n\ndf is an AbstractDataFrame, one of the two kinds of structure in which the presence of the variable can be checked;\nmd is an AbstractMultiDataset, the other kind of structure in which the presence of the variable can be checked;\nvariable_name is a Symbol indicating the variable whose existence is to be verified;\ni_modality is an Integer indicating in which modality to look for the variable.\n\nExamples\n\njulia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F']))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> hasvariables(md, :age)\ntrue\n\njulia> hasvariables(md.data, :name)\ntrue\n\njulia> hasvariables(md, :height)\nfalse\n\njulia> hasvariables(md, 1, :sex)\nfalse\n\njulia> hasvariables(md, 2, :sex)\ntrue\n\njulia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F']))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> hasvariables(md, [:sex, :age])\ntrue\n\njulia> hasvariables(md, 1, [:sex])\nfalse\n\njulia> hasvariables(md, 2, [:sex])\ntrue\n\njulia> hasvariables(md.data, [:name, :sex])\ntrue\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.insertvariables!-Tuple{AbstractMultiDataset, Integer, Symbol, 
AbstractVector}","page":"Manipulation","title":"MultiData.insertvariables!","text":"insertvariables!(md, col, index, values)\ninsertvariables!(md, index, values)\ninsertvariables!(md, col, index, value)\ninsertvariables!(md, index, value)\n\nInsert a variable in a multimodal dataset with a given index.\n\nnote: Note\nEach inserted variable will be added in as a spare variables.\n\nArguments\n\nmd is an AbstractMultiDataset;\ncol is an Integer indicating in which position to insert the new variable. If no col is passed, the new variable will be placed last in the md's underlying dataframe structure;\nindex is a Symbol and denote the name of the variable to insert. Duplicated variable names will be renamed to avoid conflicts: see makeunique argument for insertcols! in DataFrames documentation;\nvalues is an AbstractVector that indicates the values for the newly inserted variable. The length of values should match ninstances(md);\nvalue is a single value for the new variable. If a single value is passed as a last argument this will be copied and used for each instance in the dataset.\n\nExamples\n\njulia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F']))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> insertvariables!(md, :weight, [80, 75])\n2×4 DataFrame\n Row │ name age sex weight\n │ String Int64 Char Int64\n─────┼─────────────────────────────\n 1 │ Python 25 M 80\n 2 │ Julia 26 F 75\n\njulia> md\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 
SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 75\n\njulia> insertvariables!(md, 2, :height, 180)\n2×5 DataFrame\n Row │ name height age sex weight\n │ String Int64 Int64 Char Int64\n─────┼─────────────────────────────────────\n 1 │ Python 180 25 M 80\n 2 │ Julia 180 26 F 75\n\njulia> insertvariables!(md, :hair, [\"brown\", \"blonde\"])\n2×6 DataFrame\n Row │ name height age sex weight hair\n │ String Int64 Int64 Char Int64 String\n─────┼─────────────────────────────────────────────\n 1 │ Python 180 25 M 80 brown\n 2 │ Julia 180 26 F 75 blonde\n\njulia> md\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n- Spare variables\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ height weight hair\n │ Int64 Int64 String\n─────┼────────────────────────\n 1 │ 180 80 brown\n 2 │ 180 75 blonde\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.keeponlyvariables!-Tuple{AbstractMultiDataset, AbstractVector{<:Integer}}","page":"Manipulation","title":"MultiData.keeponlyvariables!","text":"keeponlyvariables!(md, indices)\nkeeponlyvariables!(md, variable_names)\n\nDrop all variables that do not correspond to the indices in indices from a multimodal dataset.\n\nNote: if the dropped variables are contained in some modality they will also be removed from them; as a side effect, this can lead to the removal of modalities.\n\nArguments\n\nmd is a MultiDataset;\nindices is an AbstractVector{Integer} that indicates which indices to keep in the multimodal dataset;\nvariable_names is an AbstractVector{Symbol} that indicates which variables to keep in the multimodal 
dataset.\n\nExamples\n\njulia> md = MultiDataset([[1, 2],[3, 4, 5],[5]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60]))\n● MultiDataset\n └─ dimensionalities: (0, 0, 0)\n- Modality 1 / 3\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 3\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ sex height weight\n │ Char Int64 Int64\n─────┼──────────────────────\n 1 │ M 180 80\n 2 │ F 175 60\n- Modality 3 / 3\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ weight\n │ Int64\n─────┼────────\n 1 │ 80\n 2 │ 60\n\njulia> keeponlyvariables!(md, [1,3,4])\n[ Info: Variable 5 was last variable of modality 3: removing modality\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Modality 2 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ sex height\n │ Char Int64\n─────┼──────────────\n 1 │ M 180\n 2 │ F 175\n\njulia> keeponlyvariables!(md, [:name, :sex])\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\nTODO: review\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.nvariables-Tuple{AbstractDataFrame}","page":"Manipulation","title":"MultiData.nvariables","text":"nvariables(md)\nnvariables(md, i)\n\nReturn the number of variables in a multimodal dataset.\n\nIf an index i is passed as second argument, then the number of variables of the i-th modality is returned.\n\nAlternatively, nvariables can be called on a single modality.\n\nArguments\n\nmd is a MultiDataset;\ni (optional) is an Integer indicating the modality of the multimodal dataset 
whose number of variables you want to know.\n\nExamples\n\njulia> md = MultiDataset([[1],[2]], DataFrame(:age => [25, 26], :sex => ['M', 'F']))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 25\n 2 │ 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\n\njulia> nvariables(md)\n2\n\njulia> nvariables(md, 2)\n1\n\njulia> mod2 = modality(md, 2)\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> nvariables(mod2)\n1\n\njulia> md = MultiDataset([[1, 2],[3, 4, 5]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60]))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×3 SubDataFrame\n Row │ sex height weight\n │ Char Int64 Int64\n─────┼──────────────────────\n 1 │ M 180 80\n 2 │ F 175 60\n\njulia> nvariables(md)\n5\n\njulia> nvariables(md, 2)\n3\n\njulia> mod2 = modality(md,2)\n2×3 SubDataFrame\n Row │ sex height weight\n │ Char Int64 Int64\n─────┼──────────────────────\n 1 │ M 180 80\n 2 │ F 175 60\n\njulia> nvariables(mod2)\n3\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.sparevariables-Tuple{AbstractMultiDataset}","page":"Manipulation","title":"MultiData.sparevariables","text":"sparevariables(md)\n\nReturn the indices of all the variables that are not contained in any of the modalities of a multimodal dataset.\n\nArguments\n\nmd is a MultiDataset, which is the structure whose indices of the sparevariables are to be known.\n\nExamples\n\njulia> md = MultiDataset([[1],[3]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F']))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- 
Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 25\n 2 │ 26\n\njulia> md.data\n2×3 DataFrame\n Row │ name age sex\n │ String Int64 Char\n─────┼─────────────────────\n 1 │ Python 25 M\n 2 │ Julia 26 F\n\njulia> sparevariables(md)\n1-element Vector{Int64}:\n 2\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.variableindex-Tuple{AbstractDataFrame, Symbol}","page":"Manipulation","title":"MultiData.variableindex","text":"variableindex(df, variable_name)\nvariableindex(md, i_modality, variable_name)\nvariableindex(md, variable_name)\n\nReturn the index of the variable. When i_modality is passed, the function returns the index of the variable in the sub-dataframe of the modality identified by i_modality. 
It returns 0 when the variable is not contained in the modality identified by i_modality.\n\nArguments\n\ndf is an AbstractDataFrame;\nmd is an AbstractMultiDataset;\nvariable_name is a Symbol indicating the variable whose index you want to know;\ni_modality is an Integer indicating the modality within which to look up the index of the variable.\n\nExamples\n\njulia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F']))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ name age\n │ String Int64\n─────┼───────────────\n 1 │ Python 25\n 2 │ Julia 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> md.data\n2×3 DataFrame\n Row │ name age sex\n │ String Int64 Char\n─────┼─────────────────────\n 1 │ Python 25 M\n 2 │ Julia 26 F\n\njulia> variableindex(md, :age)\n2\n\njulia> variableindex(md, :sex)\n3\n\njulia> variableindex(md, 1, :name)\n1\n\njulia> variableindex(md, 2, :name)\n0\n\njulia> variableindex(md, 2, :sex)\n1\n\njulia> variableindex(md.data, :age)\n2\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.variables-Tuple{AbstractDataFrame}","page":"Manipulation","title":"MultiData.variables","text":"variables(md, i)\n\nReturn the names of the variables in a multimodal dataset, as Symbols.\n\nWhen called on an object of type MultiDataset, a Dict is returned that maps each modality index to an AbstractVector{Symbol}.\n\nNote: the order of the variable names is guaranteed to match the order of the variables in the modality.\n\nIf an index i is passed as second argument, then the names of the variables of the i-th modality are returned as an AbstractVector.\n\nAlternatively, variables can be called on a single modality.\n\nArguments\n\nmd is a MultiDataset;\ni is an Integer indicating from which modality of the multimodal dataset to get the names of the 
variables.\n\nExamples\n\njulia> md = MultiDataset([[2],[3]], DataFrame(:name => [\"Python\", \"Julia\"], :age => [25, 26], :sex => ['M', 'F']))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 25\n 2 │ 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n\njulia> variables(md)\nDict{Integer, AbstractVector{Symbol}} with 2 entries:\n 2 => [:sex]\n 1 => [:age]\n\njulia> variables(md, 2)\n1-element Vector{Symbol}:\n :sex\n\njulia> variables(md, 1)\n1-element Vector{Symbol}:\n :age\n\njulia> mod2 = modality(md, 2)\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> variables(mod2)\n1-element Vector{Symbol}:\n :sex\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#man-instances","page":"Manipulation","title":"Instances","text":"","category":"section"},{"location":"manipulation/","page":"Manipulation","title":"Manipulation","text":"Modules = [MultiData]\nPages = [\"instances.jl\"]","category":"page"},{"location":"manipulation/#MultiData.deleteinstances!-Tuple{AbstractMultiDataset, AbstractVector{<:Integer}}","page":"Manipulation","title":"MultiData.deleteinstances!","text":"deleteinstances!(md, i)\n\nRemove the i-th instance in a multimodal dataset, and return the dataset itself.\n\ndeleteinstances!(md, i_instances)\n\nRemove the instances at i_instances in a multimodal dataset, and return the dataset itself.\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.instance-Tuple{AbstractDataFrame, Integer}","page":"Manipulation","title":"MultiData.instance","text":"instance(md, i)\n\nReturn the i-th instance in a multimodal dataset.\n\ninstance(md, i_modality, i_instance)\n\nReturn the i_instance-th instance in a multimodal dataset with 
only variables from the i_modality-th modality.\n\ninstance(md, i_instances)\n\nReturn instances at i_instances in a multimodal dataset.\n\ninstance(md, i_modality, i_instances)\n\nReturn instances at i_instances in a multimodal dataset with only variables from the i_modality-th modality.\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.keeponlyinstances!-Tuple{AbstractMultiDataset, AbstractVector{<:Integer}}","page":"Manipulation","title":"MultiData.keeponlyinstances!","text":"keeponlyinstances!(md, i_instances)\n\nRemove from a multimodal dataset all instances whose index does not appear in i_instances.\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#MultiData.pushinstances!-Tuple{AbstractMultiDataset, DataFrameRow}","page":"Manipulation","title":"MultiData.pushinstances!","text":"pushinstances!(md, instance)\n\nAdd an instance to a multimodal dataset, and return the dataset itself.\n\nThe instance can be a DataFrameRow or an AbstractVector, but in both cases the number and type of variables should match those of the dataset.\n\n\n\n\n\n","category":"method"},{"location":"manipulation/#SoleBase.ninstances-Tuple{AbstractDataFrame}","page":"Manipulation","title":"SoleBase.ninstances","text":"ninstances(md)\n\nReturn the number of instances in a multimodal dataset.\n\nExamples\n\njulia> md = MultiDataset([[1],[2]],DataFrame(:age => [25, 26], :sex => ['M', 'F']))\n● MultiDataset\n └─ dimensionalities: (0, 0)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 25\n 2 │ 26\n- Modality 2 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> mod2 = modality(md, 2)\n2×1 SubDataFrame\n Row │ sex\n │ Char\n─────┼──────\n 1 │ M\n 2 │ F\n\njulia> ninstances(md) == ninstances(mod2) == 2\ntrue\n\n\n\n\n\n","category":"method"},{"location":"filesystem/","page":"Filesystem","title":"Filesystem","text":"CurrentModule = 
MultiData","category":"page"},{"location":"filesystem/#man-filesystem","page":"Filesystem","title":"Filesystem","text":"","category":"section"},{"location":"filesystem/","page":"Filesystem","title":"Filesystem","text":"Pages = [\"filesystem.md\"]","category":"page"},{"location":"filesystem/","page":"Filesystem","title":"Filesystem","text":"Modules = [MultiData]\nPages = [\"filesystem.jl\"]","category":"page"},{"location":"filesystem/#MultiData.datasetinfo-Tuple{AbstractString}","page":"Filesystem","title":"MultiData.datasetinfo","text":"datasetinfo(datasetpath; onlywithlabels = [], shufflelabels = [], rng = Random.GLOBAL_RNG)\n\nShow the dataset size on disk and return a Tuple whose first element is a vector of selected IDs, whose second element is the labels DataFrame or nothing, and whose third element is the total size in bytes.\n\nArguments\n\nonlywithlabels is used to select which portion of the Dataset to load, by specifying labels and their values to use as filters. See loaddataset for more info.\nshufflelabels is an AbstractVector of names of labels to shuffle (default = [], means no shuffle).\nrng is a random number generator to be used when shuffling (for reproducibility); can be either an Integer (used as seed for MersenneTwister) or an AbstractRNG.\n\n\n\n\n\n","category":"method"},{"location":"filesystem/#MultiData.loaddataset-Tuple{AbstractString}","page":"Filesystem","title":"MultiData.loaddataset","text":"loaddataset(datasetpath; onlywithlabels = [], shufflelabels = [], rng = Random.GLOBAL_RNG)\n\nCreate a MultiDataset or a LabeledMultiDataset from a Dataset, based on the presence of file Labels.csv.\n\nArguments\n\ndatasetpath is an AbstractString that denotes the Dataset's location;\nonlywithlabels is an AbstractVector{AbstractVector{Pair{AbstractString,AbstractVector{Any}}}} and is used to select which portion of the Dataset to load, by specifying labels and their values. 
Starting from the innermost type, each Pair{AbstractString,AbstractVector{Any}} must contain the label's name as the AbstractString and the accepted values for that label as the AbstractVector{Any}. Each Pair in one vector must refer to a different label, so if the Dataset has n labels in total, a vector of Pairs can contain at most n elements, because the Pairs within one vector are combined with each other (an instance must satisfy all of them). Every vector of Pairs acts as a separate filter. Note that the same label can be used in different vectors of Pairs, since separate vectors are not combined with each other (an instance is loaded if it matches any one of them). If onlywithlabels is an empty vector (default), the function loads the entire Dataset.\nshufflelabels is an AbstractVector of names of labels to shuffle (default = [], means no shuffle).\nrng is a random number generator to be used when shuffling (for reproducibility); can be either an Integer (used as seed for MersenneTwister) or an AbstractRNG.\n\nExamples\n\njulia> df_data = DataFrame(\n :id => [1, 2, 3, 4, 5],\n :age => [30, 9, 30, 40, 9],\n :name => [\"Python\", \"Julia\", \"C\", \"Java\", \"R\"],\n :stat => [deepcopy(ts_sin), deepcopy(ts_cos), deepcopy(ts_sin), deepcopy(ts_cos), deepcopy(ts_sin)]\n )\n5×4 DataFrame\n Row │ id age name stat\n │ Int64 Int64 String Array…\n─────┼─────────────────────────────────────────────────────────\n 1 │ 1 30 Python [0.841471, 0.909297, 0.14112, -0…\n 2 │ 2 9 Julia [0.540302, -0.416147, -0.989992,…\n 3 │ 3 30 C [0.841471, 0.909297, 0.14112, -0…\n 4 │ 4 40 Java [0.540302, -0.416147, -0.989992,…\n 5 │ 5 9 R [0.841471, 0.909297, 0.14112, -0…\n\njulia> lmd = LabeledMultiDataset(\n MultiDataset([[4]], deepcopy(df_data)),\n [2,3],\n)\n● LabeledMultiDataset\n ├─ labels\n │ ├─ age: Set([9, 30, 40])\n │ └─ name: Set([\"C\", \"Julia\", \"Python\", \"Java\", \"R\"])\n └─ dimensionalities: (1,)\n- Modality 1 / 1\n └─ dimensionality: 1\n5×1 SubDataFrame\n Row │ stat\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n 3 │ [0.841471, 
0.909297, 0.14112, -0…\n 4 │ [0.540302, -0.416147, -0.989992,…\n 5 │ [0.841471, 0.909297, 0.14112, -0…\n- Spare variables\n └─ dimensionality: 0\n5×1 SubDataFrame\n Row │ id\n │ Int64\n─────┼───────\n 1 │ 1\n 2 │ 2\n 3 │ 3\n 4 │ 4\n 5 │ 5\n\njulia> savedataset(\"langs\", lmd, force = true)\n\njulia> loaddataset(\"langs\", onlywithlabels = [ [\"name\" => [\"Julia\"], \"age\" => [\"9\"]] ] )\nInstances count: 1\nTotal size: 981670 bytes\n● LabeledMultiDataset\n ├─ labels\n │ ├─ age: Set([\"9\"])\n │ └─ name: Set([\"Julia\"])\n └─ dimensionalities: (1,)\n- Modality 1 / 1\n └─ dimensionality: 1\n1×1 SubDataFrame\n Row │ stat\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.540302, -0.416147, -0.989992,…\n- Spare variables\n └─ dimensionality: 0\n1×1 SubDataFrame\n Row │ id\n │ Int64\n─────┼───────\n 1 │ 2\n\njulia> loaddataset(\"langs\", onlywithlabels = [ [\"name\" => [\"Julia\"], \"age\" => [\"30\"]] ] )\nInstances count: 0\nTotal size: 0 bytes\nERROR: AssertionError: No instance found\n\njulia> loaddataset(\"langs\", onlywithlabels = [ [\"name\" => [\"Julia\"]] , [\"age\" => [\"9\"]] ] )\nInstances count: 2\nTotal size: 1963537 bytes\n● LabeledMultiDataset\n ├─ labels\n │ ├─ age: Set([\"9\"])\n │ └─ name: Set([\"Julia\", \"R\"])\n └─ dimensionalities: (1,)\n- Modality 1 / 1\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.540302, -0.416147, -0.989992,…\n 2 │ [0.841471, 0.909297, 0.14112, -0…\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ id\n │ Int64\n─────┼───────\n 1 │ 2\n 2 │ 5\n\njulia> loaddataset(\"langs\", onlywithlabels = [ [\"name\" => [\"Julia\"]], [\"name\" => [\"C\"], \"age\" => [\"30\"]] ] )\nInstances count: 2\nTotal size: 1963537 bytes\n● LabeledMultiDataset\n ├─ labels\n │ ├─ age: Set([\"9\", \"30\"])\n │ └─ name: Set([\"C\", \"Julia\"])\n └─ dimensionalities: (1,)\n- Modality 1 / 1\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat\n │ 
Array…\n─────┼───────────────────────────────────\n 1 │ [0.540302, -0.416147, -0.989992,…\n 2 │ [0.841471, 0.909297, 0.14112, -0…\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ id\n │ Int64\n─────┼───────\n 1 │ 2\n 2 │ 3\n\n\n\n\n\n","category":"method"},{"location":"filesystem/#MultiData.savedataset-Tuple{AbstractString, AbstractMultiDataset}","page":"Filesystem","title":"MultiData.savedataset","text":"savedataset(datasetpath, md; instance_ids, name, force = false)\n\nSave md AbstractMultiDataset on disk at path datasetpath in the following format:\n\ndatasetpath ├─ Example1 │ └─ Modality1.csv │ └─ Modality2.csv │ └─ ... │ └─ Modalityn.csv │ └─ Metadata.txt ├─ Example2 │ └─ Modality1.csv │ └─ Modality2.csv │ └─ ... │ └─ Modalityn.csv │ └─ Metadata.txt ├─ ... ├─ Example_n ├─ Metadata.txt └─ Labels.csv\n\nArguments\n\ninstance_ids is an AbstractVector{Integer} that denote the identifier of the instances,\nname is an AbstractString and denote the name of the Dataset, that will be saved in the Metadata of the Dataset,\nforce is a Bool, if it's set to true, then in case datasetpath already exists, it will be overwritten otherwise the operation will be aborted. (default = false)\nlabels_indices is an AbstractVector{Integer} and contains the indices of the labels' column (allowed only when passing a MultiDataset)\n\nAlternatively to an AbstractMultiDataset, a DataFrame can be passed as second argument. If this is the case a third positional argument is required representing the grouped_variables of the dataset. See MultiDataset for syntax of grouped_variables.\n\n\n\n\n\n","category":"method"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = MultiData","category":"page"},{"location":"#MultiData","page":"Home","title":"MultiData","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The aim of this package is to provide a simple and comfortable interface for managing multimodal data. 
It is built on top of DataFrames.jl with machine learning applications in mind.","category":"page"},{"location":"","page":"Home","title":"Home","text":"","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Currently this package is still not registered, so you need to run the following commands in a Julia REPL to install it:","category":"page"},{"location":"","page":"Home","title":"Home","text":"import Pkg\nPkg.add(\"MultiData\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"To install the development version, run:","category":"page"},{"location":"","page":"Home","title":"Home","text":"import Pkg\nPkg.add(url=\"https://github.com/aclai-lab/MultiData.jl\", rev=\"dev\")","category":"page"},{"location":"#Usage","page":"Home","title":"Usage","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"To instantiate a multimodal dataset, use the MultiDataset constructor by providing: a) a DataFrame containing all variables from different modalities, and b) a Vector{Vector{Union{Symbol,String,Int64}}} object representing a grouping of some of the variables (identified by column index or name) into different modalities.","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> using MultiData\n\njulia> ts_cos = [cos(i) for i in 1:50000];\n\njulia> ts_sin = [sin(i) for i in 1:50000];\n\njulia> df_data = DataFrame(\n :id => [1, 2],\n :age => [30, 9],\n :name => [\"Python\", \"Julia\"],\n :stat => [deepcopy(ts_sin), deepcopy(ts_cos)]\n )\n2×4 DataFrame\n Row │ id age name stat \n │ Int64 Int64 String Array… \n─────┼─────────────────────────────────────────────────────────\n 1 │ 1 30 Python [0.841471, 0.909297, 0.14112, -0…\n 2 │ 2 9 Julia [0.540302, -0.416147, -0.989992,…\n\njulia> grouped_variables = [[2,3], [4]]; # group 2nd and 3rd variables in the first modality\n # the 4th variable in the second 
modality and\n # leave the first variable as a \"spare variable\"\n\njulia> md = MultiDataset(df_data, grouped_variables)\n● MultiDataset\n └─ dimensionalities: (0, 1)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ age name \n │ Int64 String\n─────┼───────────────\n 1 │ 30 Python\n 2 │ 9 Julia\n- Modality 2 / 2\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat \n │ Array… \n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n- Spare variables\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ id \n │ Int64\n─────┼───────\n 1 │ 1\n 2 │ 2\n","category":"page"},{"location":"","page":"Home","title":"Home","text":"Now md holds a MultiDataset and all of its modalities can be conveniently iterated as elements of a Vector:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> for (i, f) in enumerate(md)\n println(\"Modality: \", i)\n println(f)\n println()\n end\nModality: 1\n2×2 SubDataFrame\n Row │ age name \n │ Int64 String\n─────┼───────────────\n 1 │ 30 Python\n 2 │ 9 Julia\n\nModality: 2\n2×1 SubDataFrame\n Row │ stat \n │ Array… \n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…","category":"page"},{"location":"","page":"Home","title":"Home","text":"Note that each element of a MultiDataset is a SubDataFrame:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> eltype(md)\nSubDataFrame\n","category":"page"},{"location":"","page":"Home","title":"Home","text":"note: Spare variables\nSpare variables will never be seen when accessing a MultiDataset through its iterator interface. 
To access them see sparevariables.","category":"page"},{"location":"utils/","page":"Utils","title":"Utils","text":"CurrentModule = MultiData","category":"page"},{"location":"utils/#man-utils","page":"Utils","title":"Utils","text":"","category":"section"},{"location":"utils/","page":"Utils","title":"Utils","text":"Pages = [\"utils.md\"]","category":"page"},{"location":"utils/","page":"Utils","title":"Utils","text":"paa\nlinearize_data\nunlinearize_data","category":"page"},{"location":"utils/#MultiData.paa","page":"Utils","title":"MultiData.paa","text":"paa(x; f = identity, t = (1, 0, 0))\n\nPiecewise Aggregate Approximation\n\nApply the function f to each dimensionality of the array x, dividing it into t[1] windows, taking t[2] extra points on the left and t[3] extra points on the right.\n\nNote: first window will always consider t[2] = 0 and last one will always consider t[3] = 0.\n\n\n\n\n\n","category":"function"},{"location":"utils/#MultiData.linearize_data","page":"Utils","title":"MultiData.linearize_data","text":"linearize_data(d)\n\nLinearize dimensional object d.\n\n\n\n\n\n","category":"function"},{"location":"utils/#MultiData.unlinearize_data","page":"Utils","title":"MultiData.unlinearize_data","text":"unlinearize_data(d, dims)\n\nUnlinearize a vector d to a shape dims.\n\n\n\n\n\n","category":"function"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"CurrentModule = MultiData","category":"page"},{"location":"datasets/#man-datasets","page":"Datasets","title":"Datasets","text":"","category":"section"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Pages = [\"datasets.md\"]","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"A machine learning dataset is a collection of instances (or samples), each one described by a number of variables. In the case of tabular data, a dataset looks like a database table, where every column is a variable, and each row corresponds to a given instance. 
However, a dataset can also be non-tabular; for example, each instance can consist of a multivariate time-series, or an image.","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"When data is composed of different modalities, combining their statistical properties is non-trivial, since they may be quite different in nature from one another.","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"The abstract representation of a multimodal dataset provided by this package is the AbstractMultiDataset.","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"AbstractMultiDataset\ngrouped_variables\ndata\ndimensionality","category":"page"},{"location":"datasets/#MultiData.AbstractMultiDataset","page":"Datasets","title":"MultiData.AbstractMultiDataset","text":"Abstract supertype for all multimodal datasets.\n\nA concrete multimodal dataset should always provide accessors data, to access the underlying tabular structure (e.g., DataFrame) and grouped_variables, to access the grouping of variables (a vector of vectors of column indices).\n\n\n\n\n\n","category":"type"},{"location":"datasets/#MultiData.grouped_variables","page":"Datasets","title":"MultiData.grouped_variables","text":"grouped_variables(amd)::Vector{Vector{Int}}\n\nReturn the indices of the variables of an AbstractMultiDataset, grouped by modality. 
The grouping describes how the different modalities are composed from the underlying AbstractDataFrame structure.\n\nSee also data, AbstractMultiDataset.\n\n\n\n\n\n","category":"function"},{"location":"datasets/#MultiData.data","page":"Datasets","title":"MultiData.data","text":"data(amd)::AbstractDataFrame\n\nReturn the structure that underlies an AbstractMultiDataset.\n\nSee also grouped_variables, AbstractMultiDataset.\n\n\n\n\n\n","category":"function"},{"location":"datasets/#SoleBase.dimensionality","page":"Datasets","title":"SoleBase.dimensionality","text":"dimensionality(df)\n\nReturn the dimensionality of a dataframe df.\n\nIf the dataframe has variables of various dimensionalities :mixed is returned.\n\nIf the dataframe is empty (no instances) :empty is returned. This behavior can be controlled by setting the keyword argument force:\n\n:no (default): return :mixed in case of mixed dimensionality\n:max: return the greatest dimensionality\n:min: return the lowest dimensionality\n\n\n\n\n\n","category":"function"},{"location":"datasets/#man-unlabeled-datasets","page":"Datasets","title":"Unlabeled Datasets","text":"","category":"section"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"In unlabeled datasets there is no labeling variable, and all of the variables (also called feature variables, or features) have equal role in the representation. These datasets are used in unsupervised learning contexts, for discovering internal correlation patterns between the features. 
Multimodal unlabeled datasets can be instantiated with MultiDataset.","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Modules = [MultiData]\nPages = [\"src/MultiDataset.jl\"]","category":"page"},{"location":"datasets/#MultiData.MultiDataset","page":"Datasets","title":"MultiData.MultiDataset","text":"MultiDataset(df, grouped_variables)\n\nCreate a MultiDataset from an AbstractDataFrame df, initializing its modalities according to the grouping in grouped_variables.\n\ngrouped_variables is an AbstractVector of variable grouping which are AbstractVectors of integers representing the index of the variables selected for that modality.\n\nNote that the order matters for both the modalities and the variables.\n\njulia> df = DataFrame(\n :age => [30, 9],\n :name => [\"Python\", \"Julia\"],\n :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]],\n :stat2 => [[cos(i) for i in 1:50000], [sin(i) for i in 1:50000]]\n )\n2×4 DataFrame\n Row │ age name stat1 stat2 ⋯\n │ Int64 String Array… Array… ⋯\n─────┼──────────────────────────────────────────────────────────────────────────────────────\n 1 │ 30 Python [0.841471, 0.909297, 0.14112, -0… [0.540302, -0.416147, -0.989992,… ⋯\n 2 │ 9 Julia [0.540302, -0.416147, -0.989992,… [0.841471, 0.909297, 0.14112, -0…\n\njulia> md = MultiDataset([[2]], df)\n● MultiDataset\n └─ dimensionalities: (0,)\n- Modality 1 / 1\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ name\n │ String\n─────┼────────\n 1 │ Python\n 2 │ Julia\n- Spare variables\n └─ dimensionality: mixed\n2×3 SubDataFrame\n Row │ age stat1 stat2\n │ Int64 Array… Array…\n─────┼─────────────────────────────────────────────────────────────────────────────\n 1 │ 30 [0.841471, 0.909297, 0.14112, -0… [0.540302, -0.416147, -0.989992,…\n 2 │ 9 [0.540302, -0.416147, -0.989992,… [0.841471, 0.909297, 0.14112, -0…\n\nMultiDataset(df; group = :none)\n\nCreate a MultiDataset from an AbstractDataFrame df, automatically selecting 
modalities.\n\nThe selection of modalities can be controlled by the group argument which can be:\n\n:none (default): no modality will be created\n:all: all variables will be grouped by their dimensionality\na list of dimensionalities which will be grouped.\n\nNote: :all and :none are the only Symbols accepted by group.\n\nTODO: fix passing a vector of Integer to group\n\nTODO: rewrite examples\n\nExamples\n\njulia> df = DataFrame(\n :age => [30, 9],\n :name => [\"Python\", \"Julia\"],\n :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]],\n :stat2 => [[cos(i) for i in 1:50000], [sin(i) for i in 1:50000]]\n )\n2×4 DataFrame\n Row │ age name stat1 stat2 ⋯\n │ Int64 String Array… Array… ⋯\n─────┼──────────────────────────────────────────────────────────────────────────────────────\n 1 │ 30 Python [0.841471, 0.909297, 0.14112, -0… [0.540302, -0.416147, -0.989992,… ⋯\n 2 │ 9 Julia [0.540302, -0.416147, -0.989992,… [0.841471, 0.909297, 0.14112, -0…\n\njulia> md = MultiDataset(df)\n● MultiDataset\n └─ dimensionalities: ()\n- Spare variables\n └─ dimensionality: mixed\n2×4 SubDataFrame\n Row │ age name stat1 stat2 ⋯\n │ Int64 String Array… Array… ⋯\n─────┼──────────────────────────────────────────────────────────────────────────────────────\n 1 │ 30 Python [0.841471, 0.909297, 0.14112, -0… [0.540302, -0.416147, -0.989992,… ⋯\n 2 │ 9 Julia [0.540302, -0.416147, -0.989992,… [0.841471, 0.909297, 0.14112, -0…\n\n\njulia> md = MultiDataset(df; group = :all)\n● MultiDataset\n └─ dimensionalities: (0, 1)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ age name\n │ Int64 String\n─────┼───────────────\n 1 │ 30 Python\n 2 │ 9 Julia\n- Modality 2 / 2\n └─ dimensionality: 1\n2×2 SubDataFrame\n Row │ stat1 stat2\n │ Array… Array…\n─────┼──────────────────────────────────────────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0… [0.540302, -0.416147, -0.989992,…\n 2 │ [0.540302, -0.416147, -0.989992,… [0.841471, 0.909297, 0.14112, 
-0…\n\n\njulia> md = MultiDataset(df; group = [0])\n● MultiDataset\n └─ dimensionalities: (0, 1, 1)\n- Modality 1 / 3\n └─ dimensionality: 0\n2×2 SubDataFrame\n Row │ age name\n │ Int64 String\n─────┼───────────────\n 1 │ 30 Python\n 2 │ 9 Julia\n- Modality 2 / 3\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat1\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n- Modality 3 / 3\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat2\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.540302, -0.416147, -0.989992,…\n 2 │ [0.841471, 0.909297, 0.14112, -0…\n\n\n\n\n\n","category":"type"},{"location":"datasets/#MultiData._empty-Tuple{MultiDataset}","page":"Datasets","title":"MultiData._empty","text":"_empty(md)\n\nReturn a copy of a multimodal dataset with no instances.\n\nNote: since the returned AbstractMultiDataset will be empty its columns types will be Any.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#man-supervised-datasets","page":"Datasets","title":"Labeled Datasets","text":"","category":"section"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"In labeled datasets, one or more variables are considered to have special semantics with respect to the other variables; each of these labeling variables (or target variables) can be thought as assigning a label to each instance, which is typically a categorical value (classification label) or a numerical value (regression label). 
Supervised learning methods can be applied on these datasets for modeling the target variables as a function of the feature variables.","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"As an extension of the AbstractMultiDataset, AbstractLabeledMultiDataset has an interface that can be implemented to represent multimodal labeled datasets.","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"AbstractLabeledMultiDataset\nlabeling_variables\ndataset","category":"page"},{"location":"datasets/#MultiData.AbstractLabeledMultiDataset","page":"Datasets","title":"MultiData.AbstractLabeledMultiDataset","text":"Abstract supertype for all labeled multimodal datasets (used in supervised learning).\n\nAs any multimodal dataset, any concrete labeled multimodal dataset should always provide the accessors data, to access the underlying tabular structure (e.g., DataFrame) and grouped_variables, to access the grouping of variables. In addition to these, implementations are required for labeling_variables, to access the indices of the labeling variables.\n\nSee also AbstractMultiDataset.\n\n\n\n\n\n","category":"type"},{"location":"datasets/#MultiData.labeling_variables","page":"Datasets","title":"MultiData.labeling_variables","text":"labeling_variables(almd)::Vector{Int}\n\nReturn the indices of the labelling variables, of the AbstractLabeledMultiDataset. 
with respect to the underlying AbstractDataFrame structure (see data).\n\nSee also grouped_variables, AbstractLabeledMultiDataset.\n\n\n\n\n\n","category":"function"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Multimodal labeled datasets can be instantiated with LabeledMultiDataset.","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Modules = [MultiData]\nPages = [\"LabeledMultiDataset.jl\", \"labels.jl\"]","category":"page"},{"location":"datasets/#MultiData.LabeledMultiDataset","page":"Datasets","title":"MultiData.LabeledMultiDataset","text":"LabeledMultiDataset(md, labeling_variables)\n\nCreate a LabeledMultiDataset by associating an AbstractMultiDataset with some labeling variables, specified as a column index (Int) or a vector of column indices (Vector{Int}).\n\nArguments\n\nmd is the original AbstractMultiDataset;\nlabeling_variables is an AbstractVector of integers indicating the indices of the variables that will be set as labels.\n\nExamples\n\njulia> lmd = LabeledMultiDataset(MultiDataset([[2],[4]], DataFrame(\n :id => [1, 2],\n :age => [30, 9],\n :name => [\"Python\", \"Julia\"],\n :stat => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]\n )), [1, 3])\n● LabeledMultiDataset\n ├─ labels\n │ ├─ id: Set([2, 1])\n │ └─ name: Set([\"Julia\", \"Python\"])\n └─ dimensionalities: (0, 1)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 30\n 2 │ 9\n- Modality 2 / 2\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n\n\n\n\n\n\n","category":"type"},{"location":"datasets/#MultiData.joinlabels!-Tuple{AbstractLabeledMultiDataset, Vararg{Symbol}}","page":"Datasets","title":"MultiData.joinlabels!","text":"joinlabels!(lmd, [lbls...]; delim = \"_\")\n\nOn a labeled multimodal dataset, collapse the labeling variables 
identified by lbls into a single labeling variable of type String, by means of a join that uses delim for string delimiter.\n\nIf not specified differently this function will join all labels.\n\nlbls can be an Integer indicating the index of the label, or a Symbol indicating the name of the labeling variable.\n\n!!! note\n\nThe resulting labels will always be of type String.\n\nnote: Note\nThe resulting labeling variable will always be added as last column in the underlying DataFrame.\n\nExamples\n\njulia> lmd = LabeledMultiDataset(\n MultiDataset(\n [[2],[4]],\n DataFrame(\n :id => [1, 2],\n :age => [30, 9],\n :name => [\"Python\", \"Julia\"],\n :stat => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]\n )\n ),\n [1, 3],\n )\n● LabeledMultiDataset\n ├─ labels\n │ ├─ id: Set([2, 1])\n │ └─ name: Set([\"Julia\", \"Python\"])\n └─ dimensionalities: (0, 1)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 30\n 2 │ 9\n- Modality 2 / 2\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n\n\njulia> joinlabels!(lmd)\n● LabeledMultiDataset\n ├─ labels\n │ └─ id_name: Set([\"1_Python\", \"2_Julia\"])\n └─ dimensionalities: (0, 1)\n- Modality 1 / 2\n └─ dimensionality: 0\n2×1 SubDataFrame\n Row │ age\n │ Int64\n─────┼───────\n 1 │ 30\n 2 │ 9\n- Modality 2 / 2\n └─ dimensionality: 1\n2×1 SubDataFrame\n Row │ stat\n │ Array…\n─────┼───────────────────────────────────\n 1 │ [0.841471, 0.909297, 0.14112, -0…\n 2 │ [0.540302, -0.416147, -0.989992,…\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MultiData.label-Tuple{AbstractLabeledMultiDataset, Integer, Integer}","page":"Datasets","title":"MultiData.label","text":"label(lmd, j, i)\n\nReturn the value of the i-th labeling variable for instance at index i_instance in a labeled multimodal 
dataset.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MultiData.labeldomain-Tuple{AbstractLabeledMultiDataset, Integer}","page":"Datasets","title":"MultiData.labeldomain","text":"labeldomain(lmd, i)\n\nReturn the domain of i-th label of a labeled multimodal dataset.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MultiData.labels-Tuple{AbstractLabeledMultiDataset}","page":"Datasets","title":"MultiData.labels","text":"labels(lmd, i_instance)\nlabels(lmd)\n\nReturn the labels of instance at index i_instance in a labeled multimodal dataset. A dictionary of type labelname => value is returned.\n\nIf only the first argument is passed then the labels for all instances are returned.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MultiData.nlabelingvariables-Tuple{AbstractLabeledMultiDataset}","page":"Datasets","title":"MultiData.nlabelingvariables","text":"nlabelingvariables(lmd)\n\nReturn the number of labeling variables of a labeled multimodal dataset.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MultiData.setaslabeling!-Tuple{AbstractLabeledMultiDataset, Integer}","page":"Datasets","title":"MultiData.setaslabeling!","text":"setaslabeling!(lmd, i)\nsetaslabeling!(lmd, var_name)\n\nSet i-th variable as label.\n\nThe variable name can be passed as second argument instead of its index.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MultiData.unsetaslabeling!-Tuple{AbstractLabeledMultiDataset, Integer}","page":"Datasets","title":"MultiData.unsetaslabeling!","text":"unsetaslabeling!(lmd, i)\nunsetaslabeling!(lmd, var_name)\n\nRemove i-th labeling variable from labels list.\n\nThe variable name can be passed as second argument instead of its index.\n\n\n\n\n\n","category":"method"}] } diff --git a/dev/utils/index.html b/dev/utils/index.html index 99a2f18..3e1ed43 100644 --- a/dev/utils/index.html +++ b/dev/utils/index.html @@ -1,2 +1,2 @@ -Utils · MultiData.jl

Utils

MultiData.paaFunction
paa(x; f = identity, t = (1, 0, 0))

Piecewise Aggregate Approximation

Apply the function f to each dimensionality of the array x, dividing it into t[1] windows, taking t[2] extra points on the left and t[3] extra points on the right.

Note: first window will always consider t[2] = 0 and last one will always consider t[3] = 0.

source
+Utils · MultiData.jl

Utils

MultiData.paaFunction
paa(x; f = identity, t = (1, 0, 0))

Piecewise Aggregate Approximation

Apply the function f to each dimensionality of the array x, dividing it into t[1] windows, taking t[2] extra points on the left and t[3] extra points on the right.

Note: first window will always consider t[2] = 0 and last one will always consider t[3] = 0.

source
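
The windowing described for paa can be illustrated with a minimal sketch for a one-dimensional vector. This is not the MultiData.paa implementation: the helper name paa_sketch, the default f = mean (the documented default is identity), and the way window boundaries are rounded are all assumptions made for illustration, and the vector is assumed to have at least t[1] elements.

```julia
using Statistics: mean

# Hypothetical helper, NOT MultiData.paa: one plausible reading of the
# docstring above, restricted to a one-dimensional vector.
function paa_sketch(x::AbstractVector; f = mean, t = (1, 0, 0))
    nwindows, left, right = t
    n = length(x)
    # Split 1:n into `nwindows` contiguous chunks of (almost) equal size.
    bounds = round.(Int, range(0, n; length = nwindows + 1))
    map(1:nwindows) do w
        lo = bounds[w] + 1
        hi = bounds[w + 1]
        # The first window takes no extra points on the left,
        # the last window takes no extra points on the right.
        lo = w == 1 ? lo : max(1, lo - left)
        hi = w == nwindows ? hi : min(n, hi + right)
        f(view(x, lo:hi))
    end
end

x = collect(1.0:8.0)
paa_sketch(x; t = (4, 0, 0))  # [1.5, 3.5, 5.5, 7.5]: means of 1:2, 3:4, 5:6, 7:8
paa_sketch(x; t = (2, 1, 1))  # [3.0, 6.0]: means of the overlapping windows 1:5 and 4:8
```

In the second call the two windows overlap by one point on each side of the boundary, while the outer edges are left untouched, which matches the note that the first window ignores t[2] and the last one ignores t[3].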