Feature/irregular_structure_and_operations #536

Merged
merged 158 commits into from
Mar 11, 2024
Changes from 75 commits
e314ee8
Initial 1D implementation
opintosant Nov 28, 2022
f504223
Base irregular structure and representation reading from dataframe
opintosant Jan 16, 2023
de32dfe
Add scatterplot functionality
opintosant Jan 16, 2023
a7ab765
Code cleanup and implement mean
opintosant Jan 29, 2023
3904443
Update default constructor FDataIrregular to take arrays as input. Lo…
opintosant Jan 29, 2023
583a2be
Add functionality to construct an FDataIrregular object from a FDataG…
opintosant Jan 29, 2023
ab15784
Allowing selecting the keyword argument 'marker' for plot andscatter.…
opintosant Jan 30, 2023
265cb7e
Implemented basic operations sum, __neg__, gmean, equals, copy
opintosant Jan 30, 2023
5a994d1
Added basic loader for bone density irregular dataframe from CRAN
opintosant Jan 30, 2023
a9434c7
Implemented function concatenate
opintosant Jan 30, 2023
d0c295d
Added conversion to matrix and grid
opintosant Mar 1, 2023
c28b451
Added corrections to target in fetch_bone_density. Recovered function…
opintosant Mar 1, 2023
1243291
Fix round function to only round coordinate values. Added custom Irre…
opintosant Mar 1, 2023
87267b7
Fix wrong argument type for return_X_y
opintosant Mar 1, 2023
24bea75
Change return type of operations from FDataGrid back to FDataIrregular
opintosant Mar 1, 2023
a5a0c2b
Bring back interpolation to irregular class
opintosant Mar 1, 2023
21e099c
Corrections in format
opintosant Mar 1, 2023
7e2aa65
Move all changes to operations to separate PR. Fix getitem to work wi…
opintosant Mar 1, 2023
db112b9
Return a single object in getitem for slices
opintosant Mar 1, 2023
8ba5ef6
Modified domain range to be separate from sample range and allow mult…
opintosant Mar 7, 2023
e562097
General fix of formatting
opintosant Mar 8, 2023
37cb3a7
Allow irregular class and dataset to be imported
opintosant Mar 8, 2023
4d4b797
Added test of basic generation of FDataIrregular
opintosant Mar 8, 2023
6f8ca61
First iteration of FDataIrregular data structure test
opintosant Mar 9, 2023
de4d470
Added mean,var,gmean, concatenate. Added structure for other operations
opintosant Mar 2, 2023
13c476f
Restrict domain
opintosant Mar 2, 2023
0f4275b
Added docstring to shift. Implemented __str__
opintosant Mar 2, 2023
f52fd98
Implemented and tested operations with scalarss
opintosant Mar 2, 2023
d7a0097
Implemented operations with vectores and arrays. Unsolved problem wit…
opintosant Mar 2, 2023
4260d68
Add the case of operations between FDataIrregular
opintosant Mar 7, 2023
ac37a25
Fix formatting issues after rebase
opintosant Mar 9, 2023
c6f5859
Fix wrong shape in gmean function values
opintosant Mar 9, 2023
4f8985a
Remove outdated reference to data_matrix in __str__ method
opintosant Mar 9, 2023
de17d26
Correct return types for fixtures
opintosant Mar 9, 2023
2b6e078
Test for arithmetic operations for scalar irregular data
opintosant Mar 9, 2023
c3dba8b
Fix incorrect selection of sample_ranges in FDataIrregular
opintosant Mar 22, 2023
00ad9de
Fully functioning structure for multidimensional data, passes full test
opintosant Mar 23, 2023
f27420e
Add fixed seed to random generators in tests
opintosant Mar 23, 2023
62401cf
Set uninitialized variable to 0
opintosant Mar 23, 2023
42596f8
Update test_irregular_operations.py to use a better test structure in…
opintosant Mar 23, 2023
197fa28
Preliminary test of reduction operations for FDataIrregular
opintosant Mar 23, 2023
bd0f9d8
Finalize testing of numeric reductions and comparison operators
opintosant Mar 23, 2023
4356d6d
Merge branch 'develop' into feature/irregular_operations
opintosant Mar 23, 2023
73b79c1
Added custom numpy Dtype for Irregular data
opintosant Apr 13, 2023
46f187e
Added to_basis method to FDataIrregular
opintosant Apr 13, 2023
94644fe
General cleanup and wemake and flake 8 formatting
opintosant Apr 13, 2023
5ce6a86
Wemake format and FDataIrregular docstring
opintosant Apr 21, 2023
5a4ace7
Docstrings and examples. Fixed incorrect domain range in concatenate.
opintosant Apr 22, 2023
2a845c4
Delete legacy function gmean.
opintosant Apr 22, 2023
8a81bd7
Remove gmean from testing
opintosant Apr 23, 2023
194cab5
Add initial documentation for FDataIrregular
opintosant Apr 23, 2023
f82e7e3
IrregularBasisSmoother. Fixed error in coordinates function
opintosant Apr 24, 2023
56aa321
Cleanup to_grid function and adapt it to multidimensional datasets. A…
opintosant Apr 25, 2023
0b10333
Add fixed seed to irregular tests
opintosant Apr 25, 2023
9cba9e8
Reformat test_irregular.py to be more efficient and clean. Added test…
opintosant Apr 25, 2023
786f186
Fix error in concatenate for multiple domain dimensions. Add test for…
opintosant Apr 25, 2023
e2ebf2e
Extend testing of to_basis to include multidimensional datasets (usin…
opintosant Apr 26, 2023
55bbcb1
Comply with PEP8 and wemake
opintosant Apr 27, 2023
2c3d6c7
Fix incorrect assertions in init test.
opintosant Apr 27, 2023
3f58908
Fix incorrect implementation of fetch_bone_density with argument as_f…
opintosant Apr 27, 2023
0ca3f49
Make IrregularScatterPlot and IrregularPlot, as well as class PlotIrr…
opintosant Apr 27, 2023
29da53e
Test implementation of to_basis
opintosant Apr 27, 2023
95d86bc
Make test_irregular_operations cleaner and comply with PEP8 and wemake
opintosant Apr 27, 2023
bbe6e06
Make representation/irregular.py PEP8 and wemake compliant.
opintosant Apr 27, 2023
c657967
Clean up implementation of restrict function. Add test for restritct …
opintosant Apr 27, 2023
7fd0f44
Remove done TODOs
opintosant Apr 27, 2023
aa9230c
Merge branch 'develop' into feature/irregular_operations
opintosant Apr 27, 2023
00a348e
Fix errors in Doctest
opintosant May 1, 2023
22beb0e
Revert "Remove done TODOs"
opintosant May 1, 2023
b1e67b1
Merge branch 'feature/irregular_operations' of github.com:GAA-UAM/sci…
opintosant May 1, 2023
f2244f3
Revert "Revert "Remove done TODOs""
opintosant May 1, 2023
c438deb
Fix incorrect isort format for improts in _real_datasets
opintosant May 1, 2023
aff4baa
Fix incorrect style in _real_datasets and smoothing/_basis
opintosant May 1, 2023
87d754f
Fix incorrect sorting of import in exploratory/visualization/represen…
opintosant May 1, 2023
5f7b4c9
Merge branch 'develop' into feature/irregular_operations
vnmabus Jun 13, 2023
c352d57
Merge branch 'develop' into feature/irregular_operations
pcuestas Sep 16, 2023
b7a7489
Fix mistake in PlotIrregular (number of measurements in functions ind…
pcuestas Sep 20, 2023
22388f3
Correct rst suggestions
pcuestas Sep 28, 2023
494fa67
NotImplementedError instead of warning. Fix FDataIrregular.isna() and…
pcuestas Sep 28, 2023
b750e74
Use default_rng instead of RandomState
pcuestas Sep 28, 2023
59d10d4
Correct "axies" to "axes"
pcuestas Sep 28, 2023
1b7a25e
Upper case attributes description and function arguments typing
pcuestas Sep 28, 2023
cbc03a9
Use asarray
pcuestas Sep 28, 2023
810b5b0
Remove unnecessary attributes:
pcuestas Sep 28, 2023
598d5ac
Not implemented error
pcuestas Sep 28, 2023
3d9329e
Remove num_observations:
pcuestas Sep 28, 2023
a78673a
Not implemented errors
pcuestas Sep 28, 2023
2d98830
from fdatagrid
pcuestas Sep 28, 2023
8091e78
cls not typed
pcuestas Sep 28, 2023
15454a6
Typing
pcuestas Sep 28, 2023
10479ef
Typing
pcuestas Sep 28, 2023
bdc7858
Remove sample_points (old)
pcuestas Sep 28, 2023
6d15f06
Fix __eq__
pcuestas Sep 28, 2023
8f1ea02
indices_start_end function
pcuestas Sep 28, 2023
7508581
Comments in _get_sample_range_from_data
pcuestas Oct 6, 2023
71d08a6
Remove property FDataIrregular.n_measurements
pcuestas Oct 6, 2023
742ecc1
Rename FDataIrregular attributes: `function_`:
pcuestas Oct 6, 2023
272d0f6
Style
pcuestas Oct 6, 2023
8420f1e
Refactor _get_domain_range_from_sample_range
pcuestas Oct 7, 2023
2e933a4
Refactor _get_sample_range_from_data
pcuestas Oct 7, 2023
d92d005
Remove use of np.matrix
pcuestas Oct 7, 2023
3c13035
Fix mean and var for more than one dimension
pcuestas Oct 7, 2023
91b74b6
Check if is instance in equals()
pcuestas Oct 7, 2023
80b814a
Change order of checks in equals functions
pcuestas Oct 7, 2023
5183987
Integrate IrregularBasisSmoother into BasisSmoother
pcuestas Oct 7, 2023
1a15e9c
nbytes typo
pcuestas Oct 7, 2023
8a6dd5d
Comments
pcuestas Oct 12, 2023
07ffeff
Remove Optional
pcuestas Oct 16, 2023
47d1ba1
Remove Optional
pcuestas Oct 16, 2023
e225c3e
Rename to_matrix -> _to_data_matrix, remove np.matrix usage
pcuestas Oct 16, 2023
c4cd858
Remove types from docstring, remove single line function
pcuestas Oct 16, 2023
dec2db3
Remove use of indices_start_end (in representation)
pcuestas Oct 16, 2023
bd28940
Remove use of indices_start_end (all remaining)
pcuestas Oct 16, 2023
fff8230
spacing
pcuestas Oct 16, 2023
20fb03d
Private functions in grid and irregular for BasisSmoother
pcuestas Oct 17, 2023
8081d68
Rewrite from_fdatagrid to remove all python loops (at the cost of mor…
pcuestas Oct 17, 2023
7bbe8bc
Remove unused imports
pcuestas Oct 17, 2023
0b9b5bb
from_fdatagrid without innecessary comments
pcuestas Oct 24, 2023
e1ddca4
Make `from_dataframe` private
pcuestas Oct 24, 2023
789aedc
Merge branch 'develop' into feature/irregular_operations
pcuestas Oct 24, 2023
e172815
sum of fdatairregular objects
pcuestas Oct 24, 2023
eee7909
FDataIrregular mean (inherited from FData method)
pcuestas Oct 24, 2023
c7a6235
Test mean function
pcuestas Oct 24, 2023
d82693f
std of FDataIrregular
pcuestas Oct 24, 2023
e202aa7
Private method in FDataIrregular to get common points and correspondi…
pcuestas Oct 24, 2023
c2afecc
test_stats_std for fdatairregular,
pcuestas Oct 24, 2023
0f90db2
use ary.ndim instead of len(ary.shape)
eliegoudout Nov 14, 2023
26b5f0c
points_split and values_split as properties
eliegoudout Nov 14, 2023
fb69594
FDataIrregular.cleaned restrict method
eliegoudout Nov 14, 2023
ad71339
cleaner concatenate
eliegoudout Nov 15, 2023
f6a87ad
FDataIrregular.__init__: validate start_indices
eliegoudout Nov 23, 2023
eee51ad
FDataIrregular.round clean (why start_indices special treatment?)
eliegoudout Nov 23, 2023
3bd25fa
minor clean
eliegoudout Nov 23, 2023
ea7e6dd
FDataIrregular._to_data_matrix clean remove loops
eliegoudout Nov 23, 2023
6e32109
revert: remove *_split properties
eliegoudout Nov 23, 2023
5b0ba71
restrict keep empty samples
eliegoudout Nov 23, 2023
b107525
_reduceat v0
eliegoudout Nov 24, 2023
0e9949f
_get_sample_range_from_data update
eliegoudout Nov 24, 2023
6a8a90d
Two-modes _reduceat for later decision
eliegoudout Nov 29, 2023
22aa8c7
handle nan for domain range compute + enforce float type + allow len(…
eliegoudout Nov 29, 2023
11508c1
clean _sort_by_arguments
eliegoudout Nov 29, 2023
5a841ed
_reduceat wrapper + minor mods
eliegoudout Feb 2, 2024
1a1835e
removed useless op (???)
eliegoudout Feb 14, 2024
f0fe0d7
cleaner _sort_by_arguments from vnmabus
eliegoudout Feb 14, 2024
4a2fc88
resolve reviews
eliegoudout Feb 14, 2024
01fe0ba
better lexsort comment
eliegoudout Feb 15, 2024
3856e10
Merge branch 'develop' into pr-593
vnmabus Feb 19, 2024
dfaad23
Merge branch 'develop' into feature/irregular_operations
vnmabus Feb 22, 2024
cc7d6be
Merge branch 'develop' into feature/irregular_operations
vnmabus Feb 24, 2024
d06e843
Merge branch 'feature/irregular_operations' into feature/irregular_op…
vnmabus Feb 24, 2024
ee89c2c
fixed typo domain_range max
eliegoudout Feb 26, 2024
6bf925c
fixed restrict + allow domain_range broadcast
eliegoudout Feb 26, 2024
16c8109
Merge branch 'feature/irregular_operations' of https://github.com/eli…
eliegoudout Feb 26, 2024
a875bd7
Merge branch 'develop' into feature/irregular_operations
vnmabus Mar 7, 2024
777bd5d
Merge branch 'feature/irregular_operations' into feature/irregular_op…
vnmabus Mar 7, 2024
fb6502f
Fix doctests.
vnmabus Mar 7, 2024
cd7e73e
Fix typo.
vnmabus Mar 11, 2024
21f7bad
Merge pull request #593 from eliegoudout/feature/irregular_operations
vnmabus Mar 11, 2024
22 changes: 22 additions & 0 deletions docs/modules/representation.rst
@@ -85,6 +85,28 @@ methods.

skfda.representation.basis.Basis


Irregular representation
------------------------

In practice, most functional datasets do not contain functions evaluated
uniformly over a fixed grid. In other words, it is paramount to be able
to represent irregular functional data.

While the FDataGrid class could support this kind of dataset, it is
inefficient to store a complete grid with low data density. Furthermore,
there are specific methods that can be applied to irregular data in order
to obtain, among other things, a better conversion to basis representation.

The FDataIrregular class provides the functionality which suits these purposes.


.. autosummary::
   :toctree: autosummary

   skfda.representation.irregular.FDataIrregular


Generic representation
----------------------

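The storage argument in the documentation text above (a complete grid is wasteful when data density is low) can be illustrated with a small sketch. It uses only the standard library and is not skfda code: the sample values are invented, and the flat layout with a `start_indices` list is an assumption inspired by the `start_indices` name that appears in this PR's commit messages, not a description of how `FDataIrregular` is actually implemented.

```python
import math

# Three invented curves, each observed at its own argument values.
samples = [
    ([0.0, 2.5], [1.0, 1.4]),
    ([1.0], [0.7]),
    ([0.5, 3.0, 4.0], [0.2, 0.9, 1.1]),
]

# Dense layout: one column per distinct argument value, NaN where unobserved.
grid = sorted({t for points, _ in samples for t in points})
dense = [
    [dict(zip(points, values)).get(t, math.nan) for t in grid]
    for points, values in samples
]

# Flat layout (assumed): concatenate all observations and remember
# the offset at which each curve starts.
start_indices, flat_points, flat_values = [], [], []
for points, values in samples:
    start_indices.append(len(flat_points))
    flat_points.extend(points)
    flat_values.extend(values)


def curve(i):
    """Return the (points, values) pair of the i-th curve from the flat layout."""
    start = start_indices[i]
    stop = start_indices[i + 1] if i + 1 < len(start_indices) else len(flat_points)
    return flat_points[start:stop], flat_values[start:stop]


dense_cells = len(dense) * len(grid)   # cells held by the padded grid
stored_values = len(flat_values)       # values held by the flat layout
```

In this toy case the padded grid holds three rows over six distinct argument values, while the flat layout stores only the observed values plus one offset per curve. The gap widens as the curves share fewer argument values.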
1 change: 1 addition & 0 deletions skfda/datasets/__init__.py
@@ -20,6 +20,7 @@
"fetch_tecator",
"fetch_ucr",
"fetch_weather",
"fetch_bone_density",
],
"_samples_generators": [
"make_gaussian",
98 changes: 94 additions & 4 deletions skfda/datasets/_real_datasets.py
@@ -5,13 +5,13 @@

import numpy as np
import pandas as pd
import rdata
🚫 [mypy] reported by reviewdog 🐶
Cannot find implementation or library stub for module named "rdata" [import]

🚫 [mypy] reported by reviewdog 🐶
Cannot find implementation or library stub for module named "rdata" [import-not-found]

from pandas import DataFrame, Series
from sklearn.utils import Bunch
from typing_extensions import Literal

import rdata

from ..representation import FDataGrid
from ..representation.irregular import FDataIrregular
from ..typing._numpy import NDArrayFloat, NDArrayInt


@@ -174,7 +174,7 @@ def fetch_ucr(
return_X_y: bool = False,
**kwargs: Any,
) -> Bunch | Tuple[FDataGrid, NDArrayInt]:
"""
r"""

[pep8] reported by reviewdog 🐶
WPS360 Found an unnecessary use of a raw string: """

Fetch a dataset from the UCR/UEA repository.

The UCR/UEA Time Series Classification repository, hosted at
@@ -185,6 +185,7 @@ def fetch_ucr(

Args:
name: Dataset name.
return_X_y: Return tuple (data, target)
kwargs: Additional parameters for the function
:func:`skdatasets.repositories.ucr.fetch`.

@@ -261,7 +262,7 @@ def _fetch_fda_usc(name: str) -> Any:
Acoustic-Phonetic Continuous Speech Corpus, NTIS, US Dept of Commerce)
which is a widely used resource for research in speech recognition. A
dataset was formed by selecting five phonemes for
classification based on digitized speech from this database.
classification based on digitized speech from this database.
phonemes are transcribed as follows: "sh" as in "she", "dcl" as in
"dark", "iy" as the vowel in "she", "aa" as the vowel in "dark", and
"ao" as the first vowel in "water". From continuous speech of 50 male
@@ -1563,3 +1564,92 @@ def fetch_mco(
cite=":footcite:p:`ruiz-meana++_2003_cariporide`",
bibliography=".. footbibliography::",
) + _param_descr


def _fetch_loon_data(name: str) -> Any:
    return _fetch_cran_no_encoding_warning(
        name,
        "loon.data",
        version="0.1.3",
    )


_bone_density_descr = """
The Bone Density dataset is a study of bone density
in boys and girls aged 8-17. It contains data from 423
individuals, measured irregularly at different times,
with an average of ~3 points per individual.

References:
https://cran.r-project.org/package=loon.data
Laura K. Bachrach, Trevor Hastie, May-Choo Wang,
Balasubramanian Narasimhan, and Robert Marcus (1999)
"Bone Mineral Acquisition in Healthy Asian, Hispanic, Black
and Caucasian Youth. A Longitudinal Study",
J Clin Endocrinol Metab, 84, 4702-12.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2009)
"The Elements of Statistical Learning",
2nd Edition, Springer New York <doi:10.1007/978-0-387-84858-7>

"""


def fetch_bone_density(
    return_X_y: bool = False,
    as_frame: bool = False,
) -> Bunch | Tuple[FDataGrid, NDArrayInt] | Tuple[DataFrame, Series]:
    """
    Load the Bone Density dataset. This is an irregular dataset.

    The data is obtained from the R package 'loon.data', which compiles several
    irregular datasets. Sources to be determined.
    """
    descr = _bone_density_descr
    frame = None

    raw_dataset = _fetch_loon_data("bone_ext")

    data = raw_dataset["bone_ext"]

    curve_name = "idnum"
    argument_name = "age"
    target_name = "sex"
    coordinate_name = "spnbmd"

    curves = FDataIrregular.from_dataframe(
        data,
        id_column=curve_name,
        argument_columns=argument_name,
        coordinate_columns=coordinate_name,
        argument_names=[argument_name],
        coordinate_names=[coordinate_name],
        dataset_name="bone_ext",
    )

    target = pd.Series(
        data.drop_duplicates(subset=["idnum"])[target_name],
        name="group",
    )

    feature_name = curves.dataset_name.lower()

🚫 [mypy] reported by reviewdog 🐶
Item "None" of "str | None" has no attribute "lower" [union-attr]

    target_names = target.values.tolist()

    if as_frame:
        curves = pd.DataFrame({feature_name: curves})
        target_as_frame = target.reset_index(drop=True).to_frame()
        frame = pd.concat([curves, target_as_frame], axis=1)
    else:
        target = target.values.codes

    if return_X_y:
        return curves, target

    return Bunch(
        data=curves,
        target=target,
        frame=frame,
        categories={},
        feature_names=[argument_name],
        target_names=target_names,
        DESCR=descr,
    )
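The loader above starts from a long-format table with one row per measurement and the columns `idnum`, `age`, `spnbmd` and `sex`. As a rough, standard-library-only illustration of what grouping such a table into per-curve observations might involve, here is a minimal sketch. The helper `long_to_flat` is hypothetical, the rows are invented (they are not values from the real dataset), and sorting by id and then by argument is an assumption made for the example. It is not a description of what `FDataIrregular.from_dataframe` does internally.

```python
from itertools import groupby
from operator import itemgetter

# Invented long-format rows (NOT real values from the bone density data).
rows = [
    {"idnum": 2, "age": 12.1, "spnbmd": 0.03, "sex": "female"},
    {"idnum": 1, "age": 11.7, "spnbmd": 0.02, "sex": "male"},
    {"idnum": 1, "age": 12.7, "spnbmd": 0.06, "sex": "male"},
    {"idnum": 2, "age": 13.0, "spnbmd": 0.05, "sex": "female"},
    {"idnum": 3, "age": 9.4, "spnbmd": 0.01, "sex": "female"},
]


def long_to_flat(rows, id_column, argument_column, coordinate_column):
    """Hypothetical helper: group long-format rows into flat per-curve arrays."""
    # Sort by curve id, then by argument, so each curve is contiguous and ordered.
    ordered = sorted(rows, key=itemgetter(id_column, argument_column))
    start_indices, points, values = [], [], []
    for _, group in groupby(ordered, key=itemgetter(id_column)):
        start_indices.append(len(points))  # offset where this curve begins
        for row in group:
            points.append(row[argument_column])
            values.append(row[coordinate_column])
    return start_indices, points, values


start_indices, points, values = long_to_flat(rows, "idnum", "age", "spnbmd")

# One label per curve, taken from the first row seen for each id.
target = {}
for row in sorted(rows, key=itemgetter("idnum")):
    target.setdefault(row["idnum"], row["sex"])
```

Taking one label per id plays a role similar to the `drop_duplicates(subset=["idnum"])` call in the diff, which keeps a single `sex` entry per individual. Whether the resulting order matches the order of the curves depends on how the real constructor orders its samples, which this sketch does not attempt to reproduce.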