
[MRG] ENH add a model to fit and predict heart-rate from power #6

Open · wants to merge 14 commits into base: master
sksports/model/__init__.py (6 additions, 1 deletion)
@@ -6,6 +6,11 @@
# Cedric Lemaitre
# License: MIT

from .heart_rate import HeartRateRegressor
from .heart_rate import exp_heart_rate_model

from .power import strava_power_model

__all__ = ['strava_power_model']
__all__ = ['HeartRateRegressor',
'exp_heart_rate_model',
'strava_power_model']
sksports/model/heart_rate.py (228 additions)
@@ -0,0 +1,228 @@
"""Module to estimate heart rate from data."""

# Authors: Aart Goossens
# Guillaume Lemaitre
# License: MIT

from __future__ import division

import numpy as np
import pandas as pd
from scipy.optimize import leastsq, minimize

from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils import check_X_y, check_array
from sklearn.utils.validation import check_is_fitted


def exp_heart_rate_model(power, hr_start, hr_max, hr_slope, hr_drift,
rate_growth, rate_decay):
"""Model heart-rate from power date based on exponential
physiological-based model.

The model used is based on [1]_.

Parameters
----------
power : ndarray, shape (n_samples,)
The power data in watts.

hr_start : float
Initial heart-rate frequency.

hr_max : float
Heart-rate frequency of the athlete at maximum.

hr_slope : float
Slope considered if the model is linear. FIXME

hr_drift : float
Attenuation of the heart-rate over time. FIXME

rate_growth : float
Growth rate of the exponential when the power increases.

rate_decay : float
Decay rate of the exponential when the power decreases.

Returns
-------
hr_pred : ndarray, shape (n_samples,)
Prediction of the heart-rate frequencies associated to the power data.

References
----------
.. [1] de Smet, Dimitri, et al. "Heart rate modelling as a potential
physical fitness assessment for runners and cyclists." Workshop at ECML
& PKDD. 2016.

Examples
--------
>>> import numpy as np
>>> power = np.array([100] * 25 + [240] * 25 + [160] * 25)
>>> from sksports.model import exp_heart_rate_model
>>> hr_pred = exp_heart_rate_model(power, 75, 200, 0.30, 3e-5, 24, 30)
>>> print(hr_pred) # doctest: +ELLIPSIS
[...]

"""
power = check_array(power, ensure_2d=False)
power_corrected = power + power.cumsum() * hr_drift
hr_lin = power_corrected * hr_slope + hr_start
hr_ss = np.minimum(hr_max, hr_lin)

diff_hr = np.pad(np.diff(power_corrected), pad_width=1, mode='constant')
rate = np.where((diff_hr <= 0), rate_decay, rate_growth)

hr_pred = np.ones(power.size) * hr_start
for curr_idx in range(hr_pred.size):
hr_prev = hr_pred[curr_idx - 1] if curr_idx > 0 else hr_start
hr_pred[curr_idx] = (hr_prev +
(hr_ss[curr_idx] - hr_prev) / rate[curr_idx])

return hr_pred
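
To make the role of rate_growth and rate_decay concrete, here is a small self-contained check (not part of the PR): each sample moves only 1/rate of the remaining gap towards the steady-state heart-rate hr_ss, so a larger rate value means a slower response.

import numpy as np
from sksports.model import exp_heart_rate_model

power = np.array([100] * 25 + [240] * 25)
# identical parameters except for the rates: 10 (fast response) vs 40 (slow)
fast = exp_heart_rate_model(power, 75, 200, 0.30, 3e-5, 10, 10)
slow = exp_heart_rate_model(power, 75, 200, 0.30, 3e-5, 40, 40)
# ten samples after the step from 100 W to 240 W, the fast variant is much
# closer to the new steady state than the slow one
assert fast[35] > slow[35]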


def _model_residual(params, heart_rate, power):
"""Residuals between prediction and ground-truth."""
return heart_rate - exp_heart_rate_model(power, *params)


def _model_residual_least_square(params, heart_rate, power):
"""Least-square error of the residuals."""
return np.sum(_model_residual(params, heart_rate, power) ** 2)


class HeartRateRegressor(BaseEstimator, RegressorMixin):

Review comment:
Can you explain the benefit of BaseEstimator and RegressorMixin? I don't see them used anywhere in the fit and predict methods. Is there other functionality of these classes that can be useful?
Either way, I like implementing this model as a regressor class; it makes the API much clearer.

Contributor Author:
Basically, it allows the tool to be used with scikit-learn pipelines, grid-search, and cross-validation.
The mixin just defines a score method (the R^2 score), while BaseEstimator provides get_params and set_params to handle the parameters.
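
As an illustration, here is a minimal sketch (not part of the PR) using the same synthetic data recipe as the tests below; it relies only on what BaseEstimator and RegressorMixin provide:

import numpy as np
from sklearn.model_selection import GridSearchCV

from sksports.model import HeartRateRegressor, exp_heart_rate_model

# synthetic data, built like the constants in the tests below
power = np.array([100] * 10 + [240] * 100 + [160] * 100)
heart_rate = exp_heart_rate_model(power, 100, 180, 0.40, 3e-5, 20, 35)
power = power.reshape(-1, 1)

reg = HeartRateRegressor()
print(reg.get_params())              # provided by BaseEstimator

reg.fit(power, heart_rate)
print(reg.score(power, heart_rate))  # R^2 score provided by RegressorMixin

# get_params/set_params are also what make the estimator usable with
# scikit-learn's model-selection tools, e.g. a hypothetical search over hr_max
grid = GridSearchCV(HeartRateRegressor(), param_grid={'hr_max': [180, 200]}, cv=3)
# grid.fit(power, heart_rate) would then run the search with 3-fold CV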

Contributor Author:
> Either way, I like implementing this model as a regressor class; it makes the API much clearer.

We can also make the model public to the user. We just need a good name.

"""Estimate the heart-rate from power data.

The model used is based on the formulation in [1]_.

Parameters
----------
hr_start : float, default=75
Initial heart-rate frequency.

hr_max : float, default=200
Heart-rate frequency of the athlete at maximum.

hr_slope : float, default=0.30
Slope considered if the model is linear. FIXME

hr_drift : float, default=3e-5
Attenuation of the heart-rate over time. FIXME

rate_growth : float, default=24
Growth rate of the exponential when the power increases.

rate_decay : float, default=30
Decay rate of the exponential when the power decreases.

Attributes
----------
hr_start_ : float
Fitted initial heart-rate frequency.

hr_max_ : float
Fitted heart-rate frequency of the athlete at maximum.

hr_slope_ : float
Fitted slope considered if the model is linear. FIXME

hr_drift_ : float
Fitted attenuation of the heart-rate over time. FIXME

rate_growth_ : float
Fitted growth rate of the exponential when the power increases.

rate_decay_ : float
Fitted decay rate of the exponential when the power decreases.

References
----------
.. [1] de Smet, Dimitri, et al. "Heart rate modelling as a potential
physical fitness assessment for runners and cyclists." Workshop at ECML
& PKDD. 2016.

"""

def __init__(self, hr_start=75, hr_max=200, hr_slope=0.30, hr_drift=3e-5,
rate_growth=24, rate_decay=30):
self.hr_start = hr_start
self.hr_max = hr_max
self.hr_slope = hr_slope
self.hr_drift = hr_drift
self.rate_growth = rate_growth
self.rate_decay = rate_decay

def _check_inputs(self, X, y=None):
"""Validate the inputs.

Check if the inputs are Pandas inputs. Resample with a sampling-rate of
1 Hertz.

"""
if y is None:
is_df = True if hasattr(X, 'loc') else False

Review comment:
Although this is valid in most cases, why not just go for isinstance(X, pd.DataFrame)?

Contributor Author:
It was to duck-type. It usually avoids importing the library (which we do anyway here) and accepts data structures that inherit from it.
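
A tiny illustration of that trade-off (MyFrame is a hypothetical DataFrame subclass, e.g. coming from a wrapping library); here the two checks mostly differ in whether pandas has to be imported at the call site:

import pandas as pd

class MyFrame(pd.DataFrame):
    pass

X = MyFrame({'power': [100, 240, 160]})
print(hasattr(X, 'loc'))            # True: duck-typing, no pandas import needed
print(isinstance(X, pd.DataFrame))  # True as well: isinstance also accepts subclasses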

X_ = X.resample('1s').mean() if is_df else X

Review comment:
I think this raises a TypeError for a df with a non datetime-like index. Should we catch that here or add a check for a datetime-like index?

Contributor Author:
Yes, checking for a DataFrame is not enough; we should also check the index.
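
One possible shape for that check, as a hedged sketch (hypothetical helper, not part of this PR); _check_inputs could call it just before the resample('1s') line so the user gets an actionable message instead of pandas' generic TypeError:

import pandas as pd

def _check_datetime_index(X):
    """Raise a clear error when a pandas input cannot be resampled at 1 Hz."""
    # note: a TimedeltaIndex or PeriodIndex would also support resample;
    # this sketch only handles the common DatetimeIndex case
    if hasattr(X, 'loc') and not isinstance(X.index, pd.DatetimeIndex):
        raise TypeError("Pandas inputs need a DatetimeIndex to be resampled "
                        "at a 1-second sampling rate.")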

X_ = check_array(X_)
return X_.ravel(), is_df

Review comment:
You still need to return y here as the second return argument, right? Else line 193 will raise a ValueError (unpack error).

Contributor Author:
Since it is confusing, I should always return X_.ravel(), y, is_df and ignore y when it is None.
Right now, I am returning different things depending on whether it is fit or predict (a sketch of that variant is given right after this method).

Review comment:
Ah, now I get it. That could work indeed, but I think it's less confusing if a method always returns the same number of values.

else:
is_df = True if hasattr(X, 'loc') and hasattr(y, 'loc') else False
X_ = X.resample('1s').mean() if is_df else X
y_ = y.resample('1s').mean() if is_df else y
X_, y_ = check_X_y(X_, y_)
return X_.ravel(), y_, is_df
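
A sketch of the "always return three values" variant discussed in the thread above (a hypothetical refactoring with the same behaviour as the current code; predict would then unpack power, _, is_df):

def _check_inputs(self, X, y=None):
    """Validate inputs, resample pandas inputs at 1 Hz, always return 3 values."""
    if y is None:
        is_df = hasattr(X, 'loc')
        X_ = X.resample('1s').mean() if is_df else X
        X_ = check_array(X_)
        return X_.ravel(), None, is_df
    is_df = hasattr(X, 'loc') and hasattr(y, 'loc')
    X_ = X.resample('1s').mean() if is_df else X
    y_ = y.resample('1s').mean() if is_df else y
    X_, y_ = check_X_y(X_, y_)
    return X_.ravel(), y_, is_df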

def fit(self, X, y):
Review comment (@AartGoossens, May 28, 2018):
Side note: It still itches when I see Python variables starting with a capital... 😅 (but I recognize that it's accepted for these cases)

Contributor Author:
Yep, this is kind of imposed by scikit-learn estimators. power and heart_rate would be much better.

"""Compute the parameters model.

Parameters
----------
X : array-like, shape (n_samples,)
The power data.

y : array-like, shape (n_samples,)
The heart-rate data.

Returns
-------
self

"""
power, heart_rate, _ = self._check_inputs(X, y)
params_init = [self.hr_start, self.hr_max, self.hr_slope,
self.hr_drift, self.rate_growth, self.rate_decay]

res_opt = minimize(_model_residual_least_square, params_init,
args=(heart_rate, power),
method='Nelder-Mead')['x']

self.hr_start_, self.hr_max_, self.hr_slope_, self.hr_drift_, \
self.rate_growth_, self.rate_decay_ = res_opt

return self

def predict(self, X):
"""Predict the heart-rate frequency from the power data.

Parameters
----------
X : array-like, shape (n_samples,)
The power data.

Returns
-------
y : ndarray, shape (n_samples,)
The heart-rate predictions.

"""
check_is_fitted(self, ['hr_start_', 'hr_max_', 'hr_slope_',
'hr_drift_', 'rate_growth_', 'rate_decay_'])
power, is_df = self._check_inputs(X)

Review comment:
I think self._check_inputs returns 3 items.

Contributor Author:
Only 2 at predict.


hr_pred = exp_heart_rate_model(power, self.hr_start_, self.hr_max_,
self.hr_slope_, self.hr_drift_,
self.rate_growth_, self.rate_decay_)

return pd.Series(hr_pred, index=X.index) if is_df else hr_pred
sksports/model/tests/test_heart_rate.py (53 additions)
@@ -0,0 +1,53 @@
"""Tests the module used to estimate the heart-rate."""

# Authors: Aart Goossens
# Guillaume Lemaitre
# License: MIT

import pytest
import numpy as np
import pandas as pd

from sksports.model import exp_heart_rate_model
from sksports.model import HeartRateRegressor

POWER = np.array([100] * 10 + [240] * 100 + [160] * 100)

Review comment:
You can consider moving these constants to a pytest fixture. It's a matter of preference, but I think these fixtures are really nice to work with and also easily reusable in other tests. But I'm fine with doing improvements like this later.

Contributor Author:
Good point. I was just playing with the tests for the moment :)
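
For reference, a fixture-based variant could look roughly like this (a sketch only; the names and the noise recipe mirror the module-level constants):

import numpy as np
import pytest

from sksports.model import HeartRateRegressor, exp_heart_rate_model

@pytest.fixture
def power():
    return np.array([100] * 10 + [240] * 100 + [160] * 100)

@pytest.fixture
def heart_rate(power):
    hr = exp_heart_rate_model(power, 100, 180, 0.40, 3e-5, 20, 35)
    return hr + np.random.random(hr.size)

# test functions then simply request the fixtures by name
def test_heart_rate_regressor_defaults(power, heart_rate):
    reg = HeartRateRegressor().fit(power.reshape(-1, 1), heart_rate)
    assert reg.hr_start_ == pytest.approx(100, 1.0)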

HEART_RATE = exp_heart_rate_model(POWER, 100, 180, 0.40, 3e-5, 20, 35)
HEART_RATE = HEART_RATE + np.random.random(HEART_RATE.size)


def test_exp_heart_rate_model():
interval = [10, 1010, 2010]
hr_pred = exp_heart_rate_model(POWER, 75, 200, 0.30, 3e-5, 24, 30)

diff_hr_pred = np.diff(hr_pred)
assert np.all(diff_hr_pred[interval[1]:interval[2]] > 0)
assert np.all(diff_hr_pred[interval[2]:] < 0)


@pytest.mark.parametrize(
"power, heart_rate, output_type",
[(POWER.reshape(-1, 1), HEART_RATE, np.ndarray),
(pd.DataFrame(
POWER,
index=pd.date_range('1/1/2011', periods=POWER.size, freq='s')),
pd.Series(
HEART_RATE,
index=pd.date_range('1/1/2011', periods=POWER.size, freq='s')),
pd.Series),
(POWER.reshape(-1, 1),
pd.Series(
HEART_RATE,
index=pd.date_range('1/1/2011', periods=POWER.size, freq='s')),
np.ndarray)]
)
def test_heart_rate_regressor(power, heart_rate, output_type):
reg = HeartRateRegressor()
reg.fit(power, heart_rate)

assert reg.hr_start_ == pytest.approx(100, 1.0)

Review comment:
More of a question than a comment: what's best practice for models like this when deciding how and which model parameters to test? Do you have any examples from scikit-learn?

Contributor Author:
Usually we run a classifier or a regressor on a known dataset and check the score. It is non-trivial to actually test parameters. Here I think that we can do it on those 2 parameters with a large enough tolerance, to be sure that the optimization (or any change in the future) does not lead to complete nonsense.

Basically, I think that we could do something like:

assert reg.score(X, y) > 0.9

to be sure that we are fitting the data properly.
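
A sketch of that score-based check, reusing the module-level POWER and HEART_RATE constants (the 0.9 threshold is an assumption, not a validated value):

def test_heart_rate_regressor_score():
    power = POWER.reshape(-1, 1)
    reg = HeartRateRegressor().fit(power, HEART_RATE)
    # R^2 on the training data itself; it should be close to 1 when the
    # optimization converges to a sensible set of parameters
    assert reg.score(power, HEART_RATE) > 0.9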

assert reg.hr_max_ == pytest.approx(180, 1.0)
assert reg.hr_slope_ == pytest.approx(0.40, 0.01)

hr_pred = reg.predict(power)
assert isinstance(hr_pred, output_type)