
[MRG] EHN add a model to fit and predict heart-rate from power #6

Open · wants to merge 14 commits into master

Conversation

@glemaitre (Contributor) commented May 27, 2018

Partially addressing #5

Implement a scikit-learn like regressor to fit and predict heart-rate data from power.

@glemaitre glemaitre mentioned this pull request May 27, 2018
codecov bot commented May 27, 2018

Codecov Report

Merging #6 into master will increase coverage by 0.5%.
The diff coverage is 100%.


@@            Coverage Diff            @@
##           master       #6     +/-   ##
=========================================
+ Coverage   97.04%   97.54%   +0.5%     
=========================================
  Files          23       25      +2     
  Lines         440      530     +90     
  Branches       48       52      +4     
=========================================
+ Hits          427      517     +90     
  Misses          6        6             
  Partials        7        7
Impacted Files Coverage Δ
sksports/model/heart_rate.py 100% <100%> (ø)
sksports/model/__init__.py 100% <100%> (ø) ⬆️
sksports/model/tests/test_heart_rate.py 100% <100%> (ø)

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a80a679...af0c989.

@glemaitre (Contributor Author)

@AartGoossens I modified the function that was doing the estimation and prediction of the heart rate so that it follows the scikit-learn API. I have a couple of suggestions:

  • Assuming that a user passes a dataframe or series, I think it would be useful to return the same type at predict. It can easily be done by reusing the index of X and creating the dataframe or series on the fly.
  • We should make a sanity check and decide on a policy regarding sampling-time consistency. At fit we should check that X and y are sampled in the same manner. At predict time, we need to check that we have the same sampling rate as at fit. This is only possible when given a datetime index with a series or dataframe, though. Then, we need to decide whether we make a conversion or raise an error.

I also want to improve the current tests. Strictly checking floating-point values is a bit flaky. I still don't have a clear idea, though.

Let me know if you have any comments.

@glemaitre (Contributor Author)

I forgot to ping @sladkovm as well, sorry.

@AartGoossens

Nice work! And good to see that you're picking up this model first, because I think there are many interesting use cases for it (thinking e.g. of OpenData).

  • @sladkovm and I had discussions about input and return types before (ref1, ref2). I'm not in favor of explicitly handling multiple input types (but Numpy and probably Scipy accept both lists/ndarrays in some cases) and am even less in favor of converting the return type to the original input type. I think the interface of the library (and therefore the function signature) should be as concise as possible, as this results in the most predictable behavior for the user of the library. Maybe you can explain how Scipy is handling this? There must be some good use cases for this.
  • Good point. I have thought about this before but haven't come to a satisfactory solution. I think that for now I would be happy with just accepting 1Hz sampled data, because I'm not sure what other sampling frequencies and irregularly sampled data would mean for the validity of the model. Another option could be to allow specifying the sampling frequency with an optional input argument (with a default of 1); a possible shape for that is sketched after this list.
  • About testing: I really have no idea what would be the best way to do this and I agree that checking floats is flaky. pytest.approx might be a step in the right direction.
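
A minimal sketch of that last option, assuming a hypothetical `freq` constructor parameter (the name and the 1 Hz-only check are illustrative, not part of this PR):

```python
# Sketch only: expose the sampling frequency explicitly instead of
# guessing it from the data. The ``freq`` parameter is hypothetical.
class HeartRateRegressor:

    def __init__(self, freq=1):
        # Sampling frequency of the power/heart-rate data, in Hz.
        self.freq = freq

    def fit(self, X, y):
        # For now, only 1 Hz data is considered valid for the model.
        if self.freq != 1:
            raise ValueError(
                f"only 1 Hz sampled data is supported, got {self.freq} Hz")
        ...
        return self
```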

@glemaitre (Contributor Author)

but Numpy and probably Scipy accept both lists/ndarrays in some cases

check_array will convert list/ndarray/dataframe/series into ndarray such that we can work directly with numpy and scipy.
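
For reference, a small illustration of what scikit-learn's `check_array` does with the different containers:

```python
import numpy as np
import pandas as pd
from sklearn.utils import check_array

# Lists, ndarrays, and pandas containers all come out as a 2D ndarray.
from_list = check_array([[100], [240], [160]])
from_frame = check_array(pd.DataFrame({'power': [100, 240, 160]}))

assert isinstance(from_frame, np.ndarray)
assert from_list.shape == from_frame.shape == (3, 1)
```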

Regarding the output type, I really think it would be a plus to return the same type as the input, since we just need to reuse the index. It comes almost for free and saves the user from doing that conversion outside.
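
A minimal sketch of the idea, assuming `predict` has already computed an ndarray `y_pred` (the helper name `_restore_type` is hypothetical):

```python
import numpy as np
import pandas as pd

def _restore_type(y_pred, X):
    # Reuse the index of the input to rebuild a pandas object on the
    # fly; plain ndarrays pass through unchanged.
    if isinstance(X, (pd.Series, pd.DataFrame)):
        return pd.Series(np.asarray(y_pred), index=X.index,
                         name='heart-rate')
    return y_pred
```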

I think that for now I would be happy with just accepting 1Hz sampled data because I'm not sure what other sampling frequencies and irregular sampled data would mean for the validity of the model.

I see. We could resample the data using pandas when we get a dataframe. In the numpy case we just assume that it is 1Hz. It should be explained in the docstring, though.
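
The pandas side of that is a one-liner; a small illustration with 2 Hz toy data:

```python
import pandas as pd

index = pd.date_range('2018-05-27 14:00:00', periods=4, freq='500ms')
power = pd.DataFrame({'power': [100, 110, 240, 250]}, index=index)

# Down-sample the 2 Hz toy data to the 1 Hz rate the model assumes.
power_1hz = power.resample('1s').mean()
# 14:00:00 -> 105.0, 14:00:01 -> 245.0
```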

return np.sum(_model_residual(params, heart_rate, power) ** 2)


class HeartRateRegressor(BaseEstimator, RegressorMixin):

@AartGoossens:

Can you explain the benefit of BaseEstimator and RegressorMixin? I don't see them used anywhere in the fit and predict methods. Is there other functionality of these classes that can be useful?
Either way, I like implementing this model as a regressor class; it makes the API much clearer.

@glemaitre (Contributor Author):

Basically, it allows the estimator to be used with scikit-learn pipelines, grid-search, and cross-validation.
The mixin just defines a score function, which is the R^2 score, while BaseEstimator provides get_params and set_params to deal with the parameters.
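
For illustration, assuming `X` (power) and `y` (heart rate) are available, this is roughly what those base classes enable:

```python
from sklearn.model_selection import GridSearchCV, cross_val_score

reg = HeartRateRegressor()

# From BaseEstimator: hyper-parameter introspection, used for cloning.
reg.get_params()          # {'hr_rest': 75, 'solver': 'Nelder-Mead', ...}
reg.set_params(hr_rest=60)

# From RegressorMixin: a default R^2 ``score``, which is what the
# scikit-learn model-selection tools rely on.
scores = cross_val_score(reg, X, y, cv=3)
search = GridSearchCV(reg, param_grid={'solver': ['Nelder-Mead', 'leastsq']})
search.fit(X, y)
```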

@glemaitre (Contributor Author):

Either way, I like implementing this model as a regressor class; it makes the API much clearer.

We can also make the model public to the user. We just need a good name.

params_init = [self.hr_rest, self.hr_max, self.hr_slope, self.hr_drift,
               self.rate_growth, self.rate_decay]

if self.solver == 'leastsq':

@AartGoossens:

Can you explain this if-else? To me it seems the flow for self.solver == 'leastsq' is redundant, i.e. can't minimize also be used for least-squares?

@glemaitre (Contributor Author):

'leastsq' is actually using the Levenberg-Marquardt algorithm. I was just playing with the different ways of minimizing the problem. I am not really sure what the best solution is there. We can of course fix the solver as well.

NB: Levenberg-Marquardt will use the gradient, while an algorithm such as the simplex (Nelder-Mead) will not. So depending on the optimization problem and its convexity, one or the other would be more appropriate.
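
For context, a toy illustration of the two SciPy routes (the residual function here is a stand-in, not the actual heart-rate model):

```python
import numpy as np
from scipy.optimize import least_squares, minimize

def residuals(params, x, y):
    # Toy linear residual standing in for _model_residual.
    a, b = params
    return y - (a * x + b)

x = np.arange(10.0)
y = 2.0 * x + 1.0

# Levenberg-Marquardt: works on the residual vector, uses gradients.
lm = least_squares(residuals, x0=[1.0, 0.0], args=(x, y), method='lm')

# Nelder-Mead: gradient-free, minimizes the scalar sum of squares.
nm = minimize(lambda p: np.sum(residuals(p, x, y) ** 2),
              x0=[1.0, 0.0], method='Nelder-Mead')
```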

self.rate_growth = rate_growth
self.rate_decay = rate_decay

def fit(self, X, y):

@AartGoossens commented May 28, 2018:

Side note: It still itches when I see Python variables starting with a capital... 😅 (but I recognize that it's accepted for these cases)

@glemaitre (Contributor Author):

Yep, this is kind of imposed by scikit-learn estimators. power and heart_rate would be much better.

def __init__(self, solver='Nelder-Mead', hr_rest=75, hr_max=200,
             hr_slope=0.30, hr_drift=3e-5, rate_growth=24, rate_decay=30):
self.solver = solver
self.hr_rest = hr_rest

@AartGoossens:

Minor: I'd like to propose renaming hr_rest to hr_start or hr_0 because this model parameter is not related to the actual resting heart rate of the athlete and therefore might cause confusion.

@glemaitre (Contributor Author):

Good point, I was confused :) hr_start seems like a good idea.

@glemaitre
Copy link
Contributor Author

On a simple example (which will be in the tests) it seems that Nelder-Mead is the only solver leading to the right parameters. I think that we could simplify and keep the solver that you proposed at the beginning.


"""
if y is None:
is_df = True if hasattr(X, 'loc') else False

@AartGoossens:

Although this is valid in most cases, why not just go for isinstance(X, pd.DataFrame)?

@glemaitre (Contributor Author):

It was to duck-type. It usually makes it possible to not import the library (which we do anyway) and to accept data structures that inherit from it.
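
A small illustration of the duck-typing check (sketch only):

```python
import pandas as pd

def is_pandas_like(X):
    # Anything exposing ``.loc`` is treated as pandas-like, without
    # importing pandas or requiring inheritance from pd.DataFrame.
    return hasattr(X, 'loc')

assert is_pandas_like(pd.DataFrame({'power': [100, 240]}))
assert is_pandas_like(pd.Series([100, 240]))  # a Series has .loc too
assert not is_pandas_like([100, 240])
```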

"""
if y is None:
is_df = True if hasattr(X, 'loc') else False
X_ = X.resample('1s').mean() if is_df else X

@AartGoossens:

I think this raises a TypeError for a df with a non datetime-like index. Should we catch that here or add a check for a datetime-like index?

@glemaitre (Contributor Author):

Yes, checking for a dataframe is not enough; we should also check the index.
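
A possible guard, as an illustrative sketch (the helper name is hypothetical):

```python
import pandas as pd

def _check_index(X):
    # Only datetime-like indexes support ``.resample``; fail early
    # with an explicit message for anything else.
    if not isinstance(X.index, (pd.DatetimeIndex, pd.TimedeltaIndex)):
        raise TypeError(
            "expected a datetime-like index to resample the data, "
            f"got {type(X.index).__name__} instead")
```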

is_df = True if hasattr(X, 'loc') else False
X_ = X.resample('1s').mean() if is_df else X
X_ = check_array(X_)
return X_.ravel(), is_df

@AartGoossens:

You still need to return y here as the second return argument, right? Otherwise line 193 will raise a ValueError (unpacking error).

@glemaitre (Contributor Author):

Since it is confusing, I should always return X_.ravel(), y, is_df and ignore y when it is None.
Right now, I am returning different things depending on whether it is fit or predict.

@AartGoossens:

Ah, now I get it. That could work indeed, but I think it's less confusing if a method always returns the same number of arguments.
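
A sketch of the consistent three-value version discussed above (based on the snippet under review):

```python
from sklearn.utils import check_array

def _check_inputs(self, X, y=None):
    # Always return three values; predict simply ignores the second.
    is_df = hasattr(X, 'loc')
    X_ = X.resample('1s').mean() if is_df else X
    X_ = check_array(X_)
    return X_.ravel(), y, is_df
```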

"""
check_is_fitted(self, ['hr_start_', 'hr_max_', 'hr_slope_',
                       'hr_drift_', 'rate_growth_', 'rate_decay_'])
power, is_df = self._check_inputs(X)

@AartGoossens:

I think self._check_inputs returns 3 items

@glemaitre (Contributor Author):

only 2 at predict

from sksports.model import exp_heart_rate_model
from sksports.model import HeartRateRegressor

POWER = np.array([100] * 10 + [240] * 100 + [160] * 100)

@AartGoossens:

You can consider moving these constants to a pytest fixture. It's a matter of preference, but I think these fixtures are really nice to work with and also easily reusable in other tests. But I'm fine with doing improvements like this later.

@glemaitre (Contributor Author):

Good point. I was playing with the tests for the moment :)

reg = HeartRateRegressor()
reg.fit(power, heart_rate)

assert reg.hr_start_ == pytest.approx(100, 1.0)

@AartGoossens:

More of a question than a comment: what's best practice for models like this for deciding how and which model parameters to test? Do you have any examples from scikit-learn?

@glemaitre (Contributor Author):

Usually we run a classifier or a regressor on a known dataset and check the score. It is non-trivial to actually test parameters. Here I think that we can check those 2 parameters with a large enough tolerance to be sure that the optimization (or any future change) does not lead to complete nonsense.

Basically, I think that we could make something like:

assert reg.score(X, y) > 0.9

to be sure that we are fitting the data properly.
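
Putting both ideas together, a test could look like this sketch (the `power`/`heart_rate` fixtures and the tolerances are illustrative):

```python
import pytest

def test_heart_rate_regressor(power, heart_rate):
    reg = HeartRateRegressor()
    reg.fit(power, heart_rate)

    # Loose checks on a couple of fitted parameters...
    assert reg.hr_start_ == pytest.approx(100, abs=5)
    # ...plus a global goodness-of-fit check through the R^2 score.
    assert reg.score(power, heart_rate) > 0.9
```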

@AartGoossens

On a simple example (which will be in the tests) it seems that Nelder-Mead is the only solver leading to the right parameters. I think that we could simplify and keep the solver that you proposed at the beginning.

This is really interesting. What kind of results did you get? I expected other solvers to work at least reasonably well. I'm already looking forward to using OpenData to play around with different solvers... (it's time for me to put some more work into the OpenData tooling)

@glemaitre (Contributor Author)

What kind of results did you get?

[table of fitting results per solver]

return power


@pytest.fixture(params=[

@AartGoossens:

This looks interesting! If I'm correct you're not parametrizing the test but the fixture?...

@glemaitre (Contributor Author):

Yes. I would think that fixtures are good for data (parametrizing them allows building the different supported types). Parametrizing the test is then useful for passing some parameters to the function or something specific to the test. This is just my rule of thumb and I'm not sure it is the best ;)
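
For illustration, a parametrized fixture in this spirit (values borrowed from the POWER constant above; the exact shape is a sketch):

```python
import numpy as np
import pandas as pd
import pytest

@pytest.fixture(params=['ndarray', 'series', 'dataframe'])
def power(request):
    # One fixture, three supported container types.
    data = np.array([100] * 10 + [240] * 100 + [160] * 100)
    if request.param == 'series':
        return pd.Series(data, name='power')
    if request.param == 'dataframe':
        return pd.DataFrame({'power': data})
    return data

def test_accepts_all_types(power):
    # Collected three times, once per fixture parameter.
    assert len(power) == 210
```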

@pep8speaks

pep8speaks commented Jun 1, 2018

Hello @glemaitre! Thanks for updating the PR.

Line 19:1: E402 module level import not at top of file
Line 26:1: E402 module level import not at top of file
Line 27:1: E402 module level import not at top of file
Line 38:1: E402 module level import not at top of file
Line 53:1: E402 module level import not at top of file

Comment last updated on June 03, 2018 at 09:43 Hours UTC

@glemaitre glemaitre changed the title [WIP] EHN add a model to fit and predict heart-rate from power [MRG] EHN add a model to fit and predict heart-rate from power Jun 2, 2018
@glemaitre glemaitre force-pushed the master branch 6 times, most recently from ab4c1e9 to ab457a0 Compare June 2, 2018 20:31
15, 40)
data['heart-rate'] = heart_rate
data.plot(legend=True)
plt.title('Simulated power of a cyclist and inferred heart-rate.')

@glemaitre (Contributor Author):

remove final dot

power.plot(legend=True)
heart_rate.plot(legend=True)
heart_rate_initial.plot(legend=True)
heart_rate_pred.plot(legend=True)

@glemaitre (Contributor Author):

add title

@glemaitre (Contributor Author)

@AartGoossens I was checking the results of the optimization using Nelder-Mead and it can lead to fitted parameters without any meaning even if the fit is good (e.g. hr_max could be 300). I was thinking of making a minimization with bounds, but L-BFGS is really not stable.

I found that the best option for the moment is to use Nelder-Mead first and then refine with L-BFGS using bounds.

One of the drawbacks is that we make the fitting slower (i.e. we are minimizing twice). I am also not 100% sure whether combining both methods will always be stable, or whether L-BFGS can give erroneous results as it does when used alone.
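
As an illustration of the two-stage scheme (toy objective and bounds, not the actual model):

```python
import numpy as np
from scipy.optimize import minimize

def objective(params):
    # Stand-in for the sum of squared heart-rate residuals.
    return np.sum((params - np.array([90.0, 180.0])) ** 2)

x0 = np.array([75.0, 200.0])

# Stage 1: gradient-free Nelder-Mead to get close to the optimum.
stage1 = minimize(objective, x0, method='Nelder-Mead')

# Stage 2: refine with L-BFGS-B, whose bounds keep the parameters
# physiologically plausible (e.g. hr_max cannot drift to 300 bpm).
bounds = [(40.0, 120.0), (120.0, 220.0)]
stage2 = minimize(objective, stage1.x, method='L-BFGS-B', bounds=bounds)
```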

@glemaitre (Contributor Author)

Also, HeartRateRegressor is maybe not the best name. If we add other methods doing the same job, then we will have a naming issue.

@AartGoossens

That's very interesting indeed. Is combining optimizations common practice? I haven't heard of it before.

If this is only about a better fit for hr_max I want to mention this: I do not expect the hr_max model parameter to be really interesting, for 2 reasons:

  1. I expect it to properly minimize only for activities where there's enough high-power data (i.e. where hr_lin > hr_max)
  2. Heart rate and power are only approximately linearly correlated for intensities where almost all work is done aerobically (I think this is also mentioned in the article of De Smet). As a result the current model will overestimate heart rate for higher intensities, and of course this will result in a worse model fit.

In conclusion: in order to properly fit/minimize the hr_max we need activities with high power (1) but we also know that for these activities the model fit is not that good (2)...

Either way, if the fit is improved by combining these optimizations it's worth it of course.
By the way, in my opinion the performance hit is not an issue.
