Calculate athlete Critical Power? #25

Open
ethanopp opened this issue Feb 17, 2020 · 22 comments

@ethanopp

I've been playing with this library for a little while now, and it seems like the majority of the models I am trying to run require an athlete to be defined for the df.

I've been able to do this, but I can't figure out how to calculate the CP for that athlete.

Is calculating the CP for an athlete based on previous workout data something this library supports, or is it required that you define the CP for the athlete when assigning it to a df?

The code shows cp as an optional argument, but leaving it blank causes a runtime error:

session, engine = db_connect()
# May need to filter on max power grouped by interval
df = pd.read_sql(
    sql=session.query(stravaBestSamples.interval.label('time'),
                      stravaBestSamples.heartrate,
                      stravaBestSamples.cadence,
                      stravaBestSamples.velocity_smooth.label('speed'),
                      func.max(stravaBestSamples.best_power).label('power')).
        group_by(stravaBestSamples.interval).
        filter(or_(stravaBestSamples.interval == 180,
                   stravaBestSamples.interval == 420,
                   stravaBestSamples.interval == 720),
               stravaBestSamples.type == 'Ride',
               # stravaBestSamples.timestamp_local >= '2020-02-01'
               ).statement, con=engine)
engine.dispose()
session.close()

wdf = WorkoutDataFrame(df)
wdf.athlete = Athlete(name='Ethan')  # , cp=175, w_prime=20000)
wdf.compute_w_prime_balance()
print(wdf.athlete.cp)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2.1\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/Ethan/PycharmProjects/fitness/index.py", line 254, in <module>
    test()
  File "C:\Users\Ethan\PycharmProjects\fitness\pages\pandasTesting.py", line 37, in test
    wdf.compute_w_prime_balance()
  File "C:\Users\Ethan\PycharmProjects\fitness\venv\lib\site-packages\sweat\io\models\dataframes.py", line 35, in compute_w_prime_balance
    self.athlete.w_prime, algorithm, *args, **kwargs)
  File "C:\Users\Ethan\PycharmProjects\fitness\venv\lib\site-packages\sweat\pdm\w_prime_balance.py", line 112, in w_prime_balance
    return method(power, cp, w_prime, *args, **kwargs)
  File "C:\Users\Ethan\PycharmProjects\fitness\venv\lib\site-packages\sweat\pdm\w_prime_balance.py", line 46, in w_prime_balance_waterworth
    tau = get_tau_method(power, cp, tau_dynamic, tau_value)
  File "C:\Users\Ethan\PycharmProjects\fitness\venv\lib\site-packages\sweat\pdm\w_prime_balance.py", line 25, in get_tau_method
    static_tau = tau_w_prime_balance(power, cp)
  File "C:\Users\Ethan\PycharmProjects\fitness\venv\lib\site-packages\sweat\pdm\w_prime_balance.py", line 14, in tau_w_prime_balance
    delta_cp = cp - avg_power_below_cp
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'
@ethanopp
Author

ethanopp commented Feb 18, 2020

After playing with this a little more and looking at some of the test cases, I think I may have figured it out... However, I am still a little confused by the results.

def test():
    session, engine = db_connect()
    df = pd.read_sql(
        sql=session.query(stravaSamples.time.label('time'),
                          stravaSamples.heartrate, stravaSamples.cadence,
                          stravaSamples.velocity_smooth.label('speed'),
                          stravaSamples.watts.label('power')).filter(
            stravaSamples.type == 'Ride',
            stravaSamples.activity_id == '3110471562',
        ).statement, con=engine)

    wdf = WorkoutDataFrame(df)
    mmp = pd.Series(wdf.compute_mean_max_power(), name='mmp')
    # Update time index so it starts at 1 second (instead of 0)
    mmp.index += 1

    # Run CP model
    model = '3_parameter_non_linear'
    time_axis = mmp.index
    max_efforts = mmp
    fitted_params = critical_power.model_fit(
        time_axis,
        max_efforts,
        model=model)

    print(fitted_params)
    print(mmp.max())
Backend TkAgg is interactive backend. Turning interactive mode on.
{'cp': 165.992767580866, 'w_prime': 39044.511162117335, 'p_max': 312.04239353700245}
342.0

Why is the p_max returned from the model (312) not the same as the actual max power from the df (342)?

@ethanopp
Author

@AartGoossens, hope you may be able to chime in on the above? I'm sure it's something silly I'm doing... Thanks!

@AartGoossens
Contributor

Hi Ethan,
Great to hear that you're using this library. :)

Your first error (first message) is indeed caused by the missing CP for the athlete.
The CP (and W' for that matter) are generally computed from a small number of maximal efforts. These efforts are then fitted to a critical power model (e.g. the "3 parameter non linear model" you are using in your second message).
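
To give a rough idea of what that fitting does, here is a minimal sketch using the classic linear work-time form with made-up numbers (just the idea, not this library's API):

import numpy as np

# Three maximal efforts: duration in seconds and mean power in watts (made-up numbers)
durations = np.array([180.0, 420.0, 720.0])
powers = np.array([340.0, 300.0, 280.0])

# Linear work-time model: total work = cp * t + w_prime
work = powers * durations
cp, w_prime = np.polyfit(durations, work, 1)  # slope ~ CP, intercept ~ W'
print(cp, w_prime)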

The reason that the p_max the model returns is different from the actual power max is that it is a fitted model parameter which represents the intercept of the critical power model with the vertical axis (t=0). Here's a visual representation of both the 2-parameter and 3-parameter critical power models:
[image: cp models]
In an ideal world this p_max model parameter would indeed be identical to your maximal power, but this greatly depends on the data you are feeding the model. In this case you seem to be feeding the model data from a single workout, which will not yield the best results. You will get better results when you only feed it the data from some maximal efforts or your "mean max power" over a longer period of time.
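
For reference, the usual 3-parameter formulation looks roughly like this (a sketch of the common form, not necessarily exactly how this library parameterizes it), which shows why the fitted p_max is the t=0 intercept rather than your observed max power:

def p_three_param(t, cp, w_prime, p_max):
    # 3-parameter hyperbolic CP model: predicted power for a maximal effort of duration t seconds
    return cp + w_prime / (t + w_prime / (p_max - cp))

# At t=0 this evaluates to p_max, the intercept with the vertical axis
# (values roughly taken from your fitted output above)
print(p_three_param(0, cp=166.0, w_prime=39045.0, p_max=312.0))  # -> approximately 312.0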

Can you explain what you are trying to achieve?

PS: I have plans to extend and improve this library (I haven't worked on it for a while...) but am considering removing the WorkoutDataFrame interface because I think it's confusing for some people and is an extra interface for me to maintain.
PS 2: If you have suggestions for this library please let me know.

@ethanopp
Author

Thanks for the response @AartGoossens! Happy to help out with the project once you start it up again. I'm a little new to the statistics portion of all of this, but I have enough Python experience to code some things out once I understand the math...

So ultimately what I am trying to do is build a small personal app that can run the CP model every day on a rolling period (perhaps rolling 90 days) and show me what my CP is each day, based on the best efforts from that rolling time period. I'm building out my own charts, which pull in metrics from a lot of other data sources that Golden Cheetah does not leverage, but I still want to use the Golden Cheetah metrics, hence why I am trying to replicate them. Here's an example of a modified PMC which also incorporates HRV data and readiness scores from my Oura ring, as well as non-ride/run workouts that leverage HRSS or a weightlifting "WSS" (which I am calculating on my own). These can all be toggled with the top 3 teal buttons to show activity-type-specific CTL (fitness), but leverage ATL (fatigue) across all workouts (credits to the Elevate app for some of my design choices):
[screenshot]
Of course a key function of creating this chart is calculating TSS per workout, which requires an FTP/CP score. Currently I've been doing FTP tests every 6 weeks, but I am trying to move towards a model where I can auto-calculate my CP for a rolling period (TBD) and not have the FTP testing days screw up my workout plans...
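
(For reference, the per-workout TSS I'm computing is the standard NP-based formula; roughly something like this helper of my own, not something from this library:)

import pandas as pd

def training_stress_score(power, ftp, freq=1):
    # power: pandas Series of watts sampled every `freq` seconds; ftp: watts
    rolling = power.rolling(int(30 / freq), min_periods=1).mean()
    normalized_power = rolling.pow(4).mean() ** 0.25
    intensity_factor = normalized_power / ftp
    duration = len(power) * freq  # seconds
    return (duration * normalized_power * intensity_factor) / (ftp * 3600) * 100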

Since my original post I have tried a few other tests with larger date ranges, but I am still not able to match the numbers to what is showing in Golden Cheetah...

The dataset being tested against is from 01/01/2020 onward, and should match the model params shown in Golden Cheetah:
[screenshot]
[screenshot]

Test 1 (fail):
Query the raw activity stream from Strava (which is saved to the database) for all workouts from 01/01/2020 onward, run compute_mean_max_power() on the dataset, and pass the result to the model:

def test_samples():
    session, engine = db_connect()
    df = pd.read_sql(
        sql=session.query(stravaSamples.time.label('time'),
                          stravaSamples.heartrate, stravaSamples.cadence,
                          stravaSamples.velocity_smooth.label('speed'),
                          stravaSamples.watts.label('power')).filter(
            stravaSamples.date >= '2020-01-01'
        ).statement, con=engine)

    wdf = WorkoutDataFrame(df)
    mmp = pd.Series(wdf.compute_mean_max_power(), name='mmp').to_frame()
    # Update time index so it starts at 1 second (instead of 0)
    mmp['time'] = [x for x in range(1, len(mmp) + 1)]
    mmp.set_index('time', inplace=True)

    # Run CP model
    model = '3_parameter_non_linear'
    time_axis = mmp.index
    max_efforts = mmp['mmp']
    fitted_params = critical_power.model_fit(
        time_axis,
        max_efforts,
        model=model)

    print(fitted_params)
{'cp': 183.22516941252542, 'w_prime': 138594.30901078475, 'p_max': 315.0913700735806}

Golden Cheetah CP (247) != Python CP (183)

Test 2 (fail):
Query all workouts from Strava, run compute_mean_max_power() on each workout, and store those MMPs in a table. Query this table grouped by each interval (seconds) for all workouts after 01/01/2020 and pass it to the model (only one data point per interval, the best of the bests, is passed to the model here):

def test_best_samples():
    session, engine = db_connect()

    # select max effort group by interval from every workout already stored with compute_mean_max_power()
    mmp = pd.read_sql(
        sql=session.query(func.max(stravaBestSamples.mmp).label('mmp'))
            .group_by(stravaBestSamples.interval.label('time'))
            .filter(stravaBestSamples.date >= '2020-01-01')
            .statement, con=engine)

    # Update time index so it starts at 1 second (instead of 0)
    mmp['time'] = [x for x in range(1, len(mmp) + 1)]
    mmp.set_index('time', inplace=True)

    # Run CP model
    model = '3_parameter_non_linear'
    time_axis = mmp.index
    max_efforts = mmp['mmp']
    fitted_params = critical_power.model_fit(
        time_axis,
        max_efforts,
        model=model)

    print(fitted_params)
{'cp': 189.93118488057303, 'w_prime': 23849.153224166424, 'p_max': 401.0400687869544}

Golden Cheetah CP (247) != Python CP (189)

As you can see, the outputs from the models above do not match what is showing in Golden Cheetah (screenshots above). So I am a little stuck here now, as I'm not sure what data (and how) I should be passing to the models... It is also worth mentioning that I am not defining an Athlete for any of this analysis, as I don't think I need to give it an athlete's defined CP in order to run a model that outputs CP... right? (The whole point of me running this model is to have it tell me what my CP is.) From what I can see in the repo tests, it looks like passing the athlete with their defined CP is for some of the other functions like w_bal, which isn't necessarily what I'm going for here.

As a side note, I do have several other modules where your code has helped me so far, specifically with a lot of the "TrainingPeaks" metrics:
[screenshot]

So thanks for sharing all of this code in the first place!

@AartGoossens
Contributor

Really cool screenshots of your app! It is very interesting how you are combining training data and data from Oura.

The CP model fitting problem is interesting too, and I think the answer lies in exactly which data GC uses for the model fitting:

Firstly, I think GC only uses a subset of your activities when you select "Data to fit: Performance tests". I expect that the other option ("MMP bests") will use all your activities. But I cannot find in the source code (should be somewhere here) what GC uses to recognize/tag performance tests, so I am not 100% sure.

Secondly, the "search interval" setting seems to specify a range of data on which the model is fitted. But, again, I cannot find the exact mechanism in the source code.

Maybe @liversedge can weigh in on this? Can you give us some pointers on how/where these settings are applied to the model fitting?

Finally (and maybe you are already aware of this, but I wanted to mention it just to be sure): GoldenCheetah does not use this library under the hood, so it may very well be that you will never get exactly the same results as in GC.

@liversedge
Member

In general we do not fit CP models to points beyond 20 minutes as they tend to be submaximal.
In addition we avoid points less than 2 minutes for the CP2 model.
For the CP3 model we do use data less than 2 minutes.
For best results, filter out submax efforts (power index really helps here) and perform a least-squares fit.
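
E.g. roughly, in Python terms (a sketch of the recipe using scipy on a synthetic MMP curve, not the GC code):

import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def two_param_model(t, cp, w_prime):
    # 2-parameter CP model: P(t) = CP + W'/t
    return cp + w_prime / t

# mmp: mean-max power in watts indexed by duration in seconds (synthetic here for illustration)
durations = np.arange(1, 1201)
mmp = pd.Series(two_param_model(durations, 250.0, 20000.0), index=durations)

window = mmp[(mmp.index >= 120) & (mmp.index <= 1200)]  # keep 2 min .. 20 min only
(cp, w_prime), _ = curve_fit(two_param_model, window.index.values, window.values, p0=[250.0, 20000.0])
print(cp, w_prime)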

@liversedge
Member

The envelope fit for the extended CP model is complicated, but basically it tries to fit pmax using a short-duration window and cp using a longer-duration window, and fits iteratively / by brute force.

@ethanopp
Author

@AartGoossens thanks for making the connection here with Mark. One last question for you: is it accurate for me to be setting the time index to start at 1, rather than 0? (where I am setting mmp['time'])

@liversedge, what exactly do you mean by filtering out submax? Per your comment about filtering out intervals greater than 20 mins for the CP3 model, I'm getting much closer...

With the following, I am getting a CP3 of 257W, where Golden Cheetah is showing a CP of 253W.

def test_best_samples():
    session, engine = db_connect()
    # select max effort group by interval from every workout already stored with compute_mean_max_power()
    mmp = pd.read_sql(
        sql=session.query(func.max(stravaBestSamples.mmp).label('mmp'))
            .group_by(stravaBestSamples.interval.label('time'))
            .filter(stravaBestSamples.interval <= 1200)
            .statement, con=engine)

    # Update time index so it starts at 1 second (instead of 0)
    mmp['time'] = [x for x in range(1, len(mmp) + 1)]
    mmp.set_index('time', inplace=True)

    # Run CP model
    model = '3_parameter_non_linear'
    time_axis = mmp.index
    max_efforts = mmp['mmp']
    fitted_params = critical_power.model_fit(
        time_axis,
        max_efforts,
        model=model)

    print(fitted_params)

Unfortunately, when I run the CP2 there's a little more of a gap, with Golden Cheetah showing 261 but the model giving me 280:

    mmp = pd.read_sql(
        sql=session.query(func.max(stravaBestSamples.mmp).label('mmp'))
            .group_by(stravaBestSamples.interval.label('time'))
            .filter(stravaBestSamples.interval <= 1200, stravaBestSamples.interval >= 120)
            .statement, con=engine)

@liversedge
Member

In any MMP curve the vast majority of points will not be maximal, unless you go out and do maximal efforts for every duration from 1s to 20 mins. You can eyeball this by looking at how bumpy the curve is; the lack of a definable S shape might also suggest an absence of short maximal efforts.

If you fit to a curve dominated by submax efforts your estimates will bias low.

@liversedge
Member

Filtering out >20 mins will reduce low bias, as almost no one does truly maximal efforts that long, and even if they do, they add little to the fit vs the 20 min efforts anyway since the curve is very flat from 20 mins out.

@ethanopp
Author

Ah, I see, thanks for explaining this - right now it is indeed fitting to ALL data points under 20 mins.

My goal here is to monitor how critical power is changing over time by using a rolling 90 day window to capture all workouts in that period, MMP all of those workouts, and then take the best efforts of each interval in this dataset (under 20 mins right now for the CP3) to feed to the model. My theory was that using a window of this size would give a large enough dataset that taking the max of all of these workouts' MMPs would inherently filter out submax efforts, but is there an additional way I can identify these submax data points programmatically?

What’s the approach in golden cheetah if no workouts are marked as performance tests and the user is running the model on the “MMP” setting?

Thanks!

@liversedge
Member

In the CP chart you can enable filtering in the options; PDModel.cpp has details on how the estimator works (e.g. truncating >1200s).

@AartGoossens
Contributor

@AartGoossens thanks for making the connection here with Mark. One last question for you: is it accurate for me to be setting the time index to start at 1, rather than 0? (where I am setting mmp['time'])

Yes, that makes sense, because your first mmp value is actually your 1-second max power.

@ethanopp
Author

Sorry Mark, I'm not sure I understand - which filters?

I found this:
https://github.com/GoldenCheetah/GoldenCheetah/blob/7b659e69d8df1a3a61fffc5e7ac5b76faf435b00/src/Metrics/PDModel.cpp#L112

...which I believe corresponds to this:
[screenshot]

But when I run the following, the numbers still aren't matching up:

def test_best_samples():
    session, engine = db_connect()
    query = session.query(stravaBestSamples.interval, func.max(stravaBestSamples.mmp).label('mmp')).group_by(
        stravaBestSamples.interval.label('time')).filter(
        or_(
            and_(stravaBestSamples.interval >= 180, stravaBestSamples.interval <= 240),
            and_(stravaBestSamples.interval >= 600, stravaBestSamples.interval <= 1200))
    ).statement

    mmp = pd.read_sql(sql=query, con=engine)

    # Run CP model
    model = '3_parameter_non_linear'
    time_axis = mmp['interval']
    max_efforts = mmp['mmp']
    fitted_params = critical_power.model_fit(
        time_axis,
        max_efforts,
        model=model)

    print(fitted_params)

Golden Cheetah shows 253 but the model outputs the following:
{'cp': 261.01736771687024, 'w_prime': 8892.956538533888, 'p_max': -437457668.36119807}


@liversedge
Member

[screenshot: options]

@ethanopp
Author

ethanopp commented Feb 28, 2020

Still not following... When selecting 'Filter Unique Bests' it's only changing my CP by 1 point...
[screenshot]

[screenshot]

And when I run the CP2 it shows data less than 2 mins?
[screenshot]

Ultimately I am just trying to figure out what filters need to be put on the data that is fed into the model... and for CP3, when removing intervals greater than 1200 seconds (20 mins) it's not matching up, as it's showing my critical power as 256 instead of the 252W or 253W shown in the screenshots above

@liversedge
Member

I see: you are trying to replicate the results that GC gives in your own tool.
My best advice is for you to read the source code; I'm not here to read it for you.

@ethanopp
Author

Alright, thanks for the help thus far... Really I'm just trying to replicate the results from GC with this sub-repo within GC.

Unfortunately I'm a little new to coding (self-taught, and using this project to improve), so I have gone through the GC code as best I can, but from what I can tell the query I'm currently running matches the dataset GC sends to the 3CP model.

The only other difference I was able to find was in the initial parameters the two libraries use, mentioned above, so hopefully @AartGoossens can take a look once he gets a chance.

@liversedge
Member

No probs, I think you are on a bit of a journey: coding, modelling, the critical power model, and diagnosing modelling errors.

I would recommend familiarising yourself with basic modelling concepts and some of the literature on CP.

The Jones (2010) review of CP is the best paper to read, it's excellent, and a few others spring to mind. Drop me an email (liversedge gmail com) and I'll forward you some stuff.

@ethanopp
Author

Thanks, will do!

@ethanopp
Author

ethanopp commented Mar 3, 2020

Opening up the code if you guys are interested in the charts:
https://github.com/ethanopp/fitly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants