erroneous results when using categorical variables with linear regression algorithm #12
Comments
correction: 3 independent variables and 1 dependent variable, not 4 independent variables. |
Hey @mb52089, thanks for the report. Can you give more details about what you mean by "incorrect" prediction? It's different than linear regression in another language, or the error is high? |
Thanks Andrew. We're predicting the % utilization of a resource on the day of service based on the % utilization x days in advance, the duration of the resource in minutes and the day of the week. The predicted value should be between 0 and 1. In the particular test example we're using, the predicted value should be around 66%. We get that value when we use the LightGBM algorithm, but when we use linear regression we get -1.4, a value that doesn't make sense given the context and the training data. However, if I remove the "day of week" categorical variable and re-run the prediction using the linear regression algorithm, I get a prediction in range. I wasn't sure if the gem deals with categorical variables differently in linear regression than in the LightGBM algorithm. The data set has around 150 rows of independent variables. |
and this is all done in ruby/rails. |
If it's not too sensitive, can you paste the model summary and PMML here, or send it to me over email (on my GitHub profile)?
puts model.summary
puts model.to_pmml |
I just ran the model summary for the error condition: Math::DomainError: Numerical argument is out of domain - "sqrt" |
The model summary after I remove the categorical variable week_day: |
just sent to your chartkick email. I didn't know you were the author of chartkick. It's great too! |
To close the loop: the issue was likely related to multicollinearity, which can produce an unstable solution (the link provides a good explanation). One way to counteract this is to use GSL, which uses a different algorithm to produce a more stable solution. |
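One common way a categorical variable introduces multicollinearity is the "dummy variable trap": if every level of the category gets its own 0/1 column and the model also has an intercept, the dummy columns sum to the intercept column, so the design matrix is rank deficient. The sketch below only illustrates that general effect with Ruby's standard matrix library. It is not taken from Eps, it does not claim to show how Eps encodes week_day internally, and the day values are made up for the example.

```ruby
require "matrix"

# Hypothetical sample of a categorical column (not the reporter's data).
days = %w[Fri Mon Mon Tue Tue Fri]
levels = days.uniq.sort # ["Fri", "Mon", "Tue"]

# Intercept plus one 0/1 column per level: the dummy columns add up to
# the intercept column, so the columns are linearly dependent.
full = Matrix[*days.map { |d| [1] + levels.map { |l| l == d ? 1 : 0 } }]

# Intercept plus dummies for all levels except a reference level ("Fri").
reduced = Matrix[*days.map { |d| [1] + levels.drop(1).map { |l| l == d ? 1 : 0 } }]

puts "full one-hot:    rank #{full.rank} of #{full.column_count} columns"
puts "reference level: rank #{reduced.rank} of #{reduced.column_count} columns"
```

When the rank is lower than the number of columns, the least-squares coefficients are not uniquely determined, which is one way a solver can end up with wildly large or negative intermediate values.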
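Going to reopen this until the model.summary error is fixed. @mb52089, can you paste the output of model.send(:diagonal) for a model where you're seeing Math::DomainError: Numerical argument is out of domain - "sqrt"? |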
When I try to run model.send(:diagonal) I get the following error:
NoMethodError: undefined method `diagonal' for #<Eps::Model:0x00007fe88850ca58>
from /Users/michaelburke/.rvm/gems/ruby-2.6.5@copient_health_rails6/bundler/gems/eps-509da754d6e9/lib/eps/model.rb:62:in `method_missing'
|
My bad, it should be: model.instance_variable_get("@estimator").send(:diagonal) |
Here you go:
=> [0.0005296860721842933, 0.0066308112665816495, 1.3595352803866229e-09, 0.0012121905646438054, 0.0312576935042156, 0.014730636756303176]
|
Thanks. This is from the model that errors on the summary? I'm unable to reproduce with those numbers. |
Now that I have installed GSL, I can't seem to reproduce the error when I do the linear regression. Do you want me to uninstall GSL and see if I can reproduce? |
Yeah, GSL changes the code path, so you'll want to recreate the initial conditions. |
Here you go. After removing the gsl gem and re-bundling, I ran @model.instance_variable_get("@estimator").send(:diagonal) from a model that generated the following error when running @model.summary: Math::DomainError: Numerical argument is out of domain - "sqrt". Here's the output: [-666372359695044.8, 1.0761875986711336, -3777621086.706599, 0.19673979554666882, -339985897803588.5, 2.390797741078714] |
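The negative entries in that output are consistent with the sqrt error: Ruby's Math.sqrt raises Math::DomainError for a negative argument, and a summary that takes square roots of these diagonal values (presumably to compute standard errors, which is an assumption here, not something confirmed in this thread) would fail on the first negative one. A minimal sketch using the reported values; the standard_errors helper is hypothetical and is not the gem's code or its actual fix.

```ruby
# Values copied from the comment above.
diagonal = [-666372359695044.8, 1.0761875986711336, -3777621086.706599,
            0.19673979554666882, -339985897803588.5, 2.390797741078714]

# Taking square roots directly reproduces the reported exception class.
sqrt_error =
  begin
    diagonal.map { |v| Math.sqrt(v) }
    nil
  rescue Math::DomainError => e
    e.message
  end
puts sqrt_error

# Hypothetical guard: report an unstable solution up front instead of
# letting Math.sqrt fail with a low-level domain error.
def standard_errors(diagonal)
  if diagonal.any? { |v| v < 0 }
    raise ArgumentError, "unstable solution: negative value on the diagonal"
  end
  diagonal.map { |v| Math.sqrt(v) }
end

begin
  standard_errors(diagonal)
rescue ArgumentError => e
  puts e.message
end
```

By contrast, the diagonal posted earlier from the GSL code path contains only small positive values, so the same square roots succeed there.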
Thanks @mb52089, fixed the error message for unstable solutions. Pushing out a new release in a few with all the fixes we discussed. Thanks for the help! |
No problem at all. Thanks for all the great gems! |
We have a categorical variable for day_of_week as one of 4 independent variables in our model. The LightGBM algorithm works correctly but when I force the model to use the linear regression algorithm, the resultant prediction is incorrect. If I subsequently remove the categorical variable, the linear regression algorithm gives an accurate prediction. Here's an example of what our data set looks like:
{:day_of_service_util=>0.80952380952381, :day_in_advance_util=>0.714285714285714, :block_minutes=>420.0, :week_day=>"Fri"},
{:day_of_service_util=>0.69047619047619, :day_in_advance_util=>0.214285714285714, :block_minutes=>420.0, :week_day=>"Mon"},
{:day_of_service_util=>0.80952380952381, :day_in_advance_util=>0.238095238095238, :block_minutes=>420.0, :week_day=>"Mon"},
{:day_of_service_util=>0.80952380952381, :day_in_advance_util=>0.238095238095238, :block_minutes=>420.0, :week_day=>"Mon"}
day_of_service_util is the Target dependent variable.
Thanks for this great gem!