LinAlgError: Matrix is Singular #116
Yes. I think that … So, try increasing bw_min. Does fitting the full dataset of 2941 polygons with a larger bw_min not work?
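Here is one plausible reason a larger bw_min can help, sketched with NumPy under stated assumptions. With an adaptive (nearest-neighbour) bandwidth, each local fit only uses the observations that receive a positive kernel weight. X^T W X cannot have full rank if there are fewer of those observations than columns in X. The setup below is hypothetical and is not mgwr code: 20 covariates plus an intercept (as in the original post), random data, and the helper local_rank.

```python
import numpy as np

rng = np.random.default_rng(0)

n_covariates = 20          # as in the original post
p = n_covariates + 1       # plus an intercept column


def local_rank(k):
    """Rank of X^T W X for a hypothetical window of k positively weighted observations."""
    X = np.column_stack([np.ones(k), rng.normal(size=(k, n_covariates))])
    w = rng.uniform(0.1, 1.0, size=k)       # arbitrary positive kernel weights
    xtwx = X.T @ (w[:, None] * X)            # X^T W X without forming diag(w)
    return int(np.linalg.matrix_rank(xtwx))


for k in (10, 15, 21, 50):
    print(f"{k:3d} observations with positive weight -> rank {local_rank(k)} of {p}")
```

Kernels such as the bisquare give zero weight at and beyond the bandwidth distance. As a result, the number of observations that actually contribute can be smaller than the nominal neighbour count.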
Thank you for your quick answer. I tried different combinations of bw_min and …, but I was not able to solve the problem.
Then it may be related to your model specification. Is there any variable that is perfectly collinear?
On Thursday, May 19, 2022 5:21:48 PM, Santiago Cardona Urrea wrote:
Thank you for your quick answer. I tried different combinations of bw_min and It was not possible to solve the problem.
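The NumPy sketch below is a quick global check for perfect collinearity, using hypothetical data with made-up variable names. If the rank of the design matrix is smaller than its number of columns, then X^T X, and therefore every X^T W X, is singular. Shares that always sum to 100% are a common culprit because together they reproduce the intercept.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Hypothetical covariates: two shares that always sum to 100, plus a density.
pct_owner = rng.uniform(20, 80, size=n)
pct_renter = 100.0 - pct_owner
density = rng.normal(1000, 300, size=n)

X = np.column_stack([np.ones(n), pct_owner, pct_renter, density])
names = ["const", "pct_owner", "pct_renter", "density"]

rank = np.linalg.matrix_rank(X)
print(f"rank {rank} of {X.shape[1]} columns")        # rank < columns => X^T X is singular
print(f"condition number {np.linalg.cond(X):.3g}")

# Drop one column at a time: if the rank does not fall, that column can be
# written as a linear combination of the remaining ones.
redundant = [
    name
    for j, name in enumerate(names)
    if np.linalg.matrix_rank(np.delete(X, j, axis=1)) == rank
]
print("involved in a perfect linear dependency:", redundant)
```

Passing this global check does not rule out collinearity inside individual local windows, which depends on the bandwidth.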
I have checked collinearity, but I did not find any perfectly collinear variables.
Did you check local collinearity? It can change at each bandwidth that is explored during the bandwidth search procedure.
On Thu, May 19, 2022 at 5:09 PM Santiago Cardona Urrea wrote:
I have checked colinearity but I did not find variables perfectly colinear.
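Local collinearity can be checked site by site for a given bandwidth. The NumPy sketch below assumes an adaptive bisquare kernel. Its bandwidth is the distance to the k-th nearest neighbour, with the site itself counting as the first. mgwr's exact kernel conventions may differ. The data, the zero-filled region, the 1e10 threshold and the function name local_condition_numbers are all hypothetical.

```python
import numpy as np


def local_condition_numbers(coords, X, k):
    """Condition number of the locally weighted design W^(1/2) X at every site.

    Sketch only: adaptive bisquare kernel, bandwidth = distance to the k-th
    nearest neighbour (the site itself counts as the first).
    """
    n = X.shape[0]
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)  # n x n distances
    cns = np.empty(n)
    for i in range(n):
        h = np.sort(d[i])[k - 1]                                  # adaptive bandwidth at site i
        w = np.where(d[i] < h, (1 - (d[i] / h) ** 2) ** 2, 0.0)   # bisquare weights
        Xw = np.sqrt(w)[:, None] * X                              # W^(1/2) X
        cns[i] = np.linalg.cond(Xw)    # cond(X^T W X) equals this value squared
    return cns


rng = np.random.default_rng(1)
n = 300
coords = rng.uniform(0, 1, size=(n, 2))
x1 = rng.normal(size=n)
x2 = rng.lognormal(size=n)
x2[coords[:, 0] < 0.2] = 0.0       # hypothetical: missing values filled with 0 in the west
X = np.column_stack([np.ones(n), x1, x2])

for k in (10, 150):
    cn = local_condition_numbers(coords, X, k)
    print(f"k={k}: {np.sum(cn > 1e10)} sites with a (near-)singular local design")
```

A site whose local design is (near-)singular at a small bandwidth can become well-conditioned at a larger one. This is consistent with a bandwidth search failing for some candidate bandwidths and not for others.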
I have had this problem before and found a workaround, though I'm not sure whether it's a "valid" approach or not. If the variable that is causing you trouble is a floating-point value, you might be able to get away with adding a little bit of random "dust" to it. For instance, I had a variable that was in the range 1,000-10,000 for 80% of my observations but flat zero for the other 20% or so. The flat zeros were causing the issue if they happened to be the only values within a particular bandwidth, or so I surmised. So my solution was to add "dust" to all the variable values: a random amount between 0.00 and 0.99. My final values will all be rounded to the nearest whole number anyway. Adding the dust makes all the troublesome parcels with a zero value technically different from one another, and hopefully the amount is so small that it won't meaningfully affect the predictions.
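The NumPy sketch below applies the "dust" idea to one hypothetical local window in which the troublesome variable is zero for every observation. Without jitter, X^T W X is exactly singular and np.linalg.solve raises LinAlgError. After a small uniform amount is added to every value, the matrix has full rank. The window size, weights and jitter range are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical local window of 12 parcels in which the variable is flat zero
# for every parcel.
k = 12
value = np.zeros(k)
w = rng.uniform(0.2, 1.0, size=k)      # arbitrary positive kernel weights


def local_xtwx(col):
    """X^T W X for a local design made of an intercept and one covariate."""
    X = np.column_stack([np.ones(k), col])
    return X.T @ (w[:, None] * X)


try:
    np.linalg.solve(local_xtwx(value), np.ones(2))
    singular_without_dust = False
except np.linalg.LinAlgError as err:
    singular_without_dust = True
    print("without dust:", err)

# "Dust": add a small random amount to every value. The workaround above used
# 0.00-0.99 for values in the thousands; here the range is [0, 1e-4].
dusty = value + rng.uniform(0.0, 1e-4, size=k)
rank_with_dust = np.linalg.matrix_rank(local_xtwx(dusty))
print("with dust, rank:", rank_with_dust, "of 2")
print(f"condition number: {np.linalg.cond(local_xtwx(dusty)):.3g}")
```

The jitter only makes the matrix invertible. With noise this small it can still be badly conditioned, so the local coefficients for that variable in such windows mostly reflect the noise.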
Hi @larsiusprime! That's a reasonable way to avoid the singularity issue if you can afford that small bit of random noise in your analysis budget. For most uses, adding a random value in [0, 1e-4] is probably sufficient. For any potential developer interested in solving this in our code, the solution would be to swap our current … For comparison, statsmodels still produces estimates when two columns are perfectly collinear:
>>> import statsmodels
>>> import numpy
>>> from statsmodels import api as sm
>>> x = numpy.random.random(size=100)
>>> X = numpy.column_stack((numpy.ones_like(x), x, x,)) # perfectly collinear columns 2 & 3
>>> y = X @ numpy.array([[3, -2, 4]]).T  # y = 3 + 2x, because columns 2 and 3 are identical
>>> sm.OLS(endog=y, exog=X, hasconst=True).fit().summary()
"""
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 1.000
Model: OLS Adj. R-squared: 1.000
Method: Least Squares F-statistic: 1.130e+31
Date: Fri, 28 Jul 2023 Prob (F-statistic): 0.00
Time: 09:51:05 Log-Likelihood: 3265.6
No. Observations: 100 AIC: -6527.
Df Residuals: 98 BIC: -6522.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 3.0000 3.66e-16 8.21e+15 0.000 3.000 3.000
x1 1.0000 2.97e-16 3.36e+15 0.000 1.000 1.000
x2 1.0000 2.97e-16 3.36e+15 0.000 1.000 1.000
==============================================================================
Omnibus: 6.528 Durbin-Watson: 0.115
Prob(Omnibus): 0.038 Jarque-Bera (JB): 8.807
Skew: 0.267 Prob(JB): 0.0122
Kurtosis: 4.352 Cond. No. 1.36e+17
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 9.07e-33. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
""" |
Any idea whether there will be a fix for this? The program essentially doesn't work and always results in this error. I had this error three years ago, came back to the same project, and the error is still present.
Good to hear. I have read through that thread but don't have sufficient expertise to contribute myself. I'm essentially lost as to how to progress with the project unless this can be fixed. Cheers.
@jagreen1 Can you fit a regular OLS on your data?
from spreg import OLS
OLS(y, x)
If you cannot fit an OLS, then the problem is not with MGWR (#132). If you can, one fix that frequently works is to increase the minimum bandwidth. This … You can see that the original poster of this issue is setting …
Yes, there are no problems with the OLS. I'm using a logistic/binomial (0 or 1) dataset of 100k points, and a further subset of just 20k points. It works with some bandwidths and fails, seemingly at random, with others.
Interesting, OK. And, to confirm, the issue arises in …? Do you have any categorical/one-hot features, or are they all continuous? This is something I've long been interested in conceptually. I hope to have the proof of concept linked above completed by early January.
@ljwolf Yes, I can confirm that this issue occurs during … The independent variables are continuous (not categorical); however, where data wasn't available I had to assign values of zero. I'm not sure whether that causes an issue. I have primarily been using the MGWR GUI application, which often gives the error … I decided to try analyzing the data purely in Python (not using the GUI), and I now receive a slightly different error when calling …
Yes, it would. See @larsiusprime's comment; it's a perfectly useful fix here. "Matrix is Singular" means that the weighted least-squares matrix, X^T W X, is not invertible. This is often because some variable in X is perfectly collinear with another variable. Suppose you fill all your missing data with zeros, and the missing data occur more commonly in some localities. Then it's entirely possible that some local model gets all zeros for some covariate, i.e., every x value for the sites within the bandwidth is zero. When this happens at even one site, that x becomes perfectly collinear with the intercept, that local model becomes degenerate, and the error is thrown.
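Writing out the usual GWR estimator at site i makes the failure mode explicit. The notation here is assumed rather than taken from the thread: X is the n × p design matrix including the intercept, W_i = diag(w_i1, …, w_in) holds the kernel weights for site i, and x_lj is the value of covariate j at observation l.

```latex
\begin{aligned}
\hat{\boldsymbol{\beta}}_i &= \left(X^\top W_i X\right)^{-1} X^\top W_i\,\mathbf{y}, \\
\left(X^\top W_i X\right)_{jm} &= \sum_{l=1}^{n} w_{il}\, x_{lj}\, x_{lm}, \\
x_{lj} = 0 \ \text{whenever}\ w_{il} > 0
&\;\Longrightarrow\; \left(X^\top W_i X\right)_{jm} = 0 \ \text{for all } m
\;\Longrightarrow\; \det\left(X^\top W_i X\right) = 0 .
\end{aligned}
```

More generally, X^T W_i X is singular whenever the rows of X that receive positive weight at site i do not have full column rank. This is why both collinearity within a window and too few positively weighted neighbours trigger the same error.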
Hello everyone
I am trying to fit a GWR model. I am following example code, and every example has the same pipeline. When I run "gwr_selector", a LinAlgError: Matrix is Singular error appears. I have 2941 polygons and 20 variables to fit the model. The only way the code works is if I fit it with 150 polygons and 5 variables. Do you know what kind of mistake I am making?
Best,