-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Spearman p value #125
base: master
Are you sure you want to change the base?
Conversation
I changed this request a little -- it makes more sense to start from pearson, which has the same test, and go from there. |
@@ -26,15 +29,20 @@ import Statistics.Test.Internal (rankUnsorted) | |||
-- | Pearson correlation for sample of pairs. Exactly same as | |||
-- 'Statistics.Sample.correlation' | |||
pearson :: (G.Vector v (Double, Double), G.Vector v Double) | |||
=> v (Double, Double) -> Double | |||
pearson = correlation | |||
=> v (Double, Double) -> (Double, PValue Double) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should probably document the return value? that's not clear to an amateur like me at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's absolutely necessary to document meaning of p-value and what hypothesis is being tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most important thing is comments. For statistical applications it's crucial to describe what exactly function does
@@ -26,15 +29,20 @@ import Statistics.Test.Internal (rankUnsorted) | |||
-- | Pearson correlation for sample of pairs. Exactly same as | |||
-- 'Statistics.Sample.correlation' | |||
pearson :: (G.Vector v (Double, Double), G.Vector v Double) | |||
=> v (Double, Double) -> Double | |||
pearson = correlation | |||
=> v (Double, Double) -> (Double, PValue Double) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's absolutely necessary to document meaning of p-value and what hypothesis is being tested.
@@ -56,7 +65,7 @@ spearman :: ( Ord a | |||
, G.Vector v (Int, b) | |||
) | |||
=> v (a, b) | |||
-> Double | |||
-> (Double, PValue Double) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's especially important for Spearman correlation. What is meaning of p-value here? Is it described anywhere? I'm not sure that student's distribution will arise for ranks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wikipedia sources the following for this test:
Press; Vettering; Teukolsky; Flannery (1992). Numerical Recipes in C: The Art of Scientific Computing (2nd ed.). p. 640.
The equation is 14.6.2. Whether this approximation is optimal, I do not know, but I'm sure there are better methods out there for p-values for Spearman's correlation coefficient, but I used the Student's t distribution as a simple solution.
Unfortunately this changes the API, but I believe that a p-value should be reported with every correlation. This technique uses Student's t distribution, which is fine, but it would be neat to have exact p-values with small samples sizes as well (for the future).