Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Spearman p value #125

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

GregorySchwartz
Copy link

Unfortunately this changes the API, but I believe that a p-value should be reported with every correlation. This technique uses Student's t distribution, which is fine, but it would be neat to have exact p-values with small samples sizes as well (for the future).

@Shimuuar Shimuuar self-requested a review July 30, 2017 13:46
@GregorySchwartz
Copy link
Author

I changed this request a little -- it makes more sense to start from pearson, which has the same test, and go from there.

@@ -26,15 +29,20 @@ import Statistics.Test.Internal (rankUnsorted)
-- | Pearson correlation for sample of pairs. Exactly same as
-- 'Statistics.Sample.correlation'
pearson :: (G.Vector v (Double, Double), G.Vector v Double)
=> v (Double, Double) -> Double
pearson = correlation
=> v (Double, Double) -> (Double, PValue Double)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should probably document the return value? that's not clear to an amateur like me at least.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's absolutely necessary to document meaning of p-value and what hypothesis is being tested.

Copy link
Collaborator

@Shimuuar Shimuuar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most important thing is comments. For statistical applications it's crucial to describe what exactly function does

@@ -26,15 +29,20 @@ import Statistics.Test.Internal (rankUnsorted)
-- | Pearson correlation for sample of pairs. Exactly same as
-- 'Statistics.Sample.correlation'
pearson :: (G.Vector v (Double, Double), G.Vector v Double)
=> v (Double, Double) -> Double
pearson = correlation
=> v (Double, Double) -> (Double, PValue Double)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's absolutely necessary to document meaning of p-value and what hypothesis is being tested.

@@ -56,7 +65,7 @@ spearman :: ( Ord a
, G.Vector v (Int, b)
)
=> v (a, b)
-> Double
-> (Double, PValue Double)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's especially important for Spearman correlation. What is meaning of p-value here? Is it described anywhere? I'm not sure that student's distribution will arise for ranks

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wikipedia sources the following for this test:

Press; Vettering; Teukolsky; Flannery (1992). Numerical Recipes in C: The Art of Scientific Computing (2nd ed.). p. 640.

The equation is 14.6.2. Whether this approximation is optimal, I do not know, but I'm sure there are better methods out there for p-values for Spearman's correlation coefficient, but I used the Student's t distribution as a simple solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants