Ranking System v1 #960
@CrafterKolyan is attempting to deploy a commit to the github readme stats Team on Vercel. A member of the Team first needs to authorize it. |
Codecov Report
@@ Coverage Diff @@
## master #960 +/- ##
==========================================
+ Coverage 93.98% 94.26% +0.28%
==========================================
Files 22 22
Lines 682 663 -19
Branches 191 185 -6
==========================================
- Hits 641 625 -16
+ Misses 37 34 -3
Partials 4 4
Continue to review full report at Codecov.
|
I hate bots that close issues and now they also close PRs 😲 |
This seems much better than the current ranking system. Why hasn't this been merged yet? |
I think the main problem is that @anuraghazra needs a lot of time to fully understand the solution and also he may want to do some extra testing on his side rather than rely on my research. (But maybe he simply missed this PR) |
Oh hi! I just looked at it. Actually, I'm very cautious when it comes to changing these stats calculations, because people will go mad if they see that their ranks are not the same and a breaking change happened. But this PR and your description look very promising. |
Hi @anuraghazra,

The principle is very similar in this PR and #1186. Each metric (repos, commits, stars, ...) is associated with its own rank. For instance, the "stars" rank is computed as

stars_rank = exp(-stars / STARS_MEAN)

which ranges from 0 (no one is better) to 1 (everyone is better). The difference lies in how we aggregate the individual ranks.

In this PR, the author considers that if you are extremely good in one metric, your overall rank should be as well. This is done as

rank = 7 / (1 / stars_rank + 1 / commits_rank + 1 / followers_rank + ...)

so if

For instance, this user (esin) has 2.5k followers. In this PR, he would get S+. In mine he is an A (almost A+).

In #1186, the overall rank is a weighted average of the individual ranks:

rank = (1 * stars_rank + 0.25 * commits_rank + 0.5 * followers_rank + ...) / (1 + 0.25 + 0.5 + ...)

This prevents the problem mentioned above, but it also means that, unless you are perfect (ranks = 0) everywhere, your overall rank will not be perfect. The weights are here to mitigate by reducing the impact of

The reason why Linus Torvalds is not S+ is that he doesn't have a lot of repos compared to the average user (only 4 instead of 10) and not a lot of PRs/issues. However, it is very easily modified: you can either reduce the "weight" of
|
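To make the difference between the two aggregations concrete, here is a minimal sketch restricted to the three metrics whose formulas appear in the comment above. The `MEANS` values and the sample profile are made up for illustration; the weights 1, 0.25 and 0.5 are the ones quoted for #1186, and the remaining (elided) metrics and weights are left out.

```python
import math

# Hypothetical per-metric averages, chosen only for this illustration.
MEANS = {"stars": 50.0, "commits": 500.0, "followers": 10.0}
# Weights quoted in the comment for #1186; the other weights are elided there.
WEIGHTS = {"stars": 1.0, "commits": 0.25, "followers": 0.5}


def metric_ranks(stats):
    """Per-metric rank exp(-value / mean): 0 is best, 1 is worst."""
    return {name: math.exp(-stats[name] / MEANS[name]) for name in MEANS}


def harmonic_rank(ranks):
    """Aggregation of this PR: harmonic mean of the per-metric ranks."""
    values = list(ranks.values())
    if min(values) == 0.0:  # exp() underflowed, the limit of the mean is 0
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


def weighted_rank(ranks):
    """Aggregation of #1186: weighted average of the per-metric ranks."""
    total_weight = sum(WEIGHTS[name] for name in ranks)
    return sum(WEIGHTS[name] * ranks[name] for name in ranks) / total_weight


# A profile with many followers and nothing else.
profile = {"stars": 0, "commits": 0, "followers": 2500}
ranks = metric_ranks(profile)
print(harmonic_rank(ranks))  # extremely close to 0 (top rank)
print(weighted_rank(ranks))  # about 0.71 (far from the top)
```

With these assumed numbers, the harmonic mean is pulled to almost 0 by the single outstanding metric, while the weighted average stays near 0.71 because the two other ranks are still 1.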
Hi @anuraghazra. I understand your fears about changing the algorithm. Of course, almost nobody would like to learn that they are not that good compared to others, and almost nobody would share such a grade of their work on their profile. To be honest, I'm not sure whether having a problem in the ranking algorithm is good or bad. It ranks people higher and gives them extra motivation and self-confidence, even though the algorithm may lie to them. From my point of view, as your application has become quite popular, people don't care much about the exact grading algorithm; they want to feel their significance to society, which is given to them here. I feel that your "encouraging" algorithm can do more for the open-source community than my "strict mathematical" approach. It is not about math and programming but about psychology. Anyway, I will keep my pull request open in case you want to change the calculation algorithm for something better, and also as a reference for those who are curious how to approach this kind of ranking task. |
Just wanted to applaud @francois-rozet and @CrafterKolyan for their great take on this. 👏👏👏👏👏 I personally like the idea of starting from 0 and getting to the Moon, it gives me a much greater sense of accomplishment. 🚀 🌔 But I don't judge those who get a boost in confidence by starting with half a circle and an A+. Some days I feel like I need these... Really happy with it as it is @anuraghazra, you've done an amazing work! |
This pull request is being automatically deployed with Vercel (learn more). 🔍 Inspect: https://vercel.com/github-readme-stats-team/github-readme-stats/92MYUQNT1iBguNAJw3JbXddgb4kz |
Okay, I was just testing this out, planning to sort this ranking thing this week. I will consider both of the PRs and release it under experimental flags. @CrafterKolyan, I found this though: how is this username getting S rank? (username=aju100) With @francois-rozet's PR #1186 it is rank "A", which seems more correct. |
@anuraghazra It's because the user (aju100) has an outstanding number of repositories and, as mentioned in #960 (comment) a single good rank among the metrics leads to a good overall rank in #960, but not in #1186. Thank you for taking the time to sort this out! |
Ahh, I see. aju100 has only 100 repos, which is maybe not that much, but anyway it should not be S rank. |
It is not that much, but much more than the average user. I should mention that the number of repos is not taken into account in #1186 (otherwise Linus Torvalds would not be S+). |
First of all, @francois-rozet and @CrafterKolyan, thanks a lot for addressing this topic. Here are my two cents. Overall, I think @francois-rozet's algorithm is better balanced. I agree with @francois-rozet that @CrafterKolyan's algorithm creates an incorrect score when somebody has a lot of followers, but I also see one shortcoming in @francois-rozet's implementation: the current version does not take the number of contributions into account. I understand why |
Hello @rickstaa, the reason why I don't consider contributions is that they are redundant with PRs, issues and commits. Since I take the latter into account, I don't need the former. |
@francois-rozet Good point, you are right. I overlooked that fact while quickly scanning your code to answer #1425. In that case, I think we should go with @francois-rozet's algorithm. |
To be honest, I don't see the problem with many followers and 0 stars/commits/etc. The followers count is the hardest statistic to manipulate, as GitHub has some system to prevent multi-accounting. |
IMHO, the rank should measure your stats as a developer, not as an "influencer". Having tons of followers does not make you a good developer, GitHub is not Twitter or Instagram... Also, the problem does not arise only with followers. Someone with a very large number of empty repos but nothing else still gets S+. Same for commits, issues, ... you get it. Anyway, I would rather have your version of the rank than the one currently implemented, but @anuraghazra seemingly abandoned the idea... |
👋 is this still coming @anuraghazra ? 🙂 |
Probably a lot of people stuck at A+ like me. |
In my mind, this solution to the ranking system is good enough. @anuraghazra Please look here.
My preference is with #1186. |
86aafe8 to 8bc69e7
Hey @rickstaa @anuraghazra will this ranking system be adopted after all? |
I am in favour of merging #1186 since it is more balanced (see #1186 (comment)). I, however, would like to have @anuraghazra's opinion before making such a breaking change. |
Closing, in favour of #1186. |
The use of a normal distribution is not justified at all, and since its support is `(-inf; inf)` you get some problems (e.g. #883, #455). The exponential distribution's support is `[0; inf)`, which means that if we calculate the survival function (the same as `1 - cdf`, see https://en.wikipedia.org/wiki/Survival_function#Definition), then a person with all zeros gets a score equal to 100, which makes a lot of sense. In practice, the exponential distribution is also quite accurate at describing a person's activity. See the examples below.

[Figure: the real activity distribution histogram, taken from https://movespring.com/blog/how-to-set-a-goal-for-your-next-activity-or-step-challenge-5f74c65ac49982000764facf]

[Figure: the exponential distribution histogram with different parameters]

[Figure: one more example with a real distribution (blue) and a fitted exponential distribution (red)]
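As a small illustration of the survival function mentioned above, the sketch below evaluates `1 - cdf` for an exponential distribution. The function name and the rate `0.01` are assumptions made for this example and are not taken from the PR's code.

```python
import math


def exponential_survival(x: float, rate: float) -> float:
    """Survival function S(x) = 1 - CDF(x) = exp(-rate * x) for x >= 0."""
    if x < 0:
        raise ValueError("the exponential distribution has support [0; inf)")
    return math.exp(-rate * x)


# Zero activity gives a survival value of 1.0, i.e. 100 when read as a percentage.
print(exponential_survival(0, rate=0.01))
# A value equal to the mean (1 / rate = 100) gives exp(-1).
print(exponential_survival(100, rate=0.01))
```

Negative inputs are rejected because they lie outside the support, which is the property that a normal distribution lacks.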
The next step is to recover the parameters of the exponential distribution from the real distribution. In my opinion the Method of Moments (https://en.wikipedia.org/wiki/Method_of_moments_(statistics)) is the easiest to understand, and it comes from a single property we would like to have: the expectation of our parameterized distribution should be equal to the expectation of the real distribution (see the `*_VALUE` variables in the code). As the expectation of an exponential distribution with parameter `\lambda` is equal to `1 / \lambda`, if we have the expectation of the real distribution (which is equal to the average over users), then the recovered parameter is `\lambda = 1 / expectation`.

Now we have 7 distributions over different aspects of a GitHub profile: Commits, Contributions, Issues, PRs, Stars, Followers, Repositories. We can understand how "good" a GitHub profile is in each aspect by evaluating the survival function of each of these 7 distributions at the point corresponding to the profile's stats (the lower the value, the better). To get a single number from 7 numbers we could, for example, take the average of these numbers, but that wouldn't be good: a person who is great in one aspect and bad in others (e.g. Linus Torvalds, with only 2 repositories, low stats in PRs and Issues, and a ton of stars and followers) would never get an S+ rank. So we need an aggregate function that rewards a low value in at least one aspect. One such function is `min(...)` over the 7 aspects, but this doesn't encourage you to develop any aspect except the one you are best in, so we will use the harmonic mean (https://en.wikipedia.org/wiki/Harmonic_mean), which fits our needs (it behaves almost like `min(...)` for values much less than 1 and also has a non-zero gradient with respect to each variable).

As you can see from tests:
IMPORTANT NOTICE
The current values are not set to be equal to `1 / <average stat over users>`, as I couldn't find any official (or even unofficial) statistics on these. So they are just set to what I see as an average GitHub user.
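If per-user statistics were available, the fitting and aggregation steps described above could be sketched roughly as follows. The sample of star counts is invented for illustration, and the helper names are hypothetical rather than the ones used in this PR.

```python
import math
from statistics import fmean


def fit_rate(samples):
    """Method of moments: match the sample mean to E[X] = 1 / rate."""
    return 1.0 / fmean(samples)


def survival(x, rate):
    """Exponential survival function, exp(-rate * x)."""
    return math.exp(-rate * x)


def harmonic_mean(values):
    """Harmonic mean; returns 0.0 if any value has underflowed to 0."""
    values = list(values)
    if min(values) == 0.0:
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


# Invented star counts for five users; their average is 20.
star_samples = [0, 5, 10, 25, 60]
stars_rate = fit_rate(star_samples)
print(stars_rate)                # 1 / 20
print(survival(20, stars_rate))  # exp(-1) for a user exactly at the average

# One very small value dominates the harmonic mean.
print(harmonic_mean([0.001, 1.0, 1.0]))
```

For `n` positive values the harmonic mean always lies between `min(...)` and `n * min(...)`, which is why a single excellent aspect is enough to pull the aggregate down, while every aspect still contributes to the result.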