
[discussion]Is there any strategy to sample the github actors for Github bot detection? #35

Open
bifenglin opened this issue Nov 8, 2022 · 1 comment


@bifenglin
Collaborator

I have loaded the top 500,000 active repos and collected the actors who participate in them. There is still a lot of data, so we need a strategy for sampling actors that could serve as ground-truth data for GitHub bot detection.

Type              Count
Bot               1,112
Organization    250,183
User          9,509,701
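To make the class imbalance concrete, here is a small sketch that turns the counts into each type's share of all actors. The dictionary simply restates the table; `type_shares` is a hypothetical helper for illustration, not part of the project.

```python
# Actor counts by account type, restated from the table above.
actor_counts = {"Bot": 1_112, "Organization": 250_183, "User": 9_509_701}


def type_shares(counts: dict[str, int]) -> dict[str, float]:
    """Hypothetical helper: each type's share of all actors, as a fraction in [0, 1]."""
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}


for kind, share in type_shares(actor_counts).items():
    # Print each share as a percentage with four decimal places.
    print(f"{kind:<12} {share:.4%}")
```

Accounts typed as Bot make up only about 0.01% of the actors, which is why a naive uniform sample would contain almost no bots.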
@bifenglin bifenglin changed the title [discussion]Is there any strategy to sample the github actors? [discussion]Is there any strategy to sample the github actors for Github bot detection? Nov 8, 2022
@bifenglin
Collaborator Author

Our strategy is to rank actors by their event count and select the top 10,000, since these actors have enough event data to determine whether each one is a bot or a human user.
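A minimal sketch of this top-k selection, assuming the events have already been reduced to one actor login per event. The input format and the function name `top_actors_by_event_count` are hypothetical, not taken from the project.

```python
from collections import Counter
from typing import Iterable


def top_actors_by_event_count(
    actor_logins: Iterable[str], k: int = 10_000
) -> list[tuple[str, int]]:
    """Rank actors by number of events and keep the k most active.

    `actor_logins` is a hypothetical stream with one actor login per event.
    """
    counts = Counter(actor_logins)
    # most_common returns (login, count) pairs sorted by count, descending;
    # actors with equal counts keep the order in which they were first seen.
    return counts.most_common(k)
```

For example, `top_actors_by_event_count(["alice", "dependabot[bot]", "alice", "bob", "dependabot[bot]", "dependabot[bot]"], k=2)` returns `[("dependabot[bot]", 3), ("alice", 2)]`. One trade-off of ranking purely by activity is that the sample skews toward very active accounts, so low-volume bots are likely to be under-represented.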
