Pulls tweets with the same text but different IDs #15
Can you post some examples? Also, please post the code used to generate the tweets.
For example, this query:

`SELECT t1.status, t2.status FROM tweets t1, tweets t2 WHERE t1.tweetid = 849076591657353217 AND t2.tweetid = 849076590088736770`

returns the same status text for both tweet IDs (shown here with the Python 2 escapes decoded):

`RT @Datosdeunamor: MÉDICO JAPONÉS REVELA COMO MATAR DE RAÍZ LA BACTERIA DE HELICOBACTER PYLORI QUE PROVOCA GASTRITIS, ÚLCERAS Y MÁS!… `

(In English: "Japanese doctor reveals how to kill the Helicobacter pylori bacterium that causes gastritis, ulcers and more at the root!")

In my test dataset (364 tweets pulled from Manchester), only 281 are distinct.
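The distinct-versus-total check above can be run directly in SQL. Below is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for the project's database (the issue does not say which engine is used). It assumes only the `tweets` table and the `tweetid`/`status` columns from the query; the helper names (`make_sample_db`, `count_total_and_distinct`, `duplicated_texts`) and the sample rows are hypothetical.

```python
import sqlite3


def make_sample_db():
    """Build a hypothetical in-memory stand-in for the `tweets` table.

    Only the `tweetid` and `status` columns from the query are assumed;
    the rows are made-up sample data, not tweets from the real dataset.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (tweetid INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO tweets (tweetid, status) VALUES (?, ?)",
        [
            (1, "RT @a: hello"),
            (2, "RT @a: hello"),
            (3, "RT @a: hello"),
            (4, "good morning"),
            (5, "good morning"),
            (6, "unique text"),
        ],
    )
    return conn


def count_total_and_distinct(conn):
    """Return (total rows, number of distinct status texts)."""
    return conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT status) FROM tweets"
    ).fetchone()


def duplicated_texts(conn):
    """Return (status, copies) for every text stored under more than one tweetid."""
    return conn.execute(
        "SELECT status, COUNT(*) AS copies FROM tweets "
        "GROUP BY status HAVING COUNT(*) > 1 "
        "ORDER BY copies DESC, status"
    ).fetchall()


conn = make_sample_db()
print(count_total_and_distinct(conn))  # (6, 3)
print(duplicated_texts(conn))
```

On the real table, the first query should report the same kind of gap as 364 versus 281. The second lists which texts are repeated and how often.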
Are these
No, sir, they are not.
I found that the labeler keeps pulling the same tweet because our database contains tweets with identical text but different IDs, sometimes 20 to 30 copies of the same text.
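One way to keep the labeler from seeing the same text repeatedly is to have it read one representative row per distinct `status` rather than reading the raw table. The following is a sketch under stated assumptions, not the project's actual fix. It uses Python's `sqlite3` as a stand-in database and assumes only the `tweets(tweetid, status)` columns from the earlier query. The helpers `make_sample_db` and `representative_tweets` are hypothetical, and the labeler's real code is not shown in the issue.

```python
import sqlite3


def make_sample_db():
    """Hypothetical in-memory `tweets` table containing duplicated status texts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (tweetid INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO tweets (tweetid, status) VALUES (?, ?)",
        [
            (1, "RT @a: hello"),
            (2, "RT @a: hello"),
            (3, "RT @a: hello"),
            (4, "good morning"),
            (5, "good morning"),
            (6, "unique text"),
        ],
    )
    return conn


def representative_tweets(conn):
    """Return one (tweetid, status) per distinct text, keeping the smallest tweetid.

    A labeler reading from this query instead of the raw table would see
    each text once, however many tweetids share it.
    """
    return conn.execute(
        "SELECT MIN(tweetid) AS first_id, status FROM tweets "
        "GROUP BY status ORDER BY first_id"
    ).fetchall()


conn = make_sample_db()
for tweetid, status in representative_tweets(conn):
    print(tweetid, status)
```

Keeping the smallest `tweetid` is an arbitrary choice; any deterministic pick works. Grouping on exact text also misses near-duplicates, such as copies that differ only in whitespace or in where the text was truncated with "…".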