quora-duplicate-questions
menshikh-iv released this 14 Nov 05:08 · 16 commits to master since this release
Over 400,000 lines of potential duplicate question pairs. Each line contains the IDs of both questions in the pair, the full text of each question, and a binary value that indicates whether the pair is a duplicate.
attribute | value |
---|---|
File size | 21MB |
Number of pairs | 404290 |
License | probably https://www.quora.com/about/tos |
Read more:
Example
import json

import gensim.downloader as api

# Download (on first use) and open the dataset as an iterable of records.
data = api.load("quora-duplicate-questions")

# Print only the first question pair, pretty-printed as JSON.
for question_pair in data:
    print(json.dumps(question_pair, indent=4))
    break

"""
Output:
{
    "qid1": "1",
    "question2": "What is the step by step guide to invest in share market?",
    "qid2": "2",
    "is_duplicate": "0",
    "question1": "What is the step by step guide to invest in share market in india?",
    "id": "0"
}
"""
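Since every record carries a binary `is_duplicate` value, a natural next step is to count how many pairs are marked as duplicates. The sketch below is only an illustration: the helper name `count_duplicates` is hypothetical (not part of gensim), and it assumes each record is a dict whose `is_duplicate` field is the string `"0"` or `"1"`, as in the sample output above. The second record in `sample` is made up for the demonstration and is not taken from the dataset.

```python
from typing import Dict, Iterable, Tuple


def count_duplicates(pairs: Iterable[Dict[str, str]]) -> Tuple[int, int]:
    """Return (number of duplicate pairs, total number of pairs).

    Hypothetical helper, not part of gensim.
    """
    duplicates = 0
    total = 0
    for pair in pairs:
        total += 1
        # Assumption: "is_duplicate" is the string "0" or "1", as in the
        # sample output, so compare against a string, not an integer.
        if pair["is_duplicate"] == "1":
            duplicates += 1
    return duplicates, total


# Hand-written records in the shape shown in the sample output.
# The first mirrors the sample; the second is invented for illustration.
sample = [
    {
        "id": "0",
        "qid1": "1",
        "qid2": "2",
        "question1": "What is the step by step guide to invest in share market in india?",
        "question2": "What is the step by step guide to invest in share market?",
        "is_duplicate": "0",
    },
    {
        "id": "1",
        "qid1": "3",
        "qid2": "4",
        "question1": "How do I learn Python?",
        "question2": "What is the best way to learn Python?",
        "is_duplicate": "1",
    },
]

print(count_duplicates(sample))
```

To run this over the real data, pass the object returned by `api.load("quora-duplicate-questions")` in place of `sample`; the function only needs something it can iterate over once.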