quora-duplicate-questions

@menshikh-iv menshikh-iv released this 14 Nov 05:08
· 16 commits to master since this release
19b8560

Over 400,000 potential duplicate question pairs. Each line contains the IDs of both questions in the pair, the full text of each question, and a binary value indicating whether the two questions are duplicates.

| attribute | value |
|---|---|
| File size | 21 MB |
| Number of pairs | 404290 |
| License | probably https://www.quora.com/about/tos |

Example

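import gensim.downloader as api
import json

# Downloads the dataset on first use, then returns an iterable of dicts.
data = api.load("quora-duplicate-questions")

# Print only the first question pair.
for question_pair in data:
    print(json.dumps(question_pair, indent=4))
    break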

"""
Output:

{
    "qid1": "1",
    "question2": "What is the step by step guide to invest in share market?",
    "qid2": "2",
    "is_duplicate": "0",
    "question1": "What is the step by step guide to invest in share market in india?",
    "id": "0"
}
"""
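
Since every value in the record above is a string, filtering on the label means comparing against a string rather than an integer. The following is a minimal sketch of that, not part of the release itself. It assumes duplicates are marked with `"is_duplicate": "1"` (only `"0"` appears in the output above), and the helper name `duplicate_pairs` and the second sample record are hypothetical, made up for illustration. It runs on a small in-memory sample so it works without downloading anything.

```python
from typing import Dict, Iterable, Iterator, Tuple


def duplicate_pairs(records: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (question1, question2) for records labelled as duplicates."""
    for record in records:
        # Assumption: the label is the string "1" for duplicates,
        # mirroring the string "0" seen in the sample output.
        if record["is_duplicate"] == "1":
            yield record["question1"], record["question2"]


# Tiny in-memory stand-in for the real dataset.
sample = [
    {   # the record shown in the output above
        "id": "0", "qid1": "1", "qid2": "2",
        "question1": "What is the step by step guide to invest in share market in india?",
        "question2": "What is the step by step guide to invest in share market?",
        "is_duplicate": "0",
    },
    {   # hypothetical record, invented for illustration only
        "id": "x", "qid1": "a", "qid2": "b",
        "question1": "How do I learn Python?",
        "question2": "What is the best way to learn Python?",
        "is_duplicate": "1",
    },
]

found = list(duplicate_pairs(sample))
print(len(found))
```

With the real data, the same function could be called as `duplicate_pairs(api.load("quora-duplicate-questions"))`, provided the records have the shape shown in the example output.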