duplicate-question

Machine Learning (10701) Project at Carnegie Mellon University: Identifying Duplicate Questions

Background and summary: Quora published this dataset to support work on identifying duplicate questions, which makes it easier to find the answers to a given question. As a simple example, the queries “What is the most populous state in the USA?” and “Which state in the United States has the most people?” should not exist separately on Quora because the intent behind both is identical. Having a canonical page for each logically distinct query makes knowledge-sharing more efficient, because knowledge seekers can find all the answers to a question in a single location.

Goal: Given a sentence pair, identify if the sentences are semantically equivalent - that is, if the sentences are duplicates.
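
To make the input and output of this task concrete, here is a minimal sketch of a naive word-overlap baseline. It is not the method used in this project; the function names `tokens`, `jaccard`, and `is_duplicate_baseline` and the `threshold` value are assumptions chosen only for illustration.

```python
import re

def tokens(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(q1, q2):
    """Jaccard similarity between the token sets of two questions."""
    t1, t2 = tokens(q1), tokens(q2)
    if not t1 and not t2:
        return 1.0  # two empty questions are treated as identical
    return len(t1 & t2) / len(t1 | t2)

def is_duplicate_baseline(q1, q2, threshold=0.5):
    """Hypothetical baseline: call a pair duplicate if word overlap is high.

    The threshold is an illustrative assumption, not a tuned value.
    """
    return jaccard(q1, q2) >= threshold

q1 = "What is the most populous state in the USA?"
q2 = "Which state in the United States has the most people?"
score = jaccard(q1, q2)
prediction = is_duplicate_baseline(q1, q2)
```

On the example pair from the summary, the two questions share only 4 of 13 distinct words, so this baseline scores them at about 0.31 and would label them as non-duplicates. That is the kind of paraphrase a purely lexical measure misses, and it is why the goal is stated in terms of semantic equivalence.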

Input data: Over 400,000 lines of sentence pairs:

qid1, qid2: ID of question 1, 2
question1, question2: Text of each question
is_duplicate: Binary true/false label indicating if the line is a duplicate pair
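
The fields above could be read into typed records along these lines. This is a sketch under stated assumptions: the `QuestionPair` class, the `read_pairs` helper, and the sample rows are hypothetical, and the label is assumed to be stored as `1` or `0` in the file.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class QuestionPair:
    qid1: int
    qid2: int
    question1: str
    question2: str
    is_duplicate: bool

# Hypothetical sample rows that follow the column layout described above.
SAMPLE_CSV = """qid1,qid2,question1,question2,is_duplicate
1,2,What is the most populous state in the USA?,Which state in the United States has the most people?,1
3,4,How do I learn Python?,What is the capital of France?,0
"""

def read_pairs(handle):
    """Parse question pairs from an open CSV file or file-like object."""
    pairs = []
    for row in csv.DictReader(handle):
        pairs.append(
            QuestionPair(
                qid1=int(row["qid1"]),
                qid2=int(row["qid2"]),
                question1=row["question1"],
                question2=row["question2"],
                # Assumption: the label is encoded as "1" (duplicate) or "0".
                is_duplicate=row["is_duplicate"].strip() == "1",
            )
        )
    return pairs

pairs = read_pairs(io.StringIO(SAMPLE_CSV))
```

The same helper should accept a real file handle, for example one opened with `open("questions.csv", newline="", encoding="utf-8")`, provided the file uses these column names.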

