Team: 'input.txt' (E-mail: input.txt.2020@gmail.com)
- Joochan Kim (Leader, Code, Idea)
- Dobreva Iva (Code)
- Yujin Kim (Documentation, Presentation)
The idea originated from Prof. Jinyeong Yeo @ Convei Lab, Yonsei Univ.
Assistance was provided by Gayeon Lee @ Convei Lab, Yonsei Univ.
Most NLP datasets are so large that training a model on them costs developers a great deal of time and money. To reduce this burden, we propose a new approach that shrinks the training data, lowering the time and cost required while improving performance.
- CEDR: Contextualized Embeddings for Document Ranking
- TIM_PLUS: Two-phase Influence Maximization
- Robust04: TREC Robust document collection for the retrieval task
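
For intuition, here is a minimal sketch of seed selection via greedy influence maximization under the independent cascade model. This is only a toy stand-in for TIM_PLUS (which is a much faster two-phase algorithm, not a naive greedy loop); the graph format, activation probability, and Monte Carlo settings below are assumptions for illustration only.

```python
import random

def simulate_cascade(graph, seeds, prob=0.1):
    """One Monte Carlo run of the independent cascade model.
    graph: dict mapping node -> list of neighbor nodes (assumed format).
    Returns the number of nodes activated starting from the seed set."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        node = frontier.pop()
        for nbr in graph.get(node, []):
            # Each edge is tried once when its source activates.
            if nbr not in active and random.random() < prob:
                active.add(nbr)
                frontier.append(nbr)
    return len(active)

def greedy_seeds(graph, k, runs=100):
    """Greedily pick k seed nodes with the largest estimated spread."""
    seeds = []
    for _ in range(k):
        best, best_gain = None, -1.0
        for cand in graph:
            if cand in seeds:
                continue
            gain = sum(simulate_cascade(graph, seeds + [cand])
                       for _ in range(runs)) / runs
            if gain > best_gain:
                best, best_gain = cand, gain
        seeds.append(best)
    return seeds
```

In this project, the seed documents chosen this way define the reduced training set.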
How to run:
1. Download the models and the dataset, and unzip them.
2. Run `graph/graph-generator.py` to build the graph (change the data location at line 105 to `/filename.pkl`; check README.md). A hedged sketch of graph building follows this list.
3. Run TIM_PLUS on step 2's result (check the README); the sketch above illustrates the seed-selection idea.
4. Use step 3's result to make `seed.txt` (copy the TIM_PLUS output and paste it into the file).
5. Run `/graph/create-set.py` (needs the data, the `.pkl` file, and `seed.txt`; check README.md). See the subset sketch below.
6. Run `/Robust-Ranker-Master/main.py` (check the README).
7. Compare MAP! ^-^ (A sketch of the MAP computation appears at the end.)
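
For step 2, the actual logic lives in `graph/graph-generator.py`. As a hedged sketch only, one plausible way to build a document graph and save it as a `.pkl` is TF-IDF cosine similarity with a threshold; the vectorizer, the threshold, and the file names here are assumptions, not the repo's code.

```python
import pickle
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_graph(docs, threshold=0.3):
    """Connect documents whose TF-IDF cosine similarity exceeds a threshold."""
    tfidf = TfidfVectorizer().fit_transform(docs)
    sims = cosine_similarity(tfidf)
    graph = {i: [] for i in range(len(docs))}
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if sims[i, j] > threshold:
                graph[i].append(j)
                graph[j].append(i)
    return graph

docs = ["first example document", "second example document", "unrelated text"]
graph = build_graph(docs)
with open("graph.pkl", "wb") as f:  # analogous to the .pkl the scripts expect
    pickle.dump(graph, f)
```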
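For steps 4-5, a minimal sketch of turning the seeds into a training subset, assuming `seed.txt` holds one document ID per line and the dataset pickle maps document IDs to texts (both assumptions; `/graph/create-set.py` and README.md are authoritative).

```python
import pickle

# Read seed document IDs, one per line (format assumed).
with open("seed.txt") as f:
    seed_ids = {line.strip() for line in f if line.strip()}

# Load the full dataset and keep only seed documents (structure assumed).
with open("data.pkl", "rb") as f:
    dataset = pickle.load(f)
subset = {doc_id: text for doc_id, text in dataset.items()
          if doc_id in seed_ids}

with open("subset.pkl", "wb") as f:
    pickle.dump(subset, f)
print(f"kept {len(subset)} of {len(dataset)} documents")
```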
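For step 7, MAP (mean average precision) is the standard metric on Robust04. Tools like trec_eval usually compute it from run files, but the definition itself is short enough to sketch:

```python
def average_precision(ranked_rels):
    """ranked_rels: 0/1 relevance judgments in ranked order.
    Assumes every relevant document appears somewhere in the ranking."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            score += hits / rank
    return score / hits if hits else 0.0

def mean_average_precision(runs):
    """runs: list of per-query relevance lists."""
    return sum(average_precision(r) for r in runs) / len(runs)

# e.g. two queries:
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 1]]))  # ~0.708
```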