tsv-parser

CSCI 572 Information Retrieval: an extension of Apache Tika that converts TSV files into JSON files and removes near-duplicates.
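
The description above names two steps: TSV-to-JSON conversion and near-duplicate removal. As a rough illustration only, the sketch below shows what those two steps could look like in plain Python. It does not use Tika and is not this project's implementation. The README does not say how near-duplicates are detected, so the word-shingle Jaccard similarity, the 0.8 threshold, the function names, and the sample rows are all assumptions made for the example.

```python
import csv
import io
import json


def tsv_to_records(tsv_text):
    """Parse TSV text (first row = header) into a list of dicts, one per row."""
    reader = csv.DictReader(io.StringIO(tsv_text, newline=""), delimiter="\t")
    return [dict(row) for row in reader]


def shingles(record, k=3):
    """Return the set of k-word shingles over the record's concatenated values."""
    words = " ".join(str(v) for v in record.values()).lower().split()
    if len(words) < k:
        # Too short to shingle: treat the whole text as a single shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(records, threshold=0.8):
    """Keep a record only if it is below the similarity threshold
    against every record kept so far (assumed strategy, O(n^2))."""
    kept, kept_shingles = [], []
    for record in records:
        s = shingles(record)
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(record)
            kept_shingles.append(s)
    return kept


# Hypothetical sample input; the column names are made up for illustration.
SAMPLE_TSV = (
    "title\tbody\n"
    "Tika parser notes\tconvert tsv files into json documents\n"
    "Tika parser notes\tconvert tsv files into json documents quickly\n"
    "Crawler setup\tconfigure the fetcher and seed list\n"
)

records = tsv_to_records(SAMPLE_TSV)
unique = drop_near_duplicates(records)
# One JSON document per surviving row.
json_docs = [json.dumps(r, sort_keys=True) for r in unique]

if __name__ == "__main__":
    for doc in json_docs:
        print(doc)
```

In this sample the second row differs from the first by one trailing word, so it is treated as a near-duplicate and dropped, leaving two JSON documents. The pairwise comparison is quadratic in the number of rows; for large inputs a fingerprinting scheme such as MinHash or SimHash would be the usual substitute.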