
Update to v0.2: a more flexible framework for compressing the index of any dense retrieval model #6

Merged (3 commits, Jun 2, 2022)

Conversation

jingtaozhan
Owner

This is a major code update; the previous code is deprecated. Key features:

  • Flexible code framework. The previous code required a separate preprocessing step, which has been removed: tokenization now happens during training and inference. JPQ and RepCONC no longer depend on a specific dense retrieval model structure; they are two training procedures that take a dense retrieval model as input, so dense models of different architectures can all be used with JPQ and RepCONC.
  • Distributed training support for RepCONC.
  • Large-batch training for RepCONC via GradCache.
  • Examples of converting dense retrieval models into memory-efficient ones have been added, with more to come.
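The repository's actual training API isn't shown in this PR description, but the core compression idea behind JPQ and RepCONC, replacing each dense embedding with a few small product-quantization codes, can be sketched in plain NumPy. All function names and hyperparameters below are illustrative, not the repo's API, and the real methods train the codebooks jointly with the retriever rather than post hoc:

```python
import numpy as np

def train_pq(embeddings, num_subvectors=4, num_centroids=16, iters=10, seed=0):
    # Split each embedding into subvectors and learn a small k-means
    # codebook per subvector (post-hoc PQ; JPQ/RepCONC instead learn
    # the codebooks jointly with the dense retrieval model).
    rng = np.random.default_rng(seed)
    n, d = embeddings.shape
    sub_dim = d // num_subvectors
    codebooks = []
    for m in range(num_subvectors):
        sub = embeddings[:, m * sub_dim:(m + 1) * sub_dim]
        # Initialize centroids from random data points, then run Lloyd's steps.
        centroids = sub[rng.choice(n, num_centroids, replace=False)]
        for _ in range(iters):
            dists = ((sub[:, None, :] - centroids[None]) ** 2).sum(-1)
            assign = dists.argmin(1)
            for c in range(num_centroids):
                mask = assign == c
                if mask.any():
                    centroids[c] = sub[mask].mean(0)
        codebooks.append(centroids)
    return np.stack(codebooks)  # shape: (num_subvectors, num_centroids, sub_dim)

def encode(embeddings, codebooks):
    # Replace each subvector with the id of its nearest centroid, so
    # d floats shrink to num_subvectors one-byte codes per document.
    num_subvectors, num_centroids, sub_dim = codebooks.shape
    codes = np.empty((embeddings.shape[0], num_subvectors), dtype=np.uint8)
    for m in range(num_subvectors):
        sub = embeddings[:, m * sub_dim:(m + 1) * sub_dim]
        dists = ((sub[:, None, :] - codebooks[m][None]) ** 2).sum(-1)
        codes[:, m] = dists.argmin(1)
    return codes

def decode(codes, codebooks):
    # Reconstruct approximate embeddings by concatenating the looked-up centroids.
    return np.concatenate(
        [codebooks[m][codes[:, m]] for m in range(codebooks.shape[0])], axis=1
    )
```

For example, compressing 256 hypothetical 32-dim document embeddings with 4 subvectors and 16 centroids stores 4 bytes per document instead of 128, at the cost of some reconstruction error; the "model-agnostic" point of this PR is that the embeddings can come from any dense encoder.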
