
Releases: AnswerDotAI/RAGatouille

0.0.9

11 Feb 04:31

What's Changed

  • fix: fix inversion method & pytorch Kmeans OOM by @bclavie in #179
  • performance: Optimize ColBERT index free search with torch.topk by @Diegi97 in #219
  • Calculate pid_docid_map.values() only once in add_to_index by @vishalbakshi in #267
  • Fix/185 return trainer best checkpoint path by @GeraudBourdin in #265
  • Finally removed the dependency hell by fully getting rid of Poetry. Stay tuned for more updates!
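The `torch.topk` optimization from #219 replaces a full sort of every document score with a partial top-k selection. A minimal sketch of the idea (a hypothetical helper, not RAGatouille's actual code):

```python
import torch

def top_k_docs(scores: torch.Tensor, k: int):
    """Select the k best-scoring documents without sorting every score.

    torch.topk is roughly O(n log k) rather than the O(n log n) of a full
    sort, which matters when scoring a whole index-free collection.
    """
    k = min(k, scores.numel())
    # Returns (values, indices), with values sorted in descending order.
    return torch.topk(scores, k)

# Illustrative usage on dummy scores (not real ColBERT maxsim scores):
values, indices = top_k_docs(torch.tensor([0.1, 0.9, 0.4, 0.7]), 2)
```
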

New Contributors

Full Changelog: 0.0.8...0.0.9

0.0.8post1

19 Mar 13:45

Minor fix: corrects the `from time import time` import introduced in the indexing overhaul, which caused crashes because `time` was then used improperly.

0.0.8

18 Mar 19:49
d27b693

0.0.8 is finally here!

Major changes:

  • Indexing overhaul contributed by @jlscheerer #158
  • Relaxed dependencies to lighten the install footprint #173
  • Indexing for under 100k documents will by default no longer use Faiss, performing K-Means in pure PyTorch instead. This is a somewhat experimental change, but benchmark results are encouraging and it greatly increases compatibility. #173
  • CRUD improvements by @anirudhdharmarajan. Feature is still experimental/not fully supported, but rapidly improving!
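The pure-PyTorch K-Means mentioned above can be pictured as plain Lloyd's algorithm on the embedding tensor. A minimal sketch under that assumption (hypothetical helper, not RAGatouille's actual implementation):

```python
import torch

def kmeans(points: torch.Tensor, k: int, iters: int = 10) -> torch.Tensor:
    """Lloyd's K-Means in pure PyTorch: the idea behind skipping Faiss
    for small (<100k document) indexes."""
    # Initialise centroids from a random subset of the points (copy, not a view).
    centroids = points[torch.randperm(points.shape[0])[:k]]
    for _ in range(iters):
        # Assign each point to its nearest centroid by pairwise distance.
        assignments = torch.cdist(points, centroids).argmin(dim=1)
        # Recompute each centroid as the mean of its assigned points.
        for c in range(k):
            mask = assignments == c
            if mask.any():
                centroids[c] = points[mask].mean(dim=0)
    return centroids
```
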

Fixes:

  • Many small bug fixes, mainly around typing
  • Training triplets improvement (already present in 0.0.7 post versions) by @JoshuaPurtell

0.0.7post3

16 Feb 19:30
  • Improvements for data preprocessing issues and fixes for broken training example by @jonppe (#138) 🙏

0.0.7post2

13 Feb 21:45
b7ae28a

Fixes & tweaks to the previous release:

  • Automatically adjust batch size for longer contexts (32 at 512 tokens, 16 at 1024, 8 at 2048, halving as length doubles, down to a minimum of 1)
  • Apply dynamic max context length to reranking
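The batch-size adjustment above amounts to a simple halving schedule. An illustrative sketch (function and parameter names are hypothetical):

```python
def adjusted_batch_size(max_tokens: int, base_batch: int = 32, base_len: int = 512) -> int:
    """Halve the 512-token base batch size each time the context length
    doubles, never going below 1."""
    if max_tokens <= base_len:
        return base_batch
    return max(1, base_batch // (max_tokens // base_len))
```
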

0.0.7post1

13 Feb 20:55

Release focusing on length adjustments. Much more dynamism and on-the-fly adaptation, both for query length and maximum document length!

  • Remove hardcoded maximum length: it is now inferred from your base model's maximum position encodings. This enables support for longer-context ColBERT, such as Jina ColBERT
  • Upstream changes to colbert-ai to allow any base model to be used, rather than pre-defined ones.
  • Query length now adjusts dynamically, from 32 (hardcoded minimum) to your model's maximum context window for longer queries.
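The dynamic query length described above is essentially a clamp between the hardcoded floor of 32 and the model's context window. An illustrative sketch (hypothetical helper, not RAGatouille's exact logic):

```python
def dynamic_query_maxlen(query_token_count: int, model_max_tokens: int) -> int:
    """Grow the query encoder's max length with the actual query, from a
    hardcoded minimum of 32 up to the base model's context window."""
    return max(32, min(query_token_count, model_max_tokens))
```
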

0.0.6c2

11 Feb 21:05
5409914

(notes encompassing changes in the last few PyPI releases that were undocumented until now)

Changes:

  • Query only a subset of documents based on doc ids by @PrimoUomo89 #94
  • Return chunk ids in results thanks to @PrimoUomo89 #125
  • Lower the number of k-means iterations when more are unnecessary #129
  • Properly license the library as Apache-2.0 on PyPI

Fixes:

  • Dynamically increase search hyperparameters for large k values and low doc counts, reducing the number of situations where the total number of documents returned falls substantially below k #131
  • Fix enabling training data processing with hard negatives turned off by @corrius #117
  • Proper handling of different input types when pre-processing training triplets by @GautamR-Samagra #115
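The hyperparameter adjustment in #131 can be pictured as widening the candidate pool when k is large relative to the collection. A sketch of that idea only (names and scaling factors are hypothetical, not the library's actual values):

```python
def adjusted_candidate_pool(k: int, doc_count: int, base_pool: int = 256) -> int:
    """Widen the search stage's candidate pool when k is large, so enough
    documents survive to the final ranking to actually return ~k results."""
    pool = max(base_pool, 4 * k)   # scale the pool with the requested k
    return min(pool, doc_count)    # never exceed the collection size
```
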

0.0.6b5

05 Feb 17:06

Minor fixes & improvements release.

Community contribs:

0.0.6b2

29 Jan 21:04
5dbac07
  • Fix newly introduced dependency issue

0.0.6b0

28 Jan 19:47
  • Fixes sometimes skipped shuffling of training triplets
  • Fixes accidental duplicates when input training data has many more positives than negatives.
  • Bump to colbert-ai 0.2.18, fully removing multiprocessing calls when indexing