Releases · AnswerDotAI/RAGatouille
0.0.9
What's Changed
- fix: fix inversion method & pytorch Kmeans OOM by @bclavie in #179
- performance: Optimize ColBERT index-free search with torch.topk by @Diegi97 in #219 (a sketch follows this list)
- Calculate `pid_docid_map.values()` only once in `add_to_index` by @vishalbakshi in #267
- Fix/185 return trainer best checkpoint path by @GeraudBourdin in #265
- Finally remove the dependency hell by fully getting rid of poetry. Stay tuned for more updates!
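As a rough illustration of the `torch.topk` optimization in #219 (not the library's exact code): selecting the `k` best candidates directly is cheaper than fully sorting every score.

```python
import torch

# Toy example: one relevance score per candidate document.
scores = torch.randn(100_000)
k = 10

# torch.topk retrieves only the k largest scores and their positions,
# avoiding the full O(n log n) sort that torch.sort would perform.
top_scores, top_indices = torch.topk(scores, k)
print(top_indices.tolist())  # positions of the 10 highest-scoring candidates
```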
New Contributors
- @vishalbakshi made their first contribution in #267
- @GeraudBourdin made their first contribution in #265
Full Changelog: 0.0.8...0.0.9
0.0.8post1
Minor fix: Corrects a `from time import time` import introduced in the indexing overhaul, which caused crashes because `time` was subsequently used improperly.
0.0.8
0.0.8 is finally here!
Major changes:
- Indexing overhaul contributed by @jlscheerer #158
- Relaxed dependencies to ensure lower install load #173
- Indexing for under 100k documents will, by default, no longer use Faiss, performing k-means in pure PyTorch instead (a sketch follows this list). This is a somewhat experimental change, but benchmark results are encouraging and it greatly improves compatibility. #173
- CRUD improvements by @anirudhdharmarajan. Feature is still experimental/not fully supported, but rapidly improving!
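A minimal sketch of k-means in pure PyTorch, assuming standard Lloyd iterations; this is illustrative only, not RAGatouille's exact implementation.

```python
import torch

def kmeans_pytorch(x: torch.Tensor, n_clusters: int, n_iters: int = 20) -> torch.Tensor:
    # Initialize centroids from randomly chosen points.
    centroids = x[torch.randperm(x.shape[0])[:n_clusters]].clone()
    for _ in range(n_iters):
        # Assign every point to its nearest centroid.
        assignments = torch.cdist(x, centroids).argmin(dim=1)
        # Recompute each centroid as the mean of its members,
        # leaving empty clusters where they are.
        for c in range(n_clusters):
            members = x[assignments == c]
            if len(members) > 0:
                centroids[c] = members.mean(dim=0)
    return centroids

embeddings = torch.randn(10_000, 128)  # toy token embeddings
centroids = kmeans_pytorch(embeddings, n_clusters=256)
```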
Fixes:
- Many small bug fixes, mainly around typing
- Training triplets improvement (already present in 0.0.7 post versions) by @JoshuaPurtell
0.0.7post3
0.0.7post2
Fixes & tweaks to the previous release:
- Automatically adjust batch size for longer contexts (32 for 512 tokens, 16 for 1024, 8 for 2048, halving like this down to a minimum of 1; sketched after this list)
- Apply dynamic max context length to reranking
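The schedule above amounts to halving the batch size each time the maximum context length doubles, with a floor of 1. A minimal sketch of that rule (illustrative, not the library's exact code):

```python
def adjust_batch_size(max_tokens: int, base_batch: int = 32, base_tokens: int = 512) -> int:
    # 32 for 512 tokens, 16 for 1024, 8 for 2048, ... down to a minimum of 1.
    batch, tokens = base_batch, base_tokens
    while tokens < max_tokens and batch > 1:
        tokens *= 2
        batch //= 2
    return max(batch, 1)

assert adjust_batch_size(512) == 32
assert adjust_batch_size(1024) == 16
assert adjust_batch_size(2048) == 8
```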
0.0.7post1
Release focusing on length adjustments. Much more dynamism and on-the-fly adaptation, both for query length and maximum document length!
- Remove the hardcoded maximum length: it is now inferred from your base model's maximum position embeddings (a sketch follows this list). This enables support for longer-context ColBERT models, such as Jina ColBERT.
- Upstream changes to `colbert-ai` to allow any base model to be used, rather than only pre-defined ones.
- Query length now adjusts dynamically, from 32 (the hardcoded minimum) up to your model's maximum context window for longer queries.
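A sketch of how the maximum length can be read from the base model's configuration instead of being hardcoded, using `transformers.AutoConfig`; the model name is just an example and this is not RAGatouille's exact code.

```python
from transformers import AutoConfig

def infer_max_length(model_name: str, default: int = 512) -> int:
    # Read the maximum position embeddings from the model's config
    # instead of assuming the usual 512-token BERT limit.
    config = AutoConfig.from_pretrained(model_name)
    return getattr(config, "max_position_embeddings", default)

print(infer_max_length("colbert-ir/colbertv2.0"))  # 512 for a BERT-based model
```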
0.0.6c2
(Notes covering changes in the last few PyPI releases, which were undocumented until now.)
Changes:
- Query only a subset of documents based on doc IDs by @PrimoUomo89 #94 (usage example after this list)
- Return chunk IDs in results thanks to @PrimoUomo89 #125
- Lower the number of k-means iterations when running more is unnecessary #129
- Properly license the library as Apache-2.0 on PyPI
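A hedged usage example for the subset search from #94, assuming a `doc_ids` keyword on `RAGPretrainedModel.search` and a hypothetical index path; check your installed version's signature.

```python
from ragatouille import RAGPretrainedModel

# Hypothetical index path; point this at one of your own indexes.
RAG = RAGPretrainedModel.from_index(".ragatouille/colbert/indexes/my_index")

# Restrict retrieval to a subset of the indexed documents by document ID
# (parameter name assumed from this release's notes).
results = RAG.search("What is ColBERT?", k=5, doc_ids=["doc_0", "doc_3"])
```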
Fixes:
- Dynamically increase search hyperparameters for large `k` values and lower document counts, reducing the number of situations where the total number of documents returned is substantially below `k` #131 (a sketch follows this list)
- Fix to enable training data processing with hard negatives turned off by @corrius #117
- Proper handling of different input types when pre-processing training triplets by @GautamR-Samagra #115
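A minimal sketch of the idea behind #131, with purely illustrative threshold values: widen the ColBERT search hyperparameters (`ncells`, `centroid_score_threshold`, `ndocs`) as `k` grows, so that enough candidates survive to fill the result list.

```python
def scale_search_params(k: int) -> dict:
    # Illustrative values only; the actual thresholds used in #131 may differ.
    if k <= 10:
        return {"ncells": 1, "centroid_score_threshold": 0.5, "ndocs": 256}
    if k <= 100:
        return {"ncells": 2, "centroid_score_threshold": 0.45, "ndocs": 1024}
    return {"ncells": 4, "centroid_score_threshold": 0.4, "ndocs": max(4 * k, 4096)}

print(scale_search_params(500))  # larger k -> wider search, more candidate docs
```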