Releases: Sefaria/LLM
v1.3.2
app-v1.3.2 (2024-07-18)
v1.3.1
v1.3.0
app-v1.3.0 (2024-07-18)
Features
- add Hebrew sentencizer (6c63f70)
- basic style_guide.py (b870f91)
- generalize the clusterer so it can take any algorithm whose number of clusters (n) needs to be optimized (d049a2c)
- integrate style guide into prompt generator (f28ed79)
- style guide works. yay. (49fbd31)
- style guide works. yay. (6a5aa7a)
Bug Fixes
- update code to latest version of openai (13b6833)
v1.2.5
app-v1.2.5 (2024-06-20)
Bug Fixes
chart-1.1.8
chart-v1.1.8 (2024-06-20)
v1.2.4
app-v1.2.4 (2024-06-17)
v1.2.3
app-v1.2.3 (2024-06-17)
v1.2.2
app-v1.2.2 (2024-06-17)
v1.2.1
app-v1.2.1 (2024-06-17)
Bug Fixes
- remove dependency on sefaria project from uniqueness_of_source.py (67c64c1)
v1.2.0
app-v1.2.0 (2024-06-17)
Features
- improve summary so it can potentially know if text isn't relevant to topic (47a5959)
- add basic metric.py file which can determine questions answered by a given source (b3dd118)
- add cluster caching (eeef931)
- add curated_topic.py to llm interface (bba5c9f)
- add dogs (0981a29)
- add embedding distance function (979c259)
- add embeddings model and cache for it (d067091)
- add file to create good and bad curation datasets (ce0ce92)
- add first version of summarize and embed algo (4cf63bc)
- add function to try to derive useful context for a source (4a7890e)
- add get_by_xml_list function (a539ecd)
- add input files for source curation (d1e8d6a)
- add metric to find curated topics that are good based on how distinct the sources are (68233c0)
- add script to translate a bunch of stuff (c8536dd)
- add sqlite caching for calls to basic langchain (8f61dca)
- allow translating a specific version (d206497)
- also print text when translation fails (3bf9ccf)
- differentiate titles that are repetitious (b3cf36c)
- export topic pages (346e9fa)
- export topic pages (12c4bd9)
- failed attempt to calculate pair-wise difference between embeddings (13bc16d)
- finalize title deduplication prompt (d868ec4)
- gather sources pipeline basically working. (31fa7c9)
- Guide: Question Generator for Learning Guide (064e219)
- improve clustering by using affinity propagation to cluster noise and break up large clusters, then use cluster summary cosine distance to merge very similar clusters (4893fd2)
- improve description writing (9317de8)
- improve export of good bad datasets (3d70721)
- improve importing and instantiating CurateTopic (bfce0bf)
- improve random seed setting. improve summary so it can potentially know if text isn't relevant to topic (33a0535)
- introduce pipeline arch for gathering sources (c94236b)
- move core logic of clustering to cluster.py (cf41ee8)
- optimize threshold for merging similar cluster summaries. control verbosity. increase affinity propagation iterations, although I don't have data to indicate this helps (3beebc5)
- push question extractor (ad60a6f)
- refactor and improve clustering optimization (3957aeb)
- save output to file (4a1673a)
- switch to basic langchain impl of voyage ai to use caching (af6e6e6)
- temp solution to avoid generating descriptions for certain topics (55ff05f)
- wait 10 min on rate limit error (8a947c1)
- WIP summarize questions (874e66f)
- write get_or_generate_topic_description() (354e550)
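Several of the entries above (the embedding distance function, and merging very similar clusters by the cosine distance between their summaries) describe a technique that is easy to illustrate. The following is only a minimal sketch under stated assumptions, not the code in this repository: the function names `cosine_distance` and `merge_similar_clusters`, the greedy single-pass grouping, and the example threshold are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def merge_similar_clusters(
    summary_embeddings: Sequence[Sequence[float]], threshold: float
) -> List[List[int]]:
    """Greedily group cluster indices whose summary embeddings are close.

    Hypothetical helper: each cluster is compared with the first member of
    every existing group and joins the first group within `threshold`.
    """
    groups: List[List[int]] = []
    for index, embedding in enumerate(summary_embeddings):
        for group in groups:
            representative = summary_embeddings[group[0]]
            if cosine_distance(representative, embedding) <= threshold:
                group.append(index)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([index])
    return groups


if __name__ == "__main__":
    # Toy 2-D "summary embeddings"; the first two point in nearly the same direction.
    embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
    print(merge_similar_clusters(embeddings, threshold=0.05))
```

The threshold here is arbitrary; the changelog only says that the merge threshold was later optimized, not what value was chosen.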
Bug Fixes
- add more slugs to blacklist (9fffef9)
- bugs in new generalized cluster.py (9e2c51b)
- check that match is not None (82a178c)
- don't calc stdev if other clusters are <= 1 (68e4729)
- don't cluster noise if there is none! (ce82db4)
- don't forget to strip output before checking word count (244401e)
- fix imports (a833c8a)
- installation command of LLM interface package (0903894)
- move back to white list for generating topic descs (1e8d36f)
- only output generated source first time it's generated (af86a6b)
- pass verbose down (d98ffbc)
- summarize clusters in parallel (5a07f6f)
- undo ability for uniqueness of source to say if topic doesn't apply to text (e2f94e8)
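The standard deviation fix listed above reflects a common guard: a sample standard deviation needs at least two data points. As a rough sketch only (the helper name `safe_stdev` and the choice to return `None` are assumptions, not taken from the repository), the check could look like this:

```python
import statistics
from typing import Optional, Sequence


def safe_stdev(values: Sequence[float]) -> Optional[float]:
    """Return the sample standard deviation, or None when it is undefined.

    statistics.stdev raises StatisticsError for fewer than two data points,
    so the length is checked first (hypothetical helper for illustration).
    """
    if len(values) <= 1:
        return None
    return statistics.stdev(values)


if __name__ == "__main__":
    print(safe_stdev([]))          # no other clusters
    print(safe_stdev([3.0]))       # a single other cluster
    print(safe_stdev([2.0, 4.0]))  # enough data to compute a value
```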