v1.2.0

@github-actions github-actions released this 17 Jun 07:03
· 56 commits to main since this release

app-v1.2.0 (2024-06-17)

Features

  • improve summary so it can potentially know if text isn't relevant to topic (47a5959)
  • add basic metric.py file which can determine questions answered by a given source (b3dd118)
  • add cluster caching (eeef931)
  • add curated_topic.py to llm interface (bba5c9f)
  • add dogs (0981a29)
  • add embedding distance function (979c259)
  • add embeddings model and cache for it (d067091)
  • add file to create good and bad curation datasets (ce0ce92)
  • add first version of summarize and embed algo (4cf63bc)
  • add function to try to derive useful context for a source (4a7890e)
  • add get_by_xml_list function (a539ecd)
  • add input files for source curation (d1e8d6a)
  • add metric to find curated topics that are good based on how distinct the sources are (68233c0)
  • add script to translate a bunch of stuff (c8536dd)
  • add sqlite caching for calls to basic langchain (8f61dca)
  • allow translating a specific version (d206497)
  • also print text when translation fails (3bf9ccf)
  • differentiate titles that are repetitious (b3cf36c)
  • export topic pages (346e9fa)
  • export topic pages (12c4bd9)
  • failed attempt to calculate pairwise difference between embeddings (13bc16d)
  • finalize title deduplication prompt (d868ec4)
  • gather sources pipeline basically working. (31fa7c9)
  • Guide: Question Generator for Learning Guide (064e219)
  • improve clustering by using affinity propagation to cluster noise and break up large clusters, then use cosine distance between cluster summaries to merge very similar clusters (4893fd2)
  • improve description writing (9317de8)
  • improve export of good bad datasets (3d70721)
  • improve importing and instantiating CurateTopic (bfce0bf)
  • improve random seed setting; improve summary so it can potentially know if text isn't relevant to topic (33a0535)
  • introduce pipeline arch for gathering sources (c94236b)
  • move core logic of clustering to cluster.py (cf41ee8)
  • optimize threshold for merging similar cluster summaries; control verbosity; increase affinity propagation iterations, although there is no data yet to indicate this helps (3beebc5)
  • push question extractor (ad60a6f)
  • refactor and improve clustering optimization (3957aeb)
  • save output to file (4a1673a)
  • switch to basic langchain impl of voyage ai to use caching (af6e6e6)
  • temp solution to avoid generating descriptions for certain topics (55ff05f)
  • wait 10 min on rate limit error (8a947c1)
  • WIP summarize questions (874e66f)
  • write get_or_generate_topic_description() (354e550)
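
Several of the entries above mention an embedding distance function and merging very similar clusters by the cosine distance between their summaries. As a rough illustration only, that idea could be sketched as below. This is not the code from `cluster.py`: the function names `cosine_distance` and `merge_similar_clusters`, the single-link merging strategy, and the threshold value are all assumptions made for the example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def merge_similar_clusters(summary_embeddings, threshold):
    """Group cluster indices whose summary embeddings lie within `threshold`.

    Hypothetical helper: uses union-find so that chains of similar
    clusters end up in one group (single-link merging).
    """
    n = len(summary_embeddings)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine_distance(summary_embeddings[i], summary_embeddings[j]) < threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Illustrative data: clusters 0 and 1 have nearly parallel summary embeddings.
embeddings = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
merged = merge_similar_clusters(embeddings, threshold=0.1)
```

With these made-up vectors, clusters 0 and 1 fall into the same group and cluster 2 stays on its own. A real threshold would have to be tuned against actual data, as the threshold-optimization entry above suggests.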

Bug Fixes

  • add more slugs to blacklist (9fffef9)
  • bugs in new generalized cluster.py (9e2c51b)
  • check that match is not None (82a178c)
  • don't calc stdev if other clusters are <= 1 (68e4729)
  • don't cluster noise if there is none! (ce82db4)
  • don't forget to strip output before checking word count (244401e)
  • fix imports (a833c8a)
  • installation command of LLM interface package (0903894)
  • move back to whitelist for generating topic descriptions (1e8d36f)
  • only output generated source first time it's generated (af86a6b)
  • pass verbose down (d98ffbc)
  • summarize clusters in parallel (5a07f6f)
  • undo ability for uniqueness of source to say if topic doesn't apply to text (e2f94e8)