Neosearch

AI-based search engine done right.

ToDo

  • Compare Trafilatura, bs4, and newspaper3k
  • Implement the bulk indexer
  • Implement the batch system for spider
    • Implement the spider with Trafilatura
      • Parse title, body, and metadata from HTML
      • Parse title, body, and metadata from PDF and other formats
  • Implement the dispatcher
    • Implement dispatcher for LinkedIn
    • Implement dispatcher for GitHub
    • Implement dispatcher for Medium
    • Implement dispatcher for X (previously Twitter)
  • Implement the ParadeDB retriever with LlamaIndex
  • Update the RAG retriever to use the SearXNG engine
  • Implement the reranker
    • Add support for Cohere Reranker
    • Add support for FlashRank Reranker
  • Add support for late chunking to improve information retrieval (IR)
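
As a rough illustration of the dispatcher item above, the sketch below routes a URL to a site-specific handler by hostname. It is only one possible shape, not the project's actual design: the names `HANDLERS` and `route`, the handler labels, and the fallback to a generic handler are all assumptions made for this example.

```python
from urllib.parse import urlparse

# Hypothetical mapping from registered domain to handler label.
# The real dispatcher may key on something other than the hostname.
HANDLERS = {
    "linkedin.com": "linkedin",
    "github.com": "github",
    "medium.com": "medium",
    "x.com": "x",
    "twitter.com": "x",  # assumption: legacy Twitter URLs go to the X handler
}


def route(url: str) -> str:
    """Return the handler label for a URL, or "generic" if none matches."""
    host = (urlparse(url).hostname or "").lower()
    for domain, label in HANDLERS.items():
        # Match the domain itself or any of its subdomains.
        if host == domain or host.endswith("." + domain):
            return label
    return "generic"


if __name__ == "__main__":
    for u in ("https://github.com/a/b", "https://www.linkedin.com/in/someone"):
        print(u, "->", route(u))
```

Matching on the full domain or a dot-prefixed suffix avoids false positives such as `box.com` being treated as `x.com`; unmatched URLs fall through to a generic path, which in this sketch stands in for the Trafilatura-based spider.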