Our civilization is built on curiosity. The goal of the Curiosity recommender system is to suggest an ideal reading list based on the documents a reader has already read.
- Notion.so raw data generation
- Notion.so raw data to markdown
The first two steps are handled by texonom/notion-node
- Markdown to Hugging Face dataset
```bash
git clone https://github.com/texonom/texonom-md
python hf_upload.py chroma
```
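For illustration, a minimal sketch of how the markdown files could be turned into a Hugging Face dataset with the `datasets` library. The repository id, field names, and directory layout here are assumptions; `hf_upload.py` may do this differently.

```python
# Hypothetical sketch of the markdown -> Hugging Face dataset step.
# The repo id "texonom/texonom-md" and the id/text fields are assumptions.
from pathlib import Path
from datasets import Dataset

def build_dataset(md_root: str) -> Dataset:
    """Collect every markdown file under md_root into one dataset."""
    records = []
    for path in Path(md_root).rglob("*.md"):
        records.append({
            "id": path.stem,  # page id taken from the file name (assumption)
            "text": path.read_text(encoding="utf-8"),
        })
    return Dataset.from_list(records)

if __name__ == "__main__":
    ds = build_dataset("texonom-md")
    ds.push_to_hub("texonom/texonom-md")  # hypothetical dataset repo id
```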
- Extracted dataset to embedding
Run the Chroma server:
```bash
pm2 start conf/chroma.json
```
Run the embedding server:
```bash
volume=data
model=thenlper/gte-small
docker run -d --name tei --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.3.0 --model-id $model
```
Index the embeddings into pgvector:
```bash
python index_to.py pgvector
```
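As a rough sketch of this step under stated assumptions: documents are embedded by POSTing to the TEI server started above, and the vectors are stored in the running Chroma server. The collection name, ports, and sample documents are placeholders, and `index_to.py` may target pgvector instead, as the command suggests.

```python
# Hypothetical sketch: embed documents with the TEI server and store the
# vectors in the Chroma server started above. Collection name, ports, and
# the sample documents are assumptions.
import requests
import chromadb

TEI_URL = "http://localhost:8080/embed"  # TEI container is mapped to port 8080 above

def embed(texts: list[str]) -> list[list[float]]:
    """Return one embedding vector per input text via the TEI /embed endpoint."""
    resp = requests.post(TEI_URL, json={"inputs": texts, "truncate": True})
    resp.raise_for_status()
    return resp.json()

client = chromadb.HttpClient(host="localhost", port=8000)  # default Chroma port (assumption)
collection = client.get_or_create_collection("texonom")    # hypothetical collection name

docs = {
    "page-1": "Curiosity drives learning ...",
    "page-2": "Notion export pipeline ...",
}
ids, texts = list(docs.keys()), list(docs.values())
collection.add(ids=ids, documents=texts, embeddings=embed(texts))
```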
- Use the embeddings for recommendation (see the sketch after this list)
- Build the candidate pool from the dictionary dataset, deduplicating by id and preferring the most recent entry
- Tag each dataset entry with its date
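A minimal sketch of the recommendation step under the same assumptions: deduplicate the dictionary dataset by id using the date tag, then suggest the nearest neighbours of the document that was just read. The `id`/`date` field names and the Chroma collection reuse the sketch above; this is not the project's actual implementation.

```python
# Hypothetical sketch of the recommendation step. The "id"/"date" field names
# and the Chroma collection from the previous sketch are assumptions.

def dedupe_latest(records: list[dict]) -> dict[str, dict]:
    """Keep only the newest record per id, preferring the most recent date tag."""
    latest: dict[str, dict] = {}
    for rec in records:
        prev = latest.get(rec["id"])
        if prev is None or rec["date"] > prev["date"]:
            latest[rec["id"]] = rec
    return latest

def recommend(collection, read_doc_id: str, k: int = 5) -> list[str]:
    """Suggest the k documents whose embeddings are closest to the one just read."""
    seed = collection.get(ids=[read_doc_id], include=["embeddings"])
    result = collection.query(query_embeddings=seed["embeddings"], n_results=k + 1)
    # Drop the seed document itself from the suggestions
    return [doc_id for doc_id in result["ids"][0] if doc_id != read_doc_id][:k]

# Example: ISO date strings compare chronologically, so the newer revision wins
records = [
    {"id": "page-1", "date": "2023-01-04", "text": "..."},
    {"id": "page-1", "date": "2023-06-11", "text": "..."},
]
pool = dedupe_latest(records)  # keeps only the 2023-06-11 revision of page-1
```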