This repo demonstrates we can use a Logseq graph's RDF export to intelligently query our graph. We use the LLamaIndex python library because it has third party RDF integration and can import knowledge to LLMs that exceed current token constraints e.g. ~4k tokens.
The example graph used is https://github.com/logseq/docs since it's public,
exports valid RDF and has a variety of practical knowledge. The experiment
uses the rdf_chat.py
script and the included docs.ttl, a Mar 9
RDF export of the docs graph. To generate your own rdf export, see
below.
I ran a couple of queries and saved the interactions in examples. Each file contains output from one or more uses of the script. Each use of the script has an Analysis section in which I describe how accurate the data is.
Overall, I'm pretty happy with the initial results. Using llama-index's defaults, I was able to get fairly accurate results on questions about the docs graph. I was able to list, search and even do some relational querying with varying levels of accuracy. It is annoying that the default LLM makes up stuff when it doesn't know an answer. Results varied depending on how questions were worded and whether meaningful words had quotes and the correct capitalization.
You'll need to have an https://openai.com/ account. Once you have an api key, set it in your terminal:
export OPENAI_API_KEY="MY-API-KEY"
Be sure to have python installed, preferably python3. Install llama-index with
pip3 install llama-index
.
# Do a live query
$ python3 rdf_chat.py live list platforms
ENV: text-davinci
INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 2135 tokens
PROMPT: list platforms
INFO:root:> [query] Total LLM token usage: 2181 tokens
INFO:root:> [query] Total embedding token usage: 2 tokens
RESPONSE:
All Platforms, Desktop, Publish Web, Web, Android, iOS
# Generate an index for cheaper embedded querying
$ python3 rdf_chat.py save-index
...
# Do a cached query
$ python3 rdf_chat.py list whiteboard only ui elements
ENV: text-embedding-ada-002-v2
PROMPT: list whiteboard only ui elements
INFO:root:> [query] Total LLM token usage: 2212 tokens
INFO:root:> [query] Total embedding token usage: 7 tokens
RESPONSE:
- Whiteboard/Toolbar
- Whiteboard/Action Bar
- Whiteboard/Quick Add
- Whiteboard/Dashboard
- Whiteboard/Context Menu
- Whiteboard___Canvas
To build a fresh rdf export of the docs graph, first install the rdf-export CLI. Then simply:
# From docs directory
$ logseq-rdf-export docs.ttl -a -c '{:exclude-properties [:initial-version :description]}'
Parsing 303 files...
Writing 272 triples to file docs.ttl
- Try chatgpt model e.g. gpt-3.5-turbo for better accuracy
- Try a custom prompt for better accuracy
- Try a different LLM. Is it possible to use one that we control?
- Build a RDF file that includes more data including descriptions of entities
- Try a graph that includes subclasses to see how useful that is. rdf-qa repo is promising
- https://github.com/mommi84/rdf-qa - The repo which inspired this experiment. Recommend trying it
- RDF Loader - The RDF loader for llama-index
- LLama Index docs