feat(ChatKnowledge): Add custom text separators and refactor log configuration (#636)

### ChatKnowledge
- New custom text separators
- Modify relations output format
- Read `KNOWLEDGE_CHAT_SHOW_RELATIONS` from environment
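
A minimal sketch of reading this flag as a boolean. The helper name `env_flag` and the accepted truthy spellings are assumptions for illustration; the commit excerpt does not show how DB-GPT actually parses the value.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Parse a boolean-like environment variable (hypothetical helper)."""
    raw = os.getenv(name)
    if raw is None:
        return default
    # Assumed truthy spellings; the project may accept a different set.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Falls back to False when the variable is unset, matching the template value.
show_relations = env_flag("KNOWLEDGE_CHAT_SHOW_RELATIONS")
```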

### More knowledge command line capabilities
- Delete your knowledge space or document in space
- List knowledge space

### Refactor log configuration
- Get logger with `logging.getLogger(__name__)`
- Set up logging with `setup_logging`
- Read logger directory from environment `DBGPT_LOG_DIR`
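
The three points above could fit together roughly as follows. This is a sketch under stated assumptions: the signature and body of `setup_logging` here are hypothetical (the real helper is not shown in this excerpt), and the `./logs` fallback is taken from the comment in `.env.template`.

```python
import logging
import os

# Module-level logger, obtained the way the commit message recommends.
logger = logging.getLogger(__name__)


def setup_logging(logger_name: str, filename: str = "dbgpt.log") -> logging.Logger:
    """Hypothetical setup helper; not DB-GPT's actual implementation."""
    level_name = os.getenv("DBGPT_LOG_LEVEL", "INFO").upper()
    level = logging.getLevelName(level_name)
    if not isinstance(level, int):  # unknown level name: fall back to INFO
        level = logging.INFO

    # Assumed default directory, per the .env.template comment.
    log_dir = os.getenv("DBGPT_LOG_DIR") or "./logs"
    os.makedirs(log_dir, exist_ok=True)

    configured = logging.getLogger(logger_name)
    configured.setLevel(level)
    if not configured.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(
            os.path.join(log_dir, filename), encoding="utf-8"
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        configured.addHandler(handler)
    return configured
```

Checking for existing handlers keeps repeated calls from writing each record more than once.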

Close #636
Aries-ckt authored Sep 28, 2023
2 parents 20bddde + 202a0ce commit dfec270
Showing 54 changed files with 1,068 additions and 223 deletions.
6 changes: 5 additions & 1 deletion .env.template
@@ -55,6 +55,8 @@ EMBEDDING_MODEL=text2vec
#EMBEDDING_MODEL=bge-large-zh
KNOWLEDGE_CHUNK_SIZE=500
KNOWLEDGE_SEARCH_TOP_SIZE=5
# Control whether to display the source document of knowledge on the front end.
KNOWLEDGE_CHAT_SHOW_RELATIONS=False
## EMBEDDING_TOKENIZER - Tokenizer to use for chunking large inputs
## EMBEDDING_TOKEN_LIMIT - Chunk size limit for large inputs
# EMBEDDING_MODEL=all-MiniLM-L6-v2
@@ -154,4 +156,6 @@ SUMMARY_CONFIG=FAST
#** LOG **#
#*******************************************************************#
# FATAL, ERROR, WARNING, WARNING, INFO, DEBUG, NOTSET
DBGPT_LOG_LEVEL=INFO
# LOG dir, default: ./logs
#DBGPT_LOG_DIR=
203 changes: 203 additions & 0 deletions docs/getting_started/application/kbqa/kbqa.md
@@ -91,4 +91,207 @@ Prompt Argument
#### WEAVIATE
* WEAVIATE_URL=https://kt-region-m8hcy0wc.weaviate.network
```

## KBQA command line

### Load your local documents to DB-GPT

```bash
dbgpt knowledge load --space_name my_kbqa_space --local_doc_path ./pilot/datasets --vector_store_type Chroma
```

- `--space_name`: Your knowledge space name, default: `default`
- `--local_doc_path`: Your document directory or document file path, default: `./pilot/datasets`
- `--vector_store_type`: Vector store type, default: `Chroma`
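
If you prefer to drive the command from a script, the sketch below assembles the same invocation as an argument list. The wrapper `build_load_command` is a hypothetical name introduced here; only the three flags documented above are used.

```python
import shlex
from typing import List


def build_load_command(
    space_name: str = "default",
    local_doc_path: str = "./pilot/datasets",
    vector_store_type: str = "Chroma",
) -> List[str]:
    """Assemble argv for `dbgpt knowledge load` (hypothetical wrapper)."""
    return [
        "dbgpt", "knowledge", "load",
        "--space_name", space_name,
        "--local_doc_path", local_doc_path,
        "--vector_store_type", vector_store_type,
    ]


cmd = build_load_command(space_name="my_kbqa_space")
print(shlex.join(cmd))  # shell-quoted form of the command, for inspection
# To execute it, pass the list to subprocess.run(cmd, check=True).
```

Passing a list rather than a single string avoids shell quoting problems when the document path contains spaces.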

**View the help for `dbgpt knowledge load`**

```
dbgpt knowledge load --help
```

Here you can see the parameters:

```
Usage: dbgpt knowledge load [OPTIONS]

  Load your local knowledge to DB-GPT

Options:
  --space_name TEXT         Your knowledge space name  [default: default]
  --vector_store_type TEXT  Vector store type.  [default: Chroma]
  --local_doc_path TEXT     Your document directory or document file path.
                            [default: ./pilot/datasets]
  --skip_wrong_doc          Skip wrong document.
  --overwrite               Overwrite existing document(they has same name).
  --max_workers INTEGER     The maximum number of threads that can be used to
                            upload document.
  --pre_separator TEXT      Preseparator, this separator is used for pre-
                            splitting before the document is actually split by
                            the text splitter. Preseparator are not included
                            in the vectorized text.
  --separator TEXT          This is the document separator. Currently, only
                            one separator is supported.
  --chunk_size INTEGER      Maximum size of chunks to split.
  --chunk_overlap INTEGER   Overlap in characters between chunks.
  --help                    Show this message and exit.
```
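
To make the interaction between `--pre_separator`, `--separator`, `--chunk_size` and `--chunk_overlap` more concrete, here is an illustrative splitter. It is a simplified sketch based only on the help text above, not DB-GPT's actual text splitter; the function name `split_text` and its merging strategy are assumptions.

```python
from typing import List, Optional


def split_text(
    text: str,
    separator: str = "\n",
    pre_separator: Optional[str] = None,
    chunk_size: int = 500,
    chunk_overlap: int = 0,
) -> List[str]:
    """Illustrative splitter; NOT DB-GPT's actual implementation."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    # Step 1: pre-split. The pre-separator itself is dropped from the output.
    sections = text.split(pre_separator) if pre_separator else [text]

    chunks: List[str] = []
    for section in sections:
        # Step 2: split each section on the single document separator.
        pieces = [p for p in section.split(separator) if p.strip()]
        current = ""
        for piece in pieces:
            candidate = piece if not current else current + separator + piece
            # A single piece longer than chunk_size is kept whole in this sketch.
            if len(candidate) <= chunk_size or not current:
                current = candidate
                continue
            chunks.append(current)
            # Step 3: carry the tail of the previous chunk as overlap if it fits.
            tail = current[-chunk_overlap:] if chunk_overlap else ""
            with_tail = tail + separator + piece
            current = with_tail if tail and len(with_tail) <= chunk_size else piece
        if current:
            chunks.append(current)
    return chunks


sample = "intro line\nmore intro####second part\nits details"
for chunk in split_text(sample, pre_separator="####", chunk_size=40):
    print(repr(chunk))
```

Chunks never span a pre-separator boundary here, which is one plausible reading of "pre-splitting before the document is actually split by the text splitter".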

### List knowledge space

#### List all knowledge spaces

```
dbgpt knowledge list
```

Output should look something like the following:
```
+------------------------------------------------------------------+
|                       All knowledge spaces                       |
+----------+-------------+-------------+-------------+-------------+
| Space ID |  Space Name | Vector Type |    Owner    | Description |
+----------+-------------+-------------+-------------+-------------+
|    6     |      n1     |    Chroma   |    DB-GPT   |  DB-GPT cli |
|    5     |  default_2  |    Chroma   |    DB-GPT   |  DB-GPT cli |
|    4     |  default_1  |    Chroma   |    DB-GPT   |  DB-GPT cli |
|    3     |   default   |    Chroma   |    DB-GPT   |  DB-GPT cli |
+----------+-------------+-------------+-------------+-------------+
```

#### List documents in knowledge space

```
dbgpt knowledge list --space_name default
```

Output should look something like the following:
```
+------------------------------------------------------------------------+
|                       Space default description                        |
+------------+-----------------+--------------+--------------+-----------+
| Space Name | Total Documents | Current Page | Current Size | Page Size |
+------------+-----------------+--------------+--------------+-----------+
|  default   |        1        |      1       |      1       |     20    |
+------------+-----------------+--------------+--------------+-----------+
+-----------------------------------------------------------------------------------------------------------------------------------+
|                                                     Documents of space default                                                    |
+------------+-------------+---------------+----------+--------+----------------------------+----------+----------------------------+
| Space Name | Document ID | Document Name |   Type   | Chunks |         Last Sync          |  Status  |           Result           |
+------------+-------------+---------------+----------+--------+----------------------------+----------+----------------------------+
|  default   |      61     | Knowledge.pdf | DOCUMENT |  745   | 2023-09-28T03:25:39.065762 | FINISHED | document embedding success |
+------------+-------------+---------------+----------+--------+----------------------------+----------+----------------------------+
```

#### List chunks of document in space `default`

```
dbgpt knowledge list --space_name default --doc_id 61 --page_size 5
```


```
+-----------------------------------------------------------------------------------+
|                         Document 61 in default description                        |
+------------+-------------+--------------+--------------+--------------+-----------+
| Space Name | Document ID | Total Chunks | Current Page | Current Size | Page Size |
+------------+-------------+--------------+--------------+--------------+-----------+
|  default   |      61     |     745      |      1       |      5       |     5     |
+------------+-------------+--------------+--------------+--------------+-----------+
+-----------------------------------------------------------------------------------------------------------------------+
|                                       chunks of document id 61 in space default                                       |
+------------+-------------+---------------+----------+-----------------------------------------------------------------+
| Space Name | Document ID | Document Name | Content  |                            Meta Data                            |
+------------+-------------+---------------+----------+-----------------------------------------------------------------+
|  default   |      61     | Knowledge.pdf | [Hidden] | {'source': '/app/pilot/data/default/Knowledge.pdf', 'page': 10} |
|  default   |      61     | Knowledge.pdf | [Hidden] |  {'source': '/app/pilot/data/default/Knowledge.pdf', 'page': 9} |
|  default   |      61     | Knowledge.pdf | [Hidden] |  {'source': '/app/pilot/data/default/Knowledge.pdf', 'page': 9} |
|  default   |      61     | Knowledge.pdf | [Hidden] |  {'source': '/app/pilot/data/default/Knowledge.pdf', 'page': 8} |
|  default   |      61     | Knowledge.pdf | [Hidden] |  {'source': '/app/pilot/data/default/Knowledge.pdf', 'page': 8} |
+------------+-------------+---------------+----------+-----------------------------------------------------------------+
```

#### More list usage

```
dbgpt knowledge list --help
```

```
Usage: dbgpt knowledge list [OPTIONS]

  List knowledge space

Options:
  --space_name TEXT               Your knowledge space name. If None, list all
                                  spaces
  --doc_id INTEGER                Your document id in knowledge space. If Not
                                  None, list all chunks in current document
  --page INTEGER                  The page for every query  [default: 1]
  --page_size INTEGER             The page size for every query  [default: 20]
  --show_content                  Query the document content of chunks
  --output [text|html|csv|latex|json]
                                  The output format
  --help                          Show this message and exit.
```
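
The `--page` and `--page_size` options imply simple pagination arithmetic, sketched below. The helper names `total_pages` and `get_page` are hypothetical, and treating `--page` as 1-indexed is an assumption based on its default of 1; the real CLI presumably paginates on the server side.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def total_pages(total_items: int, page_size: int) -> int:
    """Number of pages needed to show every item."""
    return math.ceil(total_items / page_size)


def get_page(items: Sequence[T], page: int = 1, page_size: int = 20) -> List[T]:
    """Return one 1-indexed page of items (empty list past the last page)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return list(items[start:start + page_size])


# The sample document above has 745 chunks; with --page_size 5 that is 149 pages.
print(total_pages(745, 5))
```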


### Delete your knowledge space or document in space

#### Delete your knowledge space

```
dbgpt knowledge delete --space_name default
```

#### Delete your document in space

```
dbgpt knowledge delete --space_name default --doc_name Knowledge.pdf
```


#### More delete usage

```
dbgpt knowledge delete --help
```

```
Usage: dbgpt knowledge delete [OPTIONS]

  Delete your knowledge space or document in space

Options:
  --space_name TEXT  Your knowledge space name  [default: default]
  --doc_name TEXT    The document name you want to delete. If doc_name is
                     None, this command will delete the whole space.
  -y                 Confirm your choice
  --help             Show this message and exit.
```

#### More knowledge usage

```
dbgpt knowledge --help
```

```
Usage: dbgpt knowledge [OPTIONS] COMMAND [ARGS]...

  Knowledge command line tool

Options:
  --address TEXT  Address of the Api server(If not set, try to read from
                  environment variable: API_ADDRESS).  [default:
                  http://127.0.0.1:5000]
  --help          Show this message and exit.

Commands:
  delete  Delete your knowledge space or document in space
  list    List knowledge space
  load    Load your local documents to DB-GPT
```
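
The `--address` help text describes a fallback chain: the explicit option, then the `API_ADDRESS` environment variable, then the default. A minimal sketch of that resolution order follows; the function name `resolve_api_address` is hypothetical and the real CLI may implement this differently.

```python
import os
from typing import Optional

DEFAULT_API_ADDRESS = "http://127.0.0.1:5000"  # default shown in the help text


def resolve_api_address(cli_value: Optional[str] = None) -> str:
    """Explicit --address wins, then API_ADDRESS, then the default."""
    return cli_value or os.getenv("API_ADDRESS") or DEFAULT_API_ADDRESS


print(resolve_api_address())
```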