Commit

correct the tokenizer for the chinese example
SeanLee97 committed Aug 30, 2024
1 parent 4f3a2e9 commit d0e8842
Showing 1 changed file with 8 additions and 2 deletions.
README_CN.md (10 changes: 8 additions & 2 deletions)
@@ -27,10 +27,16 @@ pip install baguetter
 ## Quick Start
 
 ```python
-from baguetter.indices import BMXSparseIndex
+from typing import List
+from baguetter.indices import BMXSparseIndex, TextPreprocessorConfig
 
+# Custom Chinese tokenizer
+def cjk_tokenizer(text: str) -> List[str]:
+    return list(text)
+
 # Create the index
-idx = BMXSparseIndex()
+idx = BMXSparseIndex(preprocessor_or_config=TextPreprocessorConfig(
+    custom_tokenizer=cjk_tokenizer))
 
 # Add documents
 docs = [
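Note on the tokenizer in this change: `cjk_tokenizer` returns `list(text)`, which splits the input into individual characters. The standalone sketch below shows what that produces. It uses only the standard library and does not call baguetter; the sample strings are illustrative assumptions, not taken from the README.

```python
from typing import List


def cjk_tokenizer(text: str) -> List[str]:
    # list() on a str yields one element per character (Unicode code point),
    # so each Chinese character becomes its own token.
    return list(text)


# Sample inputs chosen for illustration only.
print(cjk_tokenizer("快速入门"))  # ['快', '速', '入', '门']

# Non-CJK text is split the same way, and whitespace is kept as a token.
print(cjk_tokenizer("BM 检索"))  # ['B', 'M', ' ', '检', '索']
```

Character-level splitting avoids the need for a word segmenter, but it also breaks Latin words and digits into single characters, so mixed-language documents may need a more selective tokenizer.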
