v0.12.0 - Centralized Chunk Configuration #153
benbrandt announced in Announcements
What's New
This release is a big API change to pull all chunk configuration options into the same place, at initialization of the splitters. This was motivated by two things:
Overall, I think this has aligned the library with the usage I have seen in the wild, and pulls all of the settings for the "domain" of chunking into a single unit.
Breaking Changes
Rust
- […] `true`, and this does logically make sense as the default behavior.
- `TextSplitter` and `MarkdownSplitter` now take a `ChunkConfig` in their `::new` method.
- This brings the `ChunkSizer`, `ChunkCapacity` and `trim` settings into a single struct that can be instantiated with a builder-lite pattern.
- The `with_trim_chunks` method has been removed from `TextSplitter` and `MarkdownSplitter`. You can now set `trim` in the `ChunkConfig` struct.
- `ChunkCapacity` is now a struct instead of a trait. If you were using a custom `ChunkCapacity`, you can change your `impl` to a `From<TYPE> for ChunkCapacity` instead, and you should still be able to pass it in to all of the same methods.
- `ChunkSizer`s take a concrete type in their method instead of an `impl`.
Migration Examples
Default settings:
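As a rough illustration of the builder-lite pattern with trimming on by default, here is a minimal sketch. The `ChunkConfig` below is a simplified stand-in written for this example, not the crate's actual definition: the single `usize` capacity field and the exact method bodies are assumptions.

```rust
// Simplified stand-in for illustration only; NOT the crate's real `ChunkConfig`.
#[derive(Debug, Clone, PartialEq)]
struct ChunkConfig {
    capacity: usize, // assumption: a single maximum chunk size
    trim: bool,
}

impl ChunkConfig {
    /// Start from a capacity; trimming defaults to `true`.
    fn new(capacity: usize) -> Self {
        Self { capacity, trim: true }
    }

    /// Builder-lite: take `self` by value, change one field, hand it back.
    fn with_trim(mut self, trim: bool) -> Self {
        self.trim = trim;
        self
    }
}

fn main() {
    // Default settings: only the capacity is supplied.
    let default_config = ChunkConfig::new(500);
    // Opting out of trimming is a single chained call.
    let untrimmed = ChunkConfig::new(500).with_trim(false);

    println!("{default_config:?}");
    println!("{untrimmed:?}");
}
```

Unlike a full builder, there is no separate builder type and no final `build()` call: the config value itself is usable at every step.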
Hugging Face Tokenizers:
Tiktoken:
Ranges:
Markdown:
ChunkSizer impls
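To show what taking a concrete type instead of an `impl` means for a sizer, here is a hedged sketch. Both `ChunkCapacity` and `ChunkSizer` below are simplified stand-ins, and the `fits` method and `Characters` type are hypothetical names chosen for the example; the crate's real trait has a different signature.

```rust
// Stand-ins for illustration only; NOT the crate's actual definitions.
struct ChunkCapacity {
    max: usize, // assumption: only an upper bound is modelled here
}

trait ChunkSizer {
    // The capacity parameter is a concrete `&ChunkCapacity`,
    // not a generic `&impl SomeTrait`.
    fn fits(&self, chunk: &str, capacity: &ChunkCapacity) -> bool;
}

// Hypothetical sizer that counts Unicode scalar values.
struct Characters;

impl ChunkSizer for Characters {
    fn fits(&self, chunk: &str, capacity: &ChunkCapacity) -> bool {
        chunk.chars().count() <= capacity.max
    }
}

fn main() {
    // With no generic method, this stand-in trait can be used as a trait object.
    let sizer: Box<dyn ChunkSizer> = Box::new(Characters);
    let capacity = ChunkCapacity { max: 5 };

    println!("{}", sizer.fits("hello", &capacity));
    println!("{}", sizer.fits("hello!", &capacity));
}
```

One side effect of a concrete parameter, at least in this sketch, is that the trait stays usable behind `dyn`, which a method with an `impl Trait` argument would prevent.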
ChunkCapacity impls
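The move from a trait implementation to a `From<TYPE> for ChunkCapacity` conversion can be sketched as follows. The `ChunkCapacity` struct here is a stand-in with assumed `desired` and `max` fields, and `PageLimit` and `describe` are hypothetical names that exist only in this example.

```rust
// Stand-in for illustration only; NOT the crate's actual struct.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ChunkCapacity {
    desired: usize, // assumption: size at which a chunk counts as "full enough"
    max: usize,     // assumption: hard upper bound
}

// Hypothetical user type that previously would have implemented a capacity trait.
struct PageLimit {
    soft: usize,
    hard: usize,
}

// The custom logic now lives in a conversion instead of a trait impl.
impl From<PageLimit> for ChunkCapacity {
    fn from(limit: PageLimit) -> Self {
        Self { desired: limit.soft, max: limit.hard }
    }
}

// Hypothetical consumer: accepts anything convertible into the struct,
// so call sites can keep passing the custom type directly.
fn describe(capacity: impl Into<ChunkCapacity>) -> String {
    let capacity: ChunkCapacity = capacity.into();
    format!("desired={}, max={}", capacity.desired, capacity.max)
}

fn main() {
    println!("{}", describe(PageLimit { soft: 200, hard: 1000 }));
}
```

Because `From` gives `Into` for free, a method that accepts `impl Into<ChunkCapacity>` keeps working with the custom type unchanged at the call site.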
Python
- `capacity` is now a required argument in the `__init__` and classmethods of `TextSplitter` and `MarkdownSplitter`.
- The `trim_chunks` parameter is now just `trim` in the `__init__` and classmethods of `TextSplitter` and `MarkdownSplitter`.
Migration Examples
Default settings:
Ranges:
Hugging Face Tokenizers:
Tiktoken:
Custom callback:
Markdown:
Full Changelog: v0.11.0...v0.12.0
This discussion was created from the release v0.12.0 - Centralized Chunk Configuration.