Releases: benbrandt/text-splitter
v0.20.0
Breaking Changes
- Switched the backing Unicode segmentation implementation from unicode-segmentation to icu_segmenter. This brings some modest performance gains, along with being able to leverage the official Unicode crate. There may be slight differences in chunk behavior in some edge cases, so this is treated as a breaking change.
Full Changelog: v0.19.1...v0.20.0
v0.19.1
What's New
- Python splitters have new chunk_all and chunk_all_indices methods so that multiple texts can be processed in parallel. (For Rust, you should be able to use rayon to do this already.)
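The idea of chunking several texts at once while keeping results in input order can be sketched in Rust with the standard library alone. This is a minimal illustration under stated assumptions, not the crate's implementation: chunk_text is a hypothetical stand-in for a real splitter (it simply cuts every max_chars characters), chunk_all here is an illustrative free function rather than the Python method, and scoped threads from std are swapped in for rayon to keep the example dependency-free.

```rust
use std::thread;

/// Hypothetical stand-in for a real splitter (NOT the text-splitter API):
/// cuts `text` into pieces of at most `max_chars` characters, always on
/// character boundaries.
fn chunk_text(text: &str, max_chars: usize) -> Vec<&str> {
    assert!(max_chars > 0, "max_chars must be positive");
    let mut chunks = Vec::new();
    let mut start = 0; // byte offset where the current chunk begins
    let mut count = 0; // characters collected in the current chunk
    for (idx, _) in text.char_indices() {
        if count == max_chars {
            chunks.push(&text[start..idx]);
            start = idx;
            count = 0;
        }
        count += 1;
    }
    if start < text.len() {
        chunks.push(&text[start..]);
    }
    chunks
}

/// Illustrative helper: chunks each text on its own scoped thread and
/// returns one Vec of chunks per input, in the same order as `texts`.
fn chunk_all<'a>(texts: &[&'a str], max_chars: usize) -> Vec<Vec<&'a str>> {
    thread::scope(|scope| {
        // Spawn every worker first so the texts are processed concurrently.
        let handles: Vec<_> = texts
            .iter()
            .map(|&text| scope.spawn(move || chunk_text(text, max_chars)))
            .collect();
        // Join in spawn order, which preserves the input order.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("chunking thread panicked"))
            .collect()
    })
}
```

Spawning one thread per text is only reasonable for small batches. For larger inputs, a work-stealing pool such as rayon, as the release note suggests, is likely the better fit.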
Full Changelog: v0.19.0...v0.19.1
v0.19.0
v0.18.1
v0.18.0
v0.17.1
v0.17.0
Breaking Changes
- Support tree-sitter@v0.24 for CodeSplitters.
- Due to a slight change in the backing Unicode segmentation implementation, there are some slight shifts in behavior for CodeSplitters as well (in my tests, mostly that semicolons have a more logical grouping with previous content).
Full Changelog: v0.16.1...v0.17.0
v0.16.1
What's New
Updates pulldown-cmark to v0.12.1 to address an issue with high CPU usage for certain Markdown elements.
Full Changelog: v0.16.0...v0.16.1
v0.16.0
Breaking Changes
- Update to v0.23.0 of tree-sitter for CodeSplitter. There was a breaking change for language definitions, so this is also a breaking change for us, especially on the Python side, since we support passing the language in.
- Minimum Python version for the Python bindings is now 3.9 since 3.8 will be EOL next month.
Python
Make sure to upgrade to the latest version of your tree-sitter language package.
Rust
Make sure to upgrade to the latest version of your tree-sitter language package crate. These now have a LANGUAGE constant rather than a language() function.
// Before
tree_sitter_rust::language()
// After
tree_sitter_rust::LANGUAGE
What's New
MarkdownSplitter can better parse the Commonmark HS extension for Definition Lists.
Full Changelog: v0.15.0...v0.16.0