This repository has been archived by the owner on Apr 6, 2023. It is now read-only.

Tokenizers Ruby

🙂 Fast state-of-the-art tokenizers for Ruby

Installation

Add this line to your application’s Gemfile:

gem "tokenizers"

Getting Started

Load a pretrained tokenizer

tokenizer = Tokenizers.from_pretrained("bert-base-cased")

Encode

encoded = tokenizer.encode("I can feel the magic, can you?")
encoded.ids
encoded.tokens

Decode

tokenizer.decode(encoded.ids)
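
The encode and decode steps can be combined into a round trip. The sketch below is a minimal illustration: `round_trip` is a hypothetical helper, not part of the gem, and it assumes only the `encode`, `ids`, `tokens`, and `decode` calls shown in this README.

```ruby
# Hypothetical helper (not part of the gem): encode text, then decode the ids back.
# `tokenizer` is assumed to respond to `encode` and `decode` as shown in this README.
def round_trip(tokenizer, text)
  encoded = tokenizer.encode(text)
  {
    ids: encoded.ids,                       # token ids
    tokens: encoded.tokens,                 # token strings
    decoded: tokenizer.decode(encoded.ids)  # text rebuilt from the ids
  }
end

# Usage with the gem installed:
# require "tokenizers"
# tokenizer = Tokenizers.from_pretrained("bert-base-cased")
# p round_trip(tokenizer, "I can feel the magic, can you?")
```

The decoded text may not match the input character for character, since a tokenizer can normalize its input.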

Load a tokenizer from files

tokenizer = Tokenizers::CharBPETokenizer.new("vocab.json", "merges.txt")
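
Before loading from files, it can help to confirm that both files exist. The sketch below is one way to do that: `missing_files` is a hypothetical helper, not part of the gem, and the file names are the ones used above.

```ruby
# Hypothetical helper (not part of the gem): return the paths that are not regular files.
def missing_files(*paths)
  paths.reject { |path| File.file?(path) }
end

missing = missing_files("vocab.json", "merges.txt")
if missing.empty?
  # With the tokenizers gem loaded, the call from this README would go here:
  # tokenizer = Tokenizers::CharBPETokenizer.new("vocab.json", "merges.txt")
  puts "Both tokenizer files found"
else
  warn "Missing tokenizer files: #{missing.join(", ")}"
end
```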

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/tokenizers-ruby.git
cd tokenizers-ruby
bundle install
bundle exec rake compile
bundle exec rake download:files
bundle exec rake test