sunadase/minbpe_rust

Karpathy's minBPE in Rust

minbpe

A byte-pair encoder written to practice Rust. It probably does not follow Rust best practices yet, since it is mostly a direct port of minbpe's Python code. I plan to review it, reformat it, and make it more idiomatic later.

todo:

  • BasicTokenizer
    • train
    • encode
    • decode
    • save
    • load
    • vocab type should be Vec<u32>?
    • encode different from minbpe?
  • REPL <- (next)
    • correct prints/whitespaces
    • take train model params
  • CLI
    • take train model params
  • Validate results <- (next)
  • Set-up Tests <- (next)
    • self
    • vs minbpe
    • vs tiktoken
  • RegexTokenizer
  • GPT4Tokenizer
  • Tests + Compare
  • Structs / Traits?
  • Review, Reorg, rustify
  • pyo3 python lib?
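
For orientation, the train, encode, and decode items above can be sketched in a few standalone functions. This is a minimal illustration of byte-level BPE, not this repository's code. The function names, the `Vec<((u32, u32), u32)>` merge list, and the tie-breaking rule (smallest pair wins) are all assumptions made for the example, so learned merges may differ from minbpe's when counts tie.

```rust
use std::collections::HashMap;

/// Count how often each adjacent pair of token ids occurs.
fn pair_counts(ids: &[u32]) -> HashMap<(u32, u32), usize> {
    let mut counts = HashMap::new();
    for w in ids.windows(2) {
        *counts.entry((w[0], w[1])).or_insert(0) += 1;
    }
    counts
}

/// Replace each occurrence of `pair` with `new_id`, scanning left to right.
fn merge(ids: &[u32], pair: (u32, u32), new_id: u32) -> Vec<u32> {
    let mut out = Vec::with_capacity(ids.len());
    let mut i = 0;
    while i < ids.len() {
        if i + 1 < ids.len() && (ids[i], ids[i + 1]) == pair {
            out.push(new_id);
            i += 2;
        } else {
            out.push(ids[i]);
            i += 1;
        }
    }
    out
}

/// Learn up to `num_merges` merges over the UTF-8 bytes of `text`.
/// New ids start at 256 and are returned in the order they were learned.
fn train(text: &str, num_merges: usize) -> Vec<((u32, u32), u32)> {
    let mut ids: Vec<u32> = text.bytes().map(u32::from).collect();
    let mut merges = Vec::new();
    for k in 0..num_merges {
        // Most frequent pair; ties go to the smallest pair (an assumption
        // made here for determinism, HashMap iteration order is random).
        let best = pair_counts(&ids)
            .into_iter()
            .max_by(|a, b| a.1.cmp(&b.1).then(b.0.cmp(&a.0)));
        let pair = match best {
            Some((p, _)) => p,
            None => break, // fewer than two tokens left, nothing to merge
        };
        let new_id = 256 + k as u32;
        ids = merge(&ids, pair, new_id);
        merges.push((pair, new_id));
    }
    merges
}

/// Encode text by applying the learned merges in the order they were learned.
fn encode(text: &str, merges: &[((u32, u32), u32)]) -> Vec<u32> {
    let mut ids: Vec<u32> = text.bytes().map(u32::from).collect();
    for &(pair, new_id) in merges {
        ids = merge(&ids, pair, new_id);
    }
    ids
}

/// Decode ids back to text by rebuilding the id -> bytes vocabulary.
/// Relies on merge ids being 256, 257, ... in order, as `train` produces them.
fn decode(ids: &[u32], merges: &[((u32, u32), u32)]) -> String {
    let mut vocab: Vec<Vec<u8>> = (0..=255u8).map(|b| vec![b]).collect();
    for &((a, b), _) in merges {
        let mut bytes = vocab[a as usize].clone();
        bytes.extend_from_slice(&vocab[b as usize]);
        vocab.push(bytes);
    }
    let mut bytes = Vec::new();
    for &id in ids {
        bytes.extend_from_slice(&vocab[id as usize]);
    }
    String::from_utf8_lossy(&bytes).into_owned()
}
```

With this sketch, training on "aaabdaaabac" for 3 merges and then encoding the same string gives the ids 258, 100, 258, 97, 99, and decoding those ids returns the original text. Storing the merges as an ordered list keeps encode and decode simple, at the cost of one full pass over the input per merge.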
