SentencePiece BPE Tokenizer in Go

From Eli Bendersky's website
Earlier this year I wrote a post about implementing BPE tokenization in Go, which made it possible to reproduce OpenAI's tokenizer. Today I want to mention a new project I've been hacking on recently: go-sentencepiece, a pure Go implementation of the SentencePiece tokenizer that's used for Google AI's models like …