Low-resource language models: making a start with Balochi

From the blog of Alex Strick van Linschoten
Large language models are all the rage, but what do you do when the language you want to model is essentially unrepresented in the public datasets used for training? I have a few months before the start of my next maths module, and I’d like to use part of that time to dive into the ins and outs of training your own language models from scratch. The...