LLaMA Now Goes Faster on CPUs

From the blog justine.lol
I just wrote 84 new matrix multiplication kernels for llamafile which enable it to read prompts and images faster. Compared to llama.cpp, prompt eval time with llamafile should go anywhere between 30% and 500% faster when using F16 and Q8_0 weights on CPU. The improvements are most dramatic for ARMv8.2+ (e.g....