lm.rs: run inference on Language Models locally on the CPU with Rust

From Simon Willison's Weblog.
Impressive new LLM inference implementation in Rust by Samuel Vitorino. I tried it just now on an M2 Mac with 64GB of RAM and got very snappy performance for this Q8 Llama 3.2 1B model, with Activity Monitor reporting 980% CPU usage over 13 threads. Here's how I compiled the library...
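
Compiling a project like this mostly comes down to a release-mode Cargo build with native CPU optimizations enabled, then pointing the resulting binary at a quantized model file. The sketch below shows that general shape; the repository URL, the `chat` binary name, the `--model` flag, and the model filename are assumptions about the lm.rs project layout rather than details taken from the post:

```bash
# Rough sketch of the typical steps, not the author's exact commands.
# Repo URL, binary name, flags, and model filename below are assumptions.
git clone https://github.com/samuel-vitorino/lm.rs
cd lm.rs

# Release build; target-cpu=native lets rustc use the host CPU's SIMD features,
# which matters a lot for CPU-only inference throughput.
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Run the built binary against a quantized model file (hypothetical name).
./target/release/chat --model llama3.2-1b-it-q80.lmrs
```

The `target-cpu=native` flag is the standard way to opt into host-specific vector instructions in a Rust build, which is likely where much of the multi-threaded CPU saturation reported by Activity Monitor comes from.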