yet-another-applied-llm-benchmark

From Simon Willison's Weblog.
Nicholas Carlini introduced this personal LLM benchmark suite back in February. It is a collection of over 100 automated tests that he runs against new LLMs to evaluate how well they perform on the kinds of tasks he uses them for. There are two defining features of this benchmark that make it interesting. Most...