My benchmark for large language models

From Nicholas Carlini's blog
A benchmark of ~100 tests for language models, collected from actual questions I've asked them over the last year.