Task-Specific LLM Evals that Do & Don't Work

From Eugene Yan's blog
Evals for classification, summarization, translation, copyright regurgitation, and toxicity.