Task-Specific LLM Evals that Do & Don't Work

From Eugene Yan's blog, 31 Mar 2024

Evals for classification, summarization, translation, copyright regurgitation, and toxicity.

