My finetuned models beat OpenAI’s GPT-4

From the blog of Alex Strick van Linschoten
My last post outlined the kinds of evaluation I need and want in order to understand how well my finetuned LLM performs at the task of structured data extraction from press releases. Let’s start with the core metric I’m interested in, accuracy, and then later we can dive into some of the other evaluation metrics as well. TL;DR: The headline for this...