RLHF: Reinforcement Learning from Human Feedback

From Chip Huyen's blog
In the literature discussing why ChatGPT has captured so much of our imagination, I often come across two narratives: (1) scale, meaning throwing more data and compute at it, and (2) UX, meaning moving from a prompt interface to a more natural chat interface. One narrative that is often glossed over is the incredible technical...