CLLMs - A Family of Efficient Parallel Decoders

from blog Tao of Mac
Another technique for boosting inference speed, this time at the expense of a little more fine-tuning effort, which seems fairly easy to justify for a 3.5x speed gain.