Layer-wise inferencing + batching: Small VRAM doesn't limit LLM throughput anymore

From the blog Languages and Architecture.
Also posted today: Higher RAII, and the Seven Arcane Uses of Linear Types, about how linear types let us control the future!

Currently, the general consensus is that you can't really run larger LLMs on ordinary computers.
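The title names the trick that challenges that consensus: instead of holding every layer's weights in VRAM at once, stream one layer at a time onto the GPU and push a large batch of sequences through it before swapping in the next layer, so each layer's transfer cost is amortized across the whole batch. Below is a minimal sketch of that idea, assuming PyTorch; the sizes, names, and the `layerwise_forward` helper are illustrative stand-ins, not the post's actual implementation.

```python
import torch
import torch.nn as nn

# Toy sizes so the sketch runs quickly; a real large model would have
# many more, much wider layers.
N_LAYERS, D_MODEL, N_HEADS = 8, 512, 8
BATCH, SEQ = 32, 64  # a big batch amortizes each layer's PCIe transfer

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical model: all layers live in ordinary CPU RAM (they could just
# as well be memory-mapped from disk), not in VRAM.
cpu_layers = [
    nn.TransformerEncoderLayer(D_MODEL, N_HEADS, batch_first=True).eval()
    for _ in range(N_LAYERS)
]

@torch.no_grad()
def layerwise_forward(x_cpu: torch.Tensor) -> torch.Tensor:
    """Run the batch through the model one layer at a time, so VRAM only
    ever holds one layer's weights plus the batch activations."""
    x = x_cpu.to(DEVICE)      # activations stay resident on the GPU
    for layer in cpu_layers:
        layer.to(DEVICE)      # stream this layer's weights into VRAM
        x = layer(x)          # run the whole batch through it once
        layer.to("cpu")       # evict it before loading the next layer
    return x.cpu()

hidden = torch.randn(BATCH, SEQ, D_MODEL)  # stand-in for embedded tokens
out = layerwise_forward(hidden)
print(out.shape)  # torch.Size([32, 64, 512])
```

Under this scheme peak VRAM scales with one layer plus the batch activations rather than the whole model. The cost is latency, since every forward pass pays to stream all the layers over the bus, which is why a large batch (trading latency for throughput) is the other half of the trick.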