Layer-wise inferencing + batching: Small VRAM doesn't limit LLM throughput anymore