CLLMs - A Family of Efficient Parallel Decoders

from blog Tao of Mac, 8 May 2024 | ↗ original

Another technique for boosting inference speed, this time at the expense of a little more fine-tuning effort. That seems fairly easy to justify for a 3.5x speed gain.


Speculative Decoding and Beyond: A Survey of Speculative Decoding Techniques

Confessions of a Code Addict | original ↗

Auto-WLM: machine learning enhanced workload management in Amazon Redshift

Metadata | original ↗

lm.rs: run inference on Language Models locally on the CPU with Rust

Simon Willison's Weblog | original ↗

Effective ML Through Merlin's Destruct Command

ring.muhokama.fun | original ↗

Layer-wise inferencing + batching: Small VRAM doesn't limit LLM throughput anymore

Languages and Architecture | original ↗

llm-cerebras

Simon Willison's Weblog | original ↗

Recording: CPython and ELF Essentials for Building a Basic Remote Profiler

Confessions of a Code Addict | original ↗

Higher RAII, and the Seven Arcane Uses of Linear Types

Languages and Architecture | original ↗

Understanding Evlis Tail Recursion

Fred Akalin | original ↗

Dynamic Adaptive Inverse Rectified Non-Linear Units Over Integrals of Ziplists

matt.sh | original ↗

More from Tao of Mac

The Supernote A6X2 Nomad

18 Jan 2025 | original ↗

I spent the holiday season practicing my handwriting, which was… unexpected. The reason why is that I got a Supernote Nomad, which was enough to warrant spending a pretty large chunk of time using it either exclusively or in tandem with my other devices. Although it’s been only a couple of months, the part of me that initially pondered the Nomad...

Brainwash An Executive Today!

14 Jan 2025 | original ↗

Welcome to my life. Seriously, this is an amazing likeness to some of the stuff I go through every day at work.

16GB Raspberry Pi 5

13 Jan 2025 | original ↗

This didn’t fully register when it came out last Thursday (had other stuff on my mind, I guess), but I still think they should have done this for the Raspberry Pi 500 first–because regular desktop users would reap the most benefits, and it would greatly increase the usable lifetime of the device. As to having 16GB on a Model B, it’s certainly...

How I Use LLMs for Coding and Writing

12 Jan 2025 | original ↗

I’ve come across a couple of posts about how people use LLMs for coding, so I thought I would share how I currently use AI in general, spanning office work, writing, and, of course, coding and a bit of fun. Disclaimer: Since I know most people won’t read my site disclaimer, I encourage you to do so, and go through the rest of the post with the...

Notes for January 6-12

12 Jan 2025 | original ↗

Work was a bit slow this week (it felt more like a half-week as people started popping back in), so I was able to keep a clear head and ended up doing a fair bit of writing for a change; bits of it will be surfacing in the next few hours or weeks. Going Paperless: After a conversation with friends, I installed paperless-ngx on...
