Effort Engine

From the blog Tao of Mac:
I’ve been pointing out for ages now that LLMs are barely optimized, so here’s another example of a possible inference speedup that seems very promising (it works somewhat like on-the-fly distillation). If this technique checks out and ends up implemented in mainstream tooling like ollama, it’s going to significantly lower compute and memory...
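
As I understand it, the trick is to perform only a tunable fraction of the multiplications in each matrix-vector product at inference time, skipping the ones that contribute least to the result. Here’s a minimal, purely illustrative sketch of that idea in Python/NumPy — the function name, the scoring, and the thresholding strategy are my own assumptions, not the engine’s actual kernels:

```python
import numpy as np

def effort_matvec(W, x, effort=0.3):
    """Approximate y = W @ x by keeping only the largest-magnitude products.

    `effort` is the fraction of multiplications whose contribution is kept
    (1.0 reproduces the exact result). Hypothetical sketch only: it scores
    every product up front to pick the mask, so it demonstrates the
    approximation quality, not the speedup. A real implementation would
    presumably pre-sort or bucket the weights so skipped products are never
    computed at all.
    """
    # Score each individual product |W_ij * x_j| (broadcast over columns).
    scores = np.abs(W) * np.abs(x)
    k = max(1, int(effort * scores.size))
    # Threshold at the k-th largest score and zero out everything below it.
    threshold = np.partition(scores.ravel(), -k)[-k]
    mask = scores >= threshold
    return (W * mask) @ x

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
x = rng.standard_normal(512)

exact = W @ x
approx = effort_matvec(W, x, effort=0.3)
print("relative error:", np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```

The interesting part is that `effort` becomes a dial you could turn per query (or even per layer), trading a little accuracy for a lot less arithmetic — which is presumably where the compute and memory savings would come from.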