LLaMA Now Goes Faster on CPUs

from blog justine.lol, 1 Apr 2024 | ↗ original

I just wrote 84 new matrix multiplication kernels for llamafile that enable it to read prompts and images faster. Compared to llama.cpp, prompt eval time with llamafile should go anywhere between 30% and 500% faster when using F16 and Q8_0 weights on CPU. The improvements are most dramatic for ARMv8.2+ (e.g....

More from justine.lol

Weird Lexical Syntax

1 Nov 2024 | original ↗

I just learned 42 programming languages this month to build a new syntax highlighter for llamafile. I feel like I'm up to my eyeballs in programming languages right now. Now that it's Halloween, I thought I'd share some of the spookiest, most surprising syntax I've seen.

The Fastest Mutexes

2 Oct 2024 | original ↗

Imagine you have a workload where all your threads need to do a serialized operation. With Cosmo, if you're looking at htop, then it's going to appear like only one core is active, whereas glibc and musl libc will fill up your entire CPU meter. That's bad news if you're running a lot of jobs on the same server....

Cosmopolitan v3.9.2

22 Sept 2024 | original ↗

Cosmopolitan's Windows support may finally be feature complete. It's now possible to send signals between processes using kill() on Windows. Ten new torture test programs have been written to tease out more fixes and offer a high level of assurance that signal handling is correct. Some of these tests are good...

AI Training Shouldn't Erase Authorship

23 Aug 2024 | original ↗

In a world of infinite automation and infinite surveillance, survival is going to depend on being the least boring person. Over my career I've written and attached my name to thousands of public source code files. I know they are being scraped from the web and used to train AIs. But if I ask something like...

Bash One-Liners for LLMs

4 Dec 2023 | original ↗

I spent the last month working with Mozilla to launch an open source project called llamafile which is the new best way to run an LLM on your own computer. So far things have been going pretty smoothly. The project earned 5.6k stars on GitHub, 1073 upvotes on Hacker News, and received press coverage from ...
