Run DeepSeek R1 or V3 with MLX Distributed

from blog Simon Willison's Weblog, 22 Jan 2025 | ↗ original

Handy detailed instructions from Awni Hannun on running the enormous DeepSeek R1 or V3 models on a cluster of Macs using the distributed communication feature of Apple's MLX library. DeepSeek R1 quantized to 4-bit requires 450 GB of aggregate RAM, which can be achieved by a cluster of three 192 GB M2...
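The memory arithmetic behind that claim can be checked in a few lines. This is only a rough sketch: the 450 GB and 192 GB figures come from the summary above, while the variable names are illustrative and the calculation ignores memory used by the operating system and other processes.

```python
import math

# Figures quoted in the summary above.
required_gb = 450      # aggregate RAM for DeepSeek R1 quantized to 4-bit
per_machine_gb = 192   # RAM per machine in the cluster

# Smallest number of machines whose combined RAM covers the requirement.
machines_needed = math.ceil(required_gb / per_machine_gb)

# Combined RAM of that cluster and what is left over.
cluster_gb = machines_needed * per_machine_gb
headroom_gb = cluster_gb - required_gb

print(f"{machines_needed} machines give {cluster_gb} GB, leaving {headroom_gb} GB of headroom")
```

Two such machines would total 384 GB, which falls short of 450 GB, so three is the minimum under these assumptions.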

Monday's Trailheads

scattershot | original ↗

Layer-wise inferencing + batching: Small VRAM doesn't limit LLM throughput anymore

Languages and Architecture | original ↗

Calculating the size of all LFS files in a repo

Simon Willison TIL | original ↗

Parallelizing neural networks on one GPU with JAX

Will Whitney | original ↗

Llama 3.2: New Edge AI and Vision Models

Tao of Mac | original ↗

My issue with GPU-accelerated deep learning

Home on Erik Bernhardsson | original ↗

Improving on Gear Hashing with FastCDC

Joshleeb | original ↗

55 TOPS Raspberry Pi AI PC - 4 TPUs, 2 NPUs

Jeff Geerling's Blog | original ↗

IBM’s Machine Learning Accelerator at VLSI 2018

Real World Tech | original ↗

On AlphaTensor’s new matrix multiplication algorithms

The ryg blog | original ↗

More from Simon Willison's Weblog

The surprising way to save memory with BytesIO

31 Jan 2025 | original ↗

Itamar Turner-Trauring explains that if you have a BytesIO object in Python, calling .read() on it will create a full copy of its contents, doubling the amount of memory used, but calling .getvalue() returns a bytes object that uses no additional memory, instead using copy-on-write. .getbuffer() is...
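For reference, here is a minimal sketch of the three calls mentioned in that excerpt. It only shows how each one is used and what type it returns; it does not measure memory, and the memory behavior described above is the article's claim, which may vary between Python versions. The sample payload is made up for illustration.

```python
import io

# Hypothetical sample data, written into an in-memory binary buffer.
payload = b"x" * 1024
buf = io.BytesIO()
buf.write(payload)

# .read() returns bytes from the current position, so rewind first.
buf.seek(0)
via_read = buf.read()

# .getvalue() returns the whole contents as bytes, regardless of position.
via_getvalue = buf.getvalue()

# .getbuffer() returns a memoryview over the buffer; release it when done,
# because the BytesIO cannot be resized while the view is exported.
with buf.getbuffer() as view:
    view_length = len(view)
```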

Datasette Public Office Hours 31st Jan at 2pm Pacific

30 Jan 2025 | original ↗

We're running another Datasette Public Office Hours session on Friday 31st January at 2pm Pacific (more timezones here). We'll be featuring demos from the community again - take a look at the videos of the six demos from our last session for an idea of what to expect. If you have something you...

Quoting Ashlee Vance

30 Jan 2025 | original ↗

Eventually, however, HudZah wore Claude down. He filled his Project with the e-mail conversations he’d been having with fusor hobbyists, parts lists for things he’d bought off Amazon, spreadsheets, sections of books and diagrams. HudZah also changed his questions to Claude from general ones to more specific ones. This flood of information and...

PyPI now supports project archival

30 Jan 2025 | original ↗

Neat new PyPI feature, similar to GitHub's archiving repositories feature. You can now mark a PyPI project as "archived", making it clear that no new releases are planned (though you can switch back out of that mode later if you need to). I like the sound of these future plans around this topic: Project archival...

Mistral Small 3

30 Jan 2025 | original ↗

First model release of 2025 for French AI lab Mistral, who describe Mistral Small 3 as "a latency-optimized 24B-parameter model released under the Apache 2.0 license." More notably, they claim the following: Mistral Small 3 is competitive with larger models such as Llama 3.3 70B or Qwen 32B, and is an excellent open replacement...
