DeepSeek-R1 and exploring DeepSeek-R1-Distill-Llama-8B

From Simon Willison's Weblog, 20 Jan 2025

DeepSeek are the Chinese AI lab who dropped the best currently available open weights LLM on Christmas day, DeepSeek v3. That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base. There's a whole lot of stuff in the new release....

Can AI models reason: Just a stochastic parrot?

John D. Cook

What I learned from looking at 900 most popular open source AI tools

Chip Huyen

Everything I've learned so far about running local LLMs

null program

Llama 3.2: New Edge AI and Vision Models

Tao of Mac

What's new with ML in production

Vicki Boykis

ML in Go with a Python sidecar

Eli Bendersky's website

Can AI models reason like a human?

John D. Cook

What’s Good for the Goose, AI Training Edition

Daring Fireball

Deep learning for… Go

Erik Bernhardsson

Good Riddance, GPTBot

Matthias Ott

More from Simon Willison's Weblog

Datasette Public Office Hours 31st Jan at 2pm Pacific

30 Jan 2025

We're running another Datasette Public Office Hours session on Friday 31st January at 2pm Pacific (more timezones here). We'll be featuring demos from the community again - take a look at the videos of the six demos from our last session for an idea of what to expect. If you have something you...

Quoting Ashlee Vance

30 Jan 2025

Eventually, however, HudZah wore Claude down. He filled his Project with the e-mail conversations he’d been having with fusor hobbyists, parts lists for things he’d bought off Amazon, spreadsheets, sections of books and diagrams. HudZah also changed his questions to Claude from general ones to more specific ones. This flood of information and...

PyPI now supports project archival

30 Jan 2025

Neat new PyPI feature, similar to GitHub's archiving repositories feature. You can now mark a PyPI project as "archived", making it clear that no new releases are planned (though you can switch back out of that mode later if you need to). I like the sound of these future plans around this topic: Project archival...

Mistral Small 3

30 Jan 2025

First model release of 2025 for French AI lab Mistral, who describe Mistral Small 3 as "a latency-optimized 24B-parameter model released under the Apache 2.0 license." More notably, they claim the following: Mistral Small 3 is competitive with larger models such as Llama 3.3 70B or Qwen 32B, and is an excellent open replacement...

Quoting Antiqua et Nova

30 Jan 2025

104. Technology offers remarkable tools to oversee and develop the world's resources. However, in some cases, humanity is increasingly ceding control of these resources to machines. Within some circles of scientists and futurists, there is optimism about the potential of artificial general intelligence (AGI), a hypothetical form of AI that would...
