SmolLM2

From Simon Willison's Weblog
New from Loubna Ben Allal and her research team at Hugging Face: SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device. [...] It was trained on 11 trillion tokens using a diverse dataset...
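A minimal sketch of trying one of these models locally, assuming the instruct checkpoints are published on the Hugging Face Hub under the `HuggingFaceTB` organization (e.g. `HuggingFaceTB/SmolLM2-135M-Instruct`) and that you have the `transformers` library installed:

```python
from transformers import pipeline

# Assumed model id from the Hugging Face release; the 135M variant is
# small enough to run on CPU without a GPU.
pipe = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-135M-Instruct")

messages = [
    {"role": "user", "content": "Name three uses for a small on-device language model."}
]
out = pipe(messages, max_new_tokens=64)
print(out[0]["generated_text"])
```

The larger 360M and 1.7B variants follow the same naming pattern and API; swapping the model id is the only change needed.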