Quoting Jack Clark

from blog Simon Willison's Weblog, 18 Nov 2024 | ↗ original

The main innovation here is just using more data. Specifically, Qwen2.5 Coder is a continuation of an earlier Qwen 2.5 model. The original Qwen 2.5 model was trained on 18 trillion tokens spread across a variety of languages and tasks (e.g., writing, programming, question answering). Qwen 2.5-Coder sees them train this model on an additional 5.5...

Llama 3.2: New Edge AI and Vision Models

Tao of Mac | original ↗

Transformers for software engineers

Posts on Made of Bugs | original ↗

A very brief BitKnit retrospective

The ryg blog | original ↗

AI Transformer (LISP)

matt.sh | original ↗

Why Your Language Choice Doesn't Matter as Good Programmer

The Angry Dev | original ↗

Optimizing MiniZinc

Blog on Hillel Wayne | original ↗

50 programming languages in 58 days

Schemescape | original ↗

AI Transformer (QBASIC)

matt.sh | original ↗

How to Generate and Use Synthetic Data for Finetuning

Eugene Yan | original ↗

AI Transformer (Smalltalk)

matt.sh | original ↗

More from Simon Willison's Weblog

DeepSeek API Docs: Rate Limit

18 Jan 2025 | original ↗

This is surprising: DeepSeek offer the only hosted LLM API I've seen that doesn't implement rate limits: DeepSeek API does NOT constrain user's rate limit. We will try out best to serve every request. However, please note that when our servers are under high traffic pressure, your requests may take some time to...

Lessons From Red Teaming 100 Generative AI Products

18 Jan 2025 | original ↗

New paper from Microsoft describing their top eight lessons learned red teaming (deliberately seeking security vulnerabilities in) 100 different generative AI models and products over the past few years. The Microsoft AI Red Team (AIRT) grew out of pre-existing red teaming initiatives at the...

Quoting Greg Brockman

16 Jan 2025 | original ↗

Manual inspection of data has probably the highest value-to-prestige ratio of any activity in machine learning. (Greg Brockman, OpenAI, Feb 2023)

Quoting gwern

16 Jan 2025 | original ↗

[...] much of the point of a model like o1 is not to deploy it, but to generate training data for the next model. Every problem that an o1 solves is now a training data point for an o3 (eg. any o1 session which finally stumbles into the right answer can be refined to drop the dead ends and produce a clean transcript to train a more refined...

Datasette Public Office Hours Application

16 Jan 2025 | original ↗

We are running another Datasette Public Office Hours event on Discord tomorrow (Friday 17th January 2025) at 2pm Pacific / 5pm Eastern / 10pm GMT. The theme this time around is lightning talks - we're looking for 5-8 minute long talks from community members about projects they are...
