Trying out QvQ - Qwen's new visual reasoning model

From Simon Willison's Weblog, 24 Dec 2024

I thought we were done for major model releases in 2024, but apparently not: Alibaba's Qwen team just dropped the Apache 2.0 licensed QvQ-72B-Preview, "an experimental research model focusing on enhancing visual reasoning capabilities". Their blog post is titled QvQ: To See the World with Wisdom - similar flowery language to their QwQ announcement...

The Google Willow thing

Shtetl-Optimized | original ↗

I need the meat and potatoes

Birchtree | original ↗

MCTS and LLMs: what's the big deal?

seangoedecke.com RSS feed | original ↗

GQL:2024 is out

Peter Eisentraut | original ↗

May 2021 Gwern.net Newsletter

Gwern.net Newsletter | original ↗

Can AI models reason: Just a stochastic parrot?

John D. Cook | original ↗

Q2 2022

Szymon Kaliski | original ↗

What I learned from looking at 900 most popular open source AI tools

Chip Huyen | original ↗

Llama 3.2: New Edge AI and Vision Models

Tao of Mac | original ↗

What’s old is new and what’s new maybe isn’t what we want

Birchtree | original ↗

More from Simon Willison's Weblog

Gemini 2.0 is now available to everyone

5 Feb 2025 | original ↗

Big new Gemini 2.0 releases today: Gemini 2.0 Pro (Experimental) is Google's "best model yet for coding performance and complex prompts" - currently available as a free preview. Gemini 2.0 Flash is now generally available. Gemini 2.0 Flash-Lite looks particularly interesting: We’ve gotten a lot of positive...

o3-mini is really good at writing internal documentation

5 Feb 2025 | original ↗

I wanted to refresh my knowledge of how the Datasette permissions system works today. I already have extensive hand-written documentation for that, but I thought it would be interesting to see if I could derive any insights from running an LLM against the codebase. o3-mini has an input...

Ambsheets: Spreadsheets for exploring scenarios

5 Feb 2025 | original ↗

Delightful UI experiment by Alex Warth and Geoffrey Litt at Ink & Switch, exploring the idea of a spreadsheet with cells that can handle multiple values at once, which they call "amb" (for "ambiguous") values. A single sheet can then be used to model multiple scenarios. Here the cell for "Car"...

AI-Generated Slop Is Already In Your Public Library

5 Feb 2025 | original ↗

US libraries that use the Hoopla system to offer ebooks to their patrons sign agreements where they pay a license fee for anything selected by one of their members that's in the Hoopla catalog. The Hoopla catalog is increasingly filling up with junk AI slop ebooks like "Fatty Liver Diet...

Animating Rick and Morty One Pixel at a Time

4 Feb 2025 | original ↗

Daniel Hooper says he spent 8 months working on the post, the culmination of which is an animation of Rick from Rick and Morty, implemented in 240 lines of GLSL - the OpenGL Shading Language which apparently has been directly supported by browsers for many years. The result is a comprehensive GLSL...