Qwen: Extending the Context Length to 1M Tokens

From Simon Willison's Weblog, 18 Nov 2024 | ↗ original

The new Qwen2.5-Turbo boasts a million-token context window (up from 128,000 for Qwen 2.5) and faster performance: "Using sparse attention mechanisms, we successfully reduced the time to first token for processing a context of 1M tokens from 4.9 minutes to 68 seconds, achieving a 4.3x speedup." The...
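As a quick sanity check on the quoted figures, here is a minimal sketch of the speedup arithmetic. The function and variable names are my own for illustration; only the 4.9 minutes and 68 seconds come from the quote.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Return how many times faster the new timing is than the old one."""
    return before_seconds / after_seconds


# Figures from the quoted passage: time to first token for a 1M-token context.
before_seconds = 4.9 * 60  # 4.9 minutes expressed in seconds
after_seconds = 68.0

ratio = speedup(before_seconds, after_seconds)
print(f"{ratio:.1f}x")
```

The ratio rounds to the 4.3x stated in the announcement, so the quoted numbers are internally consistent.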


Llama 3.2: New Edge AI and Vision Models

Tao of Mac | original ↗

Jinja2 as a Pico-8 Preprocessor

GioCities | original ↗

Build and keep your context window

Vicki Boykis | original ↗

0015: imp internals, reflections, precedence, make mode, mutant, q3, error recovery, tonsky ui, subtext 10, factfulness, benchmarking advice, dependency hubs, independent research, zig wayland, retool, observable dependencies, ugly buildings, without scihub, wasm virtual memory, huawei breakdown, infrastructure langauges, stencil vectors, chiX

Scattered Thoughts | original ↗

Effort Engine

Tao of Mac | original ↗

[Update] Faster link time for Qt WebAssembly

Qt, linux and everything | original ↗

Optimizing MiniZinc

Blog on Hillel Wayne | original ↗

LLaMA Now Goes Faster on CPUs

justine.lol | original ↗

5000x faster CRDTs: An adventure in optimization

Seph | original ↗

Mixins Better for Performance

CSS Wizardry | original ↗

More from Simon Willison's Weblog

DeepSeek API Docs: Rate Limit

18 Jan 2025 | original ↗

This is surprising: DeepSeek offer the only hosted LLM API I've seen that doesn't implement rate limits: DeepSeek API does NOT constrain user's rate limit. We will try out best to serve every request. However, please note that when our servers are under high traffic pressure, your requests may take some time to...

Lessons From Red Teaming 100 Generative AI Products

18 Jan 2025 | original ↗

New paper from Microsoft describing their top eight lessons learned red teaming (deliberately seeking security vulnerabilities in) 100 different generative AI models and products over the past few years. The Microsoft AI Red Team (AIRT) grew out of pre-existing red teaming initiatives at the...

Quoting Greg Brockman

16 Jan 2025 | original ↗

Manual inspection of data has probably the highest value-to-prestige ratio of any activity in machine learning. — Greg Brockman, OpenAI, Feb 2023 Tags: machine-learning, openai, ai

Quoting gwern

16 Jan 2025 | original ↗

[...] much of the point of a model like o1 is not to deploy it, but to generate training data for the next model. Every problem that an o1 solves is now a training data point for an o3 (eg. any o1 session which finally stumbles into the right answer can be refined to drop the dead ends and produce a clean transcript to train a more refined...

Datasette Public Office Hours Application

16 Jan 2025 | original ↗

We are running another Datasette Public Office Hours event on Discord tomorrow (Friday 17th January 2025) at 2pm Pacific / 5pm Eastern / 10pm GMT / more timezones here. The theme this time around is lightning talks - we're looking for 5-8 minute long talks from community members about projects they are...
