Qwen: Extending the Context Length to 1M Tokens

from blog Simon Willison's Weblog, | ↗ original
Qwen: Extending the Context Length to 1M Tokens The new Qwen2.5-Turbo boasts a million token context window (up from 128,000 for Qwen 2.5) and faster performance: Using sparse attention mechanisms, we successfully reduced the time to first token for processing a context of 1M tokens from 4.9 minutes to 68 seconds, achieving a 4.3x speedup. The...