Trying out QvQ - Qwen's new visual reasoning model