Open WebUI
I tried out this open source (MIT licensed, JavaScript and Python) localhost UI for accessing LLMs today for the first time. It's very nicely done. I ran it with uvx like this:

    uvx --python 3.11 open-webui serve

On first launch it installed a bunch of dependencies and then downloaded 903MB to...
DeepSeek_V3.pdf
The DeepSeek v3 paper (and model card) are out, after yesterday's mysterious release of the undocumented model weights. Plenty of interesting details in here. The model was pre-trained on 14.8 trillion "high-quality and diverse tokens" (not otherwise documented). Following this, we conduct post-training, including Supervised...
Cognitive load is what matters
Excellent living document (the underlying repo has 625 commits since being created in May 2023) maintained by Artem Zakirullin about minimizing the cognitive load needed to understand and maintain software. This all rings very true to me. I judge the quality of a piece of code by how easy it is to change, and...
deepseek-ai/DeepSeek-V3-Base
No model card or announcement yet, but this new model release from Chinese AI lab DeepSeek (an arm of Chinese hedge fund High-Flyer) looks very significant. It's a huge model - 685B parameters, 687.9 GB on disk (TIL how to size a git-lfs repo). The architecture is a Mixture of Experts with 256 experts, using 8 per...
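The two size figures quoted above invite a quick sanity check. This is a minimal sketch, not anything from the model release itself: it assumes "GB" means decimal gigabytes (10^9 bytes) and that the on-disk size is dominated by the weights. The function name `bytes_per_parameter` is made up for illustration.

```python
# Rough storage-per-parameter estimate from the two figures quoted in the post.
PARAMS = 685e9            # 685B parameters (from the post)
SIZE_ON_DISK_GB = 687.9   # repository size on disk (from the post)
BYTES_PER_GB = 1e9        # assumption: decimal gigabytes, not GiB


def bytes_per_parameter(size_gb, params, bytes_per_gb=BYTES_PER_GB):
    """Return the average number of bytes stored per parameter."""
    return size_gb * bytes_per_gb / params


ratio = bytes_per_parameter(SIZE_ON_DISK_GB, PARAMS)
print(f"{ratio:.3f} bytes per parameter")  # prints "1.004 bytes per parameter"
```

Under those assumptions the repo works out to roughly one byte per parameter. That would be consistent with 8-bit weights, though the post does not state the storage format.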
Trying out QvQ - Qwen's new visual reasoning model
I thought we were done for major model releases in 2024, but apparently not: Alibaba's Qwen team just dropped the Apache 2 licensed QvQ-72B-Preview, "an experimental research model focusing on enhancing visual reasoning capabilities". Their blog post is titled QvQ: To See the World with Wisdom - similar flowery language to their QwQ announcement...