Open WebUI
More from Simon Willison's Weblog
DeepSeek_V3.pdf The DeepSeek v3 paper (and model card) are out, after yesterday's mysterious release of the undocumented model weights. Plenty of interesting details in here. The model was pre-trained on 14.8 trillion "high-quality and diverse tokens" (not otherwise documented). Following this, we conduct post-training, including Supervised...
Providers and deployers of AI systems shall take measures to ensure, to their best extent, a sufficient level of AI literacy of their staff and other persons dealing with the operation and use of AI systems on their behalf, taking into account their technical knowledge, experience, education and training and the context the AI systems are to be...
Cognitive load is what matters Excellent living document (the underlying repo has 625 commits since being created in May 2023) maintained by Artem Zakirullin about minimizing the cognitive load needed to understand and maintain software. This all rings very true to me. I judge the quality of a piece of code by how easy it is to change, and...
deepseek-ai/DeepSeek-V3-Base No model card or announcement yet, but this new model release from Chinese AI lab DeepSeek (an arm of Chinese hedge fund High-Flyer) looks very significant. It's a huge model - 685B parameters, 687.9 GB on disk (TIL how to size a git-lfs repo). The architecture is a Mixture of Experts with 256 experts, using 8 per...
I thought we were done with major model releases in 2024, but apparently not: Alibaba's Qwen team just dropped the Apache 2 licensed QvQ-72B-Preview, "an experimental research model focusing on enhancing visual reasoning capabilities". Their blog post is titled QvQ: To See the World with Wisdom - similar flowery language to their QwQ announcement...