deepseek-ai/DeepSeek-V3-Base
Related
More from Simon Willison's Weblog
Open WebUI I tried out this open source (MIT licensed, JavaScript and Python) localhost UI for accessing LLMs today for the first time. It's very nicely done. I ran it with uvx like this: uvx --python 3.11 open-webui serve On first launch it installed a bunch of dependencies and then downloaded 903MB to...
DeepSeek_V3.pdf The DeepSeek v3 paper (and model card) are out, after yesterday's mysterious release of the undocumented model weights. Plenty of interesting details in here. The model pre-trained on 14.8 trillion "high-quality and diverse tokens" (not otherwise documented). Following this, we conduct post-training, including Supervised...
Providers and deployers of AI systems shall take measures to ensure, to their best extent, a sufficient level of AI literacy of their staff and other persons dealing with the operation and use of AI systems on their behalf, taking into account their technical knowledge, experience, education and training and the context the AI systems are to be...
Cognitive load is what matters Excellent living document (the underlying repo has 625 commits since being created in May 2023) maintained by Artem Zakirullin about minimizing the cognitive load needed to understand and maintain software. This all rings very true to me. I judge the quality of a piece of code by how easy it is to change, and...
I thought we were done for major model releases in 2024, but apparently not: Alibaba's Qwen team just dropped the Apache2 2 licensed QvQ-72B-Preview, "an experimental research model focusing on enhancing visual reasoning capabilities". Their blog post is titled QvQ: To See the World with Wisdom - similar flowery language to their QwQ announcement...