deepseek-ai/DeepSeek-V3-Base

from blog Simon Willison's Weblog, | ↗ original
deepseek-ai/DeepSeek-V3-Base No model card or announcement yet, but this new model release from Chinese AI lab DeepSeek (an arm of Chinese hedge fund High-Flyer) looks very significant. It's a huge model - 685B parameters, 687.9 GB on disk (TIL how to size a git-lfs repo). The architecture is a Mixture of Experts with 256 experts, using 8 per...