The new chatbot from DeepSeek has made a significant impact in the AI market, introducing itself with the intriguing statement:
"Hi, I was created so you can ask anything and get an answer that might even surprise you."
This AI model has not only become a formidable competitor but also contributed to one of NVIDIA's largest stock price drops.
What distinguishes DeepSeek's model is its innovative architecture and training methods, which include:
Multi-token Prediction (MTP): Rather than predicting only the single next word, the model learns to predict several upcoming tokens at each step, improving both accuracy and training efficiency.
Mixture of Experts (MoE): The model contains 256 expert subnetworks, of which only eight are activated for each token. This sparse activation speeds up training and inference while retaining the capacity of a much larger model.
Multi-head Latent Attention (MLA): MLA compresses attention keys and values into a compact latent representation, sharply reducing memory use during inference while still letting each attention head focus on the most relevant parts of the input, so crucial details are less likely to be missed.
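To make the MoE idea concrete, here is a minimal sketch of top-k expert routing: a softmax gate scores all 256 experts for a token, and only the eight highest-scoring experts are activated, with their gate weights renormalized. The function names and toy logits are illustrative assumptions, not DeepSeek's actual implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(gate_scores, k=8):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# Toy example: 256 experts, one token's gate logits (all zero except a few)
scores = [0.0] * 256
for i, v in [(3, 5.0), (17, 4.0), (42, 3.5), (99, 3.0),
             (120, 2.5), (150, 2.0), (200, 1.5), (250, 1.0), (7, 0.5)]:
    scores[i] = v

selected = route_token(scores, k=8)
print([i for i, _ in selected])  # indices of the 8 active experts
```

Only the selected experts' feed-forward networks run for this token; the other 248 stay idle, which is why a sparse MoE model can train faster than a dense model of the same total parameter count.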
DeepSeek, a prominent Chinese startup, claims to have developed a competitive AI model at a minimal cost, stating that it spent only $6 million to train DeepSeek V3 on just 2,048 GPUs.
However, analysts from SemiAnalysis have revealed that DeepSeek operates a vast computational infrastructure, comprising around 50,000 Nvidia Hopper GPUs, including 10,000 H800 units, 10,000 H100s, and additional H20 GPUs. These resources are spread across multiple data centers and used for AI training, research, and financial modeling.
The company's total investment in servers is approximately $1.6 billion, with operational expenses estimated at $944 million.
DeepSeek is a subsidiary of the Chinese hedge fund High-Flyer, which established the startup as a separate AI-focused division in 2023. Unlike most startups that rely on cloud providers, DeepSeek owns its data centers, allowing full control over AI model optimization and faster innovation implementation. The company remains self-funded, enhancing its flexibility and decision-making speed.
Moreover, some researchers at DeepSeek earn over $1.3 million annually, attracting top talent from leading Chinese universities (the company does not hire foreign specialists).
Despite these investments, DeepSeek's claim of training its latest model for just $6 million seems unrealistic. This figure only accounts for GPU usage during pre-training and excludes research expenses, model refinement, data processing, and overall infrastructure costs.
Since its inception, DeepSeek has invested over $500 million in AI development. Its compact structure allows for active and effective implementation of AI innovations, unlike larger, more bureaucratic companies.
DeepSeek's example shows that a well-funded independent AI company can compete with industry leaders. However, experts note that the company's success is largely due to significant investments, technical breakthroughs, and a strong team, rather than a "revolutionary budget" for AI model development.
Still, DeepSeek's costs remain lower than those of its competitors. For instance, DeepSeek reportedly spent $5 million on R1, while OpenAI's GPT-4o is said to have cost around $100 million to train.