DeepSeek AI Model: A Top Open-Source Challenger

December 26, 2024

A New Contender in Open AI: DeepSeek V3

A Chinese lab has released what appears to be one of the most powerful “open” AI models available to date.

Model Release and Capabilities

Developed by the AI firm DeepSeek, the model, named DeepSeek V3, was released on Wednesday under a permissive license. The license allows developers to download and modify the model for most applications, including commercial ones.

DeepSeek V3 can handle a range of text-based tasks, including coding, translation, and writing essays and emails from a descriptive prompt.

Performance Benchmarks

According to DeepSeek’s internal benchmark testing, V3 outperforms both downloadable, openly available models and “closed” AI models that can be accessed only through an API.

In coding competitions hosted on Codeforces, a platform for programming contests, DeepSeek V3 outperformed other models, including Meta’s Llama 3.1 405B, OpenAI’s GPT-4o, and Alibaba’s Qwen 2.5 72B.

DeepSeek V3 also outperformed competing models on Aider Polyglot, a test designed to evaluate a model’s ability to integrate new code into existing projects.

Technical Specifications

DeepSeek states that the model was trained on a dataset of 14.8 trillion tokens. In data science, tokens represent chunks of raw data; 1 million tokens is equal to roughly 750,000 words.
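
As a rough illustration of that ratio, the sketch below converts the reported token count into an approximate word count. The words-per-token figure is the rule of thumb cited above rather than a measured property of DeepSeek’s tokenizer, and the helper name approx_words is hypothetical.

```python
# Rule of thumb from the article: about 750,000 words per 1 million tokens.
# This is an approximation, not a property of any specific tokenizer.
WORDS_PER_TOKEN = 750_000 / 1_000_000


def approx_words(tokens: float, words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Estimate a word count from a token count (hypothetical helper)."""
    return tokens * words_per_token


# 14.8 trillion tokens, the training-set size reported by DeepSeek.
training_tokens = 14.8e12

words = approx_words(training_tokens)
print(f"Approximate words in training set: {words / 1e12:.1f} trillion")
```

Under that assumption, the training set works out to roughly 11.1 trillion words.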

The model’s size is also substantial, boasting 671 billion parameters (or 685 billion on the Hugging Face AI development platform). Parameters are the internal variables models utilize for predictions and decision-making.

That is roughly 1.6 times the size of Llama 3.1 405B, which has 405 billion parameters. A higher parameter count generally correlates with better performance, but it also requires more powerful hardware to run.
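
The size comparison, and its hardware implication, can be checked with simple arithmetic. The sketch below computes the parameter ratio and a back-of-the-envelope estimate of the memory needed just to hold the weights. The bytes-per-parameter values are illustrative assumptions, not figures from DeepSeek, and the estimate ignores activations, caches, and other runtime overhead.

```python
# Parameter counts as reported in the article.
DEEPSEEK_V3_PARAMS = 671e9
LLAMA_31_405B_PARAMS = 405e9


def size_ratio(params_a: float, params_b: float) -> float:
    """How many times larger model A is than model B, by parameter count."""
    return params_a / params_b


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed to store the weights alone, in gigabytes (10^9 bytes).

    bytes_per_param is an assumed storage precision, e.g. 2 for 16-bit
    weights or 1 for 8-bit weights.
    """
    return params * bytes_per_param / 1e9


ratio = size_ratio(DEEPSEEK_V3_PARAMS, LLAMA_31_405B_PARAMS)
print(f"DeepSeek V3 is about {ratio:.2f}x the size of Llama 3.1 405B")

# Assumed precisions, for illustration only.
for label, bytes_per_param in (("16-bit", 2), ("8-bit", 1)):
    gb = weight_memory_gb(DEEPSEEK_V3_PARAMS, bytes_per_param)
    print(f"{label} weights: about {gb:,.0f} GB")
```

Even under the more compact assumption, the weights alone would occupy hundreds of gigabytes, which is why a model of this size calls for multi-GPU hardware.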

Training and Cost

Despite the model’s size, DeepSeek says it trained V3 in roughly two months using a data center equipped with Nvidia H800 GPUs. This is particularly noteworthy because the U.S. Department of Commerce recently restricted Chinese companies from procuring these GPUs.

The company reports a training cost of only $5.5 million, significantly lower than the development costs of models like OpenAI’s GPT-4.

Limitations and Censorship

However, DeepSeek V3 exhibits certain limitations. For example, it avoids responding to inquiries about sensitive topics like Tiananmen Square.

As a Chinese company, DeepSeek is subject to regulatory oversight. Its models are benchmarked by China’s internet regulator to ensure that their responses align with “core socialist values.” As a result, many Chinese AI systems decline to address topics that might displease regulators, such as speculation about the Xi Jinping regime.

DeepSeek and High-Flyer Capital Management

DeepSeek, which previously launched DeepSeek-R1 as a competitor to OpenAI’s o1 “reasoning” model, is backed by High-Flyer Capital Management.

High-Flyer is a Chinese quantitative hedge fund that leverages AI for its trading strategies. The firm constructs its own server clusters for model training, with a recent cluster reportedly containing 10,000 Nvidia A100 GPUs and costing approximately $138 million.

Founded by Liang Wenfeng, a computer science graduate, High-Flyer aims to achieve “superintelligent” AI through its DeepSeek organization.

The Future of AI Development

Liang has characterized closed-source AI as a “temporary” advantage, asserting that it “hasn’t stopped others from catching up.”

This sentiment suggests a continuing drive towards open-source AI development and increased competition within the field.
