DeepSeek Explained: How Open-Weight AI Is Disrupting the Industry in 2026

When DeepSeek-R1 was released on January 20, 2025, it erased an assumption the AI industry had held as near-certain: that frontier AI required massive capital expenditure. DeepSeek — a Chinese AI lab spun out of the hedge fund High-Flyer — trained a model matching OpenAI's o1 on reasoning benchmarks, built on a base model whose final training run reportedly cost $5.6 million, compared to the hundreds of millions spent by OpenAI and Anthropic. This guide explains how they did it, what the models can do, how to use them, and what the disruption means.

1. Who Is DeepSeek

DeepSeek is an AI research lab founded in 2023 and headquartered in Hangzhou, China. It was established by High-Flyer, a quantitative hedge fund known for its investment in computational infrastructure. DeepSeek's stated mission is to pursue artificial general intelligence through fundamental research rather than commercial product development — an unusual position for a private company.

Despite operating with a reported team of under 200 researchers (compared to thousands at OpenAI or Google), DeepSeek produced a series of models in 2024–2025 that matched or exceeded leading closed models on key benchmarks. Their research papers provide unusual transparency into their architectural choices, making them highly influential beyond the models themselves.

2. The Models: R1, V3, and the Family

2.1 DeepSeek-V3 (December 2024)

DeepSeek-V3 is a 671-billion-parameter Mixture-of-Experts (MoE) model, but only 37 billion parameters are active per forward pass. It was trained on 14.8 trillion tokens. V3 excels at coding, mathematical reasoning, and Chinese language tasks. Its final training run reportedly cost approximately $5.6 million in compute — a small fraction of the estimated cost of training GPT-4.
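The active-parameter count is what makes the MoE design pay off. A back-of-envelope sketch using the standard ≈6·N·D FLOPs approximation for transformer training (an approximation, not an official DeepSeek figure) shows how much compute the sparse activation saves versus a hypothetical dense model of the same total size:

```python
# Back-of-envelope training compute via the common ~6*N*D FLOPs rule,
# counting only the ~37B parameters active per token (the point of MoE).
active_params = 37e9
tokens = 14.8e12

flops = 6 * active_params * tokens
print(f"Approximate training compute: {flops:.2e} FLOPs")

# Hypothetical dense model with all 671B parameters active on every token:
dense_flops = 6 * 671e9 * tokens
print(f"Dense equivalent would need ~{dense_flops / flops:.0f}x more compute")
```

The ratio is simply 671/37 ≈ 18×: sparse activation buys a large total capacity at a small per-token cost.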

2.2 DeepSeek-R1 (January 2025)

R1 is a reasoning-focused model in the same vein as OpenAI's o1. Unlike V3, which is a general-purpose model, R1 uses a reinforcement learning approach to "think" through problems step by step before answering. DeepSeek open-sourced R1 under a permissive MIT-like licence, making it the first competitive reasoning model available for local deployment and commercial use without restrictions. R1 matched o1 on AIME 2024 math benchmarks and outperformed it on several coding benchmarks.

2.3 Distilled Variants

DeepSeek also released distilled versions of R1 (1.5B, 7B, 8B, 14B, 32B, and 70B parameters) fine-tuned from Llama and Qwen base models. The 14B distilled model outperforms GPT-4o mini on several benchmarks while running on a single consumer GPU (RTX 4090 or equivalent), making capable reasoning models accessible without cloud API dependence.

3. Benchmark Performance

| Benchmark                   | DeepSeek-R1 | OpenAI o1 | GPT-4o | Claude 3.5 Sonnet |
|-----------------------------|-------------|-----------|--------|-------------------|
| AIME 2024 (math)            | 79.8%       | 79.2%     | 9.3%   | 16.0%             |
| MATH-500                    | 97.3%       | 96.4%     | 76.6%  | 78.3%             |
| SWE-bench Verified (coding) | 49.2%       | 48.9%     | 38.8%  | 49.0%             |
| GPQA Diamond (science)      | 71.5%       | 75.7%     | 53.6%  | 65.0%             |
| Human evaluation (general)  | High        | High      | High   | High              |

R1 is competitive with o1 on mathematical reasoning and coding. It falls slightly behind on PhD-level science questions (GPQA Diamond), but it delivers these results at a fraction of the inference cost.

4. Architecture Innovations

4.1 Multi-head Latent Attention (MLA)

Standard multi-head attention stores a full key-value (KV) cache for all previous tokens, which grows linearly with context length and becomes extremely memory-intensive. MLA compresses key-value pairs through low-rank projection into a "latent" representation before computing attention. This reduces KV cache memory by 5–13× compared to standard attention, enabling longer context windows and more efficient inference without sacrificing model quality. MLA is now being adopted in models outside DeepSeek.
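A minimal NumPy sketch of the idea (illustrative dimensions and random weights, not DeepSeek's actual implementation): instead of caching full keys and values, only a low-rank latent vector per token is cached, and K/V are reconstructed from it when attention runs.

```python
import numpy as np

# Hypothetical dimensions chosen to make the memory saving visible.
d_model, d_latent, seq_len = 4096, 512, 8192

rng = np.random.default_rng(0)
W_down = rng.standard_normal((d_model, d_latent)) * 0.02   # compress hidden state
W_up_k = rng.standard_normal((d_latent, d_model)) * 0.02   # reconstruct keys
W_up_v = rng.standard_normal((d_latent, d_model)) * 0.02   # reconstruct values

h = rng.standard_normal((seq_len, d_model))                # hidden states

# Standard attention caches full K and V: 2 * seq_len * d_model values.
standard_cache = 2 * seq_len * d_model

# MLA caches only the latent: seq_len * d_latent values.
c = h @ W_down                                             # (seq_len, d_latent)
mla_cache = c.size

# K and V are recomputed from the cached latent at attention time.
K, V = c @ W_up_k, c @ W_up_v

print(f"KV cache reduction: {standard_cache / mla_cache:.0f}x")
```

With these toy dimensions the cache shrinks 16×; the real savings depend on the chosen latent dimension relative to the number of heads.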

4.2 DeepSeekMoE (Mixture of Experts)

DeepSeek's MoE architecture departs from standard MoE (a few large experts) by using a much larger number of finer-grained experts — V3 has 256 routed experts, of which only 8 are activated per token. Additionally, some experts are designated "shared" and are always active, ensuring consistent baseline capability regardless of routing. This design improves expert utilisation and reduces routing imbalance, a common failure mode in MoE systems.
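A scaled-down sketch of this routing pattern (toy sizes and random weights, purely illustrative): shared experts always run, while a router picks a top-k subset of the fine-grained routed experts per token.

```python
import numpy as np

# Toy configuration: V3 itself uses 256 routed experts with top-8 routing.
n_routed, n_shared, top_k, d = 16, 2, 4, 32

rng = np.random.default_rng(0)
experts = [rng.standard_normal((d, d)) * 0.05 for _ in range(n_routed + n_shared)]
router = rng.standard_normal((d, n_routed)) * 0.05

def moe_layer(x):
    """Route one token: top-k routed experts plus always-on shared experts."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]                          # top-k expert indices
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over top-k
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    for j in range(n_shared):                                  # shared experts skip routing
        out = out + x @ experts[n_routed + j]
    return out

token = rng.standard_normal(d)
y = moe_layer(token)
# Only top_k of the n_routed expert matrices ran for this token, so the
# active parameter count is a small fraction of the total parameter count.
```

The shared experts guarantee every token gets a consistent baseline transformation even when the router's choices are skewed.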

4.3 Group Relative Policy Optimisation (GRPO)

R1's reasoning capability was developed through GRPO, a reinforcement learning algorithm that evaluates multiple responses to the same question, ranks them by quality, and uses the relative differences to update the model — without needing a separate reward model. This simplification of the RL pipeline significantly reduced training costs while producing strong reasoning emergence.
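The core of GRPO's advantage computation fits in a few lines. In this sketch the rewards are made up (e.g. a programmatic checker scoring each sampled answer as correct or not); each answer's advantage is its reward relative to the group, so no learned reward model is needed:

```python
import numpy as np

# Hypothetical rewards for 8 sampled answers to one prompt (1 = correct).
rewards = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0])

# Group-relative advantage: deviation from the group mean, normalised by
# the group standard deviation (small epsilon guards against zero std).
advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Better-than-average answers get positive advantage (reinforced);
# worse-than-average answers get negative advantage (discouraged).
print(advantages.round(2))
```

These per-answer advantages then weight the policy-gradient update, replacing the separate critic/reward model used in classic RLHF pipelines.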

4.4 FP8 Training

DeepSeek-V3 was trained using 8-bit floating-point (FP8) precision for most operations, previously considered too imprecise for stable large-model training. By implementing layer-level and tensor-level scaling strategies, DeepSeek achieved numerically stable FP8 training, roughly doubling compute efficiency compared to BF16 training on the same hardware.
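A crude NumPy simulation of per-tensor scaling (illustrative only — real FP8 training uses hardware cast instructions, not this rounding stand-in): the tensor is scaled into FP8's representable range before quantisation, then rescaled, which keeps the relative error bounded despite the tiny mantissa.

```python
import numpy as np

# FP8 E4M3 tops out at 448, so tensors are scaled into that range first.
FP8_MAX = 448.0

def fp8_quantize(x):
    """Simulate an FP8 round-trip with per-tensor scaling (3 mantissa bits)."""
    scale = FP8_MAX / np.abs(x).max()
    scaled = x * scale
    # Crude stand-in for E4M3 rounding: quantise to 3 mantissa bits.
    exp = np.floor(np.log2(np.abs(scaled) + 1e-30))
    mantissa_step = 2.0 ** (exp - 3)
    quantized = np.round(scaled / mantissa_step) * mantissa_step
    return quantized / scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
w_q = fp8_quantize(w)

rel_err = np.abs(w - w_q).mean() / np.abs(w).mean()
print(f"Mean relative error after simulated FP8 round-trip: {rel_err:.3%}")
```

The error stays in the low single-digit percent range; the engineering challenge DeepSeek solved was keeping such errors from accumulating across thousands of training steps.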

5. The Cost Breakthrough Explained

DeepSeek's $5.6 million training cost claim requires context. This figure covers the final training run only — not preceding research, ablation experiments, failed runs, or the infrastructure investment. Nevertheless, it is dramatically below what comparable models cost to train at US labs.

Several factors explain the gap: (1) Architectural efficiency — MLA and MoE reduce compute per token materially; (2) FP8 training doubles hardware utilisation; (3) Engineering optimisation — DeepSeek published extensive pipeline parallelism and communication optimisation techniques; (4) Lower labour costs — Chinese researchers are paid significantly less than Silicon Valley counterparts; (5) Fewer dead-end experiments — DeepSeek concentrated resources on architectures proven through prior work.

The implication is profound: the assumption that only companies with billion-dollar compute budgets could train frontier models is broken. DeepSeek's cost efficiency has forced a re-evaluation of AI moats throughout the industry.

6. Open-Weight Licensing

DeepSeek releases model weights under a licence that permits free commercial use, modification, and distribution — with restrictions on using the models to train other commercial models (an anti-distillation clause targeting OpenAI-scale competitors) and compliance with Chinese regulatory requirements.

For most enterprise and developer use cases, the licence is effectively permissive. You can run DeepSeek models on your own infrastructure, fine-tune them on proprietary data, and deploy them in products without paying API fees. This makes DeepSeek models practically attractive beyond their benchmark performance.

7. How to Run DeepSeek Locally

7.1 Using Ollama (Recommended for Simplicity)

# Install Ollama (macOS, Linux, Windows)
# https://ollama.com/download

# Pull and run DeepSeek-R1 7B (requires ~8 GB VRAM or 16 GB RAM)
ollama run deepseek-r1:7b

# For better quality: 14B model (~16 GB VRAM / 32 GB RAM)
ollama run deepseek-r1:14b

# For maximum quality: 32B model (~35 GB VRAM or 64 GB RAM)
ollama run deepseek-r1:32b

# Once running, chat in the terminal or access via API:
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:14b",
  "stream": false,
  "messages": [{"role":"user","content":"Explain gradient descent"}]
}'

7.2 Using Python with Ollama API

import requests
import json

def ask_deepseek(question: str, model: str = "deepseek-r1:14b") -> str:
    """Query a local DeepSeek model via Ollama."""
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": model,
            "messages": [{"role": "user", "content": question}],
            "stream": False
        },
        timeout=120
    )
    response.raise_for_status()  # surface connection or model-load errors early
    return response.json()["message"]["content"]

# Example
answer = ask_deepseek("Write a Python function to find all prime numbers up to n using the Sieve of Eratosthenes.")
print(answer)

8. Using the DeepSeek API

DeepSeek provides a cloud API compatible with the OpenAI SDK, making migration trivial. As of early 2026, DeepSeek API pricing is approximately $0.14 per million input tokens and $0.28 per million output tokens for V3 — roughly 18× cheaper than GPT-4o on input tokens and around 35× cheaper on output.
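A quick cost sanity check at the listed V3 rates (the monthly token volumes below are hypothetical, chosen only to illustrate the arithmetic):

```python
# Listed V3 prices, USD per million tokens.
input_price, output_price = 0.14, 0.28

# Hypothetical monthly volume: 500M input tokens, 100M output tokens.
monthly_in, monthly_out = 500e6, 100e6

cost = (monthly_in / 1e6) * input_price + (monthly_out / 1e6) * output_price
print(f"Estimated monthly API cost: ${cost:.2f}")
```

At this volume the bill is under $100/month — a workload that would cost well over $1,000/month at typical closed-model rates.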

from openai import OpenAI  # DeepSeek API is OpenAI-compatible

client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com/v1"
)

response = client.chat.completions.create(
    model="deepseek-chat",       # V3 general model
    # model="deepseek-reasoner", # R1 reasoning model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are the key differences between MoE and dense transformer architectures?"}
    ],
    max_tokens=1024,
    temperature=0.7
)
print(response.choices[0].message.content)

9. DeepSeek vs. GPT-4o vs. Claude 3.7

| Attribute           | DeepSeek-R1                  | DeepSeek-V3       | GPT-4o               | Claude 3.7 Sonnet |
|---------------------|------------------------------|-------------------|----------------------|-------------------|
| Type                | Reasoning (chain-of-thought) | General purpose   | General purpose      | General / coding  |
| Parameters          | 671B (37B active)            | 671B (37B active) | Unknown (~200B est.) | Unknown           |
| Open weight         | Yes (MIT-like)               | Yes (MIT-like)    | No                   | No                |
| API cost (input/1M) | $0.55                        | $0.14             | $2.50                | $3.00             |
| Local deployment    | Yes (distilled variants)     | Yes (full)        | No                   | No                |
| Context window      | 128K                         | 128K              | 128K                 | 200K              |
| Math reasoning      | Excellent                    | Very good         | Good                 | Very good         |
| Coding              | Excellent                    | Excellent         | Excellent            | Excellent         |
| Data privacy        | Local possible               | Local possible    | Cloud only           | Cloud only        |

10. Industry & Geopolitical Impact

The release of DeepSeek-R1 in January 2025 triggered one of the largest single-day market cap losses in history: Nvidia lost approximately $593 billion in market value in one trading session as investors re-evaluated the assumption that ever-increasing GPU demand was structurally guaranteed. If efficient models could match expensive ones, the CapEx arms race thesis underlying Nvidia's valuation came into question.

More broadly, DeepSeek's success created three industry-level effects: First, it validated that the open-weight model ecosystem could reach frontier quality, reinvigorating the Llama/Mistral/DeepSeek open ecosystem. Second, it pressured closed API providers to cut prices — OpenAI, Google, and Anthropic all reduced API costs in the months following DeepSeek's release. Third, it intensified US-China AI rivalry: the US government subsequently tightened chip export controls to China, while Chinese AI labs accelerated investment to close remaining capability gaps.

11. Limitations and Considerations

  • Content filtering: DeepSeek models apply Chinese regulatory content restrictions. Topics related to Tiananmen Square, Taiwan independence, Tibet, and criticism of the Chinese government are filtered in API responses. Local deployments of the open-weight models avoid the API-level filtering, though some refusals appear to be trained into the weights themselves.
  • Data privacy: DeepSeek's cloud API routes data through servers subject to Chinese law, which may require cooperation with government data requests. For sensitive enterprise data, local deployment of open-weight models is strongly preferred.
  • Reasoning verbosity: R1's chain-of-thought output can be extremely long — the model "thinks aloud" before answering. This increases token costs compared to a single-pass model. Some applications need to strip or summarise the reasoning output.
  • Multilingual performance: V3 and R1 perform best in Chinese and English. Performance in other languages, while generally capable, lags behind the top models for non-Latin scripts.
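On the reasoning-verbosity point above: when serving the open-weight R1 locally (e.g. via Ollama), the chain of thought typically arrives wrapped in <think>...</think> tags in the raw output, so applications can strip it before display. A sketch — note the exact tag format can vary by serving stack:

```python
import re

def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> blocks, returning only the final answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

# Hypothetical raw model output for illustration.
raw = "<think>The user wants 2+2. That is 4.</think>2 + 2 = 4."
print(strip_reasoning(raw))  # → 2 + 2 = 4.
```

If you need the reasoning for logging or debugging, capture it separately before stripping rather than discarding it outright.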

12. Frequently Asked Questions

Is DeepSeek really open source?

The model weights are publicly available and can be downloaded, modified, and deployed commercially — which is what "open-weight" means. The training code and full training data are not published. This is the same distinction that applies to Meta's Llama models. The licence is permissive for most uses but not fully OSI-approved "open source."

Is it safe to use DeepSeek for enterprise data?

For sensitive data, use the open-weight models on your own infrastructure via Ollama, vLLM, or similar. This completely eliminates data transmission to DeepSeek's or any cloud servers. The API should only be used with data you would be comfortable storing on a third-party cloud service subject to Chinese law.

How does DeepSeek-R1 compare to OpenAI's o3?

OpenAI's o3 (released in April 2025) significantly outperforms R1 on the hardest reasoning benchmarks (ARC-AGI, GPQA Diamond). However, o3 is far more expensive — $10–$60 per million output tokens depending on mode. For the vast majority of reasoning tasks, R1 delivers adequate quality at 20–100× lower cost. o3 is best reserved for genuinely hard scientific reasoning or complex mathematical problems where R1 falls short.

13. Glossary

Open-weight model
A model whose trained weights are publicly released for download and local deployment, as opposed to closed models only accessible via API.
Mixture of Experts (MoE)
An architecture where different subsets of model parameters ("experts") are activated for different inputs, allowing large parameter counts without proportional compute costs.
Multi-head Latent Attention (MLA)
DeepSeek's innovation that compresses key-value caches through low-rank projection, reducing memory requirements by 5–13× versus standard attention.
GRPO (Group Relative Policy Optimisation)
DeepSeek's reinforcement learning algorithm that uses relative quality rankings of multiple sampled responses to train reasoning without a separate reward model.
KV Cache
The cached key-value pairs from previous tokens in a transformer's attention computation, enabling efficient autoregressive generation.
FP8 Training
Using 8-bit floating-point arithmetic for training computations, reducing memory and compute requirements versus 16-bit (BF16) training.

14. Try It Now

Run ollama run deepseek-r1:7b. On most machines it downloads in under five minutes and runs locally with no API key, no cost, and complete data privacy.