🦉 WE READ 439 OWNER COMMENTS
DeepSeek R1: what owners actually say
Owners praise DeepSeek R1's open-source reasoning power but run into speed bottlenecks, censorship on sensitive topics, and missing core features such as function calling.
What owners complain about
- Slow local inference COMMON
Owners running distilled models locally report dramatically slower responses than cloud alternatives: one user waited 2 minutes for an answer to 'highest peak in California', versus 6 seconds on OpenAI o1. The visible chain-of-thought adds significant latency before any answer appears.
- Censorship on Chinese political topics COMMON
Multiple users report the cloud version refuses to discuss Tiananmen Square, responding with 'I am sorry, I cannot answer that question.' One user noted the model produced a full internal reasoning trace about Tiananmen, then deleted it and output a deflection saying it couldn't approach the topic.
- No function calling or structured outputs SOME
Owners note that DeepSeek R1 does not natively support function calling or structured outputs as of the current release. The team has acknowledged that these features are planned for a future major update but has given no timeline.
- Distilled small models degrade significantly SOME
Users report that the 8B distilled model fails tasks the full model handles: one user got 'much worse' results on a number reasoning puzzle with the 8B Llama distill, and small models (12B) produced noticeably lower-quality jokes and stories.
- Not superior for actual coding work FEW
One developer who extensively compared R1/o1/Sonnet for code generation concluded Claude 3.5 Sonnet still makes fewer mistakes and remains the best for practical coding, questioning whether the reasoning/thinking process adds real value for development tasks.
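Because owners report no native structured outputs, one client-side workaround is to ask for JSON in the prompt and validate the reply yourself. The sketch below is illustrative only: it assumes the reply is plain text that may carry a `<think>...</think>` block before the answer, and the helper names `split_reasoning` and `parse_json_answer` are hypothetical, not part of any DeepSeek API.

```python
import json
import re

# Assumption: the reasoning trace is wrapped in <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def split_reasoning(reply: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a reply that may contain a think block."""
    match = THINK_BLOCK.search(reply)
    if match is None:
        return "", reply.strip()
    reasoning = match.group(0)[len("<think>"):-len("</think>")].strip()
    answer = (reply[:match.start()] + reply[match.end():]).strip()
    return reasoning, answer


def parse_json_answer(reply: str, required_keys: set[str]) -> dict:
    """Extract the first JSON object from the answer and check required keys."""
    _, answer = split_reasoning(reply)
    start = answer.find("{")
    if start == -1:
        raise ValueError("no JSON object found in the answer")
    # raw_decode parses one JSON value and ignores any trailing text.
    obj, _ = json.JSONDecoder().raw_decode(answer[start:])
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    missing = required_keys - set(obj)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return obj


# Hypothetical reply text, written by hand for illustration.
reply = '<think>Hmm, Mount Whitney?</think>\n{"peak": "Mount Whitney"} Hope that helps.'
result = parse_json_answer(reply, {"peak"})
```

Validation like this only catches malformed or incomplete replies; it does not make the model more likely to produce valid JSON in the first place, so a retry loop is usually still needed.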
What owners love
- Exceptional math and benchmark performance
Owners highlight 97.3% on MATH-500 and a 2029 Codeforces rating. One user tested 2024 Putnam Exam questions and found R1 gave satisfactory answers where Claude 3.5 Sonnet, GPT-4o, and o1 all failed.
- Open-source and freely available
Multiple users praise the open-weight release: the 671B MoE model (37B active parameters per token) can be run locally via Ollama, and distillations are available in multiple sizes (8B, 14B, 70B) for various hardware.
- Innovative pure RL training approach
Technically minded owners find the pure reinforcement learning approach without supervised fine-tuning genuinely fascinating and differentiated from competitors, calling it potentially revolutionary for the field.
- Intellectual honesty about uncertainty
One user was impressed the model acknowledged it didn't know details about obscure data structures rather than hallucinating — a notable behavior for a smaller model.
- Cost-effective compute via MoE
Owners appreciate the selective activation design, in which only 37B of the 671B parameters are active for each token: inputs are routed to the most relevant expert networks, which keeps compute per token well below that of a dense model of the same size.
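To put the 37B-of-671B figure in perspective, the arithmetic and the general idea of expert routing can be sketched as follows. This is a generic toy top-k router, not DeepSeek's actual gating code; the function `route_top_k`, the gate scores, and the choice of `k=2` are all assumptions made for illustration.

```python
import math

TOTAL_PARAMS_B = 671   # total parameters in billions, as quoted by owners
ACTIVE_PARAMS_B = 37   # active parameters in billions, as quoted by owners

# Share of the model that does work on any one step: about 5.5%.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B


def route_top_k(gate_scores: list[float], k: int) -> dict[int, float]:
    """Toy router: keep the k highest-scoring experts and softmax their scores."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    top = ranked[:k]
    peak = max(gate_scores[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(gate_scores[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Four hypothetical experts; only the two best-scoring ones receive the input.
weights = route_top_k([0.1, 2.0, -1.0, 1.5], k=2)
print(f"active fraction: {active_fraction:.1%}")
print(weights)
```

The experts left out of `weights` are never evaluated for that input, which is where the compute saving comes from. All 671B parameters still have to be stored, so memory requirements do not shrink in the same way.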
Surprising patterns
- Censorship appears version-dependent: users running smaller distilled models locally (7B, 14B via Ollama) report that the model freely discusses Tiananmen Square, while the cloud-hosted full model consistently refuses. This suggests the filtering is applied at the serving layer rather than baked into the weights.
- The visible <think> chain-of-thought is described as entertaining in its own right — users share the model's internal monologue (e.g., 'Hmm, I'm trying to figure out what exactly is going on') almost as a feature, treating it as a window into the model's process even when the final answer is wrong or mediocre.
- Several users discovered that running distilled models through different toolchains (raw GGUF via Ollama vs. Docker-served Open WebUI) produced noticeably different quality and speed, suggesting the hosting setup matters as much as the model choice for practical use.
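Anyone wanting to check the toolchain differences on their own hardware could time the same prompt against each setup. The harness below is a minimal sketch: `median_latency` and `fake_backend` are hypothetical names, and `fake_backend` is a stand-in that should be replaced with a real call to whichever setup is being compared.

```python
import statistics
import time
from typing import Callable


def median_latency(
    generate: Callable[[str], str],
    prompt: str,
    runs: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Median wall-clock seconds for generate(prompt) over several runs."""
    samples = []
    for _ in range(runs):
        start = clock()
        generate(prompt)  # includes any time the model spends on its think trace
        samples.append(clock() - start)
    return statistics.median(samples)


def fake_backend(prompt: str) -> str:
    # Hypothetical stand-in; swap in a real request to the setup under test.
    return "<think>...</think> Mount Whitney"


seconds = median_latency(fake_backend, "highest peak in California")
print(f"median latency: {seconds:.4f} s")
```

Using the median rather than the mean keeps a single slow first run, such as one that includes model loading, from dominating the comparison.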
WHO SHOULD SKIP IT
Developers who need function calling, structured outputs, or fast local inference on modest hardware should wait. R1 currently lacks production API features and its smaller distills degrade sharply, which makes it more of an experimental and research tool than a drop-in replacement for Claude or GPT in daily workflows.
Synthesised from 439 real owner comments across 6 platforms. Every point is grounded in the comments, with no marketing and no AI guessing.