This was the week China’s AI ecosystem stopped hedging on self-sufficiency. DeepSeek confirmed its upcoming V4 model will run exclusively on Huawei silicon: no NVIDIA, no fallback. Zhipu AI shipped GLM-5.1 under an MIT license, an open-weight model that outperforms Claude Opus 4.6 and GPT-5.4 on long-running coding benchmarks. And Alibaba was unmasked as the anonymous creator of HappyHorse-1.0, a video generation model that took the #1 spot on every major leaderboard the day it appeared. For enterprise teams tracking the global AI landscape, this is a China AI Weekly worth reading carefully.


DeepSeek V4: A Trillion Parameters on Domestic Silicon

The headline story is strategic, not technical. DeepSeek V4, a ~1 trillion parameter Mixture-of-Experts model with a reported 1 million token context window, will launch in late April running exclusively on Huawei’s Ascend 950PR chips. DeepSeek collaborated directly with Huawei and Cambricon, rewriting its codebase for the Ascend architecture. NVIDIA and AMD were explicitly excluded.

The model activates 32–37 billion parameters per token via MoE routing, and early API stress tests of a V4-Lite variant are already underway. Huawei claims 1.8x faster inference speeds versus prior hardware generations.
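The reason a ~1 trillion parameter model only activates ~32–37 billion per token is top-k expert routing: a small gating network picks a handful of expert sub-networks for each token, and only those run. The toy sketch below illustrates the mechanism; the shapes, expert count, and gating details are illustrative assumptions, not DeepSeek’s actual architecture.

```python
import numpy as np

def moe_route(x, gate_w, experts, k=2):
    """Toy top-k MoE routing: only k of len(experts) experts run per
    token, which is why total parameters can dwarf active parameters.
    Shapes and numbers here are illustrative, not DeepSeek V4's."""
    scores = x @ gate_w                          # router logits, one per expert
    top = np.argsort(scores)[-k:]                # indices of the k best experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                     # softmax over the selected k
    # Only the chosen experts do any compute; the rest stay idle.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
# Eight tiny "experts", each a linear map; a real model uses FFN blocks.
experts = [(lambda x, W=rng.normal(size=(4, 4)): x @ W) for _ in range(8)]
gate_w = rng.normal(size=(4, 8))
out = moe_route(rng.normal(size=4), gate_w, experts)
```

With k=2 of 8 experts, only a quarter of the expert parameters touch any given token; scale the same ratio up and a trillion-parameter model does frontier-class inference at a ~35B active-parameter cost.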

This is not an experiment. Alibaba, ByteDance, and Tencent have placed orders for hundreds of thousands of Ascend 950PR chips for production AI workloads, per The Information. A frontier-class model running at scale on entirely domestic hardware is what the Chinese AI ecosystem has been building toward since the first US export controls landed. If V4 delivers competitive benchmarks, the case for a bifurcated global AI hardware stack becomes much harder to dismiss.

Enterprise takeaway: If your organization relies on NVIDIA’s dominance for supply chain planning or competitive moat assumptions, this is a data point that warrants attention. The “China is 2-3 years behind on chips” narrative is being actively tested.


Zhipu AI Ships GLM-5.1: Open-Source, Long-Running, Agentic

Zhipu AI released GLM-5.1 under an MIT license: a 754 billion parameter model specifically designed for long-running agentic coding tasks. The headline claims are aggressive: 8-hour autonomous execution, strategy rethinking across hundreds of iterations, and more than 6,000 tool calls per session.

On SWE-Bench Pro, Zhipu reports GLM-5.1 outperforms both Claude Opus 4.6 and GPT-5.4. The model features a 128K context window and “interleaved thinking”, a step-by-step reasoning approach where the model can abandon and restart strategies mid-execution. It is available now on Hugging Face, ModelScope, and Zhipu’s API.
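The control flow behind “abandon and restart strategies mid-execution” can be pictured as a nested loop: work a strategy step by step, and on a dead end, discard it and move to the next rather than pushing a failing plan to completion. This is a hypothetical sketch of that pattern; the function names and signals are mine, not Zhipu’s implementation.

```python
def run_agent(strategies, attempt, max_steps=100):
    """Toy 'interleaved thinking' loop. `attempt(strategy, step)` is a
    hypothetical callback returning "done", "continue", or "dead_end".
    On "dead_end" the current strategy is abandoned mid-execution and
    the agent restarts with the next candidate strategy."""
    for strategy in strategies:
        for step in range(max_steps):
            result = attempt(strategy, step)
            if result == "done":
                return strategy, step        # task solved
            if result == "dead_end":
                break                        # abandon; try next strategy
    return None, None                        # all strategies exhausted

# Illustrative run: strategy "A" hits a dead end immediately,
# strategy "B" succeeds on its third step.
def attempt(strategy, step):
    if strategy == "A":
        return "dead_end"
    return "done" if step == 2 else "continue"

winner, step = run_agent(["A", "B"], attempt)
```

The interesting property is that failure is cheap: a bad plan costs a few steps, not the whole session, which is what makes multi-hour autonomous runs with thousands of tool calls plausible.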

Zhipu also launched GLM-5V-Turbo on April 1, a native multimodal vision-coding model that generates code from images and videos. And in a business signal worth noting, the company raised API prices by 10% (the second increase this year) while remaining below Western rival pricing.

Enterprise takeaway: An MIT-licensed 754B model with competitive coding benchmarks is a serious addition to the open-weight landscape. Teams evaluating alternatives to proprietary coding agents should put GLM-5.1 on the shortlist, but verify the SWE-Bench Pro claims independently (see the benchmark section below).


Alibaba’s HappyHorse: The Anonymous #1 Video Model

On April 7, a model called HappyHorse-1.0 appeared on the Artificial Analysis Video Arena and immediately took #1 in both text-to-video and image-to-video categories, holding a 116-point lead over the previous top model. No team was credited. Two days later, The Information reported that Alibaba was behind it.

The technical approach is distinctive: a 15 billion parameter unified single-stream transformer (not diffusion) that generates native 1080p video with synchronized audio, covering dialogue, ambient sound, and Foley effects, in a single forward pass. It supports seven languages with phoneme-level lip sync and reportedly generates 1080p output in ~38 seconds on a single H100.

No weights, API, or reproducible demo have been released yet. Alibaba has indicated open-source releases are “coming soon.” The anonymous launch pattern, seen previously with Pony Alpha and GLM-5, is becoming a recurring strategy in Chinese AI: prove the benchmark first, reveal the brand second.

Enterprise takeaway: If HappyHorse weights ship as open-source, it could significantly lower the cost of enterprise video generation. The joint audio-video architecture is particularly relevant for teams building multilingual content pipelines.


Kimi K2.5 Powers Cursor’s Composer 2

Moonshot AI’s Kimi K2.5 is now the base model behind Cursor’s Composer 2 agent engine, which launched with Cursor 3 on April 2. Composer 2 scores 61.3 on CursorBench, a 39% improvement over its predecessor, and outperforms Claude Opus 4.6 on those tasks at approximately 90% lower cost per token. Cursor co-founder Aman Sanger acknowledged the Kimi base after initially omitting it from launch materials.

This is a meaningful commercial validation. Kimi K2.5 is not just competitive on benchmarks; it is the production model powering one of the most widely used AI coding tools in the world.


The Benchmark Credibility Problem

The week’s most sobering finding came from SWE-Rebench, a decontaminated version of SWE-bench using fresh GitHub tasks that no model has seen in training. Chinese models that posted competitive scores on the original SWE-bench saw dramatic drops on the clean version, while Western models maintained their performance.
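The core idea behind decontamination is simple: keep only benchmark tasks created after a model’s training cutoff, so no task could have leaked into the training data. The sketch below illustrates the filter; the task schema and dates are invented for illustration and are not SWE-Rebench’s actual pipeline.

```python
from datetime import date

def decontaminate(tasks, training_cutoff):
    """Keep only tasks whose source issue postdates the model's
    training cutoff. Field names ('id', 'created') are illustrative,
    not SWE-Rebench's actual schema."""
    return [t for t in tasks if t["created"] > training_cutoff]

tasks = [
    {"id": "old-1",   "created": date(2023, 5, 1)},   # likely in training data
    {"id": "fresh-1", "created": date(2025, 3, 12)},  # postdates the cutoff
]
clean = decontaminate(tasks, training_cutoff=date(2024, 10, 1))
# Only "fresh-1" survives the filter.
```

A score gap between the original and the filtered set is the tell: it suggests the model memorized overlapping tasks rather than learned the underlying skill.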

The implication: some Chinese model benchmarks may reflect training data overlap rather than genuine software engineering capability. On the Pencil Puzzle Benchmark, a novel reasoning test, the gap was even starker: DeepSeek scored 2%, Qwen 3.5 scored 0.7%, and Kimi K2 scored 6%, compared to GPT-5.2 at 56% and Claude Opus 4.6 at 36.7%.

This does not invalidate the real progress happening in Chinese AI. But it does mean enterprise teams should weight decontaminated benchmarks more heavily when evaluating models for production use, regardless of origin.


Industry and Policy

China issued draft regulations on AI digital humans on April 3. The Cyberspace Administration’s new rules require prominent “digital human” identification marks, ban use for evading biometric authentication, mandate explicit consent for likeness and voice data, and restrict AI companion content targeting minors. Public consultation runs through May 6, with interim measures on emotional interaction services taking effect July 15.

Separately, a broader trend is emerging around Hugging Face contributions. China now surpasses the US in public model uploads to the platform, with the ecosystem reaching 11 million users and more than 2 million public models. The open-weight pipeline from Chinese labs shows no signs of slowing.


What to Watch

  • DeepSeek V4 launch (late April). The first frontier-class model running exclusively on domestic Chinese silicon. Benchmark performance versus GPT-5.4 and Claude Opus 4.6, especially on decontaminated tests like SWE-Rebench, will determine whether the Huawei bet pays off technically, not just strategically.
  • HappyHorse open-source release. Alibaba promised weights on GitHub and Hugging Face. If delivered, it could reshape the open video generation landscape and put significant pressure on proprietary alternatives.
  • Digital human regulation finalization. The May 6 consultation deadline will shape how Chinese companies deploy AI avatars, companions, and virtual influencers, a market where China leads globally. Enterprises with cross-border AI deployments should track the final language.

That is this week’s China AI Weekly. The through-line is clear: China’s AI ecosystem is not just building competitive models; it is building the infrastructure to run them independently. Whether that independence extends to benchmark credibility remains the open question. Check back next week for the latest.