# What political censorship looks like inside an LLM's weights (Qwen 3.5)

What they're not telling you: Alibaba's Qwen 3.5 language model contains a surgically precise three-vector circuit in layers 11–31 that detects PRC-sensitive content and routes the model away from factual answers it demonstrably possesses. A mechanistic-interpretability study has mapped the actual computational substrate of content filtering inside a deployed commercial language model. The findings show censorship is not a learned reluctance or a safety guardrail applied after training; it is an engineered diversion built into the model's decision-making layers.

Marcus Webb · Surveillance & Tech Privacy

# THE TAKE: Alibaba's Qwen Isn't Censorship, It's Architecture

The NewsAnarchist crowd loves a good "Chinese surveillance state" narrative, but examining Qwen 3.5's weights tells a different story. This isn't censorship injected post-hoc into a neutral system. It's *design*. The model was trained on filtered data, full stop.

What should actually concern you: Qwen's political constraints are *transparent* compared to Western LLMs. OpenAI, Anthropic, and Meta bury their biases deeper: constitutional AI, RLHF black boxes, undisclosed training exclusions. At least you can probe Qwen's limitations directly.

The real scandal? All major LLMs encode political preferences at the weight level. Qwen's are just geographically specific. Everyone's playing the same game. Alibaba's just honest about which team they're on.

What the Documents Show

The model knows the facts. It chooses not to output them. The infrastructure works like this: in layers 11 through 20, the "writer" layers, the model computes three internal directions, each encoded as a vector in its hidden state. The first direction (d_prc) detects whether the input contains politically sensitive content about the People's Republic of China. The second (d_refuse) decides whether to refuse. The third (d_style) determines whether to deflect or propagandize.

🔎 Mainstream angle: The corporate press either ignored this story entirely or buried it in a 3-sentence brief. The framing, when it appeared at all, focused on process rather than impact.

Follow the Money

Together, these three vectors act as a switch. Researchers found clean dose-response curves: nudging the right direction at the right layer causes the model to snap between behaviors, from providing factual information to producing refusal templates. The censorship persists in layers 20–31, the "reader" layers, where the three-direction signal is converted into actual output text. Around layer 24, a commitment moment occurs. Researchers observed the model rendering its internal decision into Chinese tokens (this happens even on unrelated prompts like bank-phishing requests) before later layers translate that internal Chinese decision into the English output users see.
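To make the "dose-response" idea concrete, here is a minimal toy sketch of steering a hidden state along one direction and watching a readout change. It is an illustration under stated assumptions, not the researchers' code: the 4-dimensional hidden state, the values given to `d_refuse`, the logistic readout, and the `sharpness` parameter are all hypothetical stand-ins. A real experiment would add the scaled direction to a transformer's residual stream at a chosen layer and measure the model's actual outputs.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def steer(hidden, direction, alpha):
    """Add alpha times the unit direction to the hidden state."""
    d = unit(direction)
    return [h + alpha * x for h, x in zip(hidden, d)]

def refusal_probability(hidden, d_refuse, sharpness=4.0):
    """Toy readout (an assumption): logistic of the projection onto d_refuse."""
    score = dot(hidden, unit(d_refuse))
    return 1.0 / (1.0 + math.exp(-sharpness * score))

def dose_response(hidden, d_refuse, alphas):
    """Sweep the steering strength and record the toy refusal probability."""
    return [(a, refusal_probability(steer(hidden, d_refuse, a), d_refuse))
            for a in alphas]

# Hypothetical values, chosen only so the sweep crosses the switch point.
hidden = [0.2, -1.0, 0.5, 0.1]
d_refuse = [0.0, 1.0, 0.0, 0.0]

curve = dose_response(hidden, d_refuse, [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
for alpha, p in curve:
    print(f"alpha={alpha:+.1f}  p_refuse={p:.3f}")
```

A steep logistic is used here only because it mimics the "snap" the article describes: small changes in steering strength near the threshold flip the toy readout from almost never refusing to almost always refusing.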

What Else We Know

The intermediate Chinese reasoning does not affect the final answer. The decision lives in the three vectors, not in language.

The mechanism targets specific topics with specific responses. Tiananmen Square produces a stock deflection: "as an AI assistant, my main function is to provide help…" By contrast, Qwen 3.5-9B-Base, the unaligned base model that precedes fine-tuning, gives accurate, Western-framed answers to identical Tiananmen, Tank Man, and Falun Gong prompts under raw text completion. The factual knowledge is already present after pretraining; the censorship is behavior layered on top of retained facts.
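The source does not say which technique the researchers used to see Chinese tokens around layer 24. One common way to read tokens out of an intermediate layer is a "logit lens": project that layer's hidden state through the output vocabulary matrix and look at the top-scoring token. The sketch below shows only the mechanics. The three-token vocabulary, the `unembed` matrix, and the per-layer hidden states are hypothetical toy values, not measurements from Qwen, and a real logit lens would normally apply the model's final normalization before projecting.

```python
def logit_lens(hidden, unembed, vocab):
    """Project a hidden state onto each vocabulary row and return the top token."""
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in unembed]
    best = max(range(len(vocab)), key=lambda i: logits[i])
    return vocab[best], logits

# Hypothetical three-token vocabulary with one row of `unembed` per token.
vocab = ["拒绝", "refuse", "answer"]
unembed = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Toy hidden states keyed by layer index (invented for illustration only).
hidden_by_layer = {
    20: [0.1, 0.1, 0.3],
    24: [0.9, 0.2, 0.1],
    31: [0.2, 0.8, 0.1],
}

readout = {layer: logit_lens(h, unembed, vocab)[0]
           for layer, h in hidden_by_layer.items()}
for layer in sorted(readout):
    print(layer, readout[layer])
```

In this toy setup the middle layer reads out as a Chinese token while the last layer reads out as an English one, which mirrors the pattern the article describes without claiming anything about the real model's numbers.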


Disclosure: NewsAnarchist aggregates from public records, API feeds (Federal Register, CourtListener, MuckRock, Hacker News), and independent media. AI-assisted synthesis. Always verify primary sources linked above.