In a startling revelation that underscores the escalating stakes in the global artificial intelligence race, Anthropic, a leading AI research company, has accused three Chinese AI firms – DeepSeek, Moonshot AI, and MiniMax – of orchestrating “industrial-scale campaigns” to illicitly extract capabilities from its advanced large language model (LLM), Claude.
The alleged attacks, spanning more than 16 million interactions with Claude across approximately 24,000 fraudulent accounts, represent a significant breach of Anthropic’s terms of service and regional access restrictions. All three implicated companies are based in China, a country from which Anthropic explicitly prohibits access to its services, citing a complex web of “legal, regulatory, and security risks.”
The Art of ‘Distillation’: A Double-Edged Sword
At the heart of these allegations lies a technique known as “distillation.” In essence, distillation involves training a less powerful AI model on the outputs generated by a more sophisticated system. While this method is a legitimate and common practice for companies seeking to create more compact, efficient, and cost-effective versions of their own cutting-edge models, using it to siphon off a competitor’s proprietary capabilities is a clear violation of the provider’s terms of service.
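Mechanically, distillation treats the stronger model’s output distribution as “soft labels” for the weaker one: the student is trained to minimize the divergence between its predictions and the teacher’s. The minimal sketch below illustrates only that core loss; the function names and temperature value are illustrative choices, not details from Anthropic’s report.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student): how far the student's softened output
    distribution is from the teacher's. Minimizing this over many prompts
    trains the student to mimic the teacher's behavior."""
    p = softmax(teacher_logits, temperature)  # teacher's "soft labels"
    q = softmax(student_logits, temperature)  # student's predictions
    return float(np.sum(p * (np.log(p) - np.log(q))))

# A student whose logits already match the teacher incurs ~zero loss;
# a mismatched student incurs a positive loss, which training drives down.
teacher = np.array([2.0, 0.5, -1.0])
aligned = distillation_loss(teacher.copy(), teacher)
mismatched = distillation_loss(np.array([-1.0, 0.5, 2.0]), teacher)
```

At scale, the same idea is applied to millions of harvested prompt–response pairs rather than a single logit vector, which is why the campaigns described here required such enormous volumes of traffic.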
Anthropic emphasizes that such illicit activities allow rival firms to acquire advanced AI functionalities at a mere fraction of the time and financial investment it would take to develop them independently. This not only undermines fair competition but also poses profound security implications.
National Security Risks and Unsafeguarded AI
“Illicitly distilled models lack necessary safeguards, creating significant national security risks,” Anthropic warned in its statement. The company elaborated that models built through such unauthorized extraction are highly unlikely to retain the crucial protections embedded in the original system, potentially leading to the proliferation of dangerous capabilities stripped of their intended safety mechanisms.
The concern extends to the potential weaponization of these unprotected AI capabilities by foreign entities. Anthropic suggests that such models could form the bedrock for malicious cyber activities, disinformation campaigns, and mass surveillance systems, particularly when deployed by authoritarian governments for offensive purposes.
Anatomy of the Attack: Fraudulent Accounts and Hydra Clusters
The sophisticated campaigns detailed by Anthropic involved the extensive use of fraudulent accounts and commercial proxy services. These mechanisms allowed the attackers to access Claude at scale while meticulously evading detection. Anthropic’s attribution of each campaign to a specific AI lab was based on a rigorous analysis of request metadata, IP address correlations, and infrastructure indicators.
Targeted Extraction: What Each Firm Sought
- DeepSeek: Across more than 150,000 exchanges, this firm reportedly focused on Claude’s reasoning capabilities and rubric-based grading tasks, and sought censorship-safe alternatives to politically sensitive queries concerning dissidents, party leaders, or authoritarianism.
- Moonshot AI: Engaging in over 3.4 million exchanges, Moonshot AI targeted Claude’s agentic reasoning, tool use, coding capabilities, computer-use agent development, and computer vision.
- MiniMax: With over 13 million exchanges, MiniMax primarily aimed at Claude’s agentic coding and tool-use capabilities.
“The volume, structure, and focus of the prompts were distinct from normal usage patterns, reflecting deliberate capability extraction rather than legitimate use,” Anthropic stated, highlighting that each campaign specifically targeted Claude’s most differentiated capabilities: agentic reasoning, tool use, and coding.
The Proxy Network: A Resilient Threat
The attacks leveraged commercial proxy services that effectively resell access to Claude and other frontier AI models on a large scale. These services operate on “hydra cluster” architectures: vast networks of fraudulent accounts designed to distribute traffic across a provider’s API. This distributed access lets attackers generate massive volumes of carefully crafted prompts aimed at extracting specific capabilities and harvesting high-quality responses to train their own models.
“The breadth of these networks means that there are no single points of failure,” Anthropic explained. “When one account is banned, a new one takes its place. In one case, a single proxy network managed more than 20,000 fraudulent accounts simultaneously, mixing distillation traffic with unrelated customer requests to make detection harder.”
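The resilience Anthropic describes follows from simple load distribution: requests are spread across a pool of accounts, and banning any one member just shifts traffic to the rest. The toy model below sketches that pattern in its simplest form; the class and account names are hypothetical, not taken from Anthropic’s report.

```python
import itertools

class AccountPool:
    """Toy model of the 'hydra' pattern: traffic is spread round-robin
    across a pool of accounts, so no single account is a point of failure.
    Banning one member simply redistributes load over the survivors."""

    def __init__(self, account_ids):
        self.active = list(account_ids)

    def ban(self, account_id):
        """Remove a banned account from the rotation."""
        self.active.remove(account_id)

    def route(self, n_requests):
        """Assign n_requests round-robin across the surviving accounts."""
        cycle = itertools.cycle(self.active)
        return [next(cycle) for _ in range(n_requests)]

pool = AccountPool(["acct-1", "acct-2", "acct-3"])
before = pool.route(6)   # spread across all three accounts
pool.ban("acct-2")       # one account is banned...
after = pool.route(6)    # ...and traffic redistributes over the rest
```

A real network of 20,000-plus accounts adds layers this sketch omits, such as mixing distillation traffic with unrelated customer requests, which is precisely what makes detection harder.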
Anthropic’s Countermeasures and Broader Industry Concerns
In response to these persistent threats, Anthropic has developed advanced classifiers and behavioral fingerprinting systems to detect suspicious distillation patterns in API traffic. The company has also strengthened verification processes for educational accounts, security research programs, and startup organizations, alongside implementing enhanced safeguards to diminish the efficacy of illicit model outputs.
This disclosure from Anthropic follows a similar revelation from Google’s Threat Intelligence Group (GTIG) just weeks prior. GTIG reported identifying and disrupting distillation and model extraction attacks targeting Gemini’s reasoning capabilities through over 100,000 prompts. Google clarified that while these attacks pose a significant risk to model developers and service providers, they “do not typically represent a risk to average users, as they do not threaten the confidentiality, availability, or integrity of AI services.”
The incidents at Anthropic and Google underscore a growing challenge for leading AI developers: protecting their invaluable intellectual property and ensuring the responsible deployment of powerful AI systems in an increasingly competitive and complex geopolitical landscape.