A chatbot-facilitated grab of 195 million records from Mexican government agencies—some 150 gigabytes of data—resembles a recent chatbot-facilitated theft by China-backed hackers (Let’s Data Science, February 25, 2026).
“On February 25, 2026, Bloomberg published a story that would have sounded like fiction two years ago. A lone hacker, with no apparent ties to any government, used Anthropic’s Claude chatbot to orchestrate a cyberattack against Mexico’s federal and state government agencies. The campaign lasted roughly six weeks, from late December 2025 through January 2026. By the time it was over, the attacker had stolen 150 gigabytes of sensitive data—including 195 million taxpayer records, voter registration files, government employee credentials, and civil registry data.”
The hacker’s only essential sophistication was knowing what Spanish-language prompts to use and what fictions to feed Claude to get the bot to lower its “guardrails.” The story crooks tell the bot when committing this kind of crime runs something like “We’re testing where the dike is leaking so we can plug those holes.”
An Israeli cybersecurity firm called Gambit Security discovered “publicly accessible conversation logs showing exactly how the attacker coaxed Claude into becoming an offensive hacking assistant. The paper trail was remarkably detailed—a step-by-step record of how guardrails were tested, resisted, and ultimately bypassed.”
Once Claude’s resistance was bypassed, or at least mostly bypassed, Claude found vulnerabilities, wrote scripts to exploit them, and provided methods to extract data automatically. According to Gambit Security’s Curtis Simpson, Claude “produced thousands of detailed reports that included ready-to-execute plans, telling the human operator exactly which internal targets to attack next and what credentials to use.”
The hacker also used ChatGPT to help out when Claude fell short in executing various aspects of the attack.
China
Not that long ago, in November 2025, Anthropic reported that it had detected and managed to disrupt a hacker campaign sponsored by the People’s Republic of China “that had used Claude Code to target approximately 30 global organizations, including technology companies, financial institutions, and government agencies.”
Let’s Data Science says that the attack on Mexican agencies was perpetrated by “a single unknown individual” as opposed to the “state-sponsored group” responsible for the China attack. (I’m not sure this is true, though, since the fact that there was a single holder of OpenAI and Anthropic subscriptions in the Mexico case doesn’t show that only one person was coming up with ideas for prompts or playbooks.)
In each case, the perpetrators told the bots similar stories about being good guys looking for vulnerabilities. But the hackers used different methods to bypass guardrails. In the attack on Mexico, the hacker or hackers ultimately defeated the guardrails by submitting an “operational playbook in a single prompt,” whereas the Chinese hackers broke their attacks down into “small, innocuous-seeming tasks.”
The Chinese hackers were not as successful, managing to achieve only a “small number of successful infiltrations” of the 30 or so targets before being thwarted. The perpetrator or perpetrators of the Mexico attack made off with a much larger haul.
Also, Anthropic reported that during the Chinese attacks, “Claude frequently hallucinated—claiming credentials that did not work and flagging ‘critical discoveries’ that were publicly available information. The AI did not discover new attack methods. It used existing techniques more efficiently. Whether the Mexico attacker experienced similar limitations is not publicly known.”
Like other hackers, CCP hackers will be able to exploit the lessons of the Mexico attack and other AI-assisted attacks—“the Mexico case is part of a broader trend, not an isolated incident,” notes LDS—unless a lot of hardening and supplementing of bot guardrails happens. The companies and agencies that are supposed to safeguard data must also harden their own guardrails—assuming, that is, that they currently have any guardrails to speak of.