The General Services Administration said Grok-4 failed to meet federal safety and alignment standards. The National Security Agency identified security vulnerabilities unique to Grok — absent from models like Anthropic's Claude — in a classified November 2024 review. The Biden-era Chief Digital and AI Office declined Grok entirely, citing training data opacity and weak guardrails. White House Chief of Staff Susie Wiles personally contacted a senior xAI executive after the warnings reached her office.
Then, during the week of February 27, 2026, the Pentagon authorized xAI's Grok for use in classified military environments under a contract worth up to $200 million — shared with Google, OpenAI, and Anthropic. The Wall Street Journal reported the full sequence the same day.
1. What Each Agency Found
Agency Findings at a Glance
- •GSA (Jan 15, 2026): 33-page report — Grok-4 failed safety and alignment standards. Sycophantic, susceptible to data poisoning. Elevated risk without layered oversight.
- •NSA (Nov 2024, classified): Unique Grok security vulnerabilities not found in other AI models. Deterred Pentagon components from adoption.
- •CDAO (Biden era): Rejected outright — training data opacity, weak guardrails, non-compliance with responsible AI standards, insufficient red teaming.
- •Content testing (Dec–Jan 2026): Grok allowed sexualized image edits including those involving children. xAI restricted image generation after discovery.
- •White House: Chief of Staff Susie Wiles contacted a senior xAI executive in early January 2026 following safety alerts.
Despite All of the Above
2. Why This Matters
The Grok approval is the highest-profile example to date of AI safety evaluations being overridden in a sensitive federal context. The paper trail is unusually complete — multiple agencies, a classified NSA review, a White House escalation, and a formal GSA report — making it difficult to argue the risks were unknown.
Public Citizen stated on February 27 that the deployment "disregards internal warnings and could compromise national security." The broader concern is structural: if documented safety failures do not block deployment in classified environments, what function do the evaluations serve?
For the full breakdown — including Grok's architecture, the conflict-of-interest questions around Elon Musk's dual role, and the competitive context among the four vendors on the $200M contract — see our in-depth analysis on the Technology desk.
When internal government reviews call a model unsafe and the Pentagon approves it anyway, the question is no longer whether AI safety evaluations matter — it's whether anyone is required to follow them.