Multiple U.S. federal agencies expressed serious reservations about the safety and reliability of xAI's Grok chatbot in the months before the Pentagon authorized its use in classified military environments — a decision finalized during the week of February 27, 2026, as reported by the Wall Street Journal .
The General Services Administration flagged Grok as sycophantic and vulnerable to data poisoning in a 33-page January 2026 review. The National Security Agency identified unique Grok security vulnerabilities in a classified November 2024 assessment. White House Chief of Staff Susie Wiles contacted a senior xAI executive directly in early January 2026. The Biden-era Chief Digital and AI Office had already declined Grok entirely. Then, under a contract worth up to $200 million shared with Google, OpenAI, and Anthropic, the Pentagon approved it for classified use anyway.
Primary reporting by the Wall Street Journal, published February 27, 2026. ObjectWire coverage is based on that report and corroborating public statements from GSA and Public Citizen.
Background: xAI's Grok and Federal AI Deployment
xAI's Grok chatbot, developed by Elon Musk's AI company and available up to version Grok-4, operates with looser content controls than most enterprise AI models — a design philosophy Musk has publicly tied to free speech principles. In January 2026, xAI limited image-generation features to paying customers following safety testing revelations.
Federal interest in Grok accelerated through a July 2025 Pentagon contract awarding up to $200 million collectively to xAI, Google, OpenAI, and Anthropic for AI development across government use cases. The contract positioned Grok as one of several models available for defense applications — pending individual agency approvals.
$200M
Pentagon AI contract (shared, July 2025)
4
AI providers on contract
1,200
Federal AI use cases documented (2025)
2
Major agencies that flagged Grok risks
During the Biden administration, the Chief Digital and AI Office (CDAO) declined to authorize Grok, citing challenges in tracking training data provenance, non-compliance with responsible AI executive order standards, weak content guardrails, and insufficient red teaming procedures.
Specific Concerns Raised by Federal Agencies
Agencies documented multiple distinct vulnerabilities through internal reviews, formal assessments, and hands-on testing — spanning content safety, security architecture, and adversarial robustness.
The GSA's 33-page executive summary concluded that Grok-4 failed to meet safety and alignment standards for federal deployment and recommended strict, layered oversight to mitigate elevated risks if deployment proceeded. The report described Grok as "overly compliant" in unguarded configurations — meaning it would follow harmful or manipulated instructions rather than refusing them.
The NSA's classified review identified security vulnerabilities unique to Grok that were not present in competing models, including Anthropic's Claude. The findings were significant enough to deter some Pentagon components from adopting Grok even after the broader contract was awarded.
White House Contact and the Path to Pentagon Approval
Safety concerns escalated to the White House level in early January 2026 — a notable escalation for what is nominally a procurement and safety review process. Chief of Staff Susie Wiles contacted a senior xAI executive directly, according to the WSJ, after the agency warnings reached her office.
Despite this intervention, the Pentagon authorized Grok for classified settings during the week of February 27, 2026 — proceeding under the July 2025 multi-vendor contract. The approval came without public disclosure of the remediation steps taken to address GSA and NSA concerns.
"Such deployment disregarded internal warnings and could compromise national security," Public Citizen stated on the day of the reported Pentagon approval. The advocacy group noted the sequence — documented failures in safety evaluations followed by authorization — as an example of AI deployment outpacing safety governance in sensitive federal contexts.
What the Image-Generation Testing Revealed
Separate from the strategic safety reviews, practical testing conducted between late December 2025 and early January 2026 revealed that Grok permitted sexualized photo edits, including those involving children. The discovery prompted xAI to restrict image-generation features — limiting them to paying customers in January 2026 — but the incident added to the accumulating case that Grok's guardrails were materially weaker than those of competing models under federal review.
This finding was separate from the NSA and GSA's systemic security and alignment concerns, but contributed to the overall picture of a model that required emergency content restrictions during the same window that federal agencies were actively evaluating it for classified deployment.
Broader Implications: AI Safety Governance in Federal Contexts
The Grok approval sequence — documented agency failures → White House intervention → Pentagon authorization anyway — raises structural questions about how AI safety evaluations function when the model under review is associated with a politically prominent figure and an administration that has signaled openness to faster AI integration across government.
The 1,200 documented federal AI use cases across government as of 2025 reflect an acceleration in AI adoption that safety governance structures have not kept pace with. The Grok case is the most high-profile example to date of that gap becoming publicly visible.
For broader AI policy and technology coverage, see ObjectWire's OpenAI hub and the Technology desk .
When internal government reviews call a model unsafe and the Pentagon approves it anyway, the question is no longer whether AI safety evaluations matter — it's whether anyone is required to follow them.