The US Government Now Vets AI Models Before They Go Public
The United States government is tightening its grip on powerful artificial intelligence systems. The National Institute of Standards and Technology’s Center for AI Standards and Innovation (CAISI) will conduct pre-deployment evaluations on frontier AI models from Google DeepMind, Microsoft, and Elon Musk’s xAI to assess whether their advanced capabilities pose cybersecurity risks before those models reach the public.

This AI pre-deployment evaluation push marks a significant shift in Washington’s posture toward Silicon Valley. Google DeepMind, Microsoft, and xAI join Anthropic and OpenAI, which signed similar agreements nearly two years ago under the Biden administration, when the agency was still called the US Artificial Intelligence Safety Institute.

CAISI Director Chris Fall described the move as essential, saying that “independent, rigorous measurement science is essential to understanding frontier AI and its national security implications.” An interagency task force within CAISI will allow officials from across the government to test the models, including in classified settings.

According to the agency, CAISI has already completed more than 40 such evaluations, including assessments of state-of-the-art models that have not yet been released to the public. The scope of this AI pre-deployment evaluation framework is clearly broader than many had anticipated.

White House National Economic Council Director Kevin Hassett signaled the administration is going even further, saying officials are studying a possible executive order that would create “a clear road map” for how advanced AI systems should be evaluated before release, comparing the process to how the FDA tests drugs for safety.

Microsoft’s chief responsible AI officer, Natasha Crampton, welcomed the collaboration, noting that evaluations tied to national security and public safety require close cooperation between industry and governments with deep technical and security expertise.

Still, questions remain. It is not yet clear what testing standards CAISI will use, and experts warn that capability assessments are only as strong as the threat models behind them. Some have called on the agency to publish exactly what it is testing for, not just who it is testing with.
