AI Agent Telemetry Standardization Drives Arize AI and Google Cloud Partnership

AI agents are enjoying significant freedom of movement, but unstandardized agent telemetry leaves us in the Wild West. Developers are empowering agents to call multiple system tools, invoke AI models, rewrite user requests, and hand off work to other domain-specific agents. That is, of course, excellent news for system adaptability. However, it creates a monitoring nightmare.
AI agent telemetry standardization is no longer a developer-side concern. It has become a business imperative, and industry leaders are starting to formalize the infrastructure to address it.
Arize is partnering with Google Cloud following the hyperscaler's launch of the Gemini Enterprise Agent Platform last month. The Arize AX enterprise agent development platform not only receives traces from the Gemini Agent service but also aligns agent telemetry around OpenTelemetry and OpenInference. The goal is simple but significant: software engineering teams should be able to instrument agents once, analyze behavior consistently, and avoid locking critical observability data inside a single platform.
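The "instrument once" idea can be sketched in a few lines. The following is a minimal, self-contained illustration (not the actual OpenTelemetry or OpenInference SDK): the agent code emits spans with standardized attribute names, and the observability backend behind them is swappable. The attribute names here (`span.kind`, `llm.model_name`, `llm.prompt`) are assumptions modeled loosely on OpenInference-style conventions, invented for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """A toy span record: a name plus standardized attributes."""
    name: str
    attributes: dict = field(default_factory=dict)

class ConsoleExporter:
    """One possible backend: print spans as they finish."""
    def export(self, span):
        print(f"{span.name}: {span.attributes}")

class ListExporter:
    """Another backend: collect spans in memory for later analysis."""
    def __init__(self):
        self.spans = []
    def export(self, span):
        self.spans.append(span)

def instrument_llm_call(exporter, model, prompt):
    # The agent emits the same standardized span no matter which
    # exporter (observability backend) is plugged in -- that is the
    # portability the standards are meant to guarantee.
    span = Span("llm_call", {
        "span.kind": "LLM",           # hypothetical attribute names
        "llm.model_name": model,
        "llm.prompt": prompt,
    })
    exporter.export(span)
    return span

backend = ListExporter()
instrument_llm_call(backend, "some-model", "Summarize the report.")
```

Because the span schema stays fixed, swapping `ListExporter` for `ConsoleExporter` (or any other backend) requires no change to the agent's instrumentation code.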
Richard Young, technical director at Arize, made the case for portability as the central priority. As he wrote on the Arize blog, “When you use standards like OpenTelemetry and OpenInference, you keep optionality without losing visibility. Standardized agent telemetry lets you change frameworks, models, tools, or observability backends without rebuilding your instrumentation every time.”
The complexity of a live agent run makes this urgency concrete. A single run can include request rewriting, retrieval, multiple tool and model calls, retries, and handoffs before producing a final answer. Without structured telemetry covering each of those steps, debugging becomes painstaking guesswork and evaluation becomes extremely difficult.
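To make the step count tangible, here is a toy trace of one agent run. The step names mirror the stages described above; the record format is invented for illustration and does not correspond to any particular SDK's output.

```python
def run_agent(question):
    """Simulate one agent run, logging a structured record per step."""
    trace = []  # ordered list of (step, detail) records

    trace.append(("rewrite", f"rewritten: {question}"))
    trace.append(("retrieval", "fetched 3 documents"))

    # A tool call that fails once and is retried -- without a record
    # per attempt, the retry is invisible to whoever debugs this run.
    for attempt in range(2):
        ok = attempt == 1  # first attempt fails, second succeeds
        trace.append(("tool_call", f"attempt {attempt + 1}, ok={ok}"))
        if ok:
            break

    trace.append(("model_call", "drafted answer"))
    trace.append(("handoff", "sent to review agent"))
    trace.append(("final_answer", "done"))
    return trace

steps = [step for step, _ in run_agent("What changed last quarter?")]
# steps -> ['rewrite', 'retrieval', 'tool_call', 'tool_call',
#           'model_call', 'handoff', 'final_answer']
```

Even this trivial run produces seven records, one of them a retry; a real run multiplies that across nested sub-agents, which is exactly where unstructured logs stop being readable.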
Furthermore, the problem multiplies at scale. A single agent run is a manageable transcript. A thousand agents running in production simultaneously, handing off between one another, calling external tools, hitting retrieval systems, and spawning sub-agents? That becomes a data problem, notes David Girvin, AI security researcher at Sumo Logic.
There is also a security dimension that the observability community has not yet fully addressed. Girvin warned that OpenTelemetry agent conventions are being written by ML engineers for ML engineers, but the CISO hasn’t shown up to that conversation yet. When they do, teams that have instrumented purely for observability will find their telemetry doesn’t hold up for board-level investigation.
Meanwhile, Noam Levy, founding engineer at groundcover, raises a structural challenge beyond standards adoption. Teams still face fragmented telemetry across providers: the trace format from OpenAI, for instance, looks different from Anthropic's. That forces engineering teams to build systems that constantly adapt to upstream changes, adding cost and fragility. Levy suggests that eBPF changes the foundation by operating at the OS level, capturing signals from how software actually runs rather than how it is instrumented.
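The fragmentation Levy describes is the kind of adapter code teams end up maintaining by hand. The sketch below normalizes two made-up provider payload shapes into one schema; the real OpenAI and Anthropic response formats differ in their own ways, and every field name here is invented for illustration.

```python
# Two hypothetical provider payloads carrying the same information
# under different field names (shapes invented for this example).
provider_a = {
    "model": "model-a",
    "usage": {"prompt_tokens": 12, "completion_tokens": 30},
}
provider_b = {
    "model_id": "model-b",
    "input_tokens": 12,
    "output_tokens": 30,
}

def normalize(payload):
    """Map either provider shape onto one common telemetry schema."""
    if "usage" in payload:  # provider A shape
        return {
            "model": payload["model"],
            "input_tokens": payload["usage"]["prompt_tokens"],
            "output_tokens": payload["usage"]["completion_tokens"],
        }
    return {  # provider B shape
        "model": payload["model_id"],
        "input_tokens": payload["input_tokens"],
        "output_tokens": payload["output_tokens"],
    }

a, b = normalize(provider_a), normalize(provider_b)
```

Every upstream format change means another branch in `normalize` -- the cost and fragility Levy points to, and the maintenance burden that shared semantic conventions (or OS-level capture) aim to remove.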
The industry appears to be converging on the need for standardized measures of agent behavior. When those measures also come with standardized methods of measurement, we may achieve structured agent telemetry with enough semantic detail to support evaluation and agentic improvement.
The direction is clear. AI agent telemetry standardization is the foundation that must be in place before enterprise agentic systems can scale safely.
