Google’s New TPU 8 Training and Inference Chips Are Built for the Age of AI Agents

Google has officially unveiled its eighth-generation Tensor Processing Units, and this time the company is doing something it has never done before: splitting its flagship AI chip into two purpose-built variants. Meet the Google TPU 8 training and inference duo: the TPU 8t, designed for model training, and the TPU 8i, engineered specifically for inference workloads.

The announcement came at Google Cloud Next 2026, where Google made clear that the era of the one-size-fits-all AI chip may be over. Both chips were developed in partnership with Google DeepMind, and Google says they feature “purpose-built architectures” tailored to support model training, agent development, and real-time inference at scale.

Amin Vahdat, Google’s SVP and Chief Technologist for AI and Infrastructure, put it plainly: “Our eighth-generation TPUs are the culmination of more than a decade of development. By customizing and co-designing silicon with hardware, networking, and software, including model architecture and application requirements, we can deliver dramatically more power efficiency and absolute performance.”

New TPU 8 Training and Inference Chips

The TPU 8i is where the real headline specs live. Designed with more memory bandwidth to serve latency-sensitive inference workloads, it scales to 1,152 chips in a single pod, delivering 11.6 exaflops of FP8 compute, 331.8TB of total HBM capacity per pod, and 19.2Tbps of bidirectional scale-up bandwidth per chip. Google also says it fundamentally “redesigned the stack” for the 8i, introducing four capabilities that eliminate the ‘waiting room’ effect, where user requests are intentionally queued or delayed to maximize hardware utilization. These include pairing 288GB of HBM with 384MB of on-chip SRAM, doubling the physical CPU hosts per server by moving to Google’s custom Axion Arm-based CPUs, doubling interconnect bandwidth for Mixture of Experts models, and reducing on-chip latency by up to 5x through a new Collectives Acceleration Engine.
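For a sense of scale, the pod-level figures above can be divided back down to per-chip numbers. The minimal back-of-the-envelope sketch below does exactly that; the per-chip values are derived from the article’s pod totals, not official Google specs:

```python
# Back-of-the-envelope check of the TPU 8i pod figures quoted above.
# Inputs are the article's pod-level numbers; per-chip values are derived.

POD_CHIPS = 1_152           # chips per TPU 8i pod
POD_FP8_EXAFLOPS = 11.6     # FP8 compute per pod, in exaflops
POD_HBM_TB = 331.8          # total HBM per pod, in terabytes

fp8_per_chip_pflops = POD_FP8_EXAFLOPS * 1_000 / POD_CHIPS  # exa -> peta
hbm_per_chip_gb = POD_HBM_TB * 1_000 / POD_CHIPS            # TB -> GB

print(f"FP8 compute per chip: ~{fp8_per_chip_pflops:.1f} petaflops")  # ~10.1
print(f"HBM per chip:         ~{hbm_per_chip_gb:.0f} GB")             # ~288
```

Reassuringly, the derived ~288GB of HBM per chip lines up with the 288GB figure Google quotes for the 8i, suggesting the pod totals are simple per-chip multiples.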

The timing of these Google TPU 8 training and inference chips is no accident. The broader AI market is shifting beyond training models and toward inference, the process of delivering real-time responses, and Google appears to be positioning itself to capture a larger share of this next phase of AI demand. That shift has real competitive stakes: Nvidia continues to dominate training with its GPUs but now faces growing competition in inference, and it has also introduced its own inference-focused chip offering following its acquisition of technology from Groq.

The TPU 8 series is set to replace the existing TPUv7 “Ironwood” lineup, which Google introduced last year and described as its first chip built for the age of inference, according to Wccftech. The new generation takes that vision further, splitting responsibilities cleanly between training and serving, a design philosophy that signals where Google sees the AI infrastructure race heading next.
