The Building Blocks of Agentic AI: From Kernels to Clusters

The race toward superintelligence is accelerating, and the tools we build today will determine how quickly we get there. At the PyTorch Conference 2025 in San Francisco, we unveiled a suite of five new projects designed to bridge the gap between cutting-edge AI research and scalable production systems. This PyTorch-native stack interoperates seamlessly with the broader ecosystem, scales from thousands to hundreds of thousands of GPUs, and runs across diverse hardware environments.

PyTorch Stack Overview

Our new agentic building blocks—torchforge, Monarch, torchcomms, Helion, and OpenEnv—together with the 1.0 general availability (GA) release of ExecuTorch, support the entire lifecycle of agentic AI. From deploying post-trained LLMs on mobile devices and wearables to running reinforcement learning at scale, these tools simplify distributed execution, enable fault-tolerant communication, and accelerate custom kernel development.

The stack is built on three core principles: it’s architected for massive scale from day one, deeply integrated into the PyTorch ecosystem for intuitive Pythonic development, and ready for heterogeneous hardware spanning edge devices to multi-cloud environments.

Next-Generation PyTorch Stack

Helion revolutionizes kernel authoring with a Python-embedded domain-specific language that compiles directly to Triton. By automating performance tuning for GPUs and accelerators, Helion lets developers write advanced kernels in roughly a quarter of the code that handwritten Triton requires, making sophisticated ML engineering accessible to a broader developer community. As Luis Ceze, Vice President of AI Systems Software at NVIDIA, notes: “Making Python-based kernel authoring simpler and more accessible is important to NVIDIA. Supporting Meta’s work on Helion will help developers unlock new levels of performance on NVIDIA systems.”
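
To give a feel for the authoring model, here is a minimal element-wise kernel in the style of Helion’s published examples; the `@helion.kernel` decorator and `hl.tile` loop are drawn from the project’s documentation, and the autotuner picks tile sizes and launch parameters when compiling down to Triton.

```python
import torch
import helion
import helion.language as hl

# A minimal Helion kernel, modeled on the project's introductory examples.
# Helion autotunes tile sizes and launch configuration when lowering this
# to Triton.
@helion.kernel()
def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    # hl.tile partitions the iteration space; each tile becomes a
    # hardware-efficient block in the generated Triton code.
    for tile in hl.tile(out.size()):
        out[tile] = x[tile] + y[tile]
    return out

# Usage: called like a regular PyTorch function on CUDA tensors, e.g.
# add(torch.randn(8192, device="cuda"), torch.randn(8192, device="cuda"))
```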

Torchcomms provides a unified API for robust, fault-tolerant distributed communication across massive clusters. It scales to more than 100,000 GPUs across diverse hardware environments, keeping operations resilient regardless of the underlying infrastructure. Anush Elangovan, Vice President of AI Software at AMD, comments: “Torchcomms exemplifies the kind of problem-solving innovation our industry needs—delivering robust, fault-tolerant distributed communication that scales seamlessly across diverse hardware ecosystems.”
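
For orientation, the sketch below uses today’s torch.distributed collectives to show the kind of communication pattern an API like torchcomms generalizes and hardens; it is illustrative only and does not call the torchcomms API itself.

```python
import torch
import torch.distributed as dist

# Illustrative only: this uses the existing torch.distributed collectives to
# show the communication pattern that an API like torchcomms generalizes and
# makes fault tolerant at scale. It does not use torchcomms itself.

def average_gradients(model: torch.nn.Module) -> None:
    """All-reduce and average every gradient across all ranks."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size

if __name__ == "__main__":
    # Launch with: torchrun --nproc_per_node=<N> this_script.py
    dist.init_process_group(backend="gloo")  # gloo keeps the demo CPU-only
    model = torch.nn.Linear(16, 16)
    model(torch.randn(4, 16)).sum().backward()
    average_gradients(model)
    dist.destroy_process_group()
```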

Monarch reimagines cluster-scale execution with a single-controller architecture that abstracts away multi-node complexity. This distributed execution engine makes cluster-scale code feel like a local, single-GPU workflow, democratizing access to large-scale capabilities without sacrificing developer experience. Luca Antiga, CTO of Lightning AI, observes: “With Monarch, we see a glimpse of the future of training. We’re stoked that the team picked Lightning as the ideal platform for the launch.”
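
The sketch below, modeled on Monarch’s actor examples, shows the single-controller feel: one script spawns actors across a process mesh and drives them as if they were local objects. The `proc_mesh`, `Actor`, and `endpoint` names follow the project’s README, but treat the exact signatures as assumptions.

```python
# A sketch in the style of Monarch's actor examples; exact signatures may
# differ from the released API.
from monarch.actor import Actor, endpoint, proc_mesh

class Counter(Actor):
    def __init__(self, initial: int):
        self.value = initial

    @endpoint
    def incr(self) -> None:
        self.value += 1

    @endpoint
    def get(self) -> int:
        return self.value

# One controller script drives many processes: spawn a mesh of actors,
# broadcast a message to all of them, then gather their state.
procs = proc_mesh(gpus=8)
counters = procs.spawn("counters", Counter, 0).get()
counters.incr.broadcast()           # fire-and-forget to every actor
values = counters.get.call().get()  # gather one value per actor
```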

Torchforge is purpose-built for scalable reinforcement learning post-training and agentic development. It separates infrastructure concerns from model development, offering clear RL abstractions with a scalable implementation. Key features include usability for rapid research, hackability for power users, and scalability across thousands of GPUs. In collaboration with Stanford and CoreWeave, we’ve integrated Weaver into torchforge to explore state-of-the-art RL approaches. Azalia Mirhoseini, Founder of the Scaling Intelligence Lab at Stanford University, shares: “Our work on Weaver is about pushing the boundaries of reward modeling and automated verification. Integrating it with torchforge allows us to quickly experiment with new ideas while relying on a robust, scalable RL backbone.”
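
As a concrete illustration of separating RL logic from infrastructure, here is a hypothetical rollout coroutine in the shape torchforge encourages; the `policy`, `reward_model`, and `replay_buffer` services and their method names are illustrative assumptions, not the torchforge API.

```python
from dataclasses import dataclass

# Hypothetical sketch of the rollout-loop shape that torchforge's RL
# abstractions encourage. The services passed in (policy, reward_model,
# replay_buffer) and their methods are illustrative assumptions.

@dataclass
class Episode:
    prompt: str
    response: str
    reward: float

async def generate_episode(policy, reward_model, replay_buffer, prompt: str) -> None:
    """One rollout: sample a response, score it, and store the episode.

    Infrastructure concerns (placement, batching, fault tolerance) live in
    the services; the RL logic stays a few lines of Python.
    """
    response = await policy.generate(prompt)
    reward = await reward_model.score(prompt, response)
    await replay_buffer.add(Episode(prompt, response, reward))
```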

ExecuTorch 1.0 delivers Meta’s end-to-end solution for on-device AI, enabling advanced capabilities directly on mobile, desktop, and edge devices. Backed by industry leaders including Qualcomm, Apple, and Arm, the GA release offers enhanced performance, stability, and integration across platforms. Jeff Gehlhaar, Senior Vice President of Engineering at Qualcomm Technologies, states: “ExecuTorch GA offers portability, stability, and performance, empowering developers to efficiently deliver innovative AI features across mobile, desktop, and IoT devices.”
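
The export path itself is short. The sketch below follows ExecuTorch’s documented flow: capture a model with `torch.export`, lower it to the Edge dialect, and serialize a `.pte` program for the on-device runtime. Backend-specific lowering (for example, delegating to a vendor accelerator) is an additional step not shown here.

```python
import torch
from executorch.exir import to_edge

class TinyModel(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.relu(x) * 2

model = TinyModel().eval()
example_inputs = (torch.randn(1, 8),)

# Capture the graph with torch.export, lower it to the Edge dialect, then
# serialize an ExecuTorch program the on-device runtime can load.
exported = torch.export.export(model, example_inputs)
et_program = to_edge(exported).to_executorch()

with open("tiny_model.pte", "wb") as f:
    f.write(et_program.buffer)
```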

OpenEnv, developed in partnership with Hugging Face, launches as an open hub for reinforcement learning environments. Developers can now build, share, and explore OpenEnv-compatible environments for training and deployment. The initiative includes an RFC for the OpenEnv 0.1 specification to gather community feedback. Clem Delangue, Co-Founder & CEO of Hugging Face, emphasizes: “The next wave of AI will be defined not just by open models, but by open environments. Partnering with Meta to launch the OpenEnv hub gives developers everywhere a common foundation to build, test, and deploy the next generation of AI agents.”
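
To make the environment contract concrete, here is a toy sketch of the reset/step interface an OpenEnv-compatible environment exposes; the class and result type are illustrative assumptions rather than the 0.1 specification.

```python
from dataclasses import dataclass
from typing import Any

# Toy sketch of the reset/step contract an OpenEnv-compatible environment
# exposes. Names and the result type are illustrative assumptions, not the
# OpenEnv 0.1 specification.

@dataclass
class StepResult:
    observation: Any
    reward: float
    done: bool

class EchoEnv:
    """Minimal environment that rewards echoing the observation back."""

    def reset(self) -> StepResult:
        self._target = "hello"
        return StepResult(observation=self._target, reward=0.0, done=False)

    def step(self, action: str) -> StepResult:
        reward = 1.0 if action == self._target else 0.0
        return StepResult(observation=self._target, reward=reward, done=True)

env = EchoEnv()
first = env.reset()
result = env.step(first.observation)  # echoing back earns reward 1.0
```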

These projects are available today as open source and ready for community contributions. The PyTorch ecosystem has never been more vibrant, and we’re excited to see what the community builds with these foundational tools for agentic AI.