A RoCE network for distributed AI training at scale

Engineering at Meta

Our paper, “RDMA over Ethernet for Distributed AI Training at Meta Scale,” provides the details on how we design, implement, and operate one of the world’s largest AI networks at scale. We opted for RDMA over Converged Ethernet version 2 (RoCEv2) as the inter-node communication transport for the majority of our AI capacity.
