A RoCE network for distributed AI training at scale

Engineering at Meta

We ensure that the rack switch has enough ingress bandwidth that it does not hinder the training workload. The backend (BE) network is a specialized fabric that connects all RDMA NICs in a non-blocking architecture, providing high bandwidth, low latency, and lossless transport between any two GPUs in the cluster, regardless of their physical location.

How Meta trains large language models at scale

Engineering at Meta

Optimal connectivity between GPUs: Large-scale model training involves transferring vast amounts of data between GPUs in a synchronized fashion. Solving this problem requires a robust and high-speed network infrastructure as well as efficient data transfer protocols and algorithms. Both of these options had tradeoffs.

The WAN Accelerator and Modern Network Optimization

CATO Networks

While WAN optimization and acceleration are still important, increased bandwidth availability, cloud, and mobile have significantly shifted the paradigm. What is a WAN accelerator? Simply put, a WAN accelerator is any hardware or software appliance that provides bandwidth optimization across a WAN. Here, we'll answer those questions.