Remove Application Remove Infiniband Remove UDP port
article thumbnail

A RoCE network for distributed AI training at scale

Engineering at Meta

ECMP and path pinning We initially considered the widely adopted ECMP, which places flows randomly based on the hashes on the five-tuple: source and destination IPs, source and destination UDP ports, and protocol. It may depend on the relative throughput between GPU and network, which may not be applicable to all scenarios.

Network 132