HN755: Optimizing Ethernet to Meet AI Infrastructure Demands

Packet Pushers

Ethernet competes with InfiniBand as a network fabric for AI workloads such as model training. And while Ethernet has kept up with increasing demands to support greater bandwidth and throughput, it was …

NB471: Nvidia Unveils 800G Ethernet, InfiniBand Switches For AI Fabrics; ‘Ghost Jobs’ Haunt Job Boards

Packet Pushers

Nvidia announces new 800G switches, one for Ethernet and one for InfiniBand, for building AI fabrics. Nvidia also announces an “AI supercomputer,” a rack-scale pre-built bundle of Nvidia GPUs and CPUs connected via InfiniBand switches. Take a Network Break!

Hedge 244: Networks for AI

Rule 11

Why is InfiniBand so popular for building AI networks? What about Ethernet for AI? What are the requirements for running AI workloads over a data center fabric?

NAN071: Understanding the Infrastructure Requirements for AI Workloads (Sponsored)

Packet Pushers

We also talk about InfiniBand and Ethernet as network fabrics for AI workloads, cabling considerations, and more. This is a sponsored episode.

A RoCE network for distributed AI training at scale

Engineering at Meta

Our paper, “RDMA over Ethernet for Distributed AI Training at Meta Scale,” provides the details on how we design, implement, and operate one of the world’s largest AI networks at scale. We opted for RDMA over Converged Ethernet version 2 (RoCEv2) as the inter-node communication transport for the majority of our AI capacity.

Building Meta’s GenAI Infrastructure

Engineering at Meta

With this in mind, we built one cluster with a remote direct memory access (RDMA) over converged Ethernet (RoCE) network fabric solution based on the Arista 7800 with Wedge400 and Minipack2 OCP rack switches. The other cluster features an NVIDIA Quantum2 InfiniBand fabric. Both of these solutions interconnect 400 Gbps endpoints.