Hedge 244: Networks for AI

Rule 11

Why is InfiniBand so popular for building AI networks? Jeff Tantsura joins Tom Ammon and Russ White to discuss networks for AI workloads. What about Ethernet for AI?

How Meta trains large language models at scale

Engineering at Meta

Supporting GenAI at scale has meant rethinking how our software, hardware, and network infrastructure come together. Solving this problem requires a robust and high-speed network infrastructure as well as efficient data transfer protocols and algorithms. … requires revisiting trade-offs made for other types of workloads.

NB471: Nvidia Unveils 800G Ethernet, InfiniBand Switches For AI Fabrics; ‘Ghost Jobs’ Haunt Job Boards

Packet Pushers

Take a Network Break! Nvidia announces new 800G switches, one for Ethernet and one for InfiniBand, for building AI fabrics. Nvidia also announces an “AI supercomputer,” a rack-scale pre-built bundle of Nvidia GPUs and CPUs connected via InfiniBand switches.

HN755: Optimizing Ethernet to Meet AI Infrastructure Demands

Packet Pushers

Ethernet competes with InfiniBand as a network fabric for AI workloads such as model training. In other words, AI workloads do best with a lossless network.

A RoCE network for distributed AI training at scale

Engineering at Meta

AI networks play an important role in interconnecting tens of thousands of GPUs, forming the foundational infrastructure for training and enabling large models with hundreds of billions of parameters, such as LLAMA 3.1. Distributed training, in particular, imposes the most significant strain on data center networking infrastructure.

Network Break 359: Arista Increases Its 400G Switch Portfolio; Nvidia Accelerates InfiniBand

Packet Pushers

This week's Network Break examines new 400G switches from Arista, discusses the Wi-Fi Alliance's certification program for the HaLow long-range low-power standard, targets key Nvidia announcements, catches up on the latest in space networking, and more IT news.

Top Tips for Debugging and Optimizing NVIDIA Networking Performance

Router-switch

In today’s high-speed networking world, optimizing and troubleshooting performance is crucial, especially with high-performance equipment like NVIDIA InfiniBand switches. Whether you’re a data center admin or network engineer, mastering effective techniques is key.