
HN755: Optimizing Ethernet to Meet AI Infrastructure Demands

Packet Pushers

Ethernet competes with InfiniBand as a network fabric for AI workloads such as model training. And while Ethernet has kept up with increasing demands to support greater bandwidth and throughput, it was...


How Meta trains large language models at scale

Engineering at Meta

There are two leading choices in the industry that fit these requirements: RoCE and InfiniBand fabrics. Both of these options had tradeoffs. Meta had built research clusters with InfiniBand as large as 16K GPUs, so we decided to build both: two 24K-GPU clusters, one with RoCE and another with InfiniBand.


Trending Sources


NAN071: Understanding the Infrastructure Requirements for AI Workloads (Sponsored)

Packet Pushers

We discuss key considerations including bandwidth, the substantial power and cooling requirements of AI infrastructure, and GPUs. We also talk about InfiniBand and Ethernet as network fabrics for AI workloads, cabling considerations, and more. This is a sponsored episode.


A RoCE network for distributed AI training at scale

Engineering at Meta

We ensure that there is enough ingress bandwidth on the rack switch to not hinder the training workload. The backend (BE) network is a specialized fabric that connects all RDMA NICs in a non-blocking architecture, providing high bandwidth, low latency, and lossless transport between any two GPUs in the cluster, regardless of their physical location.
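As a rough illustration of that ingress-bandwidth check, the following back-of-the-envelope Python sketch compares a rack's aggregate RDMA NIC bandwidth against the rack switch's uplink capacity to spot oversubscription. All figures here are illustrative assumptions, not Meta's actual cluster parameters.

```python
# Illustrative ingress-bandwidth check for a rack switch.
# All numbers below are assumptions for the sake of the example,
# not Meta's actual cluster parameters.

GPUS_PER_RACK = 16          # assumed GPUs (one RDMA NIC each) per rack
NIC_GBPS = 400              # assumed per-NIC line rate, in Gbps
UPLINKS_PER_SWITCH = 16     # assumed uplinks from rack switch to spine
UPLINK_GBPS = 400           # assumed per-uplink line rate, in Gbps

ingress_gbps = GPUS_PER_RACK * NIC_GBPS
uplink_gbps = UPLINKS_PER_SWITCH * UPLINK_GBPS
ratio = ingress_gbps / uplink_gbps

print(f"Aggregate NIC ingress:     {ingress_gbps} Gbps")
print(f"Aggregate uplink capacity: {uplink_gbps} Gbps")
print(f"Oversubscription ratio:    {ratio:.2f}:1 "
      f"({'non-blocking' if ratio <= 1 else 'oversubscribed'})")
```

A ratio at or below 1:1 corresponds to the non-blocking property the excerpt describes: the switch can forward every NIC's full line rate without queuing-induced loss.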


Building Meta’s GenAI Infrastructure

Engineering at Meta

The other cluster features an NVIDIA Quantum-2 InfiniBand fabric. Through careful co-design of the network, software, and model architectures, we have successfully used both RoCE and InfiniBand clusters for large GenAI workloads (including our ongoing training of Llama 3 on our RoCE cluster) without any network bottlenecks.


Top Tips for Debugging and Optimizing NVIDIA Networking Performance

Router-switch

In today’s high-speed networking world, optimizing and troubleshooting performance is crucial, especially with high-performance equipment like NVIDIA InfiniBand switches. In this blog, we’ll share top tips for debugging and optimizing NVIDIA InfiniBand networking performance.
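As a companion to those tips, here is a minimal Python sketch of a first-pass health check: it shells out to `ibstat` (from the standard infiniband-diags package) and flags any port that is not Active/LinkUp. The field parsing assumes ibstat's typical per-port output format ("State:", "Physical state:", "Rate:"); adapt it to your environment.

```python
# Minimal first-pass InfiniBand port health check.
# Assumes `ibstat` (infiniband-diags) is installed and its typical
# per-port output format; adjust the parsed field names if yours differs.

import subprocess

def check_ib_ports() -> None:
    out = subprocess.run(
        ["ibstat"], capture_output=True, text=True, check=True
    ).stdout
    state = phys = rate = None
    for line in out.splitlines():
        line = line.strip()
        if line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
        elif line.startswith("Physical state:"):
            phys = line.split(":", 1)[1].strip()
        elif line.startswith("Rate:"):
            rate = line.split(":", 1)[1].strip()
            # Rate is the last of the three fields per port block.
            if state != "Active" or phys != "LinkUp":
                print(f"WARNING: port state={state}, physical={phys}, rate={rate}")
            else:
                print(f"OK: port Active/LinkUp at {rate} Gb/s")
            state = phys = rate = None

if __name__ == "__main__":
    check_ib_ports()
```

Once the ports look healthy, tools such as `ib_write_bw` from the perftest suite can measure actual RDMA bandwidth against the link's rated speed, which is typically the next step when chasing performance shortfalls.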