Nvidia announces new 800G switches, one for Ethernet and one for InfiniBand, for building AI fabrics. Nvidia also announces an “AI supercomputer,” a rack-scale pre-built bundle of Nvidia GPUs and CPUs connected via InfiniBand switches. Take a Network Break!
Why is InfiniBand so popular for building AI networks? What are the requirements for running AI workloads over a data center fabric? What about Ethernet for AI? Jeff Tantsura joins Tom Ammon and Russ White to discuss networks for AI workloads.
There are two leading choices in the industry that fit these requirements: RoCE and InfiniBand fabrics. Both of these options had tradeoffs. On the other hand, Meta had built research clusters with InfiniBand as large as 16K GPUs. So we decided to build both: two 24K-GPU clusters, one with RoCE and another with InfiniBand.
Ethernet competes with InfiniBand as a network fabric for AI workloads such as model training.
With self-optimizing Ethernet AI fabrics, Juniper delivers congestion auto-tuning, advanced load balancing, and NIC-to-switch monitoring and tuning, achieving performance comparable to InfiniBand while allowing customers to leverage the broader ecosystem of Ethernet-trained professionals and tools.
On today's Network Automation Nerds, we get into the infrastructure required to support AI workloads. We also talk about InfiniBand and Ethernet as network fabrics for AI workloads, cabling considerations, and more.
This week's Network Break examines new 400G switches from Arista, discusses the Wi-Fi Alliance's certification program for the HaLow long-range low-power standard, covers key Nvidia announcements, catches up on the latest in space networking, and more IT news.
In today’s high-speed networking world, optimizing and troubleshooting performance is crucial, especially with high-performance equipment like NVIDIA InfiniBand switches. In this blog, we’ll share top tips for debugging and optimizing NVIDIA InfiniBand networking performance.
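One practical starting point for that kind of debugging is the per-port hardware counters the Linux InfiniBand stack exposes under /sys/class/infiniband. The short Python sketch below polls a few traffic and error counters and prints the deltas; the device name mlx5_0, the port number, and the counter list are assumptions (the exact set varies by HCA and driver), not values taken from the blog.

from pathlib import Path
import time

# Port counters exposed by the kernel IB stack; availability varies by adapter/driver.
COUNTERS = ["port_rcv_data", "port_xmit_data", "port_rcv_errors", "symbol_error", "link_downed"]

def read_counters(device: str = "mlx5_0", port: int = 1) -> dict:
    base = Path(f"/sys/class/infiniband/{device}/ports/{port}/counters")
    return {name: int((base / name).read_text()) for name in COUNTERS if (base / name).exists()}

# Sample twice and print the deltas, e.g. to spot a link accumulating symbol errors under load.
before = read_counters()
time.sleep(10)
after = read_counters()
for name in before:
    print(f"{name}: +{after[name] - before[name]}")

A steadily growing symbol_error or port_rcv_errors delta usually points at a link or cabling problem rather than a congestion or tuning issue.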
The other cluster features an NVIDIA Quantum-2 InfiniBand fabric. Through careful co-design of the network, software, and model architectures, we have successfully used both RoCE and InfiniBand clusters for large GenAI workloads (including our ongoing training of Llama 3 on our RoCE cluster) without any network bottlenecks.
Thus, we took two steps to improve the performance. First, we experimentally determined the right parameter settings for the number of channels and channel buffer size across various training job sizes and collective types.
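As a rough illustration of what such per-job tuning can look like in practice, the sketch below sets NCCL's channel-count and buffer-size environment variables before launching a training job. NCCL_MIN_NCHANNELS, NCCL_MAX_NCHANNELS, and NCCL_BUFFSIZE are standard NCCL knobs, but the job-size buckets and the specific values here are illustrative assumptions, not the settings determined in the study.

import os
import subprocess

# Illustrative settings keyed by job size; the buckets and numbers below are assumptions.
NCCL_TUNING = {
    "small":  {"NCCL_MIN_NCHANNELS": "8",  "NCCL_MAX_NCHANNELS": "8",  "NCCL_BUFFSIZE": str(4 << 20)},
    "medium": {"NCCL_MIN_NCHANNELS": "16", "NCCL_MAX_NCHANNELS": "16", "NCCL_BUFFSIZE": str(8 << 20)},
    "large":  {"NCCL_MIN_NCHANNELS": "32", "NCCL_MAX_NCHANNELS": "32", "NCCL_BUFFSIZE": str(8 << 20)},
}

def launch_training(job_size: str, cmd: list) -> None:
    # Overlay the NCCL knobs for this job size on the inherited environment.
    env = dict(os.environ)
    env.update(NCCL_TUNING[job_size])
    subprocess.run(cmd, env=env, check=True)

# Example: an 8-GPU-per-node run via torchrun (train.py is a placeholder script name).
launch_training("large", ["torchrun", "--nproc_per_node", "8", "train.py"])

Keeping the tuned values in a single table like this makes it easy to apply different settings per job size and collective type without touching the training code itself.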
The reality is somewhat different: the few data-archive people I know say they do big restore/re-archive runs about every 8 to 10 years, largely driven by changes in drive connectivity (SCSI, SATA, Fibre Channel, InfiniBand, SAS, etc.), OS and software support, and corporate purchasing cycles.