AI Unbound: Your Data Center, Your Way – Powering AI with simplified, secure, high-performance data centers

Juniper

The rapid advancement of AI applications is reshaping industries, driving an unprecedented demand for scalable, high-performing, and flexible data center solutions.

Hedge 244: Networks for AI

Rule 11

What are the requirements for running AI workloads over a data center fabric? Why is InfiniBand so popular for building AI networks? What about Ethernet for AI?

How Meta trains large language models at scale

Engineering at Meta

Data center deployment: Once we’ve chosen a GPU and system, the task of placing them in a data center for optimal usage of resources (power, cooling, networking, etc.) [...] There are two leading choices in the industry that fit these requirements: RoCE and InfiniBand fabrics. Both of these options had tradeoffs.

Building Meta’s GenAI Infrastructure

Engineering at Meta

Today, we’re sharing details on two versions of our 24,576-GPU data center scale cluster at Meta. Custom designing much of our own hardware, software, and network fabrics allows us to optimize the end-to-end experience for our AI researchers while ensuring our data centers operate efficiently.

A RoCE network for distributed AI training at scale

Engineering at Meta

Distributed training, in particular, imposes the most significant strain on data center networking infrastructure. Constructing a reliable, high-performance network infrastructure capable of accommodating this burgeoning demand necessitates a reevaluation of data center network design.

Top Tips for Debugging and Optimizing NVIDIA Networking Performance

Router-switch

In today’s high-speed networking world, optimizing and troubleshooting performance is crucial, especially with high-performance equipment like NVIDIA InfiniBand switches. Whether you’re a data center admin or network engineer, mastering effective techniques is key.