Data Centers, Fashion and Infiniband - IT Networking Pro Today

How Meta trains large language models at scale

Engineering at Meta

JUNE 12, 2024

This means we need to regularly checkpoint our training state and efficiently store and retrieve training data. Optimal connectivity between GPUs: Large-scale model training involves transferring vast amounts of data between GPUs in a synchronized fashion. [...] requires revisiting trade-offs made for other types of workloads.

Building Meta’s GenAI Infrastructure

Engineering at Meta

MARCH 12, 2024

Today, we’re sharing details on two versions of our 24,576-GPU data center scale cluster at Meta. Custom designing much of our own hardware, software, and network fabrics allows us to optimize the end-to-end experience for our AI researchers while ensuring our data centers operate efficiently.

A RoCE network for distributed AI training at scale

Engineering at Meta

AUGUST 5, 2024

Distributed training, in particular, imposes the most significant strain on data center networking infrastructure. Constructing a reliable, high-performance network infrastructure capable of accommodating this burgeoning demand necessitates a reevaluation of data center network design.
