HN748: How AI and HPC Are Changing Data Center Networks

Packet Pushers

On today's episode of Heavy Networking, Rob Sherwood joins us to discuss the impact that High Performance Computing (HPC) and artificial intelligence computing are having on data center network design. There are also power and cooling issues, massive bandwidth requirements, and changes in how we... That's the boring part.

SiTime product launch boosts efficiency of AI data centres

DCNN Magazine

The company states that this is the only single-chip timing product that delivers the most resilient performance for AI compute-nodes with high bandwidth and network synchronisation. Efficient clusters require high-bandwidth interconnects and tightly synchronised orchestration to minimise AI accelerator idle time.

A RoCE network for distributed AI training at scale

Engineering at Meta

AI networks play an important role in interconnecting tens of thousands of GPUs, forming the foundational infrastructure for training and enabling large models with hundreds of billions of parameters, such as LLAMA 3.1. Distributed training, in particular, imposes the most significant strain on data center networking infrastructure.

MXT Holdings improves data centre connectivity in Mexico

DCNN Magazine

MXT manages over 3,500km of long-haul and metropolitan fibre optic networks in Central and Southeast Mexico. Its network is deployed across key states, including Quintana Roo, Chiapas, and Tabasco.

Watch Meta’s engineers discuss optimizing large-scale networks

Engineering at Meta

Managing network solutions at a growing scale inherently brings challenges around performance, deployment, and operational complexity. They present key ideas underpinning the FBOSS model that helped them build a stable and scalable network.

OCP Summit 2024: The open future of networking hardware for AI

Engineering at Meta

At Open Compute Project Summit (OCP) 2024, we’re sharing details about our next-generation network fabric for our AI training clusters. We’ve expanded our network hardware portfolio and are contributing two new disaggregated network fabrics and a new NIC to OCP. At Meta, we believe that open hardware drives innovation.

Hybrid vs. Multi-cloud: The Good, the Bad and the Network Observability Needed

Kentik

Below is a hypothetical company with its data center in the center of the building. Outlined in light blue is the hybrid cloud, which includes the on-premises network as well as the virtual public cloud (VPC) in the AWS public cloud. This conserves bandwidth on the corporate internet connection.
