A RoCE network for distributed AI training at scale

Engineering at Meta

AI networks play an important role in interconnecting tens of thousands of GPUs, forming the foundational infrastructure for training and enabling large models with hundreds of billions of parameters, such as LLAMA 3.1. Distributed training, in particular, imposes the most significant strain on data center networking infrastructure.

Network 132

Network Monitoring Trends 2025

Obkio

Discover the top network monitoring trends shaping 2025 to stay ahead of the ever-evolving tech landscape!

Network 105

Trending Sources

Explore network programmability with the DevNet XRd Sandbox

Cisco Wireless

providing all the benefits of using containers in network operations. XRd comes with all the programmability aspects from IOS-XR, including Telemetry and YANG models, which makes it ideal for developers and network […]

Network 116

Cisco Modeling Labs Free: Get Free Hands-on Practice in Network Simulation

Cisco Wireless

Cisco Modeling Labs Free is now available, making our premier network virtualization platform easier and more accessible than ever.

OCP Summit 2024: The open future of networking hardware for AI

Engineering at Meta

At the Open Compute Project (OCP) Summit 2024, we’re sharing details about our next-generation network fabric for our AI training clusters. We’ve expanded our network hardware portfolio and are contributing two new disaggregated network fabrics and a new NIC to OCP.

Network 117

Beyond the Data Center: High-Performance Networks for AI

Cisco Wireless

Traditionally, transporting data between geographically distributed data centers required leasing high-capacity circuits from service providers or investing in dedicated optical transport networks.

HN762: A Network Automation Roadmap

Packet Pushers

Once you get past a handful of Python scripts, network automation can be…daunting. If you want to make network automation process-driven, repeatable, reliable, and something that doesn’t just rely on your scripts and the knowledge inside your head, there’s an entire landscape that opens up before you.

Network 98