A RoCE network for distributed AI training at scale

Engineering at Meta

AI networks play an important role in interconnecting tens of thousands of GPUs, forming the foundational infrastructure for training and enabling large models with hundreds of billions of parameters, such as LLAMA 3.1. Distributed training, in particular, imposes the most significant strain on data center networking infrastructure.

Network 124

Certification Internet service via iPerf3

Network Engineering

Occasionally, customers report issues such as high latency or not achieving their subscribed bandwidth. To address these concerns, we certify the last-mile connection using iPerf3 for traffic and bandwidth analysis. Attached is a topology diagram illustrating the proposed setup.

Internet 130

Practical Steps for Enhancing Reliability in Cloud Networks - Part I

Kentik

When evaluating solutions, whether to internal problems or those of our customers, I like to keep the core metrics fairly simple: will this reduce costs, increase performance, or improve the network’s reliability? It’s often taken for granted by network specialists that there is a trade-off among these three facets.

Cloud 104

Using Chakra execution traces for benchmarking and network performance optimization

Engineering at Meta

Meta presents Chakra execution traces, an open graph-based representation of AI/ML workload execution, laying the foundation for benchmarking and network performance optimization. At Meta, our endeavors are not only geared towards pushing the boundaries of AI/ML but also towards optimizing the vast networks that enable these computations.

Network 96

Sustainable Networks: powering the future, responsibly

Juniper

Imagine a data center humming with 100,000 cutting-edge GPUs, the backbone of the AI/ML and Gen AI revolution. Though reliable, this approach is increasingly at odds with today's sustainability goals.

52

CCNA, CCNP & Firewall Interview Questions 2025: A Complete Networking Guide

NW Kings

As we progress into 2025, the landscape of networking continues to evolve rapidly, with new technologies, protocols, and security measures shaping the way organizations design and manage their networks. CCNA Interview Questions: The CCNA certification serves as a foundational credential for network engineers.


Network observability: Hype or reality?

Kentik

If you haven’t yet heard the term “network observability,” you will be hearing it soon. Some say that network observability is just marketing hype from vendors. They say, “networks have always been observable, so there’s nothing new here.” I say network observability is not just vendor hype, and this blog will make the case.

Network 85