Bandwidth and Topology - IT Networking Pro Today

A RoCE network for distributed AI training at scale

Engineering at Meta

AUGUST 5, 2024

Topology: We built a dedicated backend network specifically for distributed training. To support large language models (LLMs), we expanded the backend network towards the DC-scale, e.g., incorporating topology-awareness into the training job scheduler. We designed a two-stage Clos topology for AI racks, known as an AI Zone.

Network

Network Networking Topology Data Centers

Certification Internet service via iPerf3

Network Engineering

JANUARY 8, 2025

Occasionally, customers report issues such as high latency or not achieving their subscribed bandwidth. To address these concerns, we certify the last-mile connection using iPerf3 for traffic and bandwidth analysis. Attached is a topology diagram illustrating the proposed setup.

Internet

Internet Bandwidth Topology Server
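
As a rough illustration of the iPerf3 certification idea in the entry above, the sketch below reads an iPerf3 JSON report (as produced by the `-J` flag for a TCP test) and compares the received throughput with the subscribed rate. The function name `certify_throughput`, the 90% pass threshold, and the sample figures are assumptions made for illustration; they do not come from the article.

```python
import json


def certify_throughput(iperf3_json: str, subscribed_mbps: float,
                       tolerance: float = 0.9) -> dict:
    """Compare measured TCP throughput against the subscribed bandwidth.

    `tolerance` is a hypothetical pass threshold (90% of the subscribed
    rate); choose whatever your own certification policy requires.
    """
    report = json.loads(iperf3_json)
    # For TCP tests, iPerf3 reports receiver-side totals under end.sum_received.
    bits_per_second = report["end"]["sum_received"]["bits_per_second"]
    measured_mbps = bits_per_second / 1e6
    return {
        "measured_mbps": round(measured_mbps, 1),
        "subscribed_mbps": subscribed_mbps,
        "passed": measured_mbps >= tolerance * subscribed_mbps,
    }


# Minimal stand-in for a real report; a real one has many more fields.
sample = json.dumps({"end": {"sum_received": {"bits_per_second": 94_300_000.0}}})
result = certify_throughput(sample, subscribed_mbps=100)
print(result)
```

In practice the JSON would come from a run such as `iperf3 -c <server> -J`, saved to a file or captured from standard output.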

How Meta trains large language models at scale

Engineering at Meta

JUNE 12, 2024

We optimized the RoCE cluster for quick build time, and the InfiniBand cluster for full-bisection bandwidth. We implemented collective communication patterns with network topology awareness so that they can be less latency-sensitive. Our intent was to build and learn from the operational experience.

Infiniband

Infiniband Data Centers Topology Network

Using Chakra execution traces for benchmarking and network performance optimization

Engineering at Meta

SEPTEMBER 7, 2023

Such predictions become even more complex when the compute engines aren’t ready or when changes in network topology and bandwidth become necessary. As a result, traces sourced from one system might not accurately simulate on another with a different GPU, network topology, and bandwidth.

Network

Network Networking Topology Bandwidth

Why latency is the new outage

Kentik

SEPTEMBER 20, 2021

Not as difficult as time travel, but it’s difficult enough so that for 30+ years IT professionals have tried to skirt the issue by adding more bandwidth between locations or by rolling out faster routers and switches. Over the last few decades network managers have focused on adding bandwidth and reducing the network outages.

TCP

TCP Routers Bandwidth IP Address

Today’s Enterprise WAN Isn’t What It Used To Be

Kentik

MARCH 13, 2023

Yes, there’s something to say about how applications are written, but on the public internet side, we’ve seen a decrease in latency and cost, and a massive increase in available bandwidth. This coincided with the advent of the public cloud like AWS, Azure, GCP, etc. Yes, of course, I’m oversimplifying here. I know there are always exceptions.

WAN

WAN Wide Area Network Topology Internet

Practical Steps for Enhancing Reliability in Cloud Networks - Part I

Kentik

APRIL 4, 2023

By collecting and analyzing network telemetry, including traffic flows, bandwidth usage, packet loss rates, and error rates, NetOps leverage monitoring to detect and diagnose potential bottlenecks, security threats, and other issues that can impact network reliability, often before end users even notice a problem.

Cloud

Cloud Network Networking Bandwidth

Building Meta’s GenAI Infrastructure

Engineering at Meta

MARCH 12, 2024

In the graph below, AllGather collective performance is shown (as normalized bandwidth on a 0-100 scale) when a large number of GPUs are communicating with each other at message sizes where roofline performance is expected. This helped push our large clusters to achieve great and expected performance, just as our small clusters do.

Infiniband

Infiniband Data Centers Server Network

Why is Cisco ACI replacing traditional networks?

The Network DNA

JUNE 10, 2024

Spine-leaf topologies provide high-bandwidth, low-latency, non-blocking server-to-server connectivity. Adding spine switches increases fabric bandwidth. Adding leaf switches increases endpoint bandwidth. Let's look at the top nine benefits and characteristics of ACI as compared to traditional networks.

Network

Network Networking Virtual machine IP Address
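
The spine-leaf scaling claim in the entry above can be checked with a little arithmetic. The sketch below assumes a simplified fabric in which every leaf has exactly one uplink to every spine; the port counts and link speeds are invented for illustration and are not taken from the article.

```python
def leaf_oversubscription(server_ports: int, server_gbps: float,
                          spines: int, uplink_gbps: float) -> float:
    """Ratio of server-facing capacity to spine-facing capacity on one leaf.

    Assumes one uplink from the leaf to each spine (a simplification).
    A ratio of 1.0 means the leaf is non-blocking.
    """
    downlink = server_ports * server_gbps
    uplink = spines * uplink_gbps
    return downlink / uplink


def fabric_bandwidth_gbps(leaves: int, spines: int, uplink_gbps: float) -> float:
    """Total leaf-to-spine capacity under the same one-uplink-per-spine assumption."""
    return leaves * spines * uplink_gbps


# Hypothetical leaf: 48 x 25G server ports, 100G uplinks.
print(leaf_oversubscription(48, 25, spines=4, uplink_gbps=100))  # 4 spines
print(leaf_oversubscription(48, 25, spines=6, uplink_gbps=100))  # 6 spines
print(fabric_bandwidth_gbps(leaves=8, spines=4, uplink_gbps=100))
```

Adding spines lowers the per-leaf oversubscription ratio, while adding leaves adds server-facing ports without changing that ratio.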

How to Configure Static Routes on Cisco

NW Kings

JANUARY 7, 2025

This feature in networks predicts and stabilizes the topology. Low Overhead: Static routes do not consume bandwidth for routing updates or require additional CPU resources to compute paths. Disadvantages: Manual updates required for topology changes, not scalable for large networks, and risk of misconfiguration.

Routers

Routers IP Address Protocol Topology

Why is my SaaS application so slow?

Kentik

DECEMBER 13, 2021

If you’re working from home, could it be possible that you’re competing with other local devices for that precious, limited bandwidth? For example, after thorough review of the traceroute and the BGP topology, a peering connection at the local IX could resolve the issue. Check your local network.

Application

Application DNS Routers Server

Independent Compliance and Security Assessment – Two Additions to the All-New Cato Management Application

CATO Networks

DECEMBER 14, 2021

New Topology View and a New Backend: The top-level topology view has been redesigned to accommodate deployments of thousands of sites and tens of thousands of users. We enhanced security reporting with an all-new threats dashboard and opened up application performance with another new dashboard.

Application

Application Topology Cloud Bandwidth

Traditional WAN vs. SD-WAN: Everything You Need to Know

CATO Networks

AUGUST 22, 2023

Some of its limitations include: Cost: MPLS connections are expensive and have hard caps on available bandwidth. If an organization's bandwidth needs exceed the current hardware capacity, new or additional hardware is required, and this can be a slow and expensive process.

WAN

WAN Bandwidth Data Centers Routers

What’s next for the internet in Afghanistan?

Kentik

AUGUST 31, 2021

The diagram below illustrates the topology of the internet of Afghanistan based on BGP data from 10 years ago. Six ASNs represented its domestic internet with international bandwidth coming from either satellite providers or its neighboring countries, Pakistan, Tajikistan, Uzbekistan and Iran.

Internet

Internet Government Wireless Bandwidth

The business case for SD-WAN: Because MPLS is Not Fit for the Cloud

CATO Networks

NOVEMBER 15, 2017

The problem is that as companies adopt cloud-based services, deploy more bandwidth-intensive applications, and connect an increasing number of devices and remote locations, business requirements change and new technical challenges arise. However, it comes with performance limitations and other challenges.

MPLS

MPLS WAN Cloud Wide Area Network

Network observability: Hype or reality?

Kentik

AUGUST 30, 2021

Why is my bandwidth bill so high? Commonly cited problems are a loss of visibility or understanding of the topology, loss of control over network policies (because developers can now create network constructs on their own), and new networking tools from the cloud providers that are often siloed and shallow in features.

Network

Network Networking DevOps Cloud

Built-In Multi-Region Replication with Confluent Platform 5.4-preview

Confluent

SEPTEMBER 16, 2019

However, in order to operate a reliable stretch cluster, datacenters must be relatively close to each other and have a very stable, low-latency, and high-bandwidth connection among the DCs. This is sometimes referred to as a 2.5 datacenter topology. This changes with the preview release of Confluent Platform 5.4.

Bandwidth

Bandwidth WAN Topology Networking

SNMP vs. NetFlow

Kentik

JANUARY 29, 2020

That includes adding in high-value data such as threat feed and threat modeling, routing, topology, and other important networking information to model answers to difficult questions. Flow can also be used to understand consumption of bandwidth in a more granular manner.

Port

Port Network Networking Data Centers

The Network Also Needs to be Observable, Part 3: Network Telemetry Types

Kentik

JANUARY 28, 2021

Why is my bandwidth bill so high? What users and applications are consuming my network bandwidth? The (typically static) configuration data representing the operating intent for all configurable network elements such as addresses, ID’s, ACLs, topology info, location data, even device details such as hardware and software versions.

Network

Network Networking DNS IP Address

SNMP vs. Flow

Kentik

JANUARY 29, 2020

That includes adding in high-value data such as threat feed and threat modeling, routing, topology, and other important networking information to model answers to difficult questions. Flow can also be used to understand consumption of bandwidth in a more granular manner.

Port

Port Network Networking Data Centers

SD-WAN and Cloud Security

CATO Networks

MAY 6, 2018

Traditionally, enterprises configure their WAN in a classic hub-and-spoke topology, where users in sites access resources in headquarters or a datacenter. Bandwidth-intensive traffic, bound for the Internet and cloud, is backhauled across the MPLS WAN.

WAN

WAN Cloud Wide Area Network MPLS

Securing Your Network Against Attacks: Prevent, Detect, and Mitigate Cyberthreats

Kentik

MARCH 15, 2023

Cyberthreat strategies have evolved in step with modern cloud networks, often using cheap, virtualized cloud resources to exploit the threat surface topology I briefly described above. These attacks aim to overwhelm a service’s bandwidth capabilities with prohibitively high traffic volumes. Protocol-based.

Network

Network Networking Protocol IP Address

IT Managers: Read This Before Leaving Your MPLS Provider

CATO Networks

APRIL 20, 2022

You've been told to cut costs. It's no secret that MPLS circuits cost a fortune, often 3-4x the price of MPLS alternatives (like SD-WAN), for only a fraction of the bandwidth. Get crystal clear on your WAN challenges: Do any of these challenges sound familiar? But the bottom line isn't the only factor to take into consideration.

MPLS

MPLS WAN SASE Network

Internet Underlay Visibility is Critical for SD-WAN Overlays

Kentik

MARCH 26, 2018

To quickly resolve problems in either scenario, IT managers need new tools that can gain visibility into the end-to-end topology and various paths that traffic flows are traversing from the Internet breakout connection into multiple cloud provider networks.

WAN

WAN Internet Wide Area Network MPLS

When Reliability Goes Wrong in Cloud Networks

Kentik

MAY 31, 2023

Under this model, network topology is highly variable, creating a complexity that can mask root causes and make proactive availability configurations a highly brittle point of the network. Replication, analysis, and data transfer all present opportunities for security threats, data integrity loss, and intense bandwidth and memory consumption.

Network

Network Networking Cloud Routers

Engineering dependability and fault tolerance in a distributed system

High Scalability

FEBRUARY 19, 2021

The mechanisms described above — such as the role placement algorithm — can only be effective when all of the participating entities are in agreement on the topology of the cluster together with the status and health of each node. For example, there is a mechanism to manage the relocation of roles when the topology changes.

Engineering

Engineering Topology Protocol Networking

What is SD-WAN?

CATO Networks

AUGUST 7, 2018

Using the old approach to support the new needs results in expensive global connectivity, complex topologies and widely dispersed point products that are difficult to maintain and secure. SD-WANs reduce bandwidth costs by leveraging inexpensive services, such as Internet broadband, whenever possible. What are the Benefits of SD-WANs?

WAN

WAN MPLS Wide Area Network Data Centers

Choosing an SD-WAN Architecture for Real-Time Communications

CATO Networks

SEPTEMBER 13, 2017

Note that enterprise topologies that have all sites close or all on a single access carrier may not have this exposure to the core Internet. However, in the event of network issues or congestion, mechanisms to allocate the available reduced bandwidth for optimal business value are critical. One area is identity.

WAN

WAN MPLS Media Cloud

Top Ten Technology Trends for 2024

Vedcraft

JUNE 16, 2024

Team Topologies approach to organizing software engineering teams has emerged as a great reference for building an effective platform engineering team. Click here to see the consolidated list of tools & technologies.

Cloud

Cloud Engineering Application Data Centers

Building and deploying MySQL Raft at Meta

Engineering at Meta

MAY 16, 2023

MySQL Raft replication topologies: A Raft ring would consist of several MySQL instances (four in the diagram) in different regions. Once in a while, automation could also change the regional placement of MySQL topology. The communication round-trip time (RTT) between these regions would range from 10 to 100 milliseconds.

Engineering

Engineering Protocol Server Topology
