
Watch Meta’s engineers discuss optimizing large-scale networks

Engineering at Meta

Also, the pivot to the metaverse has led to a significant increase in AI, HPC, and machine learning workloads that demand huge networking bandwidth and compute capacity, and that pose challenges around the safe coexistence of existing web, legacy, and modern workloads.


A RoCE network for distributed AI training at scale

Engineering at Meta

We ensure that there is enough ingress bandwidth on the rack switch so that it does not hinder the training workload. The backend (BE) network is a specialized fabric that connects all RDMA NICs in a non-blocking architecture, providing high bandwidth, low latency, and lossless transport between any two GPUs in the cluster, regardless of their physical location.
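The non-blocking property can be made concrete with a small calculation: a switch tier is non-blocking when the bandwidth it can forward upward is at least the bandwidth its attached NICs can inject. The sketch below checks that oversubscription ratio for a hypothetical rack switch; the `RackSwitch` class, port counts, and speeds are illustrative assumptions, not the configuration described in Meta's article.

```python
from dataclasses import dataclass


@dataclass
class RackSwitch:
    """Hypothetical rack switch description; all values are illustrative."""
    downlink_ports: int    # ports facing RDMA NICs
    downlink_gbps: float   # speed of each NIC-facing port
    uplink_ports: int      # ports facing the next fabric tier
    uplink_gbps: float     # speed of each uplink port

    def oversubscription(self) -> float:
        """Ratio of NIC-facing (ingress) bandwidth to uplink bandwidth."""
        ingress = self.downlink_ports * self.downlink_gbps
        uplink = self.uplink_ports * self.uplink_gbps
        return ingress / uplink

    def is_non_blocking(self) -> bool:
        # A ratio of 1:1 or better means every NIC can send at line rate
        # toward the rest of the fabric at the same time.
        return self.oversubscription() <= 1.0


if __name__ == "__main__":
    balanced = RackSwitch(16, 400, 16, 400)
    oversubscribed = RackSwitch(32, 400, 16, 400)
    print(balanced.oversubscription(), balanced.is_non_blocking())
    print(oversubscribed.oversubscription(), oversubscribed.is_non_blocking())
```

With twice as many NIC-facing ports as uplinks at the same speed, the ratio becomes 2:1, and NICs sending at full rate would contend for uplink capacity.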




Protocols of Transport Layer Explained

NW Kings

The protocols of the transport layer play a significant role in data transmission across the network, carrying data end to end between sender and receiver and, in TCP's case, ensuring reliable delivery. This blog will explore the key protocols within the transport layer, focusing on TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) in detail.
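To make the TCP/UDP contrast concrete, the sketch below builds and parses the fixed 8-byte UDP header (source port, destination port, length, checksum, each an unsigned 16-bit field in network byte order) with Python's `struct` module. The helper names are hypothetical, and the checksum is left at 0, which UDP over IPv4 treats as "no checksum" (IPv6 generally requires a real one).

```python
import struct
from typing import NamedTuple, Tuple

# "!" = network (big-endian) byte order; "H" = unsigned 16-bit integer.
UDP_HEADER = struct.Struct("!HHHH")


class UdpHeader(NamedTuple):
    src_port: int
    dst_port: int
    length: int    # header (8 bytes) + payload, in bytes
    checksum: int  # 0 means "no checksum" when carried over IPv4


def build_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend an 8-byte UDP header; the checksum is left at 0 (IPv4 only)."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def parse_datagram(datagram: bytes) -> Tuple[UdpHeader, bytes]:
    """Split a datagram into its decoded header and payload."""
    header = UdpHeader(*UDP_HEADER.unpack_from(datagram))
    payload = datagram[UDP_HEADER.size:header.length]
    return header, payload


if __name__ == "__main__":
    dgram = build_datagram(5000, 53, b"hello")
    print(parse_datagram(dgram))
```

There are no sequence numbers, acknowledgements, or retransmission timers here; those live in TCP's larger header (20 bytes minimum) and its connection state, which is what lets TCP provide reliable, ordered delivery while UDP leaves reliability to the application.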


Better video for mobile RTC with AV1 and HD

Engineering at Meta

But as we’ve implemented AV1 for mobile RTC, we’ve also had to address a number of challenges, including scaling, improving video quality for both low-bandwidth users and high-end networks, CPU and battery usage, and maintaining quality stability. As seen in Figure 1, some calls operate in very low-bandwidth conditions.


Practical Steps for Enhancing Reliability in Cloud Networks - Part I

Kentik

More than anything, reliability becomes the principal challenge for network engineers working in and with the cloud. Even the most detailed reliability engineering can be easily undermined in an insecure network. While there is much to be said about cloud costs and performance, I want to focus this article primarily on reliability.


Using Device Telemetry to Answer Questions About Your Network Health

Kentik

Traditional network monitoring relies on telemetry sources such as Simple Network Management Protocol (SNMP), sFlow, and NetFlow, along with CPU, memory, and other device-specific metrics. Network operators and engineers cast as wide a net as possible to source their telemetry. What is device telemetry?
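As an example of turning raw device telemetry into an answer about network health, the sketch below converts two SNMP polls of an interface's 64-bit octet counter into average link utilization. It assumes the samples (IF-MIB `ifHCInOctets` and `ifHighSpeed`, the latter in Mbit/s) were already collected by some poller; the function names are hypothetical and no SNMP library API is shown.

```python
# Sketch: link utilization from two polls of a 64-bit SNMP octet counter.
# Assumes ifHCInOctets (Counter64) and ifHighSpeed (Mbit/s) samples were
# already collected; this code only does the arithmetic.
COUNTER64_MODULUS = 2 ** 64


def counter_delta(previous: int, current: int, modulus: int = COUNTER64_MODULUS) -> int:
    """Octets counted between two samples, tolerating at most one counter wrap."""
    return (current - previous) % modulus


def utilization_percent(prev_octets: int, curr_octets: int,
                        interval_s: float, if_high_speed_mbps: int) -> float:
    """Average utilization of the interface over the polling interval."""
    if interval_s <= 0 or if_high_speed_mbps <= 0:
        raise ValueError("interval and interface speed must be positive")
    bits = counter_delta(prev_octets, curr_octets) * 8
    capacity_bits = interval_s * if_high_speed_mbps * 1_000_000
    return 100.0 * bits / capacity_bits


if __name__ == "__main__":
    # 37.5 GB received in 60 s on a 10 Gbit/s port (ifHighSpeed = 10000).
    print(utilization_percent(0, 37_500_000_000, 60, 10_000))  # 50.0
```

A device reboot also resets counters, which the wrap handling above would misread as a huge delta, so real collectors typically also watch for counter discontinuities before trusting a sample pair.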


How Meta trains large language models at scale

Engineering at Meta

Solving this problem requires a robust, high-speed network infrastructure as well as efficient data transfer protocols and algorithms. This has encompassed developments in a wide range of areas.
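One widely used example of an efficient data transfer algorithm in distributed training is ring all-reduce, which sums gradients across n ranks so that each rank sends only about 2(n-1)/n of its buffer, regardless of cluster size. The single-process simulation below is a simplified sketch (one number per chunk, hypothetical `ring_allreduce` name); the excerpt above does not say which collective algorithms Meta uses.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate ring all-reduce over n ranks. Each rank's buffer is split into
    n chunks (one number per chunk here, for readability)."""
    n = len(buffers)
    data = [list(b) for b in buffers]  # copy so the caller's lists are untouched
    if any(len(b) != n for b in data):
        raise ValueError("sketch assumes exactly one chunk per rank")

    # Phase 1: reduce-scatter. At each step rank r passes one partial sum to
    # rank r+1; after n-1 steps rank r holds the full sum of chunk (r+1) % n.
    for step in range(n - 1):
        sends = [(r, (r - step) % n) for r in range(n)]
        values = [data[r][c] for r, c in sends]  # snapshot: all sends are simultaneous
        for (r, c), v in zip(sends, values):
            data[(r + 1) % n][c] += v

    # Phase 2: all-gather. Completed chunks travel around the ring,
    # overwriting the stale partial sums on every other rank.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n) for r in range(n)]
        values = [data[r][c] for r, c in sends]
        for (r, c), v in zip(sends, values):
            data[(r + 1) % n][c] = v

    return data


if __name__ == "__main__":
    print(ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]]))
```

Each rank sends n-1 chunks in each phase, so per-rank traffic stays near twice the buffer size as n grows, which is why ring-style collectives remain bandwidth-efficient on large fabrics.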