A RoCE network for distributed AI training at scale

Engineering at Meta

AI networks play an important role in interconnecting tens of thousands of GPUs, forming the foundational infrastructure for training and enabling large models with hundreds of billions of parameters, such as LLAMA 3.1. Distributed training, in particular, imposes the most significant strain on data center networking infrastructure.

Network 132

NetOps for Application Developers: Understanding the Importance of Network Operations in Modern Development

Kentik

One of the great successes of software development in the last ten years has been the relatively decentralized approach to application development made available by containerization, allowing for rapid iteration, service-specific stacks, and (sometimes) elegant deployment and orchestration implementations that piece it all together.

Maintaining large-scale AI capacity at Meta

Engineering at Meta

Meta runs different types of backend networks, topologies, and training jobs that have tight dependencies between software and hardware components. Basically, any kind of operation that updates or verifies software and firmware components in the clusters, including the networking path. And what do we mean by maintaining?

Fashion 138

Building Meta’s GenAI Infrastructure

Engineering at Meta

We are sharing details on the hardware, network, storage, design, performance, and software that help us extract high throughput and reliability for various AI workloads. At Meta, we handle hundreds of trillions of AI model executions per day. We use this cluster design for Llama 3 training.

When SASE-based XDR Expands into Network Operations: Revolutionizing Network Monitoring

CATO Networks

Cato XDR breaks the mold: now, one platform tackles both security threats and network issues for true SASE convergence. SASE, or Secure Access Service Edge, represents the core evolution of today's enterprise networks, converging network and security functions into a single, unified, cloud-native architecture. And guess what?

SASE 52

The WAN Accelerator and Modern Network Optimization

CATO Networks

Network latency costs money. However, when I discuss latency reduction and WAN acceleration with network managers and CIOs, one of the key takeaways is that getting network optimization right has changed significantly over the last decade. Application-specific acceleration techniques boost the efficiency of applications.

WAN 52

Will cloud-based networking be your next WAN?

CATO Networks

And it's also no secret that as more applications move to the cloud, significant changes are hosted onto the WAN. And with the cloud, users access applications in and outside of the office. Cloud-based networking makes it simple to address these challenges in a secure and scalable fashion. Let's find out.

WAN 52