A RoCE network for distributed AI training at scale

Engineering at Meta

We built a dedicated backend network specifically for distributed training. To support large language models (LLMs), we expanded the backend network toward data center (DC) scale, for example by incorporating topology awareness into the training job scheduler. We designed a two-stage Clos topology for AI racks, known as an AI Zone.

Network 132

Announcing Complete Azure Observability for Kentik Cloud

Kentik

Live traffic flow arrows demonstrate how Azure Express Routes, Firewalls, Load Balancers, Application Gateways, and VWANs connect in the Kentik Map, which updates dynamically as topology changes for effortless architecture reference. For example, Express Route metrics include data about inbound and outbound dropped packets.

Cloud 105

Trending Sources


Multi-Cloud Made Simple: Announcing Kentik Observability Enhancements for AWS and Google Cloud

Kentik

Kentik Cloud users can now access the new Kentik Map for Google Cloud to automatically visualize detailed Google Cloud and hybrid cloud infrastructure topology. Advanced analysis: With Kentik’s powerful analytics engine, you can perform an in-depth analysis of flow logs from any cloud.

Cloud 97

Building Shared State Microservices for Distributed Systems Using Kafka Streams

Confluent

At the core of each shared state microservice we built was a Kafka Streams instance with a rather simple processing topology. Additionally, our processing topology includes a second key-value store for scheduling. Nitzan Gilkis is a senior software engineer at Imperva, where he works on the DDoS Protection for Networks product.


SNMP vs. NetFlow

Kentik

This includes port, IP source, destination, port numbers, and other markings such as quality of service (QoS). SNMP data is also used by network engineers to troubleshoot reported problems and by network architects for tasks such as capacity planning.

Port 16

SNMP vs. Flow

Kentik

This includes port, IP source, destination, port numbers, and other markings such as quality of service (QoS). SNMP data is also used by network engineers to troubleshoot reported problems and by network architects for tasks such as capacity planning.

Port 78

Network observability: Hype or reality?

Kentik

The term has a literal engineering definition that, in a nutshell, means the internal state of any system is knowable solely by external observation. Port numbers and IP addresses are less useful in traffic analytics. The concept of observability has taken hold in the DevOps, SRE, and application performance monitoring (APM) space.

Network 85