Article: Site Reliability Engineering for Native Mobile Apps

InfoQ Articles

In this article, we will describe how Site Reliability Engineering (SRE) principles can be applied to mobile app development. Then, we will delve into organization topology, i.e., how an organization can be designed to adopt SRE for mobile app development. By Abhijith Krishnappa.

Watch: Meta’s engineers on building network infrastructure for AI

Engineering at Meta

The 2023 edition of Networking at Scale focused on how Meta’s engineers and researchers have been designing and operating the network infrastructure over the last several years for Meta’s AI workloads, including our numerous ranking and recommendation workloads and the immense GenAI models.

Critically Engaging With Models

Mathias Verraes

Then, we'll examine three organisational models that are specific to software development: the Spotify Model, the Agile Fluency Model, and Team Topologies. Team Topologies (image credit: Henny Portman) is a software organisational model that focuses on fast flow and value creation.

Optimizing Kafka Streams Applications

Confluent

Kafka Streams introduced the processor topology optimization framework at the Kafka Streams DSL layer. In what follows, we provide some context around how a processor topology was generated inside Kafka Streams before 2.1. Along the way, we will demonstrate a few known issues that impact the efficiency of the generated processor topology.

A RoCE network for distributed AI training at scale

Engineering at Meta

Topology: We built a dedicated backend network specifically for distributed training. To support large language models (LLMs), we expanded the backend network towards DC scale, for example by incorporating topology-awareness into the training job scheduler. We designed a two-stage Clos topology for AI racks, known as an AI Zone.

Routing packets on atypical network topology

Network Engineering

Hello everyone, I'm currently working on setting up a redundant Layer 2 link using VXLAN over IPsec for a client. The attached diagram illustrates the current network configuration. My objective is to ensure redundancy for the Layer 2 connection by leveraging VXLAN over IPsec.

Engineering dependability and fault tolerance in a distributed system

High Scalability

This means a system that is not merely available but is also engineered with extensive redundant measures to continue to work as its users expect. … reliability situations, where continuity of service is essential, with redundant elements continuously in service, such as with airplane engines. This ensures reliability.