Maintaining large-scale AI capacity at Meta

Engineering at Meta

Instead, we ensure components are compatible with each other and roll out component upgrades in a sliding fashion. Meta maintains capacity by using maintenance trains, which means shutting down small amounts of capacity in a cyclic fashion. This approach also allows us to guarantee capacity availability.
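The maintenance-train idea can be illustrated with a toy model. This is a minimal sketch, not Meta's implementation. It assumes the fleet is split into hypothetical, equal-sized maintenance domains (the `Domain` type and `run_maintenance_train` function are illustrative names) and that only one domain is drained at a time, so the capacity left online never drops below (N-1)/N of the total. Because domains are upgraded one after another, old and new component versions run side by side until the cycle completes, which is why the components must be compatible with each other.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Domain:
    """A hypothetical maintenance domain: a small slice of the fleet."""
    name: str
    gpus: int
    version: int = 0  # component version currently installed


def run_maintenance_train(domains: List[Domain], target_version: int) -> List[float]:
    """Drain one domain at a time, upgrade it, then return it to service.

    Returns the fraction of total capacity that stayed online at each step.
    """
    total = sum(d.gpus for d in domains)
    availability: List[float] = []
    for d in domains:  # one full cycle of the train
        online = total - d.gpus  # only this domain is offline
        availability.append(online / total)
        d.version = target_version  # upgrade while the domain is drained
    return availability


if __name__ == "__main__":
    fleet = [Domain(f"d{i}", 128) for i in range(8)]
    print(min(run_maintenance_train(fleet, target_version=2)))
```

With eight equal domains, at least 7/8 of the capacity stays available throughout the cycle; smaller domains raise that floor at the cost of a longer cycle.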

Building Meta’s GenAI Infrastructure

Engineering at Meta

RSC played and continues to play an important role in the development of Llama and Llama 2, as well as advanced AI models for applications ranging from computer vision, NLP, and speech recognition to image generation and even coding. Under the hood, our newer AI clusters build upon the successes and lessons learned from RSC.

NetOps for Application Developers: Understanding the Importance of Network Operations in Modern Development

Kentik

One of the great successes of software development in the last ten years has been the relatively decentralized approach to application development made available by containerization, allowing for rapid iteration, service-specific stacks, and (sometimes) elegant deployment and orchestration implementations that piece it all together.

A RoCE network for distributed AI training at scale

Engineering at Meta

The scheduler does this by learning the position of GPU servers in the logical topology to recommend a rank assignment. The second approach involved posting each message to a different queue, in a round-robin fashion. It may depend on the relative throughput between GPU and network, which may not be applicable to all scenarios.
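The round-robin approach can be sketched as follows. This is an illustrative model only, assuming messages are opaque items and queues are plain in-memory lists; the function name `post_round_robin` is hypothetical and does not come from Meta's software stack, and a real RoCE deployment would post to hardware queue pairs instead.

```python
from itertools import cycle
from typing import Dict, List


def post_round_robin(messages: List[str], num_queues: int) -> Dict[int, List[str]]:
    """Post each message to the next queue in turn (round-robin).

    Queues are modeled as lists; this sketch does not touch any RDMA API.
    """
    queues: Dict[int, List[str]] = {q: [] for q in range(num_queues)}
    next_queue = cycle(range(num_queues))  # 0, 1, ..., num_queues-1, 0, 1, ...
    for msg in messages:
        queues[next(next_queue)].append(msg)
    return queues


if __name__ == "__main__":
    print(post_round_robin([f"m{i}" for i in range(10)], num_queues=4))
```

Consecutive messages land on different queues, so each queue carries roughly 1/num_queues of the messages.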

Deploying Kafka Streams and KSQL with Gradle – Part 2: Managing KSQL Implementations

Confluent

Now in part 2, we’ll discuss the challenges we faced developing, building, and deploying the KSQL portion of our application and how we used Gradle to address them. We’ll demonstrate using Gradle to execute and test our KSQL streaming code, as well as building and deploying our KSQL applications in a continuous fashion.

Journey to Event Driven – Part 2: Programming Models for the Event-Driven Architecture

Confluent

Although the principles behind event-driven frameworks are sound, those behind event sourcing, CQRS, and hydrating application state are separate concerns, so we often see them handled explicitly as an orthogonal concern (e.g., operational processes) or externally (think GitHub for your application's state).
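To make hydrating application state concrete, here is a minimal event-sourcing sketch. It assumes an in-memory list stands in for the event stream and uses a hypothetical deposit/withdrawal event schema; `Balances` and `hydrate` are illustrative names, not part of any Confluent API. State is never stored directly: it is rebuilt by replaying events in order, which is why hydration can be treated as a concern separate from the event-driven processing itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical event schema: (event_type, account_id, amount)
Event = Tuple[str, str, int]


@dataclass
class Balances:
    """Application state rebuilt purely from past events."""
    by_account: Dict[str, int] = field(default_factory=dict)

    def apply(self, event: Event) -> None:
        kind, account, amount = event
        current = self.by_account.get(account, 0)
        if kind == "deposit":
            self.by_account[account] = current + amount
        elif kind == "withdrawal":
            self.by_account[account] = current - amount
        else:
            raise ValueError(f"unknown event type: {kind}")


def hydrate(events: List[Event]) -> Balances:
    """Replay the event log in order to reconstruct the current state."""
    state = Balances()
    for event in events:
        state.apply(event)
    return state


if __name__ == "__main__":
    log = [("deposit", "a", 100), ("withdrawal", "a", 30), ("deposit", "b", 5)]
    print(hydrate(log).by_account)
```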

Journey to Event Driven – Part 3: The Affinity Between Events, Streams and Serverless

Confluent

Serverless functions have a synergistic relationship with event streaming applications: the two behave differently with respect to streaming workloads, but both are event driven. Serverless functions have moved from simple use cases like “build a thumbnail of this image” to mainstream application logic like processing payments.
