Mon. Aug 05, 2024

A RoCE network for distributed AI training at scale

Engineering at Meta

AI networks play an important role in interconnecting tens of thousands of GPUs, forming the foundational infrastructure for training large models with hundreds of billions of parameters, such as LLAMA 3.1 405B. This week at ACM SIGCOMM 2024 in Sydney, Australia, we are sharing details on the network we have built at Meta over the past few years to support our large-scale distributed AI training workload.

Article: Architectural Retrospectives: The Key to Getting Better at Architecting

InfoQ Articles

The purpose of an architectural retrospective is to use experience to help the development team improve their architecting skills and their way of working as they make architectural decisions. This is different from traditional architecture reviews, which focus on improving the architecture.

Trending Sources

Announcing the General Availability of Row and Column Level Security with Databricks Unity Catalog

databricks

Row filters and column masks control data access by filtering rows and masking column values using SQL UDFs in database queries.

DCPerf: An open source benchmark suite for hyperscale compute applications

Engineering at Meta

We are open-sourcing DCPerf, a collection of benchmarks that represents the diverse categories of workloads that run in data center cloud deployments. We hope that DCPerf can be used more broadly by academia, the hardware industry, and internet companies to design and evaluate future products. DCPerf is available now on GitHub. Hyperscale and cloud datacenter deployments constitute the largest market share of server deployments in the world today.

#ClouderaLife Employee Spotlight: Stephanie Han

Cloudera Blog

In this Employee Spotlight, we sat down with Stephanie Han to learn about her tenure at Cloudera, her journey from accounting to leading diversity, equality & inclusion (DEI) programs, and her impressive volunteer work. Meet Stephanie Han: Stephanie is a Senior Program Manager in the HR team at Cloudera. She’s been with the company since 2019 and plays a key role in a variety of employee-centric initiatives including Cloudera’s employee volunteering program, talent management program, a

Breaking Down Data Silos for Digital Transformation Success

Dataversity

In the race to become data-driven, many enterprises are stumbling over an age-old hurdle: data silos. A recent study by IDC found that data silos cost the global economy a whopping $3.1 trillion annually. Despite years of digital transformation efforts, the divide between technical and non-technical teams persists, hindering the full potential of data assets.

Why I am not an FBI Agent

SubnetZero

Some years back I wrote a post poking fun at the Federal Bureau of Investigation, based on an experience I had at a briefing in their office. The funny thing is (and this did not color my article) that there was a point in my life when I badly wanted to be an FBI agent.

More Trending

North Carolina A&T Visit Summary

Zynga

In March, six Zyngites traveled to Greensboro, North Carolina for a day on the campus of Zynga's HBCU partner school, North Carolina A&T. Zynga was thrilled to connect with partners and students from NCAT in person, after so much time spent virtually. The day started with a lunch meeting with the Zynga students who have been part of the scholarship program for 2-3 years, along with the program lead, Dr.

NB489: Shareholders Sue CrowdStrike; Intel to Fire 15,000 Employees

Packet Pushers

Take a Network Break! This week we discuss a proposed class action lawsuit against CrowdStrike, while Delta investigates options to seek damages from CrowdStrike and Microsoft. Microsoft Azure goes down after a DDoS defense error, campus switch sales are forecast to drop significantly in 2024, and DigiCert warns customers that an error it made will.

HBCU Game Jam at Spelman College

Zynga

In April, Zynga demonstrated its commitment to fostering talent and diversity in the gaming industry by sponsoring the HBCU Game Jam hosted by Spelman College in Atlanta, Georgia. Drawing nine Zyngites from various parts of North America, this event marked the second year of the HBCU Game Jam, which drew participation from over 120 students representing multiple historically black colleges and universities (HBCUs) including Spelman College, North Carolina A&T

Bootstrapping Talos Linux over SSH

Scott's Weblog

For those that aren’t aware, Talos Linux is a purpose-built Linux distribution designed for running Kubernetes. Bootstrapping a Talos Linux cluster is normally done via the Talos API, but this requires direct network access to the Talos Linux nodes. What happens if you don’t have direct network access to the nodes? In this post, I’ll share with you how to bootstrap a Talos Linux cluster over SSH.

Q&A with Jerry Volcy, Co-Director of the Spelman Innovation Lab

Zynga

Tell us who you are and what your role is in the partnership with Zynga? I am the Co-Director of the Spelman Innovation Lab. The Innovation Lab is a unit of the College that aims to put Spelman students in a leadership position for adopting and using technology to enhance the curricular art, humanities, social, and natural science programs of the College.