February, 2024

Introducing Apache Kafka 3.7

Confluent

Apache Kafka 3.7 introduces updates to the consumer rebalance protocol, an official Apache Kafka Docker image, JBOD support in KRaft-based clusters, and more!

DotSlash: Simplified executable deployment

Engineering at Meta

We’ve open sourced DotSlash, a tool that makes large executables available in source control with a negligible impact on repository size, thus avoiding I/O-heavy clone operations. With DotSlash, a set of platform-specific executables is replaced with a single script containing descriptors for the supported platforms. DotSlash transparently handles fetching, decompressing, and verifying the appropriate remote artifact for the current operating system and CPU.
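
The select-and-verify step described above can be sketched in a few lines. This is a hypothetical illustration, not DotSlash's actual implementation (which is written in Rust): the "os-cpu" platform key scheme, the descriptor fields `size` and `digest`, and the use of SHA-256 (chosen only because it ships with Python's standard library) are all assumptions, and downloading and decompressing are left out.

```python
import hashlib
import platform

# Hypothetical normalisation tables: the "os-cpu" key scheme is an assumption, not DotSlash's spec.
_OS_NAMES = {"darwin": "macos"}
_CPU_NAMES = {"amd64": "x86_64", "arm64": "aarch64"}


def current_platform_key() -> str:
    """Build a key such as 'linux-x86_64' from the running OS and CPU."""
    system = platform.system().lower()
    machine = platform.machine().lower()
    return f"{_OS_NAMES.get(system, system)}-{_CPU_NAMES.get(machine, machine)}"


def select_descriptor(manifest: dict, key: str) -> dict:
    """Return the descriptor for one platform, failing loudly if it is unsupported."""
    try:
        return manifest["platforms"][key]
    except KeyError:
        raise RuntimeError(f"no artifact listed for platform {key!r}") from None


def verify_artifact(data: bytes, descriptor: dict) -> bytes:
    """Check size and SHA-256 digest before the bytes are decompressed or executed."""
    if len(data) != descriptor["size"]:
        raise ValueError("size mismatch")
    if hashlib.sha256(data).hexdigest() != descriptor["digest"]:
        raise ValueError("digest mismatch")
    return data


# Example with an in-memory payload standing in for a downloaded artifact.
payload = b"fake binary"
manifest = {"platforms": {"linux-x86_64": {"size": len(payload),
                                           "digest": hashlib.sha256(payload).hexdigest()}}}
checked = verify_artifact(payload, select_descriptor(manifest, "linux-x86_64"))
```

Verifying size and digest before anything is unpacked means a corrupted or tampered download is rejected before it can run.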

OLMo is Here, Powered by Mosaic AI + Databricks

databricks

As Chief Scientist (Neural Networks) at Databricks, I lead our research team toward the goal of giving everyone the ability to build and…

Introducing DoorDash’s In-House Search Engine

DoorDash Engineering

We reviewed the architecture of our global search at DoorDash in early 2022 and concluded that our rapid growth meant within three years we wouldn’t be able to scale the system efficiently, particularly as global search shifted from store-only to a hybrid item-and-store search experience. Our analysis identified Elasticsearch as our architecture’s primary bottleneck.

Three Ways AI Will Change in the New Year

Dataversity

In the fast-paced landscape of 2023, organizations embraced artificial intelligence (AI) and its related technologies, experiencing a surge in diverse AI applications. According to data from McKinsey, AI adoption by employees across global industries reached a significant 55%. However, as we step into 2024, organizations recognize that while AI is critical for competitiveness and […]

Article: A Primer on Idempotence for AWS Serverless Architecture

InfoQ Articles

This article explains idempotence in AWS serverless architectures and the challenges that at-least-once delivery creates. It shows how to implement and automate idempotence with AWS Lambda, emphasizing early planning for consistent outcomes, and recommends using tools like Lambda Powertools and prioritizing testing for reliability.
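
The core idea behind idempotent handling under at-least-once delivery can be sketched as a decorator that records each processed event under a key and returns the saved result on redelivery. This is a minimal, hypothetical sketch: the in-memory `dict` stands in for the durable store (for example, a DynamoDB table) a real Lambda deployment would need, `idempotent` and `charge_order` are illustrative names rather than the Lambda Powertools API, and concurrent duplicate deliveries are not handled.

```python
import functools
import hashlib
import json
from typing import Any, Callable


def idempotent(store: dict) -> Callable:
    """Skip re-execution for events already processed, returning the saved result."""
    def decorator(handler: Callable[[dict], Any]) -> Callable[[dict], Any]:
        @functools.wraps(handler)
        def wrapper(event: dict) -> Any:
            # Stable idempotency key: hash of the event's canonical JSON form.
            key = hashlib.sha256(json.dumps(event, sort_keys=True).encode()).hexdigest()
            if key in store:
                return store[key]          # duplicate delivery: side effects are not re-run
            result = handler(event)
            store[key] = result            # record only after the handler succeeds
            return result
        return wrapper
    return decorator


processed: dict = {}
charges: list = []


@idempotent(processed)
def charge_order(event: dict) -> dict:
    charges.append(event["order_id"])      # the side effect that must not repeat
    return {"status": "charged", "order_id": event["order_id"]}


charge_order({"order_id": 42})
charge_order({"order_id": 42})             # simulated redelivery of the same event
print(charges)                             # → [42]
```

Hashing the whole event is a simplification; real events often carry fields that differ between retries, so a production key would be derived from a stable subset of the payload.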

Welcome Noteable: Making Data Streaming Easier and More Approachable

Confluent

Confluent has hired many Noteable employees to help make application development easier for both Kafka and Flink developers.

Announcing Public Preview of Delta Sharing with Cloudflare R2 Integration

databricks

Special thanks to Phillip Jones, Senior Product Manager, and Harshal Brahmbhatt, Systems Engineer from Cloudflare for their contributions to this blog. Organizations across…

Documenting Critical Data Elements

TDAN

Many Data Governance or Data Quality programs focus on “critical data elements,” but what are they and what are some key features to document for them? A critical data element is any data element in your organization that has a high impact on your organization’s ability to execute its business strategy.

Ask a Data Ethicist: Can We Trust Unexplainable AI?

Dataversity

In last month’s column, I asked readers to send in their “big questions” when it comes to data and AI. This month’s question more than answered that call! It encompasses the enormous areas of trust in AI tools and explainability. How can we know if an AI tool is delivering an ethical result if we have […]

Article: Spring Boot 3.2 and Spring Framework 6.1 Add Java 21, Virtual Threads, and CRaC

InfoQ Articles

Spring Framework 6.1 and Spring Boot 3.2 run on Java 21. They make concurrent programming simpler and more efficient with virtual threads, as well as improving reactive programming and Kotlin coroutines. For “Scale to Zero” startup time reduction, the OpenJDK project CRaC received initial support, while the existing GraalVM Native Image integration got faster through a GraalVM release.

New with Confluent Platform: Seamless Migration Off ZooKeeper, Arm64 Support, and More

Confluent

Confluent Platform 7.6 brings migration of existing clusters from ZooKeeper to KRaft, compaction support for Tiered Storage, OAuth (early access), improvements to the Oracle CDC premium connector, and more.

Aligning Velox and Apache Arrow: Towards composable data management

Engineering at Meta

We’ve partnered with Voltron Data and the Arrow community to align and converge Apache Arrow with Velox, Meta’s open source execution engine. Apache Arrow 15 includes three new format layouts developed through this partnership: StringView, ListView, and Run-End Encoding (REE). This new convergence helps Meta and the larger community build data management systems that are unified, more efficient, and composable.
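
To make one of these layouts concrete, here is a minimal sketch of the idea behind Run-End Encoding, using plain Python lists rather than Arrow buffers. It assumes the representation of two parallel sequences, where `run_ends[i]` is the exclusive logical end index of run `i` and `run_values[i]` is that run's value; the function names are hypothetical, and nulls, offsets, and slicing (which the real Arrow layout handles) are ignored.

```python
import bisect
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def ree_encode(values: Sequence[T]) -> Tuple[List[int], List[T]]:
    """Return (run_ends, run_values); run_ends[i] is the exclusive logical end of run i."""
    run_ends: List[int] = []
    run_values: List[T] = []
    for i, v in enumerate(values):
        if run_values and run_values[-1] == v:
            run_ends[-1] = i + 1            # extend the current run
        else:
            run_values.append(v)            # start a new run
            run_ends.append(i + 1)
    return run_ends, run_values


def ree_get(run_ends: Sequence[int], run_values: Sequence[T], index: int) -> T:
    """Random access in O(log runs): the containing run is the first whose end exceeds index."""
    if not 0 <= index < (run_ends[-1] if run_ends else 0):
        raise IndexError(index)
    return run_values[bisect.bisect_right(run_ends, index)]


def ree_decode(run_ends: Sequence[int], run_values: Sequence[T]) -> List[T]:
    """Expand the runs back into a flat list."""
    out: List[T] = []
    start = 0
    for end, v in zip(run_ends, run_values):
        out.extend([v] * (end - start))
        start = end
    return out


ends, vals = ree_encode(["a", "a", "a", "b", "b", "c"])
print(ends, vals)               # → [3, 5, 6] ['a', 'b', 'c']
print(ree_get(ends, vals, 3))   # → b
```

Storing run ends instead of run lengths is what makes the binary-search lookup possible: random access costs a search over the runs rather than a scan.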

Announcing the General Availability of Azure Private Link and Azure Storage firewall support for Databricks SQL Serverless

databricks

We are excited to announce the upcoming general availability of Azure Private Link support for Databricks SQL (DBSQL) Serverless, planned for April 2024.

Health Care Outside of the Box

Cloudera Blog

How enterprise-grade data management creates better and more efficient care. In the last few years, the acceptance of telehealth has become more widespread as patients and providers found they could maintain continuity through phone and video collaboration, instead of in-person visits. In many cases, a level of care that once required a drive to the clinic or hospital could be delivered over a mobile phone or laptop, with no travel and no waiting room.

Protecting Your Data: 5 IAM Trends to Watch

Dataversity

In our increasingly digital world, organizations recognize the importance of securing their data. As cloud-based technologies proliferate, the need for a robust identity and access management (IAM) strategy is more critical than ever. IAM serves as the gatekeeper to an organization’s sensitive information, ensuring that only authorized individuals have an appropriate level of access.

Article: How Platform and Site Reliability Engineering Are Evolving DevOps

InfoQ Articles

Companies are now looking to grow and more effectively manage DevOps with platform engineering and site reliability engineering roles. No one has these roles perfectly carved out right now — there’s just too much to do and not enough people to do it — but knowing where these three disciplines do and don’t overlap will help organizations evolve and take advantage when they are ready.

IoT Data Streaming for Building Private Wireless Networks

Confluent

Confluent enables real-time, reliable, scalable, and secure communication between IoT devices, applications, and backend systems. Streamline data processing and unlock analytics to boost productivity and time to market while lowering infrastructure costs.

Meta loves Python

Engineering at Meta

By now you’re already aware that Python 3.12 has been released. But did you know that several of its new features were developed by Meta? Meta engineer Pascal Hartig (@passy) is joined on the Meta Tech Podcast by Itamar Oren and Carl Meyer, two software engineers at Meta, to discuss their teams’ contributions to the latest Python release, including new hooks that allow for custom JITs like Cinder, Immortal Objects, improvements to the type system, faster comprehensions, and more.

Databricks adds new migration Brickbuilder Solutions to help customers succeed with AI

databricks

For the past two years, Databricks has collaborated with leading consulting partners to build innovative solutions for industry, migration, and data and AI.

The 4 most important tools for data-first product development

Mixpanel

One of the biggest challenges of building a product is that your users often don’t know what features they want or need. They just know what outcome they want to achieve by using the product. Even when users think they know what they want, they may not always be right. It’s only after they’ve made a feature request, you’ve shipped it, and they’ve tried it out that they realize, “Oh wait, that’s not the outcome I expected.”

The Cool Kids Corner: CLEAR Communication 

Dataversity

Hello! I’m Mark Horseman, and welcome to The Cool Kids Corner. This is my monthly check-in to share with you the people and ideas I encounter as the data evangelist with DATAVERSITY. (Read last month’s column here.) This month, we’re talking about communication. Communication is the cornerstone of socializing anything you do with data, whether that’s […]

Why building AI-powered agents is so challenging. For now.

Ben Morris

Despite growing excitement about the potential for AI-driven agents, there are a lot of problems to solve before we can build agent-based architectures on any scale…

The Art of Lean Governance: Addressing the Elephant in the Room

TDAN

Hands down, one of the most frequent observations when walking the data factory at different clients is the excessive use of spreadsheets for data collection and purification. These spreadsheets are part of a critical data enrichment process for getting reports out the door on time.

How DotSlash makes executable deployment simpler

Engineering at Meta

Andres Suarez and Michael Bolin, two software engineers at Meta, join Pascal Hartig (@passy) on the Meta Tech Podcast to discuss the ins and outs of DotSlash, a new open source tool from Meta. DotSlash takes the pain out of distributing binaries and toolchains to developers. Instead of committing large, platform-specific executables to a repository, DotSlash combines a fast Rust program with a JSON manifest prefixed with a #!
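
On Unix-like systems, a file whose first line starts with `#!` is handed to the interpreter named on that line when executed, so a manifest with such a prefix can be run directly while its body remains machine-readable data. The sketch below shows how such a file splits into those two parts. It is hypothetical: `parse_shebang_manifest` is an illustrative name, the interpreter path and the `name`/`platforms` fields in the example are assumptions, and it assumes the body is strict JSON, whereas the real DotSlash format may be more permissive.

```python
import json


def parse_shebang_manifest(text: str) -> dict:
    """Split off the leading '#!' interpreter line and parse the remainder as JSON."""
    first_line, _, body = text.partition("\n")
    if not first_line.startswith("#!"):
        raise ValueError("expected a '#!' interpreter line")
    return json.loads(body)


# Illustrative file contents; the interpreter path and fields are assumptions.
example = '#!/usr/bin/env dotslash\n{"name": "hello", "platforms": {}}\n'
print(parse_shebang_manifest(example)["name"])   # → hello
```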

Performance Improvements for Stateful Pipelines in Apache Spark Structured Streaming

databricks

Apache Spark™ Structured Streaming is a popular open-source stream processing platform that provides scalability and fault tolerance, built on top of the S…

Article: Generative AI: Shaping a New Future for Fraud Prevention

InfoQ Articles

This article explores how generative AI affects fraud detection by reducing false positives and dynamically adapting to changing fraud patterns. Integrated with machine learning, this combination offers a potent preventive solution that enhances the efficacy and scalability of fraud prevention initiatives.

IT leaders Need to Invest in AI – Could ITAM and FinOps Be the Solution?

Dataversity

Artificial intelligence is the top investment area for CIOs in 2024. IT leaders see in generative AI an opportunity to accelerate innovation, improve employee productivity, and gain competitive advantage. Unfortunately, investing in AI is not cheap. CIOs will need to find significant budget to gain traction on their AI roadmaps, and we believe IT asset […]

Materialized Views in Hive for Iceberg Table Format

Cloudera Blog

This blog post describes support for materialized views for the Iceberg table format. Apache Iceberg is a high-performance open table format for petabyte-scale analytic datasets. It has been designed and developed as an open community standard to ensure compatibility across languages and implementations. It brings the reliability and simplicity of SQL tables to big data while enabling engines like Hive, Impala, Spark, Trino, Flink, and Presto to work with the same tables at the same…

Stop Complaining About Your Data – And Do Something About It

TDAN

Organizations are drowning in a sea of data, facing challenges that range from inconsistent quality to inefficient and ineffective management. It’s easy to complain about the state of your data, but a more productive tactic involves taking actionable steps to address these issues.

Data Products, Data Contracts, and Change Data Capture

Confluent

Discover how to build resilient data pipelines with Confluent Data Portal. Learn essential strategies for isolating upstream systems and empowering downstream consumers.

Furthering Our Commitment to Responsible AI Development Through Industry and Government Organizations

databricks

At Databricks, we've upheld principles of responsible development throughout our long-standing history of building innovative data and AI products. We are committed to…

Article: Advice for Engineering Managers: Enabling Developers To Become (More) Creative

InfoQ Articles

As an engineering manager, it is your responsibility to help facilitate creative thinking skills among the development team, but that's easier said than done. This article provides advice on how you can help amplify the creative thinking skills of your software development colleagues. We examine different levels of creativity and strategies to encourage it.

Experiment Faster and with Less Effort

DoorDash Engineering

Business Policy Experiments Using Fractional Factorial Designs

At DoorDash, we constantly strive to improve our experimentation processes by addressing four key dimensions: velocity, to increase how many experiments we can conduct; toil, to minimize our launch and analysis efforts; rigor, to ensure a sound experimental design and robustly efficient analyses; and efficiency, to reduce costs associated with our experimentation efforts.
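
For readers new to fractional factorial designs, the textbook idea can be sketched briefly: run a full two-level factorial on a few base factors and set each extra factor from a product of base factors (a "generator"). This is a generic illustration, not DoorDash's actual design; `fractional_factorial` and the factor names `A`, `B`, `C` are hypothetical stand-ins for business policies, with levels coded -1/+1.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple


def fractional_factorial(base: Sequence[str],
                         generators: Dict[str, Tuple[str, ...]]) -> List[Dict[str, int]]:
    """Two-level fractional factorial: full factorial on `base`, extra factors set by generators.

    Levels are coded -1/+1. A generator such as {"C": ("A", "B")} sets C = A*B,
    which aliases C's main effect with the A x B interaction.
    """
    runs = []
    for levels in product((-1, 1), repeat=len(base)):
        run = dict(zip(base, levels))
        for name, word in generators.items():
            run[name] = math.prod(run[f] for f in word)
        runs.append(run)
    return runs


# Half fraction of a 2^3 design: three factors tested in 4 runs instead of 8.
design = fractional_factorial(["A", "B"], {"C": ("A", "B")})
for run in design:
    print(run)
```

The saving comes at a cost: because C is set equal to A·B, the main effect of C cannot be separated from the A×B interaction, so the design is only appropriate when such interactions are assumed to be small.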

Back to the Financial Regulatory Future

Cloudera Blog

It’s hard to believe it’s been 15 years since the global financial crisis of 2007/2008. While this might be a blast from the past we’d rather leave in the proverbial rear-view mirror, in March of 2023 we were back to the future with the collapse of Silicon Valley Bank (SVB), the largest US bank to fail since 2008. While there are clear reasons SVB collapsed, which can be reviewed here, my purpose in this post isn’t to rehash the past but to present some of the regulatory and compliance c…

7 Ways AI Will Transform Data Storage

Dataversity

The rapid adoption of artificial intelligence and machine learning (AI/ML) over the past year has transformed just about everything – ushering in a new era of innovation and growth the world has never seen. The same goes for data storage, where the technologies’ impact will be transformative, enabling greater business agility that companies need to […]

Introducing Confluent’s Migration Accelerator: Accelerate Your Journey to a Complete Data Streaming Platform

Confluent

Confluent Migration Accelerator is a new program, run in partnership with the Confluent partner ecosystem, to jump-start organizations' data streaming journeys.
