August, 2023

Activating Data from the Lakehouse: Databricks Ventures Invests in Hightouch

databricks

It’s no secret that modern organizations are doubling down on their investments in data - investments that uncover deep customer insights that provide a.

120

How to shuffle a big dataset (2018)

Jane Street

At Jane Street, we often work with data that has a very low signal-to-noise ratio, but fortunately we also have a lot of data. Where practitioners in many fiel.

111

Trending Sources

10 Highest-Paying Data Analytics Jobs in 2023

Dataversity

As one of the fastest-growing fields, technology continues to drive transformative changes across various industries, with new advancements emerging each year. Consequently, the demand for data analytics jobs is expected to surge in the near future, with a significant need for data science practitioners worldwide. The U.S. Bureau of Labor Statistics (2021) projects a 22% growth […]

Powering Renewable Energy with Data Streaming

Confluent

How real-time data streaming is powering peer-to-peer trading of renewable energy with ever-increasing data volumes.

Energy 98

Introducing Immortal Objects for Python

Engineering at Meta

Instagram has introduced Immortal Objects – PEP-683 – to Python. Now, objects can bypass reference count checks and live throughout the entire execution of the runtime, unlocking exciting avenues for true parallelism. At Meta, we use Python (Django) for our frontend server within Instagram. To handle parallelism, we rely on a multi-process architecture along with asyncio for per-process concurrency.

Server 98

Article: Streamlining Code with Unnamed Patterns/Variables: A Comparative Study of Java, Kotlin, and Scala

InfoQ Articles

Explore the use of the Unnamed Patterns/Variables in programming languages like Java, Kotlin, and Scala. Enhancing code readability, allowing omission of unnecessary components, and simplifying code are key features. Expect further innovative uses as languages evolve.

96

Announcing Databricks Belgrade Development Center

databricks

We are thrilled to announce the opening of Databricks’ latest development center in Belgrade, Serbia. This addition joins our existing R&D centers in A.

97

More Trending

The Cool Kids Corner: Change Management for Data Literacy

Dataversity

Hello! I’m Mark Horseman, and welcome to The Cool Kids Corner. This is my monthly check-in to share with you the people and ideas I encounter as a data evangelist with DATAVERSITY. This month we’re talking data literacy and change management. What is data literacy? Why is it important? What are the barriers? How do […]

Confluent Champion: Niki Kapsi’s Journey From SDR to Commercial Account Executive

Confluent

Meet Commercial AE Niki Kapsi and learn about the “entrepreneurial” side of her role at Confluent.

98

Scaling the Instagram Explore recommendations system

Engineering at Meta

Explore is one of the largest recommendation systems on Instagram. We leverage machine learning to make sure people are always seeing content that is the most interesting and relevant to them. Using more advanced machine learning models, like Two Towers neural networks, we’ve been able to make the Explore recommendation system even more scalable and flexible.

Media 98

Article: AI-based Prose Programming for Subject Matter Experts: Will this work?

InfoQ Articles

In this article, author Markus Völter discusses the future of programming using Large Language Model (LLM) tools like ChatGPT and GitHub’s Copilot for prose-to-code generation. He also talks about what new approaches and language changes need to be in place to help non-programmers take advantage of the "program in prose" techniques.

Using MLflow AI Gateway and Llama 2 to Build Generative AI Apps

databricks

To build customer support bots, internal knowledge graphs, or Q&A systems, customers often use Retrieval Augmented Generation (RAG) applications which leverage pre-trained models.

Gateway 96

How DoorDash Migrated from StatsD to Prometheus

DoorDash Engineering

Accurate and reliable observability is essential when supporting a large distributed service, but this is only possible if your tools are equally scalable. Unfortunately, this was a challenge at DoorDash because of peak traffic failures while using our legacy metrics infrastructure based on StatsD. Just when we most needed observability data, the system would leave us in the lurch.

Server 83

9 Best Practices for Real-Time Data Management

Dataversity

In the era of digital transformation, data has become the new oil. Businesses increasingly rely on real-time data to make informed decisions, improve customer experiences, and gain a competitive edge. However, managing and handling real-time data can be challenging due to its volume, velocity, and variety. This article will guide you through nine best practices […]

Developing a Career at Confluent: Collaboration Is Key

Confluent

Senior software engineer Yash Mayya talks about his career path to Confluent and working on Kafka Connect.

Fixit 2: Meta’s next-generation auto-fixing linter

Engineering at Meta

Fixit is dead! Long live Fixit 2 – the latest version of our open-source auto-fixing linter. Fixit 2 allows developers to efficiently build custom lint rules and perform auto-fixes for their codebases. Fixit 2 is available today on PyPI. Python is one of the most popular languages in use at Meta. Meta’s production engineers (PEs) are specialized software engineers (SWEs) who focus on reliability, efficiency, and scalability.

Article: How Emotional Connections Can Drive Change: Applying Fearless Change Patterns

InfoQ Articles

When trying to bring innovation into an organization, communication is important. It is vital to share information in a clear and logical way but it is just as important to understand and accept how people are feeling about the innovation. To do this, leaders can make use of strategies that help them create an emotional connection.

81

The Simplification of AI Data

databricks

Talk to any data science organization and they will almost unanimously tell you that the biggest challenge to building high quality AI models.

94

How DoorDash Improves Holiday Predictions via Cascade ML Approach

DoorDash Engineering

At DoorDash, we generate supply and demand forecasts to proactively plan operations such as acquiring the right number of Dashers (delivery drivers) and adding extra pay when we anticipate low supply. It is challenging to generate accurate forecasts during holidays because certain machine learning techniques (e.g., XGBoost, Gradient Boosting, Random Forest) have difficulty handling high variation with limited data.

Why AI Forces Data Management to Up Its Game

Dataversity

The Information Age has flooded the modern enterprise with data. Demand for enterprise storage capacity will only increase in the years ahead. By the end of this decade, new enterprise storage capacity shipments are forecast to be 15 ZB per year, with the active installed base exceeding 45 ZB. Where Is This Growth Coming From? Business and […]

What is an Apache Kafka Cluster? (And Why You Should Care)

Confluent

Learn what an Apache Kafka cluster is, and what makes a cluster special.

96

Scheduling Jupyter Notebooks at Meta

Engineering at Meta

At Meta, Bento is our internal Jupyter notebooks platform that is leveraged by many internal users. Notebooks are also being used widely for creating reports and workflows (for example, performing data ETL) that need to be repeated at certain intervals. Users with such notebooks would have to remember to manually run their notebooks at the required cadence – a process people might forget because it does not scale with the number of notebooks used.

Network 96

Article: Engineering as Art: Embracing Creativity beyond Science

InfoQ Articles

Achieving a staff+ engineering role is a considerable achievement that many engineers seek as the next step in their career growth. In this article, we’ll discuss the challenges that staff+ engineers can face and how our struggles are similar to those of artists. Specifically, we’ll look at the parallels between creating art, creating software, and dealing with organizational dynamics.

Delta UniForm: a universal format for lakehouse interoperability

databricks

One of the key challenges that organizations face when adopting the open data lakehouse is selecting the optimal format for their data. Among.

Oxidizing OCaml: Data Race Freedom

Jane Street

OCaml with Jane Street extensions is available from our public opam repo. Only a slice of the features described in this series are currently implemented.

52

Usability and Connecting Threads: How Data Fabric Makes Sense Out of Disparate Data

Dataversity

Generating actionable insights across growing data volumes and disconnected data silos is becoming increasingly challenging for organizations. Working across data islands leads to siloed thinking and the inability to implement critical business initiatives such as Customer, Product, or Asset 360. As data is generated, stored, and used across data centers, edge, and cloud providers, managing a […]

Celebrating Excellence: Kora wins ‘Best Industry Paper’ at 2023 VLDB Conference

Confluent

Learn how Confluent’s cloud-native Apache Kafka engine stood out from other data management systems with its uniquely elastic, reliable, and cost-efficient design.

Using short-lived certificates to protect TLS secrets

Engineering at Meta

Short-lived certificates (SLCs) are part of our latest efforts to further secure our Transport Layer Security (TLS) private keys on our edge networks. SLCs have a very short exposure compared to traditional certificates and lower the chances of a compromised private key being abused. Implementing SLCs has required us to address tradeoffs between operability and reliability, while satisfying the strict security requirements of our edge environment.

Article: Leveraging Eclipse JNoSQL 1.0.0: Quarkus Integration and Building a Pet-Friendly REST API

InfoQ Articles

Eclipse JNoSQL 1.0.0 modernizes NoSQL integration with advanced features, standardized specs (Jakarta NoSQL & Jakarta Data), enhanced queries, schema migration, and Quarkus framework compatibility. It simplifies NoSQL use, boosts performance, scalability, and integrates seamlessly. Empowering developers with tools to streamline data management in modern apps.

DevOps 76

What’s new with Databricks SQL?

databricks

At this year's Data+AI Summit, Databricks SQL continued to push the boundaries of what a data warehouse can be, leveraging AI across the.

89

Achieving NIS2 Compliance: Essential Steps for Companies 

CATO Networks

In an increasingly digital world, cybersecurity has become a critical concern for companies. With the rise of sophisticated cyber threats, protecting critical infrastructure and ensuring the continuity of essential services has become a top priority. The EU's Network and Information Security Directive (NIS2), which supersedes the previous directive from 2016, establishes a framework to enhance the security and resilience of network and information systems.

SASE 52

A Data-Driven Organization Requires Everyone’s Hands on the Wheel

Dataversity

Companies are driving ahead with data transformation – but many run into challenges right from the start. So, where are they going off track? First, it’s important to define what it means to be data-driven. Data-driven organizations not only collect data, they collect the right data and use it to inform all decisions made across the business. These […]

Introducing Confluent Platform 7.5

Confluent

Confluent Platform 7.5 brings SSO for Control Center, simplified interface with Confluent using v3 of the REST proxy API, and bidirectional Cluster Linking.

80

How Meta is improving password security and preserving privacy

Engineering at Meta

Meta is developing new privacy-enhancing technologies (PETs) to innovate and solve problems with less data. These technologies enable teams to build and launch privacy-enhanced products in a way that’s verifiable and safeguards user data. Using state-of-the-art cryptographic techniques, we have developed Private Data Lookup (PDL) that allows users to privately query a server-side data set.

Server 95

Article: Reducing Verification Lead Time by 50% by Lowering Defect Slippage and Applying AI/ML Techniques

InfoQ Articles

Can we increase our flexibility? Can we increase our test coverage? Can we increase our efficiency? And is it possible to reduce our verification lead-time by 50%? One company challenged itself with these questions. This article explores two important “pillars” of their testing strategy: shifting left and using state-of-the-art techniques to support verification activities.

Efficient Fine-Tuning with LoRA: A Guide to Optimal Parameter Selection for Large Language Models

databricks

With the rapid advancement of neural network-based techniques and Large Language Model (LLM) research, businesses are increasingly interested in AI applications for value.

Day Two Cloud 208: HashiCorp Licensing Changes And The Day Two Cloud-Chaos Lever Crossover

Packet Pushers

Today on Day Two Cloud we dive into the implications of licensing changes that HashiCorp has made to its popular Terraform software. In short, the company has switched from an open source to a business source license. HashiCorp says it felt compelled to make the change to ensure that some other business entity doesn't take the open-source software and turn it into a competing product (looking at you, AWS).

Cloud 52