Tech on the Rocks

Episodes

How Denormalized is Building ‘DuckDB for Streaming’ with Apache DataFusion

2024/09/13

Summary
In this episode, Kostas and Nitay are joined by Amey Chaugule and Matt Green, co-founders of Denormalized. They delve into how Denormalized is building an embedded stream processing engine (think “DuckDB for streaming”) to simplify real-time data workloads. Drawing on their extensive backgrounds at companies like Uber, Lyft, Stripe, and Coinbase, Amey and Matt discuss the challenges of existing stream processing systems like Spark, Flink, and Kafka. They explain how their approach leverages Apache DataFusion to create a single-node solution that avoids much of the complexity inherent in distributed systems.

The conversation explores topics such as developer experience, fault tolerance, state management, and the future of stream processing interfaces. Whether you’re a data engineer, application developer, or simply interested in the evolution of real-time data infrastructure, this episode offers valuable insights into making stream processing more accessible and efficient.

Contacts & Links
Amey Chaugule
Matt Green
Denormalized
Denormalized Github Repo
Chapters
00:00 Introduction and Background
12:03 Building an Embedded Stream Processing Engine
18:39 The Need for Stream Processing in the Current Landscape
22:45 Interfaces for Interacting with Stream Processing Systems
26:58 The Target Persona for Stream Processing Systems
31:23 Simplifying Stream Processing Workloads and State Management
34:50 State and Buffer Management
37:03 Distributed Computing vs. Single-Node Systems
42:28 Cost Savings with Single-Node Systems
47:04 The Power and Extensibility of DataFusion
55:26 Integrating Data Store with DataFusion
57:02 The Future of Streaming Systems
01:00:18 Outro
Click here to view the episode transcript.

1 hr 2 min

Unifying structured and unstructured data for AI: Rethinking ML infrastructure with Nikhil Simha and Varant Zanoyan

2024/08/30

Summary
In this episode, we dive deep into the future of data infrastructure for AI and ML with Nikhil Simha and Varant Zanoyan, two seasoned engineers from Airbnb and Facebook. Nikhil and Varant share their journey from building real-time data systems and ML infrastructure at tech giants to launching their own venture.
The conversation explores the intricacies of designing developer-friendly APIs, the complexities of handling both batch and streaming data, and the delicate balance between customer needs and product vision in a startup environment.
Contacts & Links
Nikhil Simha
Varant Zanoyan
Chronon project

Chapters
00:00 Introduction and Past Experiences
04:38 The Challenges of Building Data Infrastructure for Machine Learning
08:01 Merging Real-Time Data Processing with Machine Learning
14:08 Backfilling New Features in Data Infrastructure
20:57 Defining Failure in Data Infrastructure
26:45 The Choice Between SQL and Data Frame APIs
34:31 The Vision for Future Improvements
38:17 Introduction to Chronon and Open Source
43:29 The Future of Chronon: New Computation Paradigms
48:38 Balancing Customer Needs and Vision
57:21 Engaging with Customers and the Open Source Community
01:01:26 Potential Use Cases and Future Directions
Click here to view the episode transcript.

1 hr 2 min

Stream processing, LSMs and leaky abstractions with Chris Riccomini

2024/08/23

Overview

In this episode, we chat with Chris Riccomini about the evolution of stream processing and the challenges of building applications on streaming systems. We also discuss leaky abstractions, good and bad API design, what Chris loves and hates about Rust, and finally his exciting new project involving object storage and LSMs.

Connect with Chris at:
LinkedIn
X
Blog
Materialized View Newsletter - His newsletter
The missing README - His book
SlateDB - His latest OSS Project
Chapters
00:00 Introduction and Background
04:05 The State of Stream Processing Today
08:53 The Limitations of SQL in Streaming Systems
14:00 Prioritizing the Developer Experience in Stream Processing
18:15 Improving the Usability of Streaming Systems
27:54 The Potential of State Machine Programming in Complex Systems
32:41 The Power of Rust: Compiling and Language Bindings
34:06 The Shift from Sidecar to Embedded Libraries Driven by Rust
35:49 Building an LSM on Object Storage: Cost-Effective State Management
39:47 The Unbundling and Composable Nature of Databases
47:30 The Future of Data Systems: More Companies and Focus on Metadata

Click here to view the episode transcript.

53 min
