About Data Pipelines
Lars dove into data pipelines, and emerged bearing arrows and wishing for a lot fewer copies.
What is there to think about regarding data pipelines, and what is interesting about them?
Which tools are out there, and why might you want to use them?
Why all this talk about making fewer copies of data?
What does Lars' current ideal pipeline look like, and where does Elixir fit in?
Links
- Matt Topol
- Apache Arrow
- Large language models
- Vector search
- BigQuery
- sed
- AWK
- jq
- Replacing Hadoop with bash - "Command-line Tools can be 235x Faster than your Hadoop Cluster"
- Hadoop
- MapReduce
- Unix pipes
- Directed acyclic graph
- tee - to "materialize inbetween states"
- Apache Beam
- Apache Spark
- Apache Flink
- Apache Pulsar
- Airbyte - shoves data between systems using connectors
- Cronjob
- Fivetran - Airbyte competitor
- Apache Airflow
- ETL - Extract, transform, load
- Designing Data-Intensive Applications
- Stream processing
- Ephemerality
- Data lake
- Data warehouse
- The People's Front of Judea
- dbt - SQL-SQL batch-work-thingy
- SQL with Jinja templates
- Snowflake - data warehouse thing
- Scala
- Broadway
- Oban - "robust job processing for Elixir"
- Dashbit
- pandas - Python data library
- APL
- Arrow Flight
- gRPC
- DataFusion - query execution engine
- Polars - "DataFrames in Rust"
- Explorer - built on top of Polars
- Voltron Data
- The Composable Codex
- PyArrow - Arrow bindings for Python
Quotes
- I've been reading a lot about data pipelines
- What's so special about data pipelines?
- There's a lot of special tooling
- There's a lot of bad, bad tooling
- Less than optimal tooling
- Converging on something bigger
- He got me eventually
- All of your steps in one bucket
- What tools do you associate with data?
- I inherited a data pipeline
- BashReduce
- Iterate on the L and the T
- The modern data stack
- And then you demand more work
- No unnecessary copies
- Barely a copy
- Reconnecting with my Python roots