

Nuts and Bolts of Apache Kafka
Archived series ("Inactive feed" status)
When? This feed was archived on March 24, 2016 19:45 (
Why? Inactive feed status. Our servers were unable to retrieve a valid podcast feed for a sustained period.
What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check if the publisher's feed link below is valid and contact support to request the feed be restored or if you have any other concerns about this.


Topics, Partitions, and APIs oh my! This episode we’re getting further into how Apache Kafka works and its use cases. Also, Allen is staying dry, Joe goes for broke, and Michael (eventually) gets on the right page.
The full show notes are available on the website at https://www.codingblocks.net/episode236
News
- Thanks for the reviews! angingjellies and Nick Brooker
- Please leave us a review! (/review)
- Atlanta Dev Con is coming up, on September 7th, 2024 (www.atldevcon.com)
Kafka Topics
- They are partitioned – this means they are distributed (or can be) across multiple Kafka brokers into “buckets”
- New events written to Kafka are appended to partitions
- The distribution of data across brokers is what allows Kafka to scale so well as data can be written to and read from many brokers simultaneously
- Events with the same key are written to the same partition as earlier events with that key (with the default partitioner, as long as the partition count does not change)
- Kafka guarantees that events within a partition are always read in the order in which they were written
- For fault tolerance and high availability, topics can be replicated…even across regions and data centers
- NOTE: If you’re using a cloud provider, know that this can be very costly as you pay for inbound and outbound traffic across regions and availability zones
- A typical production setup uses a replication factor of 3
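
The key-to-partition and ordering points above can be sketched in a few lines of plain Python. This is a toy model, not Kafka client code: the partitions are in-memory lists, `NUM_PARTITIONS`, `partition_for`, and `append_events` are hypothetical names, and CRC32 stands in for whatever hash the real partitioner uses.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for this sketch


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (CRC32 is a stand-in here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append_events(events):
    """Append (key, value) events to in-memory partitions in arrival order."""
    partitions = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key)].append((key, value))
    return partitions


events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "search"),
    ("user-2", "logout"),
    ("user-1", "logout"),
]
partitions = append_events(events)

# Every event for "user-1" sits in one partition, in the order it was written.
user1_partition = partition_for("user-1")
user1_events = [v for k, v in partitions[user1_partition] if k == "user-1"]
print(user1_events)  # ['login', 'search', 'logout']
```

Ordering only holds within a partition in this model, which mirrors the guarantee described above: events for different keys may be spread over different partitions with no ordering between them.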
Kafka APIs
- Admin API – used for managing and inspecting topics, brokers, and other Kafka objects
- Producer API – used to write events to Kafka topics
- Consumer API – used to read data from Kafka topics
- Kafka Streams API – the ability to implement stream processing applications/microservices. Some of the key functionality includes functions for transformations, stateful operations like aggregations, joins, windowing, and more
- In the Kafka streams world, these transformations and aggregations are typically written to other topics (in from one topic, out to one or more other topics)
- Kafka Connect API – allows for the use of reusable import and export connectors that usually connect external systems. These connectors allow you to gather data from an external system (like a database using CDC) and write that data to Kafka. Then you could have another connector that could push that data to another system OR it could be used for transforming data in your streams application
- These connectors are referred to as Sources and Sinks in the connector portfolio (confluent.io)
- Source – gets data from an external system and writes it to a Kafka topic
- Sink – pushes data to an external system from a Kafka topic
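
To make the source, stream processing, and sink flow concrete, here is a minimal in-memory sketch. It does not use any real Kafka, Kafka Streams, or Kafka Connect API: the `topics` dictionary, the function names, and the sample data are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical in-memory "topics"; a real cluster would hold these on brokers.
topics = {"page-views": [], "views-per-page": []}


def source_connector(rows, topic):
    """Source: copy rows from an external system into a topic."""
    for row in rows:
        topics[topic].append(row)


def stream_processor(in_topic, out_topic):
    """Streams-style step: aggregate events from one topic into another."""
    counts = Counter(event["page"] for event in topics[in_topic])
    for page, count in sorted(counts.items()):
        topics[out_topic].append({"page": page, "views": count})


def sink_connector(topic, external_store):
    """Sink: push events from a topic out to an external system."""
    for event in topics[topic]:
        external_store[event["page"]] = event["views"]


database_rows = [{"page": "/home"}, {"page": "/docs"}, {"page": "/home"}]
warehouse = {}  # stands in for the external system on the sink side

source_connector(database_rows, "page-views")
stream_processor("page-views", "views-per-page")
sink_connector("views-per-page", warehouse)
print(warehouse)  # {'/docs': 1, '/home': 2}
```

The shape to notice is that each stage only talks to a topic, never directly to the next stage, which is the decoupling the connector and streams descriptions above rely on.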
Use Cases
- Message queue – usually talking about replacing something like ActiveMQ or RabbitMQ
- Message brokers are often used for responsive types of processing, decoupling systems, etc. – Kafka is usually a great alternative that scales, generally has faster throughput, and offers more functionality
- Website activity tracking – this was one of the very first use cases for Kafka – the ability to rebuild user actions by recording all the user activities as events
- How and why Kafka was developed (LinkedIn)
- Typically different activity types would be written to different topics – like web page interactions to one topic and searches to another
- Metrics – aggregating statistics from distributed applications
- Log aggregation – some use Kafka for storage of event logs rather than using something like HDFS or a file server or cloud storage – but why? Because using Kafka for the event storage abstracts away the events from the files
- Stream processing – taking events in and further enriching those events and publishing them to new topics
- Event sourcing – using Kafka to store state changes from an application, which can be replayed to rebuild the current state of an object or system
- Commit log – using Kafka as an external commit log is a way to synchronize data between distributed systems, or to help rebuild the state of a failed system
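
The event sourcing and commit log ideas both come down to replaying an ordered log to rebuild state. A minimal sketch, assuming a made-up account example: the event names, `apply_event`, and `replay` are hypothetical, and a plain list stands in for a Kafka topic partition.

```python
from functools import reduce


def apply_event(balance, event):
    """Apply one state-change event to the current balance."""
    kind, amount = event
    if kind == "deposited":
        return balance + amount
    if kind == "withdrawn":
        return balance - amount
    raise ValueError(f"unknown event type: {kind}")


def replay(log, upto=None):
    """Rebuild state by folding over the log from the start, optionally stopping at an offset."""
    return reduce(apply_event, log[:upto], 0)


# The log is append-only; current state is never stored, only derived.
log = [("deposited", 100), ("withdrawn", 30), ("deposited", 5)]

current_balance = replay(log)
balance_after_two = replay(log, upto=2)
print(current_balance, balance_after_two)  # 75 70
```

Because state is derived from the log, a failed consumer can start from an empty state and replay from the beginning, or replay to an earlier offset to see what the state looked like at that point.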
Tip of the Week
- Rémi Gallego is a music producer who makes music under a variety of names like The Algorithm and Boucle Infini. Almost all of it is instrumental synthwave with a hard-rock edge. They also make a lot of video game music, including 2 of my favorite game soundtracks of all time, “The Last Spell” and “Hell is for Demons” (YouTube)
- Did you know that the Kubernetes-focused TUI we’ve raved about before can be used to look up information about other things as well, like :helm and :events? Events is particularly useful for figuring out mysteries. You can see all the “resources” available to you with “?”. You might be surprised at everything you see (pop-eye, x-ray, and monitoring)
- WarpStream is an S3 backed, API compliant Kafka Alternative. Thanks MikeRg! (warpstream.com)
- Cloudflare’s trillion message Kafka setup, thanks MikeRg! (blog.bytebytego.com)
- Want the power and flexibility of jq, but for yaml? Try yq! (gitbook.io)
- Zenith is terminal graphical metrics for your *nix system written in Rust, thanks MikeRg! (github.com)
- 8 Big (O)Notation Every Developer should Know (medium.com)
- Another Git cheat sheet (wizardzines.com)