Intro to Apache Kafka
We finally start talking about Apache Kafka! Also, Allen is getting acquainted with Aesop, Outlaw is killing clusters, and Joe was paying attention in drama class.
The full show notes are available on the website at https://www.codingblocks.net/episode235
News
- Atlanta Dev Con is coming up, on September 7th, 2024 (www.atldevcon.com)
Intro to Apache Kafka
What is it?
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
Core capabilities
- High throughput – Deliver messages at network-limited throughput using a cluster of machines with latencies as low as 2ms.
- Scalable – Scale production clusters up to a thousand brokers, trillions of messages per day, petabytes of data, and hundreds of thousands of partitions. Elastically expand and contract storage and processing.
- Permanent storage – Store streams of data safely in a distributed, durable, fault-tolerant cluster.
- High availability – Stretch clusters efficiently over availability zones or connect separate clusters across geographic regions.
Ecosystem
- Built-in stream processing – Process streams of events with joins, aggregations, filters, transformations, and more, using event-time and exactly-once processing.
- Connect to almost anything – Kafka’s out-of-the-box Connect interface integrates with hundreds of event sources and event sinks including Postgres, JMS, Elasticsearch, AWS S3, and more.
- Client libraries – Read, write, and process streams of events in a vast array of programming languages.
- Large ecosystem of open source tools – Leverage a vast array of community-driven tooling.
Trust and Ease of Use
- Mission critical – Support mission-critical use cases with guaranteed ordering, zero message loss, and efficient exactly-once processing.
- Trusted by thousands of organizations – Thousands of organizations use Kafka, from internet giants to car manufacturers to stock exchanges. More than 5 million unique lifetime downloads.
- Vast user community – Kafka is one of the five most active projects of the Apache Software Foundation, with hundreds of meetups around the world.
What is event streaming?
- Event streaming means capturing data in real time from event sources such as databases, sensors, mobile devices, cloud services, and applications as streams of events. Those events are stored durably (in Kafka) for processing, either in real time or retrospectively, and then routed to various destinations as needed. This continuous flow and processing of data is what is known as “streaming data”.
How can it be used? (some examples)
- Processing payments and financial transactions in real time
- Tracking automobiles and shipments in real time for logistical purposes
- Capturing and analyzing sensor data from IoT devices or other equipment
- Connecting and sharing data between different divisions of a company
Apache Kafka as an event streaming platform?
- It combines three key capabilities that make it a complete streaming platform:
  - Can publish and subscribe to streams of events
  - Can store streams of events durably and reliably for as long as necessary (indefinitely if you have the storage)
  - Can process streams of events in real time or retrospectively
- Can be deployed to bare metal, virtual machines or to containers on-prem or in the cloud
- Can be run self-managed or via various cloud providers as a managed service
How does Kafka work?
- Kafka is a distributed system composed of servers and clients that communicate over a high-performance TCP network protocol
Servers
- Kafka runs as a cluster of one or more servers that can span multiple data centers or cloud regions
- Brokers – the servers that form the storage layer
- Kafka Connect – servers that continuously import and export data as event streams, integrating Kafka with existing systems in your infrastructure such as relational databases
- Kafka clusters are highly scalable and fault-tolerant
Clients
- Clients let you write distributed applications that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner
- Clients are available for many programming languages, both those provided by the core platform and third-party clients
Concepts
Events
- An event is a record of something that happened; the documentation also calls it a “record”
- Has a key
- Has a value
- Has an event timestamp
- Can have additional metadata
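To make the shape of an event concrete, here is a minimal sketch in Python. It is a plain in-memory data structure, not the API of any Kafka client library; the class name `Event`, its field names, and the sample values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical model of an event: key, value, timestamp, optional metadata."""
    key: Optional[str]      # identifies what the event is about (may be absent)
    value: str              # the payload describing what happened
    timestamp_ms: int       # when the event occurred, in epoch milliseconds
    headers: Dict[str, str] = field(default_factory=dict)  # additional metadata


# Illustrative values only.
payment = Event(
    key="Alice",
    value="Made a payment of $200 to Bob",
    timestamp_ms=1593094000000,
    headers={"source": "web-checkout"},
)

print(payment.key, "->", payment.value)
```

Real clients typically send keys and values as serialized bytes; strings are used here only to keep the sketch readable.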
Producers and Consumers
- Producers – these are the client applications that publish/write events to Kafka
- Consumers – these are the client applications that read/subscribe to events from Kafka
- Producers and consumers are completely decoupled from each other
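The decoupling described above can be sketched with a toy in-memory model. This is not how a real Kafka client is written; `Topic`, `Consumer`, `append`, and `poll` are hypothetical names used only to show that writers never need to know who reads, and that each reader tracks its own position.

```python
class Topic:
    """Hypothetical append-only log standing in for a Kafka topic."""

    def __init__(self, name):
        self.name = name
        self._log = []

    def append(self, key, value):
        # A producer only appends; it knows nothing about consumers.
        self._log.append((key, value))
        return len(self._log) - 1  # offset of the new event

    def read_from(self, offset):
        return self._log[offset:]


class Consumer:
    """Hypothetical consumer that remembers its own read position."""

    def __init__(self, topic):
        self._topic = topic
        self._offset = 0

    def poll(self):
        events = self._topic.read_from(self._offset)
        self._offset += len(events)
        return events


orders = Topic("orders")
orders.append("order-1", "created")
orders.append("order-2", "created")

billing = Consumer(orders)
shipping = Consumer(orders)

billing_first = billing.poll()       # billing reads the first two events
orders.append("order-3", "created")  # the producer keeps writing regardless
billing_second = billing.poll()      # billing picks up only the new event
shipping_all = shipping.poll()       # shipping, starting late, sees all three
```

Because each consumer holds its own offset, adding or removing a consumer requires no change on the producing side.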
Topics
- Events are stored in topics
- Topics are like folders on a file system – events would be the equivalent of files within that folder
- Topics are multi-producer and multi-subscriber
- There can be zero, one or many producers or subscribers to a topic that write to or read from that topic respectively
- Unlike many message queuing systems, these events can be read from as many times as necessary because they are not deleted after being consumed
- Deletion is controlled by a per-topic retention setting that determines how long events are kept
- Kafka’s performance does not depend on how much data is stored or for how long, so retaining data for long periods is not a problem
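The two topic behaviours above, re-reading events and time-based retention, can be illustrated with a small in-memory sketch. It is a simplification under stated assumptions: `RetainedTopic`, `enforce_retention`, and the millisecond values are hypothetical, and real brokers apply retention (for example via the `retention.ms` topic setting) to whole log segments rather than to individual events.

```python
from collections import deque


class RetainedTopic:
    """Hypothetical topic that keeps events until they exceed a retention window."""

    def __init__(self, retention_ms):
        self.retention_ms = retention_ms
        self._events = deque()  # (timestamp_ms, value), oldest first

    def append(self, timestamp_ms, value):
        self._events.append((timestamp_ms, value))

    def read_all(self):
        # Reading does not remove anything, so it can be repeated freely.
        return [value for _, value in self._events]

    def enforce_retention(self, now_ms):
        # Drop events older than the retention window; return how many were dropped.
        removed = 0
        while self._events and now_ms - self._events[0][0] > self.retention_ms:
            self._events.popleft()
            removed += 1
        return removed


topic = RetainedTopic(retention_ms=1000)
topic.append(0, "a")
topic.append(500, "b")
topic.append(1500, "c")

first_read = topic.read_all()
second_read = topic.read_all()   # same events again: consuming does not delete

removed = topic.enforce_retention(now_ms=1600)
after_retention = topic.read_all()
```

Only the retention pass changes what is stored; how often an event was read plays no part in when it is discarded.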
Resources we Like
- Why Strimzi moved away from statefulsets (github.com)
Tip of the Week
- Flipper Zero is a multi-functional interaction device mixed with a Tamagotchi. It has a variety of IO options built in, RFID, NFC, GPIO, Bluetooth, USB, and a variety of low-voltage pins like you’d see on an Arduino. Using the device upgrades the dolphin, encouraging you to try new things…and it’s all open-source with a vibrant community behind it. (shop.flipperzero.one)
- Kafka TUI?! Kaskade is a cool-looking Kafka TUI that has got to be better than using the scripts in the build folder that comes with Kafka. (github.com/sauljabin/kaskade)
- Microstudio is a web-based integrated development environment for making simple games and it’s open source! (microstudio.dev)
- Bing Copilot has a number of useful prompts (bing.com)
  - Designer (photos)
  - Vacation Planner
  - Cooking assistant
  - Fitness trainer
- Sharing metrics between projects in GCP, Azure, and maybe AWS???
  - GCP (projects): (cloud.google.com)
  - Azure (resource groups or subscriptions): (learn.microsoft.com)
  - AWS (multiple accounts): (docs.aws.amazon.com)
- Checking Wi-Fi in your home – Android only (play.google.com)
- Powering PoE without running cables (Amazon)
- Omada specific – cloud vs local hardware (Amazon)
- How to “shut down” a Kafka cluster in Kubernetes:
kubectl annotate kafka my-kafka-cluster strimzi.io/pause-reconciliation="true" --context=my-context --namespace=my-namespace
kubectl delete strimzipodsets my-kafka-cluster --context=my-context --namespace=my-namespace
- Then to “restart” the cluster:
kubectl annotate kafka my-kafka-cluster strimzi.io/pause-reconciliation- --context=my-context --namespace=my-namespace