Player FM - Internet Radio Done Right
13 subscribers
Checked 8d ago
Vor vier Jahren hinzugefügt
Inhalt bereitgestellt von Stephen Townshend. Alle Podcast-Inhalte, einschließlich Episoden, Grafiken und Podcast-Beschreibungen, werden direkt von Stephen Townshend oder seinem Podcast-Plattformpartner hochgeladen und bereitgestellt. Wenn Sie glauben, dass jemand Ihr urheberrechtlich geschütztes Werk ohne Ihre Erlaubnis nutzt, können Sie dem hier beschriebenen Verfahren folgen https://de.player.fm/legal.
Player FM - Podcast-App
Gehen Sie mit der App Player FM offline!
Gehen Sie mit der App Player FM offline!
Slight Reliability
Alle als (un)gespielt markieren ...
Manage series 2917773
Inhalt bereitgestellt von Stephen Townshend. Alle Podcast-Inhalte, einschließlich Episoden, Grafiken und Podcast-Beschreibungen, werden direkt von Stephen Townshend oder seinem Podcast-Plattformpartner hochgeladen und bereitgestellt. Wenn Sie glauben, dass jemand Ihr urheberrechtlich geschütztes Werk ohne Ihre Erlaubnis nutzt, können Sie dem hier beschriebenen Verfahren folgen https://de.player.fm/legal.
Learning SRE, one day at a time.
…
continue reading
93 Episoden
Alle als (un)gespielt markieren ...
Manage series 2917773
Inhalt bereitgestellt von Stephen Townshend. Alle Podcast-Inhalte, einschließlich Episoden, Grafiken und Podcast-Beschreibungen, werden direkt von Stephen Townshend oder seinem Podcast-Plattformpartner hochgeladen und bereitgestellt. Wenn Sie glauben, dass jemand Ihr urheberrechtlich geschütztes Werk ohne Ihre Erlaubnis nutzt, können Sie dem hier beschriebenen Verfahren folgen https://de.player.fm/legal.
Learning SRE, one day at a time.
…
continue reading
93 Episoden
Alle Folgen
×In this episode I explore the challenges of achieving unified observability when integrating with SaaS products and services. I cover: 🌊 The new wave of mega-complex SaaS ⚗️ Challenges integrating SaaS with our observability pipelines 👩🦯 How the lack of SaaS autonomy limits the effectiveness of OpenTelemetry 💰 Paying twice to ingest, store, and search telemetry 📈 Monitoring and predicting SaaS observability costs ...and much more. Shout out to Mark Chiavaroli (and apologies for mispronouncing your surname multiple times), Damian Sharrock, and Reece Hewitt for bouncing ideas on this topic. The 'Is it observable?' series can be found here: https://isitobservable.io/ ...and you can find Henrik on LinkedIn: https://www.linkedin.com/in/hrexed/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Bluesky: https://bsky.app/profile/slightreliability.bsky.social YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre…
S
Slight Reliability
This week I check in and give an update on work, life, and my attempts at bringing to life SRE practices in the world of non-production environment management. You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.…
S
Slight Reliability
This week I'm joined by Karanveer Anand, SRE Technical Program Manager at Google to discuss blameless post-mortems. We cover: 🦅 The recent Crowdstrike outage and their public post-mortem 🚑 When do we do a blameless post-mortem? 😕 How do we do a blameless post-mortem? ✅ How do we make sure action items are followed through? 📰 The power of learning from post-mortems created by other teams and orgs ...and much more. You can find Karanveer on LinkedIn: https://www.linkedin.com/in/karanveer/ You can find Crowdstrike's preliminary post incident report here: https://www.crowdstrike.com/blog/falcon-content-update-preliminary-post-incident-report/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.…
This week Zach Michel from https://middleware.io/ and I discuss the state of OpenTelemetry and what it means to adopt it. We cover: 🌩️ Achieving observability in a SaaS world 🥫 Context propagation - the magic sauce of OTEL 🚪 The telemetry gateway concept and leveraging the OTEL collector 🪵 The state of OpenTelemetry logging 🫂 Making use of the OpenTelemetry community ...and much more. You can find Zach on LinkedIn: https://www.linkedin.com/in/zamichel/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ For a list of ways to interact with the OpenTelemetry community go to: https://opentelemetry.io/community/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.…
S
Slight Reliability
In Episode 80 Niall Murphy talked about the need for SREs to be better at articulating the value of our work. In this episode I'm joined by ex-Googler and Engineering Director (SRE) at Culture Amp Artem Yakimenko about how we might achieve this. We discuss both quantifiable and qualitative approaches including leveraging the untapped data in support tickets, customer sentiment and rankings, the relationship between finance and performance, the link between user design and performance, and so much more. Books mentioned in the episode: 100 Things Every Designer Needs to Know About People By Susan Weinschenk https://www.amazon.com.au/Things-Every-Designer-Needs-People/dp/0321767535 You can find Artem on LinkedIn: https://www.linkedin.com/in/temikus/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.…
In the world of SRE we constantly talk about defining SLOs, but what about evolving them over time? This week I chat with SRE Tech Lead Dom Finn about just that. We cover the relationship between reliability and user analytics, latency classes as a way to speak SLOs with business stakeholders, the role of NFRs and how the thresholds differ from SLOs, and much more. Books mentioned in the episode: The Beginning of Infinity: Explanations That Transform the World By David Deutch https://www.amazon.com.au/Beginning-Infinity-Explanations-Transform-World/dp/0143121359 Turn The Ship Around! By David Marquette https://davidmarquet.com/turn-the-ship-around-book/ You can find Dom on LinkedIn: https://www.linkedin.com/in/dom-finn/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.…
This week I talk about the impact of SaaS-first technology strategies on the work of an SRE. I pose questions about observability, ownership, on-call, and how much control we have over reliability. You can find the Bleeding Tech blog on Medium: https://medium.com/@stownshend You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre…
S
Slight Reliability
This week I chat with Dan Slimmon about applying the approach doctors use to treat patient symptoms during incident response. You can find Dan's blog at https://blog.danslimmon.com/ or connect with him on LinkedIn here: https://www.linkedin.com/in/danslimmon/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.…
S
Slight Reliability
This week I hear about all things Kubernetes from Komodor CTO and co-founder Itiel Shwartz. We chat about the promise that was made when Kubernetes first entered the industry, the challenge of getting developers engaged and capable of working in Kubernetes, my hate/hate relationship with Helm but its important contribution to the Kubernetes project, Kubernetes observability, and so much more. You can find the Kubernetes for Humans podcast here: https://komodor.com/blog/the-kubernetes-for-humans-podcast/ Or find out more about Komodor here: https://komodor.com/ Or find Itiel on LinkedIn: https://www.linkedin.com/in/itiel-shwartz-18542853/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.…
This week I sit down and have a discussion with Amin Astaneh (from Certo Modo) about CI/CD. We cover the power of the standard change as a way to navigate ITIL while still implementing DevOps practices, what to monitor to make your CI/CD observable, single piece flow, testing in production, and so much more. You can find Amin on his company website https://certomodo.io , LinkedIn: https://www.linkedin.com/in/aminastaneh/ and Twitter: https://twitter.com/aastaneh You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.…
S
Slight Reliability
"Environment issues are just incidents that happened to occur in a non-production environment"... so why do we treat them so differently? In this first episode of the 2024 season I reflect on how we handle incidents in non-prod environments. (Note: Had a few issues with noise suppression in OBS Studio cutting off the start of some words, will sort it for the next episode) You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre…
This week I speak with co-author of the original SRE book + the SRE workbook, and renowned speaker Niall Murphy. We chat about the state of SRE in the current macro-economic climate and how we're not yet doing a very good job at articulating the value of SRE to leaders, the relationship that velocity and reliability have, the value of new features versus reliability improvements, and *much* more. You can find Niall at: LinkedIn: https://www.linkedin.com/in/niallm/ X: https://twitter.com/niallm Website: https://relyabilit.ie/ (and his company Stanza: https://www.stanza.systems/ ) You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ X: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/…
S
Slight Reliability
Paige Cruz (from Chronosphere) is back. This week we discuss sampling. What is sampling? Why do it? What kinds of sampling are there? You can check out Chronosphere's cloud native observability platform here: https://chronosphere.io/ You can find Paige on: LinkedIn: https://www.linkedin.com/in/paigerduty/ X: https://twitter.com/paigerduty You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ X: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/…
S
Slight Reliability
This week Valeska Victoria returns to share some of her experiences working as an SRE at eBay. We look at the cascading effect of production issues in complex integrated environments (how there's often no single root cause), developer literacy of how infrastructure works, the importance of ownership and accountability of reliability, and much more. You can find Valeska on: LinkedIn: https://www.linkedin.com/in/valeska-victoria/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ X: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/…
This week I chat with Ankit Jain from aviator.co about developer experience. We define developer experience and developer productivity, and how this applies to SRE. We discuss the growing expectation on developers and how this leads to frustration and burnout. We also explore how to measure developer experience and how to start working to make improvements. You can check out Aviator's developer experience platform here: https://www.aviator.co/ You can find Ankit on: LinkedIn: https://www.linkedin.com/in/ankitjaindce/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ X: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/…
Willkommen auf Player FM!
Player FM scannt gerade das Web nach Podcasts mit hoher Qualität, die du genießen kannst. Es ist die beste Podcast-App und funktioniert auf Android, iPhone und im Web. Melde dich an, um Abos geräteübergreifend zu synchronisieren.