Incidents & Operations with Dan Slimmon
In this episode, Adam welcomes Dan Slimmon, an experienced Site Reliability Engineer (SRE), to discuss incident response and troubleshooting in software engineering. Dan explains his methodology for clinical troubleshooting, the importance of maintaining a common mental model, and techniques for leading effective incident response efforts. They also delve into the value of continuous ops reviews and ongoing mental model updates to prevent issues, emphasizing the need for structured processes and effective communication.
Want more?
- 🚀 New listener? Start with the introduction.
- 🎁 Enter the FREE giveaway for a copy of "Release It!"
- 🧭 Get the Small Batches Way guide to software delivery excellence
- 🥋 Software Kaizen: My One-on-One System for Engineering Leadership
- 🧑‍🎓 Dan's course on leading incidents (Code SMALLBATCHES24 for 24% off!)
Chapters
- (00:00) - Incidents & Operations
- (01:14) - Guest Welcome
- (01:40) - Dan's Career Journey
- (02:33) - Evolution of Tech Stacks
- (04:59) - Clinical Troubleshooting Explained
- (11:53) - Incident Response Fundamentals
- (17:41) - Effective Communication in Incidents
- (26:09) - Training for Incident Response
- (33:22) - The Essence of Incident Response
- (33:53) - Balancing Short-Term and Long-Term Fixes
- (35:01) - The Firefighting Analogy in Software Incidents
- (37:11) - Postmortems: Learning from Incidents
- (42:14) - Building a Shared Mental Model
- (42:41) - Looking for Trouble: Proactive System Monitoring
- (47:59) - Ops Reviews: Continuous Improvement
- (54:37) - The Importance of Closing the Feedback Loop
- (59:40) - Final Thoughts and Resources
120 episodes