Resilience in Action is a podcast about all things resilience, from SRE and software engineering to how resilience affects our personal lives, and more. It is hosted by Blameless Staff SRE Amy Tobey, who has been an SRE and DevOps practitioner since before those names existed. She cares deeply about her community of SREs and wants to use what she’s learned over the 20+ years of her career to help others.
In our very first episode, Amy chats with Netflix software engineer Lorin Hochstein. Lorin is uniquely focused on deep incident analysis and narratives, and was drawn into the storytelling approach due to his fascination with how complex systems fail or behave in unexpected ways.
He realized that the valuable data from Netflix's complex systems was being under-utilized because post-incident retrospectives were shallow and non-streaming incidents went unreported. To combat this, Lorin pioneered the “Oops” write-ups, fostering continuous learning within the organization and creating a deeper level of psychological safety by focusing on each incident participant’s own narrative of what happened.