Charity Majors is the CTO of Honeycomb, a platform that helps engineers understand their own production systems through observability. Unlike traditional monitoring tools such as Wavefront, Honeycomb is built for data with high cardinality and high dimensionality, which can significantly speed up debugging of many problems.
NOTE: This episode has some explicit language.
We talk about observability, monitoring, building your own database for a particular use case, starting a developer tool startup, having the right oncall culture, getting to fifteen-minute deployments, and more.
05:00 - High cardinality and high dimensionality in Honeycomb. Data retention in Honeycomb - 60 days. Many monitoring systems, like Dropbox’s Vortex, downsample data after two weeks
13:00 - Observability-driven development. The impact of deploying code within 15 minutes of it being merged. Synchronous and asynchronous engineering workflows
19:00 - Setting up oncall rotations. What the size of a rotation should be
21:00 - How often should someone on a 24/7 oncall rotation be woken up? Once or twice. But there are exceptions. The impractical nature of parts of the Google SRE book’s “Being Oncall” chapter. Oncall for managers
31:00 - Why are monitoring tools so ubiquitous compared to observability tools?
36:00 - Observability & Tracing. What the future of observability infrastructure might look like
40:00 - What will the job of an SRE look like in the future? The split of roles in software engineering organizations in the future
43:00 - Shipping code faster makes engineers happier. How to ensure your engineering organization is healthy, and the metrics to use. Learned helplessness in engineering organizations, and leadership failures
51:00 - Building internal tools in-house vs using external tools. The large impact that designers at Honeycomb have had on the product.
58:00 - The story of starting Honeycomb. Creating a “Minimum Lovable Product”. A description of Honeycomb’s internal architecture. Dealing with tail latencies.
71:00 - Continuous Deployment and releasing code quickly. Use calendly.com/charitym if you want to chat with Charity about continuous deployment best practices or anything else.