ALTERNATE UNIVERSE DEV

The Bike Shed

302: Observability with Charity Majors

Tune in as Co-founder and CTO of Honeycomb, an observability platform, Charity Majors joins Chris to drop some knowlege bombs such as:

  • Thinking of observability as being about the unknown unknowns: Allowing for high cardinality, high dimensionality, ad hoc queries at any point in time.
  • Comparing instrumentation to a muscle: It's a habit that needs to be developed and fostered.
  • Sincere continuous deployment: 15 minutes or bust.

And bunches more, since y'all know you hear her name come up at least once during every other episode!

Transcript:

CHRIS: Hello, and welcome to another episode of The Bike Shed, a weekly podcast from your friends at thoughtbot about developing great software. I'm Chris Toomey. And this week Steph is taking a quick break, but while she's away, I was joined by a special guest, Charity Majors. Now, folks who've been listening to the show lately will know I've been mentioning one idea or another from Charity almost every episode these days. Charity's work spans from the deeply technical through to the deeply human. And across all of it, she brings such a wealth of experience in pragmatism while consistently providing grounded, actionable advice about how we can improve all aspects of our work.

And to give a bit more context for those who aren't as familiar with Charity's work, she is the co-founder and CTO of Honeycomb, which is an observability platform that we talk about more in the episode. Charity is also a prolific blogger, tweeter and speaker, and general leaver of digital breadcrumbs for the rest of us to hopefully follow. And Charity is also one of the hosts of the o11ycast podcast. That's observability, o11y podcast. And in fact, in the intro to the first o11ycast episode, Charity provides a beautiful summary of her approach to the varied work that we do. Quote, "I'm someone who's always been drawn to where the beautiful theory of computing meets the awkward, messy reality of actually trying to do things." And that quote rang so deeply true to me when I heard it and really encompassed what I see across the variety of work that Charity has shared with us. And frankly, I've been so impressed with the quality and quantity of wonderful content that Charity has shared over the years. I was really just thrilled to get the chance to sit down and talk with her directly. So without further ado, here's our conversation with Charity Majors. Thanks so much for joining us today, Charity.

CHARITY: Thanks for having me. It's great to be here.

CHRIS: As I've mentioned on many an episode, I've been following your work for a while now. And at this point, I would say that just about every Bike Shed episode has a reference to you and some piece of work that you have put out into the world, whether it be a tweet or a blog post, or a conference talk or something. So I'm so grateful for all the work that you put out into the world and for taking the time to chat with us today.

CHARITY: That's so exciting. Yay. I feel right at home here then. [chuckles]

CHRIS: Fantastic. Well, I want to dive in. I think it's sort of the core of some of the conversation that we'll be having, which is around instrumentation and observability, and observability as a newer, noveler form of how we think about this space. But to give a bit of context, I was hoping you might be able to give just the quick summary for anyone who might not be as familiar with observability as a concept and what that means now, and Honeycomb as a product and how it offers affordances around observability and pushes that envelope forward.

CHARITY: Yeah, I think of the observability as being about the unknown unknowns. For a long time, all of the complexity was really bound up in the app. You had the load balancer, you had the app, and the database. And all the complexity you could just attach to a debugger and step through it if you had to. But then we kind of blew up the app, the monolith, and now it's in services scattered to the winds, and you can't just trace it. And so observability is a way of passing that context along hop by hop so that you can actually slice and dice in real-time. And the hardest problem is not usually debugging the code. It's finding out wherein the system is the code that you need to debug.

And observability, if you accept my definition, which is it's about unknown unknowns, that you should be able to ask any question of your systems, understand any internal state just by observing it from the outside, well, then a lot of things proceed from that, in my opinion. Like, you need to be able to handle high cardinality, high dimensionality. You need to be able to string together a lot of these high cardinality dimensions. You need to... any kind of schema or indexing scheme in advance is verboten because you don't know what questions you're going to need to ask. And so there's a lot that flows from that definition; arbitrarily-wide structured data blobs is the source of truth, et cetera. But at its heart, it's just about the concepts, that our problems are getting harder and harder. We don't get paged to go, "Oh, that again? Oh, that again?"

CHRIS: [chuckles]

CHARITY: Ideally, we fix those things. But we still get paged. What the hell is this? It's about allowing engineers, empowering them in a reasonable amount of time to be in constant conversation with that code that's out there in the world because most problems honestly we never get paged about. They're too subtle until they snowball, and they pick up other problems. It's like a hairball under your couch until it gets so big and so impacting that it actually does alert someone. And then you just start picking up the rock and be like, oh God, what's that? Well, we've never understood this. And that's why ops has such a reputation for masochism. [chuckles]

CHRIS: Absolutely. There are so many little pieces in what you just said that really deeply resonate with me, although there is one facet of some of the way that you talk about observability that I find interesting. I'm someone who likes to cling to the perhaps unrealistic these days ideal of a monolith of what if we were to just keep everything in the same place and all the data lived together in one database, and I could have foreign keys, and consistencies, and asset compliance?

CHARITY: Which you should do for as long as you possibly can. You should never impose more complexity on yourself than you absolutely need to. And I would say that it's never not better to have observability than the older paradigms of monitoring and so forth. Some of Honeycomb's biggest and best customers still use monoliths. But they still find it really valuable to be able to apply the principles. I think that it's the microservices revolution, if you will, that forced this set of changes. It was inevitable. The steps that I started talking about, like, somebody would have because the older way just became untenable when you started adopting containerization and a lot of these things that made everything suddenly a high cardinality including the number of applications you have. But it's never not better to have high cardinality tools and to be able to instrument your code for spans and tracing. Tracing is still valuable even in the monolith.

CHRIS: Yeah. As I've observed and started to play around with Honeycomb, that's definitely what I've seen is I'm almost exclusively working in the context of monoliths and, like I said, clinging to them for as long as I possibly can, which isn't going to be forever.

CHARITY: It's true. [chuckles]

CHRIS: I recognize that truth, but already I see the value. And so Honeycomb is a platform that you've built that allows for this high cardinality, high dimensionality ad hoc queries at any point in time. And so the idea that I can come into the tool and say, "Huh, I've got a new novel problem today." I don't need to re-instrument my code. I can just ask a new question, and the system will responsively be able to answer that question, ideally. And that feels like it holds true in a monolith all the more so, like you said, in an SOA architecture. But even in my safe little playground of everything is in the same space, I still don't know how everything's working all the time if we're being honest. So being able to answer those questions feels meaningful.

CHARITY: Totally. I think that one way of thinking about the SOA or microservices is that it pushes a lot of what was in the operations realm into a realm of development, and suddenly you're responsible for a lot more of the operating of your services, things like retries and backoffs, and load distribution, and thundering herds, and all these things that ops traditionally took care of. Well, now you have to think about them. So you need some ops tools, too. What I like about...of course I like everything about Honeycomb because we designed it for this problem. But it speaks in the language of variables, and endpoints, and functions, and not in the low-level language of proc IPv6 timeouts and stuff where I feel like ops has also traditionally been the translation layer between software engineers and their actual code in production. And it's time to start giving software engineers those tools in their own language.

CHRIS: Yeah. I love that. And I'm very happy to have Honeycomb as part of an instrumentation stack, which actually shifts me to the next question, which as I look at Honeycomb, very quickly the first time I saw it, I was like, oh okay, this makes sense. I want this in the world.

CHARITY: Oh, I like you. [laughs] Not all people are like you.

CHRIS: It might have been my second or third look, but it was definitely...once I got it, I was like, oh yes, I absolutely want that. But now, the question that I have is I typically will have a collection of tools that exist in this space. And there's a weird Venn diagram overlap of well, there's logging, and there's error tracking, and there are APM performance tools, and there's metrics, dashboards. And my sense is that Honeycomb perhaps can or an observability tool more generally can subsume a bunch of those, but it's not clear to me exactly. I think I probably still want logging. I think I still want error tracking as a discreet service tool that I'm using but maybe not APM and maybe not metrics as a distinct thing. Maybe I can infer those from a tool like Honeycomb. But I'm wondering what's the current thought on that?

CHARITY: Well, part of what you're seeing is just observability tooling is very new, and we haven't had time to grow up. And here I'm like, officially, we play very nicely with all other vendors, and none of us would ever try to compete or take away from each other's faces. But I do think that ultimately, logging pretty much the only real use case for it is security stuff, the security archiving, just keep every log light. It's gotten cheap enough, but it's not actually useful for debugging or understanding your system, not really. It's useful for compliance. It's useful for proving that you did something in the past. Most logs are just a pile of trash, but they can be useful trash. And I understand people's emotional want to hold onto them for a while, and there's nothing wrong with that. There's nothing wrong with keeping some trash around for a while, while you make it...[laughs] Sorry, not to totally slam on logs, but they are trash.

CHRIS: I love the analogies that we're going for. [laughs]

CHARITY: But the thing about observability is I do think the kind of center of the world is these arbitrarily-wide structured data blobs from what you can infer logs, from which you can infer metrics, from which you can roll-up. So I do think that well metrics are the right first tool for understanding infrastructure. Like if you're Amazon and you're responsible for all this hardware and stuff, you should be asking yourself, is my service healthy? But if you're someone who's writing and shipping code on top of that service you care about, can my request complete? What is my user's experience? And that's observability's territory. So I think that ultimately, I do think metrics, logs, and traces all get subsumed under the observability umbrella and performance management, too, if the tools get built correctly. There will still be use cases. They will just get smaller, for logs, for standalone metrics tools.

Honeycomb just launched our metrics product. Metrics is like a 30-year-old piece of technology. Prometheus and Datadog are going to be the last best metrics tools ever built. We have wrung the water out of this laundry. [chuckles ] But we aren't trying to compete with that. What we are trying to do is give people an on-ramp into Honeycomb. They've got decades’ worth of stuff. They've been corralling metrics, structuring them. You rely on them. You don't want to give them up. So yeah, let's feed them in. Let's give them an overlay. And number two, the more interesting use case for me is when you're a software engineer who's writing and shipping code, you do care about did the memory usage just triple, or is the CPU completely buzzing after I shipped my last change? But there's really only like three or four of those metrics that you really care about as system metrics. The rest are mostly legacy.

CHRIS: I like the idea that aspirationally, Honeycomb is moving towards a place where given sufficient input data, given this arbitrarily-wide data blob with high cardinality, et cetera, that we can infer basically all of those others from it. But also speaking to also observability is somewhat new, and so we got to build a lot of product to get there and that idea that there is perhaps a space right now where you might be bringing together a few of these tools. But if there is a future world in which I can have one of these tools that just handles everything and tells me about my code and directs me to the line of code that I incorrectly instrumented, that would be wonderful. Happy to do the work in the interim to cobble it together from the pieces.

CHARITY: The place in the meantime that we're at where all of these big vendors are acquiring other vendors and trying to put together...they're like, we have three pillars. Coincidentally, we have three products to sell you. It's like, it's not good for the users because when you're...like, you're sitting in the middle here. You've got your metrics dashboard. It's telling you that there's a problem. Okay, if you can't slice and dice and figure out what it is, you have to jump over into logs and visually correlate based on the times and hope no timestamps are wrong and try and find the thing. And then, oh, okay, so you want to trace it. So you've got to copy over and try and find that in your tracing product and hope that that would get sampled in. It's not good. You can't follow the question from the beginning. I have a problem to the end. I have a solution and back. And it's not linear. You're going to be following a trail; then you're going to need to back up, then you're going to find another trail. And then you're going to want us to zoom out and see who else is impacted. And you really can't back your way into that with different products. You have to start with the arbitrarily-wide structured data blob.

What does confuse me is I know that New Relic is built on this. New Relic has these. And we almost didn't start Honeycomb because we were just like, edit data, and New Relic is going to figure it out. Here we are like six years later, and they still haven't fcking figured it out. [laughs] But like Datadog, they aren't based on that arbitrarily-wide structure, so they are really...and I know that they're trying to get...all of these big vendors are trying to get to where Honeycomb sits technically faster than we can grow up and become a business.

CHRIS: The race is on.

CHARITY: Yeah. It's fun.

CHRIS: One of the related things that I've seen you talk about a few times is the idea that instrumentation is a muscle. It's a habit that needs to be developed and fostered, and that rings very true to me. At the same time, a lot of my instrumentation work has been more in a reactive space. If we're being completely honest, something went wrong; we can't figure it out from the information that we have available, so then we go in, and we add a new logging line. We wrap the code in some way. And so I'm wondering if you can talk a little bit more about that. What does that look like in practice or perhaps some examples or something? But how can we tease that apart and understand that a little bit better? Because it sounds wonderful to me.

CHARITY: I think of instrumenting a lot like commenting your code. It's a way of thinking to the future and reverse engineering; what am I going to care about? What is someone else going to care about? And I really do think of commenting as just a less true version of instrumentation, honestly. It's you talking about what you think the code should be doing, but you've left production out of the loops. You don't know what the code is doing. [chuckles] But ideally, they're kind of the same muscle. It's why you're writing your code. You've just developed a monitoring thread almost in your brain. It's like, yeah, this is going to be valuable. Oh, this is going to be valuable. And so I do think that it's on vendors to make sure that we do as much for you as possible. And this, honestly, is the long winding journey to Honeycomb finding product-market fit, which took almost three and a half, four years.

And for a long time, I was like, it’s not magic. You have to understand your code. You have to blah, blah, blah, which is true. But also, we need to walk closer to the user. We need to make it easier. We need to do the beeline, which will initialize the event, pre-populate it with a bunch of stuff, create the framework so that all you have to do as a user is just printf now and then just stuff this in the blob, vendors making it as easy as possible, as automated as possible. We have more to do. We really should be pre-populating it with all of the language internals and all of the stuff about the environment. We'll just be glad to tap that well. But there's something that we can't do for you, which is understand what you're trying to do and what is important.

Honestly, here's a story from the past. The reason that New Relic was so big, they hit the ground, and they super hockey-sticked everything was because they dovetailed with the rise of Ruby and Rails because Ruby allows for so much fcking monkey patching. Every web app looks the same. You can just be like, we assume all this crap, and so we could make it just like magic for you. You just install this library. Boom, you're off to the races. Well, try as you might, I want to say a type language like Go, you can't do that stuff with. You can't make it as magical. You have to think a lot more about how you're structuring things for better or for worse, which is why their growth slowed because those languages just aren't so popular anymore.

So it's trade-offs all the way down. Yes, everybody should be an expert in forecasting the future and understanding all the subtle things that you don't know you're going to know, but you're super are going to want to know. But as you've discovered, most of your learning comes from being in the trenches, which is why it's so good for devs to be on call and be close to their code and be in this constant conversation with it because you develop a sixth sense. I can't tell you exactly why I know it's going to be a problem, but I'm just going to wrap it because I'm pretty sure it is.

CHRIS: There was a tiny bit that I was hoping that you would have some very specific like, oh, you just do X, Y, and Z. I kind of knew that wasn't going to be the answer, but it also represents something that I so appreciate about your thinking and the work that you put out into the world, which is it's realistic. Sometimes you're like, you know what? There's going to be some tacit knowledge involved here. You got to put in the work. You got to learn the thing, and that's just true sometimes. And so I appreciate your willingness to be like yeah, you know what you got to do? You got to do the work. And then after that, you'll know...and so there's sort of a virtuous cycle that can happen here. There is a feature, as far as I understand it, of Honeycomb, too if I can briefly hype up your products slightly but the idea that you can observe the series of questions that another developer asks. So if they were in a debugging session, you can see like, oh, they asked this, and then they asked this, and then they filtered on that.

CHARITY: It's like your Bash history but for debugging. [chuckles].

CHRIS: I want this for everything.

CHARITY: Right?

CHRIS: Let's have a shared hive mind of the developers on a team, both in terms of our observability tool but also just kind of everything.

CHARITY: What did you do?

CHRIS: Yeah, what did you do, and why? What were you thinking? I saw you went down a road there, but then you stopped and backed up, and you went a different way. That's interesting to me.

CHARITY: This is why we keep trying to build things into the product that will incentivize people to write texts about what they're doing, whether it's retroactively applying tags or writing a breadcrumb to yourself. Why was this meaningful? As you're putting it in your bookmarks, why are you putting it in your bookmarks? Collaboration is just as much about collaborating with your past self and your future self as it is with the rest of your team. I don't remember why the fck I did that two years ago. I don't know. I don't know why I did that two months ago. But the more you can leave breadcrumbs for yourself and then surface that to the team, you're right; it’s transformational.

I wanted this so selfishly because I have never been that person on the team who loves graphs. I hate graphs. I don't think visually very well at all. I've been working with my friend, Ben Harts, off and on for like 10, 12 years now. He's always the person I've hired repeatedly. He's always the person who comes in and makes the graphs. And then I look over his shoulder, and I bookmark them. I can be up all night making the perfect dashboard. And then I'm like, great, mine. [chuckles] So there's room in the world for both of us. But the point is that not all of us should have to go through that effort. [chuckles] We should be able to learn from each other. Only one person should ever have to have to craft the perfect query, and then the rest of the team should be able to effortlessly piggyback on it.

CHRIS: Yeah, absolutely. And again, I want that but for everything. I dream of a future in which that's true.

CHARITY: And so much of debugging is this wandering path where you go down the wrong place, and you need to be able to zoom back to all right; where did I first know that I had a beat on it?

CHRIS: There's a corollary that I see to pair programming where one of the things that I find so valuable is, what Google query do you type in when you hit that wall? When you're like, oh, this isn't working as I'm thinking, and then you type something and I'm like, whoa, wait, I wouldn't have even thought to ask that question of the internet.

CHARITY: Oh, I love that. That's fantastic.

CHRIS: But now you've productized that, and I love that. So thank you for building that thing in the world.

CHARITY: Excellent.

CHRIS: Shifting gears slightly, one of the other themes that you really pushed for in the world is the idea of continuous deployment and not like yeah, you should ship your code pretty quickly after you merge it, but true, sincere continuous deployment.

CHARITY: 15 minutes or bust.

CHRIS: 15 minutes of bust, test in production. There are some really wonderful if we're being honest, scary themes that you talk about. I love the ideas that you're putting out there, but they're probably the things that I look at, and I'm like, ooh, that seems like a whole thing right there.

CHARITY: It assumes a lot. Let's put it that way. It assumes a lot.

CHRIS: It definitely does that. I desperately want to get to that world. I want to get to the place where there's that confidence. And similarly, there's a theme that you've talked about around Friday deploy freezes and why that's not a good thing. And the empathy for humans that part's good, but maybe we're applying it in the wrong way if we say we're not allowed to deploy code on Friday. Because it's like yeah, deploying code is terrifying and scary. No, let's solve that problem. But I wonder if you can talk a little bit about that. How do you get there? How do you get to the place where continuous deployment is a realistic outcome for you?

CHARITY: Yeah, that's a very good question. There are no easy answers, unfortunately. And the answer is always going to depend on where are you starting from? Are you starting from a clean slate? Are you starting...a lot of the advice that I give sounds like Looney Tunes to someone who's coming from enterprise because they're just like, "You don't understand the constraints that I am operating under." And I'm like, "Yeah, you're right. I'm not of your world. That probably shows." [chuckles] So I think the easiest way, though, is always when you're starting a new project that what you do on day one would be to set up your CI/CD and deploy it to prod before you've even started building. My favorite analogy to that is to like...you know the myth about Alexander the Great and his horse how when he was a little boy he would pick it up every day before he had breakfast? And so, by the time he was an adult, he could pick up his horse because he picked it up every day, and it was never hard.

When you start deploying that way, it's never hard. When you're just like, okay, anytime this gets above 10 minutes, we're going to put in a couple of hours of work, and it's never hard. It's just the easiest thing in the world. And everything's easier because you get to watch what you're doing and in real-time, and you develop that muscle of I’m merging it to main. I'm going to go look at it in a couple of minutes. And you don't feel done in your gut until you've looked at it. And that's doing it on easy mode. And you can do this in a hybrid way. Even if you have like, well, I'm paying for a deploy. Nobody is saying you have to sign up for a long, painful deploy process when you got to spin up a new project. And I've seen it gain momentum. If you start something that's clearly the new way, everybody sees how fast this team is executing. Everybody wants a piece of it. And so you start learning from the way that you are able to do it in your unique environment. You're the best evangelist to the rest of your team members because you know the subtleties. You know the problems. So that's the easy answer is start fresh. [laughs]

CHRIS: [laughs] That makes sense. I do, again, I appreciate the pragmatism or the realism of the way that you approach a lot of the topics.

CHARITY: Another answer, though, it's just that the engineering work involved in taking a deployed pipeline down from hours, days, to 15 minutes it's just engineering work. It is just labor. It can be done. The political problems are the hard ones. I mean, in the past, sometimes our deploy probably would get up to two or three hours, and we were just like, oh God, this is not…put in the work. You just start instrumenting your pipeline, and you start looking at where the tests are taking time. And it will pay dividends every bit of time that you pay down, which is why I really see these long…our own pipelines is it's a vacuum of engineering leadership that they've allowed it to happen because there's nothing fancy about it. You just put in some work.

CHRIS: Yeah, the solvability of the technical challenge feels very true, but what you're saying of it's people problems which again, that's always true of the tech stuff.

CHARITY: It is people problems, but I also hate it when people are just like, oh, it's people problems. That means mysterious and unsolvable. Now, most of the time, when you see this, it's a lack of collective confidence in themselves. They see this as being as just for the elite engineers, or only ex-Googlers are allowed to do this or something. Or they go to conferences, and they hear about it, and they're just like, God, I wish I was allowed to do that, or I wish we could do this.

But the thing is that engineers have more power than they realize. We build these companies. They wouldn't exist if it's not for us. We have all the power if we just choose to use it. I know that a lot of these people who I've talked to that were just like, "Oh, I wish we…" I'm like, "Have you ever lobbied for it?" And they're like, "No, I just know we could, or that's someone else's decision." I'm not going to promise you that you can get whatever you want. But I promise you that if you start speaking up if you start talking to your colleagues and being like, "Wouldn't it be nice?" And they start speaking up...if a quarter of the engineers want something in the company, it gets done. [chuckles]

CHRIS: That definitely feels true. And to the topic of actually lobbying for this and having the hard conversations internally and working on the people problems, you have done, I think, a really fantastic job of providing actual benchmarks in terms of timing and what does this look like as a practice and what are the multitasks?

CHARITY: It's so expensive. It's so costly to organizations. And it's the easy answer for any engineering leader to be like, "Well, we need to hire." That is the laziest answer in the world. You probably don't. You probably just need to fix your CI/CD system and then bask in the resources that you suddenly freed up. [chuckles]

CHRIS: You have a wonderful blog post that really I think does such a good job of highlighting the cost that you're talking about there, the human costs for every slowdown in your deploy process, it has this downstream ramification. And having that as sort of a piece, a bargaining chip in the conversation of here's a voice that is saying a very clear thing about this cost of not doing this work, which granted, it's always trade-offs. Everything is an optimization. But here is a way to actually measure the cost of not going with this approach. And again, I appreciate you're putting that out there in the world so that the rest of us can be like, "Look, on the internet, it says so."

CHARITY: [chuckles] Exactly. I'm happy to be the internet for you. But it's so true because other people in your business don't want you to suffer too, either. They don't want everything to get slow. They just aren't equipped to understand the cost of this slowness the way that engineers are. And I feel like sometimes this is...it's like we're always lamenting like, why does product get to own all the engineering cycles? Where aren't we allowed to do all this other stuff? I promise you're allowed to. You just have to make the case because the case is righteous and justified. But you have to explain to them the cost that it's incurring your organization in terms of your ability to execute and in terms of your ability to hire and retain people. You just have to explain those costs. And engineers are just like, "Well, we only say it once, right?" Well, that's not how you win arguments. You have to say it. You'll probably lose. And you say it again, and you'll probably lose. You say it a third. And you will win eventually because you control all of the creative labor of the technical organization. So just make the fcking case. [chuckles] I don't know. I make it sound simple; it’s not.

CHRIS: I love the sound bite of the cause is righteous, and that is the kernel of the thing here, which is like, just to be clear, this is a virtuous path that you were going down, battle for it, work towards it, absolutely. So I think a related topic here, so continuous deployment is one of those things that you want to get to and a practice that you want to evolve to. But in exploring some of your other work, one of the things that I was exposed to is the DORA metrics, which is not something that I hadn't seen before. But for anyone who's unfamiliar, the DORA metrics is a set of four key metrics to track developer and team productivity, so their deployment frequency, lead time for changes, change failure rate and the time to restore the service. And they are deeply interesting because frankly, I have for a long time felt like developer productivity was not really a quantifiable thing.

CHARITY: It's not, yeah.

CHRIS: Individual developer productivity I still feel like this is a bad thing. Don't do that. But team productivity these metrics actually are like oh, actually, as I look at those, those seem like the good ones. We should do that. I'm wondering, what does that look like in practice when you see that actually employed within an organization? What are the feedback loops, and how does this appear in the world?

CHARITY: Yeah. We all owe a huge debt of gratitude to Jez Humble, Gene Kim, and Nicole, who worked on this for years and got this out into the world, just putting some actual research behind the stories that we were telling ourselves about productivity. And people who haven't read Accelerate...a lot of people are always asking me, do we have any stories? Do you have any research? Do you have any proof or something? I just always point to the book Accelerate. That's where it all comes from. Yeah, it's true because it's such a noisy world. When you're an engineering org, and there's just so much going on, and there's so much stuff that bugs you personally, and some of the stuff that you have true beliefs about. And it's hard to just cut through the noise.

And I feel like that's the great gift of the DORA metrics. If you start focusing on one of them, you will lift your org out of poverty, and the others will get better too. And it provides just this wonderful focus point for teams that aren't sure where they stand or aren't sure how to get better because it can be so mystifying. When you're in the trenches, and you're just like, why does everything feel so hard? Why is it that we thought this would take two days, and here it is two months later, and we can't ship anything? And it feels like the more we ship, the farther behind we get. These are the beacon of hope. It's like, you pay attention to these, your lives will get better. You can dig yourself out of this ditch.

That's certainly been true for the teams that I've been on. And high-performing teams, I think we all have this idea in our heads that high-performing teams are ones where the great engineers join when in fact, those great engineers could join your team, and they wouldn't get any more done than you are. Because most of our productivity is defined not by the data structures and algorithms that you know but by these social-technical systems that we swim in every day, it’s the water around us. It's the friction involved in getting that code to production. If it takes the magical engineer from Google 24 hours to get their code changed out, well, they're not a member of a high-performing team either.

You mentioned earlier all these people are out there who haven't experienced a world like this don't live in a world like this. And in my experience, they often lack a lot of confidence because they don't think they're that good, or they don't think that they can have nice things. And the DORA metrics that's your ticket to a better life. It's like go to college and graduate because it kicks off these virtuous feedback loops, these cascading cycles of things getting better for everyone and people getting more excited and energized. And they just don't get burned out by shipping too much code. They get burned out by not being able to ship code.

And if you're a leader in any type of organization, and I don't just mean manager, I mean any type of senior engineer or manager or whatever, then it's part of your job to pay attention to these metrics, lobby for them, track them, track them on your own if you must, and try to make them better because every engineering team has two customers or two...whatever. I'm blanking on the word. But it's your customers and your engineering team. You're responsible to both of them. And I've never seen one of those sets deliriously happy and the other set miserable. They tend to rise and fall in tandem.

CHRIS: I'm just nodding along for anyone in the audience who can't see what my head's doing. But I love so much all of the things that you're saying and, again, the passion and conviction that you bring to this conversation because these are amorphous, hard to pin down ideas. But I appreciate the North Star that you're setting across all of these different things that as I'm reading, I'm like yeah, that sounds true. I want that. Those things are the things that I want. But interestingly, one of the other threads that I see weaving through a lot of your work is obviously we've talked a bunch about just deeply technical topics thus far, but also a lot of your work spans across to the interpersonal. And frankly, even dividing in that way is not representative of the world because it's a Venn diagram mishmash of some days it's technical, some days it's personal, some days it's both. But one of the things that you've talked about is the engineer manager pendulum which I find super interesting. I think every engineer at some point has that question, that internal oh, do I want to go engineer track or manager track? And this distinct idea or the idea that management is a promotion and any other movement would be different, and you have wonderful things to say about that.

The other thing that you've pointed out is that former managers can often make great engineers after the fact because of the earned empathy that they have now from looking at things from a slightly different angle.

CHARITY: Amazing engineers.

CHRIS: But I'd love to hear a little bit more of your thoughts on that because I think it's such an important space, and I've definitely previously operated under I'm an engineer, and then I guess I got to be a manager, and then I guess I don't know where I go from there, but it's this very linear path. And you shook that worldview of mine, and again, I appreciate that shaking. But yeah, I'd love to hear a little bit more about that.

CHARITY: The best people that I've ever worked with have been engineers who had been managers for a while and then went back to engineering, and it's not just empathy, although there's a lot of that too. It's also a deeper understanding of the business and the reason that we do things. So much of being a powerful engineer is choosing the right work to work on so that you get a lot done very efficiently and quickly, and you don't spend a lot of time just foundering, which you've mastered, and you know the basic technical principles. And how do you get better? A lot of it is just getting better at identifying what to do and what not to do because we have to not do so much more than we can ever do in order to move forward.

I wrote a blog post as a present for a friend of mine who was a director of engineering at the time, and he was suffering. He was just miserable, and he kept thinking about going back to engineering, just kind of dragging his...because he wasn't in an org where that was really celebrated or anything. When you've been there from the beginning, you built the organization; you’re like a senior director and everything. It feels like a long way to fall. And I wrote that post for him. And he did end up going on to be an engineer after that. And he was so much happier. But I think he was surprised at how he didn't fall at all. He actually probably had...I think the engineers had a higher opinion of him afterwards when he was one of them again. And he still had this vaunted voice because he could speak to how the system had been there since the beginning. And he basically got to look around and look out farther than the engineers who were heads down every day and go, "This is going to bite us. I'm going to take a small team. We're going to do this forward-looking security product."

I don't want to identify details, but that for me really just kind of cinched...It was like the more we can strip hierarchy out of these discussions; the healthier everyone's going to be because we're just monkey brains. And the monkey brain in our skull hates losing hierarchy, hates losing power or stance or anything. And I think that the thing that you learn after you've been a manager is a lot of it is just the wizard behind the curtain, the idea that you have more power as a manager. You have more of some types of power, and you have a lot less of other types. And you're just as constrained as the engineers but in other ways. And the path moving forward is not to dominate people or be above them but to combine your powers for good and self-sort to find a place that actually gives you the most joy.

CHRIS: It's a wonderful philosophy. And actually, a thing that you said in there really stuck out to me, which was you wrote that blog post as a gift to someone, and that is such a kind thing to do. And it also, again, reflects what I see in your work overall. You're really clearly leaving a trail of breadcrumbs behind you to help other folks that are traversing a similar path by questioning aspects of it. Or how do we do this well? Why is everyone sad, and why is it bad? And so again, I so appreciate all of that work that you've done.

CHARITY: I think that that comes from my lifetime in the trenches of operations. [chuckles] Ops is notorious for the pain that we bring upon ourselves and try to solve. But I would just like to add a pitch out there for other ops engineers of the world and our colleagues. I was fortunate enough to rise up through the ranks in organizations that really respected operations. We always felt we ruled the roost. We felt like we were way above all the other developers. We got to say what went into production and what didn't. And I feel like ultimately...if you have to have an imbalance of power, I think that's slightly healthier than the developers ruling the roost. Ultimately, there shouldn't necessarily be any imbalance of power. But I just want to pitch it; this whole no-ops thing really got my goat for a while there because operations is just the engineering workaround delivering value to users. I think the second wave of DevOps is now about okay, software engineers; it’s your turn. It's time to learn to write operable software. And so I just wanted to throw in my hat in the ring for all the ops people out there. You're just as good. You're just as good as anyone else. [chuckles]

CHRIS: I mean, it's sort of a theme that I've seen in your writing of everybody's doing good, important work and breaking down hierarchy and just collaboratively moving in the same directions and trying to choose the right North Stars to aim towards. And yeah, it's all fantastic. And so with that, I think we probably reached a perfect spot to wrap up. But Charity, if folks want to keep up with more of your work online, where are the best places to find you?

CHARITY: My blog post is at charity.wtf, and I'm @mipsytipsy on Twitter, and of course Honeycomb.io and our blog.

CHRIS: We will include links to all of that and many of the blog posts, and other podcasts interviews that you've been on, and a bunch of just various things that I collected as I was preparing for this episode because, again, you've produced such a wealth of information on the internet that I want to point as many folks as possible towards those things. But yeah, thank you so much for taking the time.

CHARITY: My pleasure.

CHRIS: The show notes for this episode can be found at bikeshed.fm.

STEPH: This show is produced and edited by Mandy Moore.

CHRIS: If you enjoyed listening, one really easy way to support the show is to leave us a quick rating or even a review on iTunes,; you as it really helps other folks find the show.

STEPH: If you have any feedback for this or any of our other episodes, you can reach us @bikeshed or reach me on Twitter @SViccari.

CHRIS: And I'm @christoomey.

STEPH: Or you can reach us at hosts@bikeshed.fm via email.

CHRIS: Thanks so much for listening to The Bike Shed, and we'll see you next week.

All: Bye.

Announcer: This podcast was brought to you by thoughtbot. thoughtbot is your expert design and development partner. Let's make your product and team a success._

Support The Bike Shed

Episode source