The Bike Shed

314: Communication, Testing, and Accountability

Oct 26 '21

Chris regains several of his developer merit badges and embarks on a perilous CSRF (Cross-Site Request Forgery) adventure. Steph shares highlights from Plucky, a management training course, including ways we can "click" and "break apart" from our current role, and how to have hard conversations.

They also discuss how software development processes change at different team sizes, processes that break down as teams grow, and processes that are resilient at any team size.

This episode is brought to you by ScoutAPM. Give Scout a try for free today and Scout will donate $5 to the open source project of your choice when you deploy.

Transcript:

STEPH: Boom. I'm recording. Magic is happening. [singing] What's this? What's this? It's a Bike Shed episode. What's this? What's this?

CHRIS: You did that on the mic. [laughter] So you just started recording too, so it's not like you're like, "Oh, I forgot I was recording."

STEPH: Oh, I didn't have a finishing line that rhymes with shed.

CHRIS: Head, dead, bread, spread.

STEPH: [singing] Is TDD dead? I don't know. [laughs]

CHRIS: Cool. I liked it.

STEPH: Hello and welcome to another episode of The Bike Shed, a weekly podcast from your friends at thoughtbot about developing great software. I'm Steph Viccari.

CHRIS: And I'm Chris Toomey.

STEPH: And together, we're here to share a bit of what we've learned along the way. Hey, Chris, what's new in your world?

CHRIS: What's new? I had a fun experience over the past week or two of regaining some of my developer merit badges, which is always enjoyable. So one was I had to configure AWS, specifically S3 and IAM, such that I could upload files to an S3 bucket, which seems like one of those things that a developer should be able to do, and it's just not that hard. And, man, I failed so many times, and I stared at the screen. And the ARNs, I think that's another acronym that I had to try and figure out what it means and fight against. Anyway, I got there. So that's one merit badge earned. I really hope [laughs] I correctly and securely configured access to an S3 bucket such that we could upload files in our Rails app. Cool, neat.

Moving on, the next merit badge that I went for was restoring the sea of green dots. Our RSpec output had gathered some noise. There was a whole bunch of noise across a variety of things. There were some dev tools that were dumping some stuff in there. And there was something related to apparition, which is the...I want to say it's the Capybara feature spec driver that we're using now, which sits on top of ChromeDriver or something like that. I don't really understand the details, but it was complaining about something. And I found a fix, and then I fixed it and whatnot. But it was one of those. I did this on a Saturday because I was just like, you know what? This will be cathartic and healing. And then I got to the sea of green dots, and I was so happy to get to it.

STEPH: This is me...I'm giving you a round of applause.

CHRIS: Well, thank you. Arguable whether it delivered any real value to users, but again, this was Saturday effort, so I was allowed to indulge my fastidious caretaker of the code role.

STEPH: Sorry, before we move on to more serious, can we pause to talk about developer merit badges? I really, really want cute felt badges that we can...I mean, I can't design them. I don't have the talent. But I think between us and other folks, we could design amazing merit badges, and then people could collect those. I'm very much in love with that idea.

CHRIS: I love the idea. I am now certain that if we were to really pursue this, that we would fall into the deepest of bike sheds as we try and define, well, what are all the merit badges? And what are the different levels?

STEPH: [laughs]

CHRIS: And how many do you need to collect before you can get to what are the different...There are just so many different taxonomies that we could introduce, and, oh man, I could spend a couple of weeks on that.

STEPH: [laughs] It has a very strong Pokémon vibe too of you got to catch them all.

CHRIS: Absolutely.

STEPH: Okay. All right. We won't digress into bikeshedding merit badges, but I'm still very, very interested in that idea.

CHRIS: Indeed. If anyone out there in the listener space wants to just make these, that would be great. This is the way that I avoid bikeshedding now is I just say I'm not allowed to make these decisions or even think about it. But if these happened into the world, I would be happy about that.

STEPH: Oh, I just remembered we do have something similar at thoughtbot. They're not physical where you can hold them, but I think we've talked about turning them into physical badges. But we have our internal tool hub that we used to track our schedules. And one of the fun Ralphapalooza events that we had, a team came up with the idea of introducing badges in the tool hub, so then you could award people badges. You could give people badges. And it's very cute. So they could probably help us with the taxonomy. They've probably already figured out a number of badges we could get started with.

CHRIS: And of course, this is where my brain went initially to like, oh, what would the taxonomy be? But I think that's how this goes bad. And if we just keep it in the this is cute and fun, and what are all the possible merit badges, but they're all equal, and the points are made up anyway, and then it's just a fun thing, then I'm like, I'm super into this. Let's do that. Have you used a regular expression to parse HTML? Congratulations, you get a merit badge. Have you not used regular expressions to parse HTML? You get a different merit badge. [chuckles]

STEPH: [laughs] I feel very positive that I could be chief of cute and fun. I could manage that department.

CHRIS: Yes, that feels like definitely a role that you could really excel at. But shifting around ever so slightly, I did run into a fun bug this week. And it was a mystery tour of, I'm going to say, sadness and then eventual learning and understanding, and I think we've come to a better place. But I want to tell a story, take us on a quick tour of the adventure that I went through.

So we recently saw a handful of exceptions come through in our exception monitoring service and then piped into Slack, where we see those around CSRF token expiry. So this occasionally happens in a Rails app. The CSRF token that was on the page gets rotated. And therefore, when someone...if they have an older version of the page open and they try and submit a form or something like that, then CSRF protection is going to kick in. And you do get some false positives there or some cases where like, nope, this is actually a fine user, this is not hacking, this is nothing bad. It's just that that user had a tab open or something like that.

I'll be honest; I want to understand better the timeline of expiry and how Rails expires those and whatnot. But it's one of those things; it's deep enough in Rails that I trust that they are doing a very reasonable thing. And I think the failures that we're seeing, that's part of the game. And so, mostly, we wanted to add nicer handling around that. So thankfully, Inertia actually has a really wonderful page in their docs about handling Cross-Site Request Forgery token expiration, this whole thing. This is a particular failure mode that your app might have. And so it's nice to be able to provide a nicer user experience.

And so what we ended up doing is if we catch that exception, we have a rescue_from in our application controller that will instead of having this be a 500 and just a full, like, something went wrong error page, we instead respond in an Inertia-like way to basically show a flash message that says, "This page has expired. Please refresh the page to continue." And if the user just refreshes the page, then they will get a new CSRF token. And from there, everything is going to be fine. So it's not ideal. But it is, I think, both secure and now a nicer user experience.
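As a rough illustration of the rescue_from approach Chris describes, here is a minimal Ruby sketch. It is not the code from his app: the module name ExpiredPageHandling, the handler name, and the "/" fallback location are hypothetical, and it assumes a Rails controller where rescue_from, redirect_back, and ActionController::InvalidAuthenticityToken are available. Only the flash message text comes from the episode.

```ruby
# Hypothetical concern (assumption, not from the episode). In a Rails app it
# could live in app/controllers/concerns/ and be included in
# ApplicationController.
module ExpiredPageHandling
  # Message text as quoted in the episode.
  EXPIRED_PAGE_MESSAGE =
    "This page has expired. Please refresh the page to continue."

  def self.included(controller)
    # rescue_from is Rails' hook for mapping an exception class to a handler.
    controller.rescue_from ActionController::InvalidAuthenticityToken,
                           with: :handle_expired_page
  end

  private

  # Send the user back with a flash alert instead of letting the exception
  # surface as a generic 500 error page. The "/" fallback is an assumption.
  def handle_expired_page
    redirect_back fallback_location: "/", alert: EXPIRED_PAGE_MESSAGE
  end
end
```

In a Rails app the module would be mixed in with `include ExpiredPageHandling` inside ApplicationController. How the alert ends up on screen depends on how the app shares flash data with Inertia, which the episode doesn't go into.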

STEPH: Yeah, that sounds really nice. When they refresh the page, do they lose all that form data? I'm curious how painful of a flow that is for the user.

CHRIS: Currently, yes. Inertia actually has a really nice feature for remembering form data. If you've ever been on GitHub and you're filling in a box, and then you go away to a different tab, and you come back, and it's still there, and you're happy about that, it's that sort of thing. So we could configure that. At this point, we don't have...most of our forms are pretty small. So this is not something that we opted to do proactive management around. But that is definitely something that we could add but not something that's default or anything like that.

STEPH: Cool. Yeah, that makes sense. I was just curious because yeah, either small form doesn't really matter, or also, this may be just a small enough error that only a handful of people are experiencing it that it's also just not that big of a deal.

CHRIS: Yes, this definitely should be an edge case. And we've also recently been working on functionality to log folks out after a period of inactivity, which would also, I think, obviate this in a different way. So all total, this shouldn't be a big deal. And this was basically a quick, little snippet of code that we thought we could just drop in, and everything would be great because it shouldn't happen much.

But then I was testing out a different feature on staging, and everything I tried to do was popping up this little alert flash message that was like, "Hey, your page is expired." And I was like, that seems bad. And then I realized literally every action, any non-GET request, was getting this response that the CSRF token didn't match. And I was like, well, this seems bad. Luckily, it was only on staging and hadn't made it to production.

But it had made it to staging, which meant it had gotten through CI, which was very concerning because we have a pretty robust set of feature specs at this point. We built up a bunch of fakes for all of the external data systems that we're interacting with. And we're really putting the app through its paces and trying to do so in a very production-like way. And so I was like, this is such a deep fundamental breakage. I don't know what's going on here. And so I started to investigate.

And it turns out that in a recent commit, I had started using Axios, which is a little wrapper around the Fetch API. They may not actually use the Fetch API under the hood, but it allows you to have a nicer interface to make XHRs. And we implicitly had that in our package already by virtue of Inertia. Inertia uses it under the hood, but I wanted to make it explicit because now I was using it directly. So I figured that's cool. I will yarn add Axios, and then I will continue on with my day. And I worked on my feature and everything was great. And then I pushed it up into a pull request, and everything was great, and CI passed. And I got it onto staging, and everything was very sad.

So then I started on the adventure of like, what is going on here? It turns out that somewhere between version 0.21.1 of Axios and 0.23.0, which there's a bunch of things about those version numbers that make me uncomfortable but here we are, somehow the behavior where you can configure the XSRF header name, which is what they're calling it on their side, the configuration stopped working. And so our override that says this is what our CSRF or XSRF token should be called when it's sent back up to the server in a header that was getting lost. And so they were falling back to their default name, Axios was. And, therefore, Rails was like, "There's no CSRF token here. So this is going to be a no for me. I'm going to reject all of the requests."

So the fix was relatively easy to roll back and to pin the version of Axios to the previous version that we had been using. I didn't actually intend to upgrade it. I just intended to make it an explicit dependency. But by doing that, I accidentally upgraded it. I don't love that there was this pretty deep breakage in that. I haven't done the good work of trying to open an issue. I still want to scan through and see if there is an open issue or a conversation around this before I start making any noise. But I think if I don't find anything, this is the sort of thing that should be reported because I can't imagine I'm the only one running into this.

Likewise, I was very sad that my test suite did not find this. Turns out in Rails, CSRF protection is just turned off in test mode, which maybe overall makes sense. But for feature specs, in particular, I definitely want to have it. And so, it was nice that I was able to find the relevant configuration. And we introduced an RSpec configuration that says, "If it's a feature spec, save off the existing configuration and enable CSRF. And then after the spec, go back to whatever the previous was."

So now all feature specs run with CSRF. And I did make sure to push up that as a singular change to CI, and CI was very unhappy with me. Many, many feature specs failed, which was good. That was what we were going for. They failed for the right reason because things were fundamentally broken. And then, I was able to update the package-lock or the package.json and the yarn lock, pin the version, fix everything.

But man, there was this period of like, oh man, the app is broken in such a fundamental way. Users just can't do stuff anymore. They can view anything, but they couldn't change any data. And it just snuck through CI. And that feeling is the worst feeling. We had, at this point, built up a lot of trust in our test suite. It was really telling us when stuff was wrong, and if it was green, I felt very good merging. And suddenly, this just really shook me to my core on that front.
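The RSpec setup Chris mentions could look something like the sketch below. It assumes the setting involved is ActionController::Base.allow_forgery_protection, which a generated Rails config/environments/test.rb sets to false. The helper name with_forgery_protection and the file location are hypothetical, and the real configuration in his app may differ.

```ruby
# Runs the block with forgery protection switched on, then restores the
# previous setting even if the block raises. `settings` is any object that
# responds to allow_forgery_protection and allow_forgery_protection=
# (in a Rails app, ActionController::Base).
def with_forgery_protection(settings)
  previous = settings.allow_forgery_protection
  settings.allow_forgery_protection = true
  yield
ensure
  settings.allow_forgery_protection = previous
end

# Hypothetical wiring, e.g. in spec/support/csrf.rb. The guard means this
# file also loads harmlessly outside a Rails + RSpec project.
if defined?(RSpec) && defined?(ActionController::Base)
  RSpec.configure do |config|
    # Only specs tagged as feature specs get CSRF protection enabled.
    config.around(:each, type: :feature) do |example|
      with_forgery_protection(ActionController::Base) { example.run }
    end
  end
end
```

Keeping the save-and-restore logic in a small helper means it can be exercised on its own, and other spec types keep the default test-mode behavior.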

STEPH: I love these journeys that you take us on. I mean, they're painful for you, and I am sorry to hear that. But I love these journeys that you take us on. [chuckles]

CHRIS: I usually only take us on them when I've figured out the answer. And I'm like, all right, here's where we're at. It was rough for a little while, but now we are happy. And thankfully, the one configuration of saying, hey, Rails, also, please include this as part of our production-like configuration for test mode. So I feel better that moving forward, this breakage won't happen again.

STEPH: We should add that as another merit badge for telling a bug story. All right, I'm taking off my hat of chief of fun and cuteness. So this may not be terribly relevant to all the things that you just shared. But I am curious where you mentioned that with Axios because you'd specified the name of the token, and then that overriding behavior is what then broke. And so then that's what led to this whole adventure that you went on. I'm curious, why did y'all customize the name of that token?

CHRIS: A, this is a great question. B, I'm not super sure. C, I think the reason is because we were trying to align to Rails. So we have a little middleware on the Rails side that will serialize the CSRF token into a cookie. And then that cookie value gets read by Axios and sent back up as a header on the request. So this is the way that with Inertia CSRF just kind of works and is good. And it's different than Rails' normal. We put a hidden input into any form. And so Rails holistically knows about both sides of that, and everything works fine. But now I have to manually round trip the CSRF token.

And Axios's default configuration is a header named X-XSRF-TOKEN, and we needed X-CSRF-TOKEN because that's what Rails is looking for. I probably could have configured it the other way on the Rails side. But one way or another, I had to get Rails and Axios to come to an agreement, to meet at a table, and to agree to collectively protect the app. And so I had to mediate that discussion, and that's what ended us here.
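To make the mismatch concrete, here is a small plain-Ruby simulation, not Rails or Axios code. It relies on the Rack convention that a request header shows up in the env hash as HTTP_ plus the upcased header name with dashes replaced by underscores. The helper names are made up for illustration, and the two header names are the ones Chris cites.

```ruby
# Rack exposes a request header to the app under an env key built from the
# header name: "HTTP_" + upcased name, with dashes turned into underscores.
def rack_env_key(header_name)
  "HTTP_" + header_name.upcase.tr("-", "_")
end

# The env key Rails checks for a CSRF token sent as a header.
RAILS_CSRF_ENV_KEY = rack_env_key("X-CSRF-Token")

# Header names as mentioned in the episode.
AXIOS_DEFAULT_HEADER  = "X-XSRF-TOKEN"
RAILS_EXPECTED_HEADER = "X-CSRF-TOKEN"

# Simulates the server side: builds the env for a request carrying the token
# under the given header name and returns what Rails would find (or nil).
def token_seen_by_rails(header_name, token)
  env = { rack_env_key(header_name) => token }
  env[RAILS_CSRF_ENV_KEY]
end

p token_seen_by_rails(AXIOS_DEFAULT_HEADER, "abc123")  # prints nil
p token_seen_by_rails(RAILS_EXPECTED_HEADER, "abc123") # prints "abc123"
```

Once the override is lost, the token travels under a name Rails never looks at, so every non-GET request gets rejected. On the Axios side, the setting Chris is referring to is the xsrfHeaderName option.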

STEPH: A meeting of the minds. [chuckles] Cool, cool, cool. Yeah, that makes sense. I was just curious because then that would have changed the whole journey. But yeah, that is super interesting. And I definitely resonate with the idea of when you've really invested in your test suite, and you trust it that then when it doesn't catch something that obviously breaks the application, then that feels like something worth prioritizing and digging into and then figuring out how to bring back that parity.

I don't know that I've turned on enable CSRF for feature spec. So I'm also very interested in looking at that configuration and considering if I need that for any of my future client projects if that's something that I need to remember for the future because that's very niche but good to know about.

CHRIS: I feel like this only really comes up if you're working in the...let's call it the odd middle ground that Inertia ends up occupying. If you're in a traditional Rails app that is generating HTML server-side, forms are generated. They've got the CSRF token inlined there in a hidden input. And then when you post that form, it's coming back up. The names automatically are going to match. You don't need to worry about it. And it's probably fine to not have it included in test mode.

And if you're at the other end of the spectrum and you've got API interaction, and that's the way you're doing everything, then you have a different auth mechanism, and cookies and whatnot just don't apply in the same way. And so it won't really matter on that side but for a different reason. And it's only because we're in this interesting middle ground, which, again, I really love. And it's the thing that I love about Inertia. But this is a rare case where it's like, oh, we do have to bring the two sides to meet in the middle. And this is a case where, unfortunately, due to a very subtle breakage on a minor release of...a package that we're using silently broke so, yeah.

But yeah, thankfully, everything is back to working. And again, we've been able to enhance the test suite in that little way that I feel confident again because this won't sneak in another time. We have coverage around this. We're good to go. So while I was very scared when this initially happened, I feel better now. I'm happy to go into the weekend feeling better about this. But that's my story. What's new in your world?

STEPH: So I feel like I've been having one of those weeks where I have less code adventures. In fact, it's one of those days where I went to thoughtbot's daily sync...because we often have our client daily syncs, but then we still have a thoughtbot sync as well. And I went to the group, and I was like, I get to write code today. It's going to be a great day. All the other things I'm doing are also interesting, but I get particularly excited when I get some maker's time and get to write some code.

So I feel like I've had less coding adventures recently and more hiring and process-related adventures. And specifically, I just completed the Plucky Manager Training, which is a program that's founded and led by Jen Dary, who was recently on thoughtbot's podcast, Giant Robots Smashing Into Other Giant Robots. I'll be sure to include a link in the show notes for anyone that's interested.

CHRIS: I believe this was the third time she was on. It's at least the second, possibly the third. And all of them are great listens, just as an aside, so we should include links to all of them.

STEPH: Yes, I think she's one of the rare guests that has been on the show three times. And I think I've only listened to the first couple minutes of that episode. But I think they talk about the fact that this is her third episode, which is really, really cool. And I'm still frankly synthesizing all the information and the ideas that I've collected from the course.

But I do have a few quick takes that I'm interested in sharing with you. So the first one is my cohort...we were the Panda Cohort, so go, Pandas. And some of the things that we talked about were...and I think that this may have been the first day. So it was three days, and it was three hours for those three days. And they're spread out over a couple of weeks, which is really nice because then you show up for those three hours of the class, but then you leave with some ideas and some things to experiment with. You get a week to then try out an experiment and then come back to class next time and talk about this is how it went; it went wonderful, or it went terrible. And you get to share that with others and work through it.

And in the first class, we talked about coaching versus managing, which I found just a helpful definition to review. So managing is more direct, and telling someone what to do while coaching is encouraging someone to determine their own path and find their own solution. And I find that as a team lead at thoughtbot, I'm very often more in that coaching space than I am in that managing space. I think it's frankly pretty rare that I actually need to put on a manager's hat. And I often feel like I'm wearing my coaching hat instead.

And some of the other things we talked about, one of them is, what is work? Which is a fun question to ask. And Jen had an analogy for this speaking about imagine that you have a plastic Easter egg. So it's got two sides, and side one is all the skills and desires and things that you're fulfilled by. And side two is a company that needs those skills. And it's great when those line up and click together, like when you take a job or get a promotion. Have you ever played...do you know what I'm talking about? Those little plastic Easter eggs. Have you ever played with those as a kid?

CHRIS: Yes, certainly.

STEPH: [laughs] I realize I just launched into that analogy. [chuckles] And then Jen goes on to say that it's totally normal for those sides to then unclick. So maybe the company changes direction, the company is acquired. You've fallen out of love with something that you do about your job, or you have kids, and that has changed the things that you are fulfilled by and what you're looking for. And that's not necessarily bad. So it can be like, hey, you are working on x now, and you're not fulfilled by that anymore. But then another company comes along and says, "Hey, we're working on this, and you are fulfilled by that." So then another click happens.

And essentially, it's a nice analogy to represent someone's career path and the ways that we are going to shift and re-prioritize what we're interested in. But it's also a really nice way to help it feel less personal because both sides are allowed to change. The company can change. You, as an employee, can change. And then you can look for that next click that is going to match up with a company that meets your skills and things that help you feel fulfilled.

One of the other topics that we talked about is hard conversations, which I love that we dug into this one because that's certainly one that I struggle with or...I mean, we all get that feeling if you have to confront someone, if you have to have that uncomfortable discussion with someone. It is a very hard thing to do. And so we had some very honest conversations around what is a hard conversation? What does that represent? And essentially, they represent that there is stalled progress and something can be improved.

So Jen likens a hard conversation to a tool. It's something that you can use to then help something move forward again if something feels stalled or if there's something that needs to change. And during those hard conversations, you may not get to the resolution that you're looking for. So you may be looking for a specific outcome. But you also have another person that needs time to respond and to take in everything that you have said and process that information.

So when you have a hard conversation, you may actually only move forward an inch. So if you had a lofty goal of we're going to talk and then we're going to have this hard conversation, and we're going to get to this space...But instead, you actually just make incremental progress. Like, okay, at least this person is now aware of this concern. That might be your win for the hard conversation versus actually tackling; how are we going to address it? I just want them to be aware of this concern.

And it's a very vulnerable conversation, and they often take time before you can get to that ideal resolution. But essentially, the idea is get in the game, start the conversation, and then have follow-up conversations for that hard conversation. And I really appreciated that framing because I often will think of hard conversations of oh, we have to have this hard conversation and get to this specific outcome. But if you shift the goal line to be like, no, I really just need to at least make this person aware of a concern, that makes it a lot more approachable. And then also probably yields more fruitful outcomes because that gives the other person time to think about what you've shared to also come to the table with their own ideas and then work together to then get to that ideal resolution.

CHRIS: I like that framing a lot. I can definitely see the case where you, as someone who has recognized something that needs to change (perhaps you're a manager), you've now thought about that a good bit; you've observed it, but for the individual that you're bringing that to, this may be novel. This may be a surprise for them. And so if you come into that interaction both about to share this information but then also trying to resolve it and trying to get to I need you to internalize it, and I need you to fundamentally change your behavior as a result of this conversation we're going to have, that's quite possibly not a realistic outcome. And if you're trying for that, it might inherently lead to just a bad outcome because that individual is not in a position to do that. But they are potentially ready to hear it. And so you can just achieve step one and then later have step two. So I like that a lot.

STEPH: Yeah, in general, I found the course incredibly helpful, very insightful. It was also really nice to hear from other managers that are facing similar problems or perhaps novel problems and then getting to weigh in and help each other. So it's a wonderful course. I'll be sure to include a link in the show notes for anyone that is interested. And I'll probably come back with some more insights from the class because it's really...we just wrapped up. So I'm sure I still have some ideas that will percolate over time, and I want to come back and share those with the group.

Mid-roll Ad

And now a quick break to hear from today's sponsor, Scout APM.

Scout APM is leading-edge application performance monitoring that's designed to help Rails developers quickly find and fix performance issues without having to deal with the headache or overhead of enterprise platform feature bloat. With a developer-centric UI and tracing logic that ties bottlenecks to source code, you can quickly pinpoint and resolve those performance abnormalities like N+1 queries, slow database queries, memory bloat, and much more.

Scout's real-time alerting and weekly digest emails let you rest easy knowing Scout's on watch and resolving performance issues before your customers ever see them. Scout has also launched its new error monitoring feature add-on for Python applications. Now you can connect your error reporting and application monitoring data on one platform.

See for yourself why developers call Scout their best friend and try our error monitoring and APM free for 14 days; no credit card needed. And as an added bonus for Bike Shed listeners, Scout will donate $5 to the open-source project of your choice when you deploy. Learn more at scoutapm.com/bikeshed. That's scoutapm.com/bikeshed.

STEPH: Pivoting just a bit, we have a listener question that I'm excited to dive into. This question comes from the one and only, the Edward Loveall, fellow thoughtboter. And Edward wrote in, "How does the process of software development change at different team sizes? What's a process that breaks down soon after the team starts growing? What's a process that is resilient at all sizes? And by process, I mean anything that involves other people including organizing tasks, code review, deployment, or anything else that isn't you alone writing code in a vacuum."

I'm really excited about this question because I think there's a lot here. And there's actually one part that I'm struggling with a bit, so I'm curious to see what you think, Chris, about it. But I'm going to start off with saying that I think there are a number of management processes that definitely break down as a team grows. But in the spirit of Edward's question, I'm going to focus more on the software development process and how those might need to change and what starts to break as your team grows.

So starting off with processes that break after the team starts growing, this one, frankly, what really starts to break is not a process specifically, but it's the lack of process that really starts to become visible and painful. So, how do we track work? Before, maybe the product manager or someone would just send you a message and say, "Hey, can you work on this?" or "Hey, can you fix this thing?" And how does code need to be reviewed before being merged? Does it need to be reviewed? Are people just merging as they get stuff done? How are deploys performed? Oh, we have a super urgent production fix that needs to go out, and the only person that knows how to deploy is out sick today? Cool. That's the type of process that I think that really breaks down, or at least you start to notice when the team starts to grow. What are your thoughts?

CHRIS: I definitely feel that first one very strongly. We're feeling it right now on the team, which is still very small. There are only three developers working on the project, and then we have a product manager. And each week, we're slowly iterating, and tweaking, and honing, and trying to introduce just enough process in terms of how we define the work to be done, communicate the status of it, all of that fun stuff.

We started with Trello. And we just had a board with some columns, and then we had more columns, and then we got rid of a few of them. And then we recently added a Power-Up to the Trello board, which allows for epics. So there are cards which are epics which tie to sub cards. And I'm staring at it, and I'm like, how long until we're Jira? How long can I hold out here and not be Jira?

But it does feel like we're slowly iterating towards a more useful process for this team rather than process for process' sake, which I feel like is a really useful distinction. There's also a question of like, what can be known or what can be adequately measured and whatnot versus what can't be? So we've talked many a time on the show about estimation and velocity and trying to track that and the pitfalls inherent with that. And so there's, in my mind, two different camps. There's the process we want to avoid. And again, to reference German Velasco's wonderful blog post, Say No To More Process.

And I really feel like there is a tendency often when things go wrong to then try and paper over that with process. Oh, this team didn't use the design system. So we need to write ESLint rules to make sure you can't import from the directories that aren't the thing. And it's like, we can do that, and I've definitely done that. And I will do that again in the future. But I always have the lens of do we need this? Is it worth the trade-off, the cost, the overhead, the complexity that it's bringing in?

But definitely, organizing and communicating tasks is one of the ones that becomes really difficult. The more people that are working on something, the more you need probably more than one person staying out in front of them and trying to define the next bit of work that needs to be done after that.

Code review feels like it probably should stay similar, with the exception that I lose the ability to review all code at some point. Right now, I'm trying to review every single PR that goes through or close to it. At some point, I'm just going to have to give up on that. But for now, that's my goal. But fundamentally, code review, I think, will hopefully take the same shape.

Deployment, similarly, like, I've talked about the merge queue thing. I want to get a little bit of process in there but not too much. There is definitely some necessity for change. But I definitely want to resist the urge to change everything and to just say, like, slowly over time, we’re going to have to be a big Byzantine organization with lots of rules and standard operating procedures and all of that.

I've heard anecdotally, and I don't know if this is true, so maybe someone out there on the internet can correct me if I'm wrong, but my understanding is that at Google, they’re pretty tight in terms of what languages and frameworks can be used and what processes, and workflows, and build tools and all of that whereas Facebook, as a counterpoint, is relatively lax. Obviously, React is used very heavily on the core web application. But there's some flexibility in terms of different languages and frameworks and things for sub-projects or small individual teams having a little bit more autonomy. And I think that's a really interesting thing of are you one large, cohesive, organized company or do you try to act like a bunch of small disparate but roughly connected teams that share good ideas but can work independently? And that changes how I would think about this question.

STEPH: I really like how you're describing the addition of process. It sounds like a just-in-time process. So as you're learning that something needs to be added, then that's when you look for answers. And then you sprinkle on a bit of process that everyone agrees feels very helpful, with also the right to review and see if that still makes sense for the team.

There's one additional area where I think the lack of process really shines through in addition to the number of ways that you've mentioned is also onboarding. So if you have a very small team and you are onboarding, it's likely that...Chris, you can let me know if I'm wrong, but when someone's joining the team, there's probably a good chance that they get to pair with you at some point, or they even get welcomed by you to the team. And then, they get an overview of the product and the codebase. And there's probably this really nice session where they get to ask you questions, and then they have that onboarding session. Does that sound about right?

CHRIS: Yes. But I would go so far as to say it's not just a day or a session, but it's probably a couple of days. So yes, and.

STEPH: That's even better. And with some of the smaller teams that I've seen, that onboarding process is where they are pairing with that lead person on the team. And that's going well until suddenly that lead person can't pair with everybody. And nobody has really thought about how to streamline that onboarding or how to coach or teach someone else to be a really good onboarding pair.

And I have strong feelings about this area because we often focus so much on hiring, but then we drop the ball when it comes to onboarding that new, wonderful colleague that we've worked so hard to recruit. And at the end of that day, someone's going to reach out to them and say, "Hey, how was your first day?" And it makes a big difference for that person's retention as to how those first couple of days go.

So I think onboarding is another really important part that when you're a smaller team, you probably don't need much process because you have more of that personable onboarding experience. But as the team grows, there needs to be more of a process to help other teammates join the team.

CHRIS: It's interesting. I think I totally agree with you that over time, there is a necessity to be more intentional and to have a little bit more structure in the process. And I don't think you're saying this, but I just want to make sure we are saying the thing that I think we believe, which is that shouldn't replace the human that helps you onboard.

Like, I still like the idea that everybody gets a pair for some amount of time when they start at a new company. And you're working together on a feature, or you're working together on bug fixes. You're shipping to production as soon as possible. But you're not doing that based on some guides in a wiki. You're doing that with another human that's helping you. There should also be guides, and a wiki, and documentation, and formalization as the organization grows but not in place of having another person that you get to talk to.

STEPH: We're just going to send you a little yellow rubber duck and then with a little Post-It note that says, "Good luck [laughs] with your onboarding process." Definitely. I agree with everything you said. It does not replace that human element where there's someone that's helping you onboard. I just see that onboarding is one of those things that gets forgotten, or we often point someone to a README which I do think is great because then it is battle-testing our README. But then there still needs to be someone that is readily there to say, "Hey, how's it going? What are you struggling with? Can I pair with you?" There still has to be that human element that is helping guide you through the process.

And I think smaller teams may forget that they actually need to assign somebody to you to make sure that you have someone that you know. Like, hey, this is who I can reach out to with all my questions. Because they're probably not going to be comfortable posting in the company channel at that point or a larger communication to say, "Hey, I'm stuck on something."

CHRIS: There's one other area that comes to mind, or I guess it's more of an anecdote that I have heard, but it speaks back to GitHub's early, early days. And they were somewhat famous for being very flat in terms of the organization and very self-organized, and everybody's figuring it out, and you're working on the thing that's most important in your mind. And for a long time, this was a celebrated facet of the company and a thing that they talked about rather publicly.

And then I think there was this collective recognition, and maybe they reached a tipping point where that just didn't work anymore. Or maybe it actually hadn't been working for a bit, and there was just the collective realization of that. But it was interesting to watch from the outside as GitHub added more formalization, more structure, more managers, and hierarchy, and career ladders, and things of that nature. And I think there's a way to do all of those things in a complicated, overloaded, heavy way.

But I think a different version of it is...like, you were using the word coaching earlier. Having formal structures within your organization to encourage people on their career path, to help them grow, to have structure around that, I think is a really difficult thing to get right. But I think it is critical, and I think just not having it can't be the answer past a certain probably pretty small size. So that is an interesting one where I think you do need to introduce some process and formalization around how you think about the group of people and how they work together within your organization.

STEPH: I agree. I think where some folks may see a lack of hierarchy, others feel a lack of support. And adding levels of management should really be focused on the outcome, which is that we're helping people feel supported. So even getting feedback as you're adding those different levels of management, like, hey, did we make your life better? Did we make your life worse? I think that's a great question for management to ask as they're exploring a less flat structure.

CHRIS: So, Steph, I have a question for you now on a variant of this topic. In general, we seem to be fans of having a codebase. Probably a Rails app that’s got a database behind it, and that's where you put the data. Everybody commits to that same repository. It's all kind of one collected thing. And often, organizations grow to a certain size, and they're like, this is untenable. We cannot have this many people working on this same codebase. So we shall do the logical thing, which is we will break it up into small pieces. And those pieces will communicate over HTTP, and it will be great because then our teams can be separate from each other and can manage their little piece of the world. What do you think about that? Is there truth there? Is it not true at all? What do you think?

STEPH: All right, so your team is getting too big, and to the point that you feel like you need to split it out so then you can have small teams, and they can all work independently on different parts and services of the codebase. I don't love the idea. I'm trying to think through because I feel like there's a lot of nuance here. But I don't love the idea that that's the driving force as to why are we making the change?

And that is often a question that comes to mind whenever we are making a big change, either architecture or process-related is like, what's driving this? And then how are we going to measure it? And if we are driving it just because we have a large team, let's talk more. Why are people blocked? Why can't people work together? What's preventing people from being able to contribute to the same codebase? Are people blocked for a long time because they're having to wait on someone else to complete that work? I have a lot of questions that I don't know if I can fully answer your question. But my instinct is to say let's not break up the architecture just because our team grew in size.

CHRIS: Yeah, I think I definitely agree with that. There's probably a breaking point where it's just too many individuals, and there'll be too much contention. But I think resisting that or at least naming that as like, okay, that's what we're saying but is that really what's true? Or are we actually feeling that this system is so deeply coupled that there's no way to change some small piece of the code without impacting other parts of it?

Like, is the CSS completely untenable because we're just using global class names, and it's leaking everywhere? Okay, do we need a different solution there? And then it's actually fine. We don't need to have different services that have their own different style sheets. We just need a different approach to CSS. That's a particularly easy one to go for because there's inherently a global namespace there. But the same thing is true in a lot of different contexts. So services are a way to break things apart and enforce those boundaries. But if inherently coupling is your problem, then you're just going to be coupled over HTTP, and I think it's going to be difficult.

There's a wonderful blog post by Josh Clayton, which I think does a better job than I'm doing in this moment of highlighting some of the questions I would want to ask. The blog post is titled Services are Not a Silver Bullet. And so Josh goes through and enumerates a bunch of the different versions of the story that he's heard throughout the years of well, we need to go to services because x, because our test suite is slow because pull requests are constantly having merge conflicts and whatnot, because the code is very deeply coupled and any change here affects everything else. And a fix over here broke something over there. This is no good. And so he does a really good job of presenting alternatives or at least questions that you can ask to say, like, is this the problem, or is this a symptom? And we need to address the more underlying cause.

And so I think there is a point where you just can't have 1,000 people trying to commit to the same Rails codebase. That feels like it's maybe too big. But it takes a while to get to 1,000 people. And there will be times where extracting a service makes sense or integrating with an external service that exists. Like, I've talked about Stripe before as my canonical like, yeah, it's actually deeply intertwined with the data model, but they're just dealing with such a distinct complexity set over there. And they have such expertise on that that I'm happy to accept the overhead of the fact that that service lives outside of my core application, and I need to deal with synchronizing state and all of that. I will take on that complexity, but it's not worth it for everything, and it's not a silver bullet. Again, to reference the name of Josh’s blog post there, Services are Not a Silver Bullet.

And so, coming back to Edward's original question, I would say that having a monolithic codebase works for a really long time, but there is probably a breaking point somewhere well along, but fight it for as long as you can. I think.

STEPH: I really like how you touched on coupling because it really helps ask those questions to get to the heart of what are the pain points that you are feeling? And it is less of a decision that is based on people and process but more if you're going to split out a portion of your architecture. It is in response to an actual business need and a business value versus some other pain points that you're trying to fix.

A particular example might be like maybe you have a portion of your application that really just needs to spend a lot of time crunching data. And it's really not as specific to your application; it's something that can happen on its own. And then it's beneficial to move that outside so it can scale in relation to the work that it needs to perform versus keeping it in-house with the application.

I do want to circle back to another question that Edward included which is what's a process that is resilient at all sizes? And the ones that really come to mind for me...and these are a bit amorphous intentionally because it will look different for each company. But three areas that are very resilient at all sizes, whether you are 1 to 2 employees versus you've got hundreds or thousands it's communication, testing, and accountability.

So communication, where are we headed, and how do we know what we're working on? For testing, it's how do we test our changes? Do we write tests? Do we use QA? Do we have a staging environment? What does that look like? What's our parity between staging and production? And then how do we know what's in progress, and how do we know when it's done? Those are three core areas that, regardless of your team size, I think are very crucial to the team's success. What do you think? What are some of the processes that are resilient at all sizes?

CHRIS: I actually really like the list that you just provided. That is a wonderful trifecta, and I think it will take you very far, so probably not much to add from me. But I guess on that note, should we wrap up?

STEPH: Let's wrap up.

CHRIS: The show notes for this episode can be found at bikeshed.fm.

STEPH: This show is produced and edited by Mandy Moore.

CHRIS: If you enjoyed listening, one really easy way to support the show is to leave us a quick rating or even a review in iTunes, as it really helps other folks find the show.

STEPH: If you have any feedback for this or any of our other episodes, you can reach us at @_bikeshed or reach me on Twitter @SViccari.

CHRIS: And I'm @christoomey.

STEPH: Or you can reach us at hosts@bikeshed.fm via email.

CHRIS: Thanks so much for listening to The Bike Shed, and we'll see you next week.

All: Byeeeeeeeeeee!

Announcer: This podcast was brought to you by thoughtbot. thoughtbot is your expert design and development partner. Let's make your product and team a success.
