Serverless Chats

Episode #13: Managing a Serverless Engineering Team with Efi Merdler-Kravitz

About Efi Merdler-Kravitz

Efi is a software expert and currently the R&D director at Lumigo. Over the last 12 years, he has been working as a developer, team leader, group manager and director in the healthcare, mobile, security and agriculture industries. Recently, Efi has been working on developing serverless applications and building tools to make serverless easier.

Transcript

Jeremy: Hi, everyone. I'm Jeremy Daly, and you're listening to Serverless Chats. This week, I'm chatting with Efi Merdler-Kravitz. Hey, Efi. Thanks for joining me.

Efi: Hey Jeremy. Thanks for having me.

Jeremy: So you are the R&D director at Lumigo. So why don't you tell the listeners a little bit about yourself and background and then what Lumigo is up to?

Efi: So as you said, I'm leading the R&D of Lumigo and I've been working on pure serverless applications for the last 2.5-3 years. And on a personal note, I think it's the best technology decision that I ever made. So a couple of words about Lumigo. Lumigo is a SaaS platform for serverless monitoring and troubleshooting. Basically, Lumigo connects to your AWS accounts and alerts when things go wrong and then tells you the entire story of the requests that led to that issue so you can quickly get to the root cause. Let me elaborate a little bit. When you break your application into small pieces following the microservice architecture, it becomes very hard to debug your application, both in production and in your local dev environment, and it becomes especially hard when using async components like SNS, SQS, Kinesis, etc. And as people who worked with serverless extensively before, we understood the challenge here at Lumigo, and therefore we developed the platform to help developers like us to understand the environment quickly when something goes wrong.

Jeremy: Awesome. Alright, so we've had a couple of shows so far where we've talked about observability, and we've kind of gotten into all of that sort of stuff that Lumigo, I think, as a product does, which is really interesting. But I actually want to talk to you about this idea of managing a serverless engineering team, because one thing that's kind of unique, I think, about Lumigo is that even other companies that are working on serverless products are not entirely serverless themselves. And Lumigo is. You pretty much manage an entire serverless engineering team, right?

Efi: Yeah, we are 100% serverless from deployment, packaging, monitoring. Everything is serverless. We don't use any physical or virtual servers in our backend.

Jeremy: Awesome. That's so cool. So alright. So you’re a manager. You've been doing this for a very long time. You've been managing engineering teams. And so I really want to get into this idea of what’s sort of different about managing a serverless engineering team versus managing a traditional engineering team. And I know maybe some people are thinking, “Well, you know what's the difference?” But I think there is. And I think you think there are some differences. So maybe we start first by what we have to do to sort of move our team to serverless. So if we've got, for an established organization, you know, it is great to be a greenfield startup and be exploring new things. But most companies are not. Most companies are established companies with legacy systems and so forth. So how do we go from you know this idea of taking a team that's used to working with all these different services and EC2 and containers or whatever, and moving them to something that's a lot more serverless. So maybe we start with that. What's the first step to getting people to move teams to serverless?

Efi: Great question, Jeremy. So I think that the best advice for any new beginning, especially in the technology world, is to start small. Try to taste the technology before jumping in headfirst. So what does it mean in our case? First of all, I think you should ignore buzzwords. You hear a lot about new cool services. For example, AWS has dozens of ways to save data, where you have the ability to run machine learning on it. Try to use simple, trusted, and well-documented services. I think services like API Gateway, DynamoDB, S3, and, of course, Lambda. These are the services that are the building blocks of any serverless application in AWS, and there's a good chance that you'll use at least one of them in the final solution. Try to use the simple version of services. What do I mean by simple? For example, in AWS, you have six or seven services that provide queuing capabilities. There's a good chance that choosing SQS, at least in the beginning, is good enough. Avoid more complicated services like Kinesis. And you know, in the end, as they say, no one was ever fired for choosing SQS. So I think it's a good choice. And read and learn. Many people think that serverless is identical to previous technologies that they used. And sometimes they forget that serverless is not only a new technology but also a different way to approach development. There are many great blog posts and newsletters, and, of course, for specific AWS services, read the AWS documentation. It is a great source of information. Choose a good framework that will help you with the transition. There are many good ones like AWS SAM, the Serverless Framework, Chalice, or Zappa if you work with Python, and don't forget other practices that you used in the past. If you used Node and Express in the past, then AWS released, I think a couple of years ago, a framework called AWS Serverless Express to help you in the transition.
So if you are familiar with Node and Express, it will help you in moving to the serverless world. Similarly, in the Python world, if you use Django or Flask, then you can use Zappa, which I think is a great choice for the transition. So you don't need to change your methodologies, at least in the beginning. Eventually, I think, these frameworks, for example the framework AWS provides and the framework that Zappa provides, don't give you the code that should be running in a serverless environment. But at least in the beginning, they take a lot of overhead off your hands.
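
As an illustration of the "start small" advice, a first function behind API Gateway can be only a few lines of Python. This is a minimal sketch, not anything from Lumigo's code base: the function name `handler`, the `name` query parameter, and the greeting are assumptions made up for the example. The return value follows the shape API Gateway expects from a Lambda proxy integration (a status code, optional headers, and a string body).

```python
import json


def handler(event, context):
    """Entry point that Lambda would invoke for an API Gateway proxy request."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # hypothetical parameter for the example
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Because the handler is a plain function that takes a dictionary, it can be called directly from a test or a Python shell before it is ever deployed.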

Jeremy: Right? And so what about the transition? I mean, do you just transition the whole team right away? Or do you pick like a point person to do something like that?

Efi: I think that especially with serverless, which is something that is very new, you should lead the way as a manager. As I said before, serverless is not only technology. It also shapes the way you develop software. Therefore, you as the leader have the responsibility for how your software team works and delivers. You need to master the tools of the trade. So before giving it to anyone in your team, make sure you sit down and learn it on your own. Again, as I said earlier, read the blogs, do the tutorials, read as much information as possible before moving the entire team or transitioning the entire team.

Jeremy: I think also, it's one of those benefits of being an engineering manager where you get to try out some of that cool stuff first, before you let anybody else use it.

Efi: Exactly, exactly. You know, most of the time you do the boring stuff, and now you have a chance to start something new.

Jeremy: Exactly. So how do you then introduce it to the team? So if you as a manager are going through and learning some of the basics here, I mean, obviously you can lead the way, but this is why we have teams, right? Because teams can really go deep and start implementing those things which you might not be able to do with your busy meeting schedule, right, as an engineering manager. So what's the best way? How do we introduce that to your team once you sort of feel comfortable with it?

Efi: I think that the best thing about serverless is that it’s a cool technology, it’s a new technology, and I think the moment you introduce it that way to your team, it will make your life a lot easier. So make sure it looks cool. Show that it saves time in deployment. Show that it's very easy to configure new components and show them that you can easily and quickly deliver new code. And in the end, you know, developers hate configuration, and the moment you save them the hassle of configuring new Docker containers and configuring new services, all they have to do is just write the code and let it run magically. The moment they see it, you know, they get hooked immediately.

Jeremy: How do you impart that knowledge to your team? Do you do formal training or is this something that you just encourage them to explore on their own, for example?

Efi: Yeah, it's a good question, and I think it's a general question. I don't think it's related only to serverless. So what's the best way to learn new technology? In the end, it's a matter of personal taste, and I think each manager and developer chooses their own path. My own personal taste is to do it together, learning together. First of all, I think it's great for bonding. You know, developers usually work alone, in their own environment on their own laptop, putting headphones on, listening to music, and most of the time they don't interact with each other. So it's a good way to bond, you know, to talk with each other. It makes it very easy to ask questions. And best of all, I don't think you're interfering with each other, because you're learning together. So it's not like you are learning right now and then you ask someone who is in the middle of his own task a question, and you need to stop him from what he is doing. There are two good resources that I recommend for learning. First of all, there is the official AWS tutorial on serverless. I can share the link at the end of the recording. And I think there's a very good tutorial by serverless-stack.com, a tutorial on how to use serverless code-wise and how to use the various tools. And each of my developers who is new to the team, the day they arrive, must go over these two tutorials before they actually start to code for the first time.

Jeremy: So when you do that, it sounds like it's sort of a combination of both, you know, that you would let them sort of go through some of these tutorials. But is that something you do kind of get everybody together and do like a formal training though?

Efi: Yeah, and again, I think it depends on whether a new developer arrives or you as a team begin to learn serverless. So I think when you, as a whole team, begin to learn serverless, then it's better to make it formal, sit together and learn together. But after the team has gathered enough information, enough knowledge, and now a new developer joins the team, then you won’t gather the entire team again. And then you know you have a set of links that each developer can go over alone.

Jeremy: Alright? And in terms of especially like with the new developer coming on, or just I guess, the team learning in general you know, sharing information between the team. Like how do you recommend doing that? I mean, do you still put stuff in Wikis? Or should you be using something more real-time?

Efi: Actually, that’s a good question. By the way, I just want to add that a lot of people forget that many developers jump from on-premise development to serverless development, so there's also another gap that they need to learn or to jump over. This is not only serverless, but also cloud computing. So for many of the developers who have joined my team, one of the first things that they do is also play with the AWS environment on their own. Create new resources, you know, create an EC2 instance. Many of the developers see it for the first time. So remember that. Jumping straight to serverless is sometimes not the best way. Make sure to move along a path that will allow developers to easily transition to cloud computing. Now, regarding your question, I think that sharing is very important, and always share. You know, we here at Lumigo use Slack, or in your case any other tool that you prefer, and we have a weekly meeting where all the developers gather together, and we define the agenda before the meeting. And each developer shows new serverless material that he’s learned this week. There are a lot of new things to learn in serverless on a weekly basis. There's a very low chance that we'll repeat the knowledge that you’ve learned from previous weeks.

Jeremy: Right, so you wouldn't repeat a lot of that stuff. But what about sort of capturing that? Do you do internal documentation for that? Like using a Wiki or something?

Efi: No, we only have Wiki links to public material on AWS or to any other good tutorials that we find. But we don't have any Wiki on, you know, the new stuff that a developer learned last week, mainly because serverless is such a dynamic field, a dynamic technology. So anything you write in a Wiki will become stale in a matter of a couple of months.

Jeremy: Are you telling me that my Wiki page on how to install SQL Server 2000 is out of date?

Efi: Yeah, unfortunately.

Jeremy: So what about best practices, then? I mean, it's kind of hard with serverless because you throw the term “best practices” out there, and I like to think about it to say I'm not sure if there are best practices, just anti-patterns right now. Like things we try to avoid. You know, so codifying those, I think, is somewhat important, at least between the team, so everybody's doing things the same way. So how do you do that? Do you establish best practices for a team?

Efi: Yeah, it's a great question, and I think you can also divide it into best practices in actual code, like what's the best practice in using DynamoDB or S3, and best practices for how to behave as a team, as a serverless team. So I think serverless promotes an end-to-end process, which means that developers take full responsibility from the design phase to the production phase. So, for example, in Lumigo, developers are responsible for the product. You know, we are eating our own dog food and we know what bothers us the most, so we're the best candidates to develop our own platform. And developers are responsible for the quality, making sure everything works as expected, and we're putting a lot of emphasis on automated testing in the deployment. One of the benefits of serverless, I think, is agility, but you need a set of tools to help you get the most from serverless. Just using DynamoDB or just using Lambda is not enough. You need all the tools behind the scenes that will help you make the most out of it. And I think that developers, in the end, are also responsible for production and making sure that what they develop works as expected and is being used by our customers.

Jeremy: Right, yeah. And I actually really like that philosophy, too, where I feel like when you’re a developer, or when you were a developer in the past, you'd write a snippet of code, you know, maybe you would write the test for it. Maybe there's somebody else writing a test for it, but then it would go to QA. Someone would test it. You throw it over the wall, some Ops person maybe puts it into production, and then maybe at some point it comes back to you. But I like that full ownership of it because one, you can see things in production very quickly. And as long as you follow good practices for security and things like that, I think it's a really, really good way to keep developers motivated too, to sort of see and own, you know, that entire process. So I do like that. Alright, so let's say we've done this now. We've introduced our team to it. We’ve got a good knowledge base going. We’re sharing information, you know, we're writing serverless applications. So now how do we run this team on a regular day-to-day basis? And maybe let's start, actually, with growing the team, because this is sort of one of those funny things you see where someone's looking for a serverless developer with 10 years of experience, which is kind of hard because serverless hasn't been around that long. So where do we start? Who do we look for when hiring for serverless teams?

Efi: Yeah, yeah, it's a good question, and as you know, as a developer, if you see someone post a job with 10 years of experience as a requirement, that means it is not serious. So that's a great question. And I think it's very hard to find someone with a lot of serverless experience, and it really resembles the early days of mobile app development. Now, I believe that in the future a lot of developers will have this experience, but right now they don't. But I think that in the end, a good developer is a good developer. It doesn't matter what technology they use. But I think that experience in the technologies and the methodologies that enable us to use serverless is a good plus. So I think things like experience in the cloud that you're using, whether it’s AWS or Azure or Google Cloud. I think experience using serverless components like S3 or DynamoDB. I know many developers that, I don't know, wrote code that runs on EC2 but worked quite a lot with S3. So I think it's very good experience. I think knowing agile processes, like continuous integration and continuous delivery, because serverless is very agile and promotes end-to-end ownership, so knowing how the entire process works is very important. And I think automated testing. A developer, you know, in our era needs to know how to write tests, and needs to know how to write them well.

Jeremy: Right. Yeah. And I think you know, the other thing is that beyond just knowing how S3 works, for me anyways, I've always been looking for people who understand distributed systems or at least have some knowledge around you know what happens when something fails and things like that. What are your thoughts on that?

Efi: I think it's a good point. I think, you know, that's one of the ways that I personally test developers that come to the team. I think that a good exercise is to let them design a distributed system. For example, try to design a system that mimics the mechanism of Lambda. So say you're on the Lambda team, how would you design and build something that scales indefinitely? Or a system that runs multiple processes, that tries to deliver a message from one place to the other? I think that serverless is about scaling. So you also need a developer to think about how to scale the design, and how to scale the system that they built.

Jeremy: Right, and one of the things that I found interviewing, especially people graduating from college, is that they don't have a lot of distributed systems experience. And one of the questions I tend to ask them is, “Alright, what happens when your database isn't big enough anymore?” And usually the common answer is “We'll just get a bigger database,” right? But eventually, if you can get them to answer the question, well, maybe I could put some of my data on this database and some of my data on this other database and spread the data, right? If they can start thinking about sharding and things like that, I think that's really interesting. And then also a lot of people (and this is really scary, in a sense; I mean, I'm not sure what they're teaching in some of these colleges), but you know, this idea of even connecting to an API, it's always happy path, I think, is what you get from a lot of developers. And so if you say to them, well, what happens if the API doesn't respond, right? And if they say, “Well, try it again.” Okay, that's the first step, and “Try it again?” Alright, what happens when you can't keep trying again? What's that failover? How do you build in that resiliency, or at least think about it? And I think that's always a good sign. If people can come back to you and give you answers like that, then I think that's really interesting.
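
The "try it again, and then what?" line of questioning can be made concrete with a small Python sketch. It is one possible answer under stated assumptions, not a prescribed design: the names `call_with_retry`, `call`, and `fallback` are invented for the example, and the fallback stands in for whatever failover makes sense (a cached value, a queue for later processing, a default).

```python
import time


def call_with_retry(call, fallback, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Try `call` up to `attempts` times, then fall back instead of retrying forever."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt < attempts - 1:
                # Exponential backoff between tries: base_delay, 2x, 4x, ...
                sleep(base_delay * (2 ** attempt))
    # Every attempt failed: this is the failover path.
    return fallback()
```

The `sleep` parameter is injectable only so the function can be exercised in a test without actually waiting; production code would leave the default.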

Efi: Actually, that's a very good point. And I think that in the end, developers don't need to mention the right terms. So even if they don't say “shards” but they mean, like you said, “Hey, split the data between various databases,” it shows thinking in the right direction.
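
The idea of "split the data between various databases" can be sketched in a few lines of Python. This is a simplified illustration of hash-based sharding, assuming string keys and a fixed number of databases; the function name `shard_for` is hypothetical.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a record key to one of `shard_count` databases, deterministically."""
    # Use a stable hash rather than Python's built-in hash(), which is
    # randomized per process for strings, so every process routes a given
    # key to the same shard.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

One limitation worth knowing in an interview setting: with plain modulo routing, changing `shard_count` remaps most keys, which is the problem that schemes such as consistent hashing are meant to reduce.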

Jeremy: Right, and curious people, I think make the best programmers — people that are willing to just, they want to learn something new. So all right, let’s move on a little bit past now we've hired. Let's say we can hire some good people. There’s probably some training in there and so forth. We kind of talked about that. But what about you know, this day-to-day work in serverless? Is there, let's say we're working with cross-functional teams. That's something that's very popular in agile environments now. You have a product manager. Do things like the granularity of user stories change? I mean, now that we're building much smaller components, do we need to get that detailed and say we need five functions as opposed to we just need to solve this user story. I mean, is there a difference there?

Efi: No, no, I think it's the same. I think the moment you move to microservice architecture, I think the way you think of all the products stays the same. Nothing changes.

Jeremy: Alright, and that I think that's probably music to product managers’ ears, right? So that they don't have to learn anything too new.

Efi: Job security.

Jeremy: Right. There you go. Alright. So what about tools? Because I mean Lumigo is a tool, obviously. But that's sort of a monitoring, you know, debugging sort of after, in-production tool type thing. But in terms of tools to help you build the services - you mentioned frameworks - like how do those come in?

Efi: Yeah, it's a good question, and I think, you know, first of all, when they teach us in college about encryption, they always tell us not to write our own encryption algorithm. Use something that is ready. So I think that the same thing applies to serverless tools. There are a lot of tools today that enable you to package, to upload, to deploy your code. You have tools today that help you to monitor and debug. Use them. Don't write something on your own. Don't waste your time on it. And I think one of the first things that you need to learn is tools like AWS CloudFormation or Terraform. These are the basic tools, the building blocks that enable any serverless packaging technology to deploy your code, to deploy your various resources. So no matter what serverless framework you choose, either the Serverless Framework or Chalice, in the end, behind the scenes, everyone is using either CloudFormation or Terraform. So I think it's very important to learn the basic building blocks, and I think you need to learn how to automate your tools, automate your testing. So use good testing libraries like Pytest or Jest, and there are many others that are very good. And also use serverless plugins to test some of your flows locally, like DynamoDB or API Gateway. I think that Bash scripts, or scripting, depends on the OS that you use...

Jeremy: So let me… Sorry to interrupt you, but I want to go back to the testing locally thing, and we can talk more about that. But the mocking libraries and so forth, and maybe we disagree on this, and that would be great. But I'm thinking, you can emulate DynamoDB locally, and certainly the Serverless Offline plugin that allows you to run, you know, the endpoints, I think is great locally, because then we don't have to publish them, and you can make changes quickly, and that works really well. But I think interacting with some of the cloud native resources like DynamoDB or SNS and SQS, from a local development standpoint, I feel like it's better to interact with the cloud versions of those. And I get unit tests, right? Doing stubs, you know, or some sort of mocking, maybe for unit tests, I think makes a lot of sense. But, I mean, you know, how far would you go with these local mocking libraries?

Efi: Yeah, that's a very good point. I think it's a painful point right now in serverless, in serverless testing. And I think the only thing that I can say right now is that testing locally, just as you said, won't give you the quality that you're expecting. In the end, local testing will give you a certain amount of validation of your code. But I think that the best way to increase your testing velocity is to give your developers the ability to run their code easily and fast in the cloud. That's the only way to actually test and make sure that the code that you wrote is working.
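
The kind of unit test with a stub that both speakers consider reasonable might look like the following Python sketch. The function `save_order` and its fields are hypothetical; the `table` argument is assumed to be any object with a `put_item(Item=...)` method, which is the shape of a boto3 DynamoDB table resource, so no AWS access is needed to run it. As the conversation points out, a test like this validates the code's own logic only; it says nothing about whether the deployed function, its permissions, and the real table work together.

```python
from unittest.mock import Mock


def save_order(table, order_id, total):
    """Business logic under test; `table` is any object with a put_item method."""
    if total < 0:
        raise ValueError("total must be non-negative")
    table.put_item(Item={"order_id": order_id, "total": total})
    return order_id


def test_save_order_writes_item():
    table = Mock()  # stands in for the real table client
    assert save_order(table, "o-1", 25) == "o-1"
    table.put_item.assert_called_once_with(Item={"order_id": "o-1", "total": 25})


def test_save_order_rejects_negative_total():
    table = Mock()
    try:
        save_order(table, "o-2", -1)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    table.put_item.assert_not_called()  # nothing written on invalid input
```

The two test functions follow the naming convention that pytest collects automatically, but they are plain functions and can also be called directly.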

Jeremy: And are you a fan of giving each developer their own environment?

Efi: Yeah, in the end, that's what we're doing here in Lumigo. And again, I think it really depends on the size of the team, because although serverless is supposed to be pay-as-you-go, there are various components in serverless, like Kinesis, that you're still paying for even if you don't use them. So if it's a large team, you need to think of a better way to control your costs. And I think, you know, it's a different discussion, a discussion about costs. But in a smaller team, I think you can give each team member their own environment and give each developer the ability to easily deploy the code to their environment.

Jeremy: Yeah, I love that model because then they're not messing up anything. They're not even messing up the dev environment, right? They’ve just got their own sandbox that they can play around with. Alright, so let me go back for a second to the frameworks, right, because you said CloudFormation and Terraform is usually what happens under the hood. And I totally agree with you on that. I think that's a good point, because even with the Serverless Framework or with SAM, which are the ones that I'm most familiar with, you know, if you want to create an SNS topic or a DynamoDB table or something like that, and include it in the SAM template.yaml or the serverless.yaml file, you're still writing just straight CloudFormation in order to make that happen. So even if you know how to do a function, you know, the functions and the events and some of those other things that are super handy and easy to do, you know, they've got shorthands in those different frameworks. If you want to get a little bit more advanced, yeah, knowing that other stuff is absolutely imperative. And again, I'm monopolizing most of your time here, I know I'm doing a lot of talking, but the Bash script stuff is another thing where I know it's really low level, or it seems low level, but there's so many things you can do with Bash scripts that a little tiny script here, especially as part of your CI/CD process or your testing process, you know, makes a huge difference. So I'm totally in agreement with you on that. So did you have anything else on Bash scripts or?

Efi: I think that again, Bash scripts, or scripts in general, are, you know, the glue. So you need to learn how to glue things together. And the only way to do it is to learn scripting.

Jeremy: So, yeah, so then the other thing too in terms of tools — and I know there are some tools that do this now — but I guess this goes back to certainly CloudFormation, this would be very much so specific to AWS. I guess other services as well. But understanding the IAM permissions and roles and things like that because, you know, I just had a conversation with Hillel Solow about serverless security, and we were talking about the least privilege principle and things like that to make sure that we're not opening things up too much. But that's a tough thing to do to one, learn all of this stuff, but then to enforce it. So do you have a way or do you use tools to enforce IAM permissions?

Efi: Yeah, that's a good point. We don't have any automated tools to enforce it, but it's something that is very important to remember. Security is important. It doesn't matter if you use serverless or any other development paradigm or any other technology, and I think it should be part of the development cycle. And when working with AWS, understanding how IAM roles work is, I think, crucial, because otherwise the security will be partial at best. And I think that, you know, the term that you mentioned, least privilege, is something very important, an issue every developer should learn as part of their welcome to the company. But again, just to show you that security is not only IAM: for example, in AWS, in serverless, when using S3, it's making sure that your S3 buckets are not public, making sure that they are private. So again, security is not only IAM. In Lumigo, the security phase is done through the code review. During the code review, we have a checklist that needs to be passed, and one of the checklist items is security. So we do ask the developer, “Why did you add this kind of IAM permission? Why are you using it? Can you reduce it to something lower?” The developers need to answer these questions before they can actually take the code and merge it to production. By the way, I think it's also a good time to go over our flow here at Lumigo with you, just to understand the development flow that we have and how everything works together. So we have our tasks in JIRA, and the moment a developer starts a task, he opens a branch in GitHub. We're using GitHub flow, which means that in the end, each branch that is merged to master is deployed to production. So the developer creates a branch, writes the code, and tests the code, either manually or by writing automation. They have to test the code on their own, in their own AWS environment. They do a pull request. They do a code review.
There is a lot of automatic gating that we do. Again, the word automatic is very important here. So things like linting, unit tests, integration tests, and static analysis. If the code review passes, then it merges to master, and we have an automatic continuous integration service, specifically we use CircleCI, and we push it to our monitor environment and then to production. And in the end, the developer monitors it himself through the Lumigo platform. And again, pay attention. It’s very important that the developer is responsible for the entire cycle, you know, from the product, through writing the code and writing the tests, to monitoring in production.
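
Efi says Lumigo handles the IAM question by hand in code review, with no automated enforcement. Purely as an illustration of what a small automated aid to that checklist item could look like, here is a Python sketch that flags the statements a reviewer would most likely ask "can you reduce it?" about. It is an assumption-laden example, not Lumigo's process: the function name `overly_broad_statements` is invented, and the check covers only wildcard actions and wildcard resources in `Allow` statements of a standard IAM policy document.

```python
def overly_broad_statements(policy):
    """Return the Allow statements that use a wildcard action or resource."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # Action and Resource may each be a single string or a list of strings.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if any(a == "*" or a.endswith(":*") for a in actions) or "*" in resources:
            flagged.append(stmt)
    return flagged
```

A check like this could run as one of the automatic gates alongside linting, with the flagged statements still going to a human for the "why do you need it?" conversation.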

Jeremy: Awesome. And I really like the code review process that, you know, you have this sort of checklist of things that you have to do. Now, I know some companies that are very, very good with this. I know some startups that are not so good with this. So I know you guys are still a relatively small team, and that's great to enforce those policies earlier because I do see those possibly breaking down because you still have that human element. And maybe in the future we'll have some better tools that will automate all of it for us. Okay, great. So that's awesome. And I think that the outline of what your process is is really, really helpful. So let's get, you know, so now if we arc this story here, we moved our team to serverless. We've been running serverless for some time now. Now what about things like roles and specializations in serverless teams? Because some of the AWS services or services in Azure or Google Cloud, they could require a specialist in and of themselves, right? DynamoDB, designing DynamoDB tables, understanding Kinesis and some of how those things work. Athena. QuickSight. All of these tools that are very complex in and of themselves. Is that something that you want to start doing, shifting some of the responsibility to individual users so that they can sort of go deep on one service and sort of broaden the knowledge for the entire team?

Efi: Yeah, I think it's a tough question. In the end, I think it depends on the size of the team. As a rule of thumb, I think that for a small team, I don't know, fewer than 10 developers (it's a ballpark, it can also be 15, it depends on your personal feeling), everyone should know everything. I think that when the team is small, it’s a chance for the team to learn serverless together. You know, when the team is very big or when the team grows, it becomes very difficult to share knowledge, to gather together and to talk together about the various problems. So when the team is small, it's a good point in the life of the startup or of your team to create a core knowledge of serverless. Then, as the team gets bigger, I think you need to start to specialize. And I think you need to always remember to have redundancy. You know, making sure that you have at least two developers that know how to use each resource.

Jeremy: That's a really good point. Yeah, the redundancy aspect of it is huge. But sorry. Go ahead.

Efi: Yeah, and I think that some services, just like you mentioned, DynamoDB, Kinesis, are difficult to master. And they change all the time. And I think you should share knowledge on these services with a couple of developers, even on small teams. So in small teams, most of the developers or all of the developers know how to use DynamoDB, let's say, at a general level: how to write to DynamoDB, how to read from it. But, for example, how to design an index, or how many shards should I have for Kinesis? I think this is the kind of specialty that, I don't know, maybe one or two people in your team should know as deeply as possible, and the others should ask them questions.
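
The shard question Efi raises can be approximated from throughput. The following is a rough sketch, not Lumigo's method: the function name `estimate_shards` and its parameters are hypothetical, and the per-shard limits (1 MB/s or 1,000 records/s for writes, 2 MB/s for shared reads) are the provisioned-mode figures AWS documents for Kinesis Data Streams, which should be verified against the current documentation.

```python
import math

# Per-shard limits for Kinesis Data Streams in provisioned mode.
# Assumption: these match the current AWS documentation; verify before use.
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0  # shared across all standard consumers of a shard


def estimate_shards(records_per_sec, avg_record_kb, consumers=1):
    """Return the smallest shard count that satisfies all three limits."""
    write_mb_per_sec = records_per_sec * avg_record_kb / 1024
    # Every standard consumer reads the full stream.
    read_mb_per_sec = write_mb_per_sec * consumers
    return max(
        1,
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(read_mb_per_sec / READ_MB_PER_SEC),
    )


if __name__ == "__main__":
    # Hypothetical workload: 5,000 records/s of 2 KB each, one consumer.
    print(estimate_shards(5000, 2))
```

An estimate like this is only a starting point; hot partition keys and traffic bursts can require more shards than the average throughput suggests.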

Jeremy: Yeah, yeah, that makes a lot of sense. Alright, so let's talk about maybe the day in the life of an engineering team. I mean, you mentioned your workflow, and you kind of gave us that, but does anything change in sort of the methodology, the way that we approach software development?

Efi: Yeah, I think, in the end, it's very similar to teams that use microservices. Again, it's full ownership: product, code, testing, deployment, production. I think there's one major change when moving from other microservice technologies to serverless: the cost. I think we started talking about it earlier. So people need to understand costs. Developers need to understand cost. It's part of their development cycle.

Jeremy: How important is cost, right? I mean, you take larger organizations. I know there's a lot of jokes about, you know, my serverless infrastructure costs us $30 a month or something like that, but that applies if you're maybe using, you know, just Lambda and API Gateway or something. But add Kinesis in there. Add DynamoDB. Start adding some of these other services and get some scale, right, and then all of a sudden cost is an important factor. And if you're calling the KMS API too many times or the Secrets Manager API, you know, things start to add up. So how much time and energy should developers be spending thinking about costs?

Efi: I think that people, you know, people that come to serverless for the first time sometimes forget how easy it is to scale serverless. So in a matter of minutes, you can easily get hundreds or thousands of Lambdas running simultaneously. Millions of requests to DynamoDB. And at the end of the day, you suddenly see a bill of a couple of hundred dollars, and you ask yourself, “What?” So I think it's very important, what you just mentioned. So, for example, in Lumigo, we have cost alerts in each of our environments. Our dev environments have cost alerts too, so developers know if they use their resources too much. As a manager, I check the cost on a daily basis, and I'm trying to understand the trends. I use the Cost Explorer in AWS quite a lot. In addition, we also use our own tools. We have our own monitoring tools, which also give us a cost breakdown. And again, it's part of the code review, part of the checklist that I mentioned earlier. We ask the developers why they chose, for example, this amount of memory for this specific Lambda. Or why did they add another index to DynamoDB? Each index costs more money because you are duplicating the data. And, for example, why they are using Kinesis and not Firehose. So there are many questions we ask along the way when doing the code review. Again, it's not something that can be done automatically; people need to see the code and understand what's going on. But you ask the questions in order to make sure that developers understand the trade-off, and understand that it can cost a lot of money. And you know, especially for startups, where money's always tight, suddenly paying thousands of dollars per month can be really dangerous. So it's not only “Oh no, we'll use the corporate credit card.” It can be really dangerous for the startup. So you need to pay attention to it.
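
The memory question from the code review checklist comes down to simple arithmetic, since Lambda bills compute by GB-seconds plus a per-request fee. The sketch below illustrates that calculation under stated assumptions: the function name `lambda_monthly_cost` is hypothetical, the default prices are illustrative values in the range AWS has published for x86 Lambda and must be replaced with the current rates for your region, and the free tier is ignored.

```python
def lambda_monthly_cost(
    invocations,
    avg_duration_ms,
    memory_mb,
    price_per_gb_second=0.0000166667,   # assumption: illustrative rate
    price_per_million_requests=0.20,    # assumption: illustrative rate
):
    """Estimate monthly Lambda cost in dollars, ignoring the free tier."""
    # Compute is billed as duration (seconds) times configured memory (GB).
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * price_per_gb_second
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return compute_cost + request_cost


if __name__ == "__main__":
    # Hypothetical function: 10 million invocations a month, 200 ms each.
    for memory_mb in (128, 512, 1024):
        cost = lambda_monthly_cost(10_000_000, 200, memory_mb)
        print(f"{memory_mb} MB: ${cost:.2f}")
```

Note that more memory also buys more CPU, so a higher setting can shorten the duration; the comparison is only meaningful with durations measured at each memory size.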

Jeremy: Yeah, and I think too that's one of the things that I know I always did. I mean, I've worked in many, many startups. So even before serverless, cost was always a factor in optimizing the solution, and I think that is something you can do. Like you said, Kinesis versus Kinesis Data Firehose. Some of those things have variable costs as opposed to fixed costs for shards and some of these other things. So I think that's something you build in early. You don't need to worry about maybe premature optimization, but I do think that if you say, well, look, this is going to be $5000 a month, and this is going to be $500 a month, if there's a way for you to see that up front, which in most cases I think there is, you can do some good estimations. Yeah, that's definitely something you should be paying attention to.
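
The fixed-versus-variable comparison Jeremy describes can be estimated up front with a break-even calculation. This is a minimal sketch with hypothetical names and made-up numbers; it is not tied to actual Kinesis or Firehose pricing, which has more dimensions than a single per-unit rate.

```python
def break_even_volume(fixed_monthly_cost, variable_cost_per_unit):
    """Volume at which the pay-per-use option costs as much as the fixed one."""
    return fixed_monthly_cost / variable_cost_per_unit


def cheaper_option(volume, fixed_monthly_cost, variable_cost_per_unit):
    """Return 'fixed' or 'variable', whichever is cheaper at this volume."""
    variable_total = volume * variable_cost_per_unit
    return "fixed" if fixed_monthly_cost < variable_total else "variable"


if __name__ == "__main__":
    # Hypothetical: $500/month provisioned versus $0.05 per unit of data.
    print(break_even_volume(500, 0.05))
    print(cheaper_option(1_000, 500, 0.05))
    print(cheaper_option(20_000, 500, 0.05))
```

Below the break-even volume the pay-per-use option is cheaper; above it, the provisioned option wins.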

Efi: I agree.

Jeremy: Alright, so what about the overall responsibility of the team, right? So smaller teams, we talked about this a little bit more in the beginning, you know, Ops teams, security teams, all kinds of teams that do things in the modern cloud, and do things in modern organizations. Where does the overall responsibility change for serverless development teams?

Efi: Yeah, you know, I haven't talked about security teams because I think that security, again, you know, especially in today's world, with security so prominent, I don't think that developers can specialize in security. They need to know security. They need to know how to write their code with security in mind, but security needs someone who specializes in it. Now it depends on whether you're in a big corporation or in a startup, whether you want to hire someone who specializes in security or do some kind of training for developers to be security specialists. So that's a different question. But I think that security is a different role. But you've also mentioned DevOps, and I also mentioned QA earlier. And I think that DevOps and QA, in serverless today, are actually one role. Developer, DevOps and QA are the same person: the same developer does everything. And I think that in the end it produces a better product because it's a developer. A developer knows how to test their code. They know how to write tests that cover the various edge cases that might appear, either on their own or through code review with the other developers. But I mean the developers themselves, and not someone who is external to the development process. The same thing goes for operations. I think that again, because serverless gives you the ability to deploy your code very easily, especially with the tools today, I don't think there's any need to have a separate role for it. The developers can do it, and with the monitoring tools that you have and the monitoring that AWS provides, I think that developers can do it. They don't need someone to do it for them. Of course, I'm not talking about customer support and things like that, which will probably require a different role. But the day-to-day monitoring and making sure that everything ticks as expected, I think that developers can do it.

Jeremy: Yeah. And I think, like you said, the idea of owning all the way through QA into production is really interesting. And, I mean, maybe this even goes back to the cost optimization thing. When I'm writing something in serverless now, you know, I might be building a couple of Lambda functions. I'm interacting with Dynamo. I'm doing some of these other things. I spend a lot more time thinking about the design of the application and how it should be built than I do actually writing code. Like, a day that I write only a few lines of code, but I've launched something that is production ready, you know, is a pretty good day. So we certainly don't measure by lines written. You know, the fewer lines of code, the better, in my opinion. I think that's shared amongst quite a few people. Alright. Great. So let's kind of wrap this up maybe, because we've been talking for quite some time, but I'm fascinated by this conversation, so I think I could probably talk to you all day about it. But maybe you could just give us some general advice based on your experience, some general advice for engineering managers that are starting to manage serverless teams.

Efi: Yeah, I think use the serverless benefits. So move fast. Test serverless ideas. Add new features quickly without getting bogged down with provisioning problems. Again, it's something that serverless, the cloud provider, gives you. If you're starting with an existing thing, with legacy code, and your application is a monolith, I think you should start breaking it into smaller components, and you don't need to break everything. You can start with breaking out the peripheral stuff like report generation, email services, Slack alerts, and all kinds of services that are not the core, and slowly but surely start, you know, eating parts of your monolith code and moving them to serverless. Again, I am returning to the original question that you asked me. Don't use the buzzwords. Use services everybody uses and move slowly.

Jeremy: Right, yeah. And I think that that's a really good point. I mean, it's starting simple.

Efi: Exactly.

Jeremy: There's no reason to launch something with a very complex, you know, multi-connected EventBridge with Kinesis in there. And yeah, or SageMaker. Like these things get complicated or can get complicated pretty quickly, so alright. Well, that's great advice. Listen, Efi, thank you so much. This has been absolutely awesome. I think I've learned a ton. Hopefully, the listeners have learned a ton. So maybe you can tell people how they can find out more about you and about Lumigo.

Efi: Yeah, sure. So I'm on Twitter @TServerless and you can find me also on Lumigo.io. I write blog posts over there. And of course, you can contact me by email efi@lumigo.io. I always like and love to help others, especially in the serverless world.

Jeremy: Awesome. And then Lumigo’s Twitter handle is just @Lumigo.

Efi: Yeah. Yep.

Jeremy: Perfect. Okay, awesome. I'll get all that into the show notes. Thanks again.

Efi: Thanks.
