ALTERNATE UNIVERSE DEV

Serverless Chats

Episode #89: Serverless in a DevOps World with Sarjeel Yusuf

About Sarjeel Yusuf

Engineer turned product manager, Sarjeel Yusuf is greatly interested in how the move to cloud computing and the rise of DevOps is revolutionizing the way we manage and release our software systems. Ex Thundra, and currently at Atlassian, Sarjeel is focused on bringing DevOps enabling solutions from the perspective of incident investigation and resolution in Opsgenie. By leveraging his past experience in Serverless monitoring and debugging at Thundra, he believes that there is a great opportunity in how serverless can unlock the potential of DevOps teams. 
In his free time, Sarjeel loves to write about new advancements in the fields of serverless, DevOps, and more recently, product management strategies. His writings can be found on his personal Medium account as well as other publications. He would love to get in touch with anyone who would love to brainstorm ideas in pushing existing technologies to build amazing products.

Watch this video on YouTube: https://youtu.be/T7eUUUBRZQQ

This episode is sponsored by Epsagon.

Transcript

Jeremy: Hi, everyone. I'm Jeremy Daly, and this is Serverless Chats. Today, I'm joined by Sarjeel Yusuf. Hey, Sarjeel, thanks for joining me.

Sarjeel: Hey, Jeremy, thank you so much for having me. I just want to say it's pretty exciting to be here. I've been watching the show for quite a while now, and it's just exciting to be here with you and talk about everything serverless, I guess.

Jeremy: I'm excited to have you here. So, just to introduce yourself. So, you are a product manager at Atlassian. So, I'd love it if you could tell the listeners a little bit about your background and what you do at Atlassian.

Sarjeel: Sure. So, yeah, as you've mentioned, I'm a product manager at Atlassian. Actually, a very new product manager. Just a year ago, I was a software developer within Atlassian, within Opsgenie, and now I'm a product manager at Opsgenie. So, I made the switch to product management very recently, actually.

And so, for those who don't know what Opsgenie is, Opsgenie is basically an on-call incident management tool. It allows you to route your alerts to the right person, make sure that everybody is aware of incidents that may occur. And it helps you all the way from incident awareness to incident investigation and resolution. And my specific role at Opsgenie is basically helping DevOps practicing teams to better their entire DevOps flow, especially considering incident management in the DevOps pipeline.

Jeremy: Right. So, that's actually what I want to talk to you about today, is just about DevOps. It's such an interesting discipline. And as teams sort of evolve and start using the cloud, it's almost like it's sort of necessary, I think, in order for you to adopt some sort of a DevOps culture.

And working at Atlassian, obviously, Atlassian has Jira, and Opsgenie, and all these other services that help with software development, and the software development lifecycle and things like that. But I think there's a major confusion out there about what exactly we mean by DevOps. And especially when you see companies labeling tools as like, "Hey, here's a DevOps tool." Or you've got DevOps engineers and things like that, that just seems really weird to me, because I don't think of DevOps that way. And maybe we could start there and sort of just set a baseline for the listeners here, and have you explain what exactly is DevOps, and what do we sort of mean by as a practice or as a culture as opposed to a set of tools or engineers?

Sarjeel: Yes. Yeah, that's it, right? DevOps right now, the reality that DevOps has ... The word DevOps has become a buzzword. Actually, quite interestingly, I think it was yesterday or a few days ago, I saw a tweet by Patrick Debois who was saying that just because ... It goes along the line of something like this. Just because an idea has become a buzzword doesn't mean that you should shy away from it. You should still go into it and explore what it is, and you learn from it.

That's the problem right now. The industry has been capitalizing on DevOps. Especially a lot of new startups are capitalizing on DevOps, marketing themselves as DevOps tools. So much so that the promise of DevOps is kind of lost or not fulfilled when you have all of these DevOps tools or DevOps engineers or DevOps certifications coming up in the industry.

Let's try to understand what exactly DevOps is. I think the best person who explains this or who captured this is Jez Humble. He basically describes DevOps as a set of practices, a cultural mindset, not exactly a set of tools. Yes, you can have tools to help with your DevOps practices. I'm not saying that, "Oh, any tool that says it is associated with DevOps, that's definitely a lie." No, it's not like that.

So, you can have tools to help with your DevOps practices, your DevOps culture. Harboring that culture in your company or in your team. But at the end of the day, it comes down to how you and your team and your entire organization are going from the ideation phase all the way to the release to production and then maintaining of your product. For example, that's where we, at Opsgenie, operate incident management. How you maintain your product, and then how you learn from that and then go through that loop again.

So, traditionally, what we saw was that we had all these separate teams where you had different roles associated with a separate stage in your development flow. For example, you had ideation. The first one would be ideation where you would see more involvement of product managers and designers and sometimes engineering managers. I'm just talking very generally. You would have build, you would have tests, release, monitoring, incident management, feedback. All of these were siloed.

And the problem became that when your product, when your software would go from one stage to another stage, when those involved in one stage would throw it over the wall to those involved in the next stage, the people receiving it in the next stage, there was some communication gap. And what that resulted in was that things just went slower, especially when you would scale your product, and especially when things would go wrong. That's what we see as an incident management tool.

Especially for our customers, when our customers are using Opsgenie and the responders are not necessarily the people who were responsible for building the code, it takes them longer to resolve the incident. That's expected. You are trying to resolve something that you didn't build, that you don't know the nitty gritty details about, and you're trying to find what went wrong. That's what DevOps aims to solve. So, I would say that with DevOps, what you can achieve is that you can go faster. You can increase your velocity while maintaining stability. That's the entire promise of DevOps.

Jeremy: Yeah. I like, basically, that quote of just because it's a buzzword doesn't mean you don't need it. And I feel like the same thing has happened with serverless as well, where everybody just starts slapping the term serverless on their product, or say we do something with serverless. I think it just confuses things more and more. And so, when you say things like DevOps, we need a DevOps tool, or we need a DevOps engineer, it sort of perverts the underlying principles, I guess, of what you're trying to achieve. And so, maybe let's go there for a second. From a principle standpoint or a cultural philosophy, as you had said, what are sort of the main objectives here? What are we trying to achieve with DevOps? Because you mentioned this idea of throwing it over the wall. And that happened all the time, right? I wrote some code, I give it to my ops team. My ops team tries to put it into production. And I'm going back a way. I know you're actually much younger than I am, so good for you. But that, actually, I think, is good, because it gives you a fresh perspective on seeing how things should be working, as opposed to old people like me saying to ourselves like, "Well, we used to do it this way. So, maybe we should keep doing it this way."

So, that idea of throwing things over the wall and having something not work, and then having to just kind of kick it back as opposed to just have a flow that this whole thing gets taken care of. So, what are sort of those principles that sort of enable you to break down those walls or break down those silos and just kind of have your software flow all the way from ideation through to production, and deployment, and then to even monitoring, and troubleshooting, and incident response?

Sarjeel: Yeah, that's actually a very good question. What exactly is the solution? If we say that, "Okay, all those tools that are coming out, or all the certifications that are coming out isn't exactly the solution." Then what can we do to break down those silos? I believe that there are two things that we can do. One is to try to involve everybody across that stream in mostly all the stages. Even as a product manager, I try my best to get involved in all the stages. And then also, even within the ideation phase, get the technical side involved within the ideation phase.

So, it's not only a product PM group only, like get everybody on the same table and understand how we can go from ideation to production. And that is one culture, that is one practice that you really need to incorporate in your team. Stop thinking about people as just fixed roles and allow more flexibility and allow the flow of ideas more. That one way is how we can really break down the silos.

Another way is that, "Okay, now that you have everybody involved in everything." The responsibility of the groups. I mean, it's almost impractical to have a single engineer or a group of engineers building everything and also making sure everything runs and also maintaining the systems and getting everything deployed while ensuring its stability. It becomes very difficult. If we still look at traditional practices, having one team do everything would become very difficult. So this is where I believe automation comes in, and automation is key.

Also, while we're talking about automation, we should also try to think of this left shift culture. Bringing everything closer to either the development team or the ops team, or whoever else, but basically bringing it closer to the build stage. Right now, we are seeing this trend. A lot of people, and including I, would say that CI/CD is kind of the backbone of DevOps, because CI/CD is now looking at a lot of automation. And we see a lot of automation features coming up over there. When you're looking at automation, you're also looking at incident resolution. You think that entire incident resolution that would sit over here, coming closer to your CI/CD. And eventually, we're also seeing CI/CD tests, and all the automated tests coming closer to the developers themselves. You see debugging and having all these integrations in the IDE. Being able to locally test your cloud apps and things like that.

Yeah. It's pretty great. We are seeing a left shift, we are seeing an increase in automation. So, it's not only a buzzword. Even though it is perceived that way, the reality is that we are seeing these improvements happen. And we are seeing an increase in DevOps practices and successful practices, actually.

Jeremy: Yeah. I think automation is a good point, because that's one of those things where sort of like automate all the things. It sounds really, really good. But then it also scares people too. A lot of ops people say, "Wait a minute, if you automate away my job, then what am I supposed to do?" And the answer to that is there's a million more things that you can do, especially around security, around speeding up the pipeline. Again, minimizing your time to recovery, or just things that you can work on. But the idea of automation is a key principle, I think, in DevOps, because it just gets ... It's the idea of getting things from somebody's IDE into production as quickly as possible. And then being able to sort of understand how that change maybe impacted the overall system or whatever, and be able to resolve those things much more quickly.

I remember the days where we used to work for months on a software release, and then we would put the software release out there, and then 80 things would be broken. So, we would decide, "All right, is it bad enough that we have to roll back the whole thing? Or is it okay where we can live with some of these bugs, and then just set out the QA team to start doing some bug hunting?" And you don't want to do that. That's just not the way that rapid software development and modern software development works. So, this idea of deploying very quickly and being able to see if there's any impact that is negative or whatever, and be able to roll back those changes quickly, I think, is super important.

And then the other thing you mentioned about sort of shifting left, or this idea where the developers become more responsible for the code that they write. I think that's actually a really, really good thing, where it's like, "If I'm going to put a piece of code out there that is going to use too many cycles, or it's slowing things down, or it's affecting the latency or whatever it is." I shouldn't rely on some other engineer that's running my system to say, "Hey, I found this problem in your code, can you go fix it?" It should basically be as a team, you're saying, "Okay, I released this code. We're noticing these high latency warnings, or errors, or whatever. I'm the one who is responsible for that. I should go in and I should be the one that fixes that."

Sarjeel: Yeah. That's absolutely true. Okay. You mentioned that at some point, you used to write code, and then you used to interact with the QA engineers and things like that. In that sense, Jeremy, I have been lucky that when I started my career ... I started my career around 2018. Not that way back then. When I started my career in 2018, the first company that I joined was Thundra, actually. You probably heard of Thundra. I believe you have had ...

Jeremy: Absolutely right.

Sarjeel: You have had Emrah Şamdan over here also talking about serverless observability and debugging, and things like that. I joined Thundra. And then after Thundra, I joined Opsgenie. And both of these companies practiced building software, along the principles of DevOps. So, I have never actually seen QA engineers or a specific team just to resolve incidents. For us, it was always like, "Okay, you write the code. You wrote the code. If something goes wrong, you're on call" ... And if you're on call, or even if you're not on call, the person on call would alert you that whatever changes you made, something was wrong. Then they would pull you in as a responder.

And then I look at our customers. Some of our customers still do have these practices. Especially when you're a large enterprise customer, it's a bit harder to change the entire culture. It's a bit slower. When I talk to these customers, and they tell me about these problems, it becomes very difficult for me to relate to them, essentially, because I have never ... But coming back to your point about like, if you build it, you run it. And I think that's exactly what I see serverless and serverless offerings as a great opportunity, especially when you're new to DevOps and you're trying to look at DevOps, or you're thinking of adopting DevOps, or your team is thinking of adopting DevOps. I believe this is where serverless comes into play. If we go back to what I previously said about like, "Okay, we want to try to reduce ops. We want to see a left shift of you build it, you run it." So, as things coming closer to the people who are building things. We also want to see automation. This is where I believe serverless comes into play.

Jeremy: Right.

Sarjeel: The reason why I say this is because ... So, when I graduated from university and I got my first job as a junior developer, Thundra gave me a perspective of both ... This is cloud computing, right? And I just graduated, and I had seen, "Okay, this is cloud computing now." I had always heard about it in university. Right in university, you hear about the latest trends and things like that. "Oh, my god, I'll get to work on AWS." I had never interacted with any AWS service before. And I was presented containers, EC2 containers, and I was presented AWS Lambda. And with AWS Lambda, I just got to it, wrote my first lines of code, got it and uploaded it, and I was able to trigger the Lambda function. With EC2, I spent quite a while trying to understand, getting over that learning curve, to a point where I was like, "Oh my god, if I don't get it done by this week, I'll probably be fired."

Jeremy: Well, it's funny, though, that you mentioned the idea of where serverless fits in, in DevOps. I totally agree with you here. And I'll give you a history lesson. And so, I hope I don't sound like an old man yelling at clouds. But essentially, how it used to be was that you would need to maintain a server somewhere. And usually, it was a physical server that ... We weren't even talking about VMs and things like that. It was a physical server, and there was networking, and there's all these other things you had to do with it.

And that was something where there was a clear line between someone who was a developer and was writing code and someone who was actually installing software patches, and doing the networking and actually plugging in cables in a data center somewhere. So, a lot of that changed in the late aughts, 2008, 2009, when EC2 started to become more popular with AWS, and so forth. And that made it a little bit easier, but you were still thinking about VPCs and trying to do networking and that kind of stuff. It was easier, but still something that you wanted someone to set up for you so that as a developer, I would just have an environment that I could use.

What serverless has changed is that now you just have an environment. And so, you don't have to set up an environment, you just need an AWS account or a Google Cloud account, or IBM or whatever, that you can just go and just upload some code and have it immediately execute within that environment. And so, that's one of the things for me, where if you try to say to a developer, "Hey, I need you to take responsibility for all of this stuff. And oh, by the way, we're running on EC2 instances, and VPCs, and you need to know the security groups, and you need to understand how all of these things might be able to affect you." That is too much, in my opinion, to ask somebody. But to say, look, and you're throwing your code into ... Even if it's a container in Fargate or something like that, or you're doing a Lambda function, that's pretty isolated environment. It's pretty easy for you to reason about if something is not working. "I'm not able to connect to a service. It's running too slow. It's timing out." Things that are easy, I think, for you to understand and debug, and that just becomes ...

I don't think that's too much of an ask. So, I do think that you're asking developers now to go all the way through that spectrum, and to understand a little bit of the operational aspect of it, but they don't have to understand the deep networking stuff or how packets are routed and some of that stuff. They just need to understand some of the basic cloud principles. I think serverless enables that and really is this huge enabler of companies accepting DevOps.

Sarjeel: Yeah, exactly. That whole point about a lot of the underlying infrastructure being abstracted away to the cloud vendor and becoming the responsibility of the cloud vendor. That in itself is just extremely helpful to anybody trying to practice DevOps, any team trying to practice DevOps. Because all of a sudden, you no longer have to worry about your ENIs, or your security groups as you mentioned. All of that is managed by the cloud vendor that you're using, whether it be AWS or Google Cloud provider. What that allowed you to do is, as we have seen quite a bit, as one of the well-known benefits of serverless is it actually allows you to focus on your business logic more. It not only allows you to focus on your business logic, but another hidden gem, I would say, is that it also allows you to connect and communicate, focus on the communication and sharing of code, and getting over that learning curve when the other teams are involved.

So, even though you didn't write the code yourself, if you look at somebody else's Lambda function, you can focus on ... Or if you look at somebody else's FaaS functions or Lambdas, let's say, it's easier to understand. It's easier to collaborate on a code base. And, on top of that, it just makes it easier for an entire team going through that spectrum to manage that pipeline, the DevOps pipeline going from ideation to ... In fact, I say that, especially as a product manager, I would say that all product managers should also learn how to deploy Lambda functions, especially when you're trying to ideate through an idea.

It's become so easy. You can write throwaway code. It becomes very easy to write. You just write throwaway code. Just code that works, just to test whether an idea works or not. And especially when you're trying to find that perfect product market fit, just write a bunch of Lambda functions with your engineering manager or your lead engineer, and show that to the test group of customers, see if it works, go back and ideate it. It's so easy to do that because, one, serverless functions are cheap, or serverless services are cheap. The pay-as-you-go model. They're very lightweight, they're very easy to get up and running with. You don't need to worry about all that infrastructure that we already talked about. So, even there, just in the ideation phase, it's very easy to go forward.
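As an illustration of how small such throwaway code can be, here is a minimal sketch of a Python Lambda handler. The `lambda_handler(event, context)` signature is the conventional entry point for the Python runtime; the `name` field in the event and the `statusCode`/`body` response shape (the form typically used behind an API Gateway proxy integration) are assumptions made for this example and are not something discussed in the episode.

```python
import json


def lambda_handler(event, context):
    """Entry point in the shape the AWS Lambda Python runtime invokes."""
    # `name` is a hypothetical input field used only for this illustration.
    name = event.get("name", "world")
    # Response shape assumed here: the status code plus JSON body that an
    # HTTP-facing integration would usually expect.
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local smoke test: the handler is a plain function, so it can be
    # exercised without an AWS account before it is ever uploaded.
    print(lambda_handler({"name": "Sarjeel"}, None))
```

Because the handler is just a function that takes an event and returns a value, it can be tried locally with a hand-written event before deciding whether the idea is worth building properly.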

Jeremy: Yeah. I think there are a lot of benefits to just using serverless to do some of these DevOps practices. And I know we haven't really mentioned all of the principles, I guess. We mentioned a couple of the main ones, but I think one of the things a part of the DevOps culture, or at least a part of what you need to do to fully embrace it is this idea of building microservices, right?

I mean, microservices allow individual teams or small groups of people to work on parts of the application independently. And when you start dealing with some massive monolith, and you've got a bunch of different teams all contributing to the same code base, it gets really, really messy. So, being able to break those up into smaller things is super important.

Serverless, I think, has a bunch of really cool things baked in, especially with intercommunication between microservices, and you don't have to set up things like Kafka, or RabbitMQ, or some of these other things that's just another thing to manage. So, what are your thoughts on that? What are some of the tools or the services available as part of the serverless ecosystem that just help with microservices?

Sarjeel: As you mentioned, microservices, we're all familiar with the benefits of microservices.

Jeremy: I hope we are.

Sarjeel: Hopefully. Believe me, I have dealt with a monolith, especially like when you look at front end as a monolith. In many cases, front ends can be considered a monolith. You have this one big front end code base, and it just becomes very difficult. You really do see the benefits of microservices. It's actually the idea of microservices that really plays well with the entire DevOps culture and practice, where you can have each team working on something, you can go fast on that. Especially when you look at the stability.

I know we're going off on a tangent over here. We haven't started talking about how serverless is baked into the benefits of building microservices. I just wanted to mention that one point that is really amazing that I have seen dealing with monoliths and microservices is that when you're looking at it from a DevOps perspective, and you're looking at stability of your system, just having one part break and not affecting the other part. That in itself, I believe, is taken for granted. It's pretty amazing. Being able to decouple all these different aspects or all these components of your entire system. And looking at them individually where one component's failure does not necessarily result in another component's failure. That, in itself, is pretty amazing with microservices.

However, what that leads to is that when you're thinking of microservice architectures, then you also need to think about communication overhead, as you mentioned. Yes, you did decouple all of these, but now you still need all of these to communicate with one another.

Jeremy: And reliably.

Sarjeel: And reliably. Yes, exactly. Reliably. As you mentioned, there's a lot of overhead over there. I personally haven't dealt with Kafka or RabbitMQ.

Jeremy: Consider yourself lucky.

Sarjeel: Yeah. We saw EventBridge, and I think a lot of people would agree with me over here, that EventBridge is definitely the next best thing after AWS Lambda. That's because of all the use cases that it has enabled, and how powerful of a service it is. And it really allows you to think about serverless architectures and event-driven architectures from a whole new perspective.

One of the best things that it allows you to do is reduce all those ops that you would otherwise have to deal with. All that overhead with communication. One of the things like even marshalling and de-marshalling, you're literally just communicating in the form of events. Being able to leverage other capabilities of EventBridge, such as routing of events based on rules. That in itself also just enabled a lot of use cases within your microservice architecture and also as ancillary services supporting your DevOps pipelines.
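To make the idea of routing events based on rules more concrete, here is a small in-memory sketch in Python. It is loosely modeled on the way event patterns list allowed values per field, but it is a simplification written for this transcript and not the EventBridge service or its API; the `EventBus` class, the `matches` helper, and the sample `source` and `detail` values are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Event = Dict[str, Any]
Pattern = Dict[str, Any]


def matches(pattern: Pattern, event: Event) -> bool:
    """Return True when every field named in the pattern matches the event.

    A pattern value is either a list of allowed values or a nested pattern.
    Fields that the pattern does not mention are ignored.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: the event field must be a dict that matches too.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


class EventBus:
    """Toy event bus: producers publish, rules decide which targets are called."""

    def __init__(self) -> None:
        self._rules: List[Tuple[Pattern, Callable[[Event], None]]] = []

    def add_rule(self, pattern: Pattern, target: Callable[[Event], None]) -> None:
        self._rules.append((pattern, target))

    def publish(self, event: Event) -> int:
        """Deliver the event to every matching target; return the delivery count."""
        delivered = 0
        for pattern, target in self._rules:
            if matches(pattern, event):
                target(event)
                delivered += 1
        return delivered


if __name__ == "__main__":
    billing: List[Event] = []
    audit: List[Event] = []
    bus = EventBus()
    bus.add_rule({"source": ["orders"], "detail": {"status": ["paid"]}}, billing.append)
    bus.add_rule({"source": ["orders", "payments"]}, audit.append)
    bus.publish({"source": "orders", "detail": {"status": "paid"}})
    bus.publish({"source": "payments", "detail": {"status": "failed"}})
    print(len(billing), len(audit))
```

The producer only publishes an event and never names a consumer, which is the decoupling being described: adding a new consumer means adding a rule, not changing the service that emits the event.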

Yes, one is definitely EventBridge. Again, when we were mentioning all of these services, it's also good to point out that age-old myth about serverless equating to only Lambda functions. A lot of people, even I, when I began, looking at, "Okay, what is this word, serverless?" I started thinking, "Okay, yeah. Serverless equals Lambda functions." Then I realized, no, it's actually a whole set of tools. It's a whole set of services that are available out there.

We mentioned EventBridge. We should also give credit to DynamoDB. Especially when you're looking at it from the point of scalability, you can have your entire microservice architecture built using serverless services. But if your data layer isn't scalable, then what's the point?

Jeremy: What's the point? Right.

Sarjeel: Exactly. Having that incorporated also. And then also, having a lot of the responsibility being abstracted away to the cloud vendor, that in itself allows a lot of teams trying to adopt DevOps to go faster.

When you're looking at EventBridge, DynamoDB, then of course, you have your AWS Lambda functions, your Fargate, basically your containers as a service. If you find FaaS services a bit limiting, you can always look at containers as a service. We are seeing a rise in popularity with containers as a service. So, you have this whole set of tools in your cupboard that you can just basically bring in plug and play. And that's what serverless allows you to do. It lets you bring in a service and lets you plug it in and play it in the entire way your microservice architecture operates.

Jeremy: Right. Yeah. I think you bring up a point too. You mentioned DynamoDB, which has global tables, and all kinds of things that allow you to replicate data to other regions.

The other thing that's cool about DynamoDB, or EventBridge, or Lambda functions, is that it runs in multiple availability zones, even if you're running it in a single region. And that gives you redundancy and resiliency and all these backups that ... Again, speaking of Kafka or RabbitMQ or something like that, where you'd have to have multiple services or multiple systems running in multiple regions or multiple availability zones that were subscribing to all these events and trying to manage all of that complexity.

EventBridge just kind of does that for you. You don't even have to think about it. Same thing with DynamoDB. But DynamoDB global table is actually something where this could get us to, maybe not an easy way to get to it, but certainly possible to start thinking about active-active regions, where you can actually have your systems running in Europe, and you have them running in the US, and maybe you have them running maybe in Australia or something like that.

So, what are your thoughts on that and where serverless helps get teams to deliver ... Not only to deliver software faster, but to deliver software to more places or more regionally.

Sarjeel: Right. If we take a step back, and if we look at ... You mentioned active-active. Yes, active-active architectures in themselves are a whole different topic. And you have one of the great personalities in this field, Adrian Hornsby, who talks about this quite well. He has a great set of resources, blogs, and talks about that. Anybody who's interested and wants to learn anything about active-active, they can definitely go and refer to that.

But if you look at that architecture from a DevOps point of view, from the fact that what do we actually want to achieve with this type of architecture. You trace back a lot of its motivation and its origins, not origins per se, because it's been there in academia for quite a while now, but a lot of the motivation to adopt such an architecture. A lot of it comes from the fact that, that one horrific story of the Netflix outage.

I just want to mention as a side note. We've been talking about this Netflix outage for quite a while. I'm just waiting for the next big outage because I think we have been overusing this Netflix outage story quite a bit now. Working in an incident management tool, like an incident management company. We do hear about a lot of outages, but none of them compared to what we saw with Netflix, or on that scale, but regardless. So, we see a lot of that motivation for active-active coming from the outage of Netflix, and we saw Netflix kind of start pushing the idea of resilient architectures. I'm not saying that it wasn't there before. Of course, it was, but we saw Netflix, one of the big tech companies starting to push and really think about it from a whole new perspective. And the whole point over here is to maintain stability. As we mentioned earlier, that actually is one of the goals of DevOps. Now, when you're thinking about active-active, it's easier said than done.

Jeremy: That's very true.

Sarjeel: Right. It's actually easier said than done. When you start thinking of how serverless tools can come and help with setting up such an architecture, we do see a lot of burden lifted off. So, for example, you mentioned DynamoDB global tables. Then there's also Route 53. So, we have geo routing that they recently announced. I believe it's one of the more recent capabilities with Route 53. We have DNS failover with Route 53. Then you have API Gateway, which came up with custom domains, which allows you to target now regional endpoints. So, we can see, we can actually build this entire active-active service with serverless services, where a lot of that responsibility again, gets abstracted away to the cloud vendor.
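The combination of geo routing and DNS failover mentioned here can be sketched as a simple decision: send each client to its preferred region, and fall back to the next one when a health check fails. The Python below is only an illustration of that logic under stated assumptions. It does not call Route 53 or any AWS API, and the `REGION_PREFERENCE` table, the `pick_region` function, and the choice of regions are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical preference order per client geography: nearest region first,
# then the fallbacks to try when health checks fail.
REGION_PREFERENCE: Dict[str, List[str]] = {
    "EU": ["eu-west-1", "us-east-1", "ap-southeast-2"],
    "US": ["us-east-1", "eu-west-1", "ap-southeast-2"],
    "AU": ["ap-southeast-2", "us-east-1", "eu-west-1"],
}


def pick_region(client_geo: str, healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first healthy region in the client's preference order.

    Returns None when the geography is unknown or no region is healthy.
    A region missing from `healthy` is treated as unhealthy.
    """
    for region in REGION_PREFERENCE.get(client_geo, []):
        if healthy.get(region, False):
            return region
    return None


if __name__ == "__main__":
    health = {"eu-west-1": False, "us-east-1": True, "ap-southeast-2": True}
    # With the European region marked unhealthy, EU clients fail over.
    print(pick_region("EU", health))
```

In an active-active setup every region in the list is serving traffic all the time, so the fallback is simply another live region rather than a cold standby that has to be started first.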

I know I've said this statement, being abstracted away to the cloud vendor, quite a bit, simply because I want to stress how important it is to try to reduce the ops to eventually move towards a very successful DevOps practicing team. Again, having a lot of these activities, let's say, being automated by these managed services, especially when it comes to scalability. We talk about serverless in the sense that a lot of times when you talk about the limitations of serverless, you look at it, one of them is that it's stateless. A lot of people have difficulties in thinking about stateless. How to think of stateless, whole business logic, and how would you have a business logic translated to a stateless architecture. But with active-active, it actually becomes an advantage to have stateless architectures. To have stateless compute services, because you don't want to hold the state too long in an active-active setup. And you want to keep on switching between nodes.

Another benefit where serverless really shines is the fact that it's a pay-as-you-go model. So, if you have nodes that aren't being used, why pay for them? So, that's another advantage where you can see the benefits of serverless come into play.

Now, there's that, and there's also the scalability. So, all of a sudden, you have a lot of traffic being routed to a specific node. You may not have handled your routing rules very well, considering your traffic or things like that, but it's okay. It's okay. Your DynamoDB is going to scale. If you have, let's say, a Lambda function over there, or maybe a Fargate instance, it will scale. So, auto-scaling, pay as you go, statelessness, all of these come together to really help you build that active-active architecture and start thinking of how you can build that active-active architecture.

Now, by the way, another thing I want to mention, which I think we missed, was when we were talking about the characteristics of serverless functions or serverless in general, and how we are using these serverless functions, or serverless tools, to build microservices. One of the characteristics is that when you think of a serverless architecture, it's event-driven, and that plays very well when you're dealing with microservices. All of a sudden, you now have to start thinking about event-driven architectures. Again, we're coming back to EventBridge, where your EventBridge may be triggering a lot of your Fargate instances or Lambda functions. Just having that constraint of having functions or having your compute resources being triggered by events allows you to make sure that you think about this architecture in an event-driven fashion.

There is a possibility, though. I must point this out, that there is a possibility for you to fall into an anti-pattern. Especially when you're trying to adopt serverless, and you're moving to this granular architecture, you're moving to a serverless architecture, to microservices from your monolith. It is easy to fall into an anti-pattern where you try to replicate your entire logic, your entire business logic on the monolith, exactly into your serverless architecture, where you would have one Lambda function sitting before another Lambda function, which sits before another Lambda function. All of a sudden, you have this anti-pattern, where you can't do things asynchronously. And asynchronicity is something that is, again, another benefit of having Lambda functions, and of just thinking about microservices. So, there's this anti-pattern you may fall into, but as long as you think about it in an event-driven fashion, as long as you know what you're doing, as long as you do your research before building this architecture, you should be good.
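Illustration (not from the episode): the difference between chaining functions and reacting to events can be sketched with a tiny in-process event bus. This is a toy stand-in for a managed bus such as EventBridge, written under assumptions: the `EventBus` class, the event names, and the handlers are hypothetical. The point is that `charge_card` never calls `ship_order` directly; it only publishes what happened.

```python
from collections import defaultdict, deque


class EventBus:
    """Toy in-memory event bus: handlers subscribe to event types."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Publishing only enqueues; the publisher does not wait on consumers.
        self._queue.append((event_type, payload))

    def drain(self):
        """Deliver queued events in order; return the number of deliveries."""
        delivered = 0
        while self._queue:
            event_type, payload = self._queue.popleft()
            for handler in self._handlers.get(event_type, ()):
                handler(payload)
                delivered += 1
        return delivered


bus = EventBus()
log = []


def charge_card(order):
    log.append(f"charged {order['id']}")
    # Announce the result instead of invoking the next function directly.
    bus.publish("payment.captured", order)


def ship_order(order):
    log.append(f"shipped {order['id']}")


bus.subscribe("order.placed", charge_card)
bus.subscribe("payment.captured", ship_order)
bus.publish("order.placed", {"id": "A1"})
delivered = bus.drain()
```

In the chained version, the first function would block (and be billed) while it waits on the second. Here each handler finishes as soon as it has published its event.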

Jeremy: Yeah. I think that anti-pattern is very prevalent where people just end up, unfortunately, trying to stack too much logic, or try to chain functions together in a way that is definitely slower.

We talked a lot about the benefits, I think, from a DevOps perspective, or from a DevOps culture of building things with serverless, and I think that makes a lot of sense. There's still ops work to be done. We still have to clean up development environments, or maybe run some audits or some of these other things. So, I guess from that perspective of ... And maybe this falls more on operations, but I think it's sort of part of the full cycle. Where does serverless fit in there? And what are some of the tools that are available for you to sort of just kind of run the infrastructure beyond just trying to deploy code that is maybe client-facing?

Sarjeel: Yeah. Actually, this is a pretty great question, because when I look at serverless and how serverless can aid in DevOps practices, we talked about serverless functions and serverless technologies inside your main code base, inside your main infrastructure itself. But yes, then there's a whole set of other use cases that we can come to, where your serverless functions or serverless technologies can act as ancillaries, helper functions or ancillary services, aiding you to get through that DevOps pipeline.

So, as you mentioned, cleaning up your environment, or even just thinking about deployments, how we're looking at automated deployments throughout the CI/CD stage, and also automated tests. Then again, monitoring and debugging and identifying root causes of incidents and remediation and all of that. All of that can actually be done with serverless functions. There are a lot of tools out there that are trying to help you achieve these things. Atlassian itself is building a lot of tools that help you achieve this, help you automate through this. But having those tools, and having those third-party tools and having serverless technologies integrated with those tools really does give you that extra boost to go faster while maintaining the stability that we're always talking about, that we're really trying to go for. I can give you an example.

Jeremy: Absolutely.

Sarjeel: There are actually many examples that we can talk over that I would actually like to point out. One of the examples that I really like is, for example, we recently ... Actually, not recently. About a year ago or so, we built an integration with EventBridge. And basically, the use cases were such that the way we saw customers using the Opsgenie EventBridge integration was, okay, they get an alert from either Datadog or New Relic about some form of configuration drift in their infrastructure. Once they identify this infrastructure drift in the AWS setup, Opsgenie would send you an alert. And that alert acts as a trigger through EventBridge into your AWS infrastructure that can run automated playbooks to correct that configuration drift. I think that was just an amazing use case that we saw some of our customers using.

Another thing was like, for example, security compliance, or when you see some suspicious activity in your account, you can use AWS CloudTrail, or you can use any other security monitoring or audit logging tool. You integrate that with Opsgenie. Opsgenie gets that alert. And upon that alert, using ... Again, this is where I've seen customers leverage the event routing capability of EventBridge. Depending on what the content of the alert is, they're routed to the right area of their infrastructure to immediately remediate that. All of this being done automatically.
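Illustration (not from the episode): content-based routing of alerts can be sketched as a list of rules, each pairing a pattern with a remediation target. The matcher below covers only a small subset of what EventBridge event patterns support (exact match against a list of allowed values, plus nested objects), and it is not the EventBridge engine. The field names (`source`, `detail`, `alertType`) and the target names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Pattern = Dict[str, Any]
Event = Dict[str, Any]


def matches(pattern: Pattern, event: Event) -> bool:
    """True if every field in the pattern is present in the event with an allowed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of allowed values.
            return False
    return True


def route(event: Event, rules: List[Tuple[Pattern, str]]) -> List[str]:
    """Return the targets of all rules whose pattern matches the event."""
    return [target for pattern, target in rules if matches(pattern, event)]


rules = [
    ({"source": ["opsgenie"], "detail": {"alertType": ["config-drift"]}},
     "run-drift-playbook"),
    ({"source": ["opsgenie"], "detail": {"alertType": ["suspicious-login"]}},
     "lock-down-account"),
]
event = {"source": "opsgenie",
         "detail": {"alertType": "config-drift", "priority": "P2"}}
targets = route(event, rules)
```

Extra fields in the event, such as `priority` here, do not prevent a match; only the fields named in the pattern are checked.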

So, what we're actually seeing is kind of a reduction in the need for SRE teams, the need for infrastructure maintenance, and basically all of ops. The developers themselves can set this up, because it's so easy. It's so easy to get up and running with EventBridge and Lambda functions or serverless in general, that you can have your development teams set this up, and take responsibility for that ops part also.

Jeremy: Right. Yeah.

Sarjeel: I think in the beginning, you mentioned like sometimes ops can get scared, like, "Oh, what's the point? What are we needed for?" Why not? Let's come together. That's the whole point, of coming together, and helping. If the development team can set it up, that's where I feel that we need to start thinking of ops in a whole different way, especially with the advent of serverless and all of these third-party tools. We need to start thinking of ops in a whole different way, of how we can leverage this new technology in the best way possible to accelerate according to the team's development practices, according to the team's cultural practices in building software. Because, again, every team is different. That's also a reason why you can't really say that there's one solution or one tool that would solve all the DevOps problems of the industry. No. Every team is different. Every team is different within an organization. Every organization is different.

Regardless, you can have like third-party tools to try to help you bolster your DevOps solutions. But as soon as you see that, "Okay, it's not working," that's where you can fill in the gaps with serverless services. I think that in itself is just pretty amazing.

We are seeing customers do this with Opsgenie. We're seeing a lot of automation come up. For example, in Opsgenie itself, we're using Lambda functions to replicate customer traffic. You have synthetic monitoring Lambda functions, and you have transactional monitoring Lambda functions. So, these synthetic monitoring functions, they're hitting our APIs. Then the transactional monitoring Lambda functions are receiving the input and processing it and sending it to New Relic, and the other monitoring tools that we're using. Whenever something is wrong, that Lambda function will automatically send an alert to Opsgenie. Surprise, we use Opsgenie ourselves internally. We get an alert. So this way, we manage to catch errors or incidents before our customers even see them.
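Illustration (not from the episode): a synthetic monitoring function of the kind described here can be very small. The sketch below is an assumption about the general shape, not Opsgenie's actual code: `synthetic_check`, `http_status`, the health URL, and the latency threshold are hypothetical, and `alert` is a plain callback standing in for whatever sends the alert.

```python
import time
import urllib.request
from typing import Callable


def http_status(url: str, timeout_s: float = 5.0) -> int:
    """Fetch the URL and return its HTTP status (raises on network or HTTP errors)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.status


def synthetic_check(
    url: str,
    fetch: Callable[[str], int] = http_status,
    alert: Callable[[str], None] = print,
    max_latency_ms: float = 1000.0,
) -> bool:
    """Probe one endpoint; call alert() and return False if it looks unhealthy."""
    started = time.monotonic()
    try:
        status = fetch(url)
    except Exception as exc:  # timeouts, DNS failures, HTTP error responses
        alert(f"{url} failed: {exc}")
        return False
    elapsed_ms = (time.monotonic() - started) * 1000
    if status != 200:
        alert(f"{url} returned HTTP {status}")
        return False
    if elapsed_ms > max_latency_ms:
        alert(f"{url} took {elapsed_ms:.0f} ms (limit {max_latency_ms:.0f} ms)")
        return False
    return True


def handler(event, context):
    """Lambda-style entry point; the URL is a placeholder, not a real endpoint."""
    return {"healthy": synthetic_check("https://api.example.com/health")}
```

Passing `fetch` and `alert` in as parameters keeps the check testable without network access, and lets the same function be wired to a real alerting client when deployed on a schedule.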

And remember, this is, again, where you can leverage the characteristics of serverless services or tools, because, again, it's pretty easy to set up, so developers can set this up. I remember going around and playing with a few monitoring Lambda functions myself when I was a developer. So, I set it up, and then getting that connected to New Relic, and doing the whole ... Again, we try to play by that motto. You build it, you run it as much as possible. So, for example, when something goes wrong in Opsgenie, we get that alert. We try to investigate it ourselves. So, all of that is made possible because at some point, we are using Lambda functions to send a lot of monitoring data and generate a lot of data and send that monitoring data over to New Relic.

Jeremy: Yeah. I think you hit the nail on the head in terms of where the SRE team members go after some things become easier. And you mentioned this idea of CI/CD pipelines. So, if it's super easy to set up a CI/CD pipeline, and it's just a matter of a couple of clicks in a dashboard, or it's just you have to deploy maybe another CloudFormation template or something. If I was an SRE, which I've done roles similar to SRE in the past, I would be really, really tired of setting up another CI/CD pipeline for somebody. If that was my job, just, "Oh, we got to set up another one of these. Set up another one of these." That is just wasted human capital where you could be spending that time, like you said, writing a Lambda function that sends synthetic traffic or getting into chaos engineering.

If you have people who know the ops side of things, and can say, "Hey, what happens if this service can no longer communicate with that service? How does your service react? How does the other service recover, and so forth?" And again, becoming chaos engineers around that, I think, is a hugely important thing that larger teams have got to start doing, maybe even smaller teams. You've got to start doing it to understand the nature of distributed systems, and what happens when one thing breaks down. So, I do think that there's an evolution here, where it's like the more you can automate, the more sort of your developers can own some of that stack, it just frees up people to do more important work than things that can just easily be automated.

Sarjeel: Yeah. No, I definitely agree with you. Once you have a lot of automation ... Again, we get back to the same point where you have automation, you can start thinking of the business logic and start thinking about how you want your company to perform to scale and basically work for your customers. Now, when we look at SRE, SRE is now free to start basically looking at the resiliency of the system. Performing more tests, making sure that we are at the amount of nines that we want in terms of resiliency and availability. And SRE gets freed up because of that, right?

Also, when you're talking about automation, you mentioned CI/CD. I wanted to sidetrack to this upcoming concept of GitOps. So, we've heard quite a bit of it recently. There's a company, Weaveworks, I believe, they're really pushing the needle on this. They're looking at this quite a bit. Even that, when we think about automation. So, a lot of that automation, when you're managing ... This entire idea of GitOps, the motivation comes from the rise of Kubernetes, Kubernetes becoming popular, how you would manage your Kubernetes infrastructure. And they're looking at the property of Kubernetes to kind of ... Because it's a defined architecture. I forgot the term. I'm so sorry. You define it in your kubectl and everything. You'd push the infrastructure changes or your code changes. That's when GitOps would basically automate. You'd go from continuous delivery to continuous deployment. It's a push from continuous delivery to continuous deployment.

Whenever I look at continuous deployment, and so whenever I look at GitOps in general, I find it very scary, because, okay, you pushed something. And all of a sudden, all of these things are happening automatically. Your entire infrastructure is just about to change. Your entire code base is about to change. It's always very nice. You have somebody in the middle, in a staging environment. You first push to a staging environment, somebody in the middle.

Jeremy: You test it. Right. Yeah, exactly.

Sarjeel: You test it. Exactly. And then there's a little button that says, "Okay, deploying." And it goes to production, everything is cool. But you're reducing that. At the end of the day, that's what the idea of DevOps is, to try to reduce the manual labor and push for automation. And then I look at GitOps, and I'm like, "Did we just go crazy? Are we going too far?" Then that's where I see, "Okay, you can have ..." GitOps shouldn't only be thought about in terms of, okay, yeah, automating your deployment, but you should also think of it from a perspective of observability. And, again, when we're talking about observability, that's where I believe we can leverage these Lambda functions, because one, your Lambda functions are very light, and they're just being used for monitoring. So, you're continuously monitoring the actual state, as compared to the desired state. And whenever you see the actual state has drifted away from the desired state, again, you can either use Opsgenie as your alert consolidation tool, or you can just trigger an event through EventBridge from your Lambda function, send an event to EventBridge and, again, go and remediate that. Yeah. Basically, just go and remediate that drift away from the desired state.
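Illustration (not from the episode): comparing actual state against desired state is, at its core, a dictionary diff. The sketch below assumes both states have already been flattened into simple key-value pairs; `find_drift` and the example keys are hypothetical. A key that is missing on one side shows up as `None`, so this version cannot tell "missing" apart from an explicit `None` value.

```python
from typing import Any, Dict, Tuple


def find_drift(
    desired: Dict[str, Any], actual: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Map each drifted key to a (desired, actual) pair; empty dict means no drift."""
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want, have = desired.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Desired state as declared in Git; actual state as observed in the cluster.
desired = {"replicas": 3, "image": "api:1.4"}
actual = {"replicas": 2, "image": "api:1.4", "debug": True}
drift = find_drift(desired, actual)
```

A monitoring function could run this on a schedule and, when the result is non-empty, raise an alert or publish an event that triggers remediation.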

So, this is one way. We are looking at automation, and we are trying to find ways to go faster and faster. And what happens is that, okay, as we go faster, we still need to remember to maintain stability, maintain availability. And this is where we see the benefit of, especially, Lambda functions, or Azure Functions or whatever form of FaaS functions you're using. Because when we talk about using FaaS functions in production for your actual code base, there are a lot of limitations that everybody talks about. A lot of edge cases that aren't covered by these services. But in this case, in this regard, it fits perfectly. It's cost-effective, it's easy to spin up, and it's scalable. At Opsgenie, when we want to increase the traffic on a certain API, we simply have several concurrent Lambda functions just bombarding that API with requests and different kinds of requests. This is scalable. It's easy to spin up.

This is exactly where one of the benefits lies. But again, yeah, as I mentioned, in production, you may have some limitations. When you're thinking about it, in terms of microservices and active-active architecture as we talked about before. But when you're thinking about it as ancillary services, and just helping you go through that DevOps pipeline, when you're thinking of it as a glue code, especially, it's really beneficial to use serverless functions.

Jeremy: Yeah. I think all that ties together too. I mean, GitOps is something that just ... CI/CD continuous deployment is one of those things where, yes, it scares a lot of people, because it's just going so fast. It allows you to move so quickly and make changes so quickly. And I think that if you embrace the whole culture, if you embrace the idea of microservices and serverless, deploying very small units of code. Just this idea of test-driven development or being able to have the tests that you need in there, and so forth, the ability for you to roll back quickly, adding in things like chaos engineering to know what happens if we put something out there and it breaks, that we know that the other things will degrade gracefully. Having that capability and kind of following that whole thing. I mean, that's sort of the holy grail of doing this stuff, because it's okay if you break something sometimes, but it should go through a test process and there should be a development environment where you're testing these things against other things. But if something does break, you're isolating it, you're minimizing, you're creating those bulkheads there that are minimizing the impact that it has on a larger scale.

So, we're running out of time and so before we finish, though, I do want to talk about ... I mean, we've been talking a lot about serverless, and EventBridge, and active-active, and DynamoDB and all these great things. It's not like a team can just go ahead and shift tomorrow and start using all this stuff, right? There are a number of barriers to adoption, some of those being just the cultural change in a company, first of all. But also, just this idea of the learning of these tools, and then maybe even the limitations of some of these tools. So, what are your thoughts on some of the barriers that might exist to people who want to adopt, not only DevOps, but maybe DevOps with serverless?

Sarjeel: I can best answer this with a story of mine, or something that I experienced, especially when I switched over to Opsgenie from Thundra. Remember, I was in Thundra serverless. At that time, Thundra was a serverless monitoring tool. Now, it's become much more, of course. At that time, we were focused on serverless. And I was just like, "Oh my god, it's an amazing technology." Then I switched over to Opsgenie and I see, "Okay, we aren't using it that much." And in fact, when I switched over to it, a major functionality of ours, that was initially being built on serverless architecture, the senior engineers rolled back on the decision and went back to EC2 and other forms of container services.

And I asked, "Why did this happen?" I didn't have that much experience, and I really wanted to know what happened. So, I remember, one of the co-founders actually sat me down. He was a pretty cool guy. He actually took me to a whole different meeting room. He sat me down, like, "Okay, I'm going to teach you something now." I'm like, "Okay." He told me that serverless is great, and it is definitely, in some way, the future. But in its current state, we do see a lot of issues. And this is back in late 2018, let's say. Around that time, the maximum run time was five minutes, I believe, for a Lambda function.

Jeremy: Five minutes. Yes.

Sarjeel: Yeah. It is in that same year or a bit later that we saw 15 minutes then. So, that is a huge improvement, I would say. At that time, we weren't ready. Our use case was not the best use case. Or the way we were looking at our use case was not in the best manner to adopt serverless. And I think it's very important to understand the limitations of what you can and cannot build, and how you can get around these barriers, because I feel that there's always a way to get around these barriers. Are you just willing to invest in it? It's not like Opsgenie gave up on serverless. We continued. We still use a lot of serverless components in a lot of areas in Opsgenie. Especially in our DevOps pipeline, for example, when we want to spin up emergency instances, we use Fargate. We were using Fargate. I'm not sure if we are still. We're using Fargate for our SRE, in our logging, and other operations, because it's easy to spin up, and it's cost-effective for that use case. But there are definitely limitations.

What I have noticed, Jeremy, even in the small period from 2018, when I graduated, to now, I have seen ... We have all seen major leaps of improvement. Just mentioning five minutes to 15 minutes runtime, that was a major improvement. I remember there was this conversation, this whole conversation that I had with Emrah Şamdan and how we were looking at, "Okay, we need to think about runaway cost." We now see that the billing has become more granular for a lot of serverless services, for example, Lambda functions, which were billed per 100 milliseconds. Now, it's one millisecond. That in itself is just a huge improvement. I think we see the same thing definitely for a lot of serverless services.
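Illustration (not from the episode): the effect of billing granularity is simple arithmetic, since the billed duration is the actual duration rounded up to the next billing increment. The helper name `billed_ms` and the 12 ms example duration are hypothetical; no prices are assumed, only durations.

```python
import math


def billed_ms(duration_ms: float, granularity_ms: int) -> int:
    """Round a duration up to the next multiple of the billing granularity."""
    return math.ceil(duration_ms / granularity_ms) * granularity_ms


# A short invocation that actually runs for 12 ms:
old_billing = billed_ms(12, 100)  # rounded up to a full 100 ms increment
new_billing = billed_ms(12, 1)    # billed for the 12 ms it actually used

# Fraction of the old billed duration that is still billed at 1 ms granularity.
ratio = new_billing / old_billing
```

For short functions the difference is large: under these assumptions, the 12 ms invocation is billed for 12 percent of what it was before. For long-running functions the rounding matters much less.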

And then there are other things. For example, being able to debug your serverless infrastructure. That in itself is problematic, but we do see a lot of improvements in the industry, for example. And also, a lot of third-party tools are coming up. So, for example, I've been following Thundra's growth as they went from ... They kind of started encapsulating all of ... enabling cloud developers. Recently, they came up with Thundra Sidekick, which is, I think, a very cool feature. If people haven't seen that, I recommend they go check it out. We're definitely looking at it. And this whole community that's coming up to fill in those gaps, and there's still a long way to go. But I still feel that even what we have right now is pretty amazing.

Jeremy: Yeah. No, I agree. And I think that there are limitations. Serverless is not a silver bullet. You're going to run into limitations, but I do see ... I mean, I would recommend to anybody, if you're trying to establish a really good DevOps practice within your organization, or you're just building applications, the services that are serverless, and have those serverless qualities are going to be the ones that make the most sense for you to choose if you can. If you can't, then don't. But if you can, choose those, because that just gives you all of those benefits we've been talking about through this entire episode. And just that ability for you to really own your code, get those CI/CD pipelines to the point where you're delivering multiple releases per day and things like that.

So, Sarjeel, listen, thank you so much for joining me and spending this time and sharing your knowledge on DevOps and serverless. If people want to get ahold of you or find out more stuff that you're working on, how do they do that?

Sarjeel: Well, I'm a pretty open guy. You can just contact me with whatever channel you find. Twitter is great. So, Jeremy, I think you are putting ...

Jeremy: Yes, I'll put the stuff in the show notes. Yeah.

Sarjeel: Yeah. You can contact me through Twitter or on my email. You can find my email on my website, which I think is also going to be in the show notes.

Jeremy: Yep, sarjeelyusuf.me, right?

Sarjeel: Yeah, exactly. So, you can contact me through my Gmail or Twitter, anywhere, and I would just love to talk to anybody. As you realized, Jeremy, I'm also pretty new to this field. There's a lot of learning that I need to do, and I want to do. And so, I would really love for people to reach out, and I would really love to reach out to people and, I guess, brainstorm on a whole bunch of ideas, and especially use cases, because I think it's the use cases that are really driving all of these improvements in serverless.

Jeremy: I totally agree. Well, listen, fresh blood and new ideas, new perspectives are always good things. So, also, if people want to check out Opsgenie, opsgenie.com. But otherwise, we'll put all this stuff in the show notes. Thanks again, Sarjeel.

Sarjeel: Thank you so much, Jeremy. Thanks so much for having me.
