
Serverless Chats

Episode #41: Communication Patterns in Serverless with Paul Swail

About Paul Swail

Paul is an independent cloud architect who helps development teams make the transition to serverless. He has almost 20 years’ experience delivering software solutions to clients across a wide variety of industries. In addition to client consulting, Paul has been running his small SaaS business on AWS for the past 6 years and is slowly migrating it to a fully serverless stack. He writes in-depth articles on serverless in his email newsletter and on his blog at winterwindsoftware.com.

WATCH THIS EPISODE ON YOUTUBE: https://www.youtube.com/watch?v=gf__z3K8LBI
Transcript:

Jeremy: Hi, everyone. I'm Jeremy Daly, and you're listening to Serverless Chats. This week I'm chatting with Paul Swail. Hey, Paul. Thanks for joining me.

Paul: Hey, Jeremy. It's great to be here. Thank you.

Jeremy: You are a cloud architect at Winter Wind Software. So, why don't you tell the listeners a little bit about yourself and what you do?

Paul: Yeah, sure. I'm an independent cloud architect. I work primarily with AWS and I specialize in helping development teams ship their first serverless application into production. I've been focused specifically on serverless for two years or so now, although I have been doing software development professionally for about 19 years in total.

Jeremy: Wow. Still not as old as me, but that's okay. One of the things that you've been focusing on lately... Or, I think you're making this transition into serverless first. So, why don't you tell us a little bit about that?

Paul: Yeah, yeah. My website is currently in the process of being moved to ServerlessFirst.com. Basically, serverless-first is a methodology I've been taking with the clients I've worked with over the past couple of years. They're used to more traditional ways of building server-based apps, and they see the value of using serverless, but that doesn't mean they can use it for everything. By default, your architectural decisions start with serverless services, unless you can justify using something else.

Jeremy: Awesome. Alright, I wanted to talk to you today because... I don't know what happened, but somehow you've become one of the most prolific writers in serverless over the last couple of months, releasing a few articles or blog posts every week. It's been awesome, because every time we get more content, you answer more questions, and you go deep on one particular subject, and I think it's super helpful. One of the things you've focused on quite a bit, I think, has been this idea of communication patterns in serverless applications. You wrote two articles recently. One was called Seven Ways to Do Async Message Processing in AWS, and another was Interservice Communication Channels for Serverless Microservices in AWS. Both great articles. Definitely go to WinterWindSoftware.com and check those out. Very, very interesting stuff. Maybe you can get us started: what are the main communication patterns in serverless, and why is it so different, maybe, than a traditional application?

Paul: Yeah. A lot of my clients have come from a monolithic architectural background, and asynchronous stuff... they may be aware of it, but they haven't used it a lot. AWS has a lot of services around asynchronous messaging patterns, and it can be hard to understand what to choose. So a lot of the writing is just documenting answers to questions that clients have asked me. We can go through the details of the individual services I discussed in the articles.

Jeremy: Yeah. Let's start, though, maybe just thinking about the difference between asynchronous and synchronous, because I think most people are very familiar with that monolithic approach... Maybe I should take a step back. They're used to that request-response type mechanism, right? I make a request to a website or to an API and that data comes back to me. There's that immediate response. That's not going to change whenever you have a customer-facing or web-facing side of things; it's the backend that gets different. When people are familiar with monolithic applications, they think, hey, I've got 15 different methods, or functions, or whatever, that are all in one big application, and I can say, "hey, I need to process the order. I need to pull the inventory. I need to send the message." And that's all in one app, or one big chunk of code, really. But when you start moving to this asynchronous thinking, you start separating out these components. So what do people have to think about when they start building that type of application?

Paul: There are quite a lot of things to think about. Firstly, it depends on the workload. It might be a task- or job-based workflow, or maybe you need to notify a lot of other systems. That's a decision in itself: what service you use, based on the nature of the workload. Then there are a lot of operational considerations around throughput, concurrency requirements, latency requirements, scalability of any downstream systems you may need to talk to, message durability, and your error handling and retry mechanisms. Oh, and cost, of course. These are all things you need to consider around how you structure any asynchronous messaging patterns your workload requires.

Jeremy: Yeah, I think that makes a lot of sense. You get people coming from the monolithic space, or the traditional systems space, and they move into distributed systems. Now there's this whole different idea of passing messages around, right? We're no longer using a single system; we're trying to communicate with multiple systems, and as you said, things like durability of messages become a huge concern. Or error handling: what happens when you send a message off into the ether and you don't know what happened to it? Does it ever get to its destination? How do you know a message was even sent if that information isn't recorded correctly? Another thing that's interesting to me, though, is that even people who come from distributed systems think, oh, I've got to set up a Kafka cluster, or I've got to do something like that. Traditionally, there has been a ton of stuff you would need to do just to create the messaging components between distributed systems, but serverless changes that quite a bit.

Paul: Yeah, it does. I can echo those sentiments. Back in my Microsoft .NET developer days, I used to set up BizTalk servers, something I actually knew how to do, and we did it in a few projects. But even though it was a nice distributed messaging pattern, the provisioning and management overhead made it just so much effort to manage. Whereas with AWS's serverless and async services, it's simple. It's just a few lines of YAML and an sls deploy, or whatever it is you're using to deploy, and away you go. The operational overhead is significantly less.

Jeremy: Yeah. I think that makes a ton of sense. When you're setting up these distributed applications using serverless, and you're using one of the pub/sub services, for example, like SNS or EventBridge, it makes it really, really easy. Maybe let's talk about pub/sub for a second.

Paul: Yeah. For folks who don't know, pub/sub is a publish-subscribe model, a way of decoupling separate services, if you even think of your application as having separate services. Something happens in service A and it can just publish an event. Service B may need to do some processing on that, so it can consume that event, but the actual message communication between them is managed by the pub/sub system. Within AWS, we have two main pub/sub services: SNS and EventBridge. I've been using SNS for a few years now, and I've just started using EventBridge. They have very similar feature sets, although EventBridge seems to be the preferred solution amongst most serverless experts these days, I think mainly because it offers more event targets. A particularly nice feature is the schema registry. A big thing with pub/sub is that there's still a small coupling between your different services around the actual schema: they don't couple directly, but they do need to know what shape the message is going to be. The schema registry that EventBridge provides gives you a way, if you're using a typed language such as TypeScript or Java, to download type definitions based on the events that are being sent out. It means your consuming application knows exactly what to expect, and at compile time you know how to work with your messages.
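
To make the publishing side concrete, here's a minimal sketch (not from the episode) of service A putting a domain event onto an EventBridge bus with the AWS SDK for JavaScript v3. The bus name, source, and event shape are hypothetical.

```typescript
import { EventBridgeClient, PutEventsCommand } from "@aws-sdk/client-eventbridge";

const eventBridge = new EventBridgeClient({});

// Service A publishes a domain event; it knows nothing about the consumers.
export async function publishOrderPlaced(orderId: string, total: number): Promise<void> {
  await eventBridge.send(
    new PutEventsCommand({
      Entries: [
        {
          EventBusName: "app-events",   // hypothetical custom bus
          Source: "com.example.orders", // hypothetical source identifier
          DetailType: "OrderPlaced",
          Detail: JSON.stringify({ orderId, total }),
        },
      ],
    })
  );
}
```

Service B would then subscribe with an EventBridge rule matching on source and detail-type; the bus handles delivery and fan-out to any number of targets.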

Jeremy: Yeah, yeah. And as you said, I think things are going to move to EventBridge. We've seen a lot of noise around this, and I've been using EventBridge in production now for several months, and there are some really, really cool patterns you can do, with huge throughput. I know there's some talk about latency issues and things like that, but it also depends on what your workflow is. SNS still has its place, right? It's still a great pub/sub service that you may need to use. So let's talk about the benefits of asynchronous versus synchronous. Going back to that monolithic example: an order is placed, so I have a function or a method in my system called process order. Process order has to do the inventory, the billing, the messaging, the invoice creation, some of these other things. That all happens through internal method calls, or whatever, and it's easy to call those. When you switch over to distributed systems, you might have a billing service, an invoice service, an alerting service. That's a simple way to think about it if you're doing a container-based system, where you might want to do asynchronous communication that way, but with serverless you're breaking it down into even smaller functions. What are the benefits of being able to split those up into jobs that don't have to immediately give you a response?

Paul: There are several benefits. One is that it's simply easier to reason about if you have a single task. As a developer, you can see just by reading it what it's actually doing, rather than having a huge single function which, as you said, does the orchestration of all the things in code. From a monitoring point of view, say within AWS CloudWatch Logs, you can go into a function's logs to see exactly what happened or what went wrong. A major benefit is around retries. Say you have a multi-stage workflow with steps A, B, C, and D, and step A succeeds but step B fails. If that was a single Lambda function carrying out all four steps, you would have to retry it all, or you would have to build how that gets handled into your own logic. If you split it up into four separate asynchronously invoked Lambdas, the Lambda service itself will do the retries for you. In my example there, if step B fails, Lambda A will have completed successfully, that's done, but B will be automatically retried if you wish it to be. So that's another benefit, around errors and retries. The most obvious one, which I forgot to say, is latency. If you have a user interface making an API call, and all of this is behind it, the user probably doesn't need to wait for all four tasks to complete. With an API Gateway call, you can just write to a queue or to a pub/sub system and return to the user. All the rest of the processing happens in the background, and the user gets a quick response.
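
A minimal sketch of that write-to-a-queue-and-return pattern (not from the episode; the queue URL environment variable and payload shape are hypothetical), using the AWS SDK v3 for Node.js:

```typescript
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";
import { APIGatewayProxyHandler } from "aws-lambda"; // types only
import { randomUUID } from "crypto";

const sqs = new SQSClient({});

export const handler: APIGatewayProxyHandler = async (event) => {
  const orderId = randomUUID(); // gives the caller a handle to check on later

  // Hand the heavy lifting off to a queue; workers process it in the background.
  await sqs.send(
    new SendMessageCommand({
      QueueUrl: process.env.ORDER_QUEUE_URL!, // hypothetical env var set at deploy time
      MessageBody: JSON.stringify({ orderId, payload: event.body }),
    })
  );

  // Respond immediately; billing, invoicing, etc. happen asynchronously.
  return { statusCode: 202, body: JSON.stringify({ orderId }) };
};
```

Returning 202 Accepted with an ID means the user gets a quick response while the rest of the workflow runs behind the scenes.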

Jeremy: Yeah, I totally agree. That idea of immediate response is huge, because everybody wants stuff back quickly. There's no reason why that invoice or that credit card necessarily has to be processed immediately. I mean, you submit an order on Amazon, and it says we've received your order, and then you usually get something later on that says, oh, we couldn't bill your credit card, or something like that. The other thing I think is really great about splitting things up into separate functions is not only the security aspect, where each one can have finely tuned permissions, but also this idea of being able to scale each one of those things independently.

Paul: Yeah, that's right. Say one of your tasks talks to a downstream system, like an RDS database or a third-party API, which may not scale as well as the serverless services. You may want to throttle it using an async channel, say an SQS queue: you can use that to throttle your throughput to that system. That's scaling things down, as such, but you can also go the other way. If you don't need to throttle, you can use SNS and a fan-out pattern to distribute work as widely as possible, say for a task which writes straight into DynamoDB, or something else which can match the scaling that Lambda gives you.
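
Here's a minimal sketch of that throttling side, assuming an SQS-triggered Lambda. The queue absorbs bursts, and a reserved concurrency limit set at deploy time (the value here is hypothetical) caps how many workers hit the downstream system at once.

```typescript
import { SQSHandler } from "aws-lambda";

// Deploy-time assumption: this function is configured with, e.g.,
// reservedConcurrency: 10, so at most 10 copies run in parallel
// no matter how fast messages arrive on the queue.
export const handler: SQSHandler = async (event) => {
  for (const record of event.Records) {
    const job = JSON.parse(record.body);
    await writeToDownstreamSystem(job); // hypothetical helper for the RDS/API call
  }
};

async function writeToDownstreamSystem(job: unknown): Promise<void> {
  // ... the slow RDS insert or third-party API call goes here ...
}
```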

Jeremy: Right. Yeah, I do a talk on this, about downstream systems not being able to handle the amount of pressure that might come from certain workloads. I think that's actually a really important thing to consider. The great thing is that just by putting a concurrency limit on a particular function, you can say, "Look, I only want 50 concurrent connections," or, "I only want 100 concurrent connections." That can seriously help when you're trying to throttle against a downstream API or, like you said, an RDS cluster, or something like that. That's one of the things, too, where I think people get a little bit confused. They say, "Well, normally, if I make a request to an API and I've got to call the billing service..." Somewhere in between there, if that thing is being throttled and I have to respond back to the customer and say, "Hey, I'm throttling this," that's a bad experience. But with the async piece here, putting something in between, like SQS, to store messages and create durability, that's something where people get a little bit lost on how it works.

Paul: Yeah, it can be quite confusing. If you want to be fully asynchronous back to the user in that example, you could even introduce WebSockets. You can have the initial API call from the client just write to the queue, have all your async processing happen server side, and then separately do a WebSocket push back to the client once it's ready.
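
For that push-back step, a minimal sketch using API Gateway's WebSocket management API with the AWS SDK v3. The endpoint variable, and the idea that connection IDs were stored when clients connected, are assumptions rather than details from the episode.

```typescript
import {
  ApiGatewayManagementApiClient,
  PostToConnectionCommand,
} from "@aws-sdk/client-apigatewaymanagementapi";

// Endpoint looks like https://{api-id}.execute-api.{region}.amazonaws.com/{stage}
const client = new ApiGatewayManagementApiClient({
  endpoint: process.env.WEBSOCKET_ENDPOINT, // hypothetical env var
});

// Called by the background worker when processing finishes.
export async function notifyClient(connectionId: string, result: unknown): Promise<void> {
  await client.send(
    new PostToConnectionCommand({
      ConnectionId: connectionId, // assumed to be stored (e.g. in DynamoDB) on $connect
      Data: Buffer.from(JSON.stringify({ status: "done", result })),
    })
  );
}
```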

Jeremy: Yeah, that makes total sense. All right, let's talk about billing for a second. What reaction are you seeing from your customers to this pay-per-use billing? I think in most cases we're not afraid of these $0.05 Lambda bills, but when you start adding on API Gateway, and things like DynamoDB and SQS... I saw an article the other day about an application where Lambda cost $0.01 and SQS cost $1.83. So other services besides Lambda can get kind of expensive. How are your clients seeing this and planning for the cost?

Paul: Yeah. With serverless, it's more about the variability than the actual absolute price. It's generally low, and it's generally negligible in pre-production scenarios. It's the variability, especially with large asynchronous workloads that can fan out to lots of invocations. One of my clients is a dev agency themselves, so they build a lot of apps with totally different workflow patterns. At the start of each project we would often just sit with an Excel spreadsheet, plugging figures in to see, based on expected usage and the current architectural design, how much it could be. It could vary quite significantly. API Gateway is one of the most variable of the serverless services, and sometimes clients have existing infrastructure that gets hit as well. I know you did an episode recently with the API Gateway team about the new HTTP APIs. I haven't tried those out yet. I think they're about a third of the cost, so hopefully that will help on that front.

Jeremy: A little bit less, actually. It's $1.00 per million, as opposed to $3.50. And actually, you bring up API Gateway... I think one of the things we can't lose here, and we mentioned this a little earlier, is that there are still synchronous use cases, right? We can't just throw away synchronous altogether. Sometimes synchronous isn't just the front-end web API. We might need synchronous invocations, and in many cases even an asynchronous process needs to make a synchronous call to something like the Stripe API, or maybe to another microservice. We certainly don't want to chain synchronous invocations, right? We don't want to call our API through API Gateway, which then calls some other service, which calls another service. Decoupling those makes a lot of sense, but you can't always get around it. So what are some of the pros and cons of those synchronous patterns?

Paul: The main pro is that the client gets an immediate response. If the client needs an immediate response, say a GET request that's simply fetching from a database, it has to be synchronous. Another benefit is that it's easier to reason about. From a developer debugging point of view, I still find synchronous calling patterns easier: the logs are generally easy to find, and you don't need to look through log files for separate Lambda invocations. So from that point of view, synchronous is still easier to monitor and debug, I would say. You could possibly argue that it's easier to author in the first place, too. Developers who like writing JavaScript or Python code, rather than lots of YAML configuration, would probably be happier writing simple synchronous code, but we try to move folks away from that. Generally, I would recommend: if you can do it asynchronously, do it asynchronously. If it's an HTTP API that does a write, just write something to a queue, or to SNS, or to EventBridge, or to DynamoDB, then return, and have any other processing done in the background.

Jeremy: Yeah, especially when it's a POST request, right? For any type of write request you can return something back. If you have to return an ID to somebody, then generate a UUID, return that, and also submit it with the job so it gets associated with that job and you can look it up later. And if it's a GET request, there's a lot of caching you can do. I think people don't take advantage of the CloudFront caching through API Gateway, where you can change the amount of time something gets cached for, to minimize impact on the API. There's a whole bunch of interesting tricks you can do there. Another thing that comes up once you start using multiple Lambda functions is this idea of function composition. We talked about the asynchronous patterns, where one function generates an event, maybe that goes to EventBridge, and then another one is listening for it. But what are your thoughts on Lambdas invoking Lambdas? There are two ways to do it, synchronously and asynchronously, but maybe your thoughts on each?

Paul: Okay, let's take synchronous first. I'd generally advise against this. The reason is, say Lambda A invokes Lambda B and waits on the response. As soon as that invocation happens, the clock is running on two invocations, so you're paying for both functions while they run. You've also got two places to look for the logs. People who are fond of reusing functions at a code level, rather than at the Lambda function level, may think, oh, I've got this piece of functionality which I want to reuse; I can just create a Lambda function for it and call that from all the other Lambda functions where I need it. Generally, that's not what you want to do. Just use a code module or a code library and reuse it that way. There's one exception where I have invoked a Lambda synchronously, and that's when I'm using a VPC. I had a use case recently with a scheduled job that runs nightly, for a SaaS app that sends out nightly e-mails. To do that, it needs to query an RDS database, which is inside a VPC, and send to all the users who match a certain filter. There are a few things going on there: a CloudWatch rule to trigger the Lambda, then a database query, and then I think I was publishing to SNS or using SES to send the e-mail at the end. If I had that as a single Lambda running inside the VPC, I couldn't then call the e-mail service, because a Lambda inside a VPC doesn't have internet access by default. So in that case, I put a Lambda inside the VPC just to do the database query. The calling function invoked it, got the result back, and was then able to send the e-mail because it had internet access. That's the only sort of exception where I think it's valid, though some people may argue it's not even valid and-
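
A minimal sketch of that exception pattern, assuming the AWS SDK v3: a function outside the VPC synchronously invokes a small query function that lives inside the VPC. The function name and payload are hypothetical.

```typescript
import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

// Runs outside the VPC, so it has internet access for the e-mail service.
export async function sendNightlyEmails(): Promise<void> {
  // Synchronously invoke the VPC-bound function that queries RDS.
  const response = await lambda.send(
    new InvokeCommand({
      FunctionName: "queryUsersInVpc",    // hypothetical function name
      InvocationType: "RequestResponse",  // synchronous: wait for the result
      Payload: Buffer.from(JSON.stringify({ filter: "nightly-digest" })),
    })
  );

  const users = JSON.parse(Buffer.from(response.Payload!).toString());
  // ... send the e-mails via SNS/SES or similar, now that we have the results ...
}
```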

Jeremy: So, I actually think that's a really good use case. The only other thing I would suggest is maybe using the RDS Data API, if you were on an Aurora Serverless database. That's actually one of the things I do. I have a reporting service, and I think it's a cool setup: all the operational front ends are DynamoDB, with DynamoDB streams attached that replicate the data to an Aurora Serverless cluster, and then a reporting service runs queries against that through the Data API. Surprisingly, the Data API is extremely fast. It's not the fastest sometimes when you're doing synchronous stuff, but it works pretty well, and of course it avoids that VPC issue. You mentioned this idea of separating functions into code modules or code libraries. Again, you might have a function that charges the credit card and another function that creates the invoice. I'm using the same examples, but there's a really, really good argument, and I know LEGO does this, I know Bustle does this, for creating these fat Lambdas, where Lambdas do more than one thing because you need that synchronous component. Maybe the one I just mentioned is probably not the right example, and in most cases you don't want to be calling Lambdas between Lambdas. But if you find yourself with two separate pieces of logic that need to happen together, this idea of fat Lambdas is, I think, really interesting. You just put those two things together and say, "I'm going to run these two snippets of code in the same Lambda function because they benefit from that."
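
A minimal sketch of that fat-Lambda idea (all names hypothetical): two logically separate steps kept as plain code-level functions, composed in one handler because the second must not run unless the first succeeds.

```typescript
// Two logically separate steps kept as plain functions (code modules),
// composed in one "fat" Lambda because they always run together.

interface OrderEvent {
  orderId: string;
  amount: number;
  customerId: string;
}

async function chargeCard(customerId: string, amount: number): Promise<{ chargeId: string }> {
  // ... call the payment provider (e.g. Stripe) here ...
  return { chargeId: "ch_123" }; // placeholder result
}

async function createInvoice(orderId: string, chargeId: string): Promise<{ invoiceId: string }> {
  // ... persist the invoice record here ...
  return { invoiceId: "inv_456" }; // placeholder result
}

export async function handler(event: OrderEvent): Promise<{ invoiceId: string }> {
  // The invoice must reference a successful charge, so the two steps
  // run in the same invocation rather than as chained Lambdas.
  const { chargeId } = await chargeCard(event.customerId, event.amount);
  return createInvoice(event.orderId, chargeId);
}
```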

Paul: Yeah, yeah, I would do it like that. Although I guess fat Lambdas can be misunderstood; I'm not sure I would even have called that a fat Lambda, as such.

Jeremy: Well, I break Lambdas into three categories. There's the single-purpose function, which is the one we try to favor. Then you have the fat Lambda, which takes multiple actions and puts them into a single function to optimize it. Then you have the Lambda-lith, where your entire application runs in a single Lambda function. So I think fat Lambdas are an optimization and not necessarily a bad pattern, because sometimes you need to do that.

Paul: Oh yeah, absolutely. There's no point in being dogmatic about it. I would generally say single-purpose Lambda functions, but if it's something which will always have two natural steps, and it doesn't make sense to split them apart into their own Lambdas, then go for it.

Jeremy: When it has to be synchronous, yes. The other time I think a synchronous API call to another Lambda function isn't the worst is when you're doing interservice communication and you need a synchronous request. If you have a customer service and an order service, sometimes that order service needs to look up a customer. When that happens, you definitely don't want to route that back out through the API gateway, or something like that; that just gets overly complex. You do build coupling in, but you build coupling in all kinds of places. Think about calling the Stripe API or the Twilio API: your service is bound to, or coupled to, that other API. I think that's okay in certain circumstances, as long as you have fallback methods in place. If the order service has to call the customer API, that's fine as long as you maintain some sort of contract between the two, and as long as you know the order service could fail that particular call and retry once the other service comes back up. Of course, you'd want circuit breakers and all that kind of stuff in there too. But what are your thoughts on that?

Paul: Yeah, I think that's a totally valid pattern. If you have two separate services and you don't want to introduce the coupling, you could have an event-based model where each service keeps a copy of the other's data, as such, so it doesn't have to make those synchronous calls. But sometimes that introduces problems of its own, just keeping the data in sync.

Jeremy: You've got to get the data the first time, right. So, it's okay to maintain a copy of the data but you still have to get that data the first time.

Paul: Yeah. Well, I guess you could have an event-based pattern where the service with the data, in the first place, could publish an event and-

Jeremy: Yes. You just...

Paul: You could transfer it asynchronously like that. I'm not a massive fan of that approach either. Sometimes it is just easier to make the synchronous call, as long as you don't have too many. If it's one or two between services, that's okay. Once you have more synchronous interservice channels than that, you start losing the advantages of having microservices.

Jeremy: Right. Yeah, I agree, because too much coupling can cause some problems. What about asynchronous calls to other Lambda functions? One of the complaints you hear is that even if you're chaining Lambda functions and everything happens asynchronously, you're introducing a lot of coupling, and you really reduce the amount of reuse you get from a single Lambda function.

Paul: I can understand the reasons for that. I don't know if you're alluding to Lambda destinations, which AWS announced a few months ago. For any asynchronously invoked Lambda function, you can configure another Lambda function to always take the result of the first invocation and have it passed into the event of the next one. I can see the coupling there, but from a reuse point of view, I'm not sure; I think you can still synchronously call the function directly.

Jeremy: You could and actually...

Paul: You won't get the side effect of the destination being invoked.

Jeremy: Right. Yeah, and I should clarify this too, because essentially the point I'm getting at is what happens when you hard-code it into your code. If every Lambda function that gets invoked, whether synchronously or asynchronously, always has a hard-coded next step... I mean, you could put a whole bunch of logic in there if you wanted to, but there's always that problem of the function being invoked and something happening so that the next call doesn't happen. Which is where Lambda destinations come in. I love Lambda destinations. I think it's a great pattern to have another Lambda function or, in most cases, EventBridge, or something like that, handle the output of an asynchronously invoked function. What I'm really talking about is hard-coding the workflow into Lambda functions, saying when this Lambda function is done it calls this Lambda function, and when that one is done it calls this other one. I think you introduce a whole bunch of challenges with that approach.
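
For reference, a minimal sketch of wiring an on-success destination at deploy time rather than in code, here using the AWS CDK (Serverless Framework and SAM have equivalents); the stack and function names are hypothetical.

```typescript
import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as destinations from "aws-cdk-lib/aws-lambda-destinations";

const app = new cdk.App();
const stack = new cdk.Stack(app, "OrdersStack");

const nextStep = new lambda.Function(stack, "CreateInvoice", {
  runtime: lambda.Runtime.NODEJS_18_X,
  handler: "createInvoice.handler",
  code: lambda.Code.fromAsset("dist"),
});

// On a successful async invocation, the result is delivered to nextStep;
// the workflow lives in configuration, not hard-coded in the function body.
new lambda.Function(stack, "ChargeCard", {
  runtime: lambda.Runtime.NODEJS_18_X,
  handler: "chargeCard.handler",
  code: lambda.Code.fromAsset("dist"),
  onSuccess: new destinations.LambdaDestination(nextStep),
});
```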

Paul: Yeah, I guess you do. And there are Step Functions, which for certain workflows make sense to solve that problem. If you have a workflow which is pretty well defined, like a business workflow, it may make sense to put, say, 10 individual Lambda functions together via a Step Functions state machine. You can still invoke the individual Lambda functions separately if you need to, or you can use Step Functions to compose them, passing the output of one to the input of another, or fanning out with more complex mechanisms if you need to.
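
A minimal sketch of that composition, again with the CDK (function names hypothetical): the state machine owns the workflow and per-step retries, while the functions stay individually invokable.

```typescript
import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as sfn from "aws-cdk-lib/aws-stepfunctions";
import * as tasks from "aws-cdk-lib/aws-stepfunctions-tasks";

const app = new cdk.App();
const stack = new cdk.Stack(app, "OrderWorkflowStack");

// Helper to declare a single-purpose function (hypothetical handlers).
const fn = (id: string) =>
  new lambda.Function(stack, id, {
    runtime: lambda.Runtime.NODEJS_18_X,
    handler: `${id}.handler`,
    code: lambda.Code.fromAsset("dist"),
  });

// Each step's output becomes the next step's input; retries are
// declared per step instead of living inside the function code.
const definition = new tasks.LambdaInvoke(stack, "ChargeCard", {
  lambdaFunction: fn("chargeCard"),
  outputPath: "$.Payload",
})
  .addRetry({ maxAttempts: 3 })
  .next(
    new tasks.LambdaInvoke(stack, "CreateInvoice", {
      lambdaFunction: fn("createInvoice"),
      outputPath: "$.Payload",
    })
  )
  .next(
    new tasks.LambdaInvoke(stack, "SendEmail", {
      lambdaFunction: fn("sendEmail"),
      outputPath: "$.Payload",
    })
  );

new sfn.StateMachine(stack, "ProcessOrder", { definition });
```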

Jeremy: Right. I say this all the time, but for complex workflows and for function reuse, Step Functions are great. If you think about your traditional monolithic application, you say, I've got to process the order, charge the credit card, create the invoice, do all these things, and you need all of those steps to complete. You need that guarantee. If you can do that asynchronously, meaning you don't have to immediately respond to the client and say all this stuff is done (and the more complex systems get, the harder it is to respond in a short amount of time), then using something like Step Functions is great, because you can set up your retries separately. If one step fails, or a whole bunch of them fail, you can implement a saga pattern and go back and unwind all of them. There are a lot of really cool things you can do with Step Functions. The Lambdas are all invoked synchronously too, so you know exactly what's happening as the state machine processes them. So, when we get beyond basic messaging between systems in distributed systems, and more so in serverless systems... We've been talking about passing messages back and forth and invoking one Lambda function at a time, and you mentioned polling, SQS, and fan-out. Another very powerful way to communicate in distributed systems is to use queuing or streaming. We have SQS queues, Kinesis streams, DynamoDB streams, things like that. For the benefit of the listeners, what are the differences between those, and when and why would you use different ones?

Paul: The benefit of Kinesis is that you get a long backlog of events. As an example, I have a SaaS product which does website click tracking, so the events come through from different websites at a high throughput. We need to capture them quickly, but the processing doesn't need to happen that quickly; it can happen gradually. Kinesis potentially gives you a long backlog of events, similar to the way a queue does. But unlike a queue, where you just have a single processor pulling items off, with a stream you can have multiple subscribers. In a way, it's a combination of a queue and pub/sub in that respect: you can have multiple subscribers processing messages off the stream, and other consumers can consume the same message. DynamoDB streams are similar in that, if you already have an application using DynamoDB as your application database, but you need to react to certain data being put into your system, DynamoDB streams might be a good fit. It gives you an asynchronous event model based on an item being added, updated, or deleted in your DynamoDB table. A separate job can then get notified about that event and do whatever processing you need. A drawback of DynamoDB streams is that the event schema you get is quite specific to DynamoDB. It's your raw DynamoDB item, so your consuming service effectively needs to know your database schema, rather than getting a nice friendly domain event schema.
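
To illustrate that drawback, a minimal sketch of a DynamoDB streams consumer: the records arrive in DynamoDB's attribute-value format, so the consumer has to unmarshall raw table items rather than receiving a domain event. The attribute names are hypothetical.

```typescript
import { DynamoDBStreamHandler } from "aws-lambda";
import { unmarshall } from "@aws-sdk/util-dynamodb";

export const handler: DynamoDBStreamHandler = async (event) => {
  for (const record of event.Records) {
    if (record.eventName !== "INSERT" || !record.dynamodb?.NewImage) continue;

    // The stream exposes the raw table item, e.g. { orderId: { S: "123" } },
    // so this consumer is coupled to the table's schema, not a domain event.
    const item = unmarshall(
      record.dynamodb.NewImage as Parameters<typeof unmarshall>[0]
    );

    console.log("New item written:", item.orderId); // hypothetical attribute
  }
};
```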

Jeremy: Right. Yeah. The other thing, as you mentioned, is the ability to process the same message twice, which is sometimes something you want with Kinesis or DynamoDB streams. You might want multiple consumers reading that same backlog, as opposed to something like SQS, where you can have multiple or parallel Lambda functions reading off the queue, but once you read a message and remove it from the queue, it's gone. The streaming services, like you said, store the events for a certain amount of time, so you can go back and read through them. So, it depends. When would you suggest somebody use an SQS queue over Kinesis?

Paul: In most cases, SQS would be the default. If you have a job which needs one-to-one processing, where you're only ever going to have one downstream processor, then SQS makes sense. It has proper serverless, pay-per-use pricing, unlike Kinesis, which has a shard-based pricing model, so you're paying by the hour rather than per use. In general, in my own applications and for clients, I have used SQS a lot more than Kinesis. But if you have a large volume of events coming in, and you need multiple processors or multiple subscribers, then Kinesis certainly makes sense.

Jeremy: Makes sense. Let me ask you one more question about developers building asynchronous applications. A lot of things to think about, right? Lot of pitfalls whenever you're building a new type of system, and of course lot of pitfalls in distributed systems. What are the main things that somebody that's now building things asynchronously really has to think about? What are the big ones you have to be worried about?

Paul: There are a few things. Number one, I would say distributed tracing. If you have a multi-step use case where there's a lot of data processing going on in the background, you probably now have multiple log files to search through, whereas if you were doing it synchronously, you could just look in one place, more often than not. There are strategies around using correlation IDs within each message, so that in CloudWatch, say, you can query for that correlation ID and get an aggregate of any log entries across your different log groups which contain it. But you need to build that into your application, into your Lambda code; it doesn't come out of the box. Another consideration is writing automated tests. It's just harder for asynchronous workflows. Say I'm writing in Node.js with the Jest test framework. If I have a synchronous Lambda, it's generally pretty easy to write an integration test: you just hit the endpoint or invoke the Lambda function and verify the response. If you have a multi-step asynchronous data processing workflow, you need to test each of those steps individually, and writing an end-to-end test is difficult. I like having a wait step that just waits until the background processes have, you hope, completed, and then you can do whatever verification steps you need. Then there's general understandability. It's not a thing in itself, but a lot of the teams I've worked with are more full-stack web developers who are used to monolithic synchronous workflows. If you get a new developer on your team, you have to explain how each piece of the pie fits together. That's just going to take time, and documentation really is the only solution. Good documentation is important when you've got these asynchronous workflows.
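
As a sketch of the correlation-ID strategy Paul describes (the field name and logger here are hypothetical; as he says, none of this comes out of the box): every message carries the ID, and every log line includes it, so CloudWatch Logs Insights can aggregate entries across log groups.

```typescript
import { SQSHandler } from "aws-lambda";
import { randomUUID } from "crypto";

// Minimal structured logger: every entry carries the correlation ID,
// so a CloudWatch Logs Insights query such as
//   filter correlationId = "abc-123"
// finds related entries across all log groups.
function log(correlationId: string, message: string, data?: unknown): void {
  console.log(JSON.stringify({ correlationId, message, data }));
}

export const handler: SQSHandler = async (event) => {
  for (const record of event.Records) {
    const body = JSON.parse(record.body);
    // Reuse the upstream ID if present; otherwise start a new trace.
    const correlationId = body.correlationId ?? randomUUID();

    log(correlationId, "processing message");
    // ... do the work, and include correlationId in any message
    // this function publishes onward ...
    log(correlationId, "done");
  }
};
```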

Jeremy: Well, the good news is that for distributed tracing there are a lot of options out there now, but writing good tests and good documentation, unfortunately, is a challenge for most organizations. Anyways, thank you so much. Let's leave it there. Paul, thanks again for being here. I really appreciate you sharing all your serverless knowledge. The amount of stuff you've been writing is awesome; I really enjoy it. New tips a couple of times a week, it's great. How do listeners find out more about you if they want to subscribe to your newsletter or some of those other things?

Paul: That's great, Jeremy, and thanks for having me. You can get me on social media, on Twitter and LinkedIn, @paulswail, and on my website, serverlessfirst.com. My newsletter is there too.

Jeremy: Awesome. Alright, well, I will get all that into the show notes. Thanks again.

Paul: Super. Thank you Jeremy.

THIS EPISODE IS SPONSORED BY: Stackery
