Serverless Chats

Episode #26: re:Inventing Serverless with Chris Munns

About Chris Munns:

Chris Munns is the Senior Manager of Developer Advocacy for Serverless Applications at Amazon Web Services, based in New York City. Chris works with AWS's developer customers to understand how serverless technologies can drastically change the way they think about building and running applications at potentially massive scale with minimal administration overhead. Prior to this role, Chris was the global Business Development Manager for DevOps at AWS, spent a few years as a Solutions Architect at AWS, and has held senior operations engineering posts at Etsy, Meetup, and other NYC-based startups. Chris has a Bachelor of Science in Applied Networking and System Administration from the Rochester Institute of Technology.


Transcript:

Jeremy: Hi, everyone. I'm Jeremy Daly and you're listening to Serverless Chats. This week, I'm chatting with Chris Munns. Hey Chris, thanks for being here.

Chris: Hey, Jeremy. Thanks for having me.

Jeremy: You are the Senior Manager of Developer Advocacy for Serverless at AWS. Why don't you tell the listeners a little bit about your background and what you do in that role?

Chris: For sure. Definitely. Going back to the earlier parts of my career, I started as what I guess I would have considered a sysadmin. Maybe these days you would call it a DevOps engineer or an SRE or something like that. I took care of servers and infrastructure, a jack of all trades across the stack below the application. Then, just about eight years ago, I first joined AWS as a solutions architect, did that for a couple of years, actually went back out to a startup, and then came back again. For the last three years, I have been a developer advocate for Serverless at AWS.

Then, just in the last year or so, I've actually built out a team of people that are all over the globe. What we do as a team is create a lot of content, deliver a lot of content, and do a lot of interacting with our customers, trying to share the good word about Serverless and get people over the challenges they have in understanding the various aspects of our platform, I would say. You'll see a lot of our stuff show up in webinars, on Twitch, in blog posts, at conferences, on social media, and all that stuff. I would say the next biggest part of what we do is act as a voice of the customer back to the product teams. We are embedded in the product organization; we have influence over what product is built and, to a degree, how it's built. We want to make sure that our customers' concerns, the things they're trying to solve, the challenges that they have, are being properly represented back to the product organization.

Jeremy: Great. All right. We are actually live in Las Vegas at the Big Show, as AWS fans, I guess, would call it. We're at re:Invent 2019. There have been a ton of announcements so far this week, and I think we're pretty much done; we've hit the max on cognitive load for the number of serverless announcements that have come out. There are a whole bunch of them that I want to talk about, and we can get into some of these in detail. There were some really great ones that I think solve a lot of customer pain points. What do you think are the biggest announcements that have come out so far? Maybe not just at re:Invent, but also in the last couple of weeks? Because there have been a ton of announcements in the last few weeks as well. What are your thoughts on that?

Chris: Yeah, it's been a really hectic period for us in the serverless organization at AWS; in the last two weeks, a whole bunch of things. Really, I like to boil it down to four key big things that we've launched or announced in the last, say, three months that I think take on some of the biggest challenges that our customers have. The first was back in September, when we announced that we were going to be changing the way that VPC networking worked for your Lambda functions. We announced this new concept of what we call a VPC-to-VPC NAT. It's built on an internal technology called Hyperplane, which is part of the advanced part of our networking stack.

As of last week, the week before re:Invent (we had Thanksgiving here in the United States), we actually finished the rollout to all the public regions that we have across the globe. It's taken some time to get this rolled out; it's actually a really huge infrastructure shift. Basically, what this did was drastically lower the overhead of having your functions attached to a VPC for cold starts. We had examples where it was shaving 8, 9, 10 seconds off of that initial cold start pain. It also reduces the total number of ENIs needed, so it's a really huge one. That's the biggest one; it's out in all public regions today globally, and customers are just seeing the benefits of it.

The next is that on Tuesday of this week, we announced a capability in Lambda called Provisioned Concurrency. You and I have some fun history with this: it was almost two years ago, at a startup event in Boston, maybe it was? Where I talked a little bit about some of the pre-warming hacks, and then you and I just went back and forth on it for a while, and you launched your-

Jeremy: Lambda Warmer.

Chris: Lambda Warmer project, which has become the de facto standard. We've come full circle here, you and I, like, I don't know, two years later almost?

Jeremy: Right. It's actually funny because I wrote a blog post, like an open letter to the Lambda team, that was asking for provisioned concurrency. You and I had this conversation way back when, and you said, "Well, we really don't want to do that, because we want to improve the cold starts and get those down." I'm actually really glad that the team at AWS did that, because I think if you had gone with provisioned concurrency back then, the need to make those improvements wouldn't have been there. It pushed you and your team and the engineers there to get those cold starts down and work on that. But now provisioned concurrency adds a whole new level, which again, I think is great. Do you want to talk about that now?

Chris: Yeah. We'll riff on that.

Jeremy: Okay.

Chris: I saw some commentary that people thought this meant we were giving up on continuing to improve cold starts. We want to be really clear that that's not the case. This is a knob or lever that you can turn that changes the way functions are... it's difficult to use the term pre-warmed, but effectively, they are pre-baked up through the init phase of the function lifecycle. Throughout the year (and I made a couple of tweets about this in the first half of the year), there have been places where we've shaved tens of milliseconds off of some part of the overhead of the platform, or we've lowered jitter on various aspects of it. There's a lot of that stuff that continues to happen, and actually, I talked about this at Serverlessconf New York City back in October.

I basically said one of the key benefits of Serverless is that it just keeps getting better for you. There are basically three ways that happens. One is all the stuff we do behind the scenes that you just never see. The second is the stuff that we launch and tell you about, but it's just automatic; there's no opt-in, there's no option you need to take to enable it. That's basically what the VPC improvement looked like. Then the third is where we give you an option, where we say, "Hey, for certain things, you're going to want to make a conscious decision whether or not to turn this on." Provisioned Concurrency is an example of that third one. We see it primarily being for interactive or synchronous-based workloads: primarily APIs, chat bots, things like that.

Again, what it does is provide, I think, a more trusted solution than some of the things that you and I had talked about, when you were like, "We can do this. It's a little hacky, you run this warming logic in your handlers and you do this other thing." This is going to give folks a much more consistent method for doing this, and the outcome of that method is greater consistency, lower latency, and potentially even lower costs. That's one of the interesting aspects about this: when we were looking at pricing, we didn't want to make this be a penalty for performance. We consider it a premium feature, for sure. That's mostly because we don't think that everyone needs it. There are only certain folks that really have this extremely low latency requirement.

By and large, and this is another thing we've talked about in the past, cold starts impact very, very, very few invocations. The biggest misunderstanding about cold starts, and how they impact a workload, is that really, below the 98th percentile or so of traffic, you just don't see it.

Jeremy: You're not going to see it.

Chris: Whereas that last two percent is something that some people really care about a lot, I should say. Provisioned Concurrency helps solve that for them.
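
For anyone who wants to try the knob Chris is describing, here's a minimal sketch of configuring it with boto3; the function and alias names are made up for illustration:

```python
import boto3

lambda_client = boto3.client("lambda")

# Provisioned concurrency applies to a published version or alias,
# never to $LATEST. "prod" is a hypothetical alias.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-api-function",
    Qualifier="prod",
    ProvisionedConcurrentExecutions=10,
)

# Environments are initialized ahead of time; the config reports
# READY once they have all been pre-baked through init.
resp = lambda_client.get_provisioned_concurrency_config(
    FunctionName="my-api-function",
    Qualifier="prod",
)
print(resp["Status"])  # ALLOCATING, then READY
```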

Jeremy: Yeah. I actually liken this, and maybe this is not the best way to think about it, but I think a good mental model, to buying reserved instances of Lambda functions, in a sense. You pay a little bit upfront, but the execution is a little bit cheaper. I look at it that way, but I totally agree with you on the cold start thing, where for almost 99% of what you're doing, cold starts will never come into play and it's not that big of a deal. I can see there being workloads that are fairly consistent, that are fairly heavy, where you might want to... I think the pattern here might be to slightly under-provision your concurrency, so that you're almost always using 100% of that provisioned capacity. Then, the other thing that I thought was interesting is some people have pointed out, they're like, "Well, why not just do the Lambda Warmer or the CloudWatch ping technique?"

Chris: Yeah.

Jeremy: What's different is that every time something like my project has to call your function, you're using one of those concurrent connections in order to do it. The way it would warm multiple concurrent environments was by actually running a wait timer to keep the other ones open; then there were enough that you could actually build up that concurrency. The problem with that is that it ties up that concurrency.

Chris: It's blocking.

Jeremy: It's doing blocking, right? Whereas this provisioned concurrency doesn't do blocking; you always have those available.

Chris: Again, I think that was one of the things where we never formalized the warming hack model in any real way. There was also this unwritten rule, what I call the 5/15 rule, where we said, "We keep functions, or execution environments, warm five minutes outside of VPC and 15 minutes inside of VPC." That was before the VPC networking improvement. That was before a bunch of other things we might have coming out that might even make those times dynamic. The 5/15 rule might go completely away, and then people might have to get even more creative and do all this other hacking, and do all this other stuff. Yes, the warming model that you and I have talked about was blocking; it effectively could be detrimental to customer requests in and of itself.
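
For context, the blocking warming hack being discussed worked roughly like this; a simplified sketch of the idea, not the actual Lambda Warmer code:

```python
import json
import time

def handler(event, context):
    # A scheduled rule invokes this function N times concurrently.
    # Each copy sleeps briefly so the invocations can't all be served
    # by one execution environment; while sleeping, each environment
    # is blocked and unavailable for real customer traffic.
    if event.get("warmer"):
        time.sleep(event.get("delay", 0.1))
        return {"warmed": True}

    # Normal business logic for real requests.
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```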

Jeremy: Absolutely. Yes.

Chris: Again, PC (provisioned concurrency, called PC for short now) gives you an official solution to this problem. Not everyone, by far, is going to need to solve this problem this way. For folks who do, this is the way to do it for now. That said, we just announced this thing two days ago, and we're digesting everyone's feedback. Obviously, we've been talking to customers for quite some time in private conversations about it, but what my team is doing at this point is aggregating feedback. We're going to boil it down, we're going to keep looking at it. If we find out that this isn't the right model, then we're going to work to find the right model for our customers.

Jeremy: Actually, one of the things I really like about provisioned concurrency, too, is that yes, there are probably those use cases where you want to keep 500 Lambda functions warm, and that may be for enterprise, whatever, but it works even for smaller things. I think about a bunch of administrative APIs that I have, where they're synchronous APIs. You maybe have 10, 20, 30 people using them at any given time, maybe even less than that. If you wanted to keep some important endpoints warm, so that admin users didn't run into a cold start (which, again, would probably be fairly minimal anyway at this point), that's something where you provision five or something like that. It barely costs you anything, it's still going to be very inexpensive, but it would take away that pain that the occasional user would feel.

Chris: One other big thing about provisioned concurrency is that it's integrated with auto scaling. Auto scaling is pervasive across so many different parts of the platform; people are very familiar with its mechanisms and how it works. There are some little tweaks here that happened with provisioned concurrency. You can follow your traffic; you can schedule it to rise and fall at certain times of the day. I think we're going to see people who say, "Yeah, I have a highly latency-sensitive application, I'm going to over-provision because I need my TP99, TP100 to be really, really low and consistent with the TP50 that I have." If you're not familiar with that term, we're talking about, essentially, latency percentiles across the distribution of how customers are experiencing your application.

I think we're going to see people doing a lot of different things with it. I think we're going to see people use it and be like, "I probably don't need this." Now it's a tool that's out there, it is going to solve some big customer challenges, and it's going to unblock a lot of people who want to build serverless applications.
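
A sketch of the auto scaling integration Chris mentions, using Application Auto Scaling target tracking; the resource names and the 70% utilization target are illustrative, not recommendations:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# The scalable resource is function:NAME:ALIAS.
resource_id = "function:my-api-function:prod"

autoscaling.register_scalable_target(
    ServiceNamespace="lambda",
    ResourceId=resource_id,
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    MinCapacity=5,
    MaxCapacity=100,
)

# Target tracking: keep the provisioned pool around 70% utilized,
# scaling provisioned concurrency up and down with traffic.
autoscaling.put_scaling_policy(
    PolicyName="pc-utilization",
    ServiceNamespace="lambda",
    ResourceId=resource_id,
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 0.7,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
        },
    },
)
```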

Jeremy: I want to move on to other things, but there's one more point that I want to make on provisioned concurrency, because I was just talking about this with Slobodan and Aleksandar, who are two other Serverless Heroes. We were saying, the pattern here, or the best practice, the leading practice, is single-purpose Lambda functions. We're building a lot of single-purpose Lambda functions that are endpoints on APIs. Now say I want to build an API that has 50 different endpoints, and maybe I want to keep those warm for some reason. The pricing model here is that I'm paying for warming all these different Lambda functions. One of the things I don't want to see happen is sliding back into the monolithic Lambda function, because now I can keep that warm. Just a thought.

Chris: About that: in one of my talks that I have this week (the code for it is SVS343, building microservices with AWS Lambda), I actually talked about this. I talked about the per-function, per-API-action model. I talked about the lambdalith, the Lambda-based monolith that's out there. Internally, with our internal field experts and our subject matter experts, we've had a bunch of heated discussions about this. One thing that we see is that there are a number of developers that have frameworks that they love, and that framework says, "I want to do the routing logic inside of my function." There are a number of API frameworks out there that will do this. It ends up working for people; they end up being happy with it, they end up being successful with it.

Now, I would say that I think you, Aleksandar, Slobodan, and myself are purists in wanting to see people use the platform the way that the maker intended, to a degree.

Jeremy: You want the failure mechanisms, you want all that resiliency and all of that stuff. I mean, I always tell people, don't put try/catches in your Lambda functions; let the function fail and let the cloud handle that.

Chris: We can talk about the new stuff that helps make that even better. I think at the end of the day, the lambdalith also potentially enables a certain amount of code portability, and people really care about that. There are definitely points where it starts to become problematic, where you reach the limits of what you can do in that lambdalith. Then as soon as you reach that tipping point, you're like, "Oh no, how do I break apart this thing?" Then you run into all the challenges that you would have with any monolithic application.

I think, yes, we will see people who say, "Well, I've got this provisioned concurrency thing, and I'll put my code into one function so I don't have to provision that much, or something." Maybe to a degree, but they'll still end up having to provision some amount toward the cumulative amount of requests that they would have gotten across all those individual functions. Maybe that's being too smart for your own good, to a certain degree. We'll have to see what patterns end up really coming out of this.

Jeremy: I mean, something interesting that could happen is where it's not about provisioning a specific Lambda function; it's maybe provisioning a certain amount of concurrency across a group of Lambda functions or something like that. I'm sure there are things.

Chris: I mean, one of the big things that provisioned concurrency does (and we can definitely change topics) is get you all the way through init. In the Lambda lifecycle, it gets you all the way up through executing your pre-handler code. What we've actually seen is that that pre-handler code is, most times, the more expensive part of the equation than the platform overhead. You've got people importing packages; they're talking out to other API services. Maybe they're getting secrets from Secrets Manager or Parameter Store; there's crypto math, you've got to decrypt all these things. Provisioned concurrency getting you through init, but stopping at your handler, is actually a big part of it.

We're going to have to see what patterns emerge, we're going to have to see what feedback we get, and how to tweak it. I don't think this is a one-and-done feature. I think you'll see some stuff in the future that tweaks it one way or the other.
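
This is also why the usual guidance is to do the expensive work at module scope, outside the handler; that is exactly the code provisioned concurrency runs ahead of time. A minimal sketch, with a hypothetical parameter name:

```python
import boto3

# Everything at module scope runs during the init phase: imports,
# SDK clients, fetching and decrypting config. With provisioned
# concurrency, all of this happens before the first request arrives.
ssm = boto3.client("ssm")
DB_PASSWORD = ssm.get_parameter(
    Name="/myapp/db-password",  # hypothetical parameter
    WithDecryption=True,
)["Parameter"]["Value"]

def handler(event, context):
    # Only per-request work remains here.
    return {"statusCode": 200}
```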

Jeremy: Great. All right, so that was two.

Chris: That was two.

Jeremy: Next one.

Chris: Man, I'm trying. I've got two more here, and I'm trying to think of what order I want to talk about them in, because they're both so important. I'll do it in chronological order from when they launched. On Tuesday, we also basically co-announced with RDS, the Relational Database Service, something called RDS Proxy. Again, for those folks who have been building serverless applications for some time now, working with relational databases has been a challenge. You have to deal with the fact that your functions may need to establish new connections, and then when a function is idle, it doesn't necessarily tear them down. That overhead of establishing new connections can be expensive for both the database and your function. Then there's the scale challenge of potentially consuming lots of connections on your database.

Jeremy: Zombie connections.

Chris: Yes. That would lead to people oversizing the database just to deal with connections, which is an unfortunate thing to do. Long story short, this is going to help remove that problem. With RDS Proxy, we are basically creating a database proxy. There are lots of other solutions for this in the industry: there's PgBouncer for Postgres, there's MySQL Proxy that existed, open source packages, a couple of others that are out there. This one is built by us for the cloud, and scales for the cloud. It's going to do connection management and shared connections, and it helps you with things like failover for multi-AZ databases. It's only in preview right now, and it supports just MySQL. We're obviously hearing a lot of feedback about Postgres. I'll just say, sit tight, folks.

Jeremy: I've heard that too.

Chris: Sit tight. We'll see what happens by the time we get to GA, or soon after, maybe. This is a really big one. Between the VPC performance improvements, allowing people to put more functions in a VPC, where RDS is typically running, and then this, you now basically get to the point where you can talk to a relational database, have it be really efficient and effective, not eat up a lot of resources, and have it be really fast. This is a huge one. I've had some people say, "Wow, this is the biggest one of the week for me."

Jeremy: This is the other thing that's funny. On top of provisioned concurrency, which replaced my Lambda Warmer package, I also have a package called serverless-mysql, which essentially does connection management for you. It uses the process list and cleans up zombie connections; if you set, say, 70% capacity, then it will automatically kill connections up to a certain point. AWS this week has killed two of my open source projects, but I actually love it, because I don't want to have to do that. Right? I love the fact that you build workarounds. I think that's where we are with Serverless right now anyway: there are certain things that you think you can do, or that are easy to do elsewhere, and connection pooling should be the simplest of things. When you have ephemeral compute, and each one has to compete for those connections, like you said, you need some way to manage it.

Managing it with an open source package worked really well. I get a lot of people using that package and things like that, and I think people will still use it because it does some good transaction handling; it's just got a better workflow for transactions and stuff. There I go, promoting my own stuff. Seriously, though, I think it's great, because this was a huge missing piece, where people are just not willing to let go of relational databases.

Chris: For very good reasons, right?

Jeremy: Yes.

Chris: It's where their data is today; it's what they understand. They've maybe been trained on it, or just have so many years of experience with it. I don't think we were ever comfortable with the idea of telling people, "Oh, too bad. Ditch relational." It's a big one. It unblocks a lot of workloads, especially in the enterprise. Even for developers that are just, in general, more comfortable with relational databases, this is going to open up Lambda for them.

Jeremy: All right, so just a couple of questions on this to clarify. The RDS Proxy, you still have to run inside of a VPC, right?

Chris: Correct.

Jeremy: Then in terms of how that connects, there's some Secrets Manager stuff that you need to do at the proxy layer itself?

Chris: Yeah. RDS Proxy ends up using Secrets Manager to handle the secret management between your database and the proxy itself. What's great is that this ties in up through your Lambda function, so that you're not hard-coding usernames and passwords in places. You can use either the IAM authentication methods that exist with RDS today, or you can still use a username and password, managed by Secrets Manager. You just have to give your function access to that data inside of Secrets Manager, and it pulls it on the fly and all that cool stuff.

Jeremy: Right now in the preview, it only supports RDS and Aurora MySQL; it doesn't support Serverless Aurora yet, right?

Chris: It supports RDS MySQL or Aurora MySQL, but not yet Serverless Aurora. Correct.
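
Putting those pieces together, a function talking to MySQL through RDS Proxy might look roughly like this; the secret name, proxy endpoint, and database are all placeholders:

```python
import json
import os

import boto3
import pymysql  # bundled in the deployment package

# Fetch credentials once, during init. RDS Proxy holds the pooled
# connections to the database; the function connects to the proxy.
secret = boto3.client("secretsmanager").get_secret_value(
    SecretId="myapp/db-credentials"  # hypothetical secret
)
creds = json.loads(secret["SecretString"])

def handler(event, context):
    conn = pymysql.connect(
        host=os.environ["PROXY_ENDPOINT"],  # the proxy, not the DB
        user=creds["username"],
        password=creds["password"],
        database="myapp",
        connect_timeout=5,
    )
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            return {"ok": cur.fetchone()[0] == 1}
    finally:
        conn.close()
```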

Jeremy: All right, next one.

Chris: Yeah. Then last, but definitely, definitely not least: we also just announced yesterday, basically, a new model for Amazon API Gateway. For Amazon API Gateway, the first API model that we launched with, effectively, was what we called REST APIs. It was very much meant to be almost by-the-book, the purest vision of REST APIs. Last year, we announced WebSocket support, which was one of the biggest things we were asked for by our customers. I think if we look at API Gateway, it's almost a misunderstood product. It's an incredibly sophisticated, powerful product that can give you so many knobs, levers, and so many things. To a degree, customers are just like, "We just want something really simple, really easy, a bit more basic."

Given that, and given feedback on performance and cost, and what people perceived they were spending their money on (which is all valid feedback; we take it, we track it, we quantify it, that's what we do at AWS), yesterday we announced something that we're calling HTTP APIs. I'm trying to make sure I don't get the numbers mixed up here: it's 70% less cost than the REST API model.

Jeremy: It's $1 per million requests, as opposed to $3.50 per million.

Chris: $3.50. Yes. I believe it's a 60% reduction in the overhead that API Gateway used to add. I don't think it was slow before, but it did add a whole lot. We still had people saying, "I want even faster." This is going to make it so that you can build really simple, basic APIs, up through still fairly complex APIs. We've got a bunch of new authorization capabilities. We've got a bunch of other things that you can do with it that make it a little more flexible in some ways. There are some things that were in the REST APIs that are not in HTTP APIs yet. It's also just in preview right now. Again, I think for serverless customers in particular, with how a lot of our serverless developer customers are building APIs, this is just going to be a solid win for them. It is effectively a new product inside of the same product name. You do have to relaunch your applications to support it; it's not just a toggle. You can do a number of things to export an API and then import it in, or you can just redeploy your application with a new version of it.

Jeremy: I think this is a very cool new product, and it's definitely going to reduce costs. The model for this is the Lambda proxy model, where it's that full pass-through into the Lambda function?

Chris: Correct. Today, it's definitely meant to be a much simpler, easier experience.

Jeremy: No VTL templates and all that stuff.

Chris: No, no, you don't do any of that right now. Definitely not. We've got a bunch of new, easier capabilities around CORS. Talking about the authorization capabilities, we now have JWT authorizers for OpenID, which was a big one that people have been requesting.

Jeremy: You announced Apple sign-in for Cognito, right?

Chris: Cognito now has Apple login. That's going to help folks that are in the Apple iOS, OSX world of things, which is not a small community of developers. There's a bunch of new stuff that you'll be able to do with this. Again, I think the lower costs, the better performance, and some of the easier configuration capabilities are the big parts of it.
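
The "quick create" experience for HTTP APIs can be sketched in a single boto3 call; the account ID and function ARN below are placeholders:

```python
import boto3

apigw = boto3.client("apigatewayv2")

# Quick create: one call wires an HTTP API straight to a Lambda
# function as a proxy integration, with a default route and stage.
api = apigw.create_api(
    Name="my-http-api",
    ProtocolType="HTTP",
    Target="arn:aws:lambda:us-east-1:123456789012:function:my-api-function",
)
print(api["ApiEndpoint"])  # invoke URL for the default stage
```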

Jeremy: Great. All right, so that was four. Those are the four big ones. I know we've been talking for a while, but there are a few other ones that I do want to get to: the EventBridge schema registry. I'm a huge fan of EventBridge, and I'm now using it in every project I'm building. Again, I talked to some of the team about EventBridge, and I think there are some amazing things coming down the road for that. Tell us about the schema registry.

Chris: Backing up, for those who don't really know what EventBridge is at the end of the day: EventBridge has a concept called buses. This is built upon what we were doing previously with CloudWatch Events, so it takes the same underlying technology. Essentially, what you do with EventBridge is connect some source service, an event source. This could be a number of AWS services, it could be third-party SaaS products, or something custom that you want. Then what EventBridge does with its bus model is allow you to pass that event through and set really fine-grained rules against the actual individual attributes of the event; it's JSON structures that we support today. Then that can be targeted out to, I think it's like 17 different services. It's Lambda, it's SQS, SNS, it's Step Functions.

Jeremy: Step functions, which you can't do with SNS.

Chris: It's Kinesis, it's Fargate. There's a bunch of different places that you can pass the events. At the end of the day, this event will have a structure, and you can see that schema. What we're now giving you the ability to do, for your applications, for those third-party applications, for AWS service events, is to track the schemas of those, register them, and determine, like, a type for them. Then the coolest, coolest thing, I think, is that we give you the ability to generate what we're calling code bindings, which is basically code that will allow you to pull out the individual attributes of that event.

I've heard this from a number of folks over the years, like Mike and John from Symphonia; they always talk about, "Just give us some code to pull apart the events." Well, to a degree, we did here. Code bindings are a super big part of it. I think these are still some early capabilities, and it's still in preview, but you're able to track the types of events. When you have that schema registry there, and stepping back to the bigger picture with EventBridge, one of the things that we see is that when you have all these events flowing through your infrastructure, people will say, "How do I discover what an event is?" The schema registry helps with that. You can see an event, you can see the schema of it, you can see the attributes inside of it, and you can decide how you want to filter or have a rule set configured for that.

Especially for larger organizations, as the number of services that you have expands, and you want to do new things consuming different services, the registry is going to make it really easy for you to discover what types of events are flowing through your system.
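
A rough sketch of both halves of that: turning on discovery for a bus, and registering a schema for your own events explicitly. The ARN, registry, and schema names are hypothetical:

```python
import boto3

schemas = boto3.client("schemas")

# Schema discovery: EventBridge watches the events crossing this bus
# and infers and registers schemas for them automatically.
schemas.create_discoverer(
    SourceArn="arn:aws:events:us-east-1:123456789012:event-bus/orders",
    Description="Discover schemas for the orders bus",
)

# Or register a schema for your own events yourself.
schemas.create_registry(RegistryName="myapp")
with open("order-placed-schema.json") as f:
    schemas.create_schema(
        RegistryName="myapp",
        SchemaName="OrderPlaced",
        Type="OpenApi3",
        Content=f.read(),
    )
```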

Jeremy: For me, I use EventBridge so much, and I have so many events flying through it right now with different microservices, using it as that main bus, that just the number of events is staggering, and there are different types of events. Obviously, having the registry for the AWS events is easy: downloading those code bindings, auto-complete in your VS Code or whatever you're using, that's a super handy feature. But when you start generating your own events across teams, being able to discover interesting events, categorize them, and put them into that registry, I think, will be really, really helpful. I posted something on Twitter about this, where I said, "The registry would be great, but getting people to put stuff in the registry is a challenge in and of itself." With the schema discovery piece of it, I think that is going to be a huge step, where people will really find it useful. I'm very excited about that.

All right, so then another thing that came out, and this is a quick one: Express Workflows.

Chris: Express Workflows, this is for Step Functions. Again, for folks who are not familiar with Step Functions: Step Functions is a managed orchestration service, if you will, that allows you to take out all of the workflow logic that you would otherwise be writing code for. Things like decision-tree logic, how to chain functions or chain capabilities together, parallelization, failure handling, things like backoff and retry. This is all stuff that developers have always written code for, where you end up pulling in some random module off of npm or a pip package or something, like, "This is exponential backoff retry and it's made by codeblaster317, how safe is this?"

Step Functions helps you take that logic out and put it into a managed platform, so that you don't have to think about how to do it. Step Functions has been out now for a couple of years, and what we heard from customers was basically a couple of things. One, the way the service was built, the throughput of it maybe couldn't handle some of the most extreme workloads. There was also a cost aspect that meant, for some folks, it just didn't work for them. What Express Workflows have is a really, really massive scale difference. The default limit here (and I just had to pull this up from my notes) for standard workflows was over 2,000 per second. This supports over 100,000 per second; there's a magnitude of difference there.

There are some trade-offs, or I would say, really just differences, between these. With standard workflows, you could have a workflow that runs for a year. That's an unusual thing, but it's something that we've seen people need, because workflows aren't just for Lambda functions. You could have human actions, you could have batch processing, you can have things that go away and come back again at some point.

Jeremy: The callback pattern and some of those other things.

Chris: Yeah. Express Workflows, though, give you a maximum time of five minutes. Where you have a very discretely scoped workflow, where you still have the same logic that you want to capture, you still want to do the same type of failure handling, but it isn't one of these much longer types of runs, this is pretty key for that. One of the other things is a giant cost difference. This works out to be about $1 per million invocations, where I believe standard workflows were about $25 per million state transitions.

Jeremy: It was 2 cents per 1,000 state transitions or something.

Chris: I may actually be wrong on the math on that one, but basically, it is still a magnitude of difference in cost. Again, I think this is another thing that's just going to unblock people, making them more comfortable with saying, "Yeah, you know what? I don't have this really long-running workflow execution; I want to take some data and pass it through a couple of different services real quick, then a couple of different Lambda functions, or whatever it is, and get the end result of that done." This is just going to enable that to happen at a much greater scale.
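
Opting into Express is a choice made when the state machine is created. A minimal sketch, with placeholder ARNs:

```python
import boto3

sfn = boto3.client("stepfunctions")

# A one-step workflow definition in Amazon States Language.
definition = """
{
  "StartAt": "Process",
  "States": {
    "Process": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process",
      "End": true
    }
  }
}
"""

# type="EXPRESS" selects the high-throughput, five-minute-max model;
# omitting it gives you a standard workflow.
sfn.create_state_machine(
    name="quick-pipeline",
    definition=definition,
    roleArn="arn:aws:iam::123456789012:role/sfn-express-role",
    type="EXPRESS",
)
```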

Jeremy: Yeah. Even just from the pricing standpoint, it's huge for a lot of those workflows, even the ones that I've been doing. We have an article system at the company I work at: we pull articles from the internet, we run them through a series of natural language processing steps, we do some extraction, and then we run some algorithms. Now, all of that happens within 30 seconds, but I have several pieces of that logic that I'd like to reuse in different ways. What I ended up doing is either stitching those functions together with a synchronous call or something like that, which has some problems, or basically putting all of that code into one lambdalith, as you said, and I don't like doing that.

This is one of those things where now Step Functions will actually become very, very useful to me, and affordable for the volume that we're doing, because otherwise the cost just gets out of control. I think that is a very important one, and it'll make function composition work for the right types of things, with those guarantees and the backoffs and the retries and the error handling taken care of for you. I think that one is pretty cool. All right, there was also Amplify DataStore that was launched. I think people should go and look at that if you're in the mobile space. That's a little bit outside of your scope.

Chris: My knowledge and ability to be hands-on with it is limited at this point, but my understanding is that it's really here to help you with offline data synchronization, storing data on the device. For mobile developers, it's super powerful.

Jeremy: I think that is certainly going to fit in, especially with Amplify and all these other things that are happening. Amplify is launching a lot of stuff into the serverless space, so there's some overlap there, but it's definitely more mobile. I'll have to get maybe Nader Dabit on, and we can talk about that.

Chris: You should. Yeah.

Jeremy: Those are the main things, I think, that were launched this week. As we said, there are some things that were launched leading up to it. One of those things, which I think is a huge game changer, is Lambda destinations.

Chris: Yeah. Yeah, absolutely. This is a big one. It's one where people who are new to serverless application design scratch their heads a little bit trying to think about it. You mentioned this before: the why. What does Lambda destinations allow you to do? Basically, for asynchronous invocations, it allows you to capture either the success or failure outcome of that function execution. For many years now, we've had a concept called dead letter queues, which would allow you to capture, in failure scenarios, the event requests that went in and failed. You could take that message off the dead letter queue, reprocess it somewhere else, pull it back up later. Basically, it would help you with capturing, and then being able to retry, failed events.

We had a lot of situations where customers were creating and writing a lot of code for the success path as well. You had times where, let's say, you're processing data out of S3. People are uploading images, uploading data files, whatever it might be; S3 calls Lambda, Lambda executes, and, well, what happened? A lot of folks would be doing a lot of log writing. Maybe they're creating almost an inventory system in DynamoDB or something like that to track actions.

Jeremy: You're including the SDK and then making a call from the SDK, from the Lambda function, to another service.

Chris: Yeah. It was overhead in a couple of different ways that people didn't want to deal with. What Lambda destinations does is completely out of band from your code: you don't have to write any code to handle this, it's now just built into Lambda. You have the ability to take both the success and the failure of a Lambda execution and pop it off to one of a number of different places. You can send it to a different Lambda function. You could also send it to an Amazon SNS or SQS target, as well as EventBridge. This will allow for some interesting chaining of functions. It would allow you, at least in the success cases, to say, "Hey, okay, we did complete this stuff in this Lambda function, this action in my workflow. Now let's send it elsewhere, to something else, some other service, or some other place."

I could see where this plus EventBridge could be super powerful. EventBridge is just such an awesome product because of all the things it can do. It's like, yes, you could send these directly to SNS and SQS, but if you send it to EventBridge, you could also send it to those and then also a lot of other places.
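
Configuring destinations happens on the function's asynchronous invoke config, entirely outside the handler code. A sketch with placeholder ARNs:

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.put_function_event_invoke_config(
    FunctionName="image-processor",  # hypothetical function
    MaximumRetryAttempts=2,
    DestinationConfig={
        # Successful async invokes go to an event bus...
        "OnSuccess": {
            "Destination": "arn:aws:events:us-east-1:123456789012:event-bus/default"
        },
        # ...and failures, after retries, land in a queue.
        "OnFailure": {
            "Destination": "arn:aws:sqs:us-east-1:123456789012:failed-images"
        },
    },
)
```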

Jeremy: With the event registry, with the schema registry, you basically could have other teams saying, "By the way, when that S3 file gets uploaded, a Lambda function processes it." Then after that's done, it sends an event to EventBridge that says, "Hey, this was processed," with some information around it. Now you may have a service from some other team that says, "Hey, we'd really like to know when somebody has uploaded that image." You don't have to do anything anymore.

Chris: Exactly.

Jeremy: I mean, in that situation, S3 to Lambda to EventBridge, you don't even need to include the AWS SDK or write any of that code. You just process it, return the payload, and it handles everything for you.

Chris: One of the things I talked about in two of my talks this week is this tendency for people to always want to build synchronous applications. They want to glue part A to part B to part C. This is one of the things where, over the last couple of years, we keep having to reinforce the idea that building more asynchronous, distributed applications is the way to build them. We still see a lot of people doing these types of motions, writing a lot of this glue code, doing all this extra work. This is going to make it even easier to build these distributed workflows, and not just build them, but build them resiliently.

Jeremy: Yes.

Chris: You capture both the success and the failure; we give you both the request and, potentially, the response. You can take either of those pieces of data as you might need and then do something else beyond that. On the failure side of things, you could take this and send it to EventBridge and say, and here's maybe an interesting one, "The work that you were trying to do was too big for your Lambda function." You could basically build a system here that captures the failure of the event and attempts to reprocess it in Fargate on ECS or some other place. You can get away from, "The limitations of Lambda block me from doing this occasionally. One out of a hundred requests is out of the bounds of the limits of Lambda, so I can't use Lambda anymore." This basically changes it so that you don't have to have a lot of crazy logic up front, you don't have to do anything super creative behind the scenes; you can pass it back through the rest of the ecosystem of things that exist and solve the problem. It's a minor little thing, but it's going to be super powerful for how people architect distributed serverless applications.

Jeremy: You mentioned people really wanting that synchronous experience, and I totally agree. I think that's something where people are like, "Well, I need to know that this happened before I can move on to the next step." Where people are maybe not thinking this through is wanting to do that with long polling: doing an HTTP request and waiting for all this stuff to happen and come back. You don't want to do that. You could use WebSockets with API Gateway if you wanted to: make a connection, call an API that bounces around a bunch of different services, and have it send the result back to you. Or even AppSync, for example, has two-way push subscriptions built into it.

Once you make some change and make some calls, when that data is updated, it will go ahead and push back to you. There are different ways to build that model and get a synchronous feel, but still get the benefits and the resiliency, and not have the sixth service in line fail and then everything fails and you have to reprocess the whole thing. That's interesting. That's the success path stuff; you mentioned a little bit about the DLQ stuff. The failure mode on this really should replace DLQs now on asynchronous invocations, because you still get the payload like you used to with DLQs, but now you actually get the context of the error when it fails.

Chris: Yes. I mean, if you have your DLQs today, great, keep using them. This supersedes DLQs; this is better DLQs. It doesn't cost you anything extra, it doesn't change your code. You could plug your DLQ consuming or retry model back into this same thing; this just gives you better options for it. If you've got DLQs today, again, this is just going to give you more information. It's a minor configuration change.

Jeremy: I think the pattern here is probably to send it to EventBridge, and EventBridge maybe puts it back into SQS, if you want to do some replay or something like that. Now, the shape of the data is going to look a little bit different, but here's a trick I haven't tried yet that I think will work: send your failures to EventBridge, and then use the subscription to SQS to actually transform the data, to put just the original payload back into SQS. I think you can do that.

Chris: I think you should be able to do that. I mean, you should be able to do that. The question is, do you need to? I don't know.
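
For anyone who wants to experiment with Jeremy's idea, a hedged sketch might look like this. The detail-type matches what Lambda destinations emit, but the exact payload shape is worth verifying, and the queue also needs a resource policy that lets EventBridge send to it:

```python
import boto3

events = boto3.client("events")

# Match the failure records that Lambda destinations publish.
events.put_rule(
    Name="replay-failed-invokes",
    EventPattern='{"detail-type": ["Lambda Function Invocation Result - Failure"]}',
)

# InputPath strips the destination envelope so the SQS target
# receives only the original request payload. ARNs are placeholders.
events.put_targets(
    Rule="replay-failed-invokes",
    Targets=[{
        "Id": "replay-queue",
        "Arn": "arn:aws:sqs:us-east-1:123456789012:replay",
        "InputPath": "$.detail.requestPayload",
    }],
)
```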

Jeremy: Do you need to? I mean, that's right. Whatever your retry mechanism or your replay mechanism is could certainly do that. I just like to think of weird things and sort of play around. All right, so then more on DLQs, actually: another announcement was SNS DLQs.

Chris: Distributed systems have some unique qualities to them, and actually-

Jeremy: Everything fails all the time.

Chris: Yeah. Werner has his line; Werner Vogels, CTO of Amazon, has his line about how everything fails all the time. We get a lot of people who say things like, "I want a guarantee that nothing's ever going to fail." I'm like, "If I had that for you, I would be gambling right now and not talking to you, because I would be some mega genius or some fortune teller." There's always the potential that something could go boom in an application. There's no way to avoid this; if you think you've avoided it somehow in your on-prem server somewhere, you're wrong. There's a reason why NASA does things in threes and fours and fives and stuff like that.

One of the things that we have with a service like SNS is that it gets an event that comes in from the source, and then it wants to send it to something like Lambda, or to another target like an HTTP endpoint, or to SQS. For some reason, it can't reach that endpoint; it can't deliver that message. SNS by default does do retries to different targets, but if it continues to fail over a period of time, you could, hypothetically, in the past, have lost that message. This straightforwardly just gives that DLQ, the dead letter queue mechanism, to SNS, so if you have a failure off of it, you can capture it and retry it, or do whatever you might need to do with it. For those rare situations where you have that type of an issue, and especially if you do DLQs to something like SQS, you're not paying for anything unless you have the problem. If you have the problem, you have a safety net. It's a pay-for-what-you-use model, and you can pull stuff out of it as you need to.

Jeremy: It's entirely worth it.

Chris: Yeah. For Lambda, I've been preaching for a while now, for general asynchronous workflows: just enable DLQs. I probably wish that I could say we would just enable it by default for you, but it is a mechanism you have to think a little bit about. If you have SNS, go enable DLQs; just go ahead and do it. Set it up to go to a queue, set up a CloudWatch alarm to look at queue depth, and maybe you don't even know how to pull stuff out of it right now. When that alarm goes off, you could say, "Okay, well, I can queue stuff up for a pretty long period of time, a couple of days, and then find a way to process it and pull it out."
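
Enabling a DLQ on an SNS subscription is a single attribute change; the ARNs below are placeholders:

```python
import json

import boto3

sns = boto3.client("sns")

# A redrive policy on the subscription sends messages that SNS
# could not deliver, after its retries, to an SQS dead letter queue.
sns.set_subscription_attributes(
    SubscriptionArn=(
        "arn:aws:sns:us-east-1:123456789012:orders:"
        "11112222-3333-4444-5555-666677778888"
    ),
    AttributeName="RedrivePolicy",
    AttributeValue=json.dumps({
        "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:orders-dlq"
    }),
)
```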

Jeremy: Another pattern that this opens up, which I think is really, really interesting, in combination with Lambda destinations: let's say you're running some piece of processing logic in a Lambda function that has to make a call to a third-party API. You take your Lambda function, you process it, and on success the payload gets sent to SNS. SNS has an HTTP subscription, and then a DLQ on that. If that fails, there's nothing; you're not writing any code here other than "here's what should go to my endpoint."

Chris: Exactly.

Jeremy: I think that just gets rid of so many headaches, because if you're trying to do that HTTP call from your Lambda function, you have a problem. By the way, you don't need a NAT anymore if you're running in a VPC, because now you can just send it out. Let SNS make that HTTP call for you, and now you've saved $30 a month for having a NAT gateway or whatever that is. Right?

Chris: Yeah. It's asynchronous, and it's doing the calling for you.

Jeremy: There are really interesting things you can do with all of this.

Chris: When you use a service like SQS or SNS or EventBridge or Kinesis for asynchronous communication, you get with it the benefits of the persistence and durability capabilities. It's not, "I have this data in my single execution environment, and if that goes boom, I lose it." You put it into there, and you have these durability capabilities and these persistence capabilities that, in and of themselves, are super powerful.
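
The no-code webhook pattern Jeremy describes can be wired up roughly like this; the topic, endpoint, and queue are hypothetical, and note that an HTTPS endpoint still has to confirm the subscription:

```python
import json

import boto3

sns = boto3.client("sns")

# SNS makes the HTTPS call for you, retries on failure, and parks
# anything undeliverable in the DLQ, with no HTTP client code and
# no NAT gateway needed for the function's VPC.
sns.subscribe(
    TopicArn="arn:aws:sns:us-east-1:123456789012:processed-events",
    Protocol="https",
    Endpoint="https://api.example.com/webhook",
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:webhook-dlq"
        })
    },
    ReturnSubscriptionArn=True,
)
```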

Jeremy: All right, one more, and then I'll let you get back to the expo floor; I know you have another talk later on today. SQS FIFO support for Lambda functions.

Chris: This is a big one. We first announced SQS support for Lambda, the ability for Lambda to directly consume off of SQS queues (standard queues, I should say), back in the summer of 2018. It was one of those use cases that is just so Lambda-y, I should say. People wanted it for so long, and we got around to it; we had to build some things before we could make that happen. We launched that, and everybody was like, "Great, now FIFO support." FIFO, or First-In, First-Out, basically gives you ordered data inside of a queue. There are a lot of situations where people care about the order of records coming in, whether it be for transactional things, whether it be for sensor data, IoT workloads, tracking of all sorts of things, clickstream tracking; basically the kinds of things you might otherwise throw into Kinesis and pull out when you want.

Chris: FIFO support, again, now supported in Lambda, gives you the ability to pull out batches of records aligned based on attributes-

Jeremy: Message group ID.

Chris: The message group ID. It can still actually support some really massive throughput, and it allows you to buffer up a bunch of information and pull it all out ordered if you need to. You don't have to run any extra code, you don't have to do any polling yourself; you just have your Lambda functions consume records and do the thing.
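
A sketch of the FIFO flow: ordering is scoped by the message group ID on the producer side, and Lambda consumes through a normal event source mapping. The names and URLs are placeholders:

```python
import boto3

sqs = boto3.client("sqs")
lambda_client = boto3.client("lambda")

# Messages sharing a MessageGroupId are delivered in order;
# different group IDs can be processed in parallel.
sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo",
    MessageBody='{"orderId": "42"}',
    MessageGroupId="customer-123",         # the ordering scope
    MessageDeduplicationId="order-42-v1",  # or use content-based dedupe
)

# Point Lambda at the FIFO queue like any other SQS event source.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:orders.fifo",
    FunctionName="order-processor",
    BatchSize=10,
)
```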

Jeremy: What I love about the ordered nature of this: you think about Kinesis, and that's been the go-to when you wanted ordered records, but then you have to do shards, and then shards would scale, and I know you added some other cool things, like the parallelization stuff.

Chris: I had a similar one too.

Jeremy: We don't have enough time to talk about that.

Chris: Yeah, next time.

Jeremy: What's really cool with the FIFO thing is, like you said, the message group ID essentially creates a shard ID, if you think about it. If you want 10 separate Lambda functions processing things in parallel, you can pull stuff off of that queue using that group ID; you'd have 10 different group IDs that could do that. You can use that message group ID to segment maybe different customers, or whatever; it's only the parts of the stream that you need ordered, or grouped by certain things. I think there are some patterns in there, too, where you might be able to use it for priority and some of those things; you could do some interesting stuff.

Chris: Absolutely.

Jeremy: All right, great. Let's close with this, because I love talking to the people at AWS: everybody, all the PMs, all the engineers.

Chris: Thank you.

Jeremy: Everyone is just so excited about the future. I think you've got a lot of people like me; there's a whole bunch of us who hang on the What's New blog, like, "What's coming out next? What can I build with it?" I think you've got 65,000 people here who share that enthusiasm, and that is infectious, because of the way that your team and the PMs are building these products. You're more in the role of advocacy, obviously, and you've got a great team of people; you've added some great people. Then I know there are some evangelists as well who are doing something similar, a little bit different. What's the future of advocacy at AWS? What are your plans for Serverless around this, and getting people on board with, "Hey, Serverless is the way"?

Chris: For sure. In a broader sense (and maybe it's a little boring to talk about our strategy), I think of customers across three different life cycles. There's the completely net-new, green customer who is like, "What is this stuff? How does it work?" Then there are the folks that are further along in their journey. They're building applications for production; they're doing real-life, real-world things with it. They're like, "You know what, I'm running into some rough edges. I'm looking for some best-practices guidance. How do I scale this right? What are the best patterns and stuff like that?" Then there are the folks like yourself, like Ben Kehoe from iRobot, many of our other Heroes, many of our big customers, where you're pushing on the bounds of what we can do and how we do it.

Across all those different areas, we want to be able to tell stories, share advice, give guidance, gather feedback, and continue to grow the space and grow the workloads. We use this hashtag a lot, all of us: #ServerlessForEveryone. Big picture, the view that we have is that we want to be in a position where customers can say, "We're going to be Serverless first." Those customers could be in any industry, in any vertical, at any size, building any type of application. Then we want them to say, "Okay, we're going to be serverless first, and we're only going to knock that back based on a roadblock or a limit or a challenge." Then part of what my team has to do is say, "Okay, great. Tell me more about it. Tell me more about that story. Let me take that feedback and deliver it back to the PMs, so that we can start thinking about what the potential solution is that we build for it."

I think in 2020, you're going to see my team, some of the technical evangelists, and other folks inside AWS all over the place talking about Serverless. You're going to see a lot more blog posts; you're going to see maybe some more instructive guidance around certain topics. We're going to keep doing Tech Talks and Twitch, and be at a lot of conferences and a lot of places. If you're lucky, you'll get to see Eric Johnson and Munns up on stage, full of energy and excitement, and you'll get to read the incredible stuff that the team is writing as well. Every year for the last five years, Serverless has gotten bigger and bigger and bigger. Lambda is typically one of the top one, two, or three topics at our summits, and this year as well. The talks are super packed. This space just keeps growing; the customers keep doing incredible things. It's changed the way they build applications, and we just want to continue to magnify and grow that, I'd say.

Jeremy: That's awesome. I know they just announced the Builders' Library, and I'm sure that you'll probably contribute to some of that.

Chris: I'm not smart enough for that stuff. That's for the real experts.

Jeremy: Well, hopefully we get some good stuff there. Then also, I know Heitor Lessa is working on a revamp of the Well-Architected Framework's Serverless Lens.

Chris: Yes. Very cool stuff on there.

Jeremy: I think that'll be out soon, if it's not already. There's just a lot of good stuff, a lot of good information. Like you said, there are a bunch of great conferences now, all those ServerlessDays conferences all around the globe, with really interesting people speaking. Not just from Amazon, either. I think it's really great to see what Azure is doing and what GCP is doing, where they're pushing the boundaries, where they're going. Because if it's solving customer problems, that's what AWS is focused on, and that's great.

Chris: My view is that the deeper and richer the ecosystem, the better it is for everybody.

Jeremy: You can't be in an echo chamber.

Chris: Exactly.

Jeremy: Awesome. All right. Well, Chris, thank you so much for being here and taking the time to do this.

Chris: Of course.

Jeremy: If people want to get in touch with you and find out more about Serverless or what AWS does, how do they do that?

Chris: You can find me on Twitter @chrismunns. If you ever need to reach out about something deeper, you can email me: my last name, Munns, at munns@amazon.com. Real quick, I would say people often ask me, "Where can I find out about the latest, greatest launches and things like that?" We post almost all of our content on the AWS Compute Blog, and that's where Serverless does a lot of its announcements, how-to posts, updates, things like that. You can go and search for the AWS Compute Blog; you'll find it right away and see the most recent posts that we have.

Jeremy: Perfect. I will get all that into the show notes. Thanks again, Chris.

Chris: Cool. Thanks for having me. Take care.
