Serverless Chats

Episode #78: Statefulness and Serverless with Rodric Rabbah

About Rodric Rabbah

Rodric Rabbah is the co-founder and CTO of the serverless computing company called Nimbella. He is also one of the creators and the lead technical contributor to Apache OpenWhisk, an advanced and production-ready serverless computing platform. OpenWhisk is open source and offered as a hosted service from IBM and Adobe. It is also deployed on-prem in several organizations worldwide.

Twitter: @rabbah
Personal website: rabbah.io
Nimbella: nimbella.com
Apache OpenWhisk: openwhisk.org

Watch this episode on YouTube: https://youtu.be/xVZhFHmEuKY

Transcript

Jeremy: Hi everyone. I'm Jeremy Daly and this is Serverless Chats. Today I'm chatting with Rodric Rabbah. Hey, Rodric. Thanks for joining me.

Rodric: Hey, Jeremy. Thanks for having me. I'm really excited about our discussion today.

Jeremy: Awesome. So you are the co-founder and CTO at Nimbella and I'd love it if you could tell the listeners a little bit about Nimbella, but I'd really love to hear about your background as well.

Rodric: Okay, yeah, thanks for giving me the opportunity. So, I started Nimbella about two years ago, just over two years ago, and it was after a long stint at IBM for 11 years, at IBM Research specifically. And there I did a number of things that touched on programming languages, compilers, hardware synthesis, and FPGAs. And the last project I did was creating IBM's serverless functions offering, which is now called IBM Cloud Functions, but it started as OpenWhisk. And OpenWhisk is now an Apache project that we donated to the Apache Foundation four years ago, and it's really what started my serverless journey six years ago. So my background is mixed. It has experience from programming languages, compilers, and hardware systems, and I've done a lot of things that I think have a common theme: I build verticals that cut across lots of different layers of the stack.

Jeremy: Awesome. All right, so I want to start with IBM and OpenWhisk, because this fascinates me. You know, just recently it was the six-year birthday of AWS Lambda, and I think that this sort of kicked off a massive sort of investment, and maybe almost like a space race except for the cloud, I guess, a serverless race among all these different vendors. So, you were involved with this very, very early on, right? I mean, like, I think it was your project, right? So I'd love to hear how this all came about. Like, why did IBM suddenly say, “Okay, we need to build a serverless offering?”

Rodric: Right. So, before I started working on OpenWhisk, I was doing something completely different, where I was debugging hardware and looking at how to generate hardware from software. And then we saw the Amazon Lambda announcement, and it was literally right around this time of year. And as soon as we saw it, it was sort of one of those moments where you just realize, “Oh, my God, this is a dramatic shift in technology that's coming.” And even though it was basically day zero, you could see, you know, from the right perspective, you could see what the future is. And now I say serverless is inevitable, because, you know, whether you're on the bandwagon or not, you will be in the future, because that's the only way developers will want to build. So, in the early days, you know, we saw the announcement, and for the project we were doing, it was like they gave us the terminology that we were looking for. It was: just take code and just run it in the cloud. And we were trying to do something like that with hardware, you know, take software and now we can accelerate it for you on an FPGA, which is reconfigurable hardware, without you having to worry about compilation or running it.

And so we were doing work in a similar area but a completely different context when we saw it. It was like, this is it. So we got together as a small team within IBM, six people, and we were having discussions about Lambda and the future. What did it mean for IBM Cloud? And, you know, from IBM Research our job really was to sort of look at technology that's on the horizon, you know, five, ten years in the future, and start thinking about, what does that mean? And after, you know, a few weeks of just talking about it, I got tired of talking about it, and over a weekend I built the first version of what became Apache OpenWhisk.

And it started with, you know, a command line tool, which is the programming interface essentially to the cloud. It allowed me to create functions, run them, get their logs. And I recorded a short video by Sunday night and then sent it in, and, you know, Monday morning it got in front of the right eyes. You know that whole thing about being at the right place at the right time. And it started circulating, and from there we're like, okay, this is turning into something, and off we went. So it was sort of: Amazon Lambda landed on the scene, there was recognition that this is something really transformative for the future, and then the will to just build something. And once you start building something, I think good things start to happen, you know, when you're surrounded by good people like we were at IBM. And the project grew. I mean, we were three people when we launched the early version of OpenWhisk internally; it was called BlueWhisk, I think, at the time. And, you know, from the first commit to the project to when IBM announced it at their big developer conference, it took us basically one year from commit to launch.

And we launched out of IBM Research, which was, again, unheard of. And right around the time we launched, Google Cloud announced Functions, and I think Azure also announced Functions. So, we weren't the only ones, sort of, that saw that shift coming and everybody really started basically saying, “Oh yeah, there is an arms race here or space race.” And I was really excited because it sort of transformed what I've been doing and I think it's been really exciting and rewarding for me.

Jeremy: So, I absolutely love this idea of these sort of ideation, like, these genesis meetings that happen in organizations where you're like, all right, there's this major transformational shift going on here. So, to the extent that you can, and, again, I don't know if there were executives in that meeting, you don't have to disclose that, but to the extent that you can take me inside that meeting, what was the conversation? Was it like, “Oh, we just need to do this because we need to compete with Lambda,” or was it, “we need to compete with AWS,” or was it something where the structure of IBM Cloud realized that this truly needed to be done?

Rodric: Right. So, it was a bit of the latter. And in fact, when we started coding we tried very carefully not to use the word “Lambda” anywhere; that was sort of just IBM bureaucracy that maybe was ingrained in us, but it was a bit of the latter. I mean, I remember the meeting very well. I remember who was in it. I remember where everybody was sitting, because it was that kind of transformational meeting, at least for me, as I saw it. And it was a recognition that, you know, if you wanted to move applications to the cloud, or you wanted to transform an organization and become more, you know, what people call cloud native today, basically using the tools and technology in the cloud, you had to do something different, and what we had been doing wasn't quite working. This whole lift and shift strategy doesn't quite transform your business, and to do the value innovation, sort of pushing up the stack to extract more value for your organization, you had to do something like this.

There wasn't complete buy-in as we started the project. As we built the technology we were fighting against currents that were trying to drag us in different ways: containers and containers-as-a-service were just getting started. Kubernetes was just, you know, landing on the scene in terms of popularity, and we had to basically say, “No, the future is here,” and we built and tried to control our destiny as much as possible. There were senior directors, and as the team grew and we were presenting to more and more people, there were executives, IBM product line managers, and, you know, by the end of the first year our calls were fairly big. We were doing two-week sprints; we used to call them shock-and-awe sprints, because, like, what shock-and-awe features can we deliver in the next two weeks? And that became a theme for our team. And it was really fun to do, because as we did this, as we were sort of operating the prototype internally, more IBMers would sign on and start using it.

And so it was really exciting, and by the end we had a lot of buy-in, because IBM Cloud Functions had to launch, and that needed sort of a business-line justification and buy-in. But early on I think it was primarily out of research, and there were senior directors, sort of, seen participating in that conversation, and it was really exciting. Thanks for letting me relive some of that, six years ago.

Jeremy: But no, that's awesome. I mean, again, like I said, that idea of vision, it's hard sometimes. I think it's really hard when you're in technology to sort of pick what's the next big thing. And, again, we've got a lot of serverless haters out there and people who still love containers and Kubernetes and all that other stuff. Not that they necessarily have to compete, but I do love that you were at that moment in that meeting where you say, this is it, this is the next thing, so that's pretty exciting. But now, you mentioned in there this idea of lift and shift, right? And this is something where I think most clouds took this strategy very early on, to say, how can we meet customers where they are and make it very easy for them to just take their on-premises applications and move them into the cloud, which is why, you know, we're loaded up with virtual machines, and EC2, at least on AWS, I think is still the biggest moneymaker they have there. But with this transformation to serverless, I mean, you have a lot of limitations, you know, there's a lot of refactoring, and sometimes you completely re-architect your application. But what about just building this in general? I mean, there must have been a lot of technical limitations to get around, right? I mean, I know AWS first started building their stuff on EC2 instances. Is that how OpenWhisk started too, just running on virtual hardware?

Rodric: Right. There's so much there that I would love to talk about; let's see how many of these I can peel off. So, yeah, we knew we had ideas of how Lambda was running and executing, and then we looked at, “How can we build this?” Obviously we had to run on IBM hardware and the cloud that IBM offered us, and that did put some constraints on how we actually architected the system, and some of those features, if you want to call them that, are still with us today. And I think that played to our advantage, especially as the Apache project has moved towards being more of a Kubernetes-native sort of layer that you add on to give you that serverless experience. But early on we were deploying on VMs, and, you know, to auto scale up and down required many minutes, so it wasn't latency that you can sort of just easily hide. And so that meant we had to rethink, or sort of think about, the heuristic that we would build to give you that elasticity of a serverless solution: I can run a thousand functions, and some other user comes along and runs another thousand functions, and, “Hey, they just scale and they all run.” So we built a bespoke scheduler and a custom heuristic for how we do the scheduling, and it did influence essentially the architecture, because we just couldn't bring up new VMs fast enough when you needed them. So there was a bit of those constraints that played into it.

I think this whole notion of containers versus functions is really still with us, and it's for a number of reasons. Some of them you touched on. It's hard, and, you know, to go into serverless you're re-architecting applications. So it just doesn't fit the lift and shift model; it's fundamentally opposed to that in a sense, and so lift and shift looks attractive, but to really buy in and get the benefits of what serverless promises, the whole notion of less operations, more focus on value creation, it's necessary that you sort of re-architect. So the key is gentle migration and acceptance that there will be a mix of technology: containers, VMs. And so our solution was to pick containers early on, and that allowed us also to run containers as functions. So we started with, I think the first runtime we had was Node, and then we added Python after that.

But very early on people said, “Well, I'd like to run my Java application,” or, “I'd like to run some other language that you don't support,” so we were able to say, okay, bring your container. So very early on OpenWhisk offered, and we might have been the first that offered, this mix of functions as code, basically a zip file, and functions as a Docker container that you just pull from a Docker registry and just go. And I think that helped people with, sort of, the gentle migration, sort of the lift and shift: okay, I buy in and I start to get a taste, and then I start refactoring. And I think that's how you still have to do it today. It's sort of this whole gentle migration stuff.
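
To make that concrete, here is a minimal sketch of an OpenWhisk action in Python. The main(args) entry point is the standard OpenWhisk convention; the wsk CLI commands in the comments show the deploy-as-code path versus the bring-your-own-Docker-image path, and the Docker image name is a made-up example.

```python
# hello.py, a minimal OpenWhisk action in Python: the platform calls main()
# with the event as a dict, and the returned dict becomes the result.
def main(args):
    name = args.get("name", "world")
    return {"greeting": "Hello, " + name + "!"}

# Deploying it as plain code versus as your own Docker image looks roughly
# like this with the wsk CLI (the image name below is a made-up example):
#
#   wsk action create hello hello.py
#   wsk action invoke hello --result --param name Jeremy
#   wsk action create my-action --docker someuser/my-custom-runtime
```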

Jeremy: Right. Yeah. I think you make a good point about the hybrid stuff. I mean, for a very, very long time, especially for any larger business who's already either partially in the cloud or is migrating to the cloud, there's going to be a mix of everything, right? There's going to be VMs, there's going to be containers, and hopefully they start moving things into serverless. So when you built this, though, so you built this bespoke scheduler you talked about, you've got, you know, it's running on VMs, you eventually adopted containers and so forth. But when this launched, like, this was ready for primetime, right? Like, this wasn't a little side project thing that just happened to go into production. I mean, this was enterprise-grade, right?

Rodric: Right. Yeah, so we launched February of 2016, I believe, and we had already been in production for several months at that point, and the code was quite mature. I think we just recorded another podcast with some of our early partners. Adobe jumped on the project fairly early when we open-sourced it, and they were an impetus, essentially, for joining the Apache Foundation. So the code quality was solid in that regard, and I used to joke that, “Hey, the system is bug-free.” And I meant, you know, it wouldn't crash from a segmentation fault or things like that, and it was true; it sort of held for a long time. We didn't have our first real crash from a segfault for, like, two or three years, and I remember it because somebody Slacked me on an IBM channel and said, “I thought you said this was bug-free!” So it sort of stuck. No, it was ready for prime time.

We were already doing many thousands of containers a day, sort of churning through containers, and our solution was, basically, we didn't create containers per function. We had this notion of stem cell containers, which are unspecialized containers ready to inject code into, and once you did that they became specialized for a user and for a function. And that allowed us to sort of do things with speculation. We can pre-warm containers, and it really allowed us to deliver performance that was on par with Lambda. And independent benchmarking today still shows IBM Cloud Functions, which is probably the largest deployment of OpenWhisk, maybe Adobe would be second, you know, does extremely well against Lambda in terms of latency and throughput. So, yeah, it was really looking at, sort of, how do you deliver performance? How do you build this technology? And then how do you meet people in terms of giving them a gentle migration step towards this new paradigm?
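
As a toy sketch of that stem cell idea, here is roughly the shape of it; this is an illustration of the technique, not OpenWhisk's actual scheduler code:

```python
from collections import deque

class StemCellPool:
    """Toy model of "stem cell" containers: a pool of unspecialized,
    pre-started containers that get specialized on first use by injecting
    a user's function code into one of them."""

    def __init__(self, warm_target=2):
        self.warm_target = warm_target
        self.warm = deque()      # unspecialized, pre-started "containers"
        self.specialized = {}    # (user, function_name) -> warm container

    def _boot_container(self):
        # Stand-in for actually starting a container; booting ahead of
        # time is what hides the cold-start latency from the user.
        return {"code": None}

    def prewarm(self):
        while len(self.warm) < self.warm_target:
            self.warm.append(self._boot_container())

    def invoke(self, user, function_name, code, event):
        key = (user, function_name)
        container = self.specialized.get(key)
        if container is None:
            self.prewarm()                   # make sure a stem cell exists
            container = self.warm.popleft()  # take an unspecialized one
            container["code"] = code         # the "init" step: specialize it
            self.specialized[key] = container
            self.prewarm()                   # speculatively replenish the pool
        return container["code"](event)      # the "run" step

pool = StemCellPool()
print(pool.invoke("alice", "hello", lambda ev: {"hi": ev["name"]}, {"name": "Jeremy"}))
```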

Jeremy: Now, I know you've been removed from IBM for a while, but it looks like the project now, the preferred way to run it is on top of Kubernetes, right?

Rodric: Yeah. OpenWhisk, like we said, started on VMs, but just like with serverless and Lambda, you saw the future here. It was inevitable. I think Kubernetes quickly started eating up every other container orchestration system on the planet. And so we had to shift the project, the open source project, to support Kubernetes. And IBM also had to do this migration. We were already live in several geographies around the world. And so we started with one region, moved that to Kubernetes, operated that for a while, and then, you know, the team got the confidence to roll that out. That was right around the time, actually, that I was leaving. I think they had just launched a first on-Kubernetes version of Cloud Functions, and off they went. So the project now is Kubernetes native, if you will; basically you can deploy it with a Helm chart out of the Apache repo.

But we did something different, and this is where maybe OpenWhisk stands out against some of the other Kubernetes serverless platforms out there today. We don't delegate the container orchestration to the Kubernetes controller and the Kubernetes container orchestration system, because it's too slow. If you're looking at the kinds of workloads that are short-running, or that are Lambda-style where you want to invoke really fast and get responses (I think I've seen a number of studies that said the average execution time for Lambda is well under a second, even several milliseconds), spinning up containers that fast on Kubernetes just doesn't work. It wasn't built for this. And so unless you solve the problem deep in, sort of, the Kubernetes scheduler, you have to bypass Kubernetes for container orchestration.

So, OpenWhisk to this day really does that for the large enterprise deployments. You can delegate to Kubernetes, or you can sort of bypass it and use a bespoke container orchestration system that really allows you to deliver the best performance. So if you want Lambda anywhere other than AWS, there's really only one answer, I'd say, and that's the OpenWhisk project.

Jeremy: All right. Well, we don't have enough time to discuss and solve all Kubernetes' problems on this podcast. But what I would like to do, though, is ... so you left IBM and, I guess we'll set this up for you: you started Nimbella, right? And I want to get into Nimbella, because I think this is really fascinating, what you and your team are doing over there. But what was it, you explained it a little bit, but what was it about sort of the current landscape? I mean, you'd already built OpenWhisk and had a tremendous amount of success with IBM, you know, with IBM Cloud Functions; it went into Adobe, and then you've got all these people using it, but you basically said serverless isn't good enough and you did this other thing. So what was it about the market or the current landscape that made you say, you know, we need to do something different here?

Rodric: Yeah, we needed to do more. I think that's the best ... it's like, yeah, this is great, but we could do so much more. And to me, I sort of really viewed it as the introduction of Fortran for the IBM mainframe. We're at that level of innovation in terms of how early we are on this journey, and it was a recognition that, hey, there are a lot of problems still unsolved: how do I debug? How do I look at this whole notion of, now I'm breaking up applications that were large monoliths into smaller functions? There are rich opportunities for dynamic system feedback optimizations, where, you know, if I start with a bunch of functions, do I fuse them together and run them as a monolith because it's more efficient, or take advantage of resources that are specialized, like a GPU or a TPU if you're doing AI in TensorFlow, and things like that.

So, sort of looking at it from a pragmatic perspective, saying there's a lot of opportunity here to do more, and recognizing that, from the technology perspective, it was so early that it was hard for developers at large enterprises to really get started. And it came from a number of reasons. One was this whole notion of, how do you build for the serverless style when you can't quite run locally, you can't quite debug your code in vivo? And watching some of IBM's early clients sort of adopt this technology and be successful in the end, but what it took to get there, the questions that they were asking, in some ways influenced my thinking as we started Nimbella. And it wasn't just compute. I think serverless functions touch compute; they ignore the whole data aspect of applications. And really this is where I started. I was, like, okay, we've got this model for compute with serverless functions. We've got containers as a service. We can mix the two. What about the data model? How do you marry a serverless data model, whatever that even looks like, with compute? That's where the genesis for Nimbella really started. Like, I want to be able to build complete applications. I want to deliver this promise of: don't worry about the resources being allocated for your data, don't worry about replicating it, don't worry about some of these synchronization and consistency models, because for a lot of those there are good solutions that we've learned over the years of, sort of, distributed systems research and technology that we built.

So, I wanted to bring the two together and that's what really started Nimbella: can we do this, can we do this marriage of stateful and serverless and bring them together so that we can continue to deliver on this promise of, “Hey, as a developer, I just want to build. Everything else should be taken care of for me.”

Jeremy: Yeah, and I think that's a thing ... we should probably just sort of define state, or at least give the listeners a little bit of background on state. So with serverless functions, or at least with the traditional serverless functions we've seen over the last several years, there is no shared state, in the sense that, you know, when you execute a function again, it's not going to have all this information there, especially because every single request typically runs in a new container. And even if it runs in an existing container, there's no guarantee that it's going to run in the existing container that you were just in that has the same data there. So I know that AWS has added things like EFS integration, and there are more things that are happening there. But really, even with something like EFS integration, most of the time when a function triggers, if you need data in there that wasn't passed in as part of the event, you need to rehydrate that data. So maybe we can talk about the kinds of state that you would need in a typical application, and which ones really are kind of missing from serverless?

Rodric: Right. Yeah. I think you've really framed it exceptionally well, and I describe it in terms of locality, right? So when you're running functions in Lambda, or really any serverless platform, you don't know where your code is running. You don't know the container, you don't know the resources. Any data that you need to touch, you have to move. And you lose data locality out of that. So you're spending time transferring data back and forth. That has both an economic impact, and maybe even, from a sort of eco-friendly perspective, right, that's wasted power. And so by being data-locality aware you can bring back the computational efficiency that we know from traditional systems building is important. And so our approach really is about looking at, well, yeah, to be able to scale a function to thousands of instances, the system has to say, hey, look, state is on you, right, because it becomes much easier to spin up a thousand containers without having to worry about consistency, sequentialization, etc. But then you're leaving the burden on programmers. Now you've given them this supercomputer that's basically a distributed system and said, okay, go figure out the rest of the data synchronization model you need. And that's where, you know, things are lacking. Can we do better?

And at Nimbella I do think we're doing better. One of the ways we're doing that is with a declarative approach. So as a function, or even a container, you can say: here's the state I want managed for me. What that means, by being able to declare that, is it could be, for example, a file that you need to load because you're doing machine learning inference. So you need to load the model that you've pre-trained with your neural network. Every instance of that function doesn't need to load the same file. If you can load the file once, mount it, and share it across multiple functions, now you've saved maybe a thousand X in cost, because you're not doing it a thousand times. And moreover, if you're reusing containers, it's already there, because it's been hydrated, as you said. So that's one example of sort of looking at different kinds of state that can be managed automatically by the system: files. And these are things that might be stored in an object store, for example, and then mounted as actual files that you can use within your containers. So I like to think of it as state that the system can manage for you a bit more efficiently.
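
A small sketch of that hydrate-once pattern from inside a function, assuming the platform mounts the declared file at a known path; the path and the loading logic here are illustrative, not Nimbella's actual layout:

```python
import functools

MODEL_PATH = "/mnt/models/classifier.bin"  # assumed platform-mounted file

@functools.lru_cache(maxsize=1)
def load_model():
    # Runs at most once per warm container; later invocations in the same
    # container reuse the already-hydrated model instead of pulling it
    # from object storage again.
    with open(MODEL_PATH, "rb") as f:
        return f.read()  # stand-in for real model deserialization

def main(args):
    model = load_model()
    return {"model_bytes": len(model)}
```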

In fact, you can't even do it at the user level. I think this was sort of the fundamental primitive for us. It was, like, if I'm a user and I really want to do this, maybe with EFS now we can do some of it, as you were touching upon, but the system doesn't allow me to do it. So unless you have the deep integration within the platform, you don't get that computational efficiency around the copy. Another aspect of state that we also take a declarative approach to managing is sort of transient state. These are things that you would put in ElastiCache or Redis, and these could be, for example, session tokens, OAuth flows between Stripe, Okta, or whoever you're using, sort of, identity management with. And you need that state, you need to store it somewhere, and sometimes you just need a very lightweight database, you know, a key-value pair.

So can we just provide that, so it's just there for you? And these are some of the things we do at Nimbella. So when you write a function in Nimbella and you just want a counter that's persisted and shared across functions, it's just there. You say, you know, for this key, bump the counter. You don't know where your Redis is located. That's our burden. You don't know how it's backed up. That's our burden. You don't know which geography it's running in. The only thing you know is that your functions can touch that store with sub-millisecond latency, because we've taken on the job of making sure that your compute and your data are co-located, and so we're delivering that performance. We're delivering that aspect of state management that is really starting to deliver, again, on that sort of serverless promise, but now it's also stateful.
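
Here is a sketch of what such a persisted counter could look like from inside a Python function, assuming the platform injects the location of a Redis-compatible store through environment variables; the variable names are illustrative, not Nimbella's actual API:

```python
import os
import redis  # pip install redis

def main(args):
    # The platform is assumed to inject the location and credentials of
    # its managed, Redis-compatible key-value store; these names are
    # hypothetical stand-ins for whatever the platform actually provides.
    kv = redis.Redis(
        host=os.environ.get("KV_HOST", "localhost"),
        port=int(os.environ.get("KV_PORT", "6379")),
        password=os.environ.get("KV_PASSWORD"),
    )
    # INCR is atomic, so many concurrent function instances can bump the
    # same counter without a read-modify-write race.
    count = kv.incr(args.get("key", "visits"))
    return {"count": count}
```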

So our approach is really to look at what kinds of things people are doing, and we've categorized four of them that we've sort of focused on today. It's static assets, when you're building an application that you want to deliver out of the CDN. So that's one. It's files, like the machine learning models that we talked about, that you might store in an object store, but then we give you essentially an abstraction that lets you treat them from functions as the file system. And then a key-value store for transient state. We've left some of the harder problems for our future roadmap, things like databases. Now, I'm talking about RDS. And our goal there is not to build all of these things ourselves. In fact, that's a key aspect of what we're doing. We're not building our own cloud, in the sense that we're not managing infrastructure. We're building on top of existing cloud providers that do exceptionally great things; it's just too hard for many developers to penetrate. So we take the existing clouds as the commodity and build these important layers of abstraction on top of them.

Jeremy: Yeah, and I think it's a good point you make about, I mean, I don't know exactly how you worded this, but this idea of abstracting it away for developers, right, making it so it's clear that they don't have to do it. Now, I have concerns about adding state to serverless, because I think in some cases people would just use it as a crutch, right? And sometimes when you make things available to people, that's when you get serverless WordPress and things like that that maybe you just shouldn't be doing, right? But I do think there are a number of use cases where you do need it. But also, I think, just for me personally, I like to really find a way that I can pass as much information in the event as possible, so that you can maintain that statelessness, because, again, the promise of serverless is unlimited scale, or at least, you know, massive scale, right? So now, if you're mounting EFS volumes or you're connecting to some file system or something like that, now you've got potentially thousands of concurrent users that are all accessing the same file system and so forth. So there's just a whole bunch of other problems that get introduced there, but I think that's really interesting. So, I don't know, I mean, what are your thoughts on that, though? I mean, adding state is a great thing for a lot of use cases, but at the same time ... I mean, are you still in the camp of, you know, serverless should be as stateless as possible, when possible?

Rodric: Yes, and that's because the history of computing has shown us that's the best way to sort of get computational efficiency. And maybe this is important in terms of my background, right, where we started this call. I've come at this from a programming language and compiler perspective. And so I'm just looking at, “Hey, I can optimize these with a compiler if I had the right abstractions,” and what serverless has allowed us to do is basically be prescriptive, right? When we launched IBM Cloud Functions, and when Lambda came out, Amazon said, write your code like this, and people wrote their code like that, because the carrot was so big they just followed the recipe. And, yeah, they demanded more features, demanded more capabilities, and that came over time, but it's given us the opportunity to be prescriptive.

And in some ways, because the hardware has come first, the cloud is this massive supercomputer. It's available. It's allowed us to essentially say, oh, we can do distributed programming now with the right programming language abstractions, and be prescriptive, and people will do it. And so that model that you just touched on has roots for me in sort of actor-oriented models, where your function is essentially an actor, and you can think of the steps of execution, the lifecycle, as being broken up into pre-work, so things you do before you actually start running your function, the function itself, and then post-work. So opportunities to do things like fetch data from a database, or fetch data from a key-value store or file system, can be done in the pre-work. And then when I'm done with my function, hey, serialize all this back out to the right places; that can be done in the post-work. And what's important about thinking about these three phases of execution is that the first and the last, the pre and the post, can be completely managed by the system if you take a declarative approach, or other approaches for sure, but at least that's how we've come at it.

What that allows you to do from a functions perspective is just have this really clean abstraction that says: here's my event, my event contains some state, I don't know where it came from, I don't care where it came from. I can write my code against that event; what's happening before and after is now hands-off, and that, you know, ties back to the serverless promise. Just write your code. You can think about your interface, your API, and then everything else is managed. Versus, how much of that can we do? I mean, that's how we started. It was, like, how much of that can we do, and this is where, you know, the genesis for our company really was. And we found that, you know, for a number of these kinds of state we can do extremely well, and the benefit for the end user is that the abstraction of the function is still pure; you didn't have to break that abstraction. How far can we push it? I mean, this is where it's still early, but this is what we're trying to do.
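
A minimal sketch of that three-phase lifecycle, with an in-memory dict standing in for whatever backing store (object store, Redis, and so on) the platform would actually use; the pre and post phases belong to the platform, and the user function stays pure:

```python
STORE = {"visits": 0}  # stand-in for a platform-managed store

def pre_work(event, declared_keys):
    # Platform phase: hydrate the declared state before user code runs.
    event["state"] = {k: STORE.get(k) for k in declared_keys}
    return event

def post_work(result, declared_keys):
    # Platform phase: serialize the declared state back out.
    for k in declared_keys:
        STORE[k] = result["state"][k]
    return result

def run(fn, event, declared_keys=("visits",)):
    return post_work(fn(pre_work(event, declared_keys)), declared_keys)

def user_function(event):
    # Pure user code: event in, result out; it doesn't know or care
    # where the state came from or where it goes afterwards.
    event["state"]["visits"] += 1
    return {"ok": True, "state": event["state"]}

print(run(user_function, {}))  # {'ok': True, 'state': {'visits': 1}}
```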

Jeremy: Yeah, so speaking about before and after, another problem, and a question that always comes up, has to do with function composition, right? And we always think about single-purpose functions. That's the way that we recommend you do things. I mean, this function converts the image, this function processes the record and then it sends it somewhere else, and you can do that with, you know, choreography, right? You can just sort of hope the next system picks it up. But state machines have been something that most people embrace when you're trying to connect multiple execution components, or if you're composing functions, state machines are really helpful. So what do you have at Nimbella to help with those kinds of workloads?

Rodric: Yeah. So, what we have we actually inherited out of OpenWhisk. Once again, it was one of the early features we put in OpenWhisk; we talked about bring your own container, run your container as a function. The other was composition. It was built in from the ground up and sort of intrinsic to the system. What that allows you to do, for example, is take functions and then chain them together so you have a pipeline. And we did it in a way where the composition itself looks like a single function, and what that means is that you can take it and then further compose it. So it really goes, sort of, from a software engineering perspective, to coming up with libraries of reusable assets that you can then take and treat as malleable code that you can integrate into other components. And later on we went from composition as just a sequence to, hey, let me write an arbitrary state machine, a data flow graph, and we have, again, an open source project that came out of IBM Research called Composer, which is very similar to Amazon Step Functions in that it compiles down to a state machine, but you can code against it as a library from Node.js and Python.

And I think Amazon is just starting to do some of that work. We did it, I think, two years before. This is one rare area where we had innovated something faster than Amazon, so we're really proud of that work. But I think composition is important because of some of the things you talked about. If you have the ability to focus small pieces of code on specific functionality, and then build them into larger applications using the right models, a state machine, you can again bring in this way of sort of saying, “Well, I can transform this program, I can recompile it into something that's completely different,” and it's easier to do that when you're starting with small building blocks than taking a large piece of code and then breaking it down into smaller pieces. And when you start small and coarsen it, essentially you go from fine-grained to monolith, you know, you can still deliver computational efficiency. You can scale things out as much as you can. When you go the other way around, you start hampering some of that. Your attack surface also gets bigger, your boot times become longer. There are a number of things that become as hard as parallel programming really still is today. So, I like the model where you start with small, fine-grained pieces of code and then coarsen it, you know, one API per function. That doesn't mean you have to package the code exactly that way. I mean, things like XX solve some of those problems today.

But, so, I like the model of starting small, building graphs that are essentially state machines. What I think is fundamental, and Amazon will eventually get there, is, you know, can you take these state machines and further compose them? Right? Can you call a step function from one step function? From one workflow, can you call another workflow? This touches on something we published out of IBM called “The Serverless Trilemma”: basically, can you treat this code as an opaque piece of code that is no different than a serverless function? Event in, event out, and what you run inside, whether it's a single function or a whole workflow, is opaque to you as an end user. If you can do that, and solve the double-billing problem, basically you're not also billing for the workflow engine to run while your functions run, and support black-box code, code that you can't modify as the vendor, then you've satisfied the serverless trilemma. And that's sort of like the Zen of serverless compositions for me. So we built, essentially, a serverless-trilemma-satisfying composition with OpenWhisk.
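
As a toy sketch of that property: a sequence is itself just an event-in, event-out function, so it can be composed again. This shows the shape of the idea, not Composer's actual API:

```python
from functools import reduce

def sequence(*fns):
    # The composition is itself just an event-in/event-out function, so it
    # can be treated as one opaque unit and composed again; that nesting
    # is exactly the property the trilemma asks for.
    def composed(event):
        return reduce(lambda ev, fn: fn(ev), fns, event)
    return composed

validate = lambda ev: {**ev, "valid": True}
enrich = lambda ev: {**ev, "source": "demo"}
notify = lambda ev: {**ev, "notified": True}

pipeline = sequence(validate, enrich)
workflow = sequence(pipeline, notify)  # composing a composition
print(workflow({"id": 42}))
# {'id': 42, 'valid': True, 'source': 'demo', 'notified': True}
```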

And since then, people have sort of quantified and looked at, well, Step Functions does it this way, Durable Functions from Microsoft does it this other way. I think we're sort of trying to define the space around compositions. So that's what we're doing at Nimbella, sort of inheriting a lot of what we did with OpenWhisk and building on top of that project.

Jeremy: Nice. And that's another point you make about, you know, building these little reusable components. That's another really good argument for statelessness in those components, because if you want to reuse a component, but it's always saving to the same database or something like that, it's better to have something that maybe converts that object into whatever format it needs to be, and pass that, through a state machine, to another function that maybe then has the ability to save that state, and things like that. So that's really interesting.

So, I want to talk a little bit more about Nimbella, because I think an interesting approach that you took was, again, this whole thing runs on top of Kubernetes as well, right? And you can also run it on-premises, so you can basically run it in any cloud or on-prem.

Rodric: That's right. And I think we've done this in two ways. One, as you know, we have a hosted service, and if you're an end user who just wants to build against the cloud, and you don't care about which cloud you're running on, you don't care about anything but time-to-market, time-to-solution, you know, we're building a cloud that's basically very easy to get started with. And it goes to this notion of building projects, building entire applications that incorporate compute, front end, back end, state, and just deploying; it's a repeatable unit of execution. Somebody else could take that code and deploy it and run it. For the on-prem, we're essentially trying to say, “Hey, we can bring that experience to you, wherever you're running your cloud,” and the motivation for us behind doing that is the recognition that a lot of service providers out there are building their own clouds, and they need functionality like what serverless functions give them.

They have events; they want to be able to allow their end users to operate on those events. We've seen it with Auth0, Twilio, Salesforce; Zoho now even has a serverless offering. And I think that's just, you know, that repeating pattern of: I have events, I have an ecosystem that my developers code against, they want this kind of serverless experience. And so we're essentially saying, we can accelerate your delivery, and we can do it in a way where you can run it on any cloud of your choice. It's basically like saying we can bring the Amazon experience to you on whichever cloud you want. That's too big a thing to say, because we don't do everything that Amazon does; we do maybe three or four things, but that's the kind of ... that's the reason we've approached this whole model with Kubernetes. Kubernetes is almost everywhere. Every organization we talk to now says, well, we have in-house Kubernetes expertise. We say great, point us at your Kubernetes cluster, and, you know, within 30 minutes we've deployed the entire Nimbella stack for them, and they can start coding, building projects, and deploying them. And I think that's what's been very powerful for us: being able to reach those organizations, help them fill gaps in their portfolio in terms of being able to offer these capabilities. We never expose Kubernetes to the end user; it's an operational aspect. So, it's just a normalizing platform for us. And because it's everywhere, it's allowed us to basically say we can run everywhere.

Jeremy: Awesome. All right. So let's take off your Nimbella hat for a second. Let's put on your analyst hat, if you could, for me. So, if you look at ... I mean, you said earlier, you know, that Kubernetes is sort of becoming the de facto standard for containers. And I think I agree with you there. There are a lot of surrounding tools that are also maturing and sort of becoming a standard there. One thing we don't really have a standard with, though, is serverless, right? And the way that people are ... and I should take that back and say, more, serverless functions, right, the way that people are building functions as a service. So, we have Lambda, which runs on Firecracker, its own, though open source, virtualization layer, you know, sort of to run as close to the metal as possible. Microsoft Azure, you've got GCP, but GCP is also doing not only Google Cloud Functions, they also have their Cloud Run and some of these other things. Oracle Fn, I think, is also like a Cloud Run-type thing. You've got Fargate, you've got edge providers, we've got Cloudflare with their Workers, you've got Fastly. All of these other companies like the Salesforces and Adobes are building their own, you know, either running on top of something else or building and running their own serverless platforms that are integrated into their systems.

So there do not seem to be any standards. There are a lot of different approaches. I know there's CloudEvents, you know, the working group the Cloud Native Computing Foundation is running to try to standardize events, which I don't think has had much movement on it. But just, what are your thoughts? I mean, are you concerned with all of these different approaches to serverless?

Rodric: “Concerned” isn't the right ... isn't the word I would use, and I think we're in a phase where there is room for a lot of innovation and exploration. I think everybody recognizes there are giant opportunities here. So it's greenfields everywhere. Change the context a little bit and, hey, you can go a long way. Taking a pragmatic approach, you know, when we looked at this standards issue, what we said is: there is a de facto standard, it's Lambda, and that's because they process more functions in any given month than any other cloud provider, as far as I know. I think the number is in the trillions, and I remember, you know, my first conversation with Tim Wagner at a New York City serverless conference five years ago, where I asked him, “How many do you do a day?” He's like, “2 billion.”

So, it's been exponential growth, you know, over those five years to where they are today. But as you also said earlier, it's this tiny fraction of all the compute that's happening in the cloud today. So we've got a long way to go. I think there will be standards, or, you know, efforts to standardize will arise, and you're sort of seeing it. So Google has this Knative project, and as part of that they have been looking at, “Okay, what does the interface look like? Can we standardize it?” And because it's sort of got the “K” in the name, right, it's sort of riding on the Kubernetes wave. It has an opportunity to become a standard, just like Kubernetes is effectively the de facto standard now for container orchestration. So I think we need this kind of exploration, and I think we're seeing exciting technology being developed because of it. And, you know, what's happening at the edge with Fastly and Cloudflare is really exciting. WebAssembly.

And, you know, the future of isolates, where you're running containerless functions, you know, from a computational efficiency perspective really excites me. I don't think end users will eventually care; they'll just care about the interface. So because of that there will be some standardization. As a start-up we can't do that, right? It costs too much and it's prohibitive for us; it has to come from essentially a consortium of the big players. But everybody has a stake today, and, you know ... so I don't see it happening any time soon. And if you're a pragmatist, you look at who's the biggest whale, and it's Lambda, and you say, okay, they're the standard. And you see it: you know, with Auth0 the signature is very obvious. Netlify is very obvious. They're all Lambda, and so it's winning, you know, without actually being declared a standard. Will that change? Possibly, but I don't think we're going to wait around for it.

Jeremy: Right, right. So, what about, you know ... so, with these different approaches to serverless, I mean, for some people it makes a lot of sense. If I'm an enterprise and maybe I have partial workloads on-prem, I have some things running in the cloud, maybe I want to mix and match, and I've got an operations team that can manage my Kubernetes cluster for me, or can deal with all this stuff. That's a lot different than your small start-up, or somebody just hacking on the side or something like that. So, I mean, how much do you think these different approaches to serverless are sort of targeted at, I guess, the different personas of people who are using it?

Rodric: There's a couple of ways of looking at this: one, as you know, from the operations side, and the other from the end-user side. Actually, I'll give the Knative project here a shout-out, because I think when they came on the scene, they did a really good job of separating the personas dealing with serverless. There's the operator, which is managing the infrastructure, the Kubernetes, the VMs, the infrastructure that you're actually running on, and then there's the end user, which is building code and deploying it to this platform. Their concerns are completely different, and I think you have to approach it in different ways from an enterprise perspective. They care about both, and you can look at some organizations that have gone all-in on AWS: the LEGO Group, Capital One, Vanguard, there are dozens of them. And they recognize the transformation they can get by essentially just delegating all that infrastructure worry to AWS.

It's a process and it's a journey, so it takes time to get there, and if you're a small company, a small business, you don't have time for all of that. So your time-to-market is what's most important. You're going to look at which cloud can I build on that will give me the best solution, and in some ways the choice that you make really becomes impossible to revert, because the more you build, the more you're essentially tying yourself to that platform, and if you're extracting value out of it, great. So, for the most part, you know, if we talk to somebody and they say, we're on AWS, we love AWS, we're like, great. We are not the solution for you, and we recognize that, and we would do the same thing. But there are organizations where, for various reasons, and just the fact that they're running Kubernetes, you know, is an indicator, you have to meet them where they are. They have needs where they want to operate their own infrastructure. They want to be able to run the same kind of environment on multiple clouds, data gravity, or other kinds of concerns. You have to be able to give them that serverless experience, because I think their developers are going to demand it. So there is a bifurcation there between the operator and the developer, and serverless can help serve both in different ways: one on the developer experience, sort of normalizing what you're coding against from an end-user perspective; the other on the operator side, where now you can have smaller teams managing infrastructure, because Kubernetes does a lot of the heavy lifting for you.

And you can extract some of that value and now repurpose it, extracting a greater business impact in the long run. And actually, if I answered your question ... I sort of took it in a couple of different directions, but I think it's a really interesting area for us, and even from a business perspective it has implications.

Jeremy: I'm not even sure what my original question was, but let me follow up with this, and I hate to ask you about vendor lock-in, because it's just one of those things where, again, when you take a million different approaches to serverless, you pick one and, in some cases, that's it. You're sort of stuck with it, you know, at least from a compute and certainly from a managed serverless perspective. I think, you know, if you pick DynamoDB, that's a task to migrate to Mongo or to do something like that. I mean, you pick any database, you're pretty much locked in. But I'm just curious, you know, sort of from your perspective, and I know it's more anecdotal, but how important is portability, do you think, for, you know, some of these larger enterprises?

Rodric: I think it's important, but maybe not for ... it's important because we've had the conversations with very large organizations, and some have said something as simple as this: we can run on any cloud as long as it's this one “name on the list,” right? So there are business reasons why some companies must run on a particular cloud, and the lock-in aspect comes about, like you said, once you start building. Sometimes it just starts with enterprising developers. One of the early choices I made with OpenWhisk was to use Cloudant, which is IBM's CouchDB as a service. And why did I use it? Because it was just there. I needed a database, I didn't want to worry about it. I just used it, and I regret that choice to this day; I should have used a relational database, but IBM didn't offer one.

So these choices become really hard to reverse as you grow, and it takes a lot of investment to essentially then move. And unless you're a large organization that can afford to spend hundreds of millions, or billions, of dollars, that choice for me is almost a non-starter. But it's there. People actually are trying to do this, whether it's by running Kubernetes clusters on different clouds and then having to normalize them; it's just happening. So I stopped questioning whether, you know, it's valid or not. I just recognize there's a massive opportunity because it's happening, and so you just accept that and say, “Okay, what are they missing?”

And it's this whole serverless notion, because it empowers their developers; it empowers the organization to generate the most value. That's where we focus. Our notion really isn't to look at Kubernetes. It's how far up the stack we can go, and in some ways, because we're small, we're a small company, we're highly focused, we can push up the stack much faster than some of the other companies. And this is what, you know, gives entrepreneurs and, you know, startup founders the opportunity to compete in this space.

Jeremy: Yeah, got that. All right, so I'd love to sort of ask you ... and we talked a lot about state and the need for state, and maybe there's the need for, you know, better control mechanisms to spin up or scale up serverless faster, you know, whether that's pre-provisioning or something like what Cloudflare Workers are doing, where it's supposedly zero-millisecond cold starts, right. And of course they can only run, I think, for ... anyway, a very small amount of time. It's 50 milliseconds. That's a very small amount of time those run for. But anyway, what are some of the unmet promises, let's put it that way, besides the state aspect of it? Like, what else are we missing from serverless? And what do you think, you know, companies like yours have to keep solving for?

Rodric: I think accessibility of the platform. And I remember when I first met you, right, we had this conversation about, we called it the “serverless bubble” at the time, right? And maybe “bubble” isn't the right word, because bubbles burst and that's not a good thing. Maybe “echo chamber” is better. But I think ... one thing I've learned, and I learned this very early on when I left IBM and sort of went to a developer conference: hey, there's this thing called serverless, the greatest thing, and it was like, what's a microservice? Right? I hadn't recognized that the world hasn't yet caught on. There is a part of, you know, the technology community that has, and, you know, good for them. But recognize that there is still a large interest in Kubernetes, still a large interest in EC2 instances and VMs. There's a massive world out there where building applications for the cloud is still hard. You know, just log onto the Amazon console and look at everything you can get. Where do you get started? Right? So the opportunity for us is making the cloud more accessible.

And so we like to think that, from a Nimbella perspective, you can create an account within 60 seconds. You can deploy your first project, you know, not even having to install any tools, right out of GitHub. And, hey, I have stood up an entire application. It's got a front end. It's got a dedicated domain. It's served from a CDN. My functions are entirely serverless; they scale. I can have state. I just did that, right? So, it's about really making the cloud accessible for a large class of developers, from the enterprise all the way to the indie developer who just has an idea for a mobile app or a website that they want to build. I think this is where the opportunity really is. You know, whether you're running things in a container or an isolate like Cloudflare does, it becomes an implementation detail nobody's going to care about in the future. It's: what is the programming experience? How fast can you let me create? And so at Nimbella, we like to think, you know, create, build, and deploy at the fastest pace of innovation. That's what we really want to try to do. And that's what excites me, because, like, serverless is a transformational and even transcendental technology, because it can unlock all of that, and hopefully you can tell how excited I am just talking about it.

Jeremy: No. No, that's ... I think you make a really good point and I always argue that, you know, serverless when it first sort of came out, right, when we first started building Lambda functions, it was so easy. It was simple. It was a really simple way to think about it and then it just got more and more complex, and more and more complex. And now we're at a point where if you log into the Lambda console on AWS, I mean, it's mind-numbing because where do I even start?

All right. So I think that is a very lofty goal. I totally agree with you. So good luck with all of that stuff. Rodric, thank you for joining me and telling the story of IBM and OpenWhisk, and what you're doing at Nimbella, and just giving me that analysis of the serverless market and what the future is, because I think it's a really messy place right now and it's got a long way to go. So, the more people we have like you that continue to shape it, the better. So if people want to get ahold of you and contact you, how do they do that?

Rodric: So, I'm on Twitter @rabbah, my last name. I'm also easy to find by email: rodric@gmail or rodric@nimbella.com. I think you'll share some of my contact information later on, but you know Twitter is where everything happens today so @rabbah on Twitter and you can find me there.

Jeremy: Awesome. All right, well, again, thank you so much. We'll get all that stuff into the show notes. It was great to have you.

Rodric: Yeah, thanks for having me, and, really, thanks. I really enjoyed it.
