Serverless Chats

Episode #79: What to do with your data in a serverless world with Angela Timofte

Dec 14 '20

About Angela Timofte

Angela Timofte is the Tech Lead at Trustpilot, a global review platform that helps businesses collect and leverage customer reviews. Angela has a proven history of transitioning legacy applications to new platforms and product offerings. She is driven to build scalable solutions with the latest technologies while migrating away from monolithic solutions using serverless applications and event-driven architecture. She is a co-organizer of the Copenhagen AWS User Group and a frequent speaker about serverless technologies at AWS Summits, AWS Community Days, ServerlessDays, and more.

Trustpilot: Trustpilot.com
Twitter: @AngelaTimofte
LinkedIn: Angela Timofte

Watch this episode on YouTube: https://youtu.be/jHE0VYfQUaY

Transcript

Jeremy: Hi, everyone. I'm Jeremy Daly and this is Serverless Chats. Today I'm speaking with Angela Timofte. Hey, Angela, thanks for joining me.
Angela: Hi, Jeremy. Thanks for having me here.
Jeremy: So, you are the Data Platform Manager at Trustpilot, so I would love it if you could tell the listeners a little bit about yourself and your background and what you do at Trustpilot and what Trustpilot is all about.
Angela: Yes, of course. So as you already mentioned, I work as a Data Platform Manager at Trustpilot and I've been with the company for almost six years. So, quite a long period of time for a company that's only been on the market for like 11 years. But yeah, I started in the company as a backend developer and then moved to be more of a full stack developer and now the data platform manager because my love for data was always there and I kind of did everything that I could do to move closer to the data. To be honest, no matter where I was in my career it was always data that, like, attracted me the most. Like how do you handle it? And to be honest nowadays, like, data is everything. Like you can't make any decision as a business without data and now everyone is seeing that. So it's really cool the position I am in right now because I can push all these, like, these data meetings that we do so that we take, like, the right decision. And then, yeah, Trustpilot. I mean, hopefully everyone heard about Trustpilot. At least that's what I want to think, but in case you haven't, you should use it. It's an online review platform and I mean our all … our whole mission is to help people to have better experiences when it comes to purchasing online. But of course, at the same time, we want to help businesses to connect with their customers and also improve their offerings and for that we offer all these analytics tools. So they understand their customers better and they know where they need to improve their business.
And I mean if you think about, like, the situation now with Covid, honestly Trustpilot came perfectly. Even for me, like, I'm talking from my perspective now, but, like, I had to order everything online, like, from food to, like, toilet paper. I didn't go to the store to fight for it. I went online and fight for it, and fought for it. So, but it came super handy because I'm based in Copenhagen and we don't have Amazon here and then you have to purchase from, like, small businesses and you don't know about all of them. So I had found myself searching on Trustpilot. Okay, can I trust this business because I've never heard about it, right? So I don't want to throw money out there. And yeah, that's what you can use Trustpilot for if you are a consumer, and especially in these times it's perfect because I can trust that what I find there it's real data and real people and I can get their opinions on all of these. So, yeah.
Jeremy: Awesome. So I am super excited to have you here because as much as I am a huge serverless geek, I love data; like, I'm with you on that. Like everything that I do, I'm always finding better ways to build or abstract data or interact with data. I built all these open source libraries, and I think all my open source libraries have something to do with data. And what we've seen over the last several years is this move to more and more serverless data, right. Like, capabilities that allow us to use databases that are more and more serverless, DynamoDB obviously being one of the big ones; all kinds of crazy stuff happening with relational databases.
So, you're an expert on databases and data and I would love to get some of your, you know, sort of your comments and insight on what are those choices that people have right now for serverless data. And also we're in the middle of re:Invent right now, so just in the beginning of re:Invent there's already a whole bunch of announcements of new things that Amazon has released and more options that are available. So let's start there. What are, you know, what are those serverless options for data that people have?
Angela: Yes, I mean you've already mentioned DynamoDB and for me that's like the first choice when it comes to building serverless applications, to be honest, especially when you think about scaling, and that's my first pick. And then Aurora, and now we have Serverless v2; let's see what we can do with that one, right. Super excited to see more about version 2. And then, yeah, you can use S3, Kinesis, and they've released some other things now for databases like Babelfish to, like, export the data up, and yeah. Then you had ... what was it called ... Glue as well …
Jeremy: Oh, yeah, Glue Elastic Views. Yeah.
Angela: Yes, which will actually replace some of the pipelines that I have probably with, like, DynamoDB Streams going to Lambda then going to Elasticsearch, so probably that will change some of the things that I've done previously. But, yeah, when it comes to databases and serverless, I know like especially before, I don't know if it's still the case, but when people mentioned serverless it was always like Lambda functions, and containers, and things like that, but no one was talking about databases and I think it's a huge mistake because, I mean, no matter how scalable you make, like, your services, if your database is not scalable, then, like, you're missing the point, right?
Jeremy: Right, right.
Angela: And yeah, and that's why I think it's super important to look at these serverless databases so that you make your entire pipeline scalable from one end to another. And that's where DynamoDB comes super-handy.
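As a rough illustration of the DynamoDB Streams to Lambda to Elasticsearch pipeline mentioned above, here is a minimal sketch of the transformation step such a Lambda function might perform. It is not Trustpilot's code: the index name `reviews`, the helper names, and the way the document ID is built from the keys are all assumptions made for illustration, and it assumes the stream is configured to include new item images. It only converts stream records into Elasticsearch bulk-API lines; sending them to a cluster is left out.

```python
import json

# Hypothetical index name; not from the episode.
INDEX = "reviews"


def _plain(attr):
    """Convert one DynamoDB-typed attribute (e.g. {"S": "x"}) to a plain value."""
    (kind, value), = attr.items()
    if kind == "N":
        return float(value) if "." in value else int(value)
    if kind == "M":
        return {k: _plain(v) for k, v in value.items()}
    if kind == "L":
        return [_plain(v) for v in value]
    return value  # S, BOOL and other types are passed through unchanged


def to_bulk_lines(event):
    """Turn a DynamoDB Streams event into Elasticsearch bulk-API NDJSON lines."""
    lines = []
    for record in event["Records"]:
        keys = record["dynamodb"]["Keys"]
        # Assumption: join the key values to form a stable document ID.
        doc_id = "|".join(str(_plain(keys[k])) for k in sorted(keys))
        if record["eventName"] == "REMOVE":
            lines.append(json.dumps({"delete": {"_index": INDEX, "_id": doc_id}}))
        else:  # INSERT or MODIFY
            image = record["dynamodb"]["NewImage"]
            lines.append(json.dumps({"index": {"_index": INDEX, "_id": doc_id}}))
            lines.append(json.dumps({k: _plain(v) for k, v in image.items()}))
    return lines


def handler(event, context):
    # A real function would POST these lines to the cluster's _bulk endpoint.
    return to_bulk_lines(event)
```

Glue code of roughly this shape is what a managed service such as Glue Elastic Views could replace.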
Jeremy: Right. So with DynamoDB, I think if people don't know what DynamoDB is, just go and look it up. It is an amazing key-value store and document database. You can do some really cool things with that. But it does have limitations in terms ... especially if you're building like OLAP applications, right? Because you can't do all these different queries. You can’t slice and dice the data different ways.
So, beyond DynamoDB being a very good sort of ... I look at it as a perfect application or a perfect database for your frontend users that need to access data quickly where the access patterns are very consistent and you know what those are going to be. But when you have to start exploring data more or you've got relational things that need to be done, what are some of the options there? You mentioned Aurora Serverless, right, so let's dig into that a little bit and then let's talk about V2 because that's kind of exciting.
Angela: Yeah, so on Dynamo the way that we use it is, as you mentioned, it's for our frontend application, right, so that they are scaling accordingly and also that everything gets like pre-calculated and the way you start, like, you need to be very strict when you start your baseline Dynamo so that you actually take the benefits out of Dynamo otherwise, yeah, don't just throw data out there and hope for the best. But, yeah, then, like, you can go to Aurora Serverless. The thing with Aurora Serverless, at least version one, we don't use it as much because of some of the limitations that you have with Aurora Serverless and one big one for us at least is that you can't import from S3. And that I know ... I can’t remember at what conference I complained about that, but I know I complained about it when people from AWS were there. I was, like, I need this, people. And also the pricing on Aurora Serverless, it's quite, quite high. But of course, when I mention pricing like even if you go to Dynamo, which is much cheaper, you need to calculate, like, how much time, like development time, it takes to put things in Dynamo for instance or, like, for people to actually understand because it might not be so easy to do things in Dynamo. And then, yeah, with Aurora Serverless, unfortunately, we don't use it as much but that's why I'm super curious now with version 2 which seems that they are investing more into the serverless version of Aurora and hopefully they’ll work more and more so that we can use it in more, like, heavy production, like, workflows because before it felt like it's not really for your heavy workflows, I would say.
Jeremy: Right. Yeah, I mean, the scaling characteristics of version 1 was this doubling of capacity where it actually had to, like, move data between instances and it took like a minute to scale up and it just wasn't one of those things where it was as elastic as you wanted it to be. V2, very, very promising. It's very cool because the instances themselves scale which is kind of crazy and it will just … and I did a bunch of tests the other day, or last night actually, and just I threw as much as I could at the thing and it just laughed at me. I mean, it was, like, no problem. And then the other cool thing is according to the website, it says it's going to have all the features of Aurora. So that should mean S3 imports and global tables and all these other things. They actually said there would be global tables so that could be really, really cool.
Angela: I hope they listen.
Jeremy: Twice the price, though. Yeah. I know, I know. Twice the price though as ... just as the v1. I did some calculations on it though. I mean it still might be cheaper depending on your workloads and they say it's 90% cheaper, but anyways, I think that's a super interesting option. Right. The other thing you mentioned was S3. And I don't think a lot of people think of S3 as a database. But if you store data in S3, you've got a lot of options, right?
Angela: Yeah, that's … it's actually right that people don't think of S3 as a data store and it's also that it's been there for so long and all these other new “cool” things have appeared. I feel like people forgot about S3 and how powerful it is and how flexible it is. And I think the problem with S3 is it just doesn't come to your mind as like a data store. Like how would you go about it? But it's very flexible once you start using it. And it's the same as with Dynamo: it might take a bit of time to, like, actually adjust to how you take your data, but I think it's a very powerful tool that people forget about. So, yeah.
Jeremy: Right. And there's a bunch of interfaces into it as well. I mean you can use S3 Select so on, like, really large files you can select just a portion of them so basically, you can query a file or an object within S3. And then you've got Athena, right? So what are your thoughts on Athena?
Angela: We’re actually not using Athena. Yeah, I know. So I can't really say much on, like, production work because we don't use it. That's my take on it, you know, we don't use it!
Jeremy: That's it! Well, I mean, you know, I think the funny thing is that ... I mean with this large of a footprint that Trustpilot has and all the different services you're using, I mean, again, it's impossible probably to use all of these services, right? So, you just have to pick the ones that actually work for you. But Athena, I mean, what I love about Athena is just the fact that you map over these S3 buckets and you can query it like normal SQL which, I mean, is sort of ... and gives you the performance of something like maybe BigQuery. Like what about things like BigQuery or Azure Cosmos DB or things like that. Have you played around with any of those?
Angela: Yeah. So the reason why we don't use Athena is because we use BigQuery. So, yeah, it's very powerful. And that's what we use to analyze over our data and precompute everything and then we push it back to our data space where we then, like, use it in Dynamo or Aurora or like any other database to actually use it in our applications, but for analytics we use BigQuery.
Jeremy: Awesome. Right. So, we just talked about a bunch of different serverless databases that are available to you. And we mentioned some that are cross-cloud, right? Or multi-cloud, right? They're not just Amazon options. So this is something I think is a really difficult choice for people who are building new applications. You know, how do you ... and not just new applications but refactoring old ones as well. What do you have to think about, you know, when you're moving to either build a new application or refactor an old application? Like, how do you think about, you know, what you choose for a database? And when, and I guess, when is serverless a right choice for you?
Angela: Yeah, I mean we have, like ... we use both AWS and then Google Cloud, but we kind of try to stay in the AWS world when it comes to all of our production data and services and serverless and so on. So first is, like, think which cloud provider is the right one for you. I'll go for AWS, but I mean, I might be biased there. But then, yeah, the way that you have to look at the application that you have, so if you start with, I don't know, a monolith and then you want to split it, I would go with serverless and that's something that we try to do at Trustpilot, to go with, like, serverless first. So we have this principle that whatever you want to build, new or refactor something, you should think of how you can do that in a serverless application because of all the benefits that you get from using serverless applications like scalability and price and so on. So I would start with that: like, think how you can put your application into a serverless infrastructure and then of course if that's not possible because there are still limitations on the serverless choices, then we go to containers. So it’s ECS or EKS, and then if that's not the right choice still, like, the last resort is an EC2 instance and then you just dump your things there and pay for it because you do that.
Jeremy: Right, right.
Angela: So that's kind of the mindset that we have around, like, when we go for something new or, like, refactoring. And to be honest, now it's almost everything it’s serverless when you think about, like, a new ... building something new, because you have so many tools there that you can, yeah, you can get around serverless. But as I said, like, there are still some limitations that, yeah, I find and I'm like, “Oh no, it can't be serverless!” And, yeah, you have to go for something else.
Jeremy: Right. So, you mentioned the sort of this process that you use at Trustpilot and that's super interesting because, again, I always love getting insights into other companies and how they go through these processes, and I know you had mentioned to me in the past your first attempt at moving data into DynamoDB because that's something you’ve really got to think about, right. I mean, and again, DynamoDB, it's a NoSQL database. You've got to precompute a lot of data, you’ve got to think about your access patterns, so what ... tell that story because I think that's really interesting, sort of the experience you went through.
Angela: Yeah, absolutely. So with DynamoDB, we started to look at it around 2017 when there weren't that many tutorials about it. So, we started to look at it, we were ... so at that time we had all of our data in MongoDB and then, we’re like, so used to the way MongoDB was working, that when we started to look at DynamoDB, we're like, what is this all about? Like what's with this key-value store that you're trying to push us to use and, yeah, it wasn't that good for us to start using DynamoDB, to be honest. And in the beginning, we're like, yeah, this is just some silly type of database and we're going to use it for, like, I don't know, some simple scenarios that we have and, yeah, that was the beginning. Like, we really didn't think too much about DynamoDB and now it's changed to be our preferred type of data store. So there you have it.
Jeremy: So what's your advice, I guess, out of that? I mean because, again, there are a lot of different options like you said, learning S3, figuring out what you can do with that. You don't even use Athena so that, I mean, so knowing how or what you can do in Athena, like, so, what's the advice for somebody that wants to make that leap to some sort of serverless database or, you know, Dynamo or Aurora Serverless or something like that?
Angela: Yeah. I mean it’s ... definitely try to get all the information that you can about it. And yeah, look up tutorials. Nowadays, like, yeah, you can find a lot of information and then start using it, practice, because especially when it comes to DynamoDB, like, it's quite, I mean, it's not that easy to see those patterns, to be honest, especially in the beginning because if you come, for instance, from a SQL database, you kind of know how to store your data and how your query will look, your indexes and so on. But then you go to Dynamo where you have a primary key and sort key and do things with it, you know, and you're like, “So what can I do with this?” So it takes practice. It takes a lot of practice to see the access patterns and yeah, I would say like whenever you have something new that you want to store, like, just give this a try to see how would you do that in Dynamo. And also watch lots of tutorials from people.
So yeah, that would be my advice. And when it comes to companies if you want to push people towards something new, I would say really give them time to adjust because companies are trying to say, like, “Oh, DynamoDB will save a lot of money, Aurora Serverless is saving a lot of money,” because they just went through a presentation where they were saying that, and then it's like, yeah, we need to do this and then they expect that overnight, but it's not like that. It's like if you actually want to get the benefits you need to give people time to actually adjust to whatever new technology you want to adopt.
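To make the "primary key and sort key" idea a bit more concrete, here is a minimal sketch of an access pattern built on a partition key plus a sort-key prefix. It uses a toy in-memory class rather than the real DynamoDB API, and the key formats (`BUSINESS#...`, `REVIEW#<date>#<id>`) are hypothetical examples, not Trustpilot's actual schema.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table keyed by partition key + sort key."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def put(self, pk, sk, item):
        self._partitions[pk][sk] = item

    def query(self, pk, sk_prefix=""):
        """Items in one partition whose sort key starts with sk_prefix, in sort-key order."""
        part = self._partitions.get(pk, {})
        return [part[sk] for sk in sorted(part) if sk.startswith(sk_prefix)]


table = ToyTable()
# Hypothetical key design: one partition per business, sort keys that
# encode the item type and date so that prefixes answer known questions.
table.put("BUSINESS#acme", "PROFILE", {"name": "Acme"})
table.put("BUSINESS#acme", "REVIEW#2020-11-02#r1", {"stars": 4})
table.put("BUSINESS#acme", "REVIEW#2020-12-01#r2", {"stars": 5})
table.put("BUSINESS#other", "REVIEW#2020-12-05#r3", {"stars": 1})

# Access pattern: "all December 2020 reviews for one business".
december = table.query("BUSINESS#acme", "REVIEW#2020-12")
print(december)
```

The point of the sketch is that the question ("reviews for a business in a given month") has to be known before the keys are chosen; a question the keys were not designed for has no cheap lookup like this one.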
Jeremy: Right. And that's something I find with DynamoDB, too, is you can't just sketch it out and put it in theory. Like, you have to actually start using it; like, you have to start putting things down, putting data in there, and figure out how you can actually access that and whether it's going to work for you, you know with what you're trying to do, because you will get your modeling wrong a number of times before you finally get it right. So, don't just model something and then throw it into production without going through a number of iterations.
It's funny though because I remember way, way back when, and I'm getting much older than I'd like to admit, but when I first started using SQL and I was writing, you know, basic select queries and insert, like, fine, easy, you know, I was using SQL Server a lot and MySQL. But you get to a point where you know it seems like third normal form and building all these things it's just impossible to understand … like, I mean, it eventually gets to the point where it's very simple to understand once you get it, but it does seem like a daunting thing. Then you make the move to DynamoDB and I look at how you would structure a relational database and you're like, wow, that is easy, like, that's so simple. Now, I've gotten much better with DynamoDB and understanding the patterns but, yeah, it takes an investment. It takes a huge investment for you to get there.
Angela: Yeah, absolutely. It … And especially now that people have experience with other types of databases I think it's more difficult to make that switch because it is quite different and it takes time to see these patterns. And as you said, like, you'll get it wrong many times probably before you get it right, and, like, yeah, you need to start with your queries, you know, you kind of need to do a lot of planning before …
Jeremy: Right.
Angela: ... with the Dynamo, right? Because you can't just dump your data, you need to do all the planning on, like, okay what I expect, like, everything needs to be planned before you actually do it. But of course, I mean it's easy now to play with the data and moving, like, around. Like, before was like two years to transition from, like, one table to another, you know.
Jeremy: Right, right.
Angela: Yeah, it was painful but now you can play with data much easier. So, in that terms ... yeah, here you can practice with Dynamo and see if you go … if you can store things in there or not. I would say you can but it might not be super easy in the beginning.
Jeremy: Right. Yeah, so I think the takeaway here is if you're a company and you're putting stuff into DynamoDB and you get it wrong the first time you're not alone because we've all done it, right, so just keep on working on it and you'll get there. You mentioned a little bit about limitations, like, you run up against limitations when you're moving things into, you know, some sort of serverless database offering, or just serverless in general. I mean, there are limits there. So how important is it to understand those limits because I find that they’re very high but when you hit them, they’re also very painful.
Angela: Yeah, I mean, when it comes to, for instance, with Lambda, one of the limitations that I'm kind of ... I find myself lately hitting it quite a lot is the concurrency. Like, I want to … and not just limit the concurrency, but when I limit the concurrency, you know, to not throttle everything that is being invoked with and I know, I was, like, I still ... because in my mind was like yeah, we'll just do this. It's simple. It's serverless. This event is triggering this Lambda and then this Lambda will call some third-party API and then we've hit this limitation which I didn't think about, that like the third-party API had its own limitation with like ... you can only call it 50 times per second. And then I was like, oh, how am I doing this with serverless? And I’m kind of trying to choose to stay in the serverless world, you know, and, like, really like move things around and it got to like a super complex solution. And I was, like, okay now I just need to say no to serverless and move to a container which will be so much easier to explain to people what I'm doing than try to stay in the serverless world. But I spent a lot of time with, like, finding a solution in there and, yeah, it's good to know the limitations so that you don't just spend a lot of time trying to reinvent things just so that you’re staying serverless. But yeah, as you said, like, that’s one thing that you need to do, like, learn the limitations so that you know what you're dealing with. But they are adapting everything and like, yeah, updating all these things that you kind of need to look at the news all the time, I would say, because they are there constantly working on this.
Jeremy: So, yeah, I totally agree. I mean, that's one of those things to where it's like ... and you mentioned this I think in the beginning where you said, you know, you get one part of your application, one piece of your architecture, that'll scale really well, and then you got other pieces of your architecture that won't, so solving the database problem is really great. I mean, things like Aurora Serverless v2, DynamoDB, solve a lot of those. You know, Lambda functions can scale infinitely, you know, whatever they want to do and then, but then you still have those problems with APIs, right? So you still have to figure out how do you manage those quotas and do that. And I think a lot of people get frequency and concurrency confused. Right? So if you or, you know … so if you have a quota of 200 calls per minute, you can't set your Lambda functions to a concurrency of 200, because what if it only takes, you know, two seconds for that to run? Then you're running, you know, I mean, thousands of invocations against it. So lots of things to think about there, not necessarily solved yet, but like you said, getting there.
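The frequency-versus-concurrency point can be worked through with the numbers from the example above (a quota of 200 calls per minute, two seconds per invocation). This is a back-of-the-envelope sketch that assumes every concurrent invocation is always busy and makes exactly one API call per run; the function names are made up for illustration.

```python
import math


def calls_per_minute(concurrency, seconds_per_call):
    """Steady-state call rate when every concurrent worker is always busy."""
    return concurrency * 60 / seconds_per_call


def max_safe_concurrency(quota_per_minute, seconds_per_call):
    """Largest concurrency whose steady-state rate stays within the quota."""
    return math.floor(quota_per_minute * seconds_per_call / 60)


# 200 concurrent invocations, each taking 2 seconds:
print(calls_per_minute(200, 2))      # 6000.0 calls per minute, 30x the quota

# Concurrency that keeps a 2-second function under 200 calls per minute:
print(max_safe_concurrency(200, 2))  # 6
```

At a concurrency of 6 the same function makes at most 180 calls per minute, which fits the quota; a concurrency limit only maps to a rate once the duration of each call is known.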
So, alright, so let's go back to Trustpilot for a second because I'm really interested again into kind of digging into your architecture and how you solve some of these problems. So let's start first of all ... I mean, you're obviously a huge fan of serverless, right, which is great. Love serverless enthusiasts, especially serverless data enthusiasts, but what about the rest of your organization? Like how did you bring that in? How did you say, “Hey we need to go to a serverless database, or we need to start thinking serverlessly.” Like, did that change happen within your engineering?
Angela: Yeah, so it definitely took us time and we started quite early looking at Lambda functions. And I remember it was after they released or announced it at re:Invent and we had a few colleagues that went there, and when they returned from re:Invent, like, every time when someone comes from re:Invent, you’re very excited. And they were like, “We're going for this,” and all of us were like, “What? What's this Lambda function? Leave me alone. I’ll just build my application here the way I know.” So it took us time and I think, like, what helped us was this idea of, like, just trying using serverless, but you're not forced to use serverless. And we were constantly trying to teach people, and, like, constantly talking about the benefits and showing, like, we had, of course, we had people that kind of wanted to use this from the beginning because it was something new and you always have those people in your companies, right, that they are attracted to whatever is new.
So we had people testing out and then, like, sharing with the rest of the company on what they achieved and how it's possible, and then slowly get everyone in the company starting with, like, serverless architectures and then, like, that's how we grew. And the same with the database, the serverless database, which is like some teams we tried to use it and then show the benefits to the rest of the company and that's how we kind of got everyone to be interested in serverless. And they've seen ... so we kind of showed the benefits in our company not from just general presentations, you know, marketing presentations where it's like this is serverless, this is what you get, you know, it's like no, no, let's get real. Like what do you actually get? And, yeah, that's how we got people excited and now everyone kind of wants to go to serverless first.
Jeremy: All right. Now, so what about your ops team or, like, your SREs? Like, what was the impact on them?
Angela: Oh, I think they liked it the most I would say, because all of a sudden they didn't get requests like, “Oh, can you please give me access to these,” or, “Can you create an EC2 for me?” And you know, those requests that no one really wants to deal with so, yeah, now they can actually focus on building the infrastructure that helps us and building, because before it wasn't about building it was about, like, supporting developers in doing their job more and now as a developer you can do all those things yourself. So the SRE or DevOps team can focus on building the infrastructure that helps us in different ways.
Jeremy: Right. Yeah. And I know I had a colocation facility for many, many years and I would always get the text message at 2:00 am that a blade went down or that, you know, there was a problem with the SAN disk array or something like that. And you're always driving to the colocation, you have to physically change out hardware. So besides just not having to necessarily do that, I mean, obviously for me, and I say obviously, but maybe this isn’t obvious to people, but since I went serverless, I haven't gotten a single alert at 2:00 am to say that a server went down or something like that. Which made my life a lot better.
Angela: Yeah, that's actually right. Since we moved our data and our services as well to serverless, like, no more alerts at 2:00 am at you, like, “Oh, you need to scale this database,” or, like, “You're having problems with this service, you need to provision more.” So that's a huge benefit. And I know people that are on-call, they appreciate that for sure, like, no more midnight calls to do like, yeah, I need to click these three buttons, and you’re, like, really? Like someone can't do that automatically. You know?
Jeremy: Yeah, it's a huge ... I think it's just a huge morale boost. I mean, and I love that there are more and more stories of this where you hear that operations people might be, sort of nervous or I guess intimidated by serverless, and then the ones that implement are like, “This is the greatest thing ever, because now I can focus on things that actually matter,” which is something that's important as opposed to, like you said, just, you know, doing the daily toil.
So, let's talk about the Trustpilot architecture and look at sort of what you have now. I know you gave a presentation a while back. I'm sure things have changed, you've probably moved more stuff there, but just what's the typical overview look like? I mean, are you just using DynamoDB or are you using a sort of a broad range of databases?
Angela: Yeah, I mean one thing I always try to mention when it comes to database use is that you shouldn't put everything in one type because there are so many types of databases and, you know, you need to look at the purpose of that database so that you use the right one for your use case. And yeah, that's what we try to do at Trustpilot. We have data in Dynamo, Aurora, ElastiCache, Elasticsearch. We still have some data in MongoDB as well, in Redis, and like, you name it, different data sources for sure because we have different use cases and that's what you have to look at when you choose the data store.
Jeremy: All right. And now are you continuing to, sort of, evolve your architecture and move things away from MongoDB and some of these other sort of non-serverless options?
Angela: When it comes to MongoDB, I mean, we've tried to move like a lot of our data from MongoDB because of scaling. So we moved a lot in Dynamo, but some of the data that we still have in MongoDB, it just makes sense to use MongoDB for the use cases that we have, so that's why we still have some data there. It’s still the right choice for us. But who knows? I mean, now we have DocumentDB from Amazon which supports MongoDB. So that might be the right solution for us. But, yeah, for now MongoDB works, so we kind of keep the data there, because we've also been in this, like, continuous refactoring for a very long time. So if it works, we keep it there for now. You know?
Jeremy: And I think that's not uncommon. I mean it's going to take a while I think for people to get everything moved over and it's really hard when you have something that is running okay, and it's working for you and is not a problem for you to say, “Maybe we'll just leave that for now,” and then work on some other things, but you're definitely right. Once you've established, once you have sort of a legacy application, it is hard to think about just spending all that time refactoring it just to, you know, just to get the data piece changed, especially if you're not having the performance issues.
Angela: Yeah. Yeah, exactly. I mean, it is a lot of time that you need to invest to do that and we've done a lot of it, not to say that we have invested a lot of time in refactoring because of all the scaling issues that we were having previously, but now that we've hit like a moment where we kind of we can breathe in that space and we can focus on other things. You know, it's like okay, let's leave this as it is for a second because it works and focus on other things that might be on fire and that's the case we have here with MongoDB like we don't have that much data left. It's your point, quite a lot of data there, but it works for now, and like the complexity that we have in there it doesn't make it so easy to refactor. So that's the reason why we, right now, we are keeping things as they are with the data that we still have in there.
Jeremy: Awesome. All right. So I love to talk theory and a lot of what we talked about I think will help people but what about actual, like, real-world stuff? So let's talk about an example. I know you have a couple of examples here, but what are some of the problems that you were having? Because, again, if it's working, you know, maybe we don't need to invest the money. But when we start to have problems with things, you know, we need to think about refactoring those and maybe taking a serverless approach. At least that's how I think about it. So I know you've done this a few times at Trustpilot. So what’s a practical example? What was that problem that you solved with serverless that made it the right choice?
Angela: Well, it's all about scalability, to be honest. That was one real problem that we were having with scaling the database, the Mongo database. And to be honest, some of the issues were because of the way that we configured things and the way that we stored things. And, yeah, the exact example for this one was, in the beginning, we were storing all of our data in one cluster in MongoDB. And then, like, whenever you were putting a lot of pressure on one data point, the entire cluster, and I would say also all the applications that were using that data, were down. So we changed that in Mongo, but then still, like, scaling was a real problem for us.
And the company has grown a lot. So, yeah, that's why we knew we had to do something and that's why DynamoDB is the right, definitely the right choice for us, because it's safe. Yeah, it's scalable. And yeah, we know, and I shouldn't say we know, but we definitely hope that the company will grow even more, and that makes it the right choice because I don't see us ... I don't see the need of refactoring again what we have in Dynamo, for instance, because I know it can handle this scale; even though it might double, it can still handle it. So that makes it a great choice for us because, as I said, we've been in this refactoring mode for a long period. We've been changing from a MySQL database on-premise to a MySQL database in the cloud, then from MySQL to MongoDB. So, once you refactor, and now that I can say, like, we have this in Dynamo and I don't see the necessity of switching to something else because of scaling issues, it makes it just perfect, I would say.
Jeremy: Right. Yeah. So, you mentioned in that talk that you gave, you know, this idea of the user sign-ups and expirations on, I think was it, like, temporary accounts and things like that. So that was one of the big things. Was that one of the things you moved from Mongo to DynamoDB?
Angela: Yeah. Yeah, that was one example. So the scenario was that people can sign up, but then they have to activate their account. That’s quite, like, a normal scenario, right? So they have to activate, and if they don't activate in, like, 30 days, then we need to delete the account. And we were doing that in our Mongo database, where we're keeping all the data for consumers. And of course, we were putting a lot of load, unnecessary load, on our primary database. So we decided to actually take this entire scenario out and we started, okay, using events. When consumers sign up, we will send an event to store some data in a DynamoDB table which would say this consumer signed up, and then we'll have another event coming from the activate ... like the activation API, saying this consumer activated, so then we'll delete the data in DynamoDB. And we had one DynamoDB table with all the unactivated accounts. And then from there we could look at, like, when the account was created and we can delete whatever accounts are not activated in time. So this way we took that whole pipeline to serverless in its own context and, like, its own service, and then it's doing its thing there, separate from our primary data. And we did it with, like, three events and DynamoDB and then, yeah, another Lambda that was listening to ... was querying this database.
So it was a very simple scenario, but we took a lot of load from the main database by not going, like, every, I think it was like every day, querying the database to get, like, all unactivated accounts. And so, yeah, it was a very simple scenario, but, like, this just shows how you don't have to, like, refactor your whole database. You can just take parts of it or, like, queries, like whatever it … This was just a scenario and we took it out on its own and ... I haven't checked it in, like, a very long time because it's just working, you know? I'm thinking maybe I should go and check it. No, but, like, that's like one example, where, as I said, like, you don't have to refactor the entire thing. You can just take part of it.
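The flow Angela describes (a sign-up event adds a record, an activation event removes it, and a scheduled job collects whatever is left after the activation window) can be sketched roughly as follows. This is only an illustration, not Trustpilot's code: the class, method, and field names are hypothetical, and an in-memory dict stands in for the DynamoDB table so the sketch runs on its own.

```python
from dataclasses import dataclass, field

# "Like 30 days" in the conversation; the exact window is an assumption.
ACTIVATION_WINDOW_SECONDS = 30 * 24 * 60 * 60


@dataclass
class PendingActivations:
    """In-memory stand-in for the table of unactivated accounts."""

    # consumer_id -> sign-up time in epoch seconds (hypothetical schema)
    items: dict = field(default_factory=dict)

    def on_signed_up(self, consumer_id: str, created_at: int) -> None:
        # Sign-up event: track the account until it is activated.
        self.items[consumer_id] = created_at

    def on_activated(self, consumer_id: str) -> None:
        # Activation event: the account no longer needs tracking.
        self.items.pop(consumer_id, None)

    def sweep(self, now: int) -> list:
        # Scheduled job: return (and forget) accounts past the window.
        expired = sorted(
            cid
            for cid, created_at in self.items.items()
            if now - created_at >= ACTIVATION_WINDOW_SECONDS
        )
        for cid in expired:
            del self.items[cid]
        return expired
```

The list returned by `sweep` is what a third event could carry back to the service that owns the primary database, so that the daily scan never touches that database.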
Jeremy: Yeah, and one thing that I love about that example is I think a lot of people think, oh, if I want to tack on or I want to go serverless, that I've got to somehow re-engineer my existing application, and that's a perfect example. Think if you got flooded with a bot or some sort of attack or something like that that was just adding new accounts and adding new accounts; it’s not even touching your Mongo database, right? Because it's all getting buffered in this DynamoDB table that is going to scale. And then unless it's activated right then, those events don't get passed through. So you're taking off all kinds of load. You're making your primary database that is already scalable but maybe can't handle those transactions or that number of transactions, you’re offloading all of that and that's just a perfect example. I love that and it's a great, you know great ... I think anybody could use that exact same scenario for their sign-ups right now and take a ton of load off that database and, like you said, just the maintenance of having to query through a MongoDB and look for the accounts that were not activated yet. That's just ... it's just a waste. I mean that's a complete waste. And with DynamoDB were you just using, like, the TTL to expire unactivated accounts?
Angela: Yeah. Yeah, so that's like ... I remember in the presentation when I was presenting this scenario, I gave, like, two options, because one is to just have a cron job that will trigger your Lambda to, like, query the DynamoDB table based on the data that you have in there, but the other one is to just set an expiration, because, you know, like, your accounts will expire in 30 days, for instance, and then you can use TTL to expire those items. And then you can use the stream to trigger a Lambda with that data, for instance. So that's another way of looking at this. So you have multiple choices. That's a good scenario to be in and it depends on what you're doing, right? Because remember, the stream will trigger on any kind of activity, and in our case we only cared about expired, so I think we just went for the cron job in this instance, but if it was possible to filter what kind of events the stream was sending, definitely TTL would have been the right approach for us.
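For the TTL option, one way to cope with the stream firing on every kind of change is to filter inside the Lambda handler. The sketch below assumes a hypothetical `consumerId` string key and relies on the documented behaviour that TTL deletions arrive as `REMOVE` records whose `userIdentity` names the DynamoDB service. It is one possible approach under those assumptions, not the setup Trustpilot used (they went with the cron job).

```python
SECONDS_PER_DAY = 86400


def expires_at(created_at_epoch: int, days: int = 30) -> int:
    # TTL attributes are numbers holding an epoch timestamp in seconds.
    return created_at_epoch + days * SECONDS_PER_DAY


def is_ttl_expiry(record: dict) -> bool:
    # TTL deletions are REMOVE events performed by the DynamoDB service;
    # deletes issued by an application carry no such userIdentity.
    identity = record.get("userIdentity") or {}
    return (
        record.get("eventName") == "REMOVE"
        and identity.get("type") == "Service"
        and identity.get("principalId") == "dynamodb.amazonaws.com"
    )


def handler(event: dict, context=None) -> list:
    # Collect only the accounts that TTL expired; ignore inserts,
    # updates, and the deletes caused by a normal activation.
    expired = []
    for record in event.get("Records", []):
        if is_ttl_expiry(record):
            # "consumerId" is a hypothetical partition key name.
            expired.append(record["dynamodb"]["Keys"]["consumerId"]["S"])
    return expired
```

The handler is still invoked for every change, so the filter saves downstream work rather than invocations.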
Jeremy: Right. And even getting them in batches, I think the TTL approach would be interesting because then you could have that stream, every time an account wasn't activated, you could forward that off to S3 or something like that, so that you could store a record of the accounts that never were activated, have all kinds of data on that but not have it in the operational database, which, again, is another really, I guess, good pattern to use in serverless, right, is to say the things that need to be in operational databases, store those in operational databases; the things that can be stored for reporting and for, you know, data like analytics and that kind of stuff, put those in another place, like S3 or something like that.
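The archiving idea Jeremy raises could look something like the sketch below, which turns a batch of expired accounts into one newline-delimited JSON object under a date-partitioned key. The key layout, the field names, and the function name are all assumptions made for illustration; the upload itself is left as a comment so the sketch runs without AWS credentials.

```python
import json
from datetime import datetime, timezone


def build_archive_object(expired_items: list, batch_id: str, now: datetime) -> tuple:
    # Date-partitioned key (hypothetical layout) keeps reporting queries cheap.
    key = f"unactivated-accounts/{now:%Y/%m/%d}/{batch_id}.jsonl"
    # One JSON document per line, a common format for analytics tooling.
    body = "\n".join(json.dumps(item, sort_keys=True) for item in expired_items)
    return key, body


# With boto3 available, the upload would be roughly:
#   boto3.client("s3").put_object(Bucket=<your bucket>, Key=key, Body=body.encode())
```

Keeping the formatting separate from the upload makes the batch logic easy to check locally before anything is wired to a stream.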
Angela: Yeah. Yeah. Absolutely. I mean that's the beauty with serverless that you can split things and you should split things like all of these as you say, like keep your main database doing what's important for you, but all these like extra scenarios put them outside and have them on their own so they just work on their little things and they are very good at doing that thing, right? So that's the beauty with serverless and it's something that people should consider and not just put everything in one solution and build like a monster.
Jeremy: Awesome. Well, that's pretty good advice to end with, Angela. Thank you so much for taking the time to chat with me. Super informative stuff. If people want to find out more about you, connect with you, how do they do that?
Angela: Yeah. So, I mean you can find me on LinkedIn by using my name or on Twitter also by using my name. And yeah, I'll be happy to talk more about databases, serverless, or whatever, anything else. So yeah, just use my name.
Jeremy: All right, and Trustpilot, Trustpilot.com, if you want to check that out and sign up there. So I will get all that in the show notes. Thanks again, Angela.
Angela: Thank you. It was a pleasure talking to you.
