Software at Scale 29 - Sugu Sougoumarane: CTO, PlanetScale

Software at Scale - A podcast by Utsav Shah

Sugu Sougoumarane is the CTO and Co-Founder of PlanetScale, a database-as-a-service platform company. Previously, he worked at PayPal and YouTube on databases and other scalability problems, and he's one of the creators of Vitess, a graduated CNCF database project that served as YouTube's metadata storage for several years.

We discuss his early days at PayPal and YouTube, their initial scalability problems, the technology that helped them scale out their products, the motivation behind Vitess, its evolution into PlanetScale, and some of the innovative ideas coming out of PlanetScale today.

Highlights

5:00 - Being interviewed and hired by Elon Musk at X.com. Working with some of the PayPal mafia, like Max Levchin and Reid Hoffman.
9:00 - Solving PayPal's unbalanced books via a better Activity Report.
15:00 - PayPal's Oracle database and the initial idea of sharding.
20:00 - Early YouTube architecture and culture, and the consequences of explosive growth.
24:00 - Sharding YouTube's database.
32:00 - The story behind Vitess. It all started with a spreadsheet.
40:00 - How a user with 250,000 auto-generated videos caused YouTube to go down. How Vitess fixed that, and the implications of a SQL proxy.
45:00 - The motivation behind keeping Vitess open source, and interest from other companies dealing with similar problems. The motivation behind going from Vitess to PlanetScale.
53:00 - How PlanetScale can improve on some of the traditional relational database developer experience. How NoSQL was never actually about getting rid of SQL and was more about skipping traditionally rigid schemas. MongoDB's support for SQL, and PlanetScale's approach to solving schema changes.
58:00 - Technical innovations coming out of PlanetScale.
1:05:00 - Running databases in containers and the implications.

Transcript

Utsav Shah: [00:15] Hey, welcome to another episode of the Software at Scale podcast.
Before we get started, I would just like to ask a quick favor from any listeners who have been enjoying the show. Please leave a review on Apple Podcasts. That will help me out a lot. Thank you.

Joining me today is Sugu, the CTO and Co-Founder of PlanetScale, a managed database-as-a-service platform company. Previously he worked at YouTube for a really long time, from 2006 to 2018, on scalability and other things, including Vitess, an open source CNCF project, and before that he was at PayPal. Welcome.

Sugu Sougoumarane: [00:53] Thank you. Glad to be here.

Utsav: [00:56] Could you maybe tell me a little bit about when you got interested in databases and MySQL? Because it seems, at least from my understanding, you've been interested in distributed systems and MySQL databases for a really long time.

Sugu: [01:11] Yeah. So, I would say the first serious involvement with databases started when I joined Informix in 1993. That was the time when there were three huge database companies: Informix versus Sybase versus Oracle. And I guess eventually Oracle won the war, but that was the first time I came in touch with databases. I specifically did not work with the engine itself, but I worked on a development tool called 4GL, which was popular in those days; it's not there anymore. So that was actually my first introduction to databases.

Utsav: [02:08] Okay, and I guess you took this skillset or this interest with you when you moved to YouTube from PayPal. So, what were you doing at PayPal initially, and what made you decide that you wanted to work at YouTube?

Sugu: [02:25] So PayPal, I guess when I moved from Informix to PayPal, it was mainly because this was around 2000. This was the time when the internet boom was happening, and it was clearly obvious that Informix had fallen behind; they were still trying to make client-server work, and that was pretty much a dying technology. That's when I kind of decided to make a change.
It was somewhat of a career change, when you realize that the technology you're working on is not going to be around much longer. I don't know if other people saw it then, but it was pretty obvious to me. So that's the time when I kind of made a decision to start back from the beginning, because, you'd be surprised, in the year '99 not many people knew how a website worked. Now it's probably common knowledge, but I had no idea how a website worked: [03:36 inaudible] requests come in, you know, what servers do what, and those things. So, it was all kind of unknown science to me. So, for me, it was like I was starting a new career; even within software engineering, each of these things sounds like a different career. It is as if I stopped being software engineer A and now I'm going to become software engineer B. So, the only skill I could carry over was my ability to write programs. Beyond that, I knew nothing. And I had to learn all those things from scratch again at PayPal. By the way, I did not join PayPal directly. I actually joined X.com, which was founded by Elon. So, Elon actually hired me, and then later X.com and PayPal merged, and that's how we became PayPal.

Utsav: [04:41] I think if you can tell us a little bit about Elon Musk, I feel like this podcast will just shoot up in popularity, because of anything to do with Elon Musk now.

Sugu: [04:50] Alright. There is this story that is not really commonly known, although I might have told it in one place, which is that I actually failed the interview. I went for the interview and Elon thought I was not worthy. They decided to pass on me. At that time I didn't know who he was; this was the year 2000. I mean, he had just sold his previous company, but there were many other people who had done that.
But when I found out what he was building, I got so excited that after I heard that he had passed on me, I sent him an email saying why he should hire me. And somehow that email convinced him to bring me back for a second round. And then I got hired, and the rest is history.

Utsav: [05:55] So interesting. What is it like working with him? Did you work closely with him? How big was X.com at that time?

Sugu: [06:02] It was very small. I think we were about 10 to 15 engineers. I think the total was 50, but most of the other people were customer support, because it was a bank, right, so there was a lot of customer support. Yeah, it's surprising how many of the people I worked with at that time, not just Elon, are now celebrities, like Jeremy Stoppelman, who was sitting right next to me. Jeremy Stoppelman is the CEO of Yelp, in case you want to look it up. So, I remember I went there and he was the guy next to me. So, I said, "Hey, you know where the bathroom is?" [06:53] "Where do you get coffee?", and sometimes he'd say, "Yeah, it's there", and sometimes he'd say, "I don't know". And finally, I asked him, "So how long have you been here?" "Oh, I joined yesterday". I literally joined the day after Jeremy got hired. Yeah, so those were good times. So, Elon himself, by the way, like SpaceX, he used to talk about that [07:21] as his dream, you know, going into space. And he's really inspiring when he talks about it; he starts talking and everybody just shuts up and listens. He also used to talk about being broke. There was a time when he was broke, where he said he didn't have money for food. Did you know that Elon was broke at one time?

Utsav: [07:49] I feel like I've read about it or seen some YouTube video; now there are so many YouTube videos where they take quotes of Elon and it's like inspirational stuff. I think I've seen some of those. That's so fascinating.
And yeah, that clearly became the PayPal mafia, with so many people starting so many successful companies. Did you work closely with any of the others, like Jeremy Stoppelman?

Sugu: [08:15] Yeah. There were lots of them. So, I actually worked with Max a lot more right after the merger. So Max, definitely; Reid Hoffman, although I don't think I interacted much with Reid Hoffman; there is Roelof, for sure; Steve 'Chairman' Chen, obviously, that's how I ended up at YouTube. My memory fails me, but if you look at the mafia, yeah, I have worked with all of them.

Utsav: [08:52] Did you work on the Igor program, the fraud detection stuff, which I've read about publicly?

Sugu: [08:58] Oh, no, I did not. I worked on some fraud systems, but not Igor specifically. I am trying to remember who worked on it. It was probably Russel Simmons.

Utsav: [09:14] So what did you do at PayPal? I am curious.

Sugu: [09:18] So the thing is, and this is a good lesson, the struggle I was having initially at PayPal was recognition. There were a couple of things. I was kind of an unknown there, nobody knew who I was, and I was part of the merger, right? I came from the X.com side of the X.com-PayPal merger, and there was a lot of back and forth about which technology to use. Eventually, we decided to continue with the original PayPal technology and stop using the X.com technology. So, the struggle I had was, how do I prove myself to be worthy? In software engineering, you have to show that you are good and then they give you good projects. You don't get to choose the projects, because you get assigned projects, right? At least that's how the situation was then. And because I was an unknown, all the good work was going to the people who were already recognized for how good they were.
So, there was this one tool called the Activity Report, which basically tallied the total of all transactions on a daily basis and said, this is the total in this category, the credit card category, the ACH category, and that kind of stuff, and it used to report those numbers. But the auditors came in and they said, you know, with these transaction numbers that this report produces, you need to tally them against your bank accounts, against the total balances that users have. The first time the auditors came, the numbers were a few million dollars off, you know?

And they were asking, where is all this money? I don't know if you remember, when PayPal was initially founded, the business model was to make money off of the float. When we transfer money, there's a float. We'd basically invest that money and keep whatever money we made out of that. So, people were like, 'Oh, it must be the float', you know, that difference was because of the float, and the auditors were like, 'No, that explanation is not sufficient. You need to show the numbers and show that they add up and that they tally'. And so, finally, we figured out where it all was and got it to tally. But the problem was that PayPal was software written by engineers, and how do you create transactions? You insert a row in the transaction table, update a user's balance, and then do a commit, right?

So that's how you do it. But sometimes the transactions go pending, in which case we don't update the balance. Sometimes we complete the transaction, and then after that, you update the balance. And guess what happens when you start doing these things. There are these things called software bugs that creep in. And so, every once in a while, after a release, the transactions wouldn't add up to the user's balance. And sometimes they wouldn't tally with what was actually in the bank accounts. And so, every few days, the Activity Report used to produce discrepancies.
And because the software was written organically, there was no easy way to find out; you have, you know, a million transactions. How do you figure out which one went wrong? The numbers don't add up to the totals here. And every time this used to happen, an engineer was assigned to go and troubleshoot.

It was the most hated job, because it's like finding a needle in a haystack. You'd say, 'Ah, Activity Report conflict. Now I got assigned to do this. Why is my life so bad?' You know? So that was the situation. And then, at one point in time, I said, you know what? I'll own this problem. I said, 'Just give me this problem, forget it. I will investigate this every time it happens. And I will also fix the software to make it easier.' So, I took this on and I rewrote the entire program. Previously it was all totals; at the end, once it had fully evolved, it would say, 'This particular account has this discrepancy, and most likely this is the transaction that caused it'.

So, it would produce a report, a list of conflicts, with a very clear clue about where the problem was, and then you'd spend five minutes and you'd know what went wrong. It would find software bugs, because sometimes somebody forgot to update the balance. It would say, 'Hey, you created a transaction and didn't update the balance, so here, go fix it'. That's what it used to do. And eventually what happened was, because it was so solid, it kind of became foundational for PayPal. People used to ask for reports about, you know, transactions, and they would say, 'How do you know this is correct?' and we'd say, 'Oh, I validated it against Sugu's Activity Report'. 'Okay, then it's good. I trust these numbers.' That kind of stuff. So, making that software and taking that pain away from all the other engineers got me accepted as one of the cool engineers.

Utsav: [15:12] So interesting.
How big was the engineering group at that time?

Sugu: [15:19] So on the X.com side there were about 15, and on PayPal's side there were also 15. So, we were about 30 to 40 engineers.

Utsav: [15:28] And once you got accepted by these people, what were your next projects, I guess?

Sugu: [15:33] So having gotten accepted, I became the kind of person about whom they'd say, 'If you give a problem to Sugu, you know, you can consider it solved', that kind of thing. Which means that anytime PayPal had a struggle where you'd say, 'Oh, you need the top brains on this', I used to get called. And the biggest problems later were all about scalability. So, I kind of moved into the PayPal architecture team and used to solve all the problems related to scalability. And the scalability problem was always the Oracle database, because it was a single-instance database, and the only way we could scale it was vertically. So, the idea of sharding was actually born there for us. We needed to shard this database. But it eventually succeeded only at YouTube. We didn't succeed at sharding at PayPal itself, but we kind of knew how to do it. We went through the learning process of how to shard a database when we were at PayPal, but by then the [16:46] had taken place. There was a drift, and lots of things took place after that.

Utsav: [16:50] That's so interesting. And is that where you met Mike? You were discussing some of these stories, [16:55]

Sugu: [16:56] Yeah, Mike was reporting to me. So, there was Mike, and there was another person called Matt Rizzo. They were among the top engineers at PayPal. We still use some terms that only Mike and I would understand. We'd say, 'Oh, this is APCD, right?' and then we'd laugh. Nobody else, I would say, would understand what that means. [17:28] ACH process completed deposits, a Bash tool that he wrote. You probably know that Mike is really, really good with Bash.

Utsav: [17:39] Yeah.
He knows his Bash.

Sugu: [17:42] He knows his Bash. That was kind of my fault, or my doing, because there was a time when I just gave him a problem and told him, you own this problem, you have to solve it. And he'd say, 'Do you think Bash is a good idea? Sounds cool'. I said, 'If you think it's a good idea, it's a good idea. It's up to you. It's your decision'. So, he went and did it, and he is still scarred by the fact that I let him do it. He says, 'Why did you let me? Why didn't you tell me it was a mistake?'

Utsav: [18:15] I can imagine this very horrible process at PayPal, like adding and subtracting money, and it's all written in Bash.

Sugu: [18:23] That wasn't written in Bash, but there were some tools that we wrote in Bash. Yeah, the money part was all SQL, SQL*Plus actually. [18:36] I guess that could be called the Kubernetes of those days. We were writing an orchestration tool, and we wrote it in Bash.

Utsav: [18:46] Okay, that's so interesting. And then you're saying that the idea of sharding was born at PayPal, or the idea of sharding a MySQL database; you figured out how to do it. Why would you say you did not complete it? Was it just because there was not enough time, or at that point did you decide to leave and go to YouTube?

Sugu: [19:04] There were a lot of reasons, some of which were changes in management at that time. There were the PayPal engineers and there was the eBay management that had acquired us, and the relationships were very strained. So, it was a very difficult thing. There was no focused, coordinated effort towards solving it. It was all over the map. There were a lot of committees spending a lot of time in conference rooms, discussing things, not coming to conclusions, you know, that kind of stuff, where there were just 'too many cooks in the kitchen'.
I think the core of us had kind of figured out how this needed to be done, but we were not able to, you know, push that idea forward. But we finally made that happen at YouTube. So that proved it; it kind of vindicated us, you know, that we knew what we were doing.

Utsav: [20:20] So, yeah, maybe you can talk about the earlier days at YouTube. What was YouTube like in 2006? How many people were there? Was this pre-acquisition, post-acquisition? I don't know the exact -

Sugu: [20:29] - just around the time of the acquisition. So actually, around the time when YouTube was founded, I went to India to open the PayPal India office. But I used to visit YouTube every time I came to the US, and I used to come pretty often. So, I had been closely following it, and at some point in time, Steve and Matt kept hounding me, saying, 'What the hell are you doing there? This thing is catching fire. You need to come and join us right now'. So finally, I came back and joined them. But yeah, the YouTube culture was kind of a carryover from PayPal. It was very similar: lots of heroics, you know, each person was an impact player. Each person made a huge difference. One person owned all of search, that kind of stuff. One person owned the entire browsing experience. There were only like 10 engineers. So, everybody owned a very important piece of the stack.

Utsav: [21:50] Okay, and what were some of the early problems that you ran into? Or, I guess, what were the fires that you were fixing as soon as you started?

Sugu: [21:59] So the one big thing I remember: there was this rule at YouTube that when you join, you have to commit on your first day.

Utsav: [22:11] Okay.

Sugu: [22:12] I broke that rule. I said, I'm not going to commit on the first day. But I spent time learning the code and produced a huge PR, a huge pull request, that rewrote an entire module because it was badly written.
So, the biggest problem I found, or felt at least at that time, was that because the code was organically grown and incrementally improved, it was going to run out of steam as we added more features. So, I felt that we needed, you know, to clean it up so that you could put a better foundation underneath and add more features to it. So that's basically the first work that I did. But later, the bigger problem became scalability, because we were unsharded. There was only one database, which was called Main, and we had to do the sharding. So that was the biggest challenge that we solved. We solved the re-sharding [23:22] in Vitess later, and adopted that sharding method. But when we first started, it was at the app layer.

Utsav: [23:33] Okay, at what scale did you have to start thinking about sharding? How many engineers were there? How much data was going in, and what was being stored in this database? Was this just all of YouTube's metadata in this one large database?

Sugu: [23:48] Yes, all the metadata. Not the videos themselves, because those were video files and they were distributed to a CDN, but all the other metadata, like video title, likes, dislikes, everything else, was stored in the MySQL database. It was going to run out of capacity very soon, and the hardware wouldn't have kept up. Vertical scaling wouldn't have taken us much further. And that's when we said, 'Okay, we need to shard this'. We made a very tough call, which was that we were going to break cross-shard transactional integrity. It was actually not as big a deal as we thought it was going to be, because of the nature of the app, maybe because users mostly just work with their own profile. There were a few things about comments: if I post a comment to you, where do you put that comment?
And there were a few issues around those types of things, but otherwise, there was still quite a bit of rewriting, because taking an application that assumes an unsharded database and making it [25:10] is non-trivial no matter how you put it. We had to build a layer under the app and change everything to go through that layer and follow certain rules. And then we changed that layer to accommodate sharding. That's basically how we solved the problem.

Utsav: [25:27] Yeah. That layer has to basically decide which shard to go to. So, you were just running out of physical disk space on that MySQL box; was that the problem?

Sugu: [25:37] I'm trying to remember whether it was disk; it might have been CPU, actually.

Utsav: [25:42] Okay.

Sugu: [25:44] I think it was the CPU. It's either CPU or memory. I think we could have a disk. I'm trying to remember if it's CPU, memory or IOPS; most likely memory, because of working set size issues. I could be wrong.

Utsav: [26:14] I'm just curious, you know, do you know how many videos you had on YouTube at that time, when you started running, [26:20]

Sugu: [26:21] I remember celebrating. Let's see, it was either a hundred million or 1 billion, I don't know.

Utsav: [26:31] Okay, 1 billion videos, and that's when you start running out of space? [26:35] total video, that just shows you the power of MySQL and like [26:40]

Sugu: [26:42] Yeah, it's amazing that I don't know the difference between a hundred million and 1 billion, you know, but something, yeah.

Utsav: [26:52] Sounds like [26:53] I don't even know how many videos there are on YouTube now. I can't even imagine.

Sugu: [26:59] The reason why it works is that most videos are inactive, right? You just insert a row and move on, and so it is more a question of what the working set is, right? How many of those videos are popular?
They come and live in memory, and we also had a Memcached layer in front, and so it was more a question of how many popular videos people are trying to hit, and can all of that fit in memory?

Utsav: [27:31] Okay, that makes a lot of sense. Then you could put a caching layer in front so that you don't hit the database for all of these popular videos. That's so interesting. And, given all of this, it makes sense to me that you can shard this, because videos, you can totally imagine, are relatively easy to shard, it sounds like. But yeah, what do you think about resharding? How do you solve resharding? That sounds like a very interesting problem. And maybe just to add some color to what I think resharding means: you have data on MySQL box A and you want to move some of it to MySQL box B, or split from one database to two, and two to four. And you have to somehow live-transfer data from the database. How do you solve that?

Sugu: [28:19] So, there's the principle and there is the mechanic, right? Typically, the mechanic is actually more straightforward, which is that when you start sharding, the first thing you do is actually pull tables out. If there are 300 tables, you'll say, you know, these 20 tables can live in a separate database. It's not really called sharding or resharding, because you're not sharding, you're splitting. So, the first thing people usually do is split a database into smaller parts, because then each one can grow on its own. The sharding aspect comes in when a few sets of tables themselves become so big that they cannot fit in a database. So that's when you shard, and when you make the sharding decision, what you do is take the tables that are related, that belong together, and shard them the same way.

In YouTube's case, when we did the sharding, we kept the users and their videos together in the same shard. So that was actually a conscious decision, that users and videos stay together.
There is actually a sharding model that allows you to represent that as a relationship, which Vitess now actually exposes; basically, there's some theory crafting behind it. But from a common-sense perspective, you can always reason about users and their videos being together. So that's the mechanic part.

On the principle part, the way you think about sharding is basically that, beyond a certain scale, you have to actually rethink your application as being fully distributed, with independent operators working by themselves, working on their own. That means that some features you often have to forgo for the sake of scalability. In other words, I mean, crazy features, like: what is the real-time value of the total number of videos in the system? I want that up to the millisecond. That type of question is unreasonable for a sharded system. You could probably do a SELECT COUNT(*) against a small database and actually get it. But if you have tens of thousands of servers, answering that question becomes harder, right? And at that point we have to start making trade-offs, like, is it really important that you know it up to the millisecond? What if it is up to the minute? Is that good enough?

So those are the kinds of trade-offs you make at a high level, and most people do, because you cannot go to that scale unless you make these trade-offs. But once you make these trade-offs, sharding actually becomes very natural. For example, if you are going to have the entire world as your customer, you are going to have, you know, 7 billion rows in a user account table. And sometimes many of them create multiple accounts. So, you are going to have billions of rows. You cannot build features that put them all together. You have to build features that keep them separate. And as soon as your features follow that pattern, then sharding also kind of follows from that same decision.
So, it more or less becomes natural.

Utsav: [32:21] Okay, that makes sense to me, and it seems like you said that you had a sharding solution that predated Vitess. So maybe you can talk about the motivation behind Vitess, and maybe the year that this was in, and if you have any numbers on, you know, at what point did you seem [32:36]

Sugu: [32:36] [32:37] I'm finding myself surprised that there are details I should know that I don't know, like the number of videos we had when we were sharding, right? It was 2007. 2007 is, actually, wow, 15, 14 years ago. So, yeah, it has been many years. So, 2007, I think, was when we did our first sharding, if I remember correctly. Maybe 2008? Around 2007 or 2008 was the first time we did sharding. But Vitess was not born because we needed to do sharding, obviously, because the database was already sharded. It was born because we couldn't keep up with the number of outages that the database was causing. That was actually why it was born. It was quite obvious that as the system was scaling, in spite of being sharded, there were a large number of things broken within the system that needed fixing for it to work.

And when I say system, I mean end to end, the entire system, and that includes the engineers and their development process. So that's how big the problem was, but we didn't solve the entire problem. We said, from a database perspective, what can we do to improve things, right? So that's kind of the scope of the problem. And more specifically, Mike actually took off and, I think, spent some time at Dana Street coffee in Mountain View and came up with a spreadsheet where he listed every outage that we had had, how we solved it, and what would be the best remedy to avoid such an outage.
So, when we actually sat down and studied that spreadsheet, it was obvious that a new proxy layer needed to be built, and that is how Vitess was born.

And the whole idea is to protect the database from the developer. For example, there were at that time about 200 developers. If you look at it, right, the developers don't intentionally write [35:10], and a developer doesn't write a bad query all the time. But if each developer wrote a bad query only once a year, with 200 developers and 50 weeks a year, you do the math. How often do we see outages in the database? Almost every day. That's what it amounted to; we were seeing outages almost every day. And very often they were actually, quote-unquote, bad queries coming from that one user that they will never repeat, but they've fulfilled their quota for the year, you know? And so, what we did was we found common patterns where that would happen.

So, the big change that we made was: if I wrote a bad query, I have to pay the price for it, not the entire team, right? Today, if I write a bad query and that query runs on MySQL, it takes down the entire database. So, we wrote Vitess in such a way that if you wrote a bad query, your query would fail. In other words, what we used to look at is: how long is the query running? If the query is running too long, we kill it. If the query is fetching too many rows, we return an error. So those kinds of defenses we added in Vitess early on, and that pretty much took it a long way. The other big feature that we deployed was connection pooling, which MySQL was very bad at. The newer MySQL is slightly better. It's still not as good. So, the connection pooling feature was a lifesaver for them too.

Utsav: [37:00] That makes a lot of sense to me. And maybe you can tell us a little bit about why MySQL didn't have the capability of defending itself.
So, you might imagine, you know, just as somebody who has no MySQL experience, they can just be like a [inaudible 37:16].

Sugu: [37:17] It's very simple. The reason is that MySQL has, what do you call it, claimed itself to be a relational database. And a relational database is required, you know, to not fail. So, we at YouTube could be opinionated about what queries should work and what queries should not work. That freedom MySQL doesn't have. For every query that is given to it, it has to return the results, right? So that's the rule: to be qualified as a relational database, you had to fulfill that rule, and that rule was its curse.

Utsav: [38:01] Okay, but you could configure maybe a session timeout or something, or a query timeout, or was that not good enough, basically?

Sugu: [38:09] They actually added those features much later. Yeah. The newer MySQLs now do have those features, but they are all behind. They are not as usable, because you have to set it in a query parameter, you have to set it in your session, you know. By default, if you just install MySQL and start running it, you won't have this property; you have to configure it, which is another hurdle to cross. Whereas with Vitess, you install Vitess and queries will start failing, you know; people will complain. We see the opposite problem, right? 'Oh, [38:53] MySQL, why is Vitess failing it?' Because you are trying to fetch a million rows. If you want to fetch 10 million rows, you have to tell Vitess that you want to get 10 million rows, and then it will give them to you. But if you just send a query, it won't give you a million rows.

Utsav: [39:10] Yeah. What do you think was the one really important thing?
Which, as you said, there's connection pooling, and it also limited bad queries. Like, how did it figure out that a query was a bad one? Maybe one heuristic is that it's returning too many rows and it would just fail fast. What were some other key problems?
Sugu: [39:27] Oh, we added so many. The coolest one was this: when engineers write code, and me too, I'm guilty of the same thing, you think, how many videos could a user possibly upload? And we were thinking manually in the [39:51]. For me to upload a video, I have to go make it, edit it, and then upload it. You know, 2,000 videos would be a big number, and, you know, look at how old YouTube was. It was two years old. How many videos can you produce in two years? So, 2,000 videos. So, selecting all videos of a user: not a problem, right? So, we never put a limit on how many videos we fetched when we selected videos by user. And then there was one user who ran a bot; I think there's no other way. Well, now there are accounts that have billions of videos, but at that time that user had 250,000 videos.
And that wasn't a problem per se, because that account could have been around, but it became a problem when that account got listed on the YouTube front page. Yeah, the index page, right, the landing page. Which means that every person that went to youtube.com issued a query that pulled 250,000 rows. So, one of the changes we made was... that's why we added the LIMIT clause: if you send a query with no LIMIT clause, we will put a limit on it. And if that limit is exceeded, we'll just return you an error saying that you're trying to fetch too many rows. So, that was one protection, but the cooler protection that we did was... there are some queries that are just inherently expensive, right?
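As an illustration of the limit protection described here, a minimal Python sketch follows. It is not Vitess code: the names `with_default_limit`, `check_row_count`, and `TooManyRowsError` are hypothetical, and the regex-based LIMIT detection is a simplifying assumption where a real proxy would parse the SQL.

```python
import re


class TooManyRowsError(Exception):
    """Raised when a query returns more rows than the proxy allows."""


# Naive check for a trailing LIMIT clause (assumption: a real proxy parses SQL).
_TRAILING_LIMIT = re.compile(
    r"\blimit\s+\d+(\s*(,|offset)\s*\d+)?\s*;?\s*$", re.IGNORECASE
)


def with_default_limit(sql: str, max_rows: int) -> str:
    """Return sql unchanged if it ends in a LIMIT, else append LIMIT max_rows + 1.

    Asking for one extra row lets the caller distinguish a result of
    exactly max_rows from a result that exceeds the cap.
    """
    if _TRAILING_LIMIT.search(sql):
        return sql
    return f"{sql.rstrip().rstrip(';').rstrip()} LIMIT {max_rows + 1}"


def check_row_count(rows: list, max_rows: int) -> list:
    """Fail the query instead of returning an oversized result."""
    if len(rows) > max_rows:
        raise TooManyRowsError(f"query returned more than {max_rows} rows")
    return rows
```

In this sketch the person who wrote the unbounded query gets an error, and the database never has to ship the full 250,000 rows to the caller.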
Like this one; in this case, I think it might've been a SELECT COUNT(*).
If it's a SELECT COUNT(*), it doesn't fetch 250,000 rows, but it scans 250,000 rows. So, the feature that we built was for when a query gets spammed, like here, because it's coming from the front page, which means that MySQL is receiving a very high QPS at the same time. In that case, what we do is check if the query is already running from some other request. If it is, if there are 10 other requests of the same query, they are not sent to MySQL. We wait for the running one to return and return that same result to all those requests.
Utsav: [42:21] Okay. Would that be problematic from a data consistency perspective? Probably not. Yeah.
Sugu: [42:26] It is, actually. Yes, it is, but it is no worse than eventual consistency, right? So, if you are reading from a replica, no problem, right? If you are doing something transactional... so, there's also another reason: the rule that we had at YouTube is, if you read a row, assume that row is already stale. Why? Because as soon as you read the row, somebody could have gone and changed it. So that staleness guarantee kind of carried over into this as well: when you do a select, the result may be slightly outdated, because you could be reading from the previous query that is running, or sometimes it could be going to a replica.
So it was something that nobody even noticed, that we had made this change, but we never had an outage related to, you know, query spam after we did that feature. Yeah. But more importantly, the rule that we followed was: if you want to make sure that the row doesn't change after you read it, you have to SELECT ... FOR UPDATE. So, if you plan to update a row, and update based on the value that you have read, you have to select for update.
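The consolidation idea in this answer can be sketched in Python as below. This is only an illustration under stated assumptions, not how Vitess implements it: `QueryConsolidator`, `_InFlight`, and the `execute` callback are hypothetical names, and identical queries are matched on their exact SQL text.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _InFlight:
    """State shared by all requests waiting on one running query."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[Exception] = None
        self.waiters = 0  # requests that piggybacked on the running query


class QueryConsolidator:
    def __init__(self, execute: Callable[[str], Any]) -> None:
        self._execute = execute  # hypothetical function that talks to the database
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _InFlight] = {}

    def run(self, sql: str) -> Any:
        with self._lock:
            entry = self._in_flight.get(sql)
            is_leader = entry is None
            if is_leader:
                # First request for this SQL text: it will run the query.
                entry = _InFlight()
                self._in_flight[sql] = entry
            else:
                # Same query is already running: wait for its result.
                entry.waiters += 1

        if is_leader:
            try:
                entry.result = self._execute(sql)
            except Exception as exc:
                entry.error = exc
            finally:
                with self._lock:
                    del self._in_flight[sql]
                entry.done.set()  # wake every waiting request
        else:
            entry.done.wait()

        if entry.error is not None:
            raise entry.error
        return entry.result
```

Waiting requests receive the result of a query that started before they arrived, which is the slight staleness discussed in this exchange; reads that must not be stale would bypass a layer like this.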
So that's a rule that all engineers followed.
Utsav: [43:56] Yeah. It makes sense. And it's better to just be explicit about something like that.
Sugu: [44:00] Yeah. Yeah. Because the MySQL MVCC model tells you that, right? The MVCC, the consistent read, as it's called (the actual practical term is something else), but that model basically tells you that the row that MySQL serves to you is technically obsolete.
Utsav: [44:24] Okay, and that just makes sense to me. And you all open sourced Vitess. I remember hearing the reason is you never wanted to write another Vitess, so you decided to open source it from day one. Can you maybe tell me a little bit about, you know, the motivations behind making it a product? So, it's a great piece of software, which was used by YouTube, from my understanding, for like a really, really long time. At what point did you decide, you know, we should actually make this available for other companies to use? And what is the transformation from Vitess to PlanetScale?
Sugu: [44:58] So even before PlanetScale, even as early as 2014, 2015, we felt that we should start promoting Vitess outside of YouTube. And one of the reasons was, it wasn't a very serious thing, but it was kind of a secondary motivation: we felt that for Vitess to be credible, even within YouTube, this product needs to be used by other people beyond YouTube. In other words, you know, for a company to trust a product, a lot of other people must be using it. That was kind of the motivation that made us think that we should encourage other companies to also use [45:46]. And actually, Flipkart was the first one. I think it was 2015 or '16 that they approached us and they said, hey, it looks like this will perfectly solve the problem that we are trying to solve, and they were the first adopters of Vitess.
Utsav: [46:05] Okay, so that makes sense.
And it was open source, and you felt that other companies should use it. In 2015, it sounded like Flipkart was interested. But then what is the transformation from then to, you know, let's make it a product that people can buy? Because it's an open-source technology; how do you go from that to a product?
Sugu: [46:28] Yeah. So, I can talk about how PlanetScale came about, right? So, the other thing that started happening over time, as Vitess started evolving as a project: companies like Flipkart started relying on it, Slack started using it, Square started using it. And they were all scared of using Vitess. Why? Because this is basically a defining technology. It's basically a lifetime-commitment type of change. And the question in their mind is, like, what's the guarantee? Like, why would YouTube be interested in making sure that, you know, it will continue to work for Slack, right? A large number of companies were worried about that part, where, you know, YouTube's focus is videos, so, like, how can we trust the technology?
How can we trust the future of our company on this technology? What if you abandon the project, and that kind of stuff? So, over time, I could sense that hesitancy growing among people. And this was one of the contributing factors to us starting PlanetScale, where we say, we are going to start a company that is going to make a commitment, you know, to stand behind this open-source project. So, when we made that announcement that we are starting the company, there was a sigh of relief throughout the Vitess community. You know, finally, I can-
Utsav: [48:07] -depend on this really important.
Sugu: [48:08] I can depend on this project. It is here to stay. There is a company who is going to make sure that this project stays healthy.
Utsav: [48:18] That makes sense.
Yeah.
Sugu: [48:21] There were other factors also, but this was definitely one of the major contributing factors. So, I think at the end of the day, this is generally a problem in open source. You know, you say open source is free, but there is an economy behind it, because the engineer's time is not free. For me to spend time on open source, I have to be paid money, because, you know, I have my family to take care of; you know, I need money to live. And so, expecting engineers to contribute to open source in their free time can work for smaller projects, you know, but once the project grows beyond a certain size, it cannot be a part-time contribution. It is a full-time contribution, and the engineer has to be paid money for it. And so there has to be some kind of economy behind that.
Utsav: [49:32] That makes a lot of sense. And I think it also tells me that the problem that Slack and Square and these other companies were facing was just so large that they wanted to use the project despite all of these issues with, you know, "we don't know the future of this project." There was no other solution for them, given their existing MySQL stack and their hyper-growth and the kind of problems that they'd be dealing with.
Sugu: [49:59] Yeah, that's right. So that is something I still wonder about, right? If not Vitess, what could they have... is there an alternative solution, you know? The alternative doesn't look as good either, because the alternative would be a closed-source product. Like, you know, that's even more scary, because what if the company goes out of business, right? At least in the case of Vitess, the source code is there. If PlanetScale goes away, Slack can employ engineers to take over that source code and continue to operate. So that confidence is there.
So that is one big advantage of something being open source: it gives higher confidence that, you know, in the worst-case scenario, they have a way to move forward.
Utsav: [50:53] And when you finally decided to make a company and a product, like, what are some things you've learned along the way that, you know, you need to have in Vitess for more people to be able to use it? At first impression, to me, it's like, there are not millions of companies running into these scale issues. But, like, what are some interesting things that you've learned along the way? Yeah.
Sugu: [51:14] All I can say is I am still learning. Yeah, every time, I realize, oh, you know, what I thought was important is not really that important. Like, somebody asked me, if you are agonizing so much about data integrity, why is MongoDB so popular? Right? I mean, MongoDB is pretty solid now, as far as I can tell, but it was not solid when it became popular, right? They made it solid after it became popular. So, MongoDB, for the longest time, did not care about data integrity, but people still flocked to it. So, there is always a trade-off in what is important. And you have to find out what that is, and you have to basically meet people where they are.
And in the case of Vitess, it's actually approachability and usability. That is a bigger hurdle to cross before you can adopt Vitess. And that is a much bigger hurdle than, you know, the data integrity guarantees that people are looking for. So that is, for example, one lesson. The one thing in Vitess is, we used to listen; we had our ears open. We were constantly listening to people giving us feedback about what is not working for them, and fixing those problems. It turns out that is insufficient. Who are we not listening to?
We are not listening to the quiet person who tries it over the weekend, finds it doesn't work for them, and quietly walks away. Right? That's the voice we never heard. We only heard the voice of somebody who tried to use Vitess, got past the problem of not being able to use it, got to use it, found a missing feature, and is asking for it. So, we've been focusing on very specific features in Vitess, but we completely forgot about, you know, how to make it more approachable. Right. So those are the problems we are now solving at PlanetScale. Okay.
Utsav: [53:41] That makes a lot of sense. Yeah. Maybe you can tell us about one usability feature that you feel was extremely important that you all built.
Sugu: [53:49] The biggest one, for example, is schema deployment. Subconsciously, every developer will cringe. When you say "use a database," they would be fine, but as soon as you say "schema," almost every developer cringes, because, oh my God, I have to deal with that bureaucracy: send it for approval, the DBA is going to reject it, our schema change is going to break something, some app is going to break. All those headaches I have to deal with. So, what we have done at PlanetScale is give you an experience very similar to what you would do as a developer with your source code. You do the same thing with your database. You can take your database, branch it, apply schema changes, test it, and then have a workflow to merge it back into production. It looks almost similar to how you develop your source code, right? So, it's a very natural workflow, and it applies to the database, and you don't have to deal with, you know, that whole bureaucracy; it's all part of this nice workflow. It handles conflicts. If multiple people want to change the schema at the same time, it pipelines them the correct way. And if they cannot go together, we will flag it for you.
So those are really, really nice features. And that seems to be going really well with developers.
Utsav: [55:28] Interesting. Yeah. And I think it speaks to, you know, why databases like MongoDB are popular?
Sugu: [55:33] Yeah. It's not the NoSQL that made MongoDB win; it's the no-schema part.
Utsav: [55:43] Being able to just tweak, add one more field, not be blocked on, like, a team, not exactly worry about back-filling and all of that. It is a game changer, and I'm maybe a little ashamed to admit, actually, like, I use MongoDB at this job and it's not half bad. It's pretty good.
Sugu: [56:00] Yeah. Yeah. Now they are trying to bring SQL back to MongoDB, and people like it, right? So, the real problem was actually schemas, the fact that you can't just add something and move on. It's so hard to do it in a database.
Utsav: [56:21] Yeah, and maybe you can tell me, today, like, what's the difference between Vitess open source and PlanetScale? Like, what is the extra stuff?
Sugu: [56:29] They are very orthogonal, right? Because what we are building in PlanetScale is a beautiful developer experience. And what Vitess is giving you is the other part that most products with a good developer experience miss, which is a solid, scalable database in the backend. Like, we could build this developer experience on top of stock MySQL, but people would hesitate to adopt that, because they know that it won't scale. At some point of time, I'm going to hit a limit, and I'm going to spend 90% of my energy trying to figure out how to scale this. That's what companies experience. When that problem is taken away from you, right, you get a good developer experience, and when it comes to scaling, Vitess has you covered.
Utsav: [57:23] Yeah. That makes a lot of sense. And does that mean that as a developer, let's say I want to start a startup tomorrow. I have a little bit of experience with MySQL.
Can I just start with PlanetScale on day one?
Sugu: [57:35] What would I get? That's exactly what [inaudible]. Yeah. You basically spend almost no time configuring your database. You just go to PlanetScale, you click a button, you instantly get a database, and then you just start developing on it. So, zero barriers, which is what you want in a startup.
Utsav: [58:02] And the experience I get is just stock MySQL to begin with, except it would have all of these things like automatic limits on queries. And it would let me shard as soon as I become too big.
Sugu: [58:15] As soon as, yeah, exactly. But, interesting, yeah. So basically, what we have to do is kind of guide you towards good programming practices, you know what I mean? If you're just running unbounded queries, it's going to tell you, this is not good. So add bounds to your [inaudible 58:36], that kind of stuff. So, we can be opinionated that way.
Utsav: [58:37] So I'm curious about, you know, all of the technical innovation that I see on your Twitter feed and in your YouTube talks. You know, you all have, like, an automatic benchmarking tool. You're, like, optimizing protocol handling in Go. What are some of the key, like, other innovations that you all are doing at PlanetScale?
Sugu: [58:56] So there is one; I don't know if you've seen the blog series about consensus, generalized consensus. Okay. So, I think that's a decent idea. I feel like it's a decent innovation. What we are doing is, there's Paxos and Raft. You know, those in my eyes are rigid algorithms, and they have rigid assumptions about how nodes should talk to each other. And those algorithms were adopted as industry best practice. But if you flip it around: what problems are they solving, right? You identify the problem, and then you start with the problem first and come in and say, this is the problem I want to solve. What system would you build?
I don't think Raft or Paxos would be the systems we would have built, right? And what problems are we trying to solve?
We are trying to solve the problem of durability, right? I have a distributed system and I say, save this data. And the system says, I have saved your data. And the guarantee that I want is that the system doesn't lose my data, essentially. That is the problem that all these consensus systems solve. But then when I come in here, I'm in a cloud environment, I'm on AWS. I have zones, I have regions, right? I can say, for me, durability means my data is across two nodes. Or my notion of durability is that the data is at least in one other zone or one other region, right? So, you specify a system that can make sure that whenever you ask it to write data, the data has reached all those nodes or regions, and then it gives you the acknowledgment. And then later, if there are failures, it recovers based on the fact that the data is elsewhere, right? So that is what the series says. So, if you look at it top-down, you come up with a very different approach to the problem, in which case Raft and Paxos are just one way to solve it. So, what I have explained is the generic approach behind solving the problem of durability, how Paxos and Raft solve it, and how we can build other systems that are more effective at more accurately meeting the driving requirements that are coming from the business.
Utsav: [1:01:39] This reminds me of semi-sync replication in, like, MySQL.
Sugu: [1:01:44] Exactly. So, you could build a consensus system using MySQL semi-sync replication that gives you the same guarantees that Raft gives you, for example.
Utsav: [1:01:55] Okay.
Sugu: [1:01:56] But then the theory behind this is kind of what I've explained in the blog series. So that, I think, is one great innovation. And the benchmarking is another good one; there may be a PhD thesis that can come out of that. Or at least an MTech thesis.
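To make the durability requirement from earlier in this answer concrete, here is a small Python sketch of a configurable durability rule ("at least two nodes" or "at least one other zone"). It is an assumption-laden illustration, not the algorithm from the blog series: `Node`, `is_durable`, `min_copies`, and `need_other_zone` are hypothetical names, and the sketch covers only the decision of when a write may be acknowledged, not failure recovery.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Node:
    name: str
    zone: str  # e.g. an availability zone label


def is_durable(primary: Node, acks: Iterable[Node],
               min_copies: int = 2, need_other_zone: bool = False) -> bool:
    """Decide whether a write may be acknowledged to the client.

    primary is the node that accepted the write; acks are the nodes
    that have confirmed receiving it.
    """
    holders = {primary, *acks}  # distinct nodes holding the data
    if len(holders) < min_copies:
        return False
    if need_other_zone and all(n.zone == primary.zone for n in holders):
        return False
    return True
```

Under this framing, a majority quorum is one possible rule among several; the business requirement decides which rule the system enforces before acknowledging a write.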
I don't know what he's going to do, but Florent, he's actually a student, has done an awesome job with that. So, some research work is going to come out of what he's done. Yeah.
Utsav: [1:02:28] That's awesome to hear, because you don't generally hear of smaller companies and, like, academic work coming out of them, but PlanetScale is exactly the kind of place where something like that would come out.
Sugu: [1:02:40] Yeah. There's a lot more there. The other cool one is VReplication, or the materialization feature, which is to be able to materialize anything into anything. And it works so well within the model of the Vitess sharding scheme that it is a foundation for a large number of migrations. It's a foundation for applying DDLs. Like, today in PlanetScale, you say, deploy this DDL, and we do it in an online fashion, as in, like, with no downtime, and safely; you can even revert it, right? So, the way I see it, a DDL is a form of migration. And it's just one way of expressing a migration as a command, but there are bigger migrations, like moving a table from one database to another, or where you reshard a table. These are all DDLs of some sort. And if you can express them as DDLs, and if the system can do it online, in the background, with no downtime and with reversibility, I think that is a very powerful mechanism to make available for you to use. Like, often some of these projects take months, years. Yeah, you can now do them in Vitess with just a simple command.
Utsav: [1:04:16] Yeah, this seems like the kind of thing that is just completely unheard of or impossible in any kind of other data system that you have.
Sugu: [1:04:22] Yeah, [inaudible], because it's science fiction. This is science fiction.
Utsav: [1:04:31] And I also read that Vitess can also be deployed within Kubernetes.
So, you can get all of the benefits of, you know, Kubernetes... the same way you get benefits of Kubernetes with stateless services, where things can just move around, you can also use that to your benefit with Vitess, where your data can be moved around and stored in multiple places. Am I completely misremembering stuff, or, like, [cross-talking 1:04:59]?
Sugu: [1:05:00] No, it's not. Like, I can reveal to you that at PlanetScale, Vitess is all hosted in Kubernetes; everything runs in Kubernetes. And we have a very large number of clusters. We may already be, or may soon be, you know, the largest number of keyspaces in the world. All in Kubernetes; the entire thing is in Kubernetes.
Utsav: [1:05:29] So, at least traditionally... so do you run on containers, or is it just, like... yeah, traditionally, people say that you shouldn't run stateful stuff in containers, and you're clearly bucking that trend. So maybe you can just talk through that a little bit.
Sugu: [1:05:47] Yeah, totally. So, the history behind this goes all the way back to how Vitess evolved in YouTube and Google, right? At YouTube, originally, we ran Vitess on our own dedicated hardware, but we were required to migrate it to Google's Borg, which is Google's cloud. And the problem was that MySQL was not a first-class citizen in Google's Borg, which means that you couldn't use it as a service. There was one team we actually collaborated with, but it was still not a first-class citizen. So, the only way we could run Vitess inside Google Borg was as a stateless application.
So, we actually added the durability, the survivability of nodes going down, into Vitess, such that at any time the cloud can come and take down your server, and Vitess should handle that. So that was in Vitess's DNA; we had to build it and make it part of Vitess.
So then later, when Kubernetes came out, we were already ready for it, because it was already running in what I call a data-hostile environment. And therefore, because of the fact that we ran at a massive scale inside Google Borg, we have the confidence that it will run well. And it has shown that it does run well so far.
Utsav: [1:07:34] Okay. Does that mean that the MySQLs basically live outside Kubernetes, and then you have a large part of, like, all of Vitess running...?
Sugu: [1:07:39] Inside the container.
Utsav: [1:07:40] Okay, interesting.
Sugu: [1:07:43] Yeah, we run MySQLs inside pods. We are not the first ones to do it, by the way: JD.com, HubSpot, and there's another company, I forgot their name. They've been doing it for many years.
Utsav: [1:08:03] Okay, and what are the benefits of running it inside a pod, I guess, from your perspective?
Sugu: [1:08:06] Ah, manageability, right? Because it's a uniform environment. You don't treat Vitess as, you know, a special case. Like, today, if you run Kubernetes, there's the Kubernetes setup where you run your application, and then you have to manage the database using different tools and technologies and methodologies. If you're running everything in Kubernetes, then you have a uniform set of management tools that work the same way. You want to add something, just push the YAMLs.
Utsav: [1:08:40] That makes sense. And then you have to manage draining and basically hydrating nodes or pods. And do you have custom logic to set that up?
Sugu: [1:08:49] There is actually an operator that manages these things. We are actually trying to simplify that, you know, because Vitess does have a lot of things for that, which we built when we were on Borg. I feel like at least our PlanetScale operator does a lot more than it should. It doesn't have to, but yeah. A little bit of glue code is needed, I would say.
Utsav: [1:09:18] That makes sense. Yeah.
But it certainly sounds like science fiction, running all of your data on these pods that can be moved around and changed at any time. You don't even know which machine has all of your data. It sounds like the future.
Sugu: [1:09:34] Yeah, it sounds like the future, but it is the future, because there is no other way you can manage 10,000 nodes, right? Like, if you're going from the unicorn world, where there is this one server that you have to watch carefully, to... it's going to be everywhere. It's going to be lots and lots of servers. How do you scale it? So, there's a term that one of our engineers used to use: you want O(1) manageability, which means that as you scale the nodes, you shouldn't have to scale your team.
Utsav: [1:10:12] That makes a lot of sense. Yeah. And what does your day-to-day job look like now? Like, what is your role, or how has it changed over time? I'm guessing initially you must have [inaudible].
Sugu: [1:10:21] Every time. Yeah. Like, it keeps changing. So, initially I spent, like, 99% of my time on Vitess, like in the early days of PlanetScale. The last seven, eight months, I spent a lot of time on our internal operator, because there was one particular part that was broken that I said, okay, I'll go fix it. So that I'm actually winding down. So, I have a feeling ... eventually what will happen is I think I will have to spend a lot more time publishing stuff. I have not been publishing enough. Like, people have been asking me, when are you going to finish this consensus series? I still haven't had time to do it. So I'll probably spend more time doing that, and probably also speak at conferences. I scaled that back a bit because I was too busy writing code internally. So it keeps changing. So, it will remain like that, because the way I see it, it's whatever is needed. It's like PlanetScale, right? Whatever is more important, that's the problem I am working on.
Utsav: [1:11:44] Yep.
That makes sense. And in some ways, it's like going back to your role at PayPal, just working on the important projects.
Sugu: [1:11:52] Yeah, yeah, yeah. Keep it afloat.
Utsav: [1:11:59] Yeah. And maybe to wrap things up a little bit, like, I have one question for you, which is, have you seen the shape of your customers change over time? Like, you know, I'm sure initially with Vitess it was, like, super large customers, but have you started seeing, now with all of these usability improvements, smaller and smaller teams and companies that are like, you know what, we just want to future-proof ourselves?
Sugu: [1:12:18] Yeah, totally. It's night and day. Like, you know, even the words they use are very different. Like, you know, previous customers are more like: correctness, durability, scalability. And now the newer customers talk about, you know, these two factors are not working, you know, I have this Ruby program, I have this Rails thing. Even the language is very different. Yeah. It does change.
Utsav: [1:12:54] Well, I think that that's a great sign though, 'cause you're basically attracting a lot of people who traditionally would have never even thought about using something like Vitess or PlanetScale. So that's a great thing to hear.
Sugu: [1:13:04] Yeah. Yeah. Yeah.
Utsav: [1:13:08] Anyways, well, thanks so much for being a guest. I think this was, yeah, this was a lot of fun for me. I love that trip down memory lane and understanding, knowing how the product is supposed to work. And yeah, again, thank you so much.
Sugu: [1:13:12] Yeah. Thank you.
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev
