
The Purist and the Pragmatist: An Oral History of MorselDB (Part 2)

Clarisights Journal

May 22, 2022 • 15 min read

Clarisights empowers performance marketing teams at some of the world’s most innovative companies to take meaningful decisions. We do this by democratising data through our revolutionary no-code BI reporting platform. Sitting at the heart of the Clarisights platform is MorselDB. MorselDB is the proprietary database technology that enables our customers to effortlessly turn billions of data points into infinitely customisable reports and actionable insights. In this series of posts the team at Clarisights tell you the story behind the making of MorselDB and how it will power the next generation of Clarisights customers. Read part one here.

Clarisights Journal: So you have the mathematical formulation in place. You have a conceptual sense of what the solution is or what the database engine should look like. What was the next step?

Pritam: I remember this incident. Arun and I were sitting in this tiny little phone call booth on the ground floor of our old office. Arun was a little tense and he kept asking me: How much work is this going to be? And I said: “You guys have no option but to build a custom database. But that thing will take up to two years. And you're going to need skilled engineers.”

Ashu: I think the discussion is not complete without Anmol, because he did a stellar job of actually implementing this. Most of the code was written by him. The initial part was written by Pritam. And then I joined in. Initially, we spent around two months trying to build out the conceptual design rather than working on ClickHouse directly. I feared that we were not evaluating our options completely. I remember Pritam and I spent that particular phase discussing computing designs.

Pritam: The initial design I came up with was a conceptual design that was not really tied to any specific tool. It was just a mathematical solution. But translating that theory into something that can be built in real life—that is application design. It is true, initially, I was only interested in the conceptual design. Eventually, I said to myself: The concept works. This can theoretically be done. I also realised that ClickHouse had most of the components I needed to start the application design. I also had a clear sense of the components that were missing which is really important as well.

At this stage, if it were up to me, I would have just built an HTAP engine using ClickHouse. But obviously, it's not my project. This is a Clarisights project. Back then I was an external consultant. This is when I presented the design to the team. I said: Okay, this is my conceptual design. This is how I think it can translate into an application design. Any analytical engine that has this list of capabilities will work for us. ClickHouse works. But so can anything else that fits the bill.

This is when Ashu took over. It was now his job to figure out what exact tools do we use? What actual things do we buy? What do we build? And how do we glue all these things together to build a final product that actually works?

Ashu: That took about two months. In fact, the application design really took place at two levels. First, we translated the theoretical model for the engine into an application design which was system agnostic. And then, once we had a design we were happy with, we translated it to something based on Spark and ClickHouse. It was a very comprehensive process. We really paid attention to detail. And we had to deal with a lot of really challenging questions.

For instance, there was this join problem. Do you de-normalise dimensions and metrics together, or do you store dimensions and metrics separately? This was important because, in our system, both dimensions and metrics receive lots of updates. Dimensions update less often than metrics, though, and there is an order of magnitude difference in the number of rows. And if you store them separately, how do you efficiently join them in real time? Later we realised that this second approach was the standard way of doing things in the Spark environment. But at that time it was not clear to us that, in a system that allows updates to the data at our scale, you could efficiently keep both dimensions and metrics sorted on the same key and then easily join them together. So what we did in this two-level application design process was to first translate all these theoretical constraints, problems and requirements into specific components. And then, once this was done, figure out what technology would enable us to build all of these components together.
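To make the "store separately, join at read time" option concrete: when both sides are kept sorted on the same key, they can be joined in a single forward pass over each side. The Python sketch below is only an illustration. The `ad_id` key, the sample rows and the `merge_join` helper are hypothetical and do not come from the interview, and the sketch assumes at most one dimension row per key. It is not how MorselDB, Spark or ClickHouse implement joins.

```python
def merge_join(dimensions, metrics):
    """Join two key-sorted lists in one forward pass (sort-merge join).

    dimensions: list of (key, name), sorted by key, one row per key (assumption).
    metrics:    list of (key, value), sorted by key, many rows per key allowed.
    """
    joined = []
    d = 0  # cursor into the dimension list; it only ever moves forward
    for key, value in metrics:
        # Skip dimension rows whose key is smaller than the current metric key.
        while d < len(dimensions) and dimensions[d][0] < key:
            d += 1
        # Emit a joined row only when the keys line up (inner join).
        if d < len(dimensions) and dimensions[d][0] == key:
            joined.append((key, dimensions[d][1], value))
    return joined


# Hypothetical sample data: (ad_id, campaign) and (ad_id, clicks).
dimensions = [(1, "brand"), (2, "search"), (4, "video")]
metrics = [(1, 10), (1, 7), (2, 3), (3, 99), (4, 5)]

result = merge_join(dimensions, metrics)
print(result)
```

Because neither cursor moves backwards, the join cost grows with the combined size of the two inputs. The hard part Ashu describes is keeping both sides sorted while they receive constant updates, which this sketch does not address.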

Pritam: Technologies. Plural. We looked at a lot of technologies in addition to Spark.

Ashu: I think we tested with Phoenix. But that was there just for the purpose of completeness. We also tested with Kudu, Hudi, Pinot, etc. There is a long list of systems we evaluated on the technical requirements. We even tested with Postgres to set up a baseline with a relational database. Eventually, it became a two-horse race between Spark and ClickHouse. We drew up a Spark-based design with Spark as the primary executor and Parquet as the data format plus a custom partitioning scheme to manage the data. We repeated this for a ClickHouse-based design and, thankfully, Pritam had already done a lot of research on that. The idea, in the case of both designs, was to lay out a blueprint of all components and processes necessary. So, once we made a choice we knew how to proceed.

CJ: Which horse was winning the race at this point?

Ashu: At this point, the impression I had was that it would be much easier to build on Spark, and much more difficult to build on ClickHouse. I had more or less decided that we were going to go with the Spark-based design. That's where I think Pritam and I had a long discussion. In the end, we completely went in the opposite direction.

CJ: What made you change your mind about the Spark-based design?

Ashu: Several factors. First of all, the raw performance in Spark (both cold and warm) was not a match for ClickHouse. To be fair, a lot of it was due to the superior storage format in ClickHouse, but there were other differences like vectorized execution and the fact that Spark runs on the JVM. Though I must say Spark was able to keep up with ClickHouse as far as join performance was concerned. Kudos to the Spark community for that.

Eventually, we realised that a Spark-based design would have involved a lot more work to get several dissociated components to work together efficiently. ClickHouse offered us even better raw performance and easier management in a single coherent unit. I felt that a ClickHouse-based design gave me the confidence to promise a better experience for our customers, even the more demanding ones. So three and a half months after we first decided to explore a new database engine, we finally decided to go with a ClickHouse-based design.

Pritam: And towards the end, we spent a lot of time checking to see if our current codebase could work with these new application designs.

Ashu: We did a lot of POCs, prototypes, and testing. We did a lot of testing on every aspect of our codebase. We did benchmarking and carried out studies on real-life use cases. We did a lot of work. This is not something we wanted to go into unprepared.

CJ: Arun, while all of this is going on, what is happening with day-to-day business? How concerned are you by the amount of time this is taking?

Arun: Well, we had multiple problems to solve. Forget the new database, there were performance issues with the existing platform that needed engineering intervention. And these were problems very fundamental to Mongo. You cannot fix them. You had to work around them. But then there came a problem that needed fixing urgently: data correctness. Customers would log into Clarisights and then realise that something was wrong. The data didn’t make sense. Our systems weren’t showing them the correct data.

Data correctness is a non-negotiable in this business. It cannot wait for anything else. It was a problem that had to be dealt with immediately. And I remember going in and pulling Ashu and his team off the Database project and telling them to fix data correctness.

Ashu: It was not just the correctness that was a problem. We realised that a lot of the performance problems with Mongo were down to the old data enrichment system we were using. It was so important that we had started work on the new data enrichment system even before the MorselDB project. Pritam was leading a team of engineers to get that project done before we started on MorselDB. It was that critical.

Pritam: Also you couldn't really build the new database without taking care of data enrichment. So even as we fixed the data enrichment problem we had to keep the new database in mind. We needed a solution that could work with MongoDB but also fit with the new database engine whenever that was ready.

Ashu: Implementing the new data enrichment system helped take a lot of load off of Mongo. That helped keep the ship running for some time while we were working on the database project.

Arun: The improvements in the data enrichment system also helped improve stability. MongoDB had this problem where every once in a while it would overload. Ashu and Pritam helped stabilise that as well. And it was around this time that we also went and got more customers who were not as demanding as Delivery Hero on our systems. This brought additional revenue and a little more breathing room to get to the server problems.

CJ: So at this stage in the process, Pritam has built the conceptual design. Ashu, you’ve done the application design. You've chosen to go the ClickHouse route rather than the Spark route. What happens next?

Ashu: What happened next, to be honest, was the really challenging part of the entire project. The actual building took the next seven to eight months to complete. That was where the bulk of effort and time was involved. Yes, we had the conceptual and application designs but it was nowhere close to an implementable solution. When you actually start building something is when the bottlenecks start popping up.

And that is when we got Anmol on the project. We needed our best engineer. Someone who could quickly ship production-quality code. It comes down to what Pritam was saying: You need someone who has the superhuman ability to implement. And that is exactly what Anmol did.

Meanwhile, a lot of work went into the codebase. We got into a rhythm of finding what needed restructuring, building small POCs, and rolling out modular components. Roadblocks appeared all the time. Designs needed constant adjusting. We even had to revisit some of our initial design points.

For instance, I remember our original plan was for a distributed database where we could write to any node and resolve conflicts when actually making those writes available for reads. But eventually, we realised that we needed some sort of table leadership concept. It took a lot of whiteboard discussions and a lot of implementation and testing. We implemented lots of interesting data structures, modified existing real-world algorithms for our use case and delved deep into complexity analysis, hardware performance characteristics etc. for each component we implemented. That is why this phase was a lot more challenging than the whiteboard discussions during the design phase.

CJ: This is what happens for the next seven or eight months?

Ashu: Close to seven months. I think we started the implementation at the end of December 2020. And we ended up onboarding our first client onto MorselDB at the beginning of July 2021.

CJ: Was there a moment of anxiety as you were thinking of moving your first customers onto MorselDB?

Ashu: I think I really started getting anxious around the end of March. My original estimate was that we’d complete most of the building by March-end, and then spend a month and a half on testing, before onboarding our first customers in May. But by the end of March, I began to realise that we hadn’t budgeted enough time to work on the client-side codebase. That took a lot longer than I thought. I think through April and May 2021 I worked on absolutely nothing else. And by the end of May, I think Arun had given up on the timelines. Come May end, however, things looked much better. All we had left to do was fix performance problems and resource issues. Low hanging fruit. We knew by then that we had a working system.

Arun: From my side of things, customers were putting immense pressure on us for two things. They wanted a solution for our database problems and then wanted a timeline for this solution. My phone calls with customers were all about the timeline: Okay. Tell me how long will it take? Three months? Six months?

Frankly, I had no idea. Meanwhile, our timelines kept extending. Customers asked us to commit to a timeline for our new database in a contract. And I did that. I signed a contract committing to solve our database problems within a quarter. And that didn’t happen. And then we missed another quarter. It was a lot of pressure.

CJ: Meanwhile, were server bills piling up?

Arun: Ashu managed that a lot. We cut our bills by half and then held it there even as we onboarded more customers.

Ashu: That project had a lot of collateral benefits. Around April of 2020, we realised that server costs were going to be a huge problem for us. And this was at a time when we didn’t have a clear sense of revenue growth. Truly uncertain times. Which is when I kickstarted the cost optimisation project. As Arun said we brought costs down. But something else happened. The project forced us to look at our systems properly. For the first time we began to break down costs by all the services that we had. We got deep visibility into our infrastructure and also how the costs of that infrastructure scaled with customers. It helped reassure all of us, including Arun, that things would not spiral up exponentially as we started going to market.

Arun: There is another aspect to this as far as growth is concerned. References are a big driver of growth for us. But unless you kept your biggest customers happy you couldn’t give their references to new ones. So all these projects that we talked about—POT, cost optimisation and the new database rebuild—together helped us improve customer experience. Which really helped with potential new customers. And by extension potentially new investors.

CJ: Stepping away from this story for a moment, why did you call the new database MorselDB? Is there an engineering perspective to this?

Ashu: There is actually a backstory here. In fact, it was not known as MorselDB when we started. I gave it another project name initially. A lot of the machines that we used to work on were called Tachyon. Tachyons are these hypothetical subatomic particles that can travel faster than the speed of light. I am a physics man. So I liked that name. And I called it Project Tachyon. But later I decided I didn’t like it anymore!

CJ: Was it because Tachyons are hypothetical? And you didn’t want to think the new database would not see the light of day?

Ashu: That's actually the reason why I didn't like that name very much. Physicists don’t think that Tachyons can actually exist. I thought it didn’t feel right to name our new database after this. Plus it was a working title really. Pritam already had some ideas about what to call it.

Pritam: I can explain that. In the early stages of the project, Ashu and I disagreed a lot on theoretical matters. I remember a whole discussion on write amplification. So at the time, the way ClickHouse did things was it would store data in massive chunks. Huge chunks. Now analytical databases have a particular problem. They have absolutely no issues with writing new data. But the moment you ask them to rewrite even a tiny little portion of this chunk, you have to rewrite the whole chunk. This is what is called write amplification. It's a major problem.

But I had this stubborn notion that you didn’t have to use massive chunks. Why not use small chunks? You'd more or less get the same benefit of chunking—which is called the batching effect. You don't need the absolute biggest chunks to get the most out of your batching effect. Beyond a point, chunk size gave diminishing returns in terms of the batching effect.
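A rough way to see the effect of chunk size on write amplification is to count how many rows get physically rewritten for a handful of single-row updates. The Python sketch below is a toy model with made-up numbers: it assumes fixed-size chunks addressed by row position and that any update forces a rewrite of the whole chunk it lands in. The function names and the chunk sizes are hypothetical and are not ClickHouse or MorselDB parameters.

```python
def rows_rewritten(updated_rows, chunk_rows):
    """Rows physically rewritten if every touched chunk is rewritten in full."""
    # Each row position belongs to exactly one fixed-size chunk.
    touched_chunks = {row // chunk_rows for row in updated_rows}
    return len(touched_chunks) * chunk_rows


def write_amplification(updated_rows, chunk_rows):
    """Rows rewritten per row that actually changed."""
    return rows_rewritten(updated_rows, chunk_rows) / len(updated_rows)


# Three single-row updates scattered across a one-million-row table.
updates = [5, 250_000, 900_000]

big = write_amplification(updates, chunk_rows=1_000_000)  # one massive chunk
small = write_amplification(updates, chunk_rows=10_000)   # bite-sized chunks

print(f"massive chunk: {big:,.0f} rows rewritten per updated row")
print(f"small chunks:  {small:,.0f} rows rewritten per updated row")
```

In this toy model the same three updates rewrite the entire table when it is one chunk, but only three small chunks when the chunk size is 10,000 rows.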


So my conceptual design specified small chunks. During that research, I went and looked at how Snowflake does it internally. And it turns out they do the exact same thing. I felt a little bit vindicated.

Then I wondered: Why do they call it Snowflake? So they have a bunch of reasons, but I like to think one of those contributory factors is the tiny little chunks. And the entire database is composed of so many little chunks all over. It's like a sheet of snow, right?

The natural question then is: What do we call our little chunks? We argued about it for ages. And then I just gave up. I decided I was going to make my own names. So what did we have? Bite-sized chunks. Little bite-sized morsels. And that is how MorselDB got its name.

Ashu: I just want to add two things to Pritam’s story. Morsel is not just a name. There is a slightly broader philosophy to it. Some things, some ways of functioning, are baked into our database. And I think that's what makes our database tick. One is small units of work: not just small units of storage or reads, but any work. That is one fundamental concept throughout the database. Reads, writes, executions: they all run as small units of work. The second fundamental concept is the amortisation of cost. For example, we never do things one at a time. We basically try to do multiple things at a time, so that you pay the fixed cost of an operation once per batch instead of once for each action. This is called cost amortisation in computer science terms.

Pritam: That's the batching effect. The common analogy is laundry. You don't wash clothes one at a time when you do laundry.
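The batching effect, and the diminishing returns Pritam mentions for chunk size, can be illustrated with a simple cost model: every operation pays a fixed overhead once, plus a small cost per item. The Python sketch below uses invented cost figures (`fixed_cost=100.0`, `item_cost=1.0`) purely for illustration. They are assumptions, not measurements from MorselDB.

```python
def cost_per_item(batch_size, fixed_cost=100.0, item_cost=1.0):
    """Amortised cost of one item when the fixed cost is shared by a batch."""
    return fixed_cost / batch_size + item_cost


# The fixed overhead is spread over more items as the batch grows,
# but the saving flattens out once the per-item cost dominates.
for batch_size in (1, 10, 100, 1_000, 10_000):
    print(f"batch of {batch_size:>6}: {cost_per_item(batch_size):8.2f} per item")
```

Under these assumed numbers, going from a batch of 1 to a batch of 100 cuts the per-item cost from 101 to 2, while going from 1,000 to 10,000 saves less than a tenth of a unit. That is the sense in which a morsel does not need to be the biggest possible chunk to capture most of the benefit.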

Ashu: Exactly. Thirdly, ClickHouse didn’t have a concept of distributed storage at the time. I think there are some plans to add support now. We really wanted an elastic system. And I think that's where Pritam and I actually differed from our regular natures. Pritam is a lot more of a purist. He wants to do things the right way. I am a lot more of a pragmatist.

Suddenly when it came to distributed storage I became a purist. I was adamant we needed distributed storage even if that meant changing the way ClickHouse talks to internal storage. It meant a little additional work, but now we have a more elastic system. We can increase or decrease the number of nodes at any point in time. For instance, right now I can reduce or increase a node and a customer won’t even know I’ve done it. They might see a small impact on performance. But otherwise, the process is seamless. Also, we can even potentially distribute an expensive query over multiple compute nodes, as every node has access to all the data.

Pritam: It’s really cool.

Ashu: The second feature of MorselDB that I was talking about is the amortisation of cost. That actually lets us come very close to achieving a perfect balance across different axes of the RUM conjecture. This is a standard limitation in data systems where you need to optimise for read, update and memory overheads. The conjecture says that you can only optimise for two of these three at one time. You have to forgo the third. But because MorselDB does everything in an amortised way, we can achieve a very good balance for all three.

Arun: In some sense it's good to have a purist and a pragmatist together because that's what creates MorselDB: The coming together of a purist and a pragmatist approach.

CJ: You've got all this set and you're moving your first customer in June. What is that process like?

Ashu: There were several things that we had to prepare. First of all, testing. We ran several tests to ensure database performance and data correctness. We built enough controls in the system that we could point to either reads or writes of a single specific customer. Today we can even point to smaller units of a single customer’s data. We built out all these constructs to enhance customer experience.

Then we built a migration system from Mongo to MorselDB. Let’s be honest, it was really painful. MongoDB’s performance itself was limiting. Plus most of our system is in Ruby, and Ruby isn’t great at crunching a lot of data in a short amount of time. To get around these problems we practically ended up building a proper distributed system to just do the migration.

CJ: How much data are we talking about?

Ashu: If I'm going to talk about raw data, uncompressed in MongoDB, that was close to five terabytes for Delivery Hero. And across all customers, I think it is probably close to 12 to 13 terabytes.

CJ: And who was the first customer on MorselDB? Was Delivery Hero the first customer you moved?

Ashu: Delivery Hero was not the first customer. We started with smaller customers. The very first customer that we migrated was Ajio. The number of different use cases they had was small. So the blast radius was small. By the way, a funny story. We’ve become much better at migrating from MongoDB to MorselDB. Today we can migrate 100 times more data than we could a year ago in the same time. We have come a long way in terms of optimising our system.

Next Time: In Part Three of The MorselDB Story we do a status check on MorselDB in 2022. We also look at how it has scaled along with business at Clarisights. Finally, we take a glimpse at what is in store for the future of the database that makes Clarisights tick.
