Engineering at Clarisights: New Joinee Roundtable
Go behind the scenes with our engineers as they discuss building specialized systems at scale, from custom databases to data orchestration.
At Clarisights, we're building specialised systems to handle complex data processing needs at scale. As you can imagine, this means we have some pretty interesting engineering challenges that require great engineers to solve them!
Sidin Vadukut, Head of People, sits down with three of the newest members of our engineering team - Bhanu from the Database team, Chinmay from the Mozek orchestration team, and Darshan from Product Engineering - to discuss their experiences working on some of our most challenging technical problems, and our engineering culture.
Sidin: Tell us about your respective backgrounds. Where were you before Clarisights?
Bhanu: I started my software engineering career at Flipkart, where I worked on their cloud computing engine. Functionally it's similar to GCP compute, but internal to Flipkart, not a public service. I worked there for about two years and then joined Google, where I worked specifically in B2B ads before joining Clarisights.
Chinmay: I've been in the industry for about five years. I'm a 2019 grad from Pune, and I joined Schlumberger initially. People might not expect it from an oilfield services company, but Schlumberger is actually quite tech-savvy - they were among the first companies to build out their own network, as far back as 1971. I was part of their hybrid hosting engineering team, handling infrastructure across Schlumberger.
After that, I moved to 100ms, and then had a short stint at Amazon where I learned a lot about how video conferencing and advertising works. Then I went to Unacademy, where we built products from 0 to 1, including one called Cohesive. That's when Clarisights reached out to me.
Darshan: I've worked at a couple of startups, including a tech-savvy sneaker startup while in college. After that, I joined Commvault, which is an enterprise data protection and disaster recovery company. It was my first stint on large-scale engineering projects. I worked on tools to back up databases and Salesforce, and was part of a newly formed AI capabilities team. After a three-year stint there, I joined Clarisights.
Sidin: What made you choose Clarisights?
Bhanu: One of the things that attracted me was the complexity of the work, which was important to me personally. I wanted more challenging work and more learning experiences, which I felt were lacking in my time at Google. Several of Clarisights' engineering blog posts really grabbed my attention. The culture came across well in those posts - they were very conversational, not just explainers.
Darshan: During my job hunt, I had these specific pointers in my notebook: I wanted a team I could learn from, a place where I could be curious and experimental, and a startup where I could contribute to decision-making, not just engineering. Reading the blog posts and conversations during the interview process made it clear that Clarisights ticked all these checkboxes.
Chinmay: Coming from a company where we were exploring various products, one thing that clicked about Clarisights was the clear vision and focus: this is our product, and we want to see it materialise into something as big as possible. We have opportunities, and we're pursuing them methodically. The team's resilience and patience in making its own plays in the market - that was my primary driver.
Sidin: Bhanu, you work in our DB team, which is very critical. MorselDB is at the heart of our approach to the problem. What has your experience been like?
Bhanu: The onboarding was well-structured. It's not like I went right into complexity - we took baby steps. I got to solve some simple problems in different components before slowly getting into more nuanced ones. At every step, it's been challenging, and I've been learning something new with each task. For example, we recently built a dynamic currency conversion feature that gives users the ability to convert their currency data into other currencies. To support this, we had to come up with efficient lookups on different dates for the data we're querying from the DB. We had to look into what data structures ClickHouse supports, and what options we have that won't add much overhead to read queries. Going into dictionaries, and how different layouts work in ClickHouse - it's been lots of learning.
Previously, I didn't have any DB experience - I had only used databases as a user. I hadn't even done a single non-trivial migration that required me to know the internals, and I'd never touched administration or maintenance. So I'm learning everything from scratch. Every task lets me go deeper and understand how database internals work, especially around query execution and planning.
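The date-based rate lookup Bhanu describes can be illustrated with a toy sketch. This is not how ClickHouse dictionaries work internally - it's a minimal Python analogue of the idea: keep rates sorted by effective date and binary-search for the rate in force on a given day. The rate table and function names below are invented for illustration.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical rate table: (effective_date, rate) pairs, sorted by date.
# In production this would live in a ClickHouse dictionary, not in Python.
EUR_TO_USD = [
    (date(2024, 1, 1), 1.10),
    (date(2024, 2, 1), 1.08),
    (date(2024, 3, 1), 1.09),
]

_DATES = [d for d, _ in EUR_TO_USD]

def rate_on(day: date) -> float:
    """Return the rate in effect on `day` (latest entry on or before it)."""
    i = bisect_right(_DATES, day)
    if i == 0:
        raise ValueError(f"no rate known before {day}")
    return EUR_TO_USD[i - 1][1]

def convert(amount_eur: float, day: date) -> float:
    """Convert an EUR amount using the rate effective on `day`."""
    return amount_eur * rate_on(day)
```

The point of the binary search is that each row's lookup costs O(log n) over the rate history, keeping the overhead on read queries small.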
Sidin: Chinmay, had you ever worked with orchestrators before joining the Mozek team? What's your experience been like?
Chinmay: I've used orchestrators a lot, but never built one. Wrapping my head around all the structures in Mozek has been interesting. From a holistic standpoint, I had an idea of what Mozek does, but how it does things is something I spent a lot of time figuring out. We've had a couple of outages since I've joined, and the team has been dealing with them together. Being part of that helps me understand what can go wrong and how. It's written in Elixir. I had a small three-month experience with Elixir before, but working on it here, I get to understand how to tweak the language the way we want it.
Sidin: People often ask us, ‘there are so many great databases out there, why build MorselDB?’ And in your case, Chinmay, why isn't Airflow good enough?
Chinmay: Most orchestrators only deal with completing work. Our whole objective is to ensure that data stays fresh and up to date - that's fundamentally our unit of work. No orchestrator gives you that out of the box. You'd have to bend whatever you choose from the market to your needs, which gets harder and harder until you hit a point where you've customised it so much that you can't go any further.
Bhanu: It was definitely strange initially because I didn't know how our requirements differed from what's available. But as I worked through it, I understood that at the scale we operate - billions to even trillions of rows per day - along with high write throughput, it's very difficult to find databases that give you that level of performance. We're not looking for a general-purpose database - we have specific patterns. We found compatibility with ClickHouse, but we still had to build our own storage engine and write-side stack. We have our own design of write-ahead logs, checkpointers for how data gets updated in batch mode rather than point updates… and so on.
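As a rough illustration of the write-ahead-log-plus-batch-checkpoint pattern Bhanu mentions - a generic sketch of the technique, not MorselDB's actual design; the class name and file format here are invented:

```python
import json
import os

class BatchWAL:
    """Toy write-ahead log: rows are appended durably first,
    then applied to the table in batches at checkpoint time."""

    def __init__(self, path: str):
        self.path = path
        self.pending = []

    def append(self, row: dict):
        # Durability first: the row hits the log before it is acknowledged.
        with open(self.path, "a") as f:
            f.write(json.dumps(row) + "\n")
        self.pending.append(row)

    def checkpoint(self, table: list):
        # Apply all pending rows as one batch update (not point updates),
        # then truncate the log.
        table.extend(self.pending)
        self.pending.clear()
        open(self.path, "w").close()

    def replay(self) -> list:
        # On restart, re-read any rows a checkpoint never absorbed.
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return [json.loads(line) for line in f]
```

Batching the checkpoint is what makes high write throughput possible here: the table is touched once per batch rather than once per row.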
Sidin: Darshan, you work with our product engineering team. How do these complex systems translate into actual user experience?
Darshan: Product engineering doesn't build the product; the entire company builds the product. We build out the product experience - how users interact with what Bhanu and Chinmay build, without ever knowing about the underlying complexity. Customers shouldn't even be aware of all the engineering complexities - they should just be amazed by their experience. Every project spans the entire company. For example, when we implemented the Count feature, I was in conversation with Bhanu from the start.
Sidin: Tell us about what the Count project is.
Darshan: Count is an interesting problem that came up because, while Clarisights is already a very powerful tool (our table widget is amazingly powerful!), our customers often need even more capabilities. Right now on our platform, we show individual campaigns, ads, and so on. But users might have very simple questions like "For my campaign, how many ads are still running?" or "How many ads have clicks over 500?" They might want to see a single-row answer, or they might want to modify that to ask "Across all campaigns, how many ads are still running?"
These questions seem very simple to answer, but when you look at the database side... well, Bhanu can explain this better.
Bhanu: While ClickHouse does support these kinds of operations, the real challenge was exposing this functionality in the product. We already have a certain model of working with different metrics. So far, all our custom metrics and metrics fetched from sources are aggregated via sum by default. For example, spend at a campaign level is summed up. But if you're only looking at certain types of ads, you need to aggregate that into campaigns - basically sum up the spend across all ads that match certain conditions.
So we had this setup where we only had one type of aggregation, but now we needed to support different types. It's not exactly a complex problem, but rather it's about fitting into our existing design constraints. How do we expose this from ClickHouse, through Morsel all the way to Product?
The challenge comes from our custom interpreter. We needed to make these aggregations a first-class citizen in the interpreter. It's not as simple as saying "do this SQL statement where you put average instead of sum" - it's not that straightforward with our custom DSL and custom query planning, which would require careful consideration of data-flow analysis, optimisations et cetera.
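The shift from one hard-coded aggregation to several can be sketched in miniature. This toy Python model is purely illustrative - the rows, the `aggregate` function, and its parameters are all invented, and the real queries run inside ClickHouse, not Python. It shows sum-by-default growing into sum/count with an optional row filter:

```python
# Toy rows: ads grouped under campaigns, with spend and clicks.
ads = [
    {"campaign": "A", "status": "running", "spend": 120.0, "clicks": 640},
    {"campaign": "A", "status": "paused",  "spend": 80.0,  "clicks": 210},
    {"campaign": "B", "status": "running", "spend": 50.0,  "clicks": 990},
]

def aggregate(rows, campaign, metric, how="sum", where=None):
    """Aggregate one metric over a campaign's ads.

    `how` selects the aggregation type (previously always "sum"),
    and `where` is an optional per-row filter predicate.
    """
    picked = [r for r in rows
              if r["campaign"] == campaign and (where is None or where(r))]
    if how == "sum":
        return sum(r[metric] for r in picked)
    if how == "count":
        return len(picked)
    raise ValueError(f"unsupported aggregation: {how}")
```

For example, campaign-level spend is still the default sum over matching ads, while "how many ads have clicks over 500?" becomes a `count` with a `where` predicate - the same query path, parameterised by aggregation type.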
Darshan: And this illustrates something Avisek Rath, one of our Customer Success Managers, says quite often: our product is both a power user tool and something that a normal user should be able to use. This Count feature is a perfect example. We have to build complex things without them looking complex at all. To the user, it's just a simple count of ads matching certain criteria. But behind the scenes, there's this whole chain of careful architectural decisions making it possible.
Bhanu: Exactly. And it's very impactful because it changes how users can analyse their data. We're not just exposing a new function - we're enabling new ways of thinking about and working with the data, all while maintaining our performance standards at scale.
Sidin: Of all the projects you've shipped, what are you most proud of?
Bhanu: The project I'm most proud of is building our own custom interpreter for MorselDB based on our query DSL. We have our own intermediate representation (IR) for the query DSL. In examining query performance, we found something surprising: comparing time spent in query planning versus query execution, planning sometimes takes even more time than execution itself.
Planning involves converting your query into actionable steps for processing data. The standard ClickHouse interpreter wasn't optimised for our large query sizes. Our queries might operate on dozens of channels and hundreds of columns, creating massive query trees. ClickHouse's interpreter makes multiple passes over these trees to generate an execution plan, which becomes increasingly expensive as the trees grow.
Our custom interpreter takes a different approach. Instead of multiple planning stages - going from query to AST to plan to execution pipeline - we can skip intermediate steps. We know our query patterns intimately, so we've built these patterns directly into the interpreter. This allows us to jump straight to building the query pipeline. The whole project involved rethinking how query planning works in our specific context, where we have very predictable query shapes but at a massive scale.
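A toy sketch of that idea - not the actual interpreter - might look like this: a generic path that walks through every planning stage, versus a fast path that recognises known query shapes and maps them straight to an execution pipeline. All names, shapes, and pipeline steps below are invented for illustration.

```python
# A generic planner goes parse -> AST -> logical plan -> physical plan,
# making a full pass over a potentially huge query tree at each stage.
def plan_generic(query: dict) -> list:
    return ["parse", "build_ast", "logical_plan", "physical_plan"]

# When query shapes are fixed and known in advance, a recognizer can map
# a query description directly to a prebuilt pipeline constructor.
KNOWN_SHAPES = {
    # (operation, has_filter) -> pipeline builder
    ("sum", False):  lambda q: [("scan", q["column"]), ("sum",)],
    ("sum", True):   lambda q: [("scan", q["column"]),
                                ("filter", q["where"]), ("sum",)],
    ("count", True): lambda q: [("scan", q["column"]),
                                ("filter", q["where"]), ("count",)],
}

def plan_fast(query: dict) -> list:
    """Jump straight to a pipeline for recognised shapes; else fall back."""
    key = (query["op"], "where" in query)
    builder = KNOWN_SHAPES.get(key)
    return builder(query) if builder else plan_generic(query)
```

The trade-off is the one Bhanu describes: the fast path only works because the query patterns are predictable, so the intermediate tree-walking stages add cost without adding information.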
Chinmay: I'm working on an integration between Mozek and our data annotation pipeline that's particularly interesting from a distributed systems perspective. As a data orchestrator, Mozek needs to ensure not just that data is collected, but that it is also properly annotated with dimensions and attributes before it's queryable.
The technical challenge comes from managing multiple asynchronous processes. We might have data collection completing, while annotations are still running, and then wait for data replication across read DBs. We have to handle scenarios where the annotation process might fail mid-way, or where we need to restart processing without creating duplicates. Mozek already has sophisticated failure recovery and deduplication mechanisms, but introducing a new component meant carefully integrating with these systems.
We implemented a state machine to track data freshness across these different stages. The system needs to understand when data is "complete" versus when it's "queryable" - these are different states with different implications for our SLAs. We also had to modify our health check and disaster recovery systems to account for this new processing stage. When something fails, the system needs to intelligently decide whether to retry just the annotation step, or restart the entire pipeline.
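The "complete" versus "queryable" distinction Chinmay describes can be modelled as a small state machine. This is a generic sketch with invented stage names, not Mozek's implementation (which is written in Elixir, not Python):

```python
from enum import Enum, auto

class Stage(Enum):
    COLLECTED = auto()   # raw data has landed
    ANNOTATED = auto()   # dimensions/attributes attached
    REPLICATED = auto()  # copied across read DBs
    QUERYABLE = auto()   # visible to customers

# Legal forward transitions; anything else is rejected.
TRANSITIONS = {
    Stage.COLLECTED:  {Stage.ANNOTATED},
    Stage.ANNOTATED:  {Stage.REPLICATED},
    Stage.REPLICATED: {Stage.QUERYABLE},
    Stage.QUERYABLE:  set(),
}

class FreshnessTracker:
    def __init__(self):
        self.stage = Stage.COLLECTED

    def advance(self, to: Stage):
        if to not in TRANSITIONS[self.stage]:
            raise ValueError(f"illegal transition {self.stage} -> {to}")
        self.stage = to

    def fail_annotation(self):
        # On a mid-way annotation failure, fall back to the last durable
        # stage and retry only annotation, rather than restarting the
        # whole pipeline.
        self.stage = Stage.COLLECTED
```

Encoding the stages explicitly is what lets health checks and recovery logic ask "is this data complete, or actually queryable?" and choose the cheapest safe retry.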
Darshan: One of my most interesting projects was debugging an authentication edge case that affected users belonging to multiple companies. The core issue emerged when users tried to sign in with email/password rather than SSO. The UI would show a company selection popup, but the backend wasn't respecting the selection - it always defaulted to the most recently added company.
The debugging journey took us through multiple layers of our auth stack. We started in our application code, but quickly realised we needed to understand how Devise (our authentication library) was interacting with Rails' cookie management. We discovered that our front-end app was using API tokens while our backend was also managing session cookies - effectively running two parallel authentication mechanisms.
The really interesting part was tracing the authentication flow through Devise's internals. We had to understand how Devise hooks into Rails' middleware chain, how it processes authentication requests, and where it stores session state. We found that certain APIs were bypassing our company selection logic because they weren't properly integrated with Devise's authentication hooks.
The fix was adding a single configuration parameter to include these APIs in Devise's authentication workflow. Finding this required building a deep understanding of how Ruby's metaprogramming capabilities are used in Rails authentication. The investigation involved setting up specific debugging environments where we could trace method calls and state changes through the entire authentication pipeline - from the initial request through middleware processing to the final response.
Sidin: It's very easy in an engineering team to get abstracted away from why we're doing what we're doing - why customers need these features. Bhanu, you pointed out how dynamic currency conversion leads to genuine engineering problems on the database side. How close do you feel to the customer? How much do you experience customer pains?
Bhanu: There are different aspects to this. From a UX perspective, we can relate quite directly. If the database can process read requests faster, user experience improves. I've been working on a lot of read-side optimisations, and we have dashboards showing our latencies at different levels of the funnel. So there's a very clear connection between our work and user experience.
Chinmay: I'd say we're extremely close to customers. If there is an outage and I don't pick up a phone call at 3 AM, people don't get their data. That's close. One thing that drives me as an engineer is knowing that what I do is creating value for someone. If I don’t feel that, then I lose my motivation. With Mozek, that connection to customer value is very direct.
Sidin: Because you get a call at 3 AM?
Chinmay: Yes, exactly! That motivation comes from knowing it's helping someone.
Sidin: Final question. Why should somebody join Clarisights as a software engineer?
Chinmay: If you want to work on something complex, and you love complexity and thrive in it - if all of those boxes are checked, you should join Clarisights. There are no boundaries to where your learning stops. If you want to keep learning, join us.
Bhanu: I agree with Chinmay on the learning part. This is definitely one of the places you keep learning. There's a lot of complexity, but what's equally important is that you get great mentors. For me personally, the mentorship has been fantastic. We have discussions not just limited to MorselDB, but broader computer science topics as well. Very often we'll have these deep technical discussions that go beyond our immediate work.
Darshan: Beyond the great points Bhanu already made, I'll give you three of my own.
First, you are forced to be curious. You have to go out of your way to ask questions. You have to question everything. Just because an existing solution exists doesn't mean it applies to us - we have to evaluate if it qualifies for all the problems we're facing.
Second, everyone around you is similarly curious. When you're surrounded by high-quality people - people who ask high-quality questions, who have interesting approaches, who are imaginative - it compels you to do high-quality work yourself.
And third... you have amazing people who make horrible puns.
Sidin: Yes, in fact, that is the most important of all the points we've discussed - the standard for humour in this company is so low that it's actually good for morale!
At Clarisights, we're not just using technology - we're building fundamental pieces of infrastructure from first principles to solve real customer problems. For engineers who want to work on complex systems while maintaining a direct connection to customer impact, who thrive on curiosity and continuous learning, and who don't mind the occasional terrible pun, Clarisights offers a unique opportunity to work on challenging problems with a team of dedicated engineers.