Clarisights: A History In Languages

Over the last four years, as we've built our product, achieved product-market fit and acquired a high-quality global customer base, we've constantly had to scale up our engineering infrastructure. We have a steady track record of upgrading features, functions, scale and stability on the front and back end. This evolution has been accompanied by a growing diversity in the programming languages used in the Clarisights tech stack. The chart below is a visual representation of this ever-increasing linguistic diversity.


The first iteration of our product was built with Ruby on Rails on the back end, and HTML and JavaScript (via React.js) on the front end. And while our tech stack has changed over time, these languages remain key to our codebase. (For now. Change is afoot.)


In 2019, processing CSVs was increasingly becoming a computing challenge for us. By then we were processing around half a billion CSV rows per week. One thing we offered customers was the ability to run an aggregation function on metrics with a Group By on dimensions. This was fine on small enough datasets.
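To make the idea concrete, here is a minimal sketch of an aggregation with a Group By over CSV rows. It is an illustration only: the column names (`channel`, `country`, `spend`), the sample rows and the `group_by_sum` helper are all hypothetical, and it does not reflect the actual Clarisights implementation.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: "channel" and "country" are dimensions,
# "spend" is a metric.
SAMPLE_CSV = """channel,country,spend
search,DE,10.5
social,DE,4.0
search,IN,2.5
social,IN,1.0
search,DE,7.0
"""


def group_by_sum(rows, dimension, metric):
    """Sum `metric` for each distinct value of `dimension`, one row at a time."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += float(row[metric])
    return dict(totals)


# csv.DictReader yields one dict per row, keyed by the header names.
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
totals = group_by_sum(reader, "channel", "spend")
print(totals)
```

On a handful of rows like these, this kind of in-process aggregation is unremarkable.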

But doing it over millions of rows was leading to a host of problems. Containers were getting OOMKilled; the memory footprint and the time to process data were unpredictable, and we had to make provisions for peak memory usage. We needed a way to make our CSV files queryable through SQL. The tool we chose was Apache Drill. That is when Go entered our tech stack. We used Go to build a wrapper service called Rig on top of Apache Drill. Rig was a huge success for us. It brought out-of-memory failures down to zero and reduced both memory usage and processing time to one-tenth of what they were before. (Click here for a brief presentation on the Rig project from December 2019.)


At Clarisights, 2020 was the year that Python slithered into our tech stack. It arrived as part of our attempt to optimise our data ingestion pipeline using Airflow as an orchestrator. Airflow uses Python to dynamically generate and manage pipelines, and to define Directed Acyclic Graphs (DAGs). DAGs are hugely important to the work we do at Clarisights. (You can read more about how we use DAGs to enable customers to use Custom Metrics here.)
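For readers new to DAGs, here is a minimal sketch of the underlying idea: tasks declare which other tasks they depend on, and an orchestrator runs them in an order that respects those dependencies. The sketch uses Python's standard-library `graphlib` module (Python 3.9+) rather than Airflow itself, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "validate": {"clean"},
    "load": {"aggregate", "validate"},
}

# static_order() yields the tasks so that every task appears after all of
# its dependencies; it raises graphlib.CycleError if the graph has a cycle.
order = list(TopologicalSorter(pipeline).static_order())
print(order)
```

Because `aggregate` and `validate` do not depend on each other, either may come first; the only guarantee is that both run after `clean` and before `load`.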


We welcomed two languages to the Clarisights family in 2021. Both accompanied the design and implementation of MorselDB, our homemade analytical database engine. (We have an ongoing series of blog posts on MorselDB that you can read here and here.)

MorselDB was built off ClickHouse, which is written in C++. And while C++ is a powerful workhorse of a language, we found cases during the design of MorselDB where C++ was too clunky or verbose to use. We needed a language to write glue code, and that is where Embeddable Common Lisp (ECL) came in. So while there is very little ECL in the MorselDB codebase, it plays a critical role.


This brings us to 2022 and two key engineering projects that we are currently working on. The first is Mozek, possibly our most ambitious engineering pursuit yet. A reliable system of control and visibility for our data processing and ingestion workloads is essential for us to offer customers our great scale and unrivalled data sanity. Mozek is that reliable system of control and visibility.

Through Mozek we are creating a single brain in the system for orchestrating different kinds of data processing workloads. Mozek, Czech for 'brain', will treat the concept of a service-level agreement (SLA) for different data ingestion and processing workloads as a first-class citizen. Mozek is being written in Elixir. You can listen to the rationale behind our choice of Elixir in this episode of the official Clarisights podcast.

The second critical project currently under way is Custom Channel Processing. In some ways this project is an extension of the Rig project from 2019; we'll share more details in a future tech update. Here we're using Apache Spark to build a high-scale engine for processing data in CSV format. And along with Spark came the language it is written in: Scala.

Thus, over the last four years, our engineering team's remit has extended from three languages to nine, all in response to our ongoing mission to help our customers by using the right language and the right tools for the job at hand.

If this has piqued your interest, why not apply to work at Clarisights? You can check out all our latest opportunities here.