HYDROLIX BLOG

Ponderings, insights and industry updates

When Append-Only Doesn’t Cut It

March 16, 2022

Author: David Sztykman |

Tags: ,

Initially on building Hydrolix, we designed it to be an append only system.
We built it based on the idea that large footprint telemetry data such as logs usually don’t change and crucially they don’t need to be modified.

The reality however is different, between regulatory requirements such as GDPR, bad data coming in at any time and the need to often alter large amounts of data for internal business reasons we have to had to develop a way to modify and delete rows and columns.

In most systems, altering large amounts of data involves massively increasing capacity to be able to store a copy of the data, alter it and then store the changes. This typically has the knock on affect of causing performance issues for end-users querying existing data and often incurring substantial cost.

With that in mind decided to develop a service that is easy to use, can sit “out of line” with processing and data retrieval and most importantly won’t affect performance or cost.

So in December 2021 we worked hard on our alter system.

How Alter works

Hydrolix stores partition in cloud storage, we have a catalog database which contains the list of partition and the metadata associated (number of rows, size, min_timestamp, max_timestamp etc).

When a user issues a statement like:

ALTER TABLE project.my_table
UPDATE 'client_ip' = 'secret'
WHERE timestamp > '2021-01-01 00:00:00' AND timestamp < '2022-01-01 00:00:00' AND isNotNull(client_ip)

To alter data the first step is to identify the partition which needs to be modified and lock those partition.

Our alter-peer looks at the catalog to find out all the partition for that timerange, then it downloads the manifest and index to verify if the partition contains the data requested by the SQL Statement.

If the partition fits then we lock the partition in our catalog so they can’t be modified (meaning we can’t merge the partition with another one) and we create a job, the job contains tasks for each partition.

The alter-peer will then lookup the tasks and start working on each one.
This allows us to scale to multiple alter-peer, each one will work on a task and move to the next one.
So if you have a job which modify 1000 partitions and you have 100 alter peers, they will each have to run 10 tasks consequently.

Alteration itself is downloading the partition, opening it and modify it based on the UPDATE / DELETE statement in SQL.

But the key part is we don’t update the currently active partition, we create a new one and upload it to the cloud storage.
Essentially it’s like you have 2 copies of the data, the initial one and the modified one.

We can do that because we don’t rely on local attach volume so for a time we can double the storage footprint, the cloud storage scales and don’t cost much so why not use it !

That allows us to do something very nice too, when users issue an alteration statement, it doesn’t affect query performance, it creates essentially a new copy of the data with the modification. And users can preview the altered data before commiting the change to production.
If the change is not what they expected they can just cancel the ALTER jobs and we’ll remove the new partition.
If the change is OK we can commit the change, which will mark the new partition as active and will remove the initial version.

TLDR

Alter is a new mechanism at Hydrolix allowing user to modify and delete indexed data.
When alter is running it has no impact on performance for users with our decouple approach.
Alter is safer than any other approach as we can rollback or commit the change in an atomic way with our cloud storage approach creating new partition is free.

Share Now