Comments (6)
First of all, Akumuli doesn't support rollup-based retention: you can't set up different retention periods for different time-series. Akumuli also doesn't implement rollups the way Grafana does. It won't pre-generate the 5-minute, 1-hour, and 1-day rollups as separate time-series. Instead, it has a group-aggregate query that can be used to get data at the step you need, on demand. Of course you can generate rollups in your application and write them, but group-aggregate is still the recommended way.
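A group-aggregate request is a JSON payload sent to akumulid's HTTP query endpoint. The sketch below follows the field names from the Akumuli query docs, but the metric name, tag, step, and time range are made-up examples:

```python
import json

# Sketch of a group-aggregate query payload. The field names follow the
# Akumuli query docs; "cpu.user", the "object-id" tag, the step, and the
# time range are hypothetical examples.
query = {
    "group-aggregate": {
        "metric": "cpu.user",   # hypothetical metric name
        "step": "5m",           # resolution, computed on demand
        "func": ["mean"],       # aggregation function per step
    },
    "range": {
        "from": "20210101T000000",
        "to": "20210102T000000",
    },
    "where": {"object-id": "42"},  # hypothetical tag filter
}

payload = json.dumps(query)
# The payload would be POSTed to akumulid's HTTP API, e.g.:
#   curl -X POST http://localhost:8181/api/query -d "$payload"
print(payload)
```

Because the aggregation happens at query time, no extra rollup series have to be written or retained.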
The back-of-the-envelope capacity calculations:
- cardinality: 50,000 × 20 = 1M unique time-series;
- write rate: 50,000 data-points/sec;
- disk space depends heavily on the data. 1,576,800,000,000 data-points per year could be stored in less than 1.5TB of disk space if the values are small integers. If the data-points are floats or large integers, it could take anywhere from just over 1.5TB (low variation) up to 12TB (mostly random data). If the data has a lot of duplicates, it can take far less than 1.5TB. It's best to experiment and load some data to see how compressible it is.
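The arithmetic behind those figures can be checked quickly:

```python
# Back-of-the-envelope check of the yearly data-point count quoted above.
writes_per_sec = 50_000
seconds_per_year = 365 * 24 * 3600           # 31,536,000
points_per_year = writes_per_sec * seconds_per_year
print(points_per_year)                       # 1,576,800,000,000

# Bytes per data-point implied by the 1.5TB and 12TB disk figures:
tb = 1e12
print(1.5 * tb / points_per_year)   # ~0.95 bytes/point (well-compressed)
print(12 * tb / points_per_year)    # ~7.6 bytes/point (mostly random data)
```

So the 1.5TB-12TB range corresponds to roughly 1-8 bytes per stored data-point, which is why compressibility of the actual data matters so much.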
from akumuli.
> Of course you can generate rollups in your application and write them, but group-aggregate is still the recommended way.
We are using rollups in a separate table to keep disk space usage low. It looks like it will be difficult to do the same in Akumuli, even if I create rollups in the application, since Akumuli doesn't support different retention periods for different time-series. Is there a way to implement custom deletion/retention in the application for different time-series?
I'll run some tests with real data to find Akumuli's disk space usage.
If we use Akumuli's group-aggregate instead of rollups, what is the memory requirement for a query? Assume a user can run at most 5 group-aggregate queries concurrently.
What is the memory requirement to support 50,000 writes/sec?
It is possible to use multiple akumulid instances with different configs: one for raw data with a 20s step and short retention, and another for pre-aggregated data. Akumuli uses space-based retention for operational simplicity; because of that, it's not possible to use different retention policies for different series.
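The two-instance setup implies the application routes writes itself: raw points go to one akumulid instance, pre-aggregated rollups to another. Akumuli's TCP ingestion uses a RESP-like wire format ('+'-prefixed strings terminated by CRLF); the ports, metric, and tag names below are made up, and the exact wire format should be checked against the Akumuli docs for your version:

```python
# Sketch of an application-side writer that sends raw points to one
# akumulid instance (short retention) and rollups to another.
# Assumption: Akumuli's RESP-like TCP format is a '+'-prefixed series
# name, timestamp, and value, each terminated by CRLF.

def resp_message(series: str, timestamp: str, value: float) -> bytes:
    """Encode one data-point in the assumed RESP-like wire format."""
    return f"+{series}\r\n+{timestamp}\r\n+{value}\r\n".encode()

RAW_ENDPOINT = ("localhost", 8282)      # hypothetical raw-data instance
ROLLUP_ENDPOINT = ("localhost", 8283)   # hypothetical rollup instance

msg = resp_message("cpu.user host=web01", "20210101T000000", 0.42)
# A real writer would open a socket per endpoint, e.g.:
#   socket.create_connection(RAW_ENDPOINT).sendall(msg)
print(msg)
```

Each instance then applies its own space-based retention independently, which approximates per-resolution retention without per-series policies.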
The 50,000 writes/sec rate is quite small, TBH. But the total cardinality is 1M, which means you will need around 8-10GB of RAM. Queries don't require a lot of RAM if you query series one by one. And if you query many series at once, you can use "order-by": "series" to minimize memory usage.
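A multi-series query with "order-by": "series" might look like the sketch below (field names per the Akumuli query docs; the metric and time range are made up). Series-major ordering lets the server emit one series at a time instead of merging all of them by timestamp:

```python
import json

# Sketch of a multi-series select with series-major output ordering.
# "cpu.user" and the time range are hypothetical.
query = {
    "select": "cpu.user",       # hypothetical metric name
    "range": {
        "from": "20210101T000000",
        "to": "20210101T010000",
    },
    "order-by": "series",       # stream one series at a time
}
print(json.dumps(query))
```

With "order-by": "time" the server would instead have to hold state for every matching series at once to interleave their points, which is where the extra memory goes.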
Thanks @Lazin for the response.
> The 50,000 writes/sec rate is quite small, TBH. But the total cardinality is 1M, which means you will need around 8-10GB of RAM. Queries don't require a lot of RAM if you query series one by one. And if you query many series at once, you can use "order-by": "series" to minimize memory usage.
Could you explain how you estimated 8-10GB of RAM for 1M series? Would it help reduce the memory footprint if data is added to Akumuli in chunks instead of all 1M series together?
Regarding queries: I need to query by the object-id tag to retrieve at least 10-15 stats (time series) for a given time range, which can be the last hour, plus other group-aggregate queries to simulate our current roll-ups using the object-id tag.
I'll take a look at "order-by": "series".
> Could you explain how you estimated 8-10GB of RAM for 1M series? Would it help reduce the memory footprint if data is added to Akumuli in chunks instead of all 1M series together?
Memory use depends on cardinality. To handle each time-series, Akumuli needs 1-10KB of RAM (depending on the size of the series).
> Regarding queries: I need to query by the object-id tag to retrieve at least 10-15 stats (time series) for a given time range, which can be the last hour, plus other group-aggregate queries to simulate our current roll-ups using the object-id tag.
This should be a lightweight query. To see high memory use from queries, you need to query tens or hundreds of thousands of series. That's because the query processor caches 4KB of data for every time-series involved in the query, so when a query touches a lot of them you'll see higher memory use. Also, if a query returns data ordered by time, memory use is higher because it has to join many series together. 15 stats is not a big deal.
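Putting the numbers from this thread together, both the ingestion and query RAM estimates are simple products:

```python
# Rough check of the RAM figures in this thread: per-series ingestion
# overhead, and the 4KB-per-series cache used by the query processor.
series_count = 1_000_000
per_series_bytes = (1_000, 10_000)     # 1-10KB per active series
ingest_ram_gb = [series_count * b / 1e9 for b in per_series_bytes]
print(ingest_ram_gb)                   # [1.0, 10.0] -> the quoted 8-10GB
                                       # sits at the top of this range

# A query touching 15 series caches only 15 * 4KB:
print(15 * 4096)                       # 61,440 bytes - negligible
# versus a query touching 100,000 series:
print(100_000 * 4096 / 1e9)            # 0.4096 GB
```

This is why the 10-15-stat object-id query is cheap while a query spanning a large fraction of the 1M series is not.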
Thanks @Lazin for the explanation. I'll test queries to check the memory usage.