
Comments (6)

Lazin commented on July 23, 2024

First of all, Akumuli doesn't support rollup-based retention: you can't set different retention periods for different time-series. Akumuli also doesn't implement rollups the way Grafana does. It won't pre-generate 5-minute, 1-hour, and 1-day rollups as separate time-series. Instead, it has a group-aggregate query that returns data at whatever step you need, on demand. Of course you can generate rollups in your application and write them back, but group-aggregate is still the recommended way.
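To illustrate, a group-aggregate request can be built per query with the step you would otherwise have baked into a rollup series. A minimal sketch (the metric name, time range, and aggregation functions below are placeholders, not values from this thread):

```python
import json

def rollup_query(metric, step, funcs, ts_from, ts_to):
    """Build a group-aggregate query body for Akumuli's HTTP query endpoint."""
    return {
        "group-aggregate": {
            "metric": metric,
            "step": step,    # e.g. "5m", "1h", "1d" -- chosen at query time
            "func": funcs,   # aggregation functions, e.g. ["mean", "max"]
        },
        "range": {"from": ts_from, "to": ts_to},
    }

# One query per desired resolution replaces three pre-generated rollup series.
body = rollup_query("cpu.user", "5m", ["mean"],
                    "20240101T000000", "20240102T000000")
print(json.dumps(body))
```

The same function covers the 1-hour and 1-day cases by changing `step`, which is the point of computing rollups on demand.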

The back of the envelope calculations for the capacity:

  • cardinality: 50,000 × 20 = 1M unique time-series;
  • write rate: 50,000 data points/sec;
  • space depends heavily on the data: 1,576,800,000,000 data points per year could be stored in less than 1.5TB of disk space if the values are small integers. If the data points are floats or large integers it could take anywhere from just over 1.5TB (low variation) up to 12TB (mostly random data); if the data has many duplicates it can take far less than 1.5TB. It's best to experiment and load some data to see how compressible it is.
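The arithmetic behind those figures checks out and is worth making explicit:

```python
# Back-of-envelope capacity check using the numbers from the thread.
series = 50_000 * 20            # hosts x metrics -> unique time-series
write_rate = 50_000             # data points per second
seconds_per_year = 365 * 24 * 3600
points_per_year = write_rate * seconds_per_year

print(series)           # 1,000,000 unique series
print(points_per_year)  # 1,576,800,000,000 data points per year
```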

from akumuli.

arrowwood49 commented on July 23, 2024

Of course you can generate rollups in your application and write them back, but group-aggregate is still the recommended way.

We are using rollups into separate tables to keep disk space usage low. It looks like it will be difficult to do the same in Akumuli, even if I create rollups in the application, since Akumuli doesn't support different retention periods for different time-series. Is there a way to implement custom deletion/retention in the application for different time-series?

I'll run some tests with real data to measure Akumuli's disk space usage.

If we use Akumuli's group-aggregate instead of rollups, what is the memory requirement per query? Assume a user can run at most 5 group-aggregate queries concurrently.
What is the memory requirement to support 50,000 writes/sec?


Lazin commented on July 23, 2024

It is possible to run multiple akumulid instances with different configs: one for raw data with a 20s step and short retention, and another for pre-aggregated data. Akumuli uses space-based retention for operational simplicity, which is why different retention policies for different series aren't possible.
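The two-instance setup can be sketched as a small routing layer in the application: raw samples go to one instance, app-generated rollups to the other. This is a sketch under assumptions; the port numbers and series names are made up, and only the RESP-style line encoding (each field as a `+`-prefixed, CRLF-terminated string) follows Akumuli's documented write protocol:

```python
import socket

# Assumed ports for the two akumulid instances (not defaults to rely on):
RAW_PORT = 8282      # instance holding raw 20s data, small volume quota
ROLLUP_PORT = 9282   # instance holding pre-aggregated data, larger quota

def encode_point(series, timestamp, value):
    """Encode one data point in Akumuli's RESP-based TCP line protocol."""
    return f"+{series}\r\n+{timestamp}\r\n+{value}\r\n".encode()

def write_point(port, series, timestamp, value):
    """Send one encoded point to the akumulid instance on the given port."""
    with socket.create_connection(("localhost", port)) as sock:
        sock.sendall(encode_point(series, timestamp, value))

payload = encode_point("cpu host=web01", "20240723T120000", 0.5)
```

Because retention in Akumuli is space-based per instance, the effective retention period of each stream is controlled by how much disk each instance is given, not by per-series policies.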

The 50,000 writes/sec rate is quite small, TBH. But the total cardinality is 1M, which means you will need around 8-10GB of RAM. Queries don't require much RAM if you query series one by one, and if you query many series at once you can use "order-by": "series" to minimize memory usage.


arrowwood49 commented on July 23, 2024

Thanks @Lazin for the response.

The 50,000 writes/sec rate is quite small, TBH. But the total cardinality is 1M, which means you will need around 8-10GB of RAM. Queries don't require much RAM if you query series one by one, and if you query many series at once you can use "order-by": "series" to minimize memory usage.

Could you explain how you estimated 8-10GB of RAM for 1M series? Would it help reduce the memory footprint if data is added to Akumuli in chunks instead of all 1M series together?

Regarding queries: I need to query by the object-id tag to retrieve at least 10-15 stats (time series) for a given time range, which can be the last hour, plus other group-aggregate queries using the object-id tag to simulate our current rollups.

I'll take a look at "order-by": "series".


Lazin commented on July 23, 2024

Could you explain how you estimated 8-10GB of RAM for 1M series? Would it help reduce the memory footprint if data is added to Akumuli in chunks instead of all 1M series together?

Memory use depends on cardinality: Akumuli needs 1-10KB of RAM per time-series (depending on the size of the series).
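That per-series overhead multiplied by the cardinality from earlier in the thread reproduces the estimate:

```python
# Memory estimate: per-series overhead (1-10KB) times cardinality (1M).
cardinality = 1_000_000
low = cardinality * 1 * 1024    # 1KB per series, in bytes
high = cardinality * 10 * 1024  # 10KB per series, in bytes

print(low / 2**30)   # ~0.95 GiB lower bound
print(high / 2**30)  # ~9.5 GiB upper bound -> matches the 8-10GB figure
```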

Regarding queries: I need to query by the object-id tag to retrieve at least 10-15 stats (time series) for a given time range, which can be the last hour, plus other group-aggregate queries using the object-id tag to simulate our current rollups.

That should be a lightweight query. To see high memory use from queries you would need to query tens or hundreds of thousands of series at once: the query processor caches 4KB of data for every time-series involved in a query, so memory use grows with the number of series touched. Memory use is also higher when a query returns data ordered by time, because many series have to be joined together. 15 stats is not a big deal.
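The 4KB-per-series cache makes it easy to estimate the per-query overhead for narrow versus wide queries:

```python
# Per-query cache estimate: ~4KB cached per series involved in the query.
per_series_cache = 4 * 1024  # bytes

narrow = 15 * per_series_cache        # the 10-15 stats case above
wide = 100_000 * per_series_cache     # a wide query over 100k series

print(narrow)  # 61,440 bytes -> ~60KiB, negligible
print(wide)    # 409,600,000 bytes -> ~390MiB, where memory use gets noticeable
```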


arrowwood49 commented on July 23, 2024

Thanks @Lazin for the explanation. I'll test queries to check the memory usage.

