Comments (6)
First of all, Akumuli doesn't support rollup-based retention: you can't set up different retention periods for different time-series. Akumuli also doesn't implement rollups the way Grafana does. It won't pre-generate the 5-minute, 1-hour, and 1-day rollups as separate time-series. Instead, it has a group-aggregate query that can be used to get data at the step you need, on demand. Of course you can generate rollups in your application and write them, but group-aggregate is still the recommended way.
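A group-aggregate request is a JSON payload sent to akumulid's HTTP query endpoint. The sketch below follows the field names from the Akumuli query docs, but the metric name, tag, step, and time range are made-up examples:

```python
import json

# Sketch of a group-aggregate query payload. The field names follow the
# Akumuli query docs; "cpu.user", the "object-id" tag, the step, and the
# time range are hypothetical examples.
query = {
    "group-aggregate": {
        "metric": "cpu.user",   # hypothetical metric name
        "step": "5m",           # resolution, computed on demand
        "func": ["mean"],       # aggregation function per step
    },
    "range": {
        "from": "20210101T000000",
        "to": "20210102T000000",
    },
    "where": {"object-id": "42"},  # hypothetical tag filter
}

payload = json.dumps(query)
# The payload would be POSTed to akumulid's HTTP API, e.g.:
#   curl -X POST http://localhost:8181/api/query -d "$payload"
print(payload)
```

Because the aggregation happens at query time, no extra rollup series have to be written or retained.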
The back-of-the-envelope capacity calculations:
- cardinality: 50,000 × 20 = 1M unique time-series;
- write rate: 50,000 data-points/sec;
- disk space depends heavily on the data. 1,576,800,000,000 data-points per year could be stored in less than 1.5TB of disk space if the values are small integers. If the data-points are floats or large integers, it could take anywhere from just over 1.5TB (low variation) up to 12TB (mostly random data). If the data has a lot of duplicates, it can take far less than 1.5TB. It's best to experiment and load some data to see how compressible it is.
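The arithmetic behind those figures can be checked quickly:

```python
# Back-of-the-envelope check of the yearly data-point count quoted above.
writes_per_sec = 50_000
seconds_per_year = 365 * 24 * 3600           # 31,536,000
points_per_year = writes_per_sec * seconds_per_year
print(points_per_year)                       # 1,576,800,000,000

# Bytes per data-point implied by the 1.5TB and 12TB disk figures:
tb = 1e12
print(1.5 * tb / points_per_year)   # ~0.95 bytes/point (well-compressed)
print(12 * tb / points_per_year)    # ~7.6 bytes/point (mostly random data)
```

So the 1.5TB-12TB range corresponds to roughly 1-8 bytes per stored data-point, which is why compressibility of the actual data matters so much.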
from akumuli.
> Of course you can generate rollups in your application and write them, but group-aggregate is still the recommended way.
We are using rollups in a separate table to keep disk space usage low. It looks like it will be difficult to do the same in Akumuli, even if I create rollups in the application, since Akumuli doesn't support different retention periods for different time-series. Is there a way to implement custom deletion/retention in the application for different time-series?
I'll run some tests with real data to find Akumuli's disk space usage.
If we use Akumuli's group-aggregate instead of rollups, what is the memory requirement for a query? Assume a user can run at most 5 group-aggregate queries concurrently.
What is the memory requirement to support 50,000 writes/sec?
It is possible to use multiple akumulid instances with different configs: one for raw data with a 20s step and short retention, and another for pre-aggregated data. Akumuli uses space-based retention for operational simplicity; because of that, it's not possible to use different retention policies for different series.
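The two-instance setup implies the application routes writes itself: raw points go to one akumulid instance, pre-aggregated rollups to another. Akumuli's TCP ingestion uses a RESP-like wire format ('+'-prefixed strings terminated by CRLF); the ports, metric, and tag names below are made up, and the exact wire format should be checked against the Akumuli docs for your version:

```python
# Sketch of an application-side writer that sends raw points to one
# akumulid instance (short retention) and rollups to another.
# Assumption: Akumuli's RESP-like TCP format is a '+'-prefixed series
# name, timestamp, and value, each terminated by CRLF.

def resp_message(series: str, timestamp: str, value: float) -> bytes:
    """Encode one data-point in the assumed RESP-like wire format."""
    return f"+{series}\r\n+{timestamp}\r\n+{value}\r\n".encode()

RAW_ENDPOINT = ("localhost", 8282)      # hypothetical raw-data instance
ROLLUP_ENDPOINT = ("localhost", 8283)   # hypothetical rollup instance

msg = resp_message("cpu.user host=web01", "20210101T000000", 0.42)
# A real writer would open a socket per endpoint, e.g.:
#   socket.create_connection(RAW_ENDPOINT).sendall(msg)
print(msg)
```

Each instance then applies its own space-based retention independently, which approximates per-resolution retention without per-series policies.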
The 50,000 writes/sec rate is quite small, TBH. But the total cardinality is 1M, which means you will need around 8-10GB of RAM. Queries don't require a lot of RAM if you query series one by one. And if you query many series at once, you can use "order-by": "series" to minimize memory usage.
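A multi-series query with "order-by": "series" might look like the sketch below (field names per the Akumuli query docs; the metric and time range are made up). Series-major ordering lets the server emit one series at a time instead of merging all of them by timestamp:

```python
import json

# Sketch of a multi-series select with series-major output ordering.
# "cpu.user" and the time range are hypothetical.
query = {
    "select": "cpu.user",       # hypothetical metric name
    "range": {
        "from": "20210101T000000",
        "to": "20210101T010000",
    },
    "order-by": "series",       # stream one series at a time
}
print(json.dumps(query))
```

With "order-by": "time" the server would instead have to hold state for every matching series at once to interleave their points, which is where the extra memory goes.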
Thanks @Lazin for the response.
> The 50,000 writes/sec rate is quite small, TBH. But the total cardinality is 1M, which means you will need around 8-10GB of RAM. Queries don't require a lot of RAM if you query series one by one. And if you query many series at once, you can use "order-by": "series" to minimize memory usage.
Could you explain how you estimated 8-10GB of RAM for 1M series? Would it help reduce the memory footprint if data is added to Akumuli in chunks instead of all 1M series together?
Regarding queries: I need to query by the object-id tag to retrieve at least 10-15 stats (time series) for a given time range, which can be the last hour, plus other group-aggregate queries to simulate our current roll-ups using the object-id tag.
I'll take a look at "order-by": "series".
> Could you explain how you estimated 8-10GB of RAM for 1M series? Would it help reduce the memory footprint if data is added to Akumuli in chunks instead of all 1M series together?
Memory use depends on cardinality. To handle each time-series, Akumuli needs 1-10KB of RAM (depending on the size of the series).
> Regarding queries: I need to query by the object-id tag to retrieve at least 10-15 stats (time series) for a given time range, which can be the last hour, plus other group-aggregate queries to simulate our current roll-ups using the object-id tag.
This should be a lightweight query. To see high memory use from queries, you need to query tens or hundreds of thousands of series. That's because the query processor caches 4KB of data for every time-series involved in the query, so when a query touches a lot of them you'll see higher memory use. Also, if a query returns data ordered by time, memory use is higher because it has to join many series together. 15 stats is not a big deal.
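Putting the numbers from this thread together, both the ingestion and query RAM estimates are simple products:

```python
# Rough check of the RAM figures in this thread: per-series ingestion
# overhead, and the 4KB-per-series cache used by the query processor.
series_count = 1_000_000
per_series_bytes = (1_000, 10_000)     # 1-10KB per active series
ingest_ram_gb = [series_count * b / 1e9 for b in per_series_bytes]
print(ingest_ram_gb)                   # [1.0, 10.0] -> the quoted 8-10GB
                                       # sits at the top of this range

# A query touching 15 series caches only 15 * 4KB:
print(15 * 4096)                       # 61,440 bytes - negligible
# versus a query touching 100,000 series:
print(100_000 * 4096 / 1e9)            # 0.4096 GB
```

This is why the 10-15-stat object-id query is cheap while a query spanning a large fraction of the 1M series is not.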
Thanks @Lazin for the explanation. I'll test queries to check the memory usage.