akumuli / akumuli
Time-series database
Home Page: http://akumuli.org
License: Apache License 2.0
Implement a piecewise aggregate approximation (PAA) query.
subj
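For reference, PAA reduces a window of n points to w segment means. A minimal sketch of the reduction step, not tied to the actual query-engine interfaces (the function name is an assumption):

```cpp
#include <cstddef>
#include <vector>

// Piecewise Aggregate Approximation: reduce `input` to `segments` values,
// each the mean of one (nearly) equal-width piece of the series.
std::vector<double> paa(const std::vector<double>& input, size_t segments) {
    std::vector<double> sum(segments, 0.0);
    std::vector<size_t> cnt(segments, 0);
    const size_t n = input.size();
    for (size_t i = 0; i < n; i++) {
        size_t s = i * segments / n;  // piece that point i falls into
        sum[s] += input[i];
        cnt[s] += 1;
    }
    for (size_t s = 0; s < segments; s++) {
        sum[s] /= static_cast<double>(cnt[s]);
    }
    return sum;
}
```

A query implementation would run this over each series in the selected time range, emitting one sample per segment.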
A call to aku_close_database should call the column_store->close() method, and the results should be saved to the SQLite database.
I am trying to compile on Ubuntu 14.04 with Clang 3.6.
Nope...
/usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/cstdio:120:11: error: no member named 'gets' in the global namespace
using ::gets;
New data layout.
data chunk
| all series ids + all timestamps + all values |
sorted by series id and then by timestamp
proposed layout
data chunk superblock
| series1 + series2 + ... + seriesN | ... | series1 offsets + series2... |
The data chunk layout is about the same, except that different series are explicitly divided (not merely grouped by sorting).
The superblock contains a fixed-size block for each series. For each series it holds a fixed-size array of offsets, each pointing to that series' data inside one of the data chunks. There should be one superblock for many data chunks (e.g. 32; the branching factor is not that important and shouldn't be configurable). The superblock should also store statistics: count, avg (maybe order statistics), min, max, etc.
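A rough sketch of what one per-series superblock entry could look like under this proposal (all field names and the branching factor of 32 are illustrative, not actual Akumuli structures):

```cpp
#include <cstdint>

// One fixed-size superblock slot per series. The offsets array points into
// the data chunks covered by this superblock (branching factor 32 assumed).
struct SeriesSlot {
    uint64_t series_id;
    uint64_t offsets[32];  // offsets[k]: start of this series' data in chunk k
    // Pre-aggregated statistics so coarse queries can skip the chunks:
    uint64_t count;
    double   min;
    double   max;
    double   sum;          // avg is derived from sum/count, so it is not stored

    double avg() const { return sum / static_cast<double>(count); }
};
```

Because every slot has the same size, a series' slot can be located by index without parsing the whole superblock.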
Libakumuli should provide some indexing mechanism for time-series. It could be a simple resampled time-series or something more complex (FFT or wavelet), but it must be fast enough to build in real time. Indexing should be disabled by default and enabled only for certain tags in the configuration.
This index should be useful for plotting (first priority) and summarization. It should make it possible to build a plot for a whole year without touching all the data from that year. Another use for this index is similarity search (preferably DTW-based).
Hello,
I'm new to Akumuli.
Is there a way to use Akumuli with Python Pandas?
Kind regards
HTTP-based query support. Query parameters should be passed through the URL or as POST data in JSON format; query results should be returned in RESP format using streaming HTTP.
Akumuli should start without the --db parameter and without opening a database. It should be possible to create a new database using the query language, delete a database, and open an existing one (from the list in ~/.akumuli).
If a client cursor reads from the last page while a page overflow occurs, the last page is reallocated and an access violation occurs. A different reallocation mechanism is needed. Pages mustn't be reclaimed until every client finishes reading (space utilization wouldn't be constant anymore).
subj.
In this case every write is a future write, and a checkpoint is triggered every time.
Would you like to replace more defines of constant values with enumerations, to stress their relationships?
Metadata should be stored inside an SQLite database.
The metadata list should include the existing parameters:
and some new parameters:
First of all, allow me to congratulate you on your efforts. This is very much appreciated and a much-needed requirement for the big-data analytics industry today.
I have been looking for a good high-performance time-series database. The best options today are OpenTSDB, InfluxDB, or Elasticsearch.
The problem with them: both ES and OpenTSDB are monstrously huge (on resources), requiring you to pull up Java virtual machines and whatnot for their operation. InfluxDB is good on performance and low on resources (no dependencies), but nowhere near production grade.
This is where this project could make an impact. Low on resources, high on performance and production grade is what is needed for big-data clusters to perform efficiently.
I am not sure how serious you are about this project (maybe it is just a hobby project for you), but if you are indeed committed to making it production grade, I would like to bring a few points to your consideration to put things in perspective (on what the industry needs).
I see from your readme.md a write performance of 1 million points. While that is good, consider the scenario below:
The point I am trying to make: if someone needs to import the past 3 years of data just to test out a few business scenarios, they are looking at an average of 9000 million data points to import. This is where much higher write performance is critical. (If the import itself takes days, then querying and analyzing the data becomes infeasible.)
It is not uncommon to draw interactive charts for the past 3 to 5 years of data. So a performant database is expected to return at least 35 million points per second (one year's worth of data per second) to remain responsive and fast enough for users. (Even that is slow: imagine a chart taking 5 seconds to load just to show 5 years of data; not really good enough.)
This luxon here has a nice interactive chart demonstrating "30-year second-by-second sensor data, a total of 1 billion data points".
It would be great to see this project reaching that kind of performance.
Hello,
Any wrong symbol sent to the TCP port breaks the server:
server log:
2016-03-05 22:26:55,651 protocol-parser [INFO] /Akumuli/akumulid/logger.cpp(31) Starting protocol parser
2016-03-05 22:26:59,307 tcp-session [INFO] /Akumuli/akumulid/logger.cpp(31) Session created
2016-03-05 22:26:59,307 protocol-parser [INFO] /Akumuli/akumulid/logger.cpp(31) Starting protocol parser
2016-03-05 22:26:59,307 tcp-session [INFO] /Akumuli/akumulid/logger.cpp(31) Creating error handler for session
2016-03-05 22:27:00,445 tcp-session [ERROR] /Akumuli/akumulid/logger.cpp(54) /Akumuli/akumulid/protocolparser.cpp(66): Throw in function void Akumuli::ProtocolParser::worker(Akumuli::Caller&)
Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<Akumuli::ProtocolParserError> >
std::exception::what: unexpected parameter id format - \r\n
2016-03-05 22:27:00,446 tcp-session [ERROR] /Akumuli/akumulid/logger.cpp(54) End of file
akumulid: /usr/include/boost/coroutine/v1/detail/coroutine_op.hpp:44: D& boost::coroutines::detail::coroutine_op<Signature, D, void, 0>::operator()() [with Signature = void(); D = boost::coroutines::coroutine<void()>]: Assertion `! static_cast< D * >( this)->impl_->is_complete()' failed.
/root/akumuli.sh: line 33: 12 Aborted ${AKUMULID}
ERROR: Can't run akumuli
request (just pressed "enter"):
$ telnet 192.168.99.100 8282
Trying 192.168.99.100...
Connected to 192.168.99.100.
Escape character is '^]'.
-ERR /Akumuli/akumulid/protocolparser.cpp(66): Throw in function void Akumuli::ProtocolParser::worker(Akumuli::Caller&)
Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<Akumuli::ProtocolParserError> >
std::exception::what: unexpected parameter id format - \r\n
Connection closed by foreign host.
Hello,
I'm getting the following compilation error on Ubuntu 15.10. Do you have any thoughts about what's wrong? I've tried master, v0.3.0 and v0.2.0, but they all end up the same. So I guess I'm missing some library/header files?
stybla@akumuli:~/Akumuli$ cmake .
-- Boost version: 1.58.0
-- Found the following Boost libraries:
-- unit_test_framework
-- program_options
-- system
-- thread
-- filesystem
-- coroutine
-- context
-- regex
-- date_time
-- Found log4cxx: /usr/lib/x86_64-linux-gnu/liblog4cxx.so
-- Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
-- Found Sqlite3: /usr/lib/x86_64-linux-gnu/libsqlite3.so
-- Found APR headers: /usr/include/apr-1.0
-- Found APR library: /usr/lib/x86_64-linux-gnu/libapr-1.so
-- Found APRUTIL headers: /usr/include/apr-1.0
-- Found APRUTIL library: /usr/lib/x86_64-linux-gnu/libaprutil-1.so
-- Configuring done
-- Generating done
-- Build files have been written to: /home/stybla/Akumuli
stybla@akumuli:~/Akumuli$ make
[ 1%] Building CXX object libakumuli/CMakeFiles/akumuli.dir/storage.cpp.o
In file included from /home/stybla/Akumuli/libakumuli/page.h:30:0,
from /home/stybla/Akumuli/libakumuli/storage.h:38,
from /home/stybla/Akumuli/libakumuli/storage.cpp:19:
/home/stybla/Akumuli/libakumuli/internal_cursor.h:36:29: error: ‘caller_type’ in ‘Akumuli::Coroutine {aka struct boost::coroutines::coroutine<void(Akumuli::InternalCursor*)>}’ does not name a type
typedef typename Coroutine::caller_type Caller;
^
libakumuli/CMakeFiles/akumuli.dir/build.make:77: recipe for target 'libakumuli/CMakeFiles/akumuli.dir/storage.cpp.o' failed
make[2]: *** [libakumuli/CMakeFiles/akumuli.dir/storage.cpp.o] Error 1
CMakeFiles/Makefile2:75: recipe for target 'libakumuli/CMakeFiles/akumuli.dir/all' failed
make[1]: *** [libakumuli/CMakeFiles/akumuli.dir/all] Error 2
Makefile:146: recipe for target 'all' failed
make: *** [all] Error 2
Thank you.
/stats API command
Some bugs went unnoticed because the current test suite consists mostly of unit tests, and the functional tests weren't run on the CI server at all.
Akumulid should store some data in the global configuration in ~/.akumuli
Implement a "Symbolic Aggregate approXimation" (SAX) query.
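To sketch the idea: SAX maps a z-normalized, PAA-reduced window to a short symbol string using breakpoints derived from the standard normal distribution. The sketch below assumes a 4-letter alphabet and already-normalized input; the function name is illustrative, not part of any Akumuli API:

```cpp
#include <string>
#include <vector>

// Map PAA values (assumed already z-normalized) to letters 'a'..'d'.
std::string sax_symbols(const std::vector<double>& paa_values) {
    // N(0,1) quartile breakpoints for a 4-letter alphabet.
    static const double breakpoints[] = {-0.6745, 0.0, 0.6745};
    std::string out;
    for (double v : paa_values) {
        char c = 'a';
        for (double b : breakpoints) {
            if (v > b) c++;  // climb one letter per breakpoint crossed
        }
        out.push_back(c);
    }
    return out;
}
```

The resulting strings are tiny compared to the raw series, which is what makes SAX attractive for indexing and similarity search.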
NBTreeExtentsList::close fails if there were no previous writes to or reads from the tree and the tree is empty.
A test is needed for this case.
If you only install libapr1, then the cmake step fails. Need to also do:
sudo apt-get install libapr1-dev
I was trying to use a query like:
{
"sample": "all",
"range": {
"from": "20160102T123000.000000",
"to": "20160102T123010.000000" }
}
but I get the exception "metric not set".
Query processor should use intrinsic storage engine ability to calculate aggregates (min, max, sum, avg, etc).
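The kind of pushdown this implies can be illustrated with a small running-aggregate accumulator: if each tree node keeps one of these, a query can combine node aggregates instead of scanning raw values. The names here are illustrative, not the engine's actual types:

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

struct Aggregates {
    uint64_t count = 0;
    double   sum   = 0.0;
    double   min   = std::numeric_limits<double>::infinity();
    double   max   = -std::numeric_limits<double>::infinity();

    void add(double v) {  // maintained incrementally on every write
        count += 1;
        sum   += v;
        min    = std::min(min, v);
        max    = std::max(max, v);
    }
    // Merging two subtrees' aggregates answers coarse queries without descent.
    void merge(const Aggregates& other) {
        count += other.count;
        sum   += other.sum;
        min    = std::min(min, other.min);
        max    = std::max(max, other.max);
    }
    double avg() const { return sum / static_cast<double>(count); }
};
```

All four statistics merge associatively, which is what lets parent nodes answer min/max/sum/avg queries from child summaries alone.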
Some parameters (like "${APR_INCLUDE_DIR}" and "${APR_LIBRARY}") are passed to CMake commands in your build scripts without being enclosed in quotation marks. These places will cause build difficulties if the contents of those variables contain special characters like semicolons.
I would recommend applying the advice from a wiki article.
These subsystems should be revamped in a way that allows new contributors to easily extend/augment the system:
Tutorials on the subject should be created.
Basic query processing without QueryProcessor and all its dependencies. Simple scans, where, order-by and group-by should work.
/code/Akumuli/tests/sequencer_test/main.cpp:10:29: fatal error: google/profiler.h: No such file or directory
#include <google/profiler.h>
^
compilation terminated.
Please update the requirements to list the steps required to obtain this header.
As described in this TODO entry: https://github.com/akumuli/Akumuli/blob/storage_engine/libakumuli/storage2.cpp#L347
ReshapeRequest should be decoupled from column-store and used by both IQueryProcessor and ColumnStore. QueryProcessor should contain only details about post-processing steps (PAA, SAX, sliding windows moving average, etc) and reshape request is responsible for data extraction and transformation (search, reorder). ReshapeRequest should be parsed from query directly and not by the query processor builder class.
The memory-resident superblock can't be used by the aggregate method if the superblock was restored after a crash. Superblock initialization doesn't fill all the SubtreeRef fields used by the aggregate method. It also looks like a simple update doesn't maintain this invariant either.
It should be possible to create or delete a database by running akumulid with some set of parameters.
Things to do: review the parallel test, as it may be broken; backward cursors aren't exercised by the functional tests at all; ...
subj
libakumuli shouldn't define its own metadata format and shouldn't rely on this format. It should provide an interface to load and manipulate volumes instead.
When pushing data using the following bash script, I often get SIGSEGVs:
#!/bin/bash
for idx in {0..1000}
do
date -u ++%Y%m%dT%H%M%S.%N | awk '{print "+testi tag1=A tag2=C tag3=I\r\n"$0"\r\n+24.3"}' | nc -C localhost 8282
done
Sometimes the SIGSEGV happens on the first run, sometimes after the second run. I always do a --delete and --create before testing. ~/.akumulid is unchanged except that the storage dir was moved to another (local) location.
gdb:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffbe866e700 (LWP 31504)]
0x0000000000489f95 in std::__atomic_base<unsigned long>::fetch_add (this=0x7ffbffffffff, __i=1, __m=std::memory_order_seq_cst)
at /usr/include/c++/4.8.2/bits/atomic_base.h:614
614 { return __atomic_fetch_add(&_M_i, __i, __m); }
(gdb) bt
#0 0x0000000000489f95 in std::__atomic_base<unsigned long>::fetch_add (this=0x7ffbffffffff, __i=1, __m=std::memory_order_seq_cst)
at /usr/include/c++/4.8.2/bits/atomic_base.h:614
#1 0x0000000000488485 in std::__atomic_base<unsigned long>::operator++ (this=0x7ffbffffffff) at /usr/include/c++/4.8.2/bits/atomic_base.h:388
#2 0x00000000004839c0 in Akumuli::IngestionPipeline::__lambda0::operator() (__closure=0x7dcc50) at /home/zeuz/git/Akumuli/akumulid/ingestion_pipeline.cpp:221
#3 0x00000000004852be in std::_Bind_simple<Akumuli::IngestionPipeline::start()::__lambda0()>::_M_invoke<>(std::_Index_tuple<>) (this=0x7dcc50)
at /usr/include/c++/4.8.2/functional:1732
#4 0x0000000000485215 in std::_Bind_simple<Akumuli::IngestionPipeline::start()::__lambda0()>::operator()(void) (this=0x7dcc50)
at /usr/include/c++/4.8.2/functional:1720
#5 0x00000000004851ae in std::thread::_Impl<std::_Bind_simple<Akumuli::IngestionPipeline::start()::__lambda0()> >::_M_run(void) (this=0x7dcc38)
at /usr/include/c++/4.8.2/thread:115
#6 0x00007ffff5809220 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>) at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
#7 0x00007ffff4c66dc5 in start_thread (arg=0x7ffbe866e700) at pthread_create.c:308
#8 0x00007ffff4f7121d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
in akumulid/ingestion_pipeline.cpp:221
void IngestionPipeline::start() {
[...]
if (qref->pop(val)) {
idle_count = 0;
// New write
if (AKU_UNLIKELY(val->cnt == nullptr)) { //poisoned
[...]
} else {
auto error = self->con_->write(val->sample);
HERE----> (*val->cnt)++;
if (AKU_UNLIKELY(error != AKU_SUCCESS)) {
(*val->on_error)(error, *val->cnt);
}
}
It seems the val object becomes invalid. It's not always nullptr; sometimes it's 0x7ffbffffffff, etc.
I tried initializing it to nullptr (in PipelineSpout::PipelineSpout), but this seems not to catch the issue.
(Line numbers might be a little bit off as I added more logging)
subj
I was trying the 'limit' and 'offset' features (master branch as of January 16):
curl http://localhost:8181 --data '{ "output": { "format": "csv" }, "select": "names", "limit": 1}'
Results in:
[zeuz@test-vm akumulid]$ ./akumulid
OK TCP server started, port: 8282
OK UDP server started, port: 8383
OK HTTP server started, port: 8181
akumulid: /usr/include/boost/coroutine/v1/detail/coroutine_op.hpp:266: D& boost::coroutines::detail::coroutine_op<Signature, D, void, 1>::operator()(boost::coroutines::detail::coroutine_op<Signature, D, void, 1>::arg_type) [with Signature = void(Akumuli::InternalCursor*); D = boost::coroutines::coroutine<void(Akumuli::InternalCursor*)>; boost::coroutines::detail::coroutine_op<Signature, D, void, 1>::arg_type = Akumuli::InternalCursor*]: Assertion `! static_cast< D * >( this)->impl_->is_complete()' failed.
Backtrace:
(gdb) bt
#0 0x00007ffff4e975f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff4e98ce8 in __GI_abort () at abort.c:90
#2 0x00007ffff4e90566 in __assert_fail_base (fmt=0x7ffff4fe0228 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x7ffff7b37570 "! static_cast< D * >( this)->impl_->is_complete()",
file=file@entry=0x7ffff7b37510 "/usr/include/boost/coroutine/v1/detail/coroutine_op.hpp", line=line@entry=266,
function=function@entry=0x7ffff7b378a0 <boost::coroutines::detail::coroutine_op<void (Akumuli::InternalCursor*), boost::coroutines::coroutine<void (Akumuli::InternalCursor*), 1>, void, 1>::operator()(Akumuli::InternalCursor*)::__PRETTY_FUNCTION__> "D& boost::coroutines::detail::coroutine_op<Signature, D, void, 1>::operator()(boost::coroutines::detail::coroutine_op<Signature, D, void, 1>::arg_type) [with Signature = void(Akumuli::InternalCursor*)"...) at assert.c:92
#3 0x00007ffff4e90612 in __GI___assert_fail (assertion=0x7ffff7b37570 "! static_cast< D * >( this)->impl_->is_complete()",
file=0x7ffff7b37510 "/usr/include/boost/coroutine/v1/detail/coroutine_op.hpp", line=266,
function=0x7ffff7b378a0 <boost::coroutines::detail::coroutine_op<void (Akumuli::InternalCursor*), boost::coroutines::coroutine<void (Akumuli::InternalCursor*), 1>, void, 1>::operator()(Akumuli::InternalCursor*)::__PRETTY_FUNCTION__> "D& boost::coroutines::detail::coroutine_op<Signature, D, void, 1>::operator()(boost::coroutines::detail::coroutine_op<Signature, D, void, 1>::arg_type) [with Signature = void(Akumuli::InternalCursor*)"...) at assert.c:101
#4 0x00007ffff7aa7722 in boost::coroutines::detail::coroutine_op<void (Akumuli::InternalCursor*), boost::coroutines::coroutine<void (Akumuli::InternalCursor*), 1>, void, 1>::operator()(Akumuli::InternalCursor*) (this=0x7ffbd00013c0, a1=0x7ffbd0000ec0) at /usr/include/boost/coroutine/v1/detail/coroutine_op.hpp:266
#5 0x00007ffff7aa72c9 in Akumuli::CoroCursor::read_ex (this=0x7ffbd0000ec0, buffer=0x7ffbd0000920, buffer_size=1000)
at /home/zeuz/git/Akumuli/libakumuli/cursor.cpp:125
#6 0x00007ffff7a75d73 in CursorImpl::read_values (this=0x7ffbd0001110, values=0x7ffbd0000920, values_size=1000) at /home/zeuz/git/Akumuli/libakumuli/akumuli.cpp:122
#7 0x00007ffff7a72a6a in aku_cursor_read (cursor=0x7ffbd0001110, dest=0x7ffbd0000920, dest_size=1000) at /home/zeuz/git/Akumuli/libakumuli/akumuli.cpp:331
#8 0x000000000047c4dd in Akumuli::AkumuliCursor::read (this=0x7ffbd0102108, dest=0x7ffbd0000920, dest_size=1000)
at /home/zeuz/git/Akumuli/akumulid/ingestion_pipeline.cpp:27
#9 0x00000000004aa3ea in Akumuli::QueryResultsPooler::read_some (this=0x7ffbd00008c0, buf=0x7ffbd800cf4a "", buf_size=8180)
at /home/zeuz/git/Akumuli/akumulid/query_results_pooler.cpp:380
#10 0x00000000004a6e3b in Akumuli::Http::MHD::read_callback (data=0x7ffbd00008c0, pos=0, buf=0x7ffbd800cf4a "", max=8180)
at /home/zeuz/git/Akumuli/akumulid/httpserver.cpp:22
#11 0x00007ffff5a4b696 in try_ready_chunked_body (connection=0x7ffbd8008d30) at connection.c:442
#12 MHD_connection_handle_idle (connection=0x7ffbd8008d30) at connection.c:2379
#13 0x00007ffff5a4d2cb in MHD_handle_connection (data=0x7ffbd8008d30) at daemon.c:800
#14 0x00007ffff4c4ddc5 in start_thread (arg=0x7ffbe15f9700) at pthread_create.c:308
#15 0x00007ffff4f5821d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Absolute paths should always be used.
It should be possible to use string literals as tag values. A string literal is a series of characters enclosed in double quotes (example: tag_name="tag value").
String literals can contain spaces but can't contain new-line or carriage-return characters (otherwise the RESP-formatted output will be a mess). Escape sequences should be used to represent certain characters.
| Escape sequence | Description |
|---|---|
| \\ | Backslash |
| \" | Double quote |
| \r | Carriage return |
| \n | New line |
| \xHEX | Arbitrary hexadecimal value |
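A decoder for this table could look roughly like the sketch below (simplified: malformed escapes fall through mostly verbatim, where a real parser would reject them):

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Decode the escape sequences \\ \" \r \n \xHH in a tag-value literal.
std::string unescape(const std::string& in) {
    std::string out;
    for (size_t i = 0; i < in.size(); i++) {
        if (in[i] != '\\' || i + 1 == in.size()) {
            out.push_back(in[i]);
            continue;
        }
        char next = in[++i];
        switch (next) {
        case '\\': out.push_back('\\'); break;
        case '"':  out.push_back('"');  break;
        case 'r':  out.push_back('\r'); break;
        case 'n':  out.push_back('\n'); break;
        case 'x':
            // Consume exactly two hex digits; otherwise drop the escape.
            if (i + 2 < in.size() &&
                isxdigit(static_cast<unsigned char>(in[i + 1])) &&
                isxdigit(static_cast<unsigned char>(in[i + 2]))) {
                char hex[3] = {in[i + 1], in[i + 2], 0};
                out.push_back(static_cast<char>(strtol(hex, nullptr, 16)));
                i += 2;
            }
            break;
        default:   out.push_back(next); break;  // unknown escape: keep char
        }
    }
    return out;
}
```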
Without this, compilation fails with:
code/Akumuli/src/storage.cpp:36:21: fatal error: apr_xml.h: No such file or directory
#include <apr_xml.h>
Maybe it would be useful to save Akumuli's own statistics in Akumuli itself? This feature could make administering Akumuli easier, because we could analyze its behavior in relation to load.
There can be a very large number of metrics, e.g. ingestion rate, size of the current sequence, search statistics, and more.
Nikolay
Tests show data corruption. Performance got worse (possibly related).
First, a timeseries will be stored by sending the following message to the TCP Server:
+mymetric
:1457951936
+5.0
Then, this record should be retrieved by sending an appropriate POST request to the HTTP Server:
{"metric":"mymetric","range":{"from":"1457951936000000000","to":"1457951936000000000"}}
The server returns an OK (status code 200), but the body of the response is empty.
Implement correlation search based on SAX and a bag-of-words model. Token counting should be based on a count-min sketch (CM-sketch), because the number of distinct tokens is usually very high (higher than in natural languages).
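For illustration, a toy count-min sketch along these lines (the width, depth, and FNV-based row hashing are arbitrary choices for the sketch, not a proposed implementation):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

class CountMinSketch {
    static const size_t WIDTH = 1024;  // counters per row
    static const size_t DEPTH = 4;     // independent rows

    std::vector<uint64_t> table_;

    // FNV-1a, seeded differently per row so the rows act as distinct hashes.
    static size_t hash(const std::string& key, size_t row) {
        uint64_t h = 1469598103934665603ULL ^ (row * 0x9E3779B97F4A7C15ULL);
        for (char c : key) {
            h ^= static_cast<unsigned char>(c);
            h *= 1099511628211ULL;
        }
        return static_cast<size_t>(h % WIDTH);
    }

public:
    CountMinSketch() : table_(WIDTH * DEPTH, 0) {}

    void add(const std::string& token) {
        for (size_t row = 0; row < DEPTH; row++) {
            table_[row * WIDTH + hash(token, row)] += 1;
        }
    }
    // The minimum over rows can over-count (collisions) but never under-count.
    uint64_t estimate(const std::string& token) const {
        uint64_t best = std::numeric_limits<uint64_t>::max();
        for (size_t row = 0; row < DEPTH; row++) {
            best = std::min(best, table_[row * WIDTH + hash(token, row)]);
        }
        return best;
    }
};
```

On top of this, each series' SAX output becomes a bag of tokens, and correlation candidates are series whose token counts overlap strongly.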