akumuli / akumuli
Time-series database
Home Page: http://akumuli.org
License: Apache License 2.0
Implement a piecewise aggregate approximation (PAA) query.
subj
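For reference, PAA reduces a window of n points to w segment means. A minimal sketch of the reduction step, not tied to the actual query-engine interfaces (the function name is an assumption):

```cpp
#include <cstddef>
#include <vector>

// Piecewise Aggregate Approximation: reduce `input` to `segments` values,
// each the mean of one (nearly) equal-width piece of the series.
std::vector<double> paa(const std::vector<double>& input, size_t segments) {
    std::vector<double> sum(segments, 0.0);
    std::vector<size_t> cnt(segments, 0);
    const size_t n = input.size();
    for (size_t i = 0; i < n; i++) {
        size_t s = i * segments / n;  // piece that point i falls into
        sum[s] += input[i];
        cnt[s] += 1;
    }
    for (size_t s = 0; s < segments; s++) {
        sum[s] /= static_cast<double>(cnt[s]);
    }
    return sum;
}
```

A query implementation would run this over each series in the selected time range, emitting one sample per segment.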
A call to aku_close_database should call the column_store->close() method, and the results should be saved to the SQLite database.
I am trying to compile on Ubuntu 14.04 with Clang 3.6.
Nope...
/usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/cstdio:120:11: error: no member named 'gets' in the global namespace
using ::gets;
New data layout.
data chunk
| all series ids + all timestamps + all values |
sorted by series id and then by timestamp
proposed layout
data chunk superblock
| series1 + series2 + ... + seriesN | ... | series1 offsets + series2... |
The data chunk layout is about the same, except that different series are explicitly divided (not merely grouped by sorting).
The superblock contains a fixed-size block for each series. For each series it holds a fixed-size array of offsets, each pointing to that series' data inside one of the data chunks. There should be one superblock for many data chunks (e.g. 32; the branching factor is not that important and shouldn't be configurable). The superblock should also store statistics: count, avg (maybe order statistics), min, max, etc.
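A rough sketch of what one per-series superblock entry could look like under this proposal (all field names and the branching factor of 32 are illustrative, not actual Akumuli structures):

```cpp
#include <cstdint>

// One fixed-size superblock slot per series. The offsets array points into
// the data chunks covered by this superblock (branching factor 32 assumed).
struct SeriesSlot {
    uint64_t series_id;
    uint64_t offsets[32];  // offsets[k]: start of this series' data in chunk k
    // Pre-aggregated statistics so coarse queries can skip the chunks:
    uint64_t count;
    double   min;
    double   max;
    double   sum;          // avg is derived from sum/count, so it is not stored

    double avg() const { return sum / static_cast<double>(count); }
};
```

Because every slot has the same size, a series' slot can be located by index without parsing the whole superblock.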
Libakumuli should provide some indexing mechanism for time-series. It could be a simple resampled time-series or something more complex (FFT or wavelet), but it must be fast enough to build in real time. Indexing should be disabled by default and enabled only for certain tags in the configuration.
This index should be useful for plotting (first priority) and summarization. It should make it possible to build a plot for a whole year without touching all the data from that year. Another use for this index is similarity search (preferably DTW-based).
Hello,
I'm new to Akumuli.
Is there a way to use Akumuli with Python Pandas?
Kind regards
HTTP-based query support. Query parameters should be passed through the URL or as POST data in JSON format; query results should be returned in RESP format using streaming HTTP.
Akumuli should start without the --db parameter and without opening a database. It should be possible to create a new database using the query language, delete a database, and open an existing one (from the list in ~/.akumuli).
If a client cursor reads from the last page while a page overflow occurs, the last page is reallocated and an access violation occurs. A different reallocation mechanism is needed. Pages mustn't be reclaimed until every client finishes reading (space utilization wouldn't be constant anymore).
subj.
In this case every write is a future write, and a checkpoint is triggered every time.
Would you like to replace more defines of constant values with enumerations, to stress their relationships?
Metadata should be stored inside an SQLite database.
The metadata list should include the existing parameters:
and some new parameters:
First of all, allow me to congratulate you on your efforts. This is very much appreciated and a much-needed requirement for the big-data analytics industry today.
I have been looking for a good high-performance time-series database. The best options today are OpenTSDB, InfluxDB, or Elasticsearch.
The problem with them: both ES and OpenTSDB are monstrously huge (on resources), requiring you to pull up Java virtual machines and whatnot for their operation. InfluxDB is good on performance and low on resources (no dependencies), but nowhere near production grade.
This is where this project could make an impact. Low on resources, high on performance and production grade is what is needed for big-data clusters to perform efficiently.
I am not sure how serious you are about this project (maybe it is just a hobby project for you), but if you are indeed committed to making it production grade, I would like to bring a few points to your consideration to put things in perspective (on what the industry needs).
I see from your readme.md a write performance of 1 million points. While that is good, consider the scenario below:
The point I am trying to make: if someone needs to import the past 3 years of data just to test out a few business scenarios, they are looking at an average of 9000 million data points to import. This is where much higher write performance is critical. (If the import itself takes days, then querying and analyzing the data becomes infeasible.)
It is not uncommon to draw interactive charts for the past 3 to 5 years of data. So a performant database is expected to return at least 35 million points per second (one year's worth of data per second) to remain responsive and fast enough for users. (Even that is slow: imagine a chart taking 5 seconds to load just to show 5 years of data; not really good enough.)
This luxon here has a nice interactive chart demonstrating "30-year second-by-second sensor data, a total of 1 billion data points".
It would be great to see this project reaching that kind of performance.
Hello,
Any wrong symbol sent to the TCP port breaks the server:
server log:
2016-03-05 22:26:55,651 protocol-parser [INFO] /Akumuli/akumulid/logger.cpp(31) Starting protocol parser
2016-03-05 22:26:59,307 tcp-session [INFO] /Akumuli/akumulid/logger.cpp(31) Session created
2016-03-05 22:26:59,307 protocol-parser [INFO] /Akumuli/akumulid/logger.cpp(31) Starting protocol parser
2016-03-05 22:26:59,307 tcp-session [INFO] /Akumuli/akumulid/logger.cpp(31) Creating error handler for session
2016-03-05 22:27:00,445 tcp-session [ERROR] /Akumuli/akumulid/logger.cpp(54) /Akumuli/akumulid/protocolparser.cpp(66): Throw in function void Akumuli::ProtocolParser::worker(Akumuli::Caller&)
Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<Akumuli::ProtocolParserError> >
std::exception::what: unexpected parameter id format - \r\n
2016-03-05 22:27:00,446 tcp-session [ERROR] /Akumuli/akumulid/logger.cpp(54) End of file
akumulid: /usr/include/boost/coroutine/v1/detail/coroutine_op.hpp:44: D& boost::coroutines::detail::coroutine_op<Signature, D, void, 0>::operator()() [with Signature = void(); D = boost::coroutines::coroutine<void()>]: Assertion `! static_cast< D * >( this)->impl_->is_complete()' failed.
/root/akumuli.sh: line 33: 12 Aborted ${AKUMULID}
ERROR: Can't run akumuli
request (just pressed "enter"):
$ telnet 192.168.99.100 8282
Trying 192.168.99.100...
Connected to 192.168.99.100.
Escape character is '^]'.
-ERR /Akumuli/akumulid/protocolparser.cpp(66): Throw in function void Akumuli::ProtocolParser::worker(Akumuli::Caller&)
Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<Akumuli::ProtocolParserError> >
std::exception::what: unexpected parameter id format - \r\n
Connection closed by foreign host.
Hello,
I'm getting the following compilation error on Ubuntu 15.10. Do you have any thoughts about what's wrong? I've tried master, v0.3.0 and v0.2.0, but they all end up the same. So I guess I'm missing some library/header files?
stybla@akumuli:~/Akumuli$ cmake .
-- Boost version: 1.58.0
-- Found the following Boost libraries:
-- unit_test_framework
-- program_options
-- system
-- thread
-- filesystem
-- coroutine
-- context
-- regex
-- date_time
-- Found log4cxx: /usr/lib/x86_64-linux-gnu/liblog4cxx.so
-- Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
-- Found Sqlite3: /usr/lib/x86_64-linux-gnu/libsqlite3.so
-- Found APR headers: /usr/include/apr-1.0
-- Found APR library: /usr/lib/x86_64-linux-gnu/libapr-1.so
-- Found APRUTIL headers: /usr/include/apr-1.0
-- Found APRUTIL library: /usr/lib/x86_64-linux-gnu/libaprutil-1.so
-- Configuring done
-- Generating done
-- Build files have been written to: /home/stybla/Akumuli
stybla@akumuli:~/Akumuli$ make
[ 1%] Building CXX object libakumuli/CMakeFiles/akumuli.dir/storage.cpp.o
In file included from /home/stybla/Akumuli/libakumuli/page.h:30:0,
from /home/stybla/Akumuli/libakumuli/storage.h:38,
from /home/stybla/Akumuli/libakumuli/storage.cpp:19:
/home/stybla/Akumuli/libakumuli/internal_cursor.h:36:29: error: ‘caller_type’ in ‘Akumuli::Coroutine {aka struct boost::coroutines::coroutine<void(Akumuli::InternalCursor*)>}’ does not name a type
typedef typename Coroutine::caller_type Caller;
^
libakumuli/CMakeFiles/akumuli.dir/build.make:77: recipe for target 'libakumuli/CMakeFiles/akumuli.dir/storage.cpp.o' failed
make[2]: *** [libakumuli/CMakeFiles/akumuli.dir/storage.cpp.o] Error 1
CMakeFiles/Makefile2:75: recipe for target 'libakumuli/CMakeFiles/akumuli.dir/all' failed
make[1]: *** [libakumuli/CMakeFiles/akumuli.dir/all] Error 2
Makefile:146: recipe for target 'all' failed
make: *** [all] Error 2
Thank you.
/stats API command
Some bugs went unnoticed because the current test suite consists mostly of unit tests, and the functional tests weren't run on the CI server at all.
Akumulid should store some data in the global configuration in ~/.akumuli
Implement a "Symbolic Aggregate approXimation" (SAX) query.
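To sketch the idea: SAX maps a z-normalized, PAA-reduced window to a short symbol string using breakpoints derived from the standard normal distribution. The sketch below assumes a 4-letter alphabet and already-normalized input; the function name is illustrative, not part of any Akumuli API:

```cpp
#include <string>
#include <vector>

// Map PAA values (assumed already z-normalized) to letters 'a'..'d'.
std::string sax_symbols(const std::vector<double>& paa_values) {
    // N(0,1) quartile breakpoints for a 4-letter alphabet.
    static const double breakpoints[] = {-0.6745, 0.0, 0.6745};
    std::string out;
    for (double v : paa_values) {
        char c = 'a';
        for (double b : breakpoints) {
            if (v > b) c++;  // climb one letter per breakpoint crossed
        }
        out.push_back(c);
    }
    return out;
}
```

The resulting strings are tiny compared to the raw series, which is what makes SAX attractive for indexing and similarity search.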
NBTreeExtentsList::close fails if there were no previous writes to or reads from the tree and the tree is empty.
A test is needed for this case.
If you only install libapr1, then the cmake step fails. Need to also do:
sudo apt-get install libapr1-dev
I was trying to use a query like:
{
"sample": "all",
"range": {
"from": "20160102T123000.000000",
"to": "20160102T123010.000000" }
}
but I get the exception "metric not set".
Query processor should use intrinsic storage engine ability to calculate aggregates (min, max, sum, avg, etc).
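The kind of pushdown this implies can be illustrated with a small running-aggregate accumulator: if each tree node keeps one of these, a query can combine node aggregates instead of scanning raw values. The names here are illustrative, not the engine's actual types:

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

struct Aggregates {
    uint64_t count = 0;
    double   sum   = 0.0;
    double   min   = std::numeric_limits<double>::infinity();
    double   max   = -std::numeric_limits<double>::infinity();

    void add(double v) {  // maintained incrementally on every write
        count += 1;
        sum   += v;
        min    = std::min(min, v);
        max    = std::max(max, v);
    }
    // Merging two subtrees' aggregates answers coarse queries without descent.
    void merge(const Aggregates& other) {
        count += other.count;
        sum   += other.sum;
        min    = std::min(min, other.min);
        max    = std::max(max, other.max);
    }
    double avg() const { return sum / static_cast<double>(count); }
};
```

All four statistics merge associatively, which is what lets parent nodes answer min/max/sum/avg queries from child summaries alone.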
Some parameters (like "${APR_INCLUDE_DIR}" and "${APR_LIBRARY}") are passed to CMake commands in your build scripts without being enclosed in quotation marks. These places will cause build difficulties if the contents of those variables contain special characters like semicolons.
I would recommend applying the advice from a wiki article.
These subsystems should be revamped in a way that allows new contributors to easily extend/augment the system:
Tutorials on the subject should be created.
Basic query processing without QueryProcessor and all its dependencies. Simple scans, where, order-by and group-by should work.
/code/Akumuli/tests/sequencer_test/main.cpp:10:29: fatal error: google/profiler.h: No such file or directory
#include <google/profiler.h>
^
compilation terminated.
Please update the requirements to list the steps required to obtain this header.
As described in this TODO entry: https://github.com/akumuli/Akumuli/blob/storage_engine/libakumuli/storage2.cpp#L347
ReshapeRequest should be decoupled from column-store and used by both IQueryProcessor and ColumnStore. QueryProcessor should contain only details about post-processing steps (PAA, SAX, sliding windows moving average, etc) and reshape request is responsible for data extraction and transformation (search, reorder). ReshapeRequest should be parsed from query directly and not by the query processor builder class.
The memory-resident superblock can't be used by the aggregate method if the superblock was restored after a crash. Superblock initialization doesn't fill all the SubtreeRef fields used by the aggregate method. It also looks like a simple update doesn't maintain this invariant either.
It should be possible to create or delete a database by running akumulid with some set of parameters.
Things to do: review the parallel test, as it may be broken; backward cursors aren't exercised by the functional tests at all; ...
subj
libakumuli shouldn't define its own metadata format and shouldn't rely on this format. It should provide an interface to load and manipulate volumes instead.
When pushing data using the following bash script, I often get SIGSEGVs:
#!/bin/bash
for idx in {0..1000}
do
date -u ++%Y%m%dT%H%M%S.%N | awk '{print "+testi tag1=A tag2=C tag3=I\r\n"$0"\r\n+24.3"}' | nc -C localhost 8282
done
Sometimes the SIGSEGV happens on the first run, sometimes after the second run. I always do a --delete and --create before testing. ~/.akumulid is unchanged except that the storage dir was moved to another (local) location.
gdb:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffbe866e700 (LWP 31504)]
0x0000000000489f95 in std::__atomic_base<unsigned long>::fetch_add (this=0x7ffbffffffff, __i=1, __m=std::memory_order_seq_cst)
at /usr/include/c++/4.8.2/bits/atomic_base.h:614
614 { return __atomic_fetch_add(&_M_i, __i, __m); }
(gdb) bt
#0 0x0000000000489f95 in std::__atomic_base<unsigned long>::fetch_add (this=0x7ffbffffffff, __i=1, __m=std::memory_order_seq_cst)
at /usr/include/c++/4.8.2/bits/atomic_base.h:614
#1 0x0000000000488485 in std::__atomic_base<unsigned long>::operator++ (this=0x7ffbffffffff) at /usr/include/c++/4.8.2/bits/atomic_base.h:388
#2 0x00000000004839c0 in Akumuli::IngestionPipeline::__lambda0::operator() (__closure=0x7dcc50) at /home/zeuz/git/Akumuli/akumulid/ingestion_pipeline.cpp:221
#3 0x00000000004852be in std::_Bind_simple<Akumuli::IngestionPipeline::start()::__lambda0()>::_M_invoke<>(std::_Index_tuple<>) (this=0x7dcc50)
at /usr/include/c++/4.8.2/functional:1732
#4 0x0000000000485215 in std::_Bind_simple<Akumuli::IngestionPipeline::start()::__lambda0()>::operator()(void) (this=0x7dcc50)
at /usr/include/c++/4.8.2/functional:1720
#5 0x00000000004851ae in std::thread::_Impl<std::_Bind_simple<Akumuli::IngestionPipeline::start()::__lambda0()> >::_M_run(void) (this=0x7dcc38)
at /usr/include/c++/4.8.2/thread:115
#6 0x00007ffff5809220 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>) at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
#7 0x00007ffff4c66dc5 in start_thread (arg=0x7ffbe866e700) at pthread_create.c:308
#8 0x00007ffff4f7121d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
in akumulid/ingestion_pipeline.cpp:221
void IngestionPipeline::start() {
[...]
if (qref->pop(val)) {
idle_count = 0;
// New write
if (AKU_UNLIKELY(val->cnt == nullptr)) { //poisoned
[...]
} else {
auto error = self->con_->write(val->sample);
HERE----> (*val->cnt)++;
if (AKU_UNLIKELY(error != AKU_SUCCESS)) {
(*val->on_error)(error, *val->cnt);
}
}
It seems the val object becomes invalid. It's not always nullptr; sometimes it's 0x7ffbffffffff, etc.
I tried initializing it to nullptr (in PipelineSpout::PipelineSpout), but this seems not to catch the issue.
(Line numbers might be a little bit off as I added more logging)
subj
I was trying the 'limit' and 'offset' features (master branch as of January 16):
curl http://localhost:8181 --data '{ "output": { "format": "csv" }, "select": "names", "limit": 1}'
Results in:
[zeuz@test-vm akumulid]$ ./akumulid
OK TCP server started, port: 8282
OK UDP server started, port: 8383
OK HTTP server started, port: 8181
akumulid: /usr/include/boost/coroutine/v1/detail/coroutine_op.hpp:266: D& boost::coroutines::detail::coroutine_op<Signature, D, void, 1>::operator()(boost::coroutines::detail::coroutine_op<Signature, D, void, 1>::arg_type) [with Signature = void(Akumuli::InternalCursor*); D = boost::coroutines::coroutine<void(Akumuli::InternalCursor*)>; boost::coroutines::detail::coroutine_op<Signature, D, void, 1>::arg_type = Akumuli::InternalCursor*]: Assertion `! static_cast< D * >( this)->impl_->is_complete()' failed.
Backtrace:
(gdb) bt
#0 0x00007ffff4e975f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff4e98ce8 in __GI_abort () at abort.c:90
#2 0x00007ffff4e90566 in __assert_fail_base (fmt=0x7ffff4fe0228 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x7ffff7b37570 "! static_cast< D * >( this)->impl_->is_complete()",
file=file@entry=0x7ffff7b37510 "/usr/include/boost/coroutine/v1/detail/coroutine_op.hpp", line=line@entry=266,
function=function@entry=0x7ffff7b378a0 <boost::coroutines::detail::coroutine_op<void (Akumuli::InternalCursor*), boost::coroutines::coroutine<void (Akumuli::InternalCursor*), 1>, void, 1>::operator()(Akumuli::InternalCursor*)::__PRETTY_FUNCTION__> "D& boost::coroutines::detail::coroutine_op<Signature, D, void, 1>::operator()(boost::coroutines::detail::coroutine_op<Signature, D, void, 1>::arg_type) [with Signature = void(Akumuli::InternalCursor*)"...) at assert.c:92
#3 0x00007ffff4e90612 in __GI___assert_fail (assertion=0x7ffff7b37570 "! static_cast< D * >( this)->impl_->is_complete()",
file=0x7ffff7b37510 "/usr/include/boost/coroutine/v1/detail/coroutine_op.hpp", line=266,
function=0x7ffff7b378a0 <boost::coroutines::detail::coroutine_op<void (Akumuli::InternalCursor*), boost::coroutines::coroutine<void (Akumuli::InternalCursor*), 1>, void, 1>::operator()(Akumuli::InternalCursor*)::__PRETTY_FUNCTION__> "D& boost::coroutines::detail::coroutine_op<Signature, D, void, 1>::operator()(boost::coroutines::detail::coroutine_op<Signature, D, void, 1>::arg_type) [with Signature = void(Akumuli::InternalCursor*)"...) at assert.c:101
#4 0x00007ffff7aa7722 in boost::coroutines::detail::coroutine_op<void (Akumuli::InternalCursor*), boost::coroutines::coroutine<void (Akumuli::InternalCursor*), 1>, void, 1>::operator()(Akumuli::InternalCursor*) (this=0x7ffbd00013c0, a1=0x7ffbd0000ec0) at /usr/include/boost/coroutine/v1/detail/coroutine_op.hpp:266
#5 0x00007ffff7aa72c9 in Akumuli::CoroCursor::read_ex (this=0x7ffbd0000ec0, buffer=0x7ffbd0000920, buffer_size=1000)
at /home/zeuz/git/Akumuli/libakumuli/cursor.cpp:125
#6 0x00007ffff7a75d73 in CursorImpl::read_values (this=0x7ffbd0001110, values=0x7ffbd0000920, values_size=1000) at /home/zeuz/git/Akumuli/libakumuli/akumuli.cpp:122
#7 0x00007ffff7a72a6a in aku_cursor_read (cursor=0x7ffbd0001110, dest=0x7ffbd0000920, dest_size=1000) at /home/zeuz/git/Akumuli/libakumuli/akumuli.cpp:331
#8 0x000000000047c4dd in Akumuli::AkumuliCursor::read (this=0x7ffbd0102108, dest=0x7ffbd0000920, dest_size=1000)
at /home/zeuz/git/Akumuli/akumulid/ingestion_pipeline.cpp:27
#9 0x00000000004aa3ea in Akumuli::QueryResultsPooler::read_some (this=0x7ffbd00008c0, buf=0x7ffbd800cf4a "", buf_size=8180)
at /home/zeuz/git/Akumuli/akumulid/query_results_pooler.cpp:380
#10 0x00000000004a6e3b in Akumuli::Http::MHD::read_callback (data=0x7ffbd00008c0, pos=0, buf=0x7ffbd800cf4a "", max=8180)
at /home/zeuz/git/Akumuli/akumulid/httpserver.cpp:22
#11 0x00007ffff5a4b696 in try_ready_chunked_body (connection=0x7ffbd8008d30) at connection.c:442
#12 MHD_connection_handle_idle (connection=0x7ffbd8008d30) at connection.c:2379
#13 0x00007ffff5a4d2cb in MHD_handle_connection (data=0x7ffbd8008d30) at daemon.c:800
#14 0x00007ffff4c4ddc5 in start_thread (arg=0x7ffbe15f9700) at pthread_create.c:308
#15 0x00007ffff4f5821d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Absolute paths should always be used.
It should be possible to use string literals as tag values. A string literal is a series of characters enclosed in double quotes (example: tag_name="tag value").
String literals can contain spaces but can't contain new-line or carriage-return characters (otherwise the RESP-formatted output will be a mess). Escape sequences should be used to represent certain characters.
| Escape sequence | Description |
|---|---|
| \\ | Backslash |
| \" | Double quote |
| \r | Carriage return |
| \n | New line |
| \xHEX | Arbitrary hexadecimal value |
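A decoder for this table could look roughly like the sketch below (simplified: malformed escapes fall through mostly verbatim, where a real parser would reject them):

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Decode the escape sequences \\ \" \r \n \xHH in a tag-value literal.
std::string unescape(const std::string& in) {
    std::string out;
    for (size_t i = 0; i < in.size(); i++) {
        if (in[i] != '\\' || i + 1 == in.size()) {
            out.push_back(in[i]);
            continue;
        }
        char next = in[++i];
        switch (next) {
        case '\\': out.push_back('\\'); break;
        case '"':  out.push_back('"');  break;
        case 'r':  out.push_back('\r'); break;
        case 'n':  out.push_back('\n'); break;
        case 'x':
            // Consume exactly two hex digits; otherwise drop the escape.
            if (i + 2 < in.size() &&
                isxdigit(static_cast<unsigned char>(in[i + 1])) &&
                isxdigit(static_cast<unsigned char>(in[i + 2]))) {
                char hex[3] = {in[i + 1], in[i + 2], 0};
                out.push_back(static_cast<char>(strtol(hex, nullptr, 16)));
                i += 2;
            }
            break;
        default:   out.push_back(next); break;  // unknown escape: keep char
        }
    }
    return out;
}
```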
Without this, compilation fails with:
code/Akumuli/src/storage.cpp:36:21: fatal error: apr_xml.h: No such file or directory
#include <apr_xml.h>
Maybe it would be useful to save Akumuli's own statistics in Akumuli itself? This feature could make administering Akumuli easier, because we could analyze its behavior in relation to load.
There can be a very large number of metrics, e.g. ingestion rate, size of the current sequence, search statistics, and more.
Nikolay
Tests show data corruption. Performance got worse (possibly related).
First, a timeseries will be stored by sending the following message to the TCP Server:
+mymetric
:1457951936
+5.0
Then, this record should be retrieved by sending an appropriate POST request to the HTTP Server:
{"metric":"mymetric","range":{"from":"1457951936000000000","to":"1457951936000000000"}}
The server returns an OK (status code 200), but the body of the response is empty.
Implement correlation search based on SAX and a bag-of-words model. Token counting should be based on a count-min sketch (CM-sketch), because the number of distinct tokens is usually very high (higher than in natural languages).
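For illustration, a toy count-min sketch along these lines (the width, depth, and FNV-based row hashing are arbitrary choices for the sketch, not a proposed implementation):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

class CountMinSketch {
    static const size_t WIDTH = 1024;  // counters per row
    static const size_t DEPTH = 4;     // independent rows

    std::vector<uint64_t> table_;

    // FNV-1a, seeded differently per row so the rows act as distinct hashes.
    static size_t hash(const std::string& key, size_t row) {
        uint64_t h = 1469598103934665603ULL ^ (row * 0x9E3779B97F4A7C15ULL);
        for (char c : key) {
            h ^= static_cast<unsigned char>(c);
            h *= 1099511628211ULL;
        }
        return static_cast<size_t>(h % WIDTH);
    }

public:
    CountMinSketch() : table_(WIDTH * DEPTH, 0) {}

    void add(const std::string& token) {
        for (size_t row = 0; row < DEPTH; row++) {
            table_[row * WIDTH + hash(token, row)] += 1;
        }
    }
    // The minimum over rows can over-count (collisions) but never under-count.
    uint64_t estimate(const std::string& token) const {
        uint64_t best = std::numeric_limits<uint64_t>::max();
        for (size_t row = 0; row < DEPTH; row++) {
            best = std::min(best, table_[row * WIDTH + hash(token, row)]);
        }
        return best;
    }
};
```

On top of this, each series' SAX output becomes a bag of tokens, and correlation candidates are series whose token counts overlap strongly.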