SiriDB is a highly scalable, robust and fast time series database. Built from the ground up, SiriDB uses a unique mechanism to operate without a global index and allows server resources to be added on the fly. SiriDB's unique query language includes dynamic grouping of time series for easy analysis over large numbers of time series.
One should be able to use an IPv6 address as the server_name in the configuration file.
The IPv6 address should then be wrapped in square brackets.
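For illustration, a bracketed IPv6 server_name could look like this (the section name follows the existing siridb.conf layout; address and port are placeholders):

```ini
[siridb]
# IPv6 addresses are wrapped in square brackets; address and port are examples.
server_name = [2001:db8::1]:9010
```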
List, count and select statements should return an error message when a series is queried by name and the series does not exist. This works correctly when the receiving server should have the series but fails if the series should exist in another pool.
When using a select statement with merge and a string filter, an error message is generated because string filters are not allowed on number series. The message is returned correctly most of the time, but sometimes SiriDB crashes instead.
When a shard has bytes left at the end and the number of remaining bytes is less than one header size, we do not mark the shard as corrupt. As a result we keep writing to the shard, and unless the shard is optimized before SiriDB is actually turned off, the new points are lost since we cannot read the shard past the invalid bytes. (Optimizing solves this because only valid data is written to the optimized shard.)
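As a sketch of the intended check (the header size and function name are illustrative, not SiriDB's actual ones): when the bytes after the last readable block are non-zero but smaller than one header, no valid block can follow, so the shard should be flagged as corrupt instead of being appended to.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHARD_HEADER_SIZE 20  /* illustrative header size, not SiriDB's real value */

/* file_size: total bytes in the shard file; valid_end: offset just past the
 * last block that could be read successfully. A tail shorter than one header
 * can never start a valid block, so the shard must be marked corrupt. */
static bool shard_tail_is_corrupt(uint64_t file_size, uint64_t valid_end)
{
    uint64_t rest = file_size - valid_end;
    return rest > 0 && rest < SHARD_HEADER_SIZE;
}
```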
On the Google Cloud Platform a case showed up where a request sent to another SiriDB server was received by its own process. We currently accept this request because the UUID is valid. We should however reject this request.
SiriDB will try to load databases from all directories inside the configured database path.
The first thing SiriDB does is create a lock file; only then does it find out that the directory is not a valid database path.
Since inserts are done asynchronously, it can happen that data points are missing when creating a new replica. We currently send a series to the replica and then start the asynchronous task.
One possible solution is to check whether inserts are busy and only send the next series when no insert tasks are running. Another option is to write the data for the replica on each insert iteration.
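The first option can be sketched as follows (counter and function names are hypothetical): inserts bump a counter while running, and the replica task only sends the next series when the counter reads zero.

```c
#include <stdbool.h>

/* Hypothetical gate for the replica task: each insert task increments the
 * counter when it starts and decrements it when it finishes. */
static int active_insert_tasks = 0;

static void insert_task_start(void)  { active_insert_tasks++; }
static void insert_task_finish(void) { active_insert_tasks--; }

/* The replica task polls this before sending the next series. */
static bool replica_may_send_next_series(void)
{
    return active_insert_tasks == 0;
}
```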
Creating an imap slist is not thread safe and requires an appropriate lock when the list can be accessed by multiple threads. The main thread was missing a lock, which could cause the database to crash.
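A minimal sketch of the missing lock (mutex and names are illustrative, not SiriDB's actual API): every thread that reads from or builds a list out of the imap takes the same mutex.

```c
#include <pthread.h>

/* Illustrative: one mutex protects access to the imap so the main thread and
 * worker threads cannot walk or copy it concurrently. */
static pthread_mutex_t imap_mutex = PTHREAD_MUTEX_INITIALIZER;
static int imap_n_items = 0;  /* stand-in for the real imap content */

static int imap_copy_size_locked(void)
{
    pthread_mutex_lock(&imap_mutex);
    int n = imap_n_items;  /* a real slist copy would be built here */
    pthread_mutex_unlock(&imap_mutex);
    return n;
}
```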
When connecting to another server we do not resolve DNS correctly and for some reason localhost is used as a fallback.
The correct behavior should be to first test for an IPv4 address, next an IPv6 address, then try DNS, and if all fail we should simply write a log message and free the socket. (On the next heartbeat, SiriDB will then try to connect again.)
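The first two steps of that order can be sketched with inet_pton(), which accepts only literal addresses and never falls back to a hostname lookup (the enum and function name are illustrative):

```c
#include <arpa/inet.h>

typedef enum { ADDR_IPV4, ADDR_IPV6, ADDR_DNS } addr_kind_t;

/* Classify an address string: literal IPv4 first, then literal IPv6; anything
 * else is handed to DNS resolution. On DNS failure the caller should log and
 * free the socket instead of falling back to localhost. */
static addr_kind_t classify_address(const char *s)
{
    unsigned char buf[16];
    if (inet_pton(AF_INET, s, buf) == 1)
        return ADDR_IPV4;
    if (inet_pton(AF_INET6, s, buf) == 1)
        return ADDR_IPV6;
    return ADDR_DNS;
}
```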
A series object reference counter is stored in a uint16_t, which is enough for a series object. Shards, on the other hand, can have more references, which are stored in a uint32_t. We need to map a general function to increment and decrement the references on these objects. On little-endian systems this works, but on big-endian systems a problem can occur since the bytes are stored in reverse order.
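The endian hazard comes from accessing a narrow counter through a wider pointer: on little-endian hosts the low-order bytes happen to overlap, on big-endian hosts they do not. A width-aware helper (a sketch, not SiriDB's actual API) avoids the cast entirely:

```c
#include <stddef.h>
#include <stdint.h>

/* Increment a reference counter given its real width. Reinterpreting a
 * uint16_t as a uint32_t only works by accident on little-endian systems,
 * so the width is dispatched explicitly instead. */
static void ref_incr(void *counter, size_t width)
{
    if (width == sizeof(uint16_t))
        ++*(uint16_t *)counter;
    else
        ++*(uint32_t *)counter;
}
```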
We should replace the default value 'localhost' in the configuration file with a variable, for example %HOSTNAME, which will be replaced with the system's host name. Localhost is never a good choice since this is the address we send to 'other' SiriDB servers, which will then try to connect to this address.
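A sketch of the substitution (the function name is hypothetical); at startup the real host name would come from gethostname():

```c
#include <stdio.h>
#include <string.h>

/* Expand a %HOSTNAME placeholder in a configuration value into `out`.
 * If the placeholder is absent, the value is copied unchanged. */
static void expand_hostname(const char *value, const char *hostname,
                            char *out, size_t n)
{
    const char *p = strstr(value, "%HOSTNAME");
    if (p == NULL)
        snprintf(out, n, "%s", value);
    else
        snprintf(out, n, "%.*s%s%s",
                 (int)(p - value), value,      /* text before the placeholder */
                 hostname,                     /* the system host name        */
                 p + strlen("%HOSTNAME"));     /* text after the placeholder  */
}
```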
With issue #14 we made DNS requests compatible with both IPv4 and IPv6. Since we now have a property ip_support, which can be set to IPV4ONLY, IPV6ONLY or ALL, we should honor this setting in DNS requests.
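Honoring the setting can be done by mapping it onto the getaddrinfo() address-family hint, so a lookup never returns addresses the server is not configured to use (the enum names here are illustrative):

```c
#include <sys/socket.h>
#include <netdb.h>

enum ip_support { IP_SUPPORT_ALL, IP_SUPPORT_IPV4ONLY, IP_SUPPORT_IPV6ONLY };

/* Translate the ip_support setting into the value assigned to
 * hints.ai_family before calling getaddrinfo(). */
static int ip_support_to_family(enum ip_support s)
{
    switch (s)
    {
    case IP_SUPPORT_IPV4ONLY: return AF_INET;
    case IP_SUPPORT_IPV6ONLY: return AF_INET6;
    default:                  return AF_UNSPEC;  /* ALL: both families */
    }
}
```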
A double free on a socket can occur when receiving data on a socket which has already been closed by the on_data function. We should prevent on_data from closing a socket twice.
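One common way to prevent the double close (a sketch; the struct and function names are not SiriDB's actual ones) is a flag on the socket object that turns any second close attempt into a no-op:

```c
#include <stdbool.h>

/* Illustrative guard: remember that a close is already in progress so a
 * second close, e.g. from on_data after an error, does nothing. */
typedef struct
{
    bool closing;
} sock_t;

static bool sock_close_once(sock_t *sock)
{
    if (sock->closing)
        return false;        /* already being closed; skip */
    sock->closing = true;
    /* the actual close call (e.g. uv_close) would happen here */
    return true;
}
```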
When using a select query, some work must be done by uv_queue_work and therefore needs space in the thread pool. The default UV_THREADPOOL_SIZE is 4, but each database requires at least one thread.
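The pool can be enlarged through the environment before the server process starts, since libuv reads the variable once at startup (the value 8 is illustrative; size it to the number of databases plus headroom for queries):

```shell
# libuv reads UV_THREADPOOL_SIZE once at process startup; the default is 4.
export UV_THREADPOOL_SIZE=8
```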
SiriDB writes a log entry when an invalid (or too large) package is received. This log line currently does not include the source IP, which makes it hard to debug such packages.
Include the source IP so we can see who sent the illegal package.
We can list, query or count shards and their size property. SiriDB currently uses a function to read the current shard size. This is rather slow since we read the file size from the operating system or, in case the file is open, return the file size using an fseeko() call. This last call is not thread safe and can conflict with the optimize thread.
It is better to keep a lightweight size property so we can return the value much faster and in a thread-safe way.
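A sketch of the bookkeeping (struct and function names are illustrative): the writer updates a cached size whenever bytes are appended, so listing the size never touches the file and involves no syscall.

```c
#include <stdint.h>

typedef struct
{
    uint64_t size;  /* cached shard file size in bytes */
} shard_t;

/* Called by the writer after appending n bytes. Reading the size property is
 * then just a plain read of shard->size, with no fseeko() and no conflict
 * with the optimize thread. */
static void shard_account_write(shard_t *shard, uint64_t n)
{
    shard->size += n;
}
```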
Some objects like series, shards and promises (maybe servers and groups?) are incremented and decremented a lot, so their reference counting can be written as macros. We still need a normal function available since we sometimes need to pass the decref function as a callback.
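The pattern can look like this (a sketch with hypothetical names, not SiriDB's actual macros): hot paths expand the macro inline, while a thin function wrapper keeps a real address that can be passed as a callback.

```c
#include <stdint.h>

typedef struct { uint32_t ref; } object_t;

static int freed_count = 0;
static void object_free(object_t *obj) { (void)obj; freed_count++; }

/* Macros for the hot paths: no function-call overhead. */
#define object_incref(o) ((o)->ref++)
#define object_decref(o) do { if (!--(o)->ref) object_free(o); } while (0)

/* A real function stays available so the decref can still be passed as a
 * callback pointer, e.g. to a list-destroy routine. */
static void object_decref_cb(void *obj)
{
    object_decref((object_t *)obj);
}
```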
When we drop a series, we can reclaim the allocated buffer space. This is faster compared to extending the buffer and saves space on a running SiriDB instance.
Instead of solving this bug, it is better to change the configuration option from listen_client to listen_client_port and just accept a port number.
Additionally, we can change listen_server to server_name and make clear in the description that we listen on any address (0.0.0.0). We still need the address because this is the address and port which other servers use to connect to this server.
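The renamed options could then look like this (the section name follows the existing siridb.conf layout; host name and port numbers are examples):

```ini
[siridb]
# Client connections: only a port number is needed; we listen on any
# address (0.0.0.0).
listen_client_port = 9000
# Address and port that other SiriDB servers use to connect to this server.
server_name = db01.example.org:9010
```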
When using median, median_high or median_low on an increasing data set, say for example the values 1, 2, 3, 4, 5 and so on, the current algorithm hits its worst-case scenario.
Since such series are quite common (for example counter series) we should improve the algorithm for these types of series.
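One possible improvement, shown here as a sketch rather than the actual SiriDB implementation: a quickselect that pivots on the middle element instead of the first, so already-sorted counter-like input no longer degrades to quadratic time.

```c
#include <stddef.h>

static void swap_d(double *a, double *b) { double t = *a; *a = *b; *b = t; }

/* Return the k-th smallest of v[0..n-1] (0-based), partially reordering v.
 * Picking the pivot from the middle of the range keeps sorted input, the
 * common counter-series case, away from the O(n^2) worst case. */
static double quickselect(double *v, size_t n, size_t k)
{
    size_t lo = 0, hi = n - 1;
    while (lo < hi)
    {
        double pivot = v[lo + (hi - lo) / 2];
        size_t i = lo, j = hi;
        while (i <= j)
        {
            while (v[i] < pivot) i++;
            while (v[j] > pivot) j--;
            if (i <= j)
            {
                swap_d(&v[i], &v[j]);
                i++;
                if (j) j--;  /* guard against size_t underflow */
            }
        }
        if (k <= j)      hi = j;
        else if (k >= i) lo = i;
        else             return v[k];
    }
    return v[lo];
}
```

For an odd-length series, median and median_low both reduce to `quickselect(v, n, n / 2)`.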