questdb / questdb

An open source time-series database for fast ingest and SQL queries

Home Page: https://questdb.io

License: Apache License 2.0

Languages: Java 91.34%, HTML 0.01%, CMake 0.08%, C 1.91%, JavaScript 0.01%, Shell 0.04%, C++ 5.73%, Rust 0.01%, Assembly 0.87%, Dockerfile 0.01%, Makefile 0.01%
Topics: time-series, low-latency, database, sql, postgres, grafana, simd, questdb, tsdb, java

questdb's Issues

Is NFSdb appropriate for JSON metrics data and time series?

Sorry if this is not the appropriate place to discuss.

I'm working on a small, reactive metrics collector server which will collect (JSON-formed) data objects and object arrays, transform/reduce them in memory (i.e. calculate average distribution, projections, ...) and offer them through a reactive interface. On server restart I have a playback approach in mind.

I wonder if NFSdb would be a good fit for this purpose. I like the simplicity of an embedded, file-based POJO store in NFSdb. In particular, I did not find good answers to the following questions in the documentation:

  1. Is NFSdb well suited for JSON-formed data?
  2. How would I limit/delete old data, i.e. similar to a rolling logfile appender? I saw the partitioning feature.
  3. What are the features for "time series queries" and "temporal data" of NFSdb mentioned in the wiki? The documentation left me wondering about the query capabilities in general.

Export tool

Create a utility to export nfsdb data to a delimited format, consumable by third-party import tools.

This is a more formal to-do item following the conversation under issue #11.

Problems in Windows 10

Hi, I would like to explore the possible use of this library in our programs.
I've run into a problem when running the provided examples.

Exception in thread "main" java.lang.UnsatisfiedLinkError: C:\Users\Usuario\AppData\Local\Temp\libquestdb3846739714947417735.dll: %1 no es una aplicación Win32 válida ("%1 is not a valid Win32 application")

Both the OS and the JDK are 64-bit, so I reckon that is the origin of the error. Could you provide us with a 64-bit version of the DLL?

Thanks in advance,
Juan

$long index

Hi Vlad,

Once again, thanks for your work on NFSdb - I am looking forward to the next release.

Quick question: are you planning to add support for a 64-bit integer index? The current version (2.1.0) supports only $int and $str. I currently use $int, but I am afraid that the 32-bit integer range won't be enough for me in the near future, so I was thinking of using $str instead. I've tested the performance of $int vs $str, using an object with an int/string key plus 80 bytes of raw data. These are the results of my testing:

Int key:
5,000,000 items added in 315ms
Lookup time is 37ms

String key:
5,000,000 items added in 616ms
Lookup time is 38ms

The lookup time is almost the same, but the append time is approximately two times slower, which is why I decided not to use $str.

What do you think?

Thanks,
Jaromir

Asynchronous append()

I want the ability to:

  • append data to the journal asynchronously
  • batch data automatically based on a timeout/batch size (see the sketch below)
  • receive a notification when a batch is completed (and processed by the remote copy)
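
A rough sketch of what such a batching appender could look like, as a plain Java wrapper around any journal writer; all names here are hypothetical and only illustrate the timeout/batch-size/notification mechanics:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical async appender: buffers items and flushes a batch either
// when it reaches batchSize or when timeoutMs elapses with pending items.
public class AsyncAppender<T> implements AutoCloseable {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final int batchSize;
    private final long timeoutMs;
    private final Consumer<List<T>> flush;           // writes a batch to the journal
    private final Consumer<List<T>> onBatchComplete; // completion notification
    private volatile boolean running = true;

    public AsyncAppender(int batchSize, long timeoutMs,
                         Consumer<List<T>> flush, Consumer<List<T>> onBatchComplete) {
        this.batchSize = batchSize;
        this.timeoutMs = timeoutMs;
        this.flush = flush;
        this.onBatchComplete = onBatchComplete;
        worker.submit(this::drainLoop);
    }

    public void append(T item) {
        queue.offer(item); // non-blocking for the caller
    }

    private void drainLoop() {
        List<T> batch = new ArrayList<>(batchSize);
        try {
            while (running || !queue.isEmpty()) {
                T item = queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
                if (item != null) {
                    batch.add(item);
                }
                // flush when full, or on timeout with a non-empty batch
                if (batch.size() >= batchSize || (item == null && !batch.isEmpty())) {
                    flush.accept(batch);
                    onBatchComplete.accept(batch);
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) { // flush whatever remains on shutdown
                flush.accept(batch);
                onBatchComplete.accept(batch);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void close() {
        running = false;
        worker.shutdown();
    }
}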

ArrayIndexOutOfBoundsException when trying to add 2^14 Strings with index

Hello,

I've encountered the following error on version 3.0.0-20150216.031900-5: ArrayIndexOutOfBoundsException: -64, when trying to add 2^14 (16384) elements with an indexed String column. Without the index everything works fine, and $int with an index also works normally; unfortunately, only $str/$sym seem to be queryable by key/value.

Test that reproduces the error: https://gist.github.com/user16558789/d1b0781f1d5ca21bd637

On a side note: is there a way in 3.0.0 to query by int values (like "select * from table where id = 123;")? The code in https://github.com/NFSdb/nfsdb/releases/tag/2.0.1 does not work anymore, unfortunately.

Thank you!

Commit durability

Hi,

This is not really an issue per se, but I was wondering what the implications of using commit() vs commitDurable() are. I've noticed that the number of writes per second drops to a very low figure when using the durable mode. Is this because the database doesn't actually flush to disk synchronously when using plain commit()? In case of a power failure or similar, would that cause any data to be lost?
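
For what it's worth, a common compromise is to make only every Nth commit durable. A sketch, assuming commit() defers the synchronous disk flush while commitDurable() forces it, and reusing the JournalWriter append/commit calls shown in other issues here:

// Sketch: fast commits on every item, a durable (flushed-to-disk) commit
// every N iterations. On power failure, at most N items could be lost.
static <T> void appendWithPeriodicDurability(JournalWriter<T> writer,
                                             Iterable<T> items,
                                             int durableEvery) throws JournalException {
    int n = 0;
    for (T item : items) {
        writer.append(item);
        writer.commit();            // fast: may only reach the OS page cache
        if (++n % durableEvery == 0) {
            writer.commitDurable(); // slow: synchronously persisted
        }
    }
    writer.commitDurable();         // final durable commit
}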

Better query error text needed

When I query

select f0, f1, f2 from csv
where f1 = RRS.L

the error is 'Invalid column'.
It would be more helpful if it were 'Invalid column RRS.L'.
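
(As an aside, the query presumably fails because RRS.L is unquoted and is therefore parsed as a column reference; quoting the string literal, as in where f1 = 'RRS.L', would likely avoid the error, though the message should name the offending column either way.)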

Problems loading 7M real data

Hi Vlad, as I told you, I'm in the middle of testing questdb. I want to test it with real data, I mean data we use in our current business.

In our case we've got an Oracle database with 7-8 million bills. I've got a Java program that extracts the data from the Oracle DB and writes the information to two different destinations: a CSV file and a questdb storage.

I'm using a modified version of a program that I've been using to test other libraries. The program uses sql2o to gather data from the relational DB and SimpleFlatMapper to generate the CSV file with the data. There is no problem generating the CSV file with the 8M records.

But the problem comes up as I try to store the data in questdb. I get the error below. I've repeated the execution several times, and the error appears, more or less, at the same number of bills loaded.

Attached you'll find the Java programs I've written.

f.zip
model.zip

jun 27, 2016 8:24:07 PM jac.f readAndFlushAllFactura
INFORMACIÓN: .. y en el CSV [INFO: .. and into the CSV]
jun 27, 2016 8:24:07 PM jac.f readAndFlushAllFactura
INFORMACIÓN: .. y en questDB [INFO: .. and into questDB]
jun 27, 2016 8:24:19 PM jac.f readAndFlushAllFactura
INFORMACIÓN: Recuperados 850000 registros de Facturas [INFO: Retrieved 850000 invoice records]
jun 27, 2016 8:24:20 PM jac.f readAndFlushAllFactura
INFORMACIÓN: .. y en el CSV [INFO: .. and into the CSV]

A fatal error has been detected by the Java Runtime Environment:

EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x0000000002c99667, pid=12972, tid=15236

JRE version: Java(TM) SE Runtime Environment (8.0_91-b15) (build 1.8.0_91-b15)

Java VM: Java HotSpot(TM) 64-Bit Server VM (25.91-b15 mixed mode windows-amd64 compressed oops)

Problematic frame:

J 1965 C2 com.questdb.Partition.append(Ljava/lang/Object;)V (402 bytes) @ 0x0000000002c99667 [0x0000000002c99460+0x207]

Failed to write core dump. Minidumps are not enabled by default on client versions of Windows

An error report file with more information is saved as:

D:\JAC\Dropbox\des\Viesgo\questdb\hs_err_pid12972.log

If you would like to submit a bug report, please visit:

http://bugreport.java.com/bugreport/crash.jsp

Process finished with exit code 1

Unique key support

Hi Vlad,

I've recently come across your NFSdb. It looks impressive. I've been playing with it quite a lot and found one potential issue: there is probably a problem with indexing when a column contains too many distinct values. For such a scenario I tried to use a standard index with the value count hint set to a high number, and it made the db very, very slow. The db seems to get exponentially slower as the number of distinct items in the indexed column grows. To reproduce the issue, just set up the index like this and insert 1M items with distinct unique ids:

$sym("uniqueId").index().size(15).valueCountHint(1000_000)

It might be a good idea to have something like a UNIQUE KEY index/constraint which creates a special index backed by a hashtable. This would allow the user to find exactly one item based on a unique key.
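
To illustrate the idea in plain Java (this is only a sketch of the proposed constraint, not NFSdb code):

import java.util.HashMap;
import java.util.Map;

// A hashtable-backed unique index: O(1) lookup from a unique id to a row
// number, maintained alongside appends, rejecting duplicates.
public class UniqueKeyIndex {
    private final Map<String, Long> keyToRow = new HashMap<>();

    public void onAppend(String uniqueId, long rowId) {
        if (keyToRow.putIfAbsent(uniqueId, rowId) != null) {
            throw new IllegalStateException("duplicate unique key: " + uniqueId);
        }
    }

    public long lookup(String uniqueId) {
        Long row = keyToRow.get(uniqueId);
        return row == null ? -1 : row; // -1: not found
    }
}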

Or am I missing something? Please let me know your thoughts.

Thanks a lot,
Jaromir

Messaging Patterns

Is it possible to have a messaging-pattern abstraction implemented on top of NFSdb?

Lazy / Cached Spill Over to Disk

When reliability can be traded for speed, it might be good to have an option for a delayed spill-over to disk, using in-memory caching to achieve more speed.

For the example given, I believe about 15M to 30M objects per second is achievable, if not more.

Monitoring

Monitor resource usage, mainly memory, open files and CPU usage. Another metric of interest is the number of simultaneously engaged threads; this could help with CPU capacity planning.

Query audit

Audit of queries. This should include the query text, execution time and possibly the plan at the time (plan generation needs a review).

Queries could also be inlined to make the log easier to parse, or possibly terminated with a special character?

Query language

Query language to support:

  • field selection
  • calculated values
  • aggregation
  • ordering
  • time range filtering
  • predicate filtering
  • joins on composite keys (inner/outer)
  • time window joins

Order by asc/desc

Ordering must allow ascending and descending order on each column individually.

Ignore unsupported field types

As a developer I want to store instances of objects with fields not supported by NFSdb (such as List, Map, etc.) without NFSdb moaning about them.

It would be useful to store the application domain model without needing to copy it into another model.
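
A sketch of what the field scan could do instead of failing, in plain Java (names are illustrative, not the actual NFSdb internals):

import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Collect only fields whose types the store supports; silently skip the
// rest (List, Map, ...) instead of throwing a configuration error.
public class FieldScanner {
    public static List<Field> supportedFields(Class<?> type, Set<Class<?>> supported) {
        List<Field> result = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && supported.contains(f.getType())) {
                result.add(f);
            }
        }
        return result;
    }
}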

Benchmarks

Would it be possible to have benchmarks against different on-disk and in-memory products, e.g. Chronicle, MapDB, H2, etc.?

Investigate memory leak

The process shows very high memory usage when executing sorting and grouping queries. This needs to be investigated urgently.

Generic data access

Ability to read and write data without mapping onto a Java class.

Advice on building a fail-over solution on top of NFSdb (with two+ nodes)

Hello:
I've been testing the replication options available in NFSdb.

  • I like the benchmark numbers I see on NFSdb
  • I like the fact that I can set up a client to replicate the data onto one or many nodes

In order to use NFSdb in a production environment, fail-over and data redundancy become a question.

Are there any plans to implement fail-over, or is there sample code/pseudo-code that can be shared to help me build a fail-over solution on top of NFSdb?

Scenario:

  • {Node1} has a server running NFSdb
  • {Node2} is running a client replicating the database
  • Node1 goes down; I need Node2 to switch from client mode to server mode.
  • More data is appended to Node2, and at some later point in time Node1 is started
  • Node1 needs to be aware that there is already a master running on Node2 and join it as a client (instead of as a server)

Is this possible using the (multicast) foundation in NFSdb?

Any advice on how this can be achieved?

Thank you and looking forward to your feedback.

Venkatt

Splitting File

A way to automatically split files based on:

  • datetime
  • item count
  • some other mechanism

may sometimes be helpful.

SSL support

Both JournalServer and JournalClient need to support:

  • SSL encryption
  • SSL authentication

Implement SSL over ByteChannel.

Multicast does not work on Mac OS Yosemite

com.nfsdb.journal.exceptions.JournalNetworkException: Cannot open multicast socket
    at com.nfsdb.journal.net.config.NetworkConfig.openMulticastSocket(NetworkConfig.java:160)
    at com.nfsdb.journal.net.JournalServerAddressMulticast.start(JournalServerAddressMulticast.java:77)
    at com.nfsdb.journal.net.JournalServer.start(JournalServer.java:122)
    at com.nfsdb.journal.net.AuthorizationTest.beginSync(AuthorizationTest.java:187)
    at com.nfsdb.journal.net.AuthorizationTest.testClientAndServerSuccessfulAuth(AuthorizationTest.java:66)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at com.nfsdb.journal.test.tools.JournalTestFactory$1.evaluate(JournalTestFactory.java:51)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: java.net.SocketException: Can't assign requested address
    at java.net.PlainDatagramSocketImpl.join(Native Method)
    at java.net.AbstractPlainDatagramSocketImpl.join(AbstractPlainDatagramSocketImpl.java:179)
    at java.net.MulticastSocket.joinGroup(MulticastSocket.java:319)
    at com.nfsdb.journal.net.config.NetworkConfig.openMulticastSocket(NetworkConfig.java:153)
    ... 32 more

Network example produces different results on Linux vs Windows

I ran the SimpleReplication server/client example as outlined in this URL:
http://github.nfsdb.org/

The key difference between the example shown on the website and mine is that I declare a ServerConfig or ClientConfig with setIfName("eth0"):

[server]
JournalServer server = new JournalServer(new ServerConfig() {{
    setIfName("eth0");
}}, factory);

[client]
final JournalClient client = new JournalClient(new ClientConfig() {{
    setIfName("eth0");
}}, factory);

Expected Behavior:

  1. As soon as I declare a JournalReader on the client, the db file must be immediately synced.
  2. As the server publishes new data, it should be synchronized on the client side.

On Windows (as expected), as soon as I run the SimpleReplicationClientMain, I can see the sync messages as shown below:

C:\VG\dev\NFSDBTest\src>java SimpleReplicationClientMain C:\VG\dev\NFSDBTest\data
Jan 05, 2015 4:51:42 PM com.nfsdb.journal.net.mcast.AbstractOnDemandPoller
INFO: Polling on name:eth0 (Intel(R) 82579LM Gigabit Network Connection)
Jan 05, 2015 4:51:42 PM com.nfsdb.journal.net.config.ClientConfig
INFO: Connected to /192.168.1.12:7075
Client started
took: 1978, count=1000000
Jan 05, 2015 4:51:42 PM com.nfsdb.journal.net.StatsCollectingReadableByteChannel

INFO: received 39000097 bytes @ 153.059222 MB/s from: /192.168.1.12:7075 [1939 calls]
took: 406, count=1000000

Whereas running the same on a Linux machine, the client connects fine but there are no sync messages for a long time. Then, about 15 minutes later, the sync messages start appearing, but the count remains zero. The debug messages keep repeating for a minute, then pause for another minute, and this repeats in a cycle.

DEBUG on {LINUX} SERVER:
INFO: sent 71 bytes @ Infinity MB/s to: /192.168.1.19:43112 [4 calls]
Jan 05, 2015 7:55:51 PM com.nfsdb.journal.net.StatsCollectingWritableByteChannel

DEBUG on {LINUX} CLIENT:
took: 1420505748619, count=0
Jan 05, 2015 7:55:48 PM com.nfsdb.journal.net.StatsCollectingReadableByteChannel
INFO: received 57 bytes @ 0.001394 MB/s from: /192.168.1.19:7075 [3 calls]

*** Version Details ***
nfsdb-core-2.1.1-SNAPSHOT.jar
disruptor-3.3.0.jar

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.1 LTS
Release: 14.04
Codename: trusty

$ java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

*** Noticed the same behavior on Ubuntu running OpenJDK ***

Please advise. I can try this on a CentOS system as well in the next few days and publish my results.

Inherited fields support for POJOs

Hi guys, first of all - thanks for your hard work!

It would be great if you could support inherited class fields for POJOs.
Here is my case: in addition to raw trade data, I want to store pre-calculated 1/3/5/60-minute quotes.
Currently, JournalMetadataBuilder.parseClass uses the getDeclaredFields method to find class fields, so there is no way I can write one QuoteInterval class with all required fields/methods and inherit the 1, 3, 5 and 60-minute ones from it; I'd have to maintain an exact copy of the same logic 4 times.

I think such support would be beneficial. Though, in my particular case, if I were able to partition data by an interval field and query it by key + interval value, that would make my life easier. But as far as I understand, at this point I can only fetch a slice of data based on a from-to interval; there is no way to fetch data from a partition based on other custom criteria (like the interval value in my case, which would be an int representing the number of minutes).

As for my proposed inherited-fields request, here are the changes I made in my local NFSdb copy for now.
Sorry, I haven't used a patch utility for a while, so I'm posting my changes as text here. They're super easy anyway.

// requires: java.lang.reflect.Field, java.util.Collections, java.util.LinkedList, java.util.List

private void parseClass() throws JournalConfigurationException {

#1

    //Field[] classFields = modelClass.getDeclaredFields();
    List<Field> classFields = getAllFields(modelClass);

#2

    //this.nameToIndexMap = new TObjectIntHashMap<>(classFields.length, Constants.DEFAULT_LOAD_FACTOR, -1);
    this.nameToIndexMap = new TObjectIntHashMap<>(classFields.size(), Constants.DEFAULT_LOAD_FACTOR, -1);

#3 - recursive field-list collection

public static List<Field> getAllFields(Class<?> type) {
    return getAllFields(new LinkedList<Field>(), type);
}

private static List<Field> getAllFields(List<Field> fields, Class<?> type) {
    // add this class's declared fields, then walk up the superclass chain
    Collections.addAll(fields, type.getDeclaredFields());
    if (type.getSuperclass() != null) {
        fields = getAllFields(fields, type.getSuperclass());
    }
    return fields;
}

Event Appender

An appender to work in a Complex Event Processing environment and implement the send-and-forget pattern.

As a developer I want:

  • to log events with constant and low thread handover time
  • zero-GC logging
  • back-pressure-sensitive logging, e.g. I would prefer to lose an event rather than slow down processing

Replay mode

Sometimes you want to replay stored events, or emit them back, with the exact original time spacing or a multiple of it. This would be a good option to add.
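
A minimal sketch of such a replay loop in plain Java, with a speed multiplier (everything here is illustrative; the event type and timestamp accessor are placeholders):

import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.ToLongFunction;

// Replays events preserving their original (millisecond) time spacing,
// scaled by 'speed' (2.0 = twice as fast, 0.5 = half speed).
public class Replayer {
    public static <T> void replay(List<T> events,
                                  ToLongFunction<T> timestampOf,
                                  double speed,
                                  Consumer<T> emit) throws InterruptedException {
        long prev = -1;
        for (T event : events) {
            long ts = timestampOf.applyAsLong(event);
            if (prev >= 0) {
                long gapMs = (long) ((ts - prev) / speed); // scaled inter-event gap
                if (gapMs > 0) {
                    TimeUnit.MILLISECONDS.sleep(gapMs);
                }
            }
            emit.accept(event);
            prev = ts;
        }
    }
}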

Abstract configuration builder

As a developer I want to be able to configure the Journal factory from sources other than db.xml.

Other implementations could be database-based, code-based, or maybe even some Spring-based configuration.
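
One possible shape for this, sketched with invented names (nothing here is an existing NFSdb API):

// A small abstraction over configuration sources; db.xml becomes just one
// implementation among several.
public interface ConfigurationSource {
    JournalConfiguration load() throws JournalConfigurationException;
}

// Factory construction could then look like (hypothetical classes):
//   new JournalFactory(new XmlConfigurationSource("db.xml").load());
//   new JournalFactory(new JdbcConfigurationSource(dataSource).load());
//   new JournalFactory(new CodeConfigurationSource(builder -> { /* ... */ }).load());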

Single write optimization

Hi.
I have HA requirements similar to #29. To achieve HA, I'm going to write a persistent log into nfsdb and replicate it. I will write 99% of the time and read 1%, so my main criterion is write performance. I must not lose any data during an active-node failure, so I think the best option for me is to commit (persist + replicate) the data after every append.

I took SimplestAppend.java as a foundation.

Results:

Persisted 1000000 objects in 196ms.

I changed this test slightly (commit after every append):

public class SimplestAppend {
    public static void main(String[] args) throws JournalException {
        String basePath = "D:\\tmp";
        try (JournalFactory factory = new JournalFactory(ModelConfiguration.CONFIG.build(basePath))) {
            // delete existing price journal
            Files.delete(new File(factory.getConfiguration().getJournalBase(), Price.class.getName()));
            final int count = 1000000;

            try (JournalWriter<Price> writer = factory.writer(Price.class)) {
                long tZero = System.nanoTime();
                Price p = new Price();

                for (int i = 0; i < count; i++) {
                    p.setTimestamp(tZero + i);
                    writer.append(p);
                    writer.commit(); // note: a full commit on every single append
                }

                long end = System.nanoTime();
                System.out.println("Persisted " + count + " objects in " +
                        TimeUnit.NANOSECONDS.toMillis(end - tZero) + "ms.");
            }
        }
    }
}

Results:

0.640: [GC (Allocation Failure) [PSYoungGen: 33280K->864K(38400K)] 33280K->872K(125952K), 0.0013587 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
0.868: [GC (Allocation Failure) [PSYoungGen: 34144K->976K(38400K)] 34152K->984K(125952K), 0.0031355 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
1.118: [GC (Allocation Failure) [PSYoungGen: 34256K->1152K(38400K)] 34264K->1160K(125952K), 0.0586353 secs] [Times: user=0.03 sys=0.00, real=0.06 secs] 
1.437: [GC (Allocation Failure) [PSYoungGen: 34432K->1408K(71680K)] 34440K->1416K(159232K), 0.0045581 secs] [Times: user=0.03 sys=0.00, real=0.01 secs] 
Persisted 1000000 objects in 1294ms.

Is there any way to reduce garbage and improve performance for the single append/commit approach?

Thank you.
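
Not an official answer, but a common middle ground is to amortize the commit cost over small groups of appends, trading a bounded loss window for throughput. A sketch, as a variation on the loop above:

                // Commit every BATCH appends instead of every append; on failure,
                // at most BATCH uncommitted rows are lost.
                final int BATCH = 1000;
                for (int i = 0; i < count; i++) {
                    p.setTimestamp(tZero + i);
                    writer.append(p);
                    if ((i + 1) % BATCH == 0) {
                        writer.commit();
                    }
                }
                writer.commit(); // commit any trailing partial batch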

Request cancellation

Although network-related operations are sensitive to the client sending or receiving data, there are some that require explicit termination to free up CPU. These are typically hashing and sorting operations that can run to completion without user input.

These can check a volatile flag and keep going until the flag is set or cleared.
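
A minimal sketch of that flag check in Java:

// Cooperative cancellation: a long-running sort/hash task periodically
// checks a volatile flag that another thread sets on client disconnect.
public class CancellableTask implements Runnable {
    private volatile boolean cancelled = false;

    public void cancel() {
        cancelled = true; // called from the network thread
    }

    @Override
    public void run() {
        for (int i = 0; i < 1_000_000_000; i++) {
            if ((i & 0xFFFF) == 0 && cancelled) { // test every 65,536 iterations
                return;                           // stop early, free the CPU
            }
            // ... one unit of hashing/sorting work ...
        }
    }
}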

Zero GC formatter and parser for primitives

Create zero-GC toString formatters and String parsers for primitives:

  • int
  • long
  • double
  • float
  • date (although it isn't a primitive, a formatter and parser are still needed)

The formatter and parser are to be used in the import/export tools.
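
As an illustration of the zero-GC approach for the int case (a sketch; long and the floating-point types need more care):

// Writes the decimal representation of 'value' into a caller-supplied
// char[] at 'offset' without allocating; returns the number of chars written.
public final class ZeroGcFormat {
    public static int putInt(char[] buf, int offset, int value) {
        if (value == 0) {
            buf[offset] = '0';
            return 1;
        }
        int pos = offset;
        long v = value;      // widen so Integer.MIN_VALUE negates safely
        if (v < 0) {
            buf[pos++] = '-';
            v = -v;
        }
        int start = pos;
        while (v > 0) {      // emit digits least-significant first
            buf[pos++] = (char) ('0' + (v % 10));
            v /= 10;
        }
        for (int i = start, j = pos - 1; i < j; i++, j--) { // reverse in place
            char t = buf[i];
            buf[i] = buf[j];
            buf[j] = t;
        }
        return pos - offset;
    }
}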

Generic metadata

We have to be able to create a Journal instance without having a Class. This needs a JournalMetadata that hasn't been derived from a Class but rather assembled programmatically.
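
Conceptually, the programmatic assembly could look something like this (all builder names are invented for illustration; the real API would differ):

// Hypothetical: metadata assembled without a model class.
JournalMetadata metadata = new GenericMetadataBuilder("quote")
        .column("sym", ColumnType.SYMBOL)
        .column("price", ColumnType.DOUBLE)
        .column("timestamp", ColumnType.DATE)
        .build();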
