questdb / questdb

An open source time-series database for fast ingest and SQL queries

Home Page: https://questdb.io

License: Apache License 2.0

Languages: Java 91.34%, HTML 0.01%, CMake 0.08%, C 1.91%, JavaScript 0.01%, Shell 0.04%, C++ 5.73%, Rust 0.01%, Assembly 0.87%, Dockerfile 0.01%, Makefile 0.01%
Topics: time-series, low-latency, database, sql, postgres, grafana, simd, questdb, tsdb, java

questdb's Issues

Is NFSdb appropriate for JSON metrics data and time series?

Sorry if this is not the appropriate place to discuss.

I'm working on a small, reactive metrics collector server which will collect (JSON-formed) data objects and object arrays, transform/reduce them in memory (i.e. calculate average distribution, projections, ...) and offer them through a reactive interface. On server restart I have a playback approach in mind.

I wonder if NFSdb would be a good fit for this purpose. I like the simplicity of an embedded, file-based POJO store in NFSdb. In particular, I did not find good answers to the following questions in the documentation:

  1. Is NFSdb well suited for JSON-formed data?
  2. How would I limit/delete old data, i.e. similar to a rolling logfile appender? I saw the partitioning feature.
  3. What are the features for "time series queries" and "temporal data" of NFSdb mentioned in the wiki? The documentation left me wondering about the query capabilities in general.

Export tool

Create a utility to export nfsdb data to a delimited format, consumable by third-party import tools.

This is a more formal to-do item following the conversation under issue #11.

Problems in Windows 10

Hi, I would like to explore the possible use of this library in our programs.
I've run into a problem when running the provided examples.

Exception in thread "main" java.lang.UnsatisfiedLinkError: C:\Users\Usuario\AppData\Local\Temp\libquestdb3846739714947417735.dll: %1 no es una aplicación Win32 válida ("%1 is not a valid Win32 application")

Both the OS and the JDK are 64-bit, so I reckon that is the origin of the error. Could you provide us with a 64-bit version of the DLL?

Thanks in advance,
Juan

$long index

Hi Vlad,

Once again, thanks for your work on NFSdb - I am looking forward to the next release.

Quick question: are you planning to add support for a 64-bit integer index? The current version (2.1.0) supports only $int and $str. I currently use $int, but I am afraid that the 32-bit integer range won't be enough for me in the near future, so I was thinking of using $str instead. I've tested the performance of $int vs $str, using an object with an int/string key plus 80 bytes of raw data. These are the results of my testing:

Int key:
5,000,000 items added in 315ms
Lookup time is 37ms

String key:
5,000,000 items added in 616ms
Lookup time is 38ms

The lookup time is almost the same, but the append time is approximately two times slower, which is why I decided not to use $str.

What do you think?

Thanks,
Jaromir

Asynchronous append()

I want the ability to:

  • append data to the journal asynchronously
  • batch data automatically based on a timeout/batch size (see the sketch below)
  • receive a notification when a batch is completed (and processed by the remote copy)
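
A rough sketch of what such a batching appender could look like, as a plain Java wrapper around any journal writer; all names here are hypothetical and only illustrate the timeout/batch-size/notification mechanics:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical async appender: buffers items and flushes a batch either
// when it reaches batchSize or when timeoutMs elapses with pending items.
public class AsyncAppender<T> implements AutoCloseable {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final int batchSize;
    private final long timeoutMs;
    private final Consumer<List<T>> flush;           // writes a batch to the journal
    private final Consumer<List<T>> onBatchComplete; // completion notification
    private volatile boolean running = true;

    public AsyncAppender(int batchSize, long timeoutMs,
                         Consumer<List<T>> flush, Consumer<List<T>> onBatchComplete) {
        this.batchSize = batchSize;
        this.timeoutMs = timeoutMs;
        this.flush = flush;
        this.onBatchComplete = onBatchComplete;
        worker.submit(this::drainLoop);
    }

    public void append(T item) {
        queue.offer(item); // non-blocking for the caller
    }

    private void drainLoop() {
        List<T> batch = new ArrayList<>(batchSize);
        try {
            while (running || !queue.isEmpty()) {
                T item = queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
                if (item != null) {
                    batch.add(item);
                }
                // flush when full, or on timeout with a non-empty batch
                if (batch.size() >= batchSize || (item == null && !batch.isEmpty())) {
                    flush.accept(batch);
                    onBatchComplete.accept(batch);
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) { // flush whatever remains on shutdown
                flush.accept(batch);
                onBatchComplete.accept(batch);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void close() {
        running = false;
        worker.shutdown();
    }
}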

ArrayIndexOutOfBoundsException when trying to add 2^14 Strings with index

Hello,

I've encountered the following error on version 3.0.0-20150216.031900-5: ArrayIndexOutOfBoundsException: -64, when trying to add 2^14 (16384) elements with an indexed String column. Without the index everything works fine, and $int with an index also works normally; unfortunately, only $str/$sym seem to be queryable by key/value.

Test that reproduces the error: https://gist.github.com/user16558789/d1b0781f1d5ca21bd637

On a side note: is there a way in 3.0.0 to query by int values (like "select * from table where id = 123;")? The code in https://github.com/NFSdb/nfsdb/releases/tag/2.0.1 does not work anymore, unfortunately.

Thank you!

Commit durability

Hi,

This is not really an issue per se, but I was wondering what the implications of using commit() vs commitDurable() are. I've noticed that the number of writes per second drops to a very low figure when using the durable mode. Is this because the database doesn't actually flush to disk synchronously when using plain commit()? In case of a power failure or similar, would that cause any data to be lost?
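
For what it's worth, a common compromise is to make only every Nth commit durable. A sketch, assuming commit() defers the synchronous disk flush while commitDurable() forces it, and reusing the JournalWriter append/commit calls shown in other issues here:

// Sketch: fast commits on every item, a durable (flushed-to-disk) commit
// every N iterations. On power failure, at most N items could be lost.
static <T> void appendWithPeriodicDurability(JournalWriter<T> writer,
                                             Iterable<T> items,
                                             int durableEvery) throws JournalException {
    int n = 0;
    for (T item : items) {
        writer.append(item);
        writer.commit();            // fast: may only reach the OS page cache
        if (++n % durableEvery == 0) {
            writer.commitDurable(); // slow: synchronously persisted
        }
    }
    writer.commitDurable();         // final durable commit
}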

Better query error text needed

When I query

select f0, f1, f2 from csv
where f1 = RRS.L

the error is 'Invalid column'.
It would be more helpful if it were 'Invalid column RRS.L'.
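
(As an aside, the query presumably fails because RRS.L is unquoted and is therefore parsed as a column reference; quoting the string literal, as in where f1 = 'RRS.L', would likely avoid the error, though the message should name the offending column either way.)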

Problems loading 7M real data

Hi Vlad, as I told you, I'm in the middle of testing questdb. I want to test it with real data, I mean data we use in our current business.

In our case we've got an Oracle database with 7-8 million bills. I've got a Java program that extracts the data from the Oracle DB and writes the information to two different destinations: a CSV file and a questdb storage.

I'm using a modified version of a program that I've been using to test other libraries. The program uses sql2o to gather data from the relational DB and SimpleFlatMapper to generate the CSV file with the data. There is no problem generating the CSV file with the 8M records.

But the problem comes up as I try to store the data in questdb. I get the error below. I've repeated the execution several times, and the error appears, more or less, at the same number of bills loaded.

Attached you'll find the Java programs I've written.

f.zip
model.zip

jun 27, 2016 8:24:07 PM jac.f readAndFlushAllFactura
INFORMACIÓN: .. y en el CSV [INFO: .. and into the CSV]
jun 27, 2016 8:24:07 PM jac.f readAndFlushAllFactura
INFORMACIÓN: .. y en questDB [INFO: .. and into questDB]
jun 27, 2016 8:24:19 PM jac.f readAndFlushAllFactura
INFORMACIÓN: Recuperados 850000 registros de Facturas [INFO: Retrieved 850000 invoice records]
jun 27, 2016 8:24:20 PM jac.f readAndFlushAllFactura
INFORMACIÓN: .. y en el CSV [INFO: .. and into the CSV]

A fatal error has been detected by the Java Runtime Environment:

EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x0000000002c99667, pid=12972, tid=15236

JRE version: Java(TM) SE Runtime Environment (8.0_91-b15) (build 1.8.0_91-b15)

Java VM: Java HotSpot(TM) 64-Bit Server VM (25.91-b15 mixed mode windows-amd64 compressed oops)

Problematic frame:

J 1965 C2 com.questdb.Partition.append(Ljava/lang/Object;)V (402 bytes) @ 0x0000000002c99667 [0x0000000002c99460+0x207]

Failed to write core dump. Minidumps are not enabled by default on client versions of Windows

An error report file with more information is saved as:

D:\JAC\Dropbox\des\Viesgo\questdb\hs_err_pid12972.log

If you would like to submit a bug report, please visit:

http://bugreport.java.com/bugreport/crash.jsp

Process finished with exit code 1

Unique key support

Hi Vlad,

I've recently come across your NFSdb. It looks impressive. I've been playing with it quite a lot and found one potential issue: there is probably a problem with indexing when a column contains too many distinct values. For such a scenario I tried to use a standard index with the value count hint set to a high number, and it made the db very, very slow. The db seems to get exponentially slower as the number of distinct items in the indexed column grows. To reproduce the issue, just set up the index like this and insert 1M items with distinct unique ids:

$sym("uniqueId").index().size(15).valueCountHint(1000_000)

It might be a good idea to have something like a UNIQUE KEY index/constraint which creates a special index backed by a hashtable. This would allow the user to find exactly one item based on a unique key.
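
To illustrate the idea in plain Java (this is only a sketch of the proposed constraint, not NFSdb code):

import java.util.HashMap;
import java.util.Map;

// A hashtable-backed unique index: O(1) lookup from a unique id to a row
// number, maintained alongside appends, rejecting duplicates.
public class UniqueKeyIndex {
    private final Map<String, Long> keyToRow = new HashMap<>();

    public void onAppend(String uniqueId, long rowId) {
        if (keyToRow.putIfAbsent(uniqueId, rowId) != null) {
            throw new IllegalStateException("duplicate unique key: " + uniqueId);
        }
    }

    public long lookup(String uniqueId) {
        Long row = keyToRow.get(uniqueId);
        return row == null ? -1 : row; // -1: not found
    }
}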

Or am I missing something? Please let me know your thoughts.

Thanks a lot,
Jaromir

Messaging Patterns

Is it possible to have a messaging-pattern abstraction implemented on top of NFSdb?

Lazy / Cached Spill Over to Disk

When reliability can be traded for speed, it might be good to have an option for a delayed spill-over to disk, using in-memory caching to achieve more speed.

For the example given, I believe about 15M to 30M objects per second is achievable, if not more.

Monitoring

Monitor resource usage, mainly memory, open files and CPU usage. Another metric of interest is the number of simultaneously engaged threads; this could help with CPU capacity planning.

Query audit

Audit of queries. This should include the query text, execution time and possibly the plan at the time (plan generation needs a review).

Queries could also be inlined to make the log easier to parse, or possibly terminated with a special character?

Query language

Query language to support:

  • field selection
  • calculated values
  • aggregation
  • ordering
  • time range filtering
  • predicate filtering
  • joins on composite keys (inner/outer)
  • time window joins

Order by asc/desc

Ordering must allow ascending and descending order on each column individually.

Ignore unsupported field types

As a developer I want to store instances of objects with fields not supported by NFSdb (such as List, Map, etc.) without NFSdb moaning about them.

It would be useful to store the application domain model without needing to copy it into another model.
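
A sketch of what the field scan could do instead of failing, in plain Java (names are illustrative, not the actual NFSdb internals):

import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Collect only fields whose types the store supports; silently skip the
// rest (List, Map, ...) instead of throwing a configuration error.
public class FieldScanner {
    public static List<Field> supportedFields(Class<?> type, Set<Class<?>> supported) {
        List<Field> result = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && supported.contains(f.getType())) {
                result.add(f);
            }
        }
        return result;
    }
}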

Benchmarks

Would it be possible to have benchmarks against different on-disk and in-memory products, e.g. Chronicle, MapDB, H2, etc.?

Investigate memory leak

The process shows very high memory usage when executing sorting and grouping queries. This needs to be investigated urgently.

Generic data access

Ability to read and write data without mapping onto a Java class.

Advice on building a fail-over solution on top of NFSdb (with two+ nodes)

Hello:
I've been testing the replication options available in NFSdb.

  • I like the benchmark numbers I see on NFSdb
  • I like the fact that I can set up a client to replicate the data onto one or many nodes

In order to use NFSdb in a production environment, fail-over and data redundancy become a question.

Are there any plans to implement fail-over, or is there sample code/pseudo-code that can be shared to help me build a fail-over solution on top of NFSdb?

Scenario:

  • {Node1} has a server running NFSdb
  • {Node2} is running a client replicating the database
  • Node1 goes down; I need Node2 to switch from client mode to server mode.
  • More data is appended to Node2, and at some later point in time Node1 is started
  • Node1 needs to be aware that there is already a master running on Node2 and join it as a client (instead of as a server)

Is this possible using the (multicast) foundation in NFSdb?

Any advice on how this can be achieved?

Thank you and looking forward to your feedback.

Venkatt

Splitting File

A way to automatically split files based on:

  • datetime
  • item count
  • some other mechanism

may sometimes be helpful.

SSL support

Both JournalServer and JournalClient need to support:

  • SSL encryption
  • SSL authentication

Implement SSL over ByteChannel.

Multicast does not work on Mac OS Yosemite

com.nfsdb.journal.exceptions.JournalNetworkException: Cannot open multicast socket
    at com.nfsdb.journal.net.config.NetworkConfig.openMulticastSocket(NetworkConfig.java:160)
    at com.nfsdb.journal.net.JournalServerAddressMulticast.start(JournalServerAddressMulticast.java:77)
    at com.nfsdb.journal.net.JournalServer.start(JournalServer.java:122)
    at com.nfsdb.journal.net.AuthorizationTest.beginSync(AuthorizationTest.java:187)
    at com.nfsdb.journal.net.AuthorizationTest.testClientAndServerSuccessfulAuth(AuthorizationTest.java:66)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at com.nfsdb.journal.test.tools.JournalTestFactory$1.evaluate(JournalTestFactory.java:51)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: java.net.SocketException: Can't assign requested address
    at java.net.PlainDatagramSocketImpl.join(Native Method)
    at java.net.AbstractPlainDatagramSocketImpl.join(AbstractPlainDatagramSocketImpl.java:179)
    at java.net.MulticastSocket.joinGroup(MulticastSocket.java:319)
    at com.nfsdb.journal.net.config.NetworkConfig.openMulticastSocket(NetworkConfig.java:153)
    ... 32 more

Network example produces different results on Linux vs Windows

I ran the SimpleReplication server/client example as outlined in this URL:
http://github.nfsdb.org/

The key difference between the example shown on the website and mine is that I declare a ServerConfig or ClientConfig with setIfName("eth0"):

[server]
JournalServer server = new JournalServer(new ServerConfig() {{
    setIfName("eth0");
}}, factory);

[client]
final JournalClient client = new JournalClient(new ClientConfig() {{
    setIfName("eth0");
}}, factory);

Expected Behavior:

  1. As soon as I declare a JournalReader on the client, the db file must be immediately synced.
  2. As the server publishes new data, it should be synchronized on the client side.

On Windows (as expected), as soon as I run the SimpleReplicationClientMain, I can see the sync messages as shown below:

C:\VG\dev\NFSDBTest\src>java SimpleReplicationClientMain C:\VG\dev\NFSDBTest\data
Jan 05, 2015 4:51:42 PM com.nfsdb.journal.net.mcast.AbstractOnDemandPoller
INFO: Polling on name:eth0 (Intel(R) 82579LM Gigabit Network Connection)
Jan 05, 2015 4:51:42 PM com.nfsdb.journal.net.config.ClientConfig
INFO: Connected to /192.168.1.12:7075
Client started
took: 1978, count=1000000
Jan 05, 2015 4:51:42 PM com.nfsdb.journal.net.StatsCollectingReadableByteChannel

INFO: received 39000097 bytes @ 153.059222 MB/s from: /192.168.1.12:7075 [1939 calls]
took: 406, count=1000000

Whereas running the same on a Linux machine, the client connects fine but there are no sync messages for a long time. Then, about 15 minutes later, the sync messages start appearing, but the count remains zero. The debug messages keep repeating for a minute, then pause for another minute, and this repeats in a cycle.

DEBUG on {LINUX} SERVER:
INFO: sent 71 bytes @ Infinity MB/s to: /192.168.1.19:43112 [4 calls]
Jan 05, 2015 7:55:51 PM com.nfsdb.journal.net.StatsCollectingWritableByteChannel

DEBUG on {LINUX} CLIENT:
took: 1420505748619, count=0
Jan 05, 2015 7:55:48 PM com.nfsdb.journal.net.StatsCollectingReadableByteChannel
INFO: received 57 bytes @ 0.001394 MB/s from: /192.168.1.19:7075 [3 calls]

*** Version Details ***
nfsdb-core-2.1.1-SNAPSHOT.jar
disruptor-3.3.0.jar

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.1 LTS
Release: 14.04
Codename: trusty

$ java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

*** Noticed the same behavior on Ubuntu running OpenJDK ***

Please advise. I can try this on a CentOS system as well in the next few days and publish my results.

Inherited fields support for POJOs

Hi guys, first of all - thanks for your hard work!

It would be great if you could support inherited class fields for POJOs.
Here is my case: in addition to raw trade data, I want to store pre-calculated 1/3/5/60-minute quotes.
Currently, JournalMetadataBuilder.parseClass uses the getDeclaredFields method to find class fields, so there is no way I can write one QuoteInterval class with all required fields/methods and inherit the 1, 3, 5 and 60-minute ones from it; I'd have to maintain an exact copy of the same logic 4 times.

I think such support would be beneficial. Though, in my particular case, if I were able to partition data by an interval field and query it by key + interval value, that would make my life easier. But as far as I understand, at this point I can only fetch a slice of data based on a from-to interval; there is no way to fetch data from a partition based on other custom criteria (like the interval value in my case, which would be an int representing the number of minutes).

As for my proposed inherited-fields request, here are the changes I made in my local NFSdb copy for now.
Sorry, I haven't used a patch utility for a while, so I'm posting my changes as text here. They're super easy anyway.

// requires: java.lang.reflect.Field, java.util.Collections, java.util.LinkedList, java.util.List

private void parseClass() throws JournalConfigurationException {

#1

    //Field[] classFields = modelClass.getDeclaredFields();
    List<Field> classFields = getAllFields(modelClass);

#2

    //this.nameToIndexMap = new TObjectIntHashMap<>(classFields.length, Constants.DEFAULT_LOAD_FACTOR, -1);
    this.nameToIndexMap = new TObjectIntHashMap<>(classFields.size(), Constants.DEFAULT_LOAD_FACTOR, -1);

#3 - recursive field-list collection

public static List<Field> getAllFields(Class<?> type) {
    return getAllFields(new LinkedList<Field>(), type);
}

private static List<Field> getAllFields(List<Field> fields, Class<?> type) {
    // add this class's declared fields, then walk up the superclass chain
    Collections.addAll(fields, type.getDeclaredFields());
    if (type.getSuperclass() != null) {
        fields = getAllFields(fields, type.getSuperclass());
    }
    return fields;
}

Event Appender

An appender to work in a Complex Event Processing environment and implement the send-and-forget pattern.

As a developer I want:

  • to log events with constant and low thread handover time
  • zero-GC logging
  • back-pressure-sensitive logging, e.g. I would prefer to lose an event rather than slow down processing

Replay mode

Sometimes you want to replay stored events, or emit them back, with the exact original time spacing or a multiple of it. This would be a good option to add.
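
A minimal sketch of such a replay loop in plain Java, with a speed multiplier (everything here is illustrative; the event type and timestamp accessor are placeholders):

import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.ToLongFunction;

// Replays events preserving their original (millisecond) time spacing,
// scaled by 'speed' (2.0 = twice as fast, 0.5 = half speed).
public class Replayer {
    public static <T> void replay(List<T> events,
                                  ToLongFunction<T> timestampOf,
                                  double speed,
                                  Consumer<T> emit) throws InterruptedException {
        long prev = -1;
        for (T event : events) {
            long ts = timestampOf.applyAsLong(event);
            if (prev >= 0) {
                long gapMs = (long) ((ts - prev) / speed); // scaled inter-event gap
                if (gapMs > 0) {
                    TimeUnit.MILLISECONDS.sleep(gapMs);
                }
            }
            emit.accept(event);
            prev = ts;
        }
    }
}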

Abstract configuration builder

As a developer I want to be able to configure the Journal factory from sources other than db.xml.

Other implementations could be database-based, code-based, or maybe even some Spring-based configuration.
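
One possible shape for this, sketched with invented names (nothing here is an existing NFSdb API):

// A small abstraction over configuration sources; db.xml becomes just one
// implementation among several.
public interface ConfigurationSource {
    JournalConfiguration load() throws JournalConfigurationException;
}

// Factory construction could then look like (hypothetical classes):
//   new JournalFactory(new XmlConfigurationSource("db.xml").load());
//   new JournalFactory(new JdbcConfigurationSource(dataSource).load());
//   new JournalFactory(new CodeConfigurationSource(builder -> { /* ... */ }).load());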

Single write optimization

Hi.
I have HA requirements similar to #29. To achieve HA, I'm going to write a persistent log into nfsdb and replicate it. I will write 99% of the time and read 1%, so my main criterion is write performance. I must not lose any data during an active-node failure, so I think the best option for me is to commit (persist + replicate) the data after every append.

I took SimplestAppend.java as a foundation.

Results:

Persisted 1000000 objects in 196ms.

I changed this test slightly (commit after every append):

public class SimplestAppend {
    public static void main(String[] args) throws JournalException {
        String basePath = "D:\\tmp";
        try (JournalFactory factory = new JournalFactory(ModelConfiguration.CONFIG.build(basePath))) {
            // delete existing price journal
            Files.delete(new File(factory.getConfiguration().getJournalBase(), Price.class.getName()));
            final int count = 1000000;

            try (JournalWriter<Price> writer = factory.writer(Price.class)) {
                long tZero = System.nanoTime();
                Price p = new Price();

                for (int i = 0; i < count; i++) {
                    p.setTimestamp(tZero + i);
                    writer.append(p);
                    writer.commit(); // note: a full commit on every single append
                }

                long end = System.nanoTime();
                System.out.println("Persisted " + count + " objects in " +
                        TimeUnit.NANOSECONDS.toMillis(end - tZero) + "ms.");
            }
        }
    }
}

Results:

0.640: [GC (Allocation Failure) [PSYoungGen: 33280K->864K(38400K)] 33280K->872K(125952K), 0.0013587 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
0.868: [GC (Allocation Failure) [PSYoungGen: 34144K->976K(38400K)] 34152K->984K(125952K), 0.0031355 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
1.118: [GC (Allocation Failure) [PSYoungGen: 34256K->1152K(38400K)] 34264K->1160K(125952K), 0.0586353 secs] [Times: user=0.03 sys=0.00, real=0.06 secs] 
1.437: [GC (Allocation Failure) [PSYoungGen: 34432K->1408K(71680K)] 34440K->1416K(159232K), 0.0045581 secs] [Times: user=0.03 sys=0.00, real=0.01 secs] 
Persisted 1000000 objects in 1294ms.

Is there any way to reduce garbage and improve performance for the single append/commit approach?

Thank you.
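
Not an official answer, but a common middle ground is to amortize the commit cost over small groups of appends, trading a bounded loss window for throughput. A sketch, as a variation on the loop above:

                // Commit every BATCH appends instead of every append; on failure,
                // at most BATCH uncommitted rows are lost.
                final int BATCH = 1000;
                for (int i = 0; i < count; i++) {
                    p.setTimestamp(tZero + i);
                    writer.append(p);
                    if ((i + 1) % BATCH == 0) {
                        writer.commit();
                    }
                }
                writer.commit(); // commit any trailing partial batch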

Request cancellation

Although network-related operations are sensitive to the client sending or receiving data, there are some that require explicit termination to free up CPU. These are typically hashing and sorting operations that can run to completion without user input.

These can check a volatile flag and keep going until the flag is set or cleared.
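
A minimal sketch of that flag check in Java:

// Cooperative cancellation: a long-running sort/hash task periodically
// checks a volatile flag that another thread sets on client disconnect.
public class CancellableTask implements Runnable {
    private volatile boolean cancelled = false;

    public void cancel() {
        cancelled = true; // called from the network thread
    }

    @Override
    public void run() {
        for (int i = 0; i < 1_000_000_000; i++) {
            if ((i & 0xFFFF) == 0 && cancelled) { // test every 65,536 iterations
                return;                           // stop early, free the CPU
            }
            // ... one unit of hashing/sorting work ...
        }
    }
}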

Zero GC formatter and parser for primitives

Create zero-GC toString formatters and String parsers for primitives:

  • int
  • long
  • double
  • float
  • date (although it isn't a primitive, a formatter and parser are still needed)

The formatter and parser are to be used in the import/export tools.
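
As an illustration of the zero-GC approach for the int case (a sketch; long and the floating-point types need more care):

// Writes the decimal representation of 'value' into a caller-supplied
// char[] at 'offset' without allocating; returns the number of chars written.
public final class ZeroGcFormat {
    public static int putInt(char[] buf, int offset, int value) {
        if (value == 0) {
            buf[offset] = '0';
            return 1;
        }
        int pos = offset;
        long v = value;      // widen so Integer.MIN_VALUE negates safely
        if (v < 0) {
            buf[pos++] = '-';
            v = -v;
        }
        int start = pos;
        while (v > 0) {      // emit digits least-significant first
            buf[pos++] = (char) ('0' + (v % 10));
            v /= 10;
        }
        for (int i = start, j = pos - 1; i < j; i++, j--) { // reverse in place
            char t = buf[i];
            buf[i] = buf[j];
            buf[j] = t;
        }
        return pos - offset;
    }
}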

Generic metadata

We have to be able to create a Journal instance without having a Class. This needs a JournalMetadata that hasn't been derived from a Class but rather assembled programmatically.
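
Conceptually, the programmatic assembly could look something like this (all builder names are invented for illustration; the real API would differ):

// Hypothetical: metadata assembled without a model class.
JournalMetadata metadata = new GenericMetadataBuilder("quote")
        .column("sym", ColumnType.SYMBOL)
        .column("price", ColumnType.DOUBLE)
        .column("timestamp", ColumnType.DATE)
        .build();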
