questdb / questdb
An open source time-series database for fast ingest and SQL queries
Home Page: https://questdb.io
License: Apache License 2.0
One of the main applications of stored time-series data is visualising it. Adding this functionality out of the box would be an advantage.
Perhaps through: http://grafana.org/
Fields of type byte[] are silently ignored. Why? I want to do my own serialization. See ColumnType
Sorry if this is not the appropriate place to discuss.
I'm working on a small, reactive metrics collector server which will collect (JSON-formatted) data objects and object arrays, transform/reduce them in memory (i.e. calculate average distribution, projections, ...) and offer them through a reactive interface. For server restarts I have a playback approach in mind.
I wonder if NFSdb would be a good fit for this purpose. I like the simplicity of an embedded, file-based POJO store in NFSdb. In particular, I did not find good answers to a few questions in the documentation.
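For context, embedded usage is only a few lines. Below is a minimal sketch based on the SimplestAppend example quoted later on this page; ModelConfiguration, Price and the base path are borrowed from that example and stand in for your own configuration and model class.

// Minimal embedded append/commit sketch (model and config are placeholders
// taken from the SimplestAppend example further down this page).
// Checked JournalException handling omitted for brevity.
try (JournalFactory factory = new JournalFactory(ModelConfiguration.CONFIG.build("/tmp/nfsdb-data"))) {
    try (JournalWriter<Price> writer = factory.writer(Price.class)) {
        Price p = new Price();
        p.setTimestamp(System.nanoTime());
        writer.append(p);
        writer.commit(); // make the appended row visible to readers
    }
}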
It would be useful to provide rowids to onCommit() so the listener knows how much data to process.
Create a utility to export NFSdb data to a delimited format, consumable by third-party import tools.
This is a more formal to-do item following the conversation under issue #11.
Hi, I would like to see possible uses of this library in our programs.
I've got a problem when running the provided examples.
Exception in thread "main" java.lang.UnsatisfiedLinkError: C:\Users\Usuario\AppData\Local\Temp\libquestdb3846739714947417735.dll: %1 is not a valid Win32 application
Both the OS and the JDK are 64-bit; I reckon that is the origin of the error. Could you provide a 64-bit version of the DLL?
Thanks in advance
Juan
Hi Vlad,
Once again, thanks for your work on NFSdb - I am looking forward to the next release.
Quick question: are you planning to add support for a 64-bit integer index? The current version (2.1.0) supports only $int and $str. I currently use $int, but I am afraid the 32-bit integer range won't be enough for me in the near future, so I was thinking of using $str instead. I've tested the performance of $int vs $str (using an object with an int/string key + 80 bytes of raw data). These are the results of my testing:
Int key:
5,000,000 items added in 315ms
Lookup time is 37ms
String key:
5,000,000 items added in 616ms
Lookup time is 38ms
The lookup time is almost the same, but the append time is roughly twice as slow - that is why I decided not to use $str.
What do you think?
Thanks,
Jaromir
I want ability to
Hello,
I've encountered the following error on version 3.0.0-20150216.031900-5: ArrayIndexOutOfBoundsException: -64, when trying to add 2^14 (16384) elements with an indexed String column. Without the index everything works fine, and $int with an index also works normally - unfortunately, only $str/$sym seem to be queryable by key/value.
Test that reproduces the error: https://gist.github.com/user16558789/d1b0781f1d5ca21bd637
On a side note: is there a way in 3.0.0 to query by int values (like "select * from table where id = 123;")? The code in https://github.com/NFSdb/nfsdb/releases/tag/2.0.1 does not work anymore, unfortunately.
Thank you!
There is an implementation limit of 1560 columns, which causes an assertion exception when breached. This needs to be reported gracefully when parsing SQL.
Hi,
This is not really an issue per se, but I was wondering what the implications of using commit() vs commitDurable() are. I've noticed that the number of writes per second drops to a very low figure when using the durable mode. Is this because the database doesn't synchronously flush to disk when using plain commit()? In case of a power failure or similar issues, would that cause any data to be lost?
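For reference, a minimal sketch of the two calls being compared. Price and factory are borrowed from other examples on this page, and the durability semantics described in the comments are assumptions inferred from the method names, not confirmed behaviour:

try (JournalWriter<Price> writer = factory.writer(Price.class)) {
    writer.append(price);
    writer.commit();           // fast path: presumably no synchronous flush to disk
    // writer.commitDurable(); // presumably forces data to stable storage,
    //                         // which would explain the drop in writes per second
}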
When I query
select f0, f1, f2 from csv where f1 = RRS.L
the error is 'Invalid column'. It would be more helpful if it were 'Invalid column RRS.L'.
Hi Vlad, as I told you, I'm in the middle of testing questdb. I want to test it with real data, meaning data we use in our current business.
In our case we've got an Oracle database with 7-8 million bills. I've got a Java program that extracts the data from the Oracle DB and writes the information to two different destinations: a CSV file and a questDB storage.
I'm using a modified version of a program that I've been using to test other libraries. The program uses sql2o to gather data from the relational DB and SimpleFlatMapper to generate the CSV file with the data. There is no problem generating the CSV file with the 8M records.
But the problem comes up as I try to store the data in questDB: I get the error below. I've repeated the execution several times, and the error appears, more or less, at the same number of bills loaded.
Attached you'll find the Java programs I've written.
jun 27, 2016 8:24:07 PM jac.f readAndFlushAllFactura
INFO: .. and into the CSV
jun 27, 2016 8:24:07 PM jac.f readAndFlushAllFactura
INFO: .. and into questDB
jun 27, 2016 8:24:19 PM jac.f readAndFlushAllFactura
INFO: Retrieved 850000 Factura records
jun 27, 2016 8:24:20 PM jac.f readAndFlushAllFactura
INFO: .. and into the CSV
Process finished with exit code 1
Hi Vlad,
I recently came across your NFSdb. It looks impressive. I've been playing with it quite a lot and found one potential issue: there is probably a problem with indexing when a column contains too many distinct values. For this scenario I tried to use a standard index with the value-count hint set to a high number, and it made the db very slow. The db seems to get exponentially slower as the number of distinct items in the indexed column grows. To reproduce the issue, just set up the index like this and insert 1M items with different unique ids:
$sym("uniqueId").index().size(15).valueCountHint(1000_000)
It might be a good idea to have something like a UNIQUE KEY index/constraint which creates a special index using a hashtable. This would allow the user to find exactly one item based on a unique key.
Or am I missing something? Please let me know your thoughts.
Thanks a lot,
Jaromir
Is it possible to have a messaging-pattern abstraction implemented on top of NFSdb?
Writer should allow graceful double-close, such as:
try (JournalWriter<Quote> w = factory.writer(Quote.class)) {
    TestUtils.generateQuoteData(w, 100);
    w.close();
}
When reliability can be traded for speed, it might be good to have an option for delayed spill-over to disk, using in-memory caching to achieve more speed.
For the example given, about 15m to 30m objects per second should be achievable, if not more.
Monitor resource usage: mainly memory, open files and CPU. A further metric of interest is the number of simultaneously engaged threads. This could help with CPU capacity planning.
Some mechanism to persist to a normal DB may be useful.
Indexes are empty after using bulkWriter(). This is due to partitions being aggressively closed before the transaction is committed.
Audit of queries. This should include the query text, execution time, and possibly the plan at the time (plan generation needs a review).
Queries could also be inlined to make the log easier to parse, or possibly terminated with a special character?
Queue size must be larger than max network connections to ensure threads can never get stuck.
Query language to support:
Ordering must allow ascending and descending order on each column individually.
as title
As a developer I want to store instances of objects with fields not supported by NFSdb (such as List, Map, etc.) without NFSdb moaning about them.
It would be useful to store the application domain model without needing to copy it to another model.
Is it possible to have benchmarks against different on-disk/in-memory products, e.g. Chronicle, MapDB, H2, etc.?
The current order-by implementation creates a copy of the incoming record source, which is not always necessary. Instead it should hold on to rowids in the data source to reduce storage cost.
The process shows very high memory usage when executing sorting and grouping queries. This has to be investigated urgently.
Ability to read and write data without mapping onto a Java class.
Hello:
I've been testing the replication options available in NFSdb.
In order to use NFSdb in a production environment, fail-over and data redundancy become a question.
Are there any plans to implement fail-over, or any sample code/pseudo-code that can be shared to help me build a fail-over solution on top of NFSdb?
Scenario:
Is this possible using the (multicast) foundation in NFSdb?
Any advice on how this can be achieved?
Thank you and looking forward to your feedback.
Venkatt
Queries such as "select sum(x), sum(x)/sum(y) from tab" can calculate sum(x) only once and reuse it.
Also, a way to automatically split files based on:
may sometimes be helpful.
Both JournalServer and JournalClient need to support the following (a sketch of the underlying JDK primitive follows the list):
SSL encryption
SSL authentication
Implement SSL over ByteChannel
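A minimal sketch of the JDK primitive this would likely wrap: an SSLEngine (standard javax.net.ssl, not an NFSdb API) configured for server-side client-certificate authentication. Checked exceptions are omitted for brevity.

import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

SSLContext ctx = SSLContext.getDefault();   // or a context loaded from a keystore
SSLEngine engine = ctx.createSSLEngine();
engine.setUseClientMode(false);             // JournalServer side of the handshake
engine.setNeedClientAuth(true);             // "SSL authentication": require a client cert
engine.beginHandshake();
// An SSL ByteChannel would then loop engine.wrap()/engine.unwrap() between
// the application's ByteBuffers and the underlying socket channel.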
Having the data format published as a spec would mean implementers in other languages can also become compatible with NFSdb.
Create documentation and examples.
Is it possible to implement the above spec so this can work with other technologies in the ecosystem?
com.nfsdb.journal.exceptions.JournalNetworkException: Cannot open multicast socket
at com.nfsdb.journal.net.config.NetworkConfig.openMulticastSocket(NetworkConfig.java:160)
at com.nfsdb.journal.net.JournalServerAddressMulticast.start(JournalServerAddressMulticast.java:77)
at com.nfsdb.journal.net.JournalServer.start(JournalServer.java:122)
at com.nfsdb.journal.net.AuthorizationTest.beginSync(AuthorizationTest.java:187)
at com.nfsdb.journal.net.AuthorizationTest.testClientAndServerSuccessfulAuth(AuthorizationTest.java:66)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at com.nfsdb.journal.test.tools.JournalTestFactory$1.evaluate(JournalTestFactory.java:51)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: java.net.SocketException: Can't assign requested address
at java.net.PlainDatagramSocketImpl.join(Native Method)
at java.net.AbstractPlainDatagramSocketImpl.join(AbstractPlainDatagramSocketImpl.java:179)
at java.net.MulticastSocket.joinGroup(MulticastSocket.java:319)
at com.nfsdb.journal.net.config.NetworkConfig.openMulticastSocket(NetworkConfig.java:153)
... 32 more
I ran the SimpleReplication server/client example as outlined at this URL:
http://github.nfsdb.org/
The key difference between the example shown on the website and mine is that I declare a ServerConfig or ClientConfig with setIfName("eth0"):
[server]
JournalServer server = new JournalServer(new ServerConfig() {{
    setIfName("eth0");
}}, factory);
[client]
final JournalClient client = new JournalClient(new ClientConfig() {{
    setIfName("eth0");
}}, factory);
Expected Behavior:
On Windows (as expected), as soon as I run the SimpleReplicationClientMain, I can see the sync messages as shown below:
C:\VG\dev\NFSDBTest\src>java SimpleReplicationClientMain C:\VG\dev\NFSDBTest\data
Jan 05, 2015 4:51:42 PM com.nfsdb.journal.net.mcast.AbstractOnDemandPoller
INFO: Polling on name:eth0 (Intel(R) 82579LM Gigabit Network Connection)
Jan 05, 2015 4:51:42 PM com.nfsdb.journal.net.config.ClientConfig
INFO: Connected to /192.168.1.12:7075
Client started
took: 1978, count=1000000
Jan 05, 2015 4:51:42 PM com.nfsdb.journal.net.StatsCollectingReadableByteChannel
INFO: received 39000097 bytes @ 153.059222 MB/s from: /192.168.1.12:7075 [1939 c
alls]
took: 406, count=1000000
Whereas running the same on a Linux machine, the client connects fine but there are no sync messages for a long time. Then, about 15 minutes later, the sync messages start appearing but the count remains zero. The debug messages keep repeating for a minute, then pause for another minute, and this cycle repeats.
DEBUG on {LINUX} SERVER:
INFO: sent 71 bytes @ Infinity MB/s to: /192.168.1.19:43112 [4 calls]
Jan 05, 2015 7:55:51 PM com.nfsdb.journal.net.StatsCollectingWritableByteChannel
DEBUG on {LINUX} CLIENT:
took: 1420505748619, count=0
Jan 05, 2015 7:55:48 PM com.nfsdb.journal.net.StatsCollectingReadableByteChannel
INFO: received 57 bytes @ 0.001394 MB/s from: /192.168.1.19:7075 [3 calls]
*** Version Details ***
nfsdb-core-2.1.1-SNAPSHOT.jar
disruptor-3.3.0.jar
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.1 LTS
Release: 14.04
Codename: trusty
$ java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
*** Noticed the same behavior on Ubuntu running OpenJDK ***
Please advise. I can try this on a CentOS system as well in the next few days and publish my results.
For INT and LONG types NaN sorts on top; for DOUBLE it sorts at the bottom. This should be consistent.
Hi guys, first of all - thanks for your hard work!
It would be great if you could support inherited class fields for POJOs.
Here is my case - in addition to raw trades data I want to store pre-calculated 1/3/5/60 mins quotes.
Currently, JournalMetadataBuilder.parseClass uses the getDeclaredFields method to find class fields, so there is no way I can write one QuoteInterval class with all required fields/methods and inherit the 1/3/5/60-minute ones from it; I'd have to maintain an exact copy of the same logic four times.
I think such support would be beneficial. Though, in my particular case, if I were able to partition data by the interval field and query it by key + interval value, that would make my life easier. As far as I understand, at this point I can only fetch a slice of data based on a from-to time interval; there is no way to fetch data from a partition based on other custom criteria (like the interval value in my case, which would be an int representing the number of minutes).
As for my proposed inherited-fields request, here are the changes I made in my local NFSdb copy for now.
Sorry, I haven't used the patch util for a while, so I'm posting my changes as text here. They're super easy anyway.
private void parseClass() throws JournalConfigurationException {
    // Field[] classFields = modelClass.getDeclaredFields();
    List<Field> classFields = getAllFields(modelClass);
    // this.nameToIndexMap = new TObjectIntHashMap<>(classFields.length, Constants.DEFAULT_LOAD_FACTOR, -1);
    this.nameToIndexMap = new TObjectIntHashMap<>(classFields.size(), Constants.DEFAULT_LOAD_FACTOR, -1);
#3 - recursive field-list calculation:
public static List<Field> getAllFields(Class<?> type) {
    return getAllFields(new LinkedList<Field>(), type);
}

private static List<Field> getAllFields(List<Field> fields, Class<?> type) {
    Collections.addAll(fields, type.getDeclaredFields());
    if (type.getSuperclass() != null) {
        fields = getAllFields(fields, type.getSuperclass());
    }
    return fields;
}
An appender to work in a Complex Event Processing environment and implement a send-and-forget pattern.
As a developer I want:
Sometimes you want to replay events, or re-emit events with the exact original time spacing or a multiple of it. This would be a good option to add; see the sketch below.
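A sketch of the pacing loop this might amount to. The Quote model is borrowed from the double-close snippet above, while the journal iteration, consumer, and speedFactor names are hypothetical:

// Replay stored events, sleeping out the original gap between consecutive
// timestamps (divided by a speed multiplier). Time units must match however
// the timestamps were recorded. TimeUnit.sleep throws InterruptedException.
long prevTs = -1;
for (Quote q : journal) {                    // iteration style is illustrative
    if (prevTs >= 0) {
        long gap = (q.getTimestamp() - prevTs) / speedFactor;
        TimeUnit.NANOSECONDS.sleep(gap);
    }
    consumer.accept(q);                      // emit the event downstream
    prevTs = q.getTimestamp();
}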
as title
As a developer I want to be able to configure the Journal factory from sources other than db.xml.
Other implementations could be database-based, code-based, or maybe even some Spring-based configuration.
Hi.
I have HA requirements similar to #29. To achieve HA I'm going to write a persistent log into nfsdb and replicate it. I will write 99% of the time and read 1%, so my main criterion is write performance. I should not lose any data during an active-node failure, so I think the best option for me is to commit (persist + replicate) data after every append.
I took SimplestAppend.java as a foundation.
Results:
Persisted 1000000 objects in 196ms.
I changed this test slightly (commit after every append):
public class SimplestAppend {
    public static void main(String[] args) throws JournalException {
        String basePath = "D:\\tmp";
        try (JournalFactory factory = new JournalFactory(ModelConfiguration.CONFIG.build(basePath))) {
            // delete existing price journal
            Files.delete(new File(factory.getConfiguration().getJournalBase(), Price.class.getName()));
            final int count = 1000000;
            try (JournalWriter<Price> writer = factory.writer(Price.class)) {
                long tZero = System.nanoTime();
                Price p = new Price();
                for (int i = 0; i < count; i++) {
                    p.setTimestamp(tZero + i);
                    writer.append(p);
                    writer.commit();
                }
                long end = System.nanoTime();
                System.out.println("Persisted " + count + " objects in " +
                        TimeUnit.NANOSECONDS.toMillis(end - tZero) + "ms.");
            }
        }
    }
}
Results:
0.640: [GC (Allocation Failure) [PSYoungGen: 33280K->864K(38400K)] 33280K->872K(125952K), 0.0013587 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
0.868: [GC (Allocation Failure) [PSYoungGen: 34144K->976K(38400K)] 34152K->984K(125952K), 0.0031355 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
1.118: [GC (Allocation Failure) [PSYoungGen: 34256K->1152K(38400K)] 34264K->1160K(125952K), 0.0586353 secs] [Times: user=0.03 sys=0.00, real=0.06 secs]
1.437: [GC (Allocation Failure) [PSYoungGen: 34432K->1408K(71680K)] 34440K->1416K(159232K), 0.0045581 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
Persisted 1000000 objects in 1294ms.
Is there any way to reduce garbage and improve performance for the single append/commit approach?
Thank you.
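One middle ground worth trying (a sketch, not a recommendation from the maintainers): commit every N appends instead of after each one, trading a bounded loss window for far fewer commit cycles. The variables are the same as in the test above; batch is a tuning knob.

final int batch = 1000; // upper bound on rows lost in a crash; tune to taste
for (int i = 0; i < count; i++) {
    p.setTimestamp(tZero + i);
    writer.append(p);
    if ((i + 1) % batch == 0) {
        writer.commit(); // amortise commit cost over the whole batch
    }
}
writer.commit(); // flush any tail shorter than a full batch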
Although network-related operations are sensitive to the client sending or receiving data, there are some that require explicit termination to free up CPU. These are typically hashing and sorting operations that can run to completion without user input.
These can check a volatile flag and keep going until the flag is set or cleared.
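A generic Java sketch of the pattern (names are illustrative): the worker polls a volatile flag between bounded units of work, and a cancelling thread flips the flag.

// Illustrative cancellation pattern for long-running hash/sort loops.
final class SortTask implements Runnable {
    private volatile boolean cancelled; // volatile: the write is visible across threads

    void cancel() {
        cancelled = true;
    }

    @Override
    public void run() {
        while (!cancelled && hasMoreWork()) {
            doSortStep(); // one bounded unit of work between flag checks
        }
    }

    private boolean hasMoreWork() { return false; } // placeholder
    private void doSortStep() { /* placeholder */ }
}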
This is a question rather than an issue.
How does it compare to https://github.com/jankotek/mapdb ?
Create zero-GC toString formatter and String parsers for primitives:
int
long
double
float
date - although it isn't a primitive, a formatter and parser are still needed.
The formatter and parser are to be used in the import/export tools.
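A sketch of what a zero-GC int formatter can look like. StringBuilder stands in for whatever reusable character sink the library would use; appending digits directly avoids the intermediate String that Integer.toString() allocates.

// Formats value into sink without allocating. Works in negative space so that
// Integer.MIN_VALUE does not overflow on negation.
static void putInt(StringBuilder sink, int value) {
    if (value < 0) {
        sink.append('-');
        putDigits(sink, value);
    } else {
        putDigits(sink, -value);
    }
}

private static void putDigits(StringBuilder sink, int negValue) {
    if (negValue <= -10) {
        putDigits(sink, negValue / 10); // emit higher-order digits first
    }
    sink.append((char) ('0' - negValue % 10));
}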
We have to be able to create a Journal instance without having a Class. This needs JournalMetadata that hasn't been derived from a Class but has instead been assembled programmatically.
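A hypothetical shape for this (the builder class name is an assumption; the $-method style mirrors the $sym(...).index() call quoted earlier on this page):

// Assemble column metadata programmatically instead of deriving it from a POJO.
JournalStructure structure = new JournalStructure("trades")
        .$ts("timestamp")            // designated timestamp column
        .$sym("instrument").index()  // indexed symbol column
        .$double("price")
        .$int("size");
// A writer would then be obtained from the factory using this structure
// rather than a Class<?> reference.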