
Time Series Benchmark Suite (TSBS)

This repo contains code for benchmarking several time series databases, including TimescaleDB, MongoDB, InfluxDB, CrateDB and Cassandra. This code is based on a fork of work initially made public by InfluxDB at https://github.com/influxdata/influxdb-comparisons.

Current databases supported: Akumuli, Cassandra, ClickHouse, CrateDB, InfluxDB, MongoDB, QuestDB, SiriDB, TimescaleDB, Timestream, and VictoriaMetrics (see the support matrix below).

Overview

The Time Series Benchmark Suite (TSBS) is a collection of Go programs used to generate datasets and then benchmark the read and write performance of various databases. The intent is to make TSBS extensible so that a variety of use cases (e.g., devops, IoT, finance), query types, and databases can be included and benchmarked. To this end, we hope to help prospective database administrators find the best database for their needs and workloads. Further, if you are the developer of a time series database and want to include your database in TSBS, feel free to open a pull request to add it!

Current use cases

Currently, TSBS supports two use cases.

Dev ops

The 'dev ops' use case comes in two forms. The full form generates, inserts, and measures data from 9 'systems' that could be monitored in a real-world dev ops scenario (e.g., CPU, memory, disk). Together, these 9 systems generate 100 metrics per reading interval. The alternate form, cpu-only, focuses solely on CPU metrics for a simpler, more streamlined use case; it generates 10 CPU metrics per reading.

In addition to metric readings, 'tags' (including the location of the host, its operating system, etc) are generated for each host with readings in the dataset. Each unique set of tags identifies one host in the dataset and the number of different hosts generated is defined by the scale flag (see below).
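For example, here is a sketch of generating the CPU-only variant (the tsbs_generate_data flags used here are explained under Data generation below; the scale and time range are illustrative):

# Sketch: generate one day of cpu-only data for 100 simulated hosts
$ tsbs_generate_data --use-case="cpu-only" --seed=123 --scale=100 \
    --timestamp-start="2016-01-01T00:00:00Z" \
    --timestamp-end="2016-01-02T00:00:00Z" \
    --log-interval="10s" --format="influx" \
    | gzip > /tmp/influx-cpu-only-data.gz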

Internet of Things (IoT)

The second use case simulates the data load in an IoT environment: data streaming from a set of trucks belonging to a fictional trucking company. It generates diagnostic data and metrics from each truck, and introduces environmental factors such as out-of-order data and batch ingestion (for trucks that are offline for a period of time). It also tracks truck metadata and uses it to tie metrics and diagnostics together as part of the query set.

The queries generated for this use case cover both real-time truck status and analytics over the time series data aimed at predicting truck behavior. The scale factor for this use case is the number of trucks tracked.


Not all databases implement all use cases. The table below shows which use cases are implemented for each database:

Database         Dev ops   IoT
Akumuli          X¹
Cassandra        X
ClickHouse       X
CrateDB          X
InfluxDB         X         X
MongoDB          X
QuestDB          X         X
SiriDB           X
TimescaleDB      X         X
Timestream       X
VictoriaMetrics  X²

¹ Does not support the groupby-orderby-limit query.
² Does not support the groupby-orderby-limit, lastpoint, high-cpu-1, or high-cpu-all queries.

What the TSBS tests

TSBS is used to benchmark bulk load performance and query execution performance. (It currently does not measure concurrent insert and query performance, which is a future priority.) To accomplish this in a fair way, the data to be inserted and the queries to run are pre-generated, and native Go clients are used wherever possible to connect to each database (e.g., mgo for MongoDB, the AWS SDK for Timestream).

Although the data is randomly generated, TSBS data and queries are entirely deterministic. By supplying the same PRNG (pseudo-random number generator) seed to the generation programs, each database is loaded with identical data and queried using identical queries.
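One quick way to convince yourself of this determinism (a sketch; any two identical invocations will do) is to hash the output of two generation runs with the same flags and seed:

# Two runs with identical flags and seed should produce identical checksums
$ tsbs_generate_data --use-case="cpu-only" --seed=123 --scale=10 \
    --timestamp-start="2016-01-01T00:00:00Z" \
    --timestamp-end="2016-01-01T01:00:00Z" \
    --log-interval="10s" --format="influx" | sha256sum
$ tsbs_generate_data --use-case="cpu-only" --seed=123 --scale=10 \
    --timestamp-start="2016-01-01T00:00:00Z" \
    --timestamp-end="2016-01-01T01:00:00Z" \
    --log-interval="10s" --format="influx" | sha256sum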

Installation

TSBS is a collection of Go programs (with some auxiliary bash and Python scripts). The easiest way to get and install the Go programs is to use go get to fetch the source and then make to build and install all binaries:

# Fetch TSBS and its dependencies
$ go get github.com/timescale/tsbs
$ cd $GOPATH/src/github.com/timescale/tsbs
$ make
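Assuming the build installs the binaries into $GOPATH/bin (the default layout used above), you can sanity-check the result:

# Confirm the tsbs binaries were built and are on your PATH
$ ls $GOPATH/bin | grep tsbs
$ tsbs_generate_data --help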

How to use TSBS

Using TSBS for benchmarking involves 3 phases: data and query generation, data loading/insertion, and query execution.

Data and query generation

So that benchmarking results are not affected by generating data or queries on the fly, you first use TSBS to generate the data and queries you want to benchmark, and then (re-)use them as input to the loading and query-execution phases.

Data generation

Variables needed:

  1. a use case. E.g., iot (choose from cpu-only, devops, or iot)
  2. a PRNG seed for deterministic generation. E.g., 123
  3. the number of devices / trucks to generate for. E.g., 4000
  4. a start time for the data's timestamps. E.g., 2016-01-01T00:00:00Z
  5. an end time. E.g., 2016-01-04T00:00:00Z
  6. the interval of time between each reading per device. E.g., 10s
  7. and which database(s) you want to generate for. E.g., timescaledb (choose from cassandra, clickhouse, cratedb, influx, mongo, questdb, siridb, timescaledb or victoriametrics)

With the above variables chosen, you can now use the tsbs_generate_data tool to generate a dataset (or multiple datasets, if you chose to generate for multiple databases) that can be used to benchmark data loading of the chosen database(s):

$ tsbs_generate_data --use-case="iot" --seed=123 --scale=4000 \
    --timestamp-start="2016-01-01T00:00:00Z" \
    --timestamp-end="2016-01-04T00:00:00Z" \
    --log-interval="10s" --format="timescaledb" \
    | gzip > /tmp/timescaledb-data.gz

# Each additional database would be a separate call.

Note: We pipe the output to gzip to reduce on-disk space. This also requires you to pipe through gunzip when you run your tests.

The example above will generate a pseudo-CSV file that can be used to bulk load data into TimescaleDB. Each database has its own file format that makes it easiest for its corresponding loader to write data. The above configuration will generate just over 100M rows (1B metrics), which is usually a good starting point. Increasing the time period by a day will add an additional ~33M rows, so that, e.g., 30 days would yield a billion rows (10B metrics).
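Since the output is just gzipped text, you can peek at the first few lines to see the (database-specific) layout without decompressing the whole file:

# Inspect the head of the generated file; the exact layout varies per database
$ gunzip < /tmp/timescaledb-data.gz | head -n 5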

IoT use case

The main difference between the iot use case and the others is that it generates data that can contain out-of-order, missing, or empty entries, to better represent the real-life scenarios associated with the use case. Using a specified seed means this can be done deterministically and reproducibly across multiple runs of data generation.

Query generation

Variables needed:

  1. the same use case, seed, # of devices, and start time as used in data generation
  2. an end time that is one second after the end time from data generation. E.g., for 2016-01-04T00:00:00Z use 2016-01-04T00:00:01Z
  3. the number of queries to generate. E.g., 1000
  4. and the type of query you'd like to generate. E.g., single-groupby-1-1-1 or last-loc

For the last step there are numerous queries to choose from; they are listed in Appendix I. Additionally, the file scripts/generate_queries.sh contains the full list as the default value of the environment variable QUERY_TYPES. If you are generating more than one type of query, we recommend using the helper script.

For generating just one set of queries for a given type:

$ tsbs_generate_queries --use-case="iot" --seed=123 --scale=4000 \
    --timestamp-start="2016-01-01T00:00:00Z" \
    --timestamp-end="2016-01-04T00:00:01Z" \
    --queries=1000 --query-type="breakdown-frequency" --format="timescaledb" \
    | gzip > /tmp/timescaledb-queries-breakdown-frequency.gz

Note: We pipe the output to gzip to reduce on-disk space. This also requires you to pipe through gunzip when you run your tests.

For generating sets of queries for multiple types:

$ FORMATS="timescaledb" SCALE=4000 SEED=123 \
    TS_START="2016-01-01T00:00:00Z" \
    TS_END="2016-01-04T00:00:01Z" \
    QUERIES=1000 QUERY_TYPES="last-loc low-fuel avg-load" \
    BULK_DATA_DIR="/tmp/bulk_queries" scripts/generate_queries.sh

A full list of query types can be found in Appendix I at the end of this README.

Benchmarking insert/write performance

TSBS has two ways to benchmark insert/write performance:

  • On-the-fly simulation and load with tsbs_load
  • Pre-generating data to a file and loading it with either tsbs_load or the database-specific tsbs_load_* executables

Using the unified tsbs_load executable

The tsbs_load executable can load data in any of the supported databases. It can use a pregenerated data file as input, or simulate the data on the fly.

You start by generating a YAML config file populated with the default values for each property:

$ tsbs_load config --target=<db-name> --data-source=[FILE|SIMULATOR]

For example, to generate a config for TimescaleDB that loads the data from a file:

$ tsbs_load config --target=timescaledb --data-source=FILE
Wrote example config to: ./config.yaml

You can then run tsbs_load with the generated config file with:

$ tsbs_load load timescaledb --config=./config.yaml

For more details on how to use tsbs_load, check out the supplemental docs.
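Putting the steps together, a typical file-based run looks like this (the edit step is yours to fill in; the config keys come from the generated file, so edit that file rather than hand-writing one):

# Generate a default config, adjust it, then run the load
$ tsbs_load config --target=timescaledb --data-source=FILE
$ $EDITOR config.yaml   # e.g., point the file data source at /tmp/timescaledb-data.gz
$ tsbs_load load timescaledb --config=./config.yaml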

Using the database specific tsbs_load_* executables

TSBS measures insert/write performance by taking the data generated in the previous step and using it as input to a database-specific command line program. To the extent that insert programs can be shared, we have made an effort to do that (e.g., the TimescaleDB loader can be used with a regular PostgreSQL database if desired). Each loader does share some common flags -- e.g., batch size (number of readings inserted together), workers (number of concurrently inserting clients), connection details (host & ports), etc -- but they also have database-specific tuning flags. To find the flags for a particular database, use the -help flag (e.g., tsbs_load_timescaledb -help).

Here's an example of loading data to a remote TimescaleDB instance with SSL required, using a gzipped dataset created per the instructions above:

cat /tmp/timescaledb-data.gz | gunzip | tsbs_load_timescaledb \
--postgres="sslmode=require" --host="my.tsdb.host" --port=5432 --pass="password" \
--user="benchmarkuser" --admin-db-name=defaultdb --workers=8  \
--in-table-partition-tag=true --chunk-time=8h --write-profile= \
--field-index-count=1 --do-create-db=true --force-text-format=false \
--do-abort-on-exist=false

For simpler testing, especially locally, we also supply scripts/load/load_<database>.sh for some of the databases, with many of the flags set to reasonable defaults. So for loading into TimescaleDB, ensure that TimescaleDB is running and then use:

# Will insert using 2 clients, batch sizes of 10k, from a file
# named `timescaledb-data.gz` in directory `/tmp`
$ NUM_WORKERS=2 BATCH_SIZE=10000 BULK_DATA_DIR=/tmp \
    scripts/load/load_timescaledb.sh

This will create a new database called benchmark where the data is stored. It will overwrite the database if it exists; if you don't want that to happen, supply a different DATABASE_NAME to the above command.
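For example, to keep an existing benchmark database intact and load into a differently named one (a sketch reusing the flags above):

# DATABASE_NAME overrides the default target database `benchmark`
$ DATABASE_NAME=benchmark2 NUM_WORKERS=2 BATCH_SIZE=10000 \
    BULK_DATA_DIR=/tmp scripts/load/load_timescaledb.sh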

Example for writing to remote host using load_timescaledb.sh:

# Will insert using 2 clients, batch sizes of 10k, from a file
# named `timescaledb-data.gz` in directory `/tmp`
$ NUM_WORKERS=2 BATCH_SIZE=10000 BULK_DATA_DIR=/tmp \
    DATABASE_HOST=remotehostname DATABASE_USER=user \
    scripts/load/load_timescaledb.sh

By default, statistics about the load performance are printed every 10s, and when the full dataset is loaded the output looks like this:

time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
# ...
1518741528,914996.143291,9.652000E+08,1096817.886674,91499.614329,9.652000E+07,109681.788667
1518741548,1345006.018902,9.921000E+08,1102333.152918,134500.601890,9.921000E+07,110233.315292
1518741568,1149999.844750,1.015100E+09,1103369.385320,114999.984475,1.015100E+08,110336.938532

Summary:
loaded 1036800000 metrics in 936.525765sec with 8 workers (mean rate 1107070.449780/sec)
loaded 103680000 rows in 936.525765sec with 8 workers (mean rate 110707.044978/sec)

All but the last two lines contain the data in CSV format, with column names in the header. Those column names correspond to:

  • timestamp,
  • metrics per second in the period,
  • total metrics inserted,
  • overall metrics per second,
  • rows per second in the period,
  • total number of rows,
  • overall rows per second.

For databases like Cassandra that do not use rows when inserting, the last three values are always empty (indicated with a -).

The last two lines are a summary of how many metrics (and rows where applicable) were inserted, the wall time it took, and the average rate of insertion.
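Because the periodic statistics are plain CSV, they are easy to post-process. As a sketch, assuming you captured the loader's output to a hypothetical file load.log:

# Print timestamp and per-period rows/s (columns 1 and 5) from the CSV lines
$ grep -E '^[0-9]+,' load.log | awk -F, '{print $1, $5}'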

Benchmarking query execution performance

To measure query execution performance in TSBS, you first need to load the data as described in the previous section and generate the queries as described earlier. Once the data is loaded and the queries are generated, just use the corresponding tsbs_run_queries_* binary for the database being tested:

$ cat /tmp/queries/timescaledb-cpu-max-all-eight-hosts-queries.gz | \
    gunzip | tsbs_run_queries_timescaledb --workers=8 \
        --postgres="host=localhost user=postgres sslmode=disable"

You can change the value of the --workers flag to control the level of parallel queries run at the same time. The resulting output will look similar to this:

run complete after 1000 queries with 8 workers:
TimescaleDB max cpu all fields, rand    8 hosts, rand 12hr by 1h:
min:    51.97ms, med:   757.55ms, mean:  2527.98ms, max: 28188.20ms, stddev:  2843.35ms, sum: 5056.0sec, count: 2000
all queries                                                     :
min:    51.97ms, med:   757.55ms, mean:  2527.98ms, max: 28188.20ms, stddev:  2843.35ms, sum: 5056.0sec, count: 2000
wall clock time: 633.936415sec

The output gives you the description of the query and multiple groupings of measurements (which may vary depending on the database).


For easier testing of multiple queries, we provide scripts/generate_run_script.py, which creates a bash script with commands to run multiple query types in a row. The query types to run should be listed in a file, one per line, with the file path given to the script. For example, if you had a file named queries.txt that looked like this:

last-loc
avg-load
high-load
long-driving-sessions

You could generate a run script named query_test.sh:

# Generate run script for TimescaleDB, using queries in `queries.txt`
# with the generated query files in /tmp/queries for 8 workers
$ python generate_run_script.py -d timescaledb -o /tmp/queries \
    -w 8 -f queries.txt > query_test.sh

And the resulting script file would look like:

#!/bin/bash
# Queries
cat /tmp/queries/timescaledb-last-loc-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --hosts="localhost" --postgres="user=postgres sslmode=disable"  | tee query_timescaledb_timescaledb-last-loc-queries.out

cat /tmp/queries/timescaledb-avg-load-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --hosts="localhost" --postgres="user=postgres sslmode=disable"  | tee query_timescaledb_timescaledb-avg-load-queries.out

cat /tmp/queries/timescaledb-high-load-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --hosts="localhost" --postgres="user=postgres sslmode=disable"  | tee query_timescaledb_timescaledb-high-load-queries.out

cat /tmp/queries/timescaledb-long-driving-sessions-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --hosts="localhost" --postgres="user=postgres sslmode=disable"  | tee query_timescaledb_timescaledb-long-driving-sessions-queries.out

Query validation (optional)

Additionally, each tsbs_run_queries_* binary allows you to print the actual query results so that you can verify that the results are the same across databases. Passing the flag --print-responses will print the results.
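For example, to capture a small sample of responses from two databases for manual comparison (a sketch; response formats differ per database, so compare by eye rather than diffing byte-for-byte):

# Capture 10 responses from each system for side-by-side inspection
$ cat /tmp/queries/timescaledb-last-loc-queries.gz | gunzip | \
    tsbs_run_queries_timescaledb --workers=1 --max-queries=10 \
    --print-responses > timescaledb-last-loc.out
$ cat /tmp/queries/influx-last-loc-queries.gz | gunzip | \
    tsbs_run_queries_influx --workers=1 --max-queries=10 \
    --print-responses > influx-last-loc.out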

Appendix I: Query types

Devops / cpu-only

Query type Description
single-groupby-1-1-1 Simple aggregate (MAX) on one metric for 1 host, every 5 mins for 1 hour
single-groupby-1-1-12 Simple aggregate (MAX) on one metric for 1 host, every 5 mins for 12 hours
single-groupby-1-8-1 Simple aggregate (MAX) on one metric for 8 hosts, every 5 mins for 1 hour
single-groupby-5-1-1 Simple aggregate (MAX) on 5 metrics for 1 host, every 5 mins for 1 hour
single-groupby-5-1-12 Simple aggregate (MAX) on 5 metrics for 1 host, every 5 mins for 12 hours
single-groupby-5-8-1 Simple aggregate (MAX) on 5 metrics for 8 hosts, every 5 mins for 1 hour
cpu-max-all-1 Aggregate across all CPU metrics per hour over 1 hour for a single host
cpu-max-all-8 Aggregate across all CPU metrics per hour over 1 hour for eight hosts
double-groupby-1 Aggregate across both time and host, giving the average of 1 CPU metric per host per hour for 24 hours
double-groupby-5 Aggregate across both time and host, giving the average of 5 CPU metrics per host per hour for 24 hours
double-groupby-all Aggregate across both time and host, giving the average of all (10) CPU metrics per host per hour for 24 hours
high-cpu-all All the readings where one metric is above a threshold across all hosts
high-cpu-1 All the readings where one metric is above a threshold for a particular host
lastpoint The last reading for each host
groupby-orderby-limit The last 5 aggregate readings (across time) before a randomly chosen endpoint
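For example, a sketch of generating one of these devops query types, using the flags described earlier:

# Sketch: generate 1000 double-groupby-1 queries for the devops use case
$ tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 \
    --timestamp-start="2016-01-01T00:00:00Z" \
    --timestamp-end="2016-01-04T00:00:01Z" \
    --queries=1000 --query-type="double-groupby-1" --format="timescaledb" \
    | gzip > /tmp/timescaledb-queries-double-groupby-1.gz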

IoT

Query type Description
last-loc Fetch real-time (i.e. last) location of each truck
low-fuel Fetch all trucks with low fuel (less than 10%)
high-load Fetch trucks with high current load (over 90% load capacity)
stationary-trucks Fetch all trucks that are stationary (low avg velocity in last 10 mins)
long-driving-sessions Get trucks which haven't rested for at least 20 mins in the last 4 hours
long-daily-sessions Get trucks which drove more than 10 hours in the last 24 hours
avg-vs-projected-fuel-consumption Calculate average vs. projected fuel consumption per fleet
avg-daily-driving-duration Calculate average daily driving duration per driver
avg-daily-driving-session Calculate average daily driving session per driver
avg-load Calculate average load per truck model per fleet
daily-activity Get the number of hours a truck has been active (vs. out of commission) per day per fleet
breakdown-frequency Calculate breakdown frequency by truck model

Contributing

We welcome contributions from the community to make TSBS better!

You can help either by opening an issue with any suggestions or bug reports, or by forking this repository, making your own contribution, and submitting a pull request.

Before we accept any contributions, contributors need to sign Timescale's Contributor License Agreement (CLA). By having contributors sign the CLA, we can ensure that the community is free and confident in its ability to use your contributions.


tsbs's Issues

load to timescale fails with pq: invalid input syntax for type json

I have an AWS instance with Ubuntu 16.04.5 LTS where I've installed go 1.9 and postgres 9.6.10 (from http://apt.postgresql.org/pub/repos/apt/ ) and the current timescale (using https://blog.timescale.com/tutorial-installing-timescaledb-on-aws-c8602b767a98). I then installed tsbs and generated a timescale database with
bin/tsbs_generate_data -use-case="devops" -seed=123 -scale-var=8000 -timestamp-start="2018-08-08T00:00:00Z" -timestamp-end="2018-08-09T00:00:00Z" -log-interval="5s" -format="timescaledb" | gzip > tsbs-devops-8000-1day-5s-timescaledb-data.gz
Attempting to load, I used load_timescaledb.sh with these environment variables set:
postgres@ip-172-31-18-107:~$ export PATH="/home/ubuntu/gosrc/bin:$PATH"
postgres@ip-172-31-18-107:~$ export GOPATH=/home/ubuntu/gosrc
postgres@ip-172-31-18-107:/data$ export DATA_FILE=/data/tsbs-devops-8000-1day-5s-timescaledb-data.gz
postgres@ip-172-31-18-107:/data$ /home/ubuntu/gosrc/src/github.com/timescale/tsbs/scripts/load_timescaledb.sh
Bulk loading file /data/tsbs-devops-8000-1day-5s-timescaledb-data.gz

+ source /home/ubuntu/gosrc/src/github.com/timescale/tsbs/scripts/timescaledb.conf
++ JSON_TAGS=false
++ DATABASE_NAME=benchmark
++ IN_TABLE_PARTITION_TAG=true
++ USE_HYPERTABLE=true
+ pg_isready -h localhost
localhost:5432 - accepting connections
+ cat /data/tsbs-devops-8000-1day-5s-timescaledb-data.gz
+ gunzip
+ tsbs_load_timescaledb --postgres=sslmode=disable --db-name=benchmark --host=localhost --user=postgres --workers=8 --batch-size=10000 --reporting-period=10s --use-hypertable=true --use-jsonb-tags=false --in-table-partition-tag=true --hash-workers=false --time-partition-index=false --partitions=1 --chunk-time=8h --write-profile= --field-index-count=1
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
panic: pq: invalid input syntax for type json

goroutine 51 [running]:
main.(*processor).processCSI(0xc420622000, 0xc42348fe00, 0x5, 0xc423bed000, 0x7d0, 0x900, 0xc4204e3400)
/home/ubuntu/gosrc/src/github.com/timescale/tsbs/cmd/tsbs_load_timescaledb/process.go:186 +0x1888
main.(*processor).ProcessBatch(0xc420622000, 0x7df9c0, 0xc423a1df50, 0x1, 0x0, 0x0)
/home/ubuntu/gosrc/src/github.com/timescale/tsbs/cmd/tsbs_load_timescaledb/process.go:232 +0x192
github.com/timescale/tsbs/load.(*BenchmarkRunner).work(0x7fc000, 0x7e2020, 0x81afc0, 0xc420528a10, 0xc42052d780, 0x7)
/home/ubuntu/gosrc/src/github.com/timescale/tsbs/load/loader.go:202 +0xdd
created by github.com/timescale/tsbs/load.(*BenchmarkRunner).RunBenchmark
/home/ubuntu/gosrc/src/github.com/timescale/tsbs/load/loader.go:105 +0x156

I am able to load to influx. What versions should I be using?

bigint in cassandra for timestamp

Hi,

as already stated in a comment to a TS to C* comparison article,

you really don't need timestamp in C* row to be bigint

timestamp_ns bigint,

Each cassandra row will only contain one day of values, so it's 24*60*60/10 = 8640 entries.

I suggest using a 2-byte smallint, which is 4 times smaller than an 8-byte bigint.

Add support for MonetDB

MonetDB is an analytic column store: FOSS, well known in the research community, with some use in industry. It is a general-purpose DBMS, but being analytics-focused it should do pretty well on time series (perhaps not on insertion), and regardless it is useful to have as a reference for comparison.

Would you consider adding MonetDB support to TSBS?

out of memory allocating heap arena metadata

  • OS: windows 10 64bit
  • Memory: 16GB
  • the process uses 9GB

memory usage is unfriendly

--scale=10000000 --timestamp-start="2019-12-24T00:00:00Z" --timestamp-end="2019-12-31T00:00:00Z"

 tsbs_generate_data.exe --use-case="iot" --seed=123 --scale=10000000 --timestamp-start="2019-12-24T00:00:00Z" --timestamp-end="2019-12-31T00:00:00Z" --log-interval="24h" --format="timescaledb" | gzip > testdata1.gz

fatal error: out of memory allocating heap arena metadata

runtime stack:
runtime.throw(0x971029, 0x2c)
        D:/go/1.13/src/runtime/panic.go:774 +0x79
runtime.(*mheap).sysAlloc(0xd3c800, 0x2000, 0x1000, 0x4)
        D:/go/1.13/src/runtime/malloc.go:724 +0x6d8
runtime.(*mheap).grow(0xd3c800, 0x1, 0xffffffff)
        D:/go/1.13/src/runtime/mheap.go:1252 +0x49
runtime.(*mheap).allocSpanLocked(0xd3c800, 0x1, 0xd55428, 0x75b7a150)
        D:/go/1.13/src/runtime/mheap.go:1163 +0x298
runtime.(*mheap).alloc_m(0xd3c800, 0x1, 0x450008, 0x75b7a150)
        D:/go/1.13/src/runtime/mheap.go:1015 +0xd0
runtime.(*mheap).alloc.func1()
        D:/go/1.13/src/runtime/mheap.go:1086 +0x53
runtime.systemstack(0x0)
        D:/go/1.13/src/runtime/asm_amd64.s:370 +0x6b
runtime.mstart()
        D:/go/1.13/src/runtime/proc.go:1146

goroutine 1 [running]:
runtime.systemstack_switch()
        D:/go/1.13/src/runtime/asm_amd64.s:330 fp=0xc461235820 sp=0xc461235818 pc=0x458e20
runtime.(*mheap).alloc(0xd3c800, 0x1, 0x10008, 0x203180f)
        D:/go/1.13/src/runtime/mheap.go:1085 +0x91 fp=0xc461235870 sp=0xc461235820 pc=0x426341
runtime.(*mcentral).grow(0xd3cf98, 0x0)
        D:/go/1.13/src/runtime/mcentral.go:255 +0x82 fp=0xc4612358b0 sp=0xc461235870 pc=0x417dd2
runtime.(*mcentral).cacheSpan(0xd3cf98, 0x203180f)
        D:/go/1.13/src/runtime/mcentral.go:106 +0x305 fp=0xc461235910 sp=0xc4612358b0 pc=0x4178f5
runtime.(*mcache).refill(0x180008, 0x8)
        D:/go/1.13/src/runtime/mcache.go:138 +0x8c fp=0xc461235930 sp=0xc461235910 pc=0x41737c
runtime.(*mcache).nextFree(0x180008, 0x8, 0x10, 0x8c9cc0, 0xc603ff6940)
        D:/go/1.13/src/runtime/malloc.go:854 +0x8e fp=0xc461235968 sp=0xc461235930 pc=0x40ba2e
runtime.mallocgc(0x30, 0x8a6a00, 0x1, 0xc603fd0cd0)
        D:/go/1.13/src/runtime/malloc.go:1022 +0x7d2 fp=0xc461235a08 sp=0xc461235968 pc=0x40c3b2
runtime.growslice(0x8a6a00, 0xc603ff51c0, 0x1, 0x1, 0x2, 0xc603fd0cd0, 0x0, 0x1)
        D:/go/1.13/src/runtime/slice.go:181 +0x1e9 fp=0xc461235a70 sp=0xc461235a08 pc=0x443639
github.com/timescale/tsbs/cmd/tsbs_generate_data/serialize.(*Point).AppendTag(...)
        D:/goproject/src/github.com/timescale/tsbs/cmd/tsbs_generate_data/serialize/point.go:118
github.com/timescale/tsbs/cmd/tsbs_generate_data/common.(*BaseSimulator).Next(0xc407cf6000, 0xc603ff9580, 0x0)
        D:/goproject/src/github.com/timescale/tsbs/cmd/tsbs_generate_data/common/simulator.go:122 +0x452 fp=0xc461235b68 sp=0xc461235a70 pc=0x7f8bc2
github.com/timescale/tsbs/cmd/tsbs_generate_data/iot.(*Simulator).getNextEntry(0xc2f2b34200, 0x5, 0xc603fe5e30, 0xd54000, 0xc603fe5d00)
        D:/goproject/src/github.com/timescale/tsbs/cmd/tsbs_generate_data/iot/simulator.go:235 +0x3a9 fp=0xc461235bf8 sp=0xc461235b68 pc=0x80a579
github.com/timescale/tsbs/cmd/tsbs_generate_data/iot.(*Simulator).generateBatch(0xc2f2b34200, 0xc603fe5e30, 0x7, 0x8, 0xc603fe5e30)
        D:/goproject/src/github.com/timescale/tsbs/cmd/tsbs_generate_data/iot/simulator.go:187 +0xcc fp=0xc461235c50 sp=0xc461235bf8 pc=0x809ffc
github.com/timescale/tsbs/cmd/tsbs_generate_data/iot.(*Simulator).simulateNextBatch(0xc2f2b34200, 0x1)
        D:/goproject/src/github.com/timescale/tsbs/cmd/tsbs_generate_data/iot/simulator.go:164 +0x44e fp=0xc461235d48 sp=0xc461235c50 pc=0x809e3e
github.com/timescale/tsbs/cmd/tsbs_generate_data/iot.(*Simulator).Next(0xc2f2b34200, 0xc2f2b34480, 0xa09020)
        D:/goproject/src/github.com/timescale/tsbs/cmd/tsbs_generate_data/iot/simulator.go:89 +0x262 fp=0xc461235d78 sp=0xc461235d48 pc=0x809992
github.com/timescale/tsbs/internal/inputs.(*DataGenerator).runSimulator(0xd381e0, 0xa148c0, 0xc2f2b34200, 0xa095a0, 0xd532b8, 0xd38420, 0x0, 0x0)
        D:/goproject/src/github.com/timescale/tsbs/internal/inputs/generator_data.go:159 +0x208 fp=0xc461235e50 sp=0xc461235d78 pc=0x85edd8
github.com/timescale/tsbs/internal/inputs.(*DataGenerator).Generate(0xd381e0, 0xa0dfc0, 0xd38420, 0xc0000ac300, 0x95fa63)
        D:/goproject/src/github.com/timescale/tsbs/internal/inputs/generator_data.go:150 +0x181 fp=0xc461235eb8 sp=0xc461235e50 pc=0x85eb61
main.main()
        D:/goproject/src/github.com/timescale/tsbs/cmd/tsbs_generate_data/main.go:65 +0x75 fp=0xc461235f60 sp=0xc461235eb8 pc=0x861015
runtime.main()
        D:/go/1.13/src/runtime/proc.go:203 +0x21e fp=0xc461235fe0 sp=0xc461235f60 pc=0x43255e
runtime.goexit()
        D:/go/1.13/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc461235fe8 sp=0xc461235fe0 pc=0x45ae01

goroutine 6 [syscall, 5 minutes]:
os/signal.signal_recv(0x0)
        D:/go/1.13/src/runtime/sigqueue.go:147 +0xa3
os/signal.loop()
        D:/go/1.13/src/os/signal/signal_unix.go:23 +0x29
created by os/signal.init.0
        D:/go/1.13/src/os/signal/signal_unix.go:29 +0x48

Timescale load_timescaledb.sh problem

I get an "input has wrong header format" error while running load_timescaledb.sh, as follows:

osboxes@osboxes:~/go/src/github.com/timescale/tsbs/scripts$ sh load_timescaledb.sh
load_timescaledb.sh: 5: load_timescaledb.sh: [[: not found
Bulk loading file /tmp/timescaledb-data.gz

+ pg_isready -h localhost
localhost:5432 - accepting connections
+ cat /tmp/timescaledb-data.gz
+ gunzip
+ /home/osboxes/go/bin/tsbs_load_timescaledb --db-name=benchmark --host=localhost --user=postgres --workers=6 --batch-size=10000 --reporting-period=10s --use-hypertable=true --use-jsonb-tags=false --in-table-partition-tag=true --hash-workers=false --time-partition-index=false --partitions=1 --chunk-time=8h --write-profile= --field-index-count=1 --do-create-db=true --force-text-format=false
2019/05/24 22:08:24 input has wrong header format: EOF

ERROR: unsupported jsonb version number 123

When executing tsbs_load_timescaledb with a data file generated for the devops use case, it fails with this error:

panic: ERROR: unsupported jsonb version number 123 (SQLSTATE XX000)

goroutine 27 [running]:
main.(*processor).processCSI(0xc42068ede0, 0xc4200183c0, 0x6, 0xc420adc800, 0x457, 0x500, 0x7757a0)
        /home/semen/work/src/github.com/timescale/tsbs/cmd/tsbs_load_timescaledb/process.go:237 +0xe0c
main.(*processor).ProcessBatch(0xc42068ede0, 0x7f2ec0, 0xc42009ca20, 0x4d2901, 0x1, 0x0)
        /home/semen/work/src/github.com/timescale/tsbs/cmd/tsbs_load_timescaledb/process.go:292 +0x192
github.com/timescale/tsbs/load.(*BenchmarkRunner).work(0x9956e0, 0x7f7fe0, 0x9b26d0, 0xc4200b10b0, 0xc42009c870, 0x0)
        /home/semen/work/src/github.com/timescale/tsbs/load/loader.go:253 +0xc6
created by github.com/timescale/tsbs/load.(*BenchmarkRunner).RunBenchmark
        /home/semen/work/src/github.com/timescale/tsbs/load/loader.go:118 +0x16a

The issue reproduces on Windows and Ubuntu, with PostgreSQL 10 and 11.
Attachment: datadev.txt

Add ClickHouse to benchmark

ClickHouse is another time-series database used for metrics storage and analysis on a large scale. For example, CloudFlare is storing ~6PiB of analytics data in a ClickHouse cluster:
https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/

ClickHouse also has a benchmarks/comparisons page showing performance with a number of different queries on a test dataset -- perhaps those can be included in your benchmark suite as well:
https://clickhouse.yandex/benchmark.html

Instructions for dumping queries

Looking at the data files generated from the command tsbs_generate_queries such as:

tsbs_generate_queries -use-case="cpu-only" -seed=123 -scale=4000 -timestamp-start="2019-01-01T00:00:00Z" -timestamp-end="2019-01-04T00:00:00Z" -queries=10 -query-type="double-groupby-1" -format="influx"

the output seems to be a mixture of binary and URL-encoded values. Is there a command to dump out just the individual query lines that will be submitted?
After some manual copying and pasting and using an online URL decoder, I can work backwards to this:

SELECT mean(usage_user) from cpu where time >= '2019-01-02T23:54:10Z' and time < '2019-01-03T11:54:10Z' group by time(1h),hostname�B��9Influx mean of 1 metrics, all hosts, random 12h0m0s by 1hOInflux mean of 1 metrics, all hosts, random 12h0m0s by 1h: 2019-01-03T05:47:30Z

Just wondering if there is a way to make the tool do it.

Thanks,

tsbs_load_timescaledb not working

I tried tsbs for benchmarking but it's not working.
The script scripts/load_timescaledb.sh is missing parameters.
I got around that, but the tsbs_load_timescaledb binary just gets stuck and doesn't report anything. My setup is running on minikube and I'm able to connect to it via the psql utility.

psql connection working

➜  timescaleDB psql -h 192.168.64.3 -p 30823 -U iot_asset_monitor -d astimescaledb
Password for user iot_asset_monitor:
psql (12.4, server 11.8)
Type "help" for help.

astimescaledb=# \q

tsbs_load_timescaledb gets stuck with no reporting

➜  tsbs git:(master) ✗ tsbs_load_timescaledb --postgres="sslmode=disable" --db-name=astimescaledb --host=192.168.64.3 --port=30823 --pass=a93304eb94caef31a17aae11d04181d5d009 --user=iot_asset_monitor --workers=2 --batch-size=10000 --reporting-period=10s --use-hypertable=true --use-jsonb-tags=false --in-table-partition-tag=true --hash-workers=false --time-partition-index=false --partitions=1 --chunk-time=8h --write-profile= --field-index-count=1 --do-create-db=true --force-text-format=false
^C

generate_queries.sh fails on a query type

generate_queries.sh fails when building queries for:

format: crate, query that fails building: high-cpu-all
format: mongo, query that fails building: high-cpu-all

I get the following error message:
panic: number of hosts cannot be < 1; got 0

Steps to reproduce :

Run (you must export $GOPATH and replace BULK_DATA_DIR with a folder on your system)

crate bug :

FORMATS='crate' SCALE=4000 SEED=123 TS_START='2021-02-28T00:00:00Z' TS_END='2021-03-01T00:00:01Z' QUERIES=1000 QUERY_TYPES='high-cpu-all' BULK_DATA_DIR='/data/research/benchmark-ts-databases/bulk_queries/small_data_queries/crate' $GOPATH/src/github.com/timescale/tsbs/scripts/generate_queries.sh

mongo bug :

FORMATS='mongo' SCALE=4000 SEED=123 TS_START='2021-02-28T00:00:00Z' TS_END='2021-03-01T00:00:01Z' QUERIES=1000 QUERY_TYPES='high-cpu-all' BULK_DATA_DIR='/data/research/benchmark-ts-databases/bulk_queries/small_data_queries/mongo' $GOPATH/src/github.com/timescale/tsbs/scripts/generate_queries.sh

Detailed error message :

panic: number of hosts cannot be < 1; got 0

goroutine 1 [running]:
github.com/timescale/tsbs/cmd/tsbs_generate_queries/databases/mongo.panicIfErr(...)
        /localhome/home/****/go_projects/src/github.com/timescale/tsbs/cmd/tsbs_generate_queries/databases/mongo/devops.go:17
github.com/timescale/tsbs/cmd/tsbs_generate_queries/databases/mongo.(*Devops).HighCPUForHosts(0xc00079a010, 0x791b00, 0xc0007a6000, 0x0)
        /localhome/home/****/go_projects/src/github.com/timescale/tsbs/cmd/tsbs_generate_queries/databases/mongo/devops.go:324 +0xee5
github.com/timescale/tsbs/cmd/tsbs_generate_queries/uses/devops.(*HighCPU).Fill(0xc00079e000, 0x791b00, 0xc0007a6000, 0x416c6f, 0x7f1977530f00)
        /localhome/home/****/go_projects/src/github.com/timescale/tsbs/cmd/tsbs_generate_queries/uses/devops/high_cpu.go:31 +0x84
github.com/timescale/tsbs/internal/inputs.(*QueryGenerator).runQueryGeneration(0xc000161f08, 0x78aa20, 0xc00079a010, 0x78ab40, 0xc00079e000, 0x92ae40, 0x0, 0x0)
        /localhome/home/****/go_projects/src/github.com/timescale/tsbs/internal/inputs/generator_queries.go:221 +0x1dc
github.com/timescale/tsbs/internal/inputs.(*QueryGenerator).Generate(0xc000161f08, 0x78d720, 0x92ae40, 0x0, 0x0)
        /localhome/home/****/go_projects/src/github.com/timescale/tsbs/internal/inputs/generator_queries.go:92 +0x174
main.main()
        /localhome/home/****/go_projects/src/github.com/timescale/tsbs/cmd/tsbs_generate_queries/main.go:94 +0xae

Cassandra tsbs_run_queries_cassandra error: can not unmarshal bigint into *float64

Hi guys,

after fixing #124 I run into another issue with executing TSBS against a Cassandra instance.

As outlined in #124, the data is loaded in Cassandra and the following dummy queries are generated:

tsbs_generate_queries --use-case "cpu-only" --seed "123" --scale "1000" --timestamp-start "2016-01-01T00:00:00Z" --timestamp-end "2016-01-01T01:00:01Z" --queries "10" --query-type "single-groupby-1-1-1" --format "cassandra" --file "/tmp/cassandra-queries"

These queries are executed with the following command:
tsbs_run_queries_cassandra --aggregation-plan "server" --db-name "benchmark" --workers "1" --host "1.2.3.4" --file "/tmp/cassandra-queries" --debug "1" --print-responses

This ends in the following error message:

[hlqe] Do: HumanLabel: Cassandra 1 cpu metric(s), random    1 hosts, random 1h0m0s by 1m, HumanDescription: Cassandra 1 cpu metric(s), random    1 hosts, random 1h0m0s by 1m: 2016-01-01T00:00:00Z, MeasurementName: cpu, AggregationType: max, TimeStart: 2016-01-01 00:00:00.646325489 +0000 UTC, TimeEnd: 2016-01-01 01:00:00.646325489 +0000 UTC, GroupByDuration: 1m0s, TagSets: [[hostname=host_249]]
[hlqe] query planning took 6.979200ms
[qpsa] query with server aggregation plan has 61 CQLQuery objects
panic: can not unmarshal bigint into *float64

goroutine 2603 [running]:
github.com/timescale/tsbs/query.(*BenchmarkRunner).processorHandler(0xc0001044d0, 0xc00019e450, 0xc00061c2d0, 0xf9fba0, 0xc33d00, 0xc00004c3e0, 0x0)
	tsbs/query/benchmarker.go:196 +0x530
created by github.com/timescale/tsbs/query.(*BenchmarkRunner).Run
	tsbs/query/benchmarker.go:156 +0x2a2

Any help is much appreciated!

data file in invalid format

tsbs_load_timescaledb -batch-size 15000 -db-name benchmark -host 127.0.0.1 -port 5432 -user postgres -pass 123456 -file E:/tmp/timescaledb-data2.gz -force-text-format false

data file in invalid format; got �7(t��=��j expected tags

cpu-max-all-1 and cpu-max-all-8 wrong influxql group by time clause and consequent result set

Hi there, if you follow the scripts below to compare TimescaleDB's and InfluxDB's results for the cpu-max-all-1 and cpu-max-all-8 queries, you will notice that there is a wrong GROUP BY time clause on InfluxDB (group by time(1m) should be group by time(1h)). I'm submitting a PR with the fix and with improvements to the result-set output for InfluxDB (so that we can easily check this type of error).

To replicate the wrong result set (using a version of tsbs_run_queries_influx that outputs the InfluxQL, so that it is easy to catch the mistake):

influx

commands

## INFLUX
# Generate data and queries
tsbs_generate_data --format influx --use-case cpu-only --scale=10 --seed=123 --file /tmp/bulk_data/influx_data
tsbs_generate_queries --queries=1 --format influx --use-case cpu-only --scale 10 --seed 123 --query-type cpu-max-all-1     --file /tmp/bulk_data/influx_query_cpu-max-all-1
tsbs_generate_queries --queries=1 --format influx --use-case cpu-only --scale 10 --seed 123 --query-type cpu-max-all-8     --file /tmp/bulk_data/influx_query_cpu-max-all-8

# Remove previous database
curl -X POST http://localhost:8086/query?q=drop%20database%20benchmark

# insert data
tsbs_load_influx --workers=1 --file=/tmp/bulk_data/influx_data

tsbs_run_queries_influx --workers=1 --max-queries=1 --file=/tmp/bulk_data/influx_query_cpu-max-all-1 --print-responses > influx_query_cpu-max-all-1.json

tsbs_run_queries_influx --workers=1 --max-queries=1 --file=/tmp/bulk_data/influx_query_cpu-max-all-8 --print-responses > influx_query_cpu-max-all-8.json

Portion of influx_query_cpu-max-all-1.json

As you can see, results are returned in buckets of 1 minute where they should be in buckets of one hour (compare the TimescaleDB result set below). The same applies to the cpu-max-all-8 query and result set.

{
  "influxql": "SELECT max(usage_user),max(usage_system),max(usage_idle),max(usage_nice),max(usage_iowait),max(usage_irq),max(usage_softirq),max(usage_steal),max(usage_guest),max(usage_guest_nice) from cpu where (hostname = 'host_9') and time \u003e= '2016-01-01T02:16:22Z' and time \u003c '2016-01-01T10:16:22Z' group by time(1m)",
  "response": {
    "results": [
      {
        "series": [
          {
            "columns": [
              "time",
              "max",
              "max_1",
              "max_2",
              "max_3",
              "max_4",
              "max_5",
              "max_6",
              "max_7",
              "max_8",
              "max_9"
            ],
            "name": "cpu",
            "values": [
              [
                "2016-01-01T02:16:00Z",
                36,
                9,
                49,
                13,
                32,
                3,
                9,
                29,
                23,
                52
              ],
              [
                "2016-01-01T02:17:00Z",
                37,
                10,
                47,
                11,
                32,
                7,
                10,
                28,
                25,
                52
              ],
(...)

timescale

commands

## TIMESCALE
# Generate data and queries
tsbs_generate_data --format timescaledb --use-case cpu-only --scale=10 --seed=123 --file /tmp/bulk_data/timescaledb_data
tsbs_generate_queries --queries=1 --format timescaledb --use-case cpu-only --scale 10 --seed 123 --query-type cpu-max-all-1     --file /tmp/bulk_data/timescaledb_query_cpu-max-all-1
tsbs_generate_queries --queries=1 --format timescaledb --use-case cpu-only --scale 10 --seed 123 --query-type cpu-max-all-8     --file /tmp/bulk_data/timescaledb_query_cpu-max-all-8


# insert data
tsbs_load_timescaledb --pass=password --postgres="sslmode=disable port=5433" --db-name=benchmark --host=127.0.0.1 --user=postgres --workers=1 --file=/tmp/bulk_data/timescaledb_data

tsbs_run_queries_timescaledb --pass=password --postgres="sslmode=disable port=5433" --db-name=benchmark --hosts=127.0.0.1 --user=postgres --workers=1 --max-queries=1 --file=/tmp/bulk_data/timescaledb_query_cpu-max-all-1 --print-responses > timescaledb_query_cpu-max-all-1.json

tsbs_run_queries_timescaledb --pass=password --postgres="sslmode=disable port=5433" --db-name=benchmark --hosts=127.0.0.1 --user=postgres --workers=1 --max-queries=1 --file=/tmp/bulk_data/timescaledb_query_cpu-max-all-8 --print-responses > timescaledb_query_cpu-max-all-8.json

Portion of timescaledb_query_cpu-max-all-1.json

{
  "query": "SELECT time_bucket('3600 seconds', time) AS hour,\n        max(usage_user) as max_usage_user, max(usage_system) as max_usage_system, max(usage_idle) as max_usage_idle, max(usage_nice) as max_usage_nice, max(usage_iowait) as max_usage_iowait, max(usage_irq) as max_usage_irq, max(usage_softirq) as max_usage_softirq, max(usage_steal) as max_usage_steal, max(usage_guest) as max_usage_guest, max(usage_guest_nice) as max_usage_guest_nice\n        FROM cpu\n        WHERE tags_id IN (SELECT id FROM tags WHERE hostname IN ('host_9')) AND time \u003e= '2016-01-01 02:16:22.646325 +0000' AND time \u003c '2016-01-01 10:16:22.646325 +0000'\n        GROUP BY hour ORDER BY hour",
  "results": [
    {
      "hour": "2016-01-01T02:00:00Z",
      "max_usage_guest": 38,
      "max_usage_guest_nice": 52,
      "max_usage_idle": 55,
      "max_usage_iowait": 46,
      "max_usage_irq": 25,
      "max_usage_nice": 17,
      "max_usage_softirq": 33,
      "max_usage_steal": 44,
      "max_usage_system": 11,
      "max_usage_user": 48
    },
    {
      "hour": "2016-01-01T03:00:00Z",
      "max_usage_guest": 40,
      "max_usage_guest_nice": 59,
      "max_usage_idle": 84,
      "max_usage_iowait": 52,
      "max_usage_irq": 35,
      "max_usage_nice": 27,
      "max_usage_softirq": 33,
      "max_usage_steal": 85,
      "max_usage_system": 9,
      "max_usage_user": 27
    },
(...)

[tsbs_run_queries_timescaledb] Some query types do not work with timescale-use-json=true

TSBS version: commit ID 56292c5

Hi,

We have an issue when executing queries generated by calling tsbs_generate_queries --timescale-use-json=true .... At the execution, we face various errors:

When executing queries generated with the query type avg-load and high-load, the error message is:

panic: ERROR: operator does not exist: double precision / jsonb (SQLSTATE 42883)

goroutine 9 [running]:
github.com/timescale/tsbs/pkg/query.(*BenchmarkRunner).processorHandler(0xc00012c6e0, 0xc000026ed0, 0xc0000788c0, 0xc80440, 0x9f1140, 0xc000191100, 0x2)
	/app/tsbs/src/github.com/timescale/tsbs/pkg/query/benchmarker.go:196 +0x293
created by github.com/timescale/tsbs/pkg/query.(*BenchmarkRunner).Run
	/app/tsbs/src/github.com/timescale/tsbs/pkg/query/benchmarker.go:156 +0x206

When executing queries generated with the query type avg-vs-projected-fuel-consumption, the error message is:

panic: ERROR: function avg(text) does not exist (SQLSTATE 42883)

goroutine 22 [running]:
github.com/timescale/tsbs/pkg/query.(*BenchmarkRunner).processorHandler(0xc00014e630, 0xc00011ce00, 0xc0001008c0, 0xc80440, 0x9f1140, 0xc0001b70f0, 0x2)
	/app/tsbs/src/github.com/timescale/tsbs/pkg/query/benchmarker.go:196 +0x293
created by github.com/timescale/tsbs/pkg/query.(*BenchmarkRunner).Run
	/app/tsbs/src/github.com/timescale/tsbs/pkg/query/benchmarker.go:156 +0x206

Is this a known issue with the JSONB use case? Should we just skip these query types for the time being, or do you have a workaround we could apply?

Thanks a lot for this open source tool :)

groupby-orderby-limit query incorrect on influxdb

The description of this query is:

The last 5 aggregate readings (across time) before a randomly chosen endpoint

but the query produced for influx looks like:

SELECT max(usage_user) from cpu WHERE time < '2016-01-01T14:37:05Z' group by time(1m) limit 5

This is missing the appropriate ORDER BY clause to get the last five readings. Instead, it seems to pick up the first five minutes of the dataset.

Load data to mongodb error

When I try to load data into MongoDB, it throws an error like this:
2019/10/07 17:58:37 Bulk aggregate update err: read tcp 127.0.0.1:58292->127.0.0.1:27017: i/o timeout

I use the command:
tsbs_generate_data -use-case="devops" -seed=123 -scale=1000 -timestamp-start="2016-01-01T00:00:00Z" -timestamp-end="2016-01-02T00:00:00Z" -log-interval="10s" -format="mongo"| gzip > /tmp/mongo-data.gz
Has anyone else encountered the same issue?

Report "data file in invalid format" when loading data

The data load script load_timescaledb.sh reports the error information below, then the process stops.

Environment:
CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz ×2
Memory: 128GB
Disk: 200GB SSD, 4TB HDD
Network Interface: Intel 82599ES 10-Gigabit ×2
Docker Image: timescaledb:1.6.1-pg11 from Docker Hub

Script:
NUM_WORKERS=40 BATCH_SIZE=10000 BULK_DATA_DIR=./tsdata ./go/src/github.com/timescale/tsbs/scripts/load_timescaledb.sh

Output:
Bulk loading file ./tsdata/timescaledb-data.gz

+ pg_isready -h localhost
localhost:5433 - accepting connections
+ cat ./tsdata/timescaledb-data.gz
+ gunzip
+ /home/tsbs/go/bin/tsbs_load_timescaledb --postgres=sslmode=disable --db-name=benchmark --host=localhost --user=postgres --port=5433 --workers=40 --batch-size=10000 --reporting-period=10s --use-hypertable=true --use-jsonb-tags=false --in-table-partition-tag=true --hash-workers=false --time-partition-index=false --partitions=1 --chunk-time=8h --write-profile= --field-index-count=1 --do-create-db=true --force-text-format=false
    time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
    1587711242,2643131.91,2.644480E+07,2643131.91,528730.33,5.290000E+06,528730.33
    1587711252,2799028.55,5.444991E+07,2721080.99,559703.60,1.089000E+07,544217.12
    1587711262,2731927.99,8.175475E+07,2724694.13,546288.75,1.635000E+07,544907.18
    1587711272,2678968.47,1.085444E+08,2713264.17,535999.85,2.171000E+07,542680.63
    1587711282,2875825.70,1.373027E+08,2745773.15,575000.02,2.746000E+07,549143.85
    1587711292,2706018.79,1.643629E+08,2739147.99,540999.76,3.287000E+07,547786.61
    1587711302,2635174.76,1.907146E+08,2724295.76,527000.23,3.814000E+07,544817.35
    1587711312,2739610.14,2.181107E+08,2726209.93,548000.19,4.362000E+07,545215.18
    1587711322,2809039.54,2.462011E+08,2735412.70,561999.99,4.924000E+07,547080.05
    1587711332,2730225.80,2.735034E+08,2734894.04,545999.88,5.470000E+07,546972.04
    1587711342,2609809.17,2.996015E+08,2723523.22,522000.15,5.992000E+07,544701.97
    1587711352,2796843.77,3.275699E+08,2729633.00,559000.03,6.551000E+07,545893.43
    1587711362,2764614.28,3.552160E+08,2732323.76,553000.06,7.104000E+07,546440.07
    1587711372,2734374.67,3.825598E+08,2732470.25,546999.81,7.651000E+07,546480.05
    1587711382,2650353.49,4.090635E+08,2726995.96,529997.58,8.181000E+07,545381.25
    1587711392,2859353.37,4.376568E+08,2735268.00,572002.99,8.753000E+07,547045.05
    1587711402,2775310.38,4.654100E+08,2737623.36,554999.52,9.308000E+07,547512.95
    1587711412,2684571.81,4.922557E+08,2734676.14,537000.28,9.845000E+07,546928.93
    1587711422,2560156.15,5.178572E+08,2725491.12,511999.87,1.035700E+08,545090.60
    1587711432,2874537.94,5.466026E+08,2732943.28,574999.59,1.093200E+08,546586.02
    1587711442,2735035.01,5.739530E+08,2733042.88,547000.12,1.147900E+08,546605.73
    1587711452,2614275.39,6.000957E+08,2727644.49,523000.44,1.200200E+08,545532.79
    1587711462,2670242.29,6.267982E+08,2725148.79,533999.66,1.253600E+08,545031.36
    1587711472,2845153.53,6.552497E+08,2730148.88,569000.31,1.310500E+08,546030.05
    1587711482,2714677.57,6.823965E+08,2729530.04,542999.91,1.364800E+08,545908.84
    1587711492,2624641.87,7.086429E+08,2725495.96,524999.73,1.417300E+08,545104.66
    1587711502,2680046.05,7.354433E+08,2723812.66,536000.09,1.470900E+08,544767.46
    1587711512,2785988.02,7.633032E+08,2726033.17,557000.00,1.526600E+08,545204.33
    1587711522,2644154.61,7.897448E+08,2723209.82,528999.72,1.579500E+08,544645.56
    1587711532,2504506.80,8.147899E+08,2715919.84,500999.76,1.629600E+08,543190.73
    1587711542,2710326.74,8.418931E+08,2715739.42,542000.79,1.683800E+08,543152.34
    1587711552,2798867.36,8.698891E+08,2718337.78,559853.61,1.739800E+08,543674.38
    1587711562,2616161.60,8.960444E+08,2715242.34,523126.93,1.792100E+08,543051.89
    1587711572,2590590.66,9.219498E+08,2711576.23,518009.41,1.843900E+08,542315.37
    2020/04/24 14:59:34 data file in invalid format; got 4901.14user 73.58system 43:49.66elapsed 189%CPU (0avgtext+0avgdata 54616maxresident)k expected tags

The data file timescaledb-data.gz is 3 days of data generated by the following command:
$HOME/go/bin/tsbs_generate_data --use-case="iot" --seed=123 --scale=4000 --timestamp-start="2016-01-01T00:00:00Z" --timestamp-end="2016-01-04T00:00:00Z" --log-interval="10s" --format="timescaledb" | gzip > $HOME/tsdata/timescaledb-data.gz

Does anyone know how to solve this problem?

Benchmark workflow

The objective here is to open a wide discussion about the ideal model for running
multiple benchmarks so we can correlate them later.

Discussing the idea with @ryanbooz and @zseta: generally, we start with a simple plan, like:

Let's test database A against database B with scenarios X and Z.
Let's also run the same benchmark in different machine sizes to compare throughput and
efficiency. Plus, let's see how the performance goes with different parallelization levels and so on.

And then we start our journey in a few steps:

  1. Provide the machines - set up the tsbs machine that will send data and
    queries to the targets with different configuration
  2. Setup the target machines installing OS/database and providing a common-auth to connect
  3. Set up the initial configuration we want to benchmark: how many rows of data, how
    dense the time series is (rows/day), and the type of data (IoT or DevOps).
  4. Later, we adapt the initial config into derived configurations targeting different
    machines and parallelization levels.

With the machines ready, we can start using tsbs_load with --config,
synchronizing each run so that parallel benchmarks on the master machine
don't affect the performance.

To run every command, we should be in a screen session to keep it detached
from our ssh connection.

Later, we need to collect all text reports and manually capture and move the
data to a spreadsheet so we can correlate the benchmarks, allowing us to
better understand what params work best in which context.

The most important info we collect is the throughput of rows/sec and metrics/sec of
each scenario: IoT/DevOps.

This is the first part, covering load; we also have run_queries, which likewise allows using
different sets of configs, like parallelization levels and different types of queries.

From the queries, we have an identifier per query, and we can also run several
queries in parallel with variants of the initial configuration, like before and
after compressing data.

After getting metadata from all the queries, we need to manually capture the
performance information of each configuration set and move to a spreadsheet
to later correlate the data.

So, several of these steps are done manually, and we also need to keep an eye on
the pipeline to reuse the same tsbs machine to push the data.

We don't have a specific issue here, but rather an open space to explore as a community how we can
approach the problem and improve the way we work, giving us a better flow and letting us reuse previous benchmarks without rerunning everything.

tsbs_load_influx: parse error

Hello guys,

data and query generation works fine, but when using the tsbs_load_influx binary, the following error occurs:
[screenshot: data_queries_gen]
Even when supplying all arguments, the error remains:
[screenshot: parse_error]
Obviously, a database is created (here seen in Chronograf):
[screenshot: chronograf_db]
But no results can be shown:
[screenshot: db_query_result]
The .csv file is also empty, showing only the row names but no measurement values.
When processing as described in the documentation, the file is correctly written, but this does not apply to the binary.

Ingestion related benchmarks ( tsbs_load*...) should expose the per command/operation latency summary

Currently, on the ingestion benchmarks, we're only able to assess and compare the different solutions based upon the ingestion rate and total ingestion time. As important as the command rate is the command latency (specifically on writes for TSDBs).
Given that this is expected to be the de facto standard for TSDB benchmarks, I believe we're missing that major feature.
PS: If you agree, I'm completely up for pushing a PR that does exactly that.

Allow some percentage of data to be generated for an older timerange

In preparation for inserting into compressed hypertables, we would like to allow data generation to accept a second time range and randomly insert records for that range at a fixed percentage relative to the total rows.

This would typically be done after an initial load of data has been completed and tables have been compressed. We would then generate a second set of data, set create_metric_tables=false, and then insert the additional data.

Looking at the current parameters, maybe something like:

  • historical-timestamp-start: the start range for random, historical data
  • historical-timestamp-end: the end range for random, historical data
  • historical-percentage: the percentage of overall data (current + historical) that should come from the historical range.

This is just a place to start the discussion.

Cassandra run_queries issue: no keyspace has been specified

Hi guys,

I am trying to run the TSBS against a Cassandra instance (version 3.11.2).

The load phase works well and the data is inserted into Cassandra. I am using the following command:

cat /opt/workloads/data/cassandra-data.gz | gunzip | /opt/workloads/tsbs/bin/tsbs_load_cassandra --db-name="benchmark" --workers=2 --reporting-period="10s" --batch-size=10 --hosts 192.168.0.214:9042 --replication-factor 1 --consistency ONE

An exemplary set of queries is generated with the following command:

/opt/workloads/tsbs/bin/tsbs_generate_queries --use-case="cpu-only" --seed=123 --scale=1000 --timestamp-start="2016-01-01T00:00:00Z" --timestamp-end="2016-01-01T02:00:01Z" --queries=1000 --query-type="single-groupby-1-1-1" --format="cassandra" | gzip > /opt/workloads/data/cassandra-queries.gz

Yet, when running these queries with:

cat /opt/workloads/data/cassandra-queries.gz | gunzip | /opt/workloads/tsbs/bin/tsbs_run_queries_cassandra --workers 2 --host 192.168.0.214:9042 --aggregation-plan server

I get the following error: No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename

I checked the Cassandra instance: the benchmark keyspace is created and the data is inserted.

Thanks for any help with fixing this issue!

Add support for IoTDB

IoTDB, developed in our lab, is an integrated data management engine designed for time series data. It meets the requirements of massive dataset storage, high-speed data ingestion, and complex data analysis in the industrial IoT field.

Could you tell me how to add IoTDB support to TSBS?

fatal: git fetch-pack: expected shallow list

[root@clickhouseclusterdbone tsbs]# pwd
/data/testdata/tsbs
[root@clickhouseclusterdbone tsbs]# ls
cmd docs go.mod go.sum internal LICENSE load query README.md scripts
[root@clickhouseclusterdbone tsbs]# go build
go: github.com/jackc/pgx/[email protected] requires
github.com/jackc/[email protected] requires
github.com/jackc/pgx/[email protected] requires
github.com/jackc/[email protected] requires
github.com/jackc/pgx/[email protected] requires
github.com/jackc/[email protected]: invalid version: git fetch --unshallow -f https://github.com/jackc/puddle in /root/go/pkg/mod/cache/vcs/c64e1e439c1833d50f3ae3451304bb415a3c9fd61932b4dd5c008f082305a4b0: exit status 128:
fatal: git fetch-pack: expected shallow list
[root@clickhouseclusterdbone tsbs]#

Make bash variable naming consistent

As pointed out in issue #6, our script files do not follow the same variable-naming conventions: the generate_*.sh scripts use a camel-case format while others use all uppercase. We should standardize on all uppercase.

load_cassandra.sh producing errors

After generating the data, I was able to load it into TimescaleDB, but I kept getting a
runtime error: slice bounds out of range when I tried to do the same for Cassandra, which runs on the same server.

BATCH_SIZE=1000 BULK_DATA_DIR=/tmp REPLICATION_FACTOR=2 DATABASE_HOST=nosql3 scripts/load_cassandra.sh
Bulk loading file /tmp/cassandra-data.gz

The errors that pop up are as follows:

+ nc -z nosql3 9042
+ cqlsh -e 'drop keyspace measurements;'
Connection error: ('Unable to connect to any servers', {'127.0.0.1:9042': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
+ cat /tmp/cassandra-data.gz
+ gunzip
+ /home/v5s/go/bin/tsbs_load_cassandra --workers=4 --batch-size=1000 --reporting-period=10s --write-timeout=1000s --hosts=nosql3:9042 --replication-factor=2
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
panic: runtime error: slice bounds out of range

goroutine 144 [running]:
main.singleMetricToInsertStatement(0x0, 0x0, 0xb7, 0x0)
	/home/v5s/go/src/github.com/timescale/tsbs/cmd/tsbs_load_cassandra/scan.go:41 +0x3b8
main.(*processor).ProcessBatch(0xc420504098, 0x6ef5e0, 0xc4205258e0, 0x1, 0xc420532180, 0x5c8e19)
	/home/v5s/go/src/github.com/timescale/tsbs/cmd/tsbs_load_cassandra/main.go:103 +0xa7
github.com/timescale/tsbs/load.(*BenchmarkRunner).work(0x80dc60, 0x6f0a40, 0xc42000e038, 0xc4207bae80, 0xc4204e0930, 0x0)
	/home/v5s/go/src/github.com/timescale/tsbs/load/loader.go:253 +0xc6
created by github.com/timescale/tsbs/load.(*BenchmarkRunner).RunBenchmark
	/home/v5s/go/src/github.com/timescale/tsbs/load/loader.go:118 +0x16a

I only edited out the name of the server for workplace security reasons.

tsbs_load_timescaledb problems

Hello guys,

This time the database is TimescaleDB.
I made sure the postgres user in PostgreSQL has no password and modified pg_hba.conf accordingly to 'trust' on localhost, but I nevertheless get the following error:
TimescaleDB
That is not the only issue, though:
every binary I tested (for InfluxDB, TimescaleDB, MongoDB) seems to have issues with the "file" parameter and does not work correctly when it is supplied.
Do you have any clues about this?

Have a nice day

"low-fuel" query implementation in influxdb has wrong semantics

The influxdb implementation generates this query:

SELECT "name", "driver", "fuel_state" 
                FROM "diagnostics" 
                WHERE "fuel_state" <= 0.1 AND "fleet" = 'West' 
                GROUP BY "name" 
                ORDER BY "time" DESC 
                LIMIT 1

which doesn't match the expected semantics. This finds, for each vehicle, the last timestamp at which fuel_state <= 0.1, rather than determining whether fuel_state was <= 0.1 at the last-collected timestamp.

Given InfluxQL's unusual semantics, I'm not sure how to express the correct query, but as it stands, comparing Influx against other systems on this query is not a fair comparison.

ClampedRandomWalkDistribution with a NormalDistribution whose mean > 0 will semi-permanently generate the maximum value

Relevant Files
pkg/data/usecases/common/distribution.go

Problem:
If you instantiate a ClampedRandomWalkDistribution via common.CWD(0, 1000, common.ND(50, 1)) and call Advance() enough times, it will at some point constantly generate 1000. I don't know whether this was the intent of ClampedRandomWalkDistribution, but it doesn't seem to be: the expectation was presumably a random walk with a mean around 50 that never goes below 0 or above 1000.

Diagnosis:
The problem lies in the fact that the value returned by the NormalDistribution has a mean around 50. This usually-positive value is added to the State of the ClampedRandomWalkDistribution on each Advance(). At some point State exceeds 1000, which results in State = 1000. Because the mean of the underlying NormalDistribution is 50, it is unlikely that State ever drops below 1000 again after this happens.

Potential solution:
ClampedRandomWalkDistribution should always be initialized with common.ND(0, stdev), where stdev is the desired standard deviation. ClampedRandomWalkDistribution should also get an Offset attribute: if the random walk should be centered around some mean value, Get() should return Offset + State. Corresponding adjustments to the Max and Min cutoffs are also required.
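
A minimal sketch of that proposal (hypothetical names, not the actual pkg/data/usecases/common API): steps come from a zero-mean distribution, the mean is applied as a fixed offset at read time, and the clamp bounds are shifted accordingly, so clamping no longer drifts the walk to Max.

package main

import (
	"fmt"
	"math/rand"
)

// OffsetClampedRandomWalk clamps a zero-mean walk and applies the desired
// mean as a fixed offset when the value is read.
type OffsetClampedRandomWalk struct {
	Step          func() float64 // zero-mean step generator, e.g. an N(0, stdev) sample
	Min, Max      float64        // clamp bounds, expressed relative to the offset
	Offset, State float64
}

func (d *OffsetClampedRandomWalk) Advance() {
	d.State += d.Step()
	if d.State > d.Max {
		d.State = d.Max
	} else if d.State < d.Min {
		d.State = d.Min
	}
}

func (d *OffsetClampedRandomWalk) Get() float64 { return d.Offset + d.State }

func main() {
	r := rand.New(rand.NewSource(123))
	// A walk around 50, clamped to [0, 1000]: the bounds are shifted by the offset.
	d := &OffsetClampedRandomWalk{
		Step:   func() float64 { return r.NormFloat64() }, // N(0, 1)
		Min:    0 - 50,
		Max:    1000 - 50,
		Offset: 50,
	}
	for i := 0; i < 5; i++ {
		d.Advance()
		fmt.Println(d.Get())
	}
}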

Other notes:
A similar issue exists with RandomWalk; however, that one isn't strictly incorrect, even if the behavior is not what was intended. If RandomWalk is initialized with common.ND(50, 1), it takes steps with a mean of +50; as a result, it is almost always monotonically increasing (with mean >> 0), overflow issues notwithstanding.

Influx: load_influx.sh does not work correctly

Hello guys,

after installing the relevant Go binaries for generating data and loading InfluxDB, the following problem occurs.
In the attached file
terminal_output
you can see that the load-performance statistics are not written to the terminal as described in the relevant part of the documentation:
referenced_description
If one ignores this and executes the file for running the queries, an internal Go error occurs:
go_error

This happens with the following file; the data generation itself works just fine.
influx-data.gz

Hope you can fix it.
Thanks

Add Scylla to benchmark

Scylla is intended to be a more performant, drop-in replacement for Cassandra.

It has been benchmarked extensively elsewhere, but it would be interesting to see it in this benchmark as well. The Cassandra scripts could be copied and modified slightly to fit Scylla.

Command to generate data and queries showing syntax errors

I installed the required tools from the cmd folder manually, and the installation showed no errors.
Tools I installed: tsbs_generate_data, tsbs_generate_queries, tsbs_load_timescaledb, tsbs_load_influx, tsbs_run_queries_timescaledb, tsbs_run_queries_influx.

On giving the command:

'/home/maddscientist/go/src/github.com/timescale/tsbs/cmd/tsbs_generate_data/main.go' --use-case="iot" --seed=123 --scale=4000     --timestamp-start="2016-01-01T00:00:00Z"     --timestamp-end="2016-01-04T00:00:00Z"     --log-interval="10s" --format="timescaledb"     | gzip > /tmp/timescaledb-data.gz

I get the following error:

/home/maddscientist/go/src/github.com/timescale/tsbs/cmd/tsbs_generate_data/main.go: line 1: //: Is a directory
/home/maddscientist/go/src/github.com/timescale/tsbs/cmd/tsbs_generate_data/main.go: line 2: //: Is a directory
/home/maddscientist/go/src/github.com/timescale/tsbs/cmd/tsbs_generate_data/main.go: line 3: //: Is a directory
/home/maddscientist/go/src/github.com/timescale/tsbs/cmd/tsbs_generate_data/main.go: line 4: //: Is a directory
/home/maddscientist/go/src/github.com/timescale/tsbs/cmd/tsbs_generate_data/main.go: line 5: syntax error near unexpected token `('
/home/maddscientist/go/src/github.com/timescale/tsbs/cmd/tsbs_generate_data/main.go: line 5: `// ClickHouse pseudo-CSV format (the same as for TimescaleDB)'

On running the command to generate queries:

'/home/maddscientist/go/src/github.com/timescale/tsbs/cmd/tsbs_generate_queries/main.go' --use-case="iot" --seed=123 --scale=4000     --timestamp-start="2016-01-01T00:00:00Z"     --timestamp-end="2016-01-04T00:00:01Z"     --queries=1000 --query-type="breakdown-frequency" --format="timescaledb"     | gzip > /tmp/timescaledb-queries-breakdown-frequency.gz

I get the error:

/home/maddscientist/go/src/github.com/timescale/tsbs/cmd/tsbs_generate_queries/main.go: line 1: //: Is a directory
/home/maddscientist/go/src/github.com/timescale/tsbs/cmd/tsbs_generate_queries/main.go: line 2: //: Is a directory
/home/maddscientist/go/src/github.com/timescale/tsbs/cmd/tsbs_generate_queries/main.go: line 3: package: command not found
/home/maddscientist/go/src/github.com/timescale/tsbs/cmd/tsbs_generate_queries/main.go: line 5: syntax error near unexpected token `newline'
/home/maddscientist/go/src/github.com/timescale/tsbs/cmd/tsbs_generate_queries/main.go: line 5: `import ('

These are the same commands as listed on the README page.
Any help regarding this would be appreciated.

P.S. If I don't specify the path to the command directory, I get an error saying that the command "tsbs_generate_data" is not found, even after I added the directories to my $PATH.
This was resolved by giving the full path in the command itself.

TSBS run clickhouse error

https://github.com/timescale/tsbs/blob/master/docs/clickhouse.md

go get github.com/timescale/tsbs
cd $GOPATH/src/github.com/timescale/tsbs/cmd
go get ./...
go install ./...

Following this guide, I ran this command on CentOS 7.6 under Go 1.14.6 and got the following message:

FORMATS=clickhouse ./generate_data.sh

which: no tsbs_generate_data in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/go/bin:/usr/local/go/bin:/root/go/src/github.com/timescale/tsbs/scripts:/root/bin:/usr/local/go/bin)
tsbs_generate_data not available. It is not specified explicitly and not found in $PATH

I'm not sure which step is supposed to produce the tsbs_generate_data command.

Support for a common JSON benchmark results file

Hi there. Given that this tool is becoming more and more the standard for TSDB benchmarks, what do you think about exporting the benchmark results in a common JSON format that captures the configs used along with the overall and detailed results?
I would gladly give it a try and present a POC. WDYT?
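
As one possible starting point (field names are only a suggestion for discussion, not an agreed format), the shape could be defined as a Go struct and serialized with encoding/json:

package main

import (
	"encoding/json"
	"fmt"
)

// BenchmarkResult is a hypothetical shape for a shared results file:
// the run's configuration plus overall and per-period results.
type BenchmarkResult struct {
	Database string            `json:"database"`
	Workload string            `json:"workload"` // e.g. load, single-groupby-1-1-1
	Config   map[string]string `json:"config"`   // flags used for the run
	Overall  struct {
		MetricsPerSec float64 `json:"metrics_per_sec"`
		RowsPerSec    float64 `json:"rows_per_sec"`
		WallTimeSec   float64 `json:"wall_time_sec"`
	} `json:"overall"`
	Detailed []map[string]float64 `json:"detailed,omitempty"` // per-period samples
}

func main() {
	res := BenchmarkResult{Database: "timescaledb", Workload: "load"}
	res.Config = map[string]string{"workers": "8", "batch-size": "10000"}
	res.Overall.MetricsPerSec = 1.2e6
	out, _ := json.MarshalIndent(res, "", "  ")
	fmt.Println(string(out))
}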

Benchmark should include a test of deletion

With regulations such as the GDPR in the EU, deletion is a very important feature to support. The benchmark should measure the performance of random deletions, since GDPR conformance cannot rely on expiry of the data.
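
A minimal sketch of what such a measurement could look like (hypothetical schema and connection string, borrowing the cpu/hostname layout of the devops use case; not an agreed TSBS design): pick random hosts, issue DELETEs, and record each latency.

package main

import (
	"database/sql"
	"fmt"
	"math/rand"
	"time"

	_ "github.com/lib/pq" // assumed Postgres driver; any SQL target would do
)

func main() {
	// Hypothetical connection string; adjust for the target database.
	db, err := sql.Open("postgres", "dbname=benchmark sslmode=disable")
	if err != nil {
		panic(err)
	}
	r := rand.New(rand.NewSource(123)) // deterministic seed, as elsewhere in TSBS
	for i := 0; i < 100; i++ {
		host := fmt.Sprintf("host_%d", r.Intn(1000)) // random subject to erase
		start := time.Now()
		// GDPR-style erasure of one subject's data, timed per statement.
		if _, err := db.Exec("DELETE FROM cpu WHERE hostname = $1", host); err != nil {
			panic(err)
		}
		fmt.Printf("delete of %s took %v\n", host, time.Since(start))
	}
}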

Binaries build fail on Apple M1

The following error is due to shirou/gopsutil#1000, i.e., gopsutil-v2's incompatibility with macOS/arm64.

 go get ./...
# github.com/shirou/gopsutil/process
../../go/pkg/mod/github.com/shirou/[email protected]+incompatible/process/process_darwin.go:567:34: undefined: KinfoProc
../../go/pkg/mod/github.com/shirou/[email protected]+incompatible/process/process_darwin.go:568:8: undefined: KinfoProc
../../go/pkg/mod/github.com/shirou/[email protected]+incompatible/process/process_darwin.go:581:32: undefined: KinfoProc
../../go/pkg/mod/github.com/shirou/[email protected]+incompatible/process/process_darwin.go:585:62: undefined: KinfoProc

I am opening a fix PR that addresses this.

Add timescaledb extension to plain postgres db. (maybe an option?)

Hi,

Benchmarking TimescaleDB vs. plain PostgreSQL using the same queries is interesting, but queries with time_bucket need the timescaledb extension.

Something like


--- a/cmd/tsbs_load_timescaledb/creator.go
+++ b/cmd/tsbs_load_timescaledb/creator.go
@@ -133,8 +133,8 @@ func (d *dbCreator) CreateDB(dbName string) error {
                        dbBench.MustExec(idxDef)
                }
 
+               dbBench.MustExec("CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE")
                if useHypertable {
-                       dbBench.MustExec("CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE")
                        dbBench.MustExec(
                                fmt.Sprintf("SELECT create_hypertable('%s'::regclass, 'time'::name, partitioning_column => '%s'::name, number_partitions => %v::smallint, chu`

The timescaledb hook adds a small overhead, but it's trivial.

Regards
Didier

TimescaleDB "time-bucket" problem

When generating queries with this command:
tsbs_generate_queries --use-case="iot" --seed=123 --scale=40 --timescale-time-bucket=false \
  --timestamp-start="2016-01-01T00:00:00Z" \
  --timestamp-end="2016-01-04T00:00:01Z" \
  --queries=1000 --query-type="breakdown-frequency" --format="timescaledb" \
  | gzip > /tmp/postgre-queries-breakdown-frequency.gz

The resulting queries contain the time_bucket function even when the flag --timescale-time-bucket is set to false.
The problem doesn't exist when using --use-case="devops". That means it's currently impossible to test the "iot" queries on a regular PostgreSQL database.

System used: Ubuntu 18.04.3 LTS

Support for influxdb 2.0

It seems that I can only find code for InfluxDB 1.x, so I wonder: is there any plan to support benchmarking InfluxDB 2.0? Thanks!

tsbs_generate_data: command not found

I followed all the installation steps and did not get any errors, but none of the tsbs commands works:

"tsbs_generate_data: command not found"

I tested on Ubuntu 18.04 initially and again on Ubuntu 20.04.

It would be appreciated if you could give me some hints.

Thanks

Strange decision to remove dbname from connection string in dbCreator initialization routine

I have noticed that tsbs_load_timescaledb cannot connect when given a specific username and a dbname distinct from the benchmark database name, e.g., with the connection string dbname=postgres user=user. The problem lies in the following fragment of code (cmd/tsbs_load_timescaledb/creator.go:22):

// Needed to connect to user's database in order to drop/create db-name database
re := regexp.MustCompile(`(dbname)=\S*\b`)
d.connStr = strings.TrimSpace(re.ReplaceAllString(d.connStr, ""))

AFAIU, the decision to remove dbname from the connection string was based on the need to distinguish the dbname used for the connection from the working dbname for the benchmark. Instead of deleting it, I propose checking that these two names, obtained from the connection string and the -db-name option respectively, are different.
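
A minimal sketch of that check (hypothetical helper, not the actual TSBS code): instead of stripping dbname from the connection string, verify that it differs from the benchmark database named by -db-name.

package main

import (
	"fmt"
	"regexp"
)

var dbnameRe = regexp.MustCompile(`dbname=(\S*)\b`)

// validateConnStr returns an error when the connection string's dbname is
// the same database that the loader is about to drop and re-create.
func validateConnStr(connStr, benchDBName string) error {
	m := dbnameRe.FindStringSubmatch(connStr)
	if m != nil && m[1] == benchDBName {
		return fmt.Errorf("connection dbname %q must differ from benchmark db-name %q", m[1], benchDBName)
	}
	return nil
}

func main() {
	fmt.Println(validateConnStr("dbname=postgres user=user", "benchmark"))  // <nil>: OK to keep dbname
	fmt.Println(validateConnStr("dbname=benchmark user=user", "benchmark")) // error
}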

dataset files - where to download?

I wanted to try the benchmark, but I cannot find where to download these files:

timescaledb-high-cpu-1-queries.gz
timescaledb-cpu-max-all-8-queries.gz
timescaledb-groupby-orderby-limit-queries.gz
timescaledb-double-groupby-1-queries.gz
timescaledb-data.gz

Could you share a link to where these can be found and downloaded?

Thank you.

Dependency issue building latest tsbs

It seems that building the latest tsbs fails due to a missing or deleted tag of the dependency blagojts/viper:

$ go get github.com/timescale/tsbs
package github.com/timescale/tsbs: no Go files in /home/user/go/src/github.com/timescale/tsbs

$ cd ~/go/src/github.com/timescale/tsbs/

$ make
GO111MODULE=on go get ./cmd/tsbs_generate_data
go: github.com/blagojts/[email protected]: invalid pseudo-version: git fetch --unshallow -f origin in /home/user/go/pkg/mod/cache/vcs/16d0bac366e51d6014eaecdfff68e32694d92fd593e195f3958efe74d9f002d7: exit status 128:
	fatal: git fetch-pack: expected shallow list
make: *** [tsbs_generate_data] Error 1
