xephonhq / xephon-b

A time series database benchmark suite

Home Page: https://xephonhq.github.io/xephon-b/

License: MIT License

Go 69.60% Shell 20.84% Makefile 2.22% Python 2.20% Dockerfile 5.15%

benchmark time-series database tsdb

xephon-b's Introduction

Xephon-B

A time series database benchmark tool. NOTE: it is under a major rewrite, see the roadmap.

status: Under major rewrite, along with libtsdb-go
Documentation
Slide: Introduce Xephon-B

License

MIT

Authors

About

B is for benchmark and Xephon comes from the animation RahXephon

xephon-b's Issues

[cmd] Setting log level and log src is not working

import (
	"fmt"
	"os"
	"runtime"

	icli "github.com/at15/go.ice/ice/cli"
	goicelog "github.com/at15/go.ice/ice/util/logutil"

	"github.com/xephonhq/xephon-b/pkg/config"
	"github.com/xephonhq/xephon-b/pkg/util/logutil"
)

const (
	myname = "xb"
)

// FIXME: debug logging is not working ....
var log = logutil.Registry

var (
	version   string
	commit    string
	buildTime string
	buildUser string
	goVersion = runtime.Version()
)

var buildInfo = icli.BuildInfo{Version: version, Commit: commit, BuildTime: buildTime, BuildUser: buildUser, GoVersion: goVersion}

var cli *icli.Root
var cfg config.XephonBConfig

func main() {
	cli = icli.New(
		icli.Name(myname),
		icli.Description("Xephon-B Time Series Benchmark cli"),
		icli.Version(buildInfo),
		icli.LogRegistry(log),
	)
	root := cli.Command()
	root.AddCommand(runCmd)
	if err := root.Execute(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

func mustLoadConfig() {
	if err := cli.LoadConfigTo(&cfg); err != nil {
		log.Fatal(err)
	}
}

func init() {
	log.AddChild(goicelog.Registry)
}

 xb run                    
info 2018-03-05T00:39:23-08:00 target database is influxdb_0 type influxdb
info 2018-03-05T00:39:23-08:00 workload is workload_0 series 1 value generator is constant
info 2018-03-05T00:39:23-08:00 TODO: worker should do something
info 2018-03-05T00:39:23-08:00 TODO: worker should do something
info 2018-03-05T00:39:23-08:00 TODO: worker should do something
info 2018-03-05T00:39:23-08:00 TODO: worker should do something
info 2018-03-05T00:39:23-08:00 TODO: worker should do something
info 2018-03-05T00:39:23-08:00 TODO: worker should do something
info 2018-03-05T00:39:23-08:00 TODO: worker should do something
info 2018-03-05T00:39:23-08:00 TODO: worker should do something
info 2018-03-05T00:39:23-08:00 TODO: worker should do something
info 2018-03-05T00:39:23-08:00 TODO: worker should do something

[runner][worker] Limit total number of points

To test disk space usage, we need to limit the total number of points

each worker now reports the number of points in each request
the result channel needs to fan out, which is also needed for multiple reporters
add a special reporter that counts the total across all requests and cancels the context once the limit is reached
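The three items above can be sketched as follows. This is only an illustration under assumptions: the `result` type, `fanOut`, `countAndCancel` and `runWithLimit` are hypothetical names, not the runner's actual API, and the second reporter does nothing but drain its copy of the stream.

```go
package main

import (
	"context"
	"fmt"
)

// result is a hypothetical per-request report sent by a worker.
type result struct {
	points int // number of points written by one request
}

// fanOut copies every result from in to each of n buffered output channels
// and closes them once in is closed, so several reporters see the same stream.
func fanOut(in <-chan result, n int) []chan result {
	outs := make([]chan result, n)
	for i := range outs {
		outs[i] = make(chan result, 16)
	}
	go func() {
		for r := range in {
			for _, o := range outs {
				o <- r
			}
		}
		for _, o := range outs {
			close(o)
		}
	}()
	return outs
}

// countAndCancel sums the reported points and calls cancel once limit is
// reached. It keeps draining afterwards so the fan-out goroutine never blocks.
func countAndCancel(in <-chan result, limit int, cancel context.CancelFunc) int {
	total := 0
	for r := range in {
		total += r.points
		if total >= limit {
			cancel()
		}
	}
	return total
}

// runWithLimit wires one fake worker, the fan-out and two reporters together
// and returns the number of points counted before everything shut down.
func runWithLimit(limit int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	results := make(chan result)
	go func() { // fake worker: reports 100 points per request until cancelled
		defer close(results)
		for {
			select {
			case <-ctx.Done():
				return
			case results <- result{points: 100}:
			}
		}
	}()

	outs := fanOut(results, 2)
	drained := make(chan struct{})
	go func() { // second reporter, here it only drains its copy
		defer close(drained)
		for range outs[1] {
		}
	}()

	total := countAndCancel(outs[0], limit, cancel)
	<-drained
	return total
}

func main() {
	fmt.Println("stopped after", runWithLimit(1000), "points")
}
```

The count can overshoot the limit slightly, because requests already in flight when the context is cancelled are still counted.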

Provision scripts for develop environment

Related #7

For people who have to use Windows, it's hard to set up the development environment. (Win10 has bash, but it's too young and causes more problems than it solves.) Also, installing a database with a package manager can sometimes mess up your OS

The vagrant box should include

an up to date go environment with glide and gopath set (may need to change vagrant mount folder)
JDK8, maven, gradle (if you want to try some jvm based system)
nvm + latest nodejs (we may need to do some front end stuff)
essential build tools (some database have native binding)
vim, git, curl (which does not ship with ubuntu)
docker (I don't know if win has native docker support now, yes, but require win 10 pro)
docker compose
publish to vagrant cloud

A former box I made for php development can be found here

Ref

Migrate Xephon-B back from Xephon-K

Related to xephonhq/xephon-k#60 Xephon-K clean up

Major issues

requires libtsdb-go to have protocol, client & server implementation in HTTP(s) and gRPC
support tracing, since Xephon-K server will do it as well; it has a penalty, but it will give more detailed insight
more workloads, we were only using the extreme workload in CMPS 278 and CMPS 229

TODO

wait for libtsdb-go ...
switch to dep from glide
make extreme workload start running
plug in BenchBoard
add API to control workload generation, so it can be used for distributed benchmark
plug in BenchHub

Improve error handling

define custom errors (I think I used to have the EndOfPoint error in generator, don't know if I still have it after #13)
wrap default errors to provide context for tracing
decide when to use panic and recover
decide when to log an error and when to return it (should not do both in most cases I think)
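A minimal sketch of the first, second and fourth items, assuming Go 1.13+ so that the standard library's `%w` wrapping can stand in for github.com/pkg/errors. Only the `EndOfPoint` name comes from the issue; the `generator` type and `drain` function are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrEndOfPoint is a sentinel error named after the EndOfPoint error
// mentioned above; everything else in this file is hypothetical.
var ErrEndOfPoint = errors.New("generator: end of points")

// generator hands out a fixed number of constant values.
type generator struct{ remaining int }

// Next returns the next value, or ErrEndOfPoint when none are left.
func (g *generator) Next() (float64, error) {
	if g.remaining == 0 {
		return 0, ErrEndOfPoint
	}
	g.remaining--
	return 1.0, nil
}

// drain consumes the generator. It wraps the error with context and returns
// it instead of logging it, so the caller decides what to do with it.
func drain(series string, g *generator) (int, error) {
	n := 0
	for {
		if _, err := g.Next(); err != nil {
			return n, fmt.Errorf("series %q after %d points: %w", series, n, err)
		}
		n++
	}
}

// isEndOfPoint reports whether err is, or wraps, ErrEndOfPoint.
func isEndOfPoint(err error) bool { return errors.Is(err, ErrEndOfPoint) }

func main() {
	n, err := drain("cpu.idle", &generator{remaining: 3})
	if isEndOfPoint(err) {
		fmt.Println("finished normally after", n, "points")
		return
	}
	fmt.Println("unexpected error:", err)
}
```

The wrapped message keeps the series name and position for tracing, while `errors.Is` still lets the top level recognise the expected end-of-data case.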

Ref

https://github.com/pkg/errors allows to wrap errors and trace cause
https://blog.golang.org/errors-are-values
https://justinas.org/best-practices-for-errors-in-go/

Data loading

Related #1 #9

Generated data is stored on disk and needs to be inserted into the TSDB

NOTE: the generated data is independent of any specific TSDB and is serialized using protobuf, so it needs to be transformed before being inserted into a TSDB

On the fly transform

read and deserialize the generated file, post into TSDB
Pro
- takes less disk space
- less disk IO
Con
- deserialization overhead
- cannot be reused

Pre transform

read and deserialize the generated file, save it to a file in the exact post format
read the saved file and post the exact bytes to TSDB
Pro
- no deserialization overhead
- can be reused
Con
- may take a lot of disk space, because tags and series names will be duplicated
- larger disk IO

However, when it comes to implementation, these two differ little

data -> deserialization -> pack into certain format (e.g. JSON) -> client lib -> TSDB
data -> deserialization -> pack into certain format (e.g. JSON) -> file -> client lib -> TSDB
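One way to see why the two pipelines are so close: if the packing step writes to an `io.Writer`, the same function can feed either a request body or a file. The sketch below assumes a made-up `point` struct and JSON layout (not the format of any particular TSDB) and uses in-memory buffers in place of the real network and disk.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// point is a hypothetical, already deserialized data point.
type point struct {
	Name  string            `json:"name"`
	Time  int64             `json:"timestamp"`
	Value float64           `json:"value"`
	Tags  map[string]string `json:"tags"`
}

// encodeTo packs points as JSON into w. The destination decides whether this
// is the on-the-fly pipeline (request body) or the pre-transform one (file).
func encodeTo(w io.Writer, points []point) error {
	return json.NewEncoder(w).Encode(points)
}

// encodeToString is a small helper that runs encodeTo against a buffer.
func encodeToString(points []point) (string, error) {
	var buf bytes.Buffer
	if err := encodeTo(&buf, points); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	pts := []point{{Name: "cpu.idle", Time: 1, Value: 0.5, Tags: map[string]string{"host": "a"}}}

	var requestBody bytes.Buffer // stands in for an HTTP request body
	var savedFile bytes.Buffer   // stands in for a file on disk
	if err := encodeTo(&requestBody, pts); err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	if err := encodeTo(&savedFile, pts); err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println("same bytes in both pipelines:", bytes.Equal(requestBody.Bytes(), savedFile.Bytes()))
}
```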

filter log by package like in java's logback

Java is verbose, but with libraries like logback you can configure which packages log on the fly, which is quite useful: turning on verbose logging makes every package print logs, while you may only want debug information from one package. ~~Currently, we are using logrus, and it seems by adding a pkg field and adding hook, it's possible to filter log~~

~~add hook~~
create our own wheel, see dyweb/Ayi#59: a logrus-like package was created in order to add filter functionality
enable filtering logs from the command line, e.g. --debug x.tsdb.kairosdb only prints the kairosdb package and its subpackages
enable filtering logs using a config file like log4j
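The matching rule behind a flag like `--debug x.tsdb.kairosdb` could look roughly like this. It is a sketch only: `pkgFilter` and `newPkgFilter` are hypothetical names, and the dot-separated package naming is taken from the example above rather than from the actual logging package.

```go
package main

import (
	"fmt"
	"strings"
)

// pkgFilter decides whether debug output is enabled for a package.
type pkgFilter struct{ allowed []string }

// newPkgFilter parses a comma separated list such as "x.tsdb.kairosdb".
func newPkgFilter(spec string) pkgFilter {
	var f pkgFilter
	for _, p := range strings.Split(spec, ",") {
		if p = strings.TrimSpace(p); p != "" {
			f.allowed = append(f.allowed, p)
		}
	}
	return f
}

// Enabled reports whether pkg equals an allowed package or is one of its
// subpackages. A plain prefix match is not enough, otherwise
// "x.tsdb.kairosdbx" would match "x.tsdb.kairosdb".
func (f pkgFilter) Enabled(pkg string) bool {
	for _, a := range f.allowed {
		if pkg == a || strings.HasPrefix(pkg, a+".") {
			return true
		}
	}
	return false
}

func main() {
	f := newPkgFilter("x.tsdb.kairosdb")
	for _, pkg := range []string{"x.tsdb.kairosdb", "x.tsdb.kairosdb.client", "x.tsdb.influxdb"} {
		fmt.Println(pkg, f.Enabled(pkg))
	}
}
```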

stretchr/testify/suite panic: reflect: Call with too few input arguments

ok  	github.com/xephonhq/xephon-b/pkg/generator	0.002s	coverage: 85.7% of statements
=== RUN   TestSerializerInterface
=== RUN   TestSerializeTestSuite
=== RUN   TestDebugSerializer
--- FAIL: TestDebugSerializer (0.00s)
panic: reflect: Call with too few input arguments [recovered]
	panic: reflect: Call with too few input arguments

goroutine 7 [running]:
panic(0x6a8ca0, 0xc4201f86d0)
	/home/at15/app/go/src/runtime/panic.go:500 +0x1a1
testing.tRunner.func1(0xc4200943c0)
	/home/at15/app/go/src/testing/testing.go:579 +0x25d
panic(0x6a8ca0, 0xc4201f86d0)
	/home/at15/app/go/src/runtime/panic.go:458 +0x243
reflect.Value.call(0xc4201e0ba0, 0xc420036530, 0x13, 0x70fb7d, 0x4, 0xc420034760, 0x1, 0x1, 0x506788, 0x70b060, ...)
	/home/at15/app/go/src/reflect/value.go:358 +0x13c1
reflect.Value.Call(0xc4201e0ba0, 0xc420036530, 0x13, 0xc420059f40, 0x1, 0x1, 0x87c140, 0x0, 0xf6)
	/home/at15/app/go/src/reflect/value.go:302 +0xa4
github.com/xephonhq/xephon-b/vendor/github.com/stretchr/testify/suite.Run.func2(0xc4200943c0)
	/home/at15/workspace/src/github.com/xephonhq/xephon-b/vendor/github.com/stretchr/testify/suite/suite.go:95 +0x1cb
testing.tRunner(0xc4200943c0, 0xc420223960)
	/home/at15/app/go/src/testing/testing.go:610 +0x81
created by testing.(*T).Run
	/home/at15/app/go/src/testing/testing.go:646 +0x2ec
FAIL	github.com/xephonhq/xephon-b/pkg/serialize	0.005s

Line 95 is method.Func.Call([]reflect.Value{reflect.ValueOf(suite)}). However, after seeing https://github.com/pavlo/gosuite/blob/master/gosuite.go#L56,
I changed it to method.Func.Call([]reflect.Value{reflect.ValueOf(suite), reflect.ValueOf(t)}) and the test works

Possible reasons

I am not using it properly, since it also works in Ayi
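The panic message itself can be reproduced without testify. The sketch below does not claim to be the cause of this issue; it only shows that `reflect`'s `Method.Func.Call` needs the receiver plus one value per declared parameter. `demoSuite` and `callMethod` are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// demoSuite stands in for a test suite type.
type demoSuite struct{}

// NoArg takes only the receiver.
func (demoSuite) NoArg() string { return "no-arg" }

// OneArg takes the receiver plus one parameter.
func (demoSuite) OneArg(name string) string { return "one-arg:" + name }

// callMethod looks up a method by name and calls it through reflection,
// passing the receiver first and then any extra arguments. A panic raised by
// reflect is converted into an error.
func callMethod(recv interface{}, name string, extra ...interface{}) (out string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	m, ok := reflect.TypeOf(recv).MethodByName(name)
	if !ok {
		return "", fmt.Errorf("no method %s", name)
	}
	args := []reflect.Value{reflect.ValueOf(recv)}
	for _, e := range extra {
		args = append(args, reflect.ValueOf(e))
	}
	res := m.Func.Call(args)
	return res[0].String(), nil
}

func main() {
	fmt.Println(callMethod(demoSuite{}, "NoArg"))       // receiver only is enough
	fmt.Println(callMethod(demoSuite{}, "OneArg"))      // too few arguments
	fmt.Println(callMethod(demoSuite{}, "OneArg", "t")) // receiver plus parameter
}
```

If a suite method is declared with an extra parameter, a caller that passes only the receiver fails in exactly this way.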

[integration] generator + loader + monitor + tsdb proxy (kairosdb)

Since all these parts already have a primitive prototype, it's time to put them together before digging into each one further. The process should be

start the monitor for both client and server, writing into InfluxDB?
generator generates synthetic data
loader reads the data and feeds it into the tsdb proxy client and into KairosDB
collect latency and db metrics into InfluxDB?

#6 Config file: need to define the syntax of the config file before integration
#30 tsdb-proxy and xephon-b are now in different repositories, though xephon-b still has all the code.

[reporter][counter] Align metrics from libtsdb-go and metrics/result.go

Currently there are three places we define metrics

libtsdb-go returns net/http/httptrace results on http clients
metrics/result.go contains some numbers, but not all of them can be obtained from what libtsdb-go exposes
counter has some numbers, most of which are not updated correctly

TODO

works with http based tsdb clients
compatible with raw tcp based client, i.e. graphite
return result in finalize stage so it can be written to somewhere by manager

[runner][worker] Limit QPS

Besides limiting by time/points, we can add extra constraints like limiting QPS. It is different from the first two because those two determine when the workload terminates, while QPS controls the fastest speed each worker thread may run at.

Also from YCSB about latency when QPS is limited

if you specify a target of 10 operations per second (and a single thread) then the Client will only execute an operation every 100 milliseconds. If the operation takes 12 milliseconds, then the client will wait for an additional 88 milliseconds before trying the next operation. However, the reported latency will not include this wait time; a latency of 12 milliseconds, not 100, will be reported.
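The arithmetic in the YCSB quote can be sketched as a per-worker loop. `remainingWait` and `runLimited` are hypothetical helpers; the point is that latency is measured around the operation only, and the sleep that enforces the target rate stays outside the measurement.

```go
package main

import (
	"fmt"
	"time"
)

// remainingWait returns how long a worker should sleep after an operation so
// that operations start at most once per interval. It never goes negative.
func remainingWait(interval, opLatency time.Duration) time.Duration {
	if opLatency >= interval {
		return 0
	}
	return interval - opLatency
}

// runLimited runs op n times, at most once per interval, and returns the
// latency of each call. The wait time is deliberately not included.
func runLimited(n int, interval time.Duration, op func()) []time.Duration {
	latencies := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		start := time.Now()
		op()
		lat := time.Since(start)
		latencies = append(latencies, lat)
		time.Sleep(remainingWait(interval, lat))
	}
	return latencies
}

func main() {
	interval := time.Second / 10 // target of 10 operations per second, one thread

	// The example from the quote: a 12ms operation leaves 88ms to wait.
	fmt.Println(remainingWait(interval, 12*time.Millisecond))

	// A tiny run with a fake operation; reported latencies exclude the sleep.
	lats := runLimited(3, 5*time.Millisecond, func() { time.Sleep(time.Millisecond) })
	fmt.Println(len(lats), "operations measured")
}
```

A ticker-based loop would also work; the explicit subtraction is used here because it mirrors the quoted explanation.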

KairosDB client

Related #10 #14

Payload

allow add one point, which turns into bytes right away d7852a9 json serialize example
allow add point to buffer, the pointer to point is stored, it will be grouped by series (TODO: then it comes the problem of tags order ....) maybe use set?

Client

https support (need to disable some checks in order to use self-signed certificates)
share connections to avoid running out of file handles, as mentioned in hey rakyll/hey#31
config qps
* load following the time in the data
track latency and errors
put bench data into TSDB
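For the first two client items, a sketch with the standard library: one shared `http.Client` whose transport keeps idle connections for reuse and can optionally skip certificate verification for self-signed certificates. `newBenchClient` and the chosen timeouts are assumptions, not existing code, and skipping verification is only reasonable against test servers.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// newBenchClient builds one client meant to be shared by all workers, so
// connections are pooled instead of opening a new socket per request.
// insecure disables certificate verification, which is only acceptable for
// benchmark targets using self-signed certificates.
func newBenchClient(insecure bool, maxIdlePerHost int) *http.Client {
	tr := &http.Transport{
		TLSClientConfig:     &tls.Config{InsecureSkipVerify: insecure},
		MaxIdleConns:        maxIdlePerHost,
		MaxIdleConnsPerHost: maxIdlePerHost,
		IdleConnTimeout:     90 * time.Second,
	}
	return &http.Client{Transport: tr, Timeout: 10 * time.Second}
}

func main() {
	client := newBenchClient(true, 64)
	// Every worker goroutine would receive this same *http.Client.
	fmt.Println("client timeout:", client.Timeout)
}
```

`http.Client` is safe for concurrent use, so sharing one instance across workers is the usual way to keep the number of open file handles bounded.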

Metric

pull KairosDB metrics
pull Cassandra metrics
pull machine metrics

Existing clients

https://github.com/ajityagaty/go-kairosdb not very suitable for benchmark

Ref

https://blog.gopheracademy.com/advent-2016/http-client/ mentions how to use golang 1.7's new context library to handle request cancellation etc.

[db][kairosdb] Invalid json. No content due to end of input

this error didn't show up when running test using libtsdb
points are still written into kairosdb, we can read it in the web ui
it could be we are not setting content type correctly
not draining connection? close body?

WARN 0009 failed to flush {"errors":["Invalid json. No content due to end of input.","Invalid json. No content due to end of input."]}
WARN 0009 failed to flush {"errors":["Invalid json. No content due to end of input.","Invalid json. No content due to end of input."]}
WARN 0009 failed to flush {"errors":["Invalid json. No content due to end of input.","Invalid json. No content due to end of input."]}

cassandra_1  | WARN  [Native-Transport-Requests-12] 2018-03-05 23:58:42,775 BatchStatement.java:301 - Batch for [kairosdb.data_points] is of size 5.469KiB, exceeding specified threshold of 5.000KiB by 0.469KiB.
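To check the content type and body handling guesses from the list above, a request helper along these lines could be used. It is a sketch under assumptions: `postJSON`, `seen` and `roundTrip` are hypothetical, the server is a local fake rather than KairosDB, and the sample payload only approximates the KairosDB JSON layout.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// postJSON sends payload with an explicit JSON content type, then drains and
// closes the response body so the connection can be reused.
func postJSON(c *http.Client, url string, payload []byte) (int, error) {
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return 0, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return resp.StatusCode, err
	}
	return resp.StatusCode, nil
}

// seen records what the fake server received.
type seen struct {
	contentType string
	bodyLen     int
}

// roundTrip posts payload to a local fake server (not a real KairosDB) and
// reports the status code plus what the server saw.
func roundTrip(payload []byte) (int, seen, error) {
	got := make(chan seen, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		got <- seen{contentType: r.Header.Get("Content-Type"), bodyLen: len(body)}
		w.WriteHeader(http.StatusNoContent)
	}))
	defer srv.Close()

	status, err := postJSON(srv.Client(), srv.URL+"/api/v1/datapoints", payload)
	if err != nil {
		return 0, seen{}, err
	}
	return status, <-got, nil
}

func main() {
	payload := []byte(`[{"name":"cpu.idle","datapoints":[[1520294322000,1]],"tags":{"host":"a"}}]`)
	status, s, err := roundTrip(payload)
	fmt.Println(status, s.contentType, s.bodyLen, err)
}
```

Comparing the body length seen by the server with the length of the payload is a quick way to spot a request that was sent with an empty or already consumed body.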

Time distribution of generated data

In the current PR #9, point generation is the dumbest possible: a fixed value with a fixed time interval. It is used to make sure other parts of the program can be implemented ASAP (serialize, bulk load, query etc.). In order to simulate real-world use cases, complex and configurable data generation is needed. The original problem is discussed in the private repo, which contains some papers addressing this issue

Will update the issue when I finish bulk load and query using the simplest point.

Logo and website

init the gh-pages
a logo
a ~~hand drawn~~ landing page
introduction for the lightning talk

Since the lt is on Friday .... got to finish it tonight

Config snapshot util

It is a pain to make people write down a copy of their config when they run a benchmark. Some people change their config right after one test, start a new one immediately, and end up unable to match their config to the test result.

Xephon-B should take a snapshot of all the necessary config and put it in the report. We only consider the micro benchmark now

config file (may need to filter some credentials or better put credentials in a separated file)
- load config
database information (e.g. if using docker, version information can be obtained using the docker client)
host information (e.g. for development, sometimes the problem is the developer's machine, not the code)
- runtime versions
- basic hardware information, mem, disk space etc.
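A rough sketch of the config and host parts of such a snapshot, using only the standard library. The `snapshot` struct, the flat `map[string]string` config and the key-name based `redact` rule are all assumptions made for illustration; database and disk information are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"runtime"
	"strings"
	"time"
)

// snapshot is a hypothetical record stored next to a benchmark result.
type snapshot struct {
	TakenAt   time.Time         `json:"taken_at"`
	GoVersion string            `json:"go_version"`
	OS        string            `json:"os"`
	Arch      string            `json:"arch"`
	NumCPU    int               `json:"num_cpu"`
	Hostname  string            `json:"hostname"`
	Config    map[string]string `json:"config"`
}

// redact returns a copy of cfg with values hidden for keys that look like
// credentials. The key-name heuristic is only an example.
func redact(cfg map[string]string) map[string]string {
	out := make(map[string]string, len(cfg))
	for k, v := range cfg {
		lk := strings.ToLower(k)
		if strings.Contains(lk, "password") || strings.Contains(lk, "token") || strings.Contains(lk, "secret") {
			v = "<redacted>"
		}
		out[k] = v
	}
	return out
}

// takeSnapshot captures runtime and host information plus the redacted config.
func takeSnapshot(cfg map[string]string) snapshot {
	host, _ := os.Hostname() // an empty hostname is acceptable for a snapshot
	return snapshot{
		TakenAt:   time.Now(),
		GoVersion: runtime.Version(),
		OS:        runtime.GOOS,
		Arch:      runtime.GOARCH,
		NumCPU:    runtime.NumCPU(),
		Hostname:  host,
		Config:    redact(cfg),
	}
}

func main() {
	s := takeSnapshot(map[string]string{"db.host": "localhost", "db.password": "hunter2"})
	b, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(b))
}
```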

Data generation

Time series database writes differ from writes to other NoSQL databases. A data point typically has

key: a string describing the source, e.g. cpu.idle
timestamp: when the event happened
value: a numeric value, integer or float
tags: k=>v pairs for adding attributes to the data
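In Go, the four fields could be modelled as below. The `Point` struct and `constantSeries` are hypothetical and only mirror the list above; the generator produces a fixed value at a fixed interval, the simplest possible case.

```go
package main

import "fmt"

// Point mirrors the four parts listed above. Timestamps are Unix
// milliseconds here, which is an assumption made for this sketch.
type Point struct {
	Key       string            // source of the data, e.g. "cpu.idle"
	Timestamp int64             // when the event happened
	Value     float64           // numeric value
	Tags      map[string]string // attributes attached to the data
}

// constantSeries generates n points with the same value, spaced interval
// milliseconds apart, starting at start. All points share the same tag map.
func constantSeries(key string, tags map[string]string, start, interval int64, n int, value float64) []Point {
	points := make([]Point, 0, n)
	for i := 0; i < n; i++ {
		points = append(points, Point{
			Key:       key,
			Timestamp: start + int64(i)*interval,
			Value:     value,
			Tags:      tags,
		})
	}
	return points
}

func main() {
	tags := map[string]string{"host": "node-1", "region": "us-west"}
	for _, p := range constantSeries("cpu.idle", tags, 1000, 10, 3, 42) {
		fmt.Println(p.Key, p.Timestamp, p.Value, p.Tags["host"])
	}
}
```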

Examples

Since generating complex data costs a lot of resources, it's a wise idea to save the data to disk.
While influx-comparison uses the bulk format of the target database, I think it's better to use a general serialization format and store metadata in another file (you can even do some dirty tricks to the metadata to change the load without regenerating the data)

Serialization

https://github.com/alecthomas/go_serialization_benchmarks

[refactor] Split tsdb package out into separate repo

Related issues: #28, #18, #15

What the tsdb package does is like JDBC, except it's in Golang and for TSDB only. It would also provide a server side implementation in order to act as a proxy. So it's a better idea to make it a standalone repo instead of bundling it inside xephon-b. Current problems are

the series and point package in pkg/common is coupled with almost every package in xephon-b including tsdb, possible solutions are:
- have a xephonhq/tsdb-proxy/common package in order to avoid possible cycle problem
tsdb actually has its own command related files in xephonhq/xephon-b/cmd/tsdb-proxy, while xephon-b has its own command related files in xephonhq/xephon-b/pkg/cmd and only the binary in xephonhq/xephon-b/cmd/xephon-b
the tracing functionality is not limited to the benchmark tool and can be switched off for normal usage. For the benchmark, it's possible to use another goroutine to collect all those metrics in a pull style instead of letting each goroutine send into a channel. If the metrics are not collected by anyone, they are simply replaced by new ones, so there is no need to worry about memory usage from a lot of metrics for tons of goroutines when tracing is enabled.
create tsdb-proxy repository using git filter-branch
change import and remove unused files from each repo (starting from xephon-b might be easier), though the generator and simulator are really close related to time series logic
make sure both repository pass their travis test
update documentation for both repository

TSDB Shell

To accompany TSDB Proxy #18, and for easier development, it's a good idea to have an interactive shell.

Ref

A REPL for go https://github.com/motemen/gore

[runner][reporter] Report progress

Runner should report the overall progress of the benchmark so the user can have an estimate, e.g. when limited by time 10% 1s/10s, when limited by points 10% 10M/100M (numbers should be human readable when printed to the console, it's hard to count the zeros ...)

have multiple reporters for progress; this is different from the reporter for worker results
write to terminal
- by time
- by percentage
expose to http handler?
(optional) report to xephon-b central for distributed processing? then xephon-b itself becomes a framework runs inside benchhub's job scheduler
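The human readable output for the points case could be produced along these lines. `humanCount` and `progressLine` are hypothetical helpers, and the K/M/G suffixes with one decimal place are just one possible convention.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// humanCount shortens a count using K, M and G suffixes with at most one
// decimal place, e.g. 1500 becomes "1.5K".
func humanCount(n int64) string {
	units := []struct {
		div    int64
		suffix string
	}{{1e9, "G"}, {1e6, "M"}, {1e3, "K"}}
	for _, u := range units {
		if n >= u.div {
			v := math.Round(float64(n)/float64(u.div)*10) / 10
			return strconv.FormatFloat(v, 'f', -1, 64) + u.suffix
		}
	}
	return strconv.FormatInt(n, 10)
}

// progressLine formats progress as "<percent>% <done>/<total>".
func progressLine(done, total int64) string {
	if total <= 0 {
		return "0% " + humanCount(done) + "/?"
	}
	return fmt.Sprintf("%d%% %s/%s", done*100/total, humanCount(done), humanCount(total))
}

func main() {
	fmt.Println(progressLine(10_000_000, 100_000_000))
	fmt.Println(progressLine(1500, 6000))
}
```

The same `progressLine` shape works for the time-limited case if elapsed and total durations are formatted instead of counts.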

Config file

Related #1 #2

item with a * prefix will most likely be ignored

use viper to read yaml file
borrow viper patch from Ayi if necessary
one for general config xephon-b.yml
one for test config bench.yml, a copy will be stored with the benchmark result
* use ~/.xephon-b folder to store history of experiments, use some go k-v store

Write

load scenario
load type
... many more

Read

do write first

Workload Read

Typical time series database reads are range or aggregation queries

all data points in the last thirty minutes
all data points in the last month at lower granularity, computing the average of collected data points to meet the requirement
use name as criteria
use both name and tag(s) as criteria

Example

https://github.com/influxdata/influxdb-comparisons/tree/master/bulk_query_gen

Cluster deployment

Ref

https://www.nginx.com/blog/service-discovery-in-a-microservices-architecture/

[runner] Server for remote control

Currently Xephon-B just runs and stops; however, it could be a server program for

dynamic control during benchmark
running multiple workloads at the same time (just create multiple managers)
reduce warm up time if dataset is loaded from disk
detect memory and goroutine leak in current runner

TSDB Proxy

Related PR: #19

We need to (can) have a TSDB proxy for the following reasons

we have to develop multiple tsdb clients
cAdvisor supports various backends, but maybe none of them is the one we want
we need to store benchmark results in a tsdb, and we should allow people to choose the one they like (actually we don't know which one works well for storing the results)
we may have our own tsdb implementation

Clients

official InfluxDB client https://github.com/influxdata/influxdb/blob/master/client/v2/client.go
official RiakTS client (use protocol buffer) https://github.com/basho/riak-go-client

Proxy

send metrics to multiple tsdbs https://github.com/go-kit/kit/tree/master/metrics
A proxy from graphite to riak https://github.com/dams/graphite-riakts

Ref

https://github.com/influxdata/influxdb/tree/master/services InfluxDB can parse other protocols like graphite and OpenTSDB
https://github.com/eBay/fabio A Golang HTTP(S) Proxy with Consul
https://github.com/influxdata/influxdb-relay InfluxDB's write to multiple instances
https://github.com/flike/kingshard A MySQL proxy
https://github.com/youtube/vitess Vitess is a database clustering system for horizontal scaling of MySQL

Script for license check

Currently, xephon-b is released under MIT. However, if some dependencies use GPL or some strange license, we may need to switch library (not license).

It should do the following

loop the vendor folder and find each repo's license ~~(since we are using glide there should not be nested vendor like old node.js)~~
deal with duplicates, since some projects commit vendor into scm
print a tree
work on windows (under gitbash and/or msys64)

Since it's just a util script, I will write it in python

Container monitor

We need to provide some insight into the databases, e.g. if a database fails while most of the system is idle, there is certainly a wrong configuration or poor design.

Existing solutions

https://github.com/stefanprodan/dockprom Docker hosts and containers monitoring with Prometheus, Grafana, cAdvisor, NodeExporter and AlertManager, provide configuration files to glue them together, Prometheus is used as TSDB, Grafana is used for visualization, cAdvisor for container monitor, NodeExporter for host metrics
https://github.com/vegasbrianc/docker-monitoring similar to dockprom
https://github.com/weaveworks/scope Monitoring, visualisation & management for Docker & Kubernetes
http://rancher.com/comparing-monitoring-options-for-docker-deployments/
https://grafana.net/dashboards/1244 a dashboard for monitoring prometheus using grafana

Ref

https://github.com/iovisor/bcc Tools for BPF-based Linux IO analysis, networking, monitoring, and more
https://github.com/facebook/osquery query system metrics using SQL

[reporter] Stuck after reporter Run is canceled by context

got stuck before finalizing the reporter. For the counter reporter there is nothing to finalize, so it should be some goroutine problem on my end ...