centaurusinfra / regionless-storage-service
A geo-distributed regionless metadata storage service
License: Apache License 2.0
Part of the 730 tests.
Goal: test a 1M-record load (each record value is 10KB).
export NUM_OF_SI=5
export SI_INSTANCE_TYPE=t2.large
export RKV_INSTANCE_TYPE=t2.xlarge
export JAEGER_INSTANCE_TYPE=t2.xlarge
export JAEGER_ROOT_DISK_VOLUME=200
export YCSB_INSTANCE_TYPE=t2.large
After the test lab has been set up, log in to the ycsb vm, cd to work/go-ycsb, and edit the workloads/workloada file with the following changes:
threadcount=2
fieldlength=1000
recordcount=1000000
operationcount=1000000
workload=core
then run ./bin/go-ycsb load rkv -P workloads/workloada
Run finished, takes 1h9m37.403247648s
INSERT - Takes(s): 4177.4, Count: 999990, OPS: 239.4, Avg(us): 8113, Min(us): 4344, Max(us): 217343, 99th(us): 25071, 99.9th(us): 43391, 99.99th(us): 62943
This is to set up the most basic e2e framework that
No specific features such as partitioning or replication are required for this task. The goal is to have a running service that works with a simple storage instance. Later on, features will be built on top of this service.
During a perf run, observed that rkv may exit in the middle of the load test; the /tmp/rkv.log file is shown below.
This does not always happen; it occurred 2 times in about 20 runs.
$ cat /tmp/rkv.log
The url is 35.166.131.63:6666 and the pool is &{0x6c7ba0 <nil> <nil> 80 12000 0s false 0s {0 0} false 0 {0 {0 0}} <nil> {0 <nil> <nil>} 0 0}
The url is 34.217.123.197:6666 and the pool is &{0x6c7ba0 <nil> <nil> 80 12000 0s false 0s {0 0} false 0 {0 {0 0}} <nil> {0 <nil> <nil>} 0 0}
The url is 18.237.48.185:6666 and the pool is &{0x6c7ba0 <nil> <nil> 80 12000 0s false 0s {0 0} false 0 {0 {0 0}} <nil> {0 <nil> <nil>} 0 0}
The url is 34.215.240.3:6666 and the pool is &{0x6c7ba0 <nil> <nil> 80 12000 0s false 0s {0 0} false 0 {0 {0 0}} <nil> {0 <nil> <nil>} 0 0}
The url is 52.89.46.186:6666 and the pool is &{0x6c7ba0 <nil> <nil> 80 12000 0s false 0s {0 0} false 0 {0 {0 0}} <nil> {0 <nil> <nil>} 0 0}
2022/07/29 02:14:45 ERROR: fail init redis: dial tcp 52.89.46.186:6666: connect: connection refused
The ycsb log indicates that the exit happened after about 4750 seconds:
INSERT - Takes(s): 4710.0, Count: 357472, OPS: 75.9, Avg(us): 25817, Min(us): 7364, Max(us): 19316735, 99th(us): 33439, 99.9th(us): 5459967, 99.99th(us): 12992511
INSERT - Takes(s): 4720.0, Count: 357475, OPS: 75.7, Avg(us): 25891, Min(us): 7364, Max(us): 19316735, 99th(us): 33439, 99.9th(us): 5484543, 99.99th(us): 12992511
INSERT - Takes(s): 4730.0, Count: 357475, OPS: 75.6, Avg(us): 25891, Min(us): 7364, Max(us): 19316735, 99th(us): 33439, 99.9th(us): 5484543, 99.99th(us): 12992511
INSERT - Takes(s): 4740.0, Count: 357476, OPS: 75.4, Avg(us): 25941, Min(us): 7364, Max(us): 19316735, 99th(us): 33439, 99.9th(us): 5496831, 99.99th(us): 13049855
INSERT - Takes(s): 4750.0, Count: 357477, OPS: 75.3, Avg(us): 25997, Min(us): 7364, Max(us): 20168703, 99th(us): 33439, 99.9th(us): 5541887, 99.99th(us): 13213695
INSERT_ERROR - Takes(s): 9.5, Count: 8294, OPS: 876.4, Avg(us): 2085, Min(us): 1776, Max(us): 9503, 99th(us): 4483, 99.9th(us): 5543, 99.99th(us): 6971
INSERT - Takes(s): 4760.0, Count: 357477, OPS: 75.1, Avg(us): 25997, Min(us): 7364, Max(us): 20168703, 99th(us): 33439, 99.9th(us): 5541887, 99.99th(us): 13213695
INSERT_ERROR - Takes(s): 19.5, Count: 16847, OPS: 865.6, Avg(us): 2113, Min(us): 1776, Max(us): 14327, 99th(us): 4527, 99.9th(us): 5759, 99.99th(us): 11015
INSERT - Takes(s): 4770.0, Count: 357477, OPS: 74.9, Avg(us): 25997, Min(us): 7364, Max(us): 20168703, 99th(us): 33439, 99.9th(us): 5541887, 99.99th(us): 13213695
INSERT_ERROR - Takes(s): 29.5, Count: 25455, OPS: 863.9, Avg(us): 2117, Min(us): 1776, Max(us): 14327, 99th(us): 4519, 99.9th(us): 5675, 99.99th(us): 9503
INSERT - Takes(s): 4780.0, Count: 357477, OPS: 74.8, Avg(us): 25997, Min(us): 7364, Max(us): 20168703, 99th(us): 33439, 99.9th(us): 5541887, 99.99th(us): 13213695
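If the exit is caused by treating a refused store connection as fatal (an assumption; the log alone does not prove it), one mitigation is to retry the connection with backoff before giving up. The sketch below is not rkv code: `dialFunc` and `connectWithRetry` are hypothetical names, and the dial step is abstracted so that any Redis client constructor could be plugged in.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// dialFunc abstracts "open a connection to a store" (hypothetical; in real
// code this might wrap whatever Redis client constructor is in use).
type dialFunc func(addr string) error

var errRefused = errors.New("connect: connection refused")

// connectWithRetry calls dial up to attempts times, doubling the delay
// between tries, and only returns an error once every attempt has failed.
func connectWithRetry(dial dialFunc, addr string, attempts int, base time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := base
	for i := 0; i < attempts; i++ {
		if err = dial(addr); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("store %s unreachable after %d attempts: %w", addr, attempts, err)
}

func main() {
	calls := 0
	// Simulated store that refuses the first two connections.
	flaky := func(addr string) error {
		calls++
		if calls < 3 {
			return errRefused
		}
		return nil
	}
	err := connectWithRetry(flaky, "52.89.46.186:6666", 5, 10*time.Millisecond)
	fmt.Println("err:", err, "calls:", calls)
}
```

With this shape the caller can decide whether to mark the store unhealthy and keep serving from the remaining stores, rather than exiting the whole process.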
Goals:
***************** properties *****************
"threadcount"="4"
"dotransactions"="false"
"fieldlength"="160 #intended 500; due to the encoding, 160 length would yield about 500 payload"
"operationcount"="10000000"
"requestdistribution"="uniform"
"recordcount"="10000000"
"insertproportion"="0"
"readproportion"="0.5"
"workload"="core"
"readallfields"="true"
"updateproportion"="0.5"
"scanproportion"="0"
**********************************************
Jaeger crashed at the end.
Run finished, takes 9h20m19.700252511s
INSERT - Takes(s): 33619.7, Count: 9999999, OPS: 297.4, Avg(us): 13412, Min(us): 4300, Max(us): 1058815, 99th(us): 21951, 99.9th(us): 36703, 99.99th(us): 56831
ubuntu@ip-172-31-13-231:~$ free -g
total used free shared buff/cache available
Mem: 31 4 25 0 1 26
Swap: 0 0 0
ubuntu@ip-172-31-9-38:~$ free -g
total used free shared buff/cache available
Mem: 31 13 16 0 1 17
Swap: 0 0 0
When rkv uses the mem database as its storage backend, the go-ycsb load phase fails with INSERT_ERROR (the go-ycsb workload specifies threadcount 4):
$ ./bin/go-ycsb load rkv -P workloads/workloada
***************** properties *****************
"dotransactions"="false"
"operationcount"="1000"
"scanproportion"="0"
"workload"="core"
"readallfields"="true"
"threadcount"="4"
"requestdistribution"="uniform"
"updateproportion"="0.5"
"recordcount"="1000"
"readproportion"="0.5"
"insertproportion"="0"
**********************************************
Run finished, takes 179.69837ms
INSERT - Takes(s): 0.2, Count: 55, OPS: 310.2, Avg(us): 1403, Min(us): 492, Max(us): 3987, 99th(us): 3213, 99.9th(us): 3987, 99.99th(us): 3987
INSERT_ERROR - Takes(s): 0.1, Count: 941, OPS: 6705.2, Avg(us): 523, Min(us): 187, Max(us): 9167, 99th(us): 3247, 99.9th(us): 7919, 99.99th(us): 9167
The rkv log indicates that concurrent map writes caused the crash:
The url is 172.31.9.140:6379 and the pool is &{0x6c8780 <nil> <nil> 80 12000 0s false 0s {0 0} false 0 {0 {0 0}} <nil> {0 <nil> <nil>} 0 0}
The url is 172.31.12.96:6380 and the pool is &{0x6c8780 <nil> <nil> 80 12000 0s false 0s {0 0} false 0 {0 {0 0}} <nil> {0 <nil> <nil>} 0 0}
fatal error: concurrent map writes
goroutine 315 [running]:
runtime.throw(0x7cdf55, 0x15)
/home/howell/go/go1.16.9/src/runtime/panic.go:1117 +0x72 fp=0xc0003f3e30 sp=0xc0003f3e00 pc=0x437ab2
runtime.mapassign_faststr(0x7636c0, 0xc0003821e0, 0xc00002264c, 0x3, 0x0)
/home/howell/go/go1.16.9/src/runtime/map_faststr.go:211 +0x3f1 fp=0xc0003f3e98 sp=0xc0003f3e30 pc=0x415eb1
github.com/regionless-storage-service/pkg/database.MemDatabase.Put(...)
/home/howell/work/regionless-storage-service/pkg/database/mem.go:31
github.com/regionless-storage-service/pkg/database.(*MemDatabase).Put(0xc0003c6000, 0xc00002264c, 0x3, 0xc000468000, 0xdd7, 0x0, 0x0, 0x0, 0x83be20)
<autogenerated>:1 +0x65 fp=0xc0003f3ed0 sp=0xc0003f3e98 pc=0x6c8ba5
github.com/regionless-storage-service/pkg/piping.(*ChainPiping).Write.func1(0xc000022640, 0x83be20, 0xc00051dd10, 0xc0000ca2a0, 0xc00002264c, 0x3, 0xc000468000, 0xdd7)
/home/howell/work/regionless-storage-service/pkg/piping/chain_piping_manager.go:61 +0x18d fp=0xc0003f3fa0 sp=0xc0003f3ed0 pc=0x6d244d
runtime.goexit()
/home/howell/go/go1.16.9/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc0003f3fa8 sp=0xc0003f3fa0 pc=0x46d3e1
created by github.com/regionless-storage-service/pkg/piping.(*ChainPiping).Write
/home/howell/work/regionless-storage-service/pkg/piping/chain_piping_manager.go:57 +0x32d
...
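The trace above shows a plain Go map being written from several goroutines at once, which the runtime treats as a fatal error. A common fix is to guard the map with a mutex. The sketch below is an illustration, not the actual `MemDatabase` code: the type name `SafeMemDatabase` and its string-valued map are assumptions. It uses pointer receivers so that the mutex is never copied.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeMemDatabase is a hypothetical mutex-guarded in-memory store.
type SafeMemDatabase struct {
	mu   sync.RWMutex
	data map[string]string
}

func NewSafeMemDatabase() *SafeMemDatabase {
	return &SafeMemDatabase{data: make(map[string]string)}
}

// Put takes the write lock, so concurrent writers are serialized.
func (db *SafeMemDatabase) Put(key, value string) {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.data[key] = value
}

// Get takes the read lock, so readers do not block each other.
func (db *SafeMemDatabase) Get(key string) (string, bool) {
	db.mu.RLock()
	defer db.mu.RUnlock()
	v, ok := db.data[key]
	return v, ok
}

// Len reports the number of stored keys.
func (db *SafeMemDatabase) Len() int {
	db.mu.RLock()
	defer db.mu.RUnlock()
	return len(db.data)
}

func main() {
	db := NewSafeMemDatabase()
	var wg sync.WaitGroup
	// Four writers, mirroring threadcount=4 in the failing workload.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				db.Put(fmt.Sprintf("user%d-%d", id, j), "v")
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("keys stored:", db.Len())
}
```

Running the program with `go run -race` is a quick way to confirm that no unsynchronized access remains.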
Other things worth noting:
Start with consistent hashing to allow partitioning storage across different underlying storage instances.
This is not a major issue, but it affects test UX.
The typical workflow to run a perf test is to run the following scripts in order, e.g.
The KEY_FILE and KEY_NAME could (and should) get their values from si_def.json.
redis persistence is disabled (save "")
export NUM_OF_SI=6
export RKV_ROOT_DISK_VOLUME=100
export SI_ROOT_DISK_VOLUME=100
export SI_INSTANCE_TYPE=t2.xlarge
export RKV_INSTANCE_TYPE=t2.2xlarge
export JAEGER_INSTANCE_TYPE=t2.2xlarge
export JAEGER_ROOT_DISK_VOLUME=200
export YCSB_INSTANCE_TYPE=t2.2xlarge
export YCSB_ROOT_DISK_VOLUME=40
records: 5M
k-v payload: 5KB value
load test only
workloada setting
threadcount=4
fieldlength=160 #intended 500; due to the encoding, 160 length would yield about 500 payload
recordcount=5000000
operationcount=5000000
ycsb log:
INSERT - Takes(s): 10840.0, Count: 3188140, OPS: 294.1, Avg(us): 13548, Min(us): 4082, Max(us): 1046015, 99th(us): 22239, 99.9th(us): 29471, 99.99th(us): 223231
INSERT - Takes(s): 10850.0, Count: 3191224, OPS: 294.1, Avg(us): 13548, Min(us): 4082, Max(us): 1046015, 99th(us): 22239, 99.9th(us): 29471, 99.99th(us): 223231
INSERT - Takes(s): 10860.0, Count: 3194339, OPS: 294.1, Avg(us): 13547, Min(us): 4082, Max(us): 1046015, 99th(us): 22239, 99.9th(us): 29471, 99.99th(us): 223231
INSERT - Takes(s): 10870.0, Count: 3197327, OPS: 294.1, Avg(us): 13547, Min(us): 4082, Max(us): 1046015, 99th(us): 22223, 99.9th(us): 29471, 99.99th(us): 223231
INSERT - Takes(s): 10880.0, Count: 3200144, OPS: 294.1, Avg(us): 13547, Min(us): 4082, Max(us): 1046015, 99th(us): 22223, 99.9th(us): 29455, 99.99th(us): 223231
... // approaching 4M records (where redis has almost exhausted its memory); note that the actual insert rate is very low (<1 op/s at this point), although the cumulative OPS column still shows about 284
INSERT - Takes(s): 13820.0, Count: 3931533, OPS: 284.5, Avg(us): 13969, Min(us): 4082, Max(us): 19005439, 99th(us): 22079, 99.9th(us): 29439, 99.99th(us): 1022463
INSERT - Takes(s): 13830.0, Count: 3931539, OPS: 284.3, Avg(us): 13986, Min(us): 4082, Max(us): 22298623, 99th(us): 22079, 99.9th(us): 29455, 99.99th(us): 1022463
INSERT - Takes(s): 13840.0, Count: 3931541, OPS: 284.1, Avg(us): 13993, Min(us): 4082, Max(us): 22298623, 99th(us): 22079, 99.9th(us): 29455, 99.99th(us): 1022975
INSERT - Takes(s): 13850.0, Count: 3931542, OPS: 283.9, Avg(us): 13999, Min(us): 4082, Max(us): 22790143, 99th(us): 22079, 99.9th(us): 29455, 99.99th(us): 1022975
... // sometimes no ops at all
INSERT - Takes(s): 14550.0, Count: 3932125, OPS: 270.2, Avg(us): 14673, Min(us): 4082, Max(us): 33439743, 99th(us): 22111, 99.9th(us): 29855, 99.99th(us): 5308415
INSERT - Takes(s): 14560.0, Count: 3932125, OPS: 270.1, Avg(us): 14673, Min(us): 4082, Max(us): 33439743, 99th(us): 22111, 99.9th(us): 29855, 99.99th(us): 5308415
INSERT - Takes(s): 14570.0, Count: 3932125, OPS: 269.9, Avg(us): 14673, Min(us): 4082, Max(us): 33439743, 99th(us): 22111, 99.9th(us): 29855, 99.99th(us): 5308415
INSERT - Takes(s): 14580.0, Count: 3932125, OPS: 269.7, Avg(us): 14673, Min(us): 4082, Max(us): 33439743, 99th(us): 22111, 99.9th(us): 29855, 99.99th(us): 5308415
...
INSERT - Takes(s): 14640.0, Count: 3932125, OPS: 268.6, Avg(us): 14673, Min(us): 4082, Max(us): 33439743, 99th(us): 22111, 99.9th(us): 29855, 99.99th(us): 5308415
INSERT - Takes(s): 14650.0, Count: 3932126, OPS: 268.4, Avg(us): 14682, Min(us): 4082, Max(us): 33980415, 99th(us): 22111, 99.9th(us): 29855, 99.99th(us): 5320703
INSERT - Takes(s): 14660.0, Count: 3932129, OPS: 268.2, Avg(us): 14708, Min(us): 4082, Max(us): 34570239, 99th(us): 22111, 99.9th(us): 29871, 99.99th(us): 5345279
INSERT - Takes(s): 14670.0, Count: 3932133, OPS: 268.0, Avg(us): 14722, Min(us): 4082, Max(us): 34570239, 99th(us): 22111, 99.9th(us): 29871, 99.99th(us): 5402623
INSERT - Takes(s): 14680.0, Count: 3932136, OPS: 267.9, Avg(us): 14729, Min(us): 4082, Max(us): 34570239, 99th(us): 22111, 99.9th(us): 29871, 99.99th(us): 5423103
...
INSERT - Takes(s): 17050.0, Count: 3933443, OPS: 230.7, Avg(us): 17128, Min(us): 4082, Max(us): 34570239, 99th(us): 22191, 99.9th(us): 31215, 99.99th(us): 12156927
INSERT - Takes(s): 17060.0, Count: 3933451, OPS: 230.6, Avg(us): 17144, Min(us): 4082, Max(us): 34570239, 99th(us): 22191, 99.9th(us): 31231, 99.99th(us): 12230655
INSERT - Takes(s): 17070.0, Count: 3933454, OPS: 230.4, Avg(us): 17155, Min(us): 4082, Max(us): 34570239, 99th(us): 22191, 99.9th(us): 31231, 99.99th(us): 12230655
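In the log above the OPS column equals Count divided by Takes(s), so it is an average over the whole run and hides the stall; the stall only shows in how slowly Count advances between lines. The sketch below recomputes the rate between two consecutive report lines. The names `reportLine`, `parseReportLine`, and `intervalRate` are hypothetical helpers, and the regular expression assumes the line layout shown in this log.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// reportLine holds the fields this sketch extracts from one go-ycsb line.
type reportLine struct {
	Op    string
	Takes float64 // elapsed seconds since the start of the run
	Count int64   // operations completed so far
	OPS   float64 // cumulative throughput as printed
}

var reportRE = regexp.MustCompile(
	`^(\w+)\s+-\s+Takes\(s\): ([0-9.]+), Count: ([0-9]+), OPS: ([0-9.]+)`)

func parseReportLine(line string) (reportLine, error) {
	m := reportRE.FindStringSubmatch(line)
	if m == nil {
		return reportLine{}, fmt.Errorf("not a go-ycsb report line: %q", line)
	}
	takes, err := strconv.ParseFloat(m[2], 64)
	if err != nil {
		return reportLine{}, err
	}
	count, err := strconv.ParseInt(m[3], 10, 64)
	if err != nil {
		return reportLine{}, err
	}
	ops, err := strconv.ParseFloat(m[4], 64)
	if err != nil {
		return reportLine{}, err
	}
	return reportLine{Op: m[1], Takes: takes, Count: count, OPS: ops}, nil
}

// intervalRate returns operations per second between two report lines.
func intervalRate(prev, cur reportLine) float64 {
	dt := cur.Takes - prev.Takes
	if dt <= 0 {
		return 0
	}
	return float64(cur.Count-prev.Count) / dt
}

func main() {
	a, err := parseReportLine("INSERT - Takes(s): 13820.0, Count: 3931533, OPS: 284.5, Avg(us): 13969")
	if err != nil {
		panic(err)
	}
	b, err := parseReportLine("INSERT - Takes(s): 13830.0, Count: 3931539, OPS: 284.3, Avg(us): 13986")
	if err != nil {
		panic(err)
	}
	fmt.Printf("cumulative: %.1f ops/s, last interval: %.1f ops/s\n", b.OPS, intervalRate(a, b))
}
```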
rkv memory usage when the system holds about 4M records
total used free shared buff/cache available
Mem: 31G 1.7G 28G 844K 1.5G 29G
Swap: 0B 0B 0B
jaeger memory and disk usage when the system holds about 4M records
total used free shared buff/cache available
Mem: 31Gi 7.8Gi 22Gi 1.0Mi 926Mi 23Gi
Swap: 0B 0B 0B
Filesystem Size Used Avail Use% Mounted on
/dev/root 194G 2.4G 192G 2% /
ycsb client memory and disk usage when the system holds about 4M records
total used free shared buff/cache available
Mem: 31Gi 334Mi 30Gi 0.0Ki 788Mi 30Gi
Swap: 0B 0B 0B
Filesystem Size Used Avail Use% Mounted on
/dev/root 39G 4.3G 35G 11% /
ubuntu@ip-172-31-3-247:~/work/go-ycsb$ ./bin/go-ycsb load rkv -P workloads/workloada
***************** properties *****************
"readproportion"="0.5"
"updateproportion"="0.5"
"requestdistribution"="uniform"
"workload"="core"
"readallfields"="true"
"dotransactions"="false"
"recordcount"="10000"
"fieldlength"="1000"
"threadcount"="4"
"scanproportion"="0"
"insertproportion"="0"
"operationcount"="10000"
**********************************************
INSERT - Takes(s): 10.0, Count: 1564, OPS: 156.6, Avg(us): 25321, Min(us): 18416, Max(us): 57023, 99th(us): 43679, 99.9th(us): 56319, 99.99th(us): 57023
INSERT - Takes(s): 20.0, Count: 3116, OPS: 155.9, Avg(us): 25435, Min(us): 8336, Max(us): 57023, 99th(us): 43519, 99.9th(us): 55999, 99.99th(us): 57023
INSERT - Takes(s): 30.0, Count: 4496, OPS: 149.9, Avg(us): 26455, Min(us): 8336, Max(us): 87103, 99th(us): 49503, 99.9th(us): 66495, 99.99th(us): 87103
INSERT - Takes(s): 40.0, Count: 5843, OPS: 146.1, Avg(us): 27147, Min(us): 8336, Max(us): 87103, 99th(us): 51871, 99.9th(us): 67903, 99.99th(us): 83455
INSERT - Takes(s): 50.0, Count: 7243, OPS: 144.9, Avg(us): 27380, Min(us): 8336, Max(us): 87103, 99th(us): 51423, 99.9th(us): 66495, 99.99th(us): 83455
This is a finding from the consistency validation.
Settings: localreplicanum=0, remotereplicanum=3, remotestorelatencythresholdinmillisec=0; num_client=5, duration=20.
Result: a client can get a value that does not exist in the redis backend. For example, in the following screenshot client 4 got the value 582. From the original log, the corresponding revision number is 132. The value returned by the curl command for revision 132 is 311.
Currently the rkv update op updates the value of the specified key; each key keeps its history of values as a list of revisions.
Take an imaginary scenario that increments a request counter (a key named "count") on each received request:
Multiple components (e.g. server handler goroutines) update the count. We need a mechanism to ensure that the count is updated correctly. One simple approach is to make the update conditional on a comparison with the latest revision number.
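The compare-with-latest-revision idea can be sketched as an optimistic update: read the value and its revision, then write only if the revision has not moved, retrying otherwise. This is a minimal in-memory illustration, not the rkv API. `revStore`, `PutIfRevision`, and `increment` are hypothetical names, and revisions are numbered per key here for simplicity, which may differ from how rkv assigns them.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"sync"
)

var ErrRevisionMismatch = errors.New("revision mismatch")

// revStore is a hypothetical stand-in for rkv: each key keeps a list of
// values, and the value at index i has revision i+1.
type revStore struct {
	mu   sync.Mutex
	revs map[string][]string
}

func newRevStore() *revStore { return &revStore{revs: make(map[string][]string)} }

// Latest returns the newest value and its revision (0 if the key is absent).
func (s *revStore) Latest(key string) (string, int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	h := s.revs[key]
	if len(h) == 0 {
		return "", 0
	}
	return h[len(h)-1], len(h)
}

// PutIfRevision appends value only if the latest revision still equals expected.
func (s *revStore) PutIfRevision(key, value string, expected int) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.revs[key]) != expected {
		return 0, ErrRevisionMismatch
	}
	s.revs[key] = append(s.revs[key], value)
	return len(s.revs[key]), nil
}

// increment retries the read-modify-write until no other writer intervened.
func increment(s *revStore, key string) error {
	for {
		cur, rev := s.Latest(key)
		n := 0
		if rev > 0 {
			var err error
			if n, err = strconv.Atoi(cur); err != nil {
				return err
			}
		}
		_, err := s.PutIfRevision(key, strconv.Itoa(n+1), rev)
		if err == nil {
			return nil
		}
		if !errors.Is(err, ErrRevisionMismatch) {
			return err
		}
	}
}

func main() {
	s := newRevStore()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100; j++ {
				if err := increment(s, "count"); err != nil {
					panic(err)
				}
			}
		}()
	}
	wg.Wait()
	v, rev := s.Latest("count")
	fmt.Println("count =", v, "revision =", rev)
}
```

Without the revision check, two handlers that both read the same count would each write back the same incremented value and one update would be lost.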
Goals:
This is NOT an issue, but a tracking report of the 0730 perf tests.
It should have been a wiki page; unfortunately, the wiki is not yet available for this repo while it is private.
Perf tests are conducted with go-ycsb using the rkv driver.
IMPORTANT: a full test has 2 steps, load + run, as below:
./bin/go-ycsb load rkv -P workloads/workloada
./bin/go-ycsb run rkv -P workloads/workloada
record count | config | total time (s) | insert count | OPS | avg latency (us) | min (us) | max (us) | 99th (us) | 99.9th (us) | 99.99th (us) | rkv used mem | jaeger notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1K | redisx2, all t2.micro(1cpu, 1GB mem), client threads=1 | 4.45 | 1000 | 226 | 4405 | 3386 | 26479 | 10191 | 21135 | 26479 | 150M | createKV: 3060us, set kv:3038us, put index: 8us |
1M | redisx2, rkv+jaeger t2.medium (1cpu, 4GB), client thread=100 | 3124 (52m4s) | 1,000,000 | 320 | 312349 | 4732 | 1531903 | 452607 | 1305599 | 1353727 | 838M | |
10M | redisx5, rkv t2.large(2cpu, 8GB), others t2.medium, threads 100 | 26051 (7h14m11s) | 10,000,000 | 384 | 260467 | 1412 | 7389183 | 479999 | 1273855 | 1341439 | 4.0G | jaeger crashed, likely due to insufficient mem |
Run finished, takes 605.65671ms
INSERT - Takes(s): 0.6, Count: 1000, OPS: 1666.0, Avg(us): 572, Min(us): 461, Max(us): 3987, 99th(us): 1698, 99.9th(us): 3965, 99.99th(us): 3987
This happens in the perf test (go-ycsb run); not always, but occasionally about once per run (1000 ops).
./bin/go-ycsb load rkv -P workload/workloada
./bin/go-ycsb run rkv -P workload/workloada
Notice that the run command might sometimes yield an UPDATE_ERROR.
threadcount=32
fieldlength=80
recordcount=100000000
operationcount=100000000
INSERT - Takes(s): 150.0, Count: 8962, OPS: 59.8, Avg(us): 16698, Min(us): 1429, Max(us): 29279, 99th(us): 25103, 99.9th(us): 25951, 99.99th(us): 28735
INSERT - Takes(s): 160.0, Count: 9550, OPS: 59.7, Avg(us): 16713, Min(us): 1429, Max(us): 29279, 99th(us): 25087, 99.9th(us): 25951, 99.99th(us): 28735
Run finished, takes 2m46.888740023s
INSERT - Takes(s): 166.9, Count: 10000, OPS: 59.9, Avg(us): 16650, Min(us): 1429, Max(us): 29279, 99th(us): 25087, 99.9th(us): 26015, 99.99th(us): 28735
READ - Takes(s): 10.0, Count: 7002, OPS: 700.3, Avg(us): 1420, Min(us): 915, Max(us): 8407, 99th(us): 2225, 99.9th(us): 4899, 99.99th(us): 7427
Run finished, takes 14.256774727s
READ - Takes(s): 14.3, Count: 10000, OPS: 701.5, Avg(us): 1418, Min(us): 915, Max(us): 8839, 99th(us): 2229, 99.9th(us): 4555, 99.99th(us): 8407
INSERT - Takes(s): 110.0, Count: 145534, OPS: 1323.1, Avg(us): 17606, Min(us): 1333, Max(us): 1020927, 99th(us): 26975, 99.9th(us): 40895, 99.99th(us): 1018367
INSERT - Takes(s): 120.0, Count: 162599, OPS: 1355.1, Avg(us): 17438, Min(us): 1333, Max(us): 1015807, 99th(us): 27023, 99.9th(us): 39423, 99.99th(us): 1012735
INSERT - Takes(s): 130.0, Count: 179170, OPS: 1378.3, Avg(us): 17356, Min(us): 1333, Max(us): 1013247, 99th(us): 27023, 99.9th(us): 38687, 99.99th(us): 1010175
Tried ./setup_test_lab.sh ./si_def_4_region_micro.json with RemoteStoreLatencyThresholdInMilliSec set to 100ms in the configuration.
Got zero remote stores 2 times, since the measured latencies to the stores across the states were below 100ms.
Updated the threshold to 50ms to make it work.
We have different si_def settings, and users might not know how to set RemoteStoreLatencyThresholdInMilliSec to a reasonable number that distinguishes local from remote stores.
We might consider a new strategy that builds latency histograms to pick remote instances.
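One way to avoid a hand-tuned threshold is to derive the split from the measured latencies themselves. The sketch below does not build a full histogram; it uses a simpler stand-in, cutting the sorted latencies at the largest gap between neighbours. `storeLatency` and `splitByLargestGap` are hypothetical names, and the latencies in `main` are made-up sample values.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// storeLatency pairs a store name with its measured round-trip latency.
type storeLatency struct {
	Name    string
	Latency time.Duration
}

// splitByLargestGap sorts stores by latency and cuts the list at the
// largest gap between neighbours: everything before the cut is treated
// as local, everything after as remote.
func splitByLargestGap(stores []storeLatency) (local, remote []storeLatency) {
	sorted := append([]storeLatency(nil), stores...)
	if len(sorted) < 2 {
		return sorted, nil
	}
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Latency < sorted[j].Latency })
	cut, maxGap := 1, time.Duration(-1)
	for i := 1; i < len(sorted); i++ {
		if gap := sorted[i].Latency - sorted[i-1].Latency; gap > maxGap {
			cut, maxGap = i, gap
		}
	}
	return sorted[:cut], sorted[cut:]
}

func main() {
	// Illustrative latencies only; real values would come from probing the stores.
	stores := []storeLatency{
		{"si-0", 2 * time.Millisecond},
		{"si-1", 70 * time.Millisecond},
		{"si-2", 1 * time.Millisecond},
		{"si-3", 60 * time.Millisecond},
	}
	local, remote := splitByLargestGap(stores)
	fmt.Println("local:", local)
	fmt.Println("remote:", remote)
}
```

A known weakness of this heuristic is that it always produces a split, even when all stores are in the same region, so a minimum-gap check would still be needed in practice.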
threadcount=4
fieldlength=160 #intended 500; due to the encoding, 160 length would yield about 500 payload
recordcount=50000
operationcount=50000
workload=core
***************** properties *****************
"insertproportion"="0"
"fieldlength"="160 #intended 500; due to the encoding, 160 length would yield about 500 payload"
"scanproportion"="0"
"threadcount"="4"
"recordcount"="50000"
"readallfields"="true"
"dotransactions"="true"
"updateproportion"="0.5"
"requestdistribution"="uniform"
"readproportion"="0.5"
"operationcount"="50000"
"workload"="core"
**********************************************
Add a component called the replication manager, which is in charge of replicating writes to replicas while maintaining consistency.
threadcount=4
fieldlength=160 #intended 500; due to the encoding, 160 length would yield about 500 payload
recordcount=30000000
operationcount=30000000
workload=core
***************** properties *****************
"requestdistribution"="uniform"
"recordcount"="30000000"
"readproportion"="0.5"
"scanproportion"="0"
"workload"="core"
"insertproportion"="0"
"updateproportion"="0.5"
"fieldlength"="160 #intended 500; due to the encoding, 160 length would yield about 500 payload"
"operationcount"="30000000"
"dotransactions"="false"
"threadcount"="4"
"readallfields"="true"
**********************************************
Jaeger crashed due to running out of memory.
Run finished, takes 28h4m22.484828118s
INSERT - Takes(s): 101062.5, Count: 29999999, OPS: 296.8, Avg(us): 13438, Min(us): 4116, Max(us): 3053567, 99th(us): 22319, 99.9th(us): 29327, 99.99th(us): 37087
ubuntu@ip-172-31-2-182:~$ free -g
total used free shared buff/cache available
Mem: 31 11 18 0 1 19
Swap: 0 0 0
Spot-checked a few SI, all have the following:
ubuntu@ip-172-31-11-22:~$ free -g
total used free shared buff/cache available
Mem: 31 15 14 0 1 15
Swap: 0 0 0
***************** properties *****************
"recordcount"="300000"
"threadcount"="4"
"operationcount"="300000"
"readallfields"="true"
"insertproportion"="0"
"scanproportion"="0"
"requestdistribution"="uniform"
"dotransactions"="true"
"fieldlength"="160 #intended 500; due to the encoding, 160 length would yield about 500 payload"
"workload"="core"
"readproportion"="0.5"
"updateproportion"="0.5"
**********************************************
Run finished, takes 11m21.721168971s
READ - Takes(s): 681.7, Count: 149873, OPS: 219.8, Avg(us): 8072, Min(us): 1878, Max(us): 58815, 99th(us): 17807, 99.9th(us): 25311, 99.99th(us): 33727
UPDATE - Takes(s): 681.7, Count: 150127, OPS: 220.2, Avg(us): 10081, Min(us): 3522, Max(us): 81407, 99th(us): 20447, 99.9th(us): 28575, 99.99th(us): 36383
***************** properties *****************
"threadcount"="4"
"requestdistribution"="uniform"
"fieldlength"="160 #intended 500; due to the encoding, 160 length would yield about 500 payload"
"updateproportion"="0"
"insertproportion"="0"
"recordcount"="300000"
"dotransactions"="true"
"scanproportion"="0"
"operationcount"="300000"
"readallfields"="true"
"workload"="core"
"readproportion"="1"
**********************************************
READ - Takes(s): 10.0, Count: 8002, OPS: 800.4, Avg(us): 4989, Min(us): 1865, Max(us): 31775, 99th(us): 14679, 99.9th(us): 19391, 99.99th(us): 31727
READ - Takes(s): 20.0, Count: 15952, OPS: 797.7, Avg(us): 5006, Min(us): 1865, Max(us): 31775, 99th(us): 14687, 99.9th(us): 21887, 99.99th(us): 26927
READ - Takes(s): 30.0, Count: 24022, OPS: 800.8, Avg(us): 4987, Min(us): 1865, Max(us): 31775, 99th(us): 14607, 99.9th(us): 22383, 99.99th(us): 26927
key is user8077940190266422784 and error is Get "http://rkv:8090/kv?key=user8077940190266422784": dial tcp 52.42.125.43:8090: connect: cannot assign requested address
key is user6962341607726016868 and error is Get "http://rkv:8090/kv?key=user6962341607726016868": dial tcp 52.42.125.43:8090: connect: cannot assign requested address
key is user6337301133090096462 and error is Get "http://rkv:8090/kv?key=user6337301133090096462": dial tcp 52.42.125.43:8090: connect: cannot assign requested address
key is user6971927150098868116 and error is Get "http://rkv:8090/kv?key=user6971927150098868116": dial tcp 52.42.125.43:8090: connect: cannot assign requested address
key is user6266626724665952454 and error is Get "http://rkv:8090/kv?key=user6266626724665952454": dial tcp 52.42.125.43:8090: connect: cannot assign requested address
key is user6298913883621934819 and error is Get "http://rkv:8090/kv?key=user6298913883621934819": dial tcp 52.42.125.43:8090: connect: cannot assign requested address
key is user7551286215126588905 and error is Get "http://rkv:8090/kv?key=user7551286215126588905": dial tcp 52.42.125.43:8090: connect: cannot assign requested address
52.42.125.43:8090 is the RKV server
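The error "connect: cannot assign requested address" usually means the client host has run out of local ephemeral ports, which tends to happen when each request opens a new TCP connection instead of reusing one. Whether the go-ycsb rkv driver does this is an assumption; the sketch below only shows the general pattern of sharing one `http.Client` with a bounded connection pool and draining each response body so the connection can be reused. `newPooledClient` and `fetch` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// newPooledClient returns one client meant to be shared by all worker
// goroutines, with a bounded number of connections per host.
func newPooledClient(maxConnsPerHost int) *http.Client {
	tr := &http.Transport{
		MaxIdleConns:        maxConnsPerHost,
		MaxIdleConnsPerHost: maxConnsPerHost,
		MaxConnsPerHost:     maxConnsPerHost,
		IdleConnTimeout:     90 * time.Second,
	}
	return &http.Client{Transport: tr, Timeout: 10 * time.Second}
}

// fetch reads the whole body and closes it; an unread or unclosed body
// prevents the underlying connection from returning to the pool.
func fetch(c *http.Client, url string) (string, error) {
	resp, err := c.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	// Local test server standing in for the rkv endpoint.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	client := newPooledClient(4)
	for i := 0; i < 100; i++ {
		if _, err := fetch(client, srv.URL+"/kv?key=a"); err != nil {
			panic(err)
		}
	}
	fmt.Println("100 requests completed with a shared client")
}
```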
***************** properties *****************
"insertproportion"="0"
"readproportion"="0"
"requestdistribution"="uniform"
"operationcount"="300000"
"recordcount"="300000"
"dotransactions"="true"
"updateproportion"="1"
"scanproportion"="0"
"readallfields"="true"
"workload"="core"
"threadcount"="4"
"fieldlength"="160 #intended 500; due to the encoding, 160 length would yield about 500 payload"
**********************************************
UPDATE - Takes(s): 10.0, Count: 3030, OPS: 303.1, Avg(us): 13181, Min(us): 5500, Max(us): 32831, 99th(us): 22431, 99.9th(us): 27423, 99.99th(us): 32831
UPDATE - Takes(s): 20.0, Count: 6147, OPS: 307.4, Avg(us): 12997, Min(us): 5500, Max(us): 32831, 99th(us): 22399, 99.9th(us): 27055, 99.99th(us): 29855
UPDATE - Takes(s): 30.0, Count: 9277, OPS: 309.3, Avg(us): 12918, Min(us): 5500, Max(us): 32831, 99th(us): 22207, 99.9th(us): 27727, 99.99th(us): 32063
...
...
Run finished, takes 16m11.70180699s
UPDATE - Takes(s): 971.7, Count: 300000, OPS: 308.7, Avg(us): 12936, Min(us): 3868, Max(us): 1020415, 99th(us): 22831, 99.9th(us): 29535, 99.99th(us): 35679
This seems to be a regression; the prior commit does not have this issue.
Running curl -X POST http://127.0.0.1:8090/kv -d '{"key":"a","value":"3"}' gets the normal response: The key value pair (a,3) has been saved as revision 1 at 127.0.0.1:6379,172.31.9.140:6379,172.31.12.96:6380
Running curl http://127.0.0.1:8090/kv?key=a gets back the unexpected response: storage not found for 127.0.0.1:6379
Come up with a set of methods to evaluate the goals for the 630 release.
This test is to establish confidence and trim bugs before the 100M test.
threadcount=25
fieldlength=160 #intended 500; due to the encoding, 160 length would yield about 500 payload
recordcount=5000000
operationcount=5000000
nohup ./bin/go-ycsb load rkv -P workloads/workloada > load.log &
16t, 3+3+2+3
22t, 3+3+2+3
25t, 3+3+2+3
32t, 3+3+2+3
3h41m34.857277847s, also estimated as 40min/60*5=3.33 hr
2h59m57.94355228s
Got the latest version of the rkv source code (commit de9428e), then built and started the rkv server. It seemed fine; however, running curl http://127.0.0.1:8090/kv?key=a
gets back the unexpected response curl: (7) Failed to connect to 127.0.0.1 port 8090: Connection refused
The expected response is rev not found, since there is no such key yet.
This seems to be a regression. After reverting to the prior commit, the curl gets the response mvcc: Revision not found
When getting a result from rkv, the header indicates JSON, but the body is a plain text string.
After starting the rkv service, run the following command to create one {key, rev} pair:
curl -XPOST http://127.0.0.1:8090/kv -d '{"key":"k", "value":"234"}'
assuming the revision created is 1,
curl -v http://127.0.0.1:8090/kv?key=k\&rev=1
gets
* Trying 127.0.0.1:8090...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8090 (#0)
> GET /kv?key=k&rev=1 HTTP/1.1
> Host: 127.0.0.1:8090
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 202 Accepted
< Content-Type: application/json
< Date: Mon, 24 Oct 2022 19:05:14 GMT
< Content-Length: 37
<
The value is 234 with the revision 1
Expected: the body should be in JSON format.
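A minimal sketch of a handler helper whose body matches its `Content-Type: application/json` header is shown below. The field names in `kvResponse` and the helper names are assumptions for illustration, not the rkv wire format. The sketch also answers a successful GET with 200 OK rather than the 202 Accepted seen in the trace, which is a separate design choice.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// kvResponse is a hypothetical JSON shape for a successful lookup.
type kvResponse struct {
	Key      string `json:"key"`
	Value    string `json:"value"`
	Revision uint64 `json:"revision"`
}

// encodeKV marshals the lookup result into a JSON document.
func encodeKV(key, value string, rev uint64) ([]byte, error) {
	return json.Marshal(kvResponse{Key: key, Value: value, Revision: rev})
}

// writeKV sends the result so that header and body agree on the format.
func writeKV(w http.ResponseWriter, key, value string, rev uint64) {
	body, err := encodeKV(key, value, rev)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	_, _ = w.Write(body)
}

func main() {
	rec := httptest.NewRecorder()
	writeKV(rec, "k", "234", 1)
	fmt.Println(rec.Code, rec.Header().Get("Content-Type"))
	fmt.Println(rec.Body.String())
}
```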
Getting a non-existent revision of a valid key returns the value of the last-known revision.
Given that the rkv service is running at http://127.0.0.1:8090/kv, run the following command to create a k-v pair:
curl -XPOST http://127.0.0.1:8090/kv -d '{"key":"k", "value":"234"}'
assuming the revision created is 1, run
curl http://127.0.0.1:8090/kv?key=k\&rev=999
gets
The value is 234 with the revision 1
Expected: an error message indicating that the key-revision combination does not exist.
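The expected behaviour amounts to an exact-match lookup that fails rather than falling back to another revision. A small sketch follows, assuming a per-key history map; `keyHistory` and its `get` method are hypothetical, and the error text reuses the "mvcc: Revision not found" message that rkv is reported to return elsewhere in these notes.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrRevisionNotFound = errors.New("mvcc: Revision not found")

// keyHistory maps a revision number to the value stored at that revision.
type keyHistory map[uint64]string

// get returns the value stored at exactly rev, or an error when that
// revision does not exist; it never substitutes another revision.
func (h keyHistory) get(rev uint64) (string, error) {
	v, ok := h[rev]
	if !ok {
		return "", fmt.Errorf("revision %d: %w", rev, ErrRevisionNotFound)
	}
	return v, nil
}

func main() {
	h := keyHistory{1: "234"}
	v, err := h.get(1)
	fmt.Println(v, err)
	_, err = h.get(999)
	fmt.Println(err, errors.Is(err, ErrRevisionNotFound))
}
```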
After posting values for a specific key, querying its value returns an unexpected response: the number of nodes is 1, which means there is no replica
Expected: the latest value of the key.
Specific to this setup: only one (the local) redis is set up.
{
"ConsistentHash": "rendezvous",
"BucketSize": 10,
"ReplicaNum": 1,
"StoreType": "redis",
"Concurrent": true,
"Stores": [
{
"RegionType": "local",
"Name": "store1",
"Host": "127.0.0.1",
"Port": 16378
}
]
}
Run curl -X POST 127.0.0.1:8090/kv -d '{"key":"testk", "value":"testv"}'; notice the response is The key value pair (testk,testv) has been saved as revision 6 at 127.0.0.1:16378, which suggests success.
Then run curl 127.0.0.1:8090/kv?key=testk; the response is:
the number of nodes is 1, which means there is no replica
As mentioned in 79353ac#diff-a3d824da3c42420cd5cbb0a4a2c0e7b5bfddd819652788a0596d195dc6e31fa5R70
// returned items identifing backend stores by name, NOT by hostname:port - backend may be other than redis type
Redis backends use hostip:port, while others might use the name.
The commit does not check the backend store type before changing hostip:port (not hostname:port as the comment describes) to name, which causes the following exceptions when using redis backends.
I also checked config_test.go and found that all the test cases have been changed to test DummyLatency datastores.
When running go-ycsb with fieldlength=500, the total value length should be 5KB. However, in the redis backend the saved value has about 3 times more characters than 5KB (note that the saved value is 5000 words long).
The character count of this value is 17340.
Below is part of the value retrieved from redis:
map[field0:[87 78 107 72 121 106 114 113 120 67 82 85 107 74 89 106 105 76 70 102 81 121 100 69 104 70 115 117 116 89 67 76 79 70 117 97 80 80 118 71 71 112 113 113 109 72 78 79 73 70 78 121 112 77 100 90 112 69 112 66 88 67 75 76 79 73 82 98 75 100 110 74 65 121 100 76 77 109 72 85 122 83 122 75 73 83 107 112 111 72 119 119 107 119 98 101 81 73 116 85 90 118 87 78 77 113 66 107 66 103 111 113 80 85 102 79 114 111 109 68 118 67 110 72 86 69 74 104 98 99 119 112 75 103 75 103 111 106 104 104 116 120 85 88 109 117 71 120 112 98 86 117 70 78 85 84 110 84 89 116] field1:[87 103 118 70 76 99 107 89 108 105 107 85 120 117 71 97 106 86 71 75 99 98 80 118 108 107 119 103 116 101 68 79 78 81 105 69 122 85 65 71 72 78 67 87 86 107 81 90 108 111 65 84 117 88 82 116 111 117 73 102 66 97 122 116 67 107 103 121 86 98 105 112 67 73 73 69 99 87 90 86 122 113 99 77 112 77 86 105 79 114 115 118 117 101 104 118 115 109 120 69 74 117 73 121 70 65 78 87 120 80 117 100 109 118 100 70 120 85 87 69 114 74 114 117 89 86 116 106 117 107 74 100 76 109 89 75 106 89 114 107 114 100 106 112 72 109 97 108 66 105 102 85 120 90 69 111 77 70 87 84] field2:[83 109 99 105 113 85 100 79 102 122 122 104 111 88 70 74 120 119 113 72 76 75 122 78 78 114 80 69 88 73
...
67 78 114 105 68 122 115 69 122 88 86 98 81 88 90 99 98 118 87 76 113 97 120 84 86 103 76 113 90 67 103 105 87 97 112 78 101 77 98 115 84 88 90 81 84 119 101 106 115 65 98 104 99 107 70 114 102 104 112 73 105 113 81 83 65 102 67 73 77 69 65 103 88 120 121 117 77 119 72 99 98 69 71 103 74 112 106 75 108 86 120 67 98 99 98 79 106 79 101 84 105 78 78 97 73 122 65 85 89 122 112 100 101 121 69 120 116 66 65 119 114 98 87 112 120 78 69 121 76 100 104 115 80 65 101 97 80 122 82 105 84 105 77 99 69 111 88 102 89 111 69 97 84 83 79 78 78 80 116 75 97 73 122 72 67 76 87 115 120 104 76 80 102 106 106 118 109 85 75 70 89 85 90 119 80 90 74 78 111 117 89 122 79 104 110 86 102 103 69 68 121 80 81 122 122 99 101 99 108 103 67 70 81 105 69 109 66 84 89 114 85 81 88 71 108 102 114 106 119 120 113 113 104 110 106 90 83 104 114 84 97 70 101 82 101 105 84 87 99 86 121 81 82 67 110 87 86 76 115 106 84 65 72 107]]"
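The dump above has the same shape that Go's fmt package produces when a `map[string][]byte` is formatted with the default verb: every byte becomes a decimal number followed by a space, which costs roughly three to four characters per byte. Whether rkv or the go-ycsb rkv driver serializes values this way is an assumption; the sketch below only demonstrates the size effect and compares it with `encoding/json`, which base64-encodes `[]byte` values. The helper names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// sprintEncode formats the fields the way fmt's default verb does,
// e.g. map[field0:[87 78 107]].
func sprintEncode(fields map[string][]byte) string {
	return fmt.Sprint(fields)
}

// rawSize sums the payload bytes across all fields.
func rawSize(fields map[string][]byte) int {
	n := 0
	for _, v := range fields {
		n += len(v)
	}
	return n
}

func main() {
	// Ten fields of 500 bytes each, mirroring fieldlength=500.
	fields := make(map[string][]byte)
	for i := 0; i < 10; i++ {
		fields[fmt.Sprintf("field%d", i)] = bytes.Repeat([]byte("W"), 500)
	}
	asJSON, err := json.Marshal(fields)
	if err != nil {
		panic(err)
	}
	fmt.Println("raw payload bytes: ", rawSize(fields))
	fmt.Println("fmt.Sprint length: ", len(sprintEncode(fields)))
	fmt.Println("JSON (base64) size:", len(asJSON))
}
```

Storing the raw bytes, or a compact encoding such as JSON, would keep the stored size much closer to the intended payload.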
This bug was found when running the perf test with the 2 micro configuration.
The config.json file used in this case is
{
"ConsistentHash": "rendezvous",
"BucketSize": 10,
"ReplicaNum": 2,
"StoreType": "redis",
"Concurrent": true,
"Stores": [
{
"Name": "hwperf-0824-1-rkv-lab-si-0",
"Host": "54.219.184.67",
"Port": 6666
},
{
"Name": "hwperf-0824-1-rkv-lab-si-1",
"Host": "54.183.189.182",
"Port": 6666
},
{
"Name": "hwperf-0824-1-rkv-lab-si-2",
"Host": "35.89.67.43",
"Port": 6666
},
{
"Name": "hwperf-0824-1-rkv-lab-si-3",
"Host": "35.90.217.98",
"Port": 6666
}
]
}
rkv server log has error message:
...
2022/08/24 17:20:56 http: panic serving 35.90.155.56:35186: runtime error: invalid memory address or nil pointer dereference
goroutine 2522 [running]:
net/http.(*conn).serve.func1(0xc000089720)
/usr/local/go/src/net/http/server.go:1804 +0x153
panic(0x7655c0, 0xa011f0)
/usr/local/go/src/runtime/panic.go:971 +0x499
go.opentelemetry.io/otel/sdk/trace.(*recordingSpan).End(0xc000232180, 0x0, 0x0, 0x0)
/home/ubuntu/regionless-storage-service/vendor/go.opentelemetry.io/otel/sdk/trace/span.go:402 +0x345
panic(0x7655c0, 0xa011f0)
/usr/local/go/src/runtime/panic.go:965 +0x1b9
main.(*KeyValueHandler).createKV(0xc0000c6800, 0x83b590, 0xc00025b880, 0xc0001ec100, 0x0, 0x0, 0x0, 0x0)
/home/ubuntu/regionless-storage-service/cmd/http/main.go:221 +0x25d
main.(*KeyValueHandler).ServeHTTP(0xc0000c6800, 0x83b590, 0xc00025b880, 0xc0001ec100)
/home/ubuntu/regionless-storage-service/cmd/http/main.go:105 +0x3c5
net/http.(*ServeMux).ServeHTTP(0xa12100, 0x83b590, 0xc00025b880, 0xc0001ec100)
/usr/local/go/src/net/http/server.go:2428 +0x1ad
net/http.serverHandler.ServeHTTP(0xc00025a0e0, 0x83b590, 0xc00025b880, 0xc0001ec100)
/usr/local/go/src/net/http/server.go:2867 +0xa3
net/http.(*conn).serve(0xc000089720, 0x83bc00, 0xc000394b00)
/usr/local/go/src/net/http/server.go:1932 +0x8cd
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2993 +0x39b
...
jaeger trace shows 1 span with exception log:
cd scripts
./select_config.sh 2 micro
./setup_test_lab.sh si_def.json
Understand and evaluate the partition algorithm and variants of consistent hashing.
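The configs elsewhere in these notes set "ConsistentHash": "rendezvous", so rendezvous (highest-random-weight) hashing is one variant worth evaluating. In this sketch every store gets a score for each key and the top-scoring stores hold the key. It is an illustration under assumptions, not the rkv implementation: the FNV-1a scoring function and the names `score` and `pickNodes` are choices made here.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// score hashes the (node, key) pair; a zero byte separates the two parts
// so that ("ab", "c") and ("a", "bc") hash differently.
func score(node, key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(node))
	h.Write([]byte{0})
	h.Write([]byte(key))
	return h.Sum64()
}

// pickNodes returns the n nodes with the highest score for key,
// breaking ties by node name so the result is deterministic.
func pickNodes(nodes []string, key string, n int) []string {
	ranked := append([]string(nil), nodes...)
	sort.Slice(ranked, func(i, j int) bool {
		si, sj := score(ranked[i], key), score(ranked[j], key)
		if si != sj {
			return si > sj
		}
		return ranked[i] < ranked[j]
	})
	if n < 0 {
		n = 0
	}
	if n > len(ranked) {
		n = len(ranked)
	}
	return ranked[:n]
}

func main() {
	nodes := []string{"store1", "store2", "store3", "store4"}
	for _, key := range []string{"user1", "user2", "user3"} {
		fmt.Println(key, "->", pickNodes(nodes, key, 2))
	}
}
```

Because each score depends only on one node and the key, removing a node moves only the keys that node held, and the runner-up for a key becomes its natural replica or failover target.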
Started the perf test using scripts/setup_test_lab.sh, with the following settings:
rkv vm type: t2.large
redis SI vm type: t2.medium
At the ycsb host, targeting recordcount 10M with thread count 100, ran
./bin/go-ycsb load rkv -P workloads/workloada #workloada is 50% update + 50% read, no deletes
While observing the redis key counts of all SI backends, noticed that they do not increase monotonically but sometimes drop significantly, as below:
54.185.154.69:6379> dbsize
(integer) 253084
54.185.154.69:6379> dbsize
(integer) 515344
54.185.154.69:6379> dbsize
(integer) 516363
54.185.154.69:6379> dbsize
(integer) 518244
54.185.154.69:6379> dbsize
(integer) 518749
54.185.154.69:6379> dbsize
(integer) 525494
54.185.154.69:6379> dbsize
(integer) 606304
54.185.154.69:6379> dbsize
(integer) 3266
54.185.154.69:6379> dbsize
(integer) 90874
54.185.154.69:6379> dbsize
(integer) 91144
54.185.154.69:6379> dbsize
(integer) 91346
When inserting key-value pairs into rkv, revision 2 is skipped in the sequence.
Start the rkv server,
then run the following client commands to insert 3 k-v pairs:
$ curl -X POST 127.0.0.1:8090/kv -d '{"key":"testk", "value":"testv1"}'
The key value pair (testk,testv1) has been saved as revision 1 at 127.0.0.1:16378,127.0.0.1:16378,127.0.0.1:16378
$ curl -X POST 127.0.0.1:8090/kv -d '{"key":"testk", "value":"testv2"}'
The key value pair (testk,testv2) has been saved as revision 3 at 127.0.0.1:16378,127.0.0.1:16378,127.0.0.1:16378
$ curl -X POST 127.0.0.1:8090/kv -d '{"key":"testk", "value":"testv3"}'
The key value pair (testk,testv3) has been saved as revision 4 at 127.0.0.1:16378,127.0.0.1:16378,127.0.0.1:16378
Notice that the revisions returned are 1, 3, 4.
Start the rkv test lab and run workloada (50% update, 50% read) in 2 steps.
Run finished, takes 2.390314559s
READ_ERROR - Takes(s): 2.4, Count: 495, OPS: 207.7, Avg(us): 881, Min(us): 443, Max(us): 8463, 99th(us): 2251, 99.9th(us): 8463, 99.99th(us): 8463
UPDATE - Takes(s): 2.4, Count: 505, OPS: 211.7, Avg(us): 3846, Min(us): 3056, Max(us): 9935, 99th(us): 5827, 99.9th(us): 8551, 99.99th(us): 9935
Looking at the jaeger tracing, found the error mvcc: Revision not found; see the screenshot below: