
bedrock's Introduction

Expensify/Bedrock/

This public repo contains the Bedrock database server. For more information on what that is, please see http://bedrockdb.com. The directories in this repo include:

  • / - Contains the main Bedrock source
  • /docs - Source for the public website (hosted via GitHub Pages): http://bedrockdb.com
  • /libstuff - A general purpose C++ framework for cross-platform application development
  • /mbedtls - The mbed TLS library, from https://tls.mbed.org/
  • /plugins - The various plugins provided "out of the box" to Bedrock
  • /sqlitecluster - The distributed transaction framework built atop sqlite: http://sqlite.org

bedrock's People

Contributors

aldo-expensify, bondydaa, brandonhub, cead22, chiragsalian, coleaeason, ctkochan22, danieldoglas, deetergp, expensifyrandomdrake, flodnv, fnwbr, fukawi2, iwiznia, joelbettner, johnmlee101, madmax330, mariahcd, mattabullock, mea36, monilbhavsar, neil-marcellini, pecanoro, quinthar, rafecolton, robertjchen, shawnborton, tgolen, tylerkaraszewski, vadius


bedrock's Issues

Plugin examples

What would really help are a couple of examples of how to write (and compile and load) a plugin that defines a SQLite stored procedure or function.

If you currently use a couple of MySQL "CREATE FUNCTION ..." procedures, a "simple database conversion" won't get you there.
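For what it's worth, here is a minimal sketch of registering a custom scalar SQL function against a raw sqlite3* handle, which is roughly what such a plugin would need to do. Where a Bedrock plugin would obtain that handle is an assumption here; this only shows the plain SQLite C API side and is not taken from Bedrock's actual plugin interface:

#include <sqlite3.h>
#include <algorithm>
#include <string>

// Hypothetical example: register a custom scalar function "reverse(text)".
static void reverseFunc(sqlite3_context* ctx, int argc, sqlite3_value** argv) {
    (void)argc;  // nArg=1 below guarantees exactly one argument
    const unsigned char* text = sqlite3_value_text(argv[0]);
    if (!text) {
        sqlite3_result_null(ctx);
        return;
    }
    std::string s(reinterpret_cast<const char*>(text));
    std::reverse(s.begin(), s.end());
    sqlite3_result_text(ctx, s.c_str(), -1, SQLITE_TRANSIENT);
}

void registerCustomFunctions(sqlite3* db) {
    // SQLITE_DETERMINISTIC tells SQLite the result depends only on the arguments.
    sqlite3_create_function(db, "reverse", 1, SQLITE_UTF8 | SQLITE_DETERMINISTIC,
                            nullptr, reverseFunc, nullptr, nullptr);
}

After registration the function is usable in ordinary SQL, e.g. SELECT reverse(name) FROM mytable; -- which is the closest SQLite gets to a MySQL CREATE FUNCTION.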

cannot get cluster to start, unauthenticated node(s)

I have a 2-node cluster I am trying to get up and running, and I am seeing errors in the logs. The nodes are definitely talking to each other, but the cluster will not start. The logs refer to unauthenticated peers, and I see nothing in the docs about this.

I added an OTHERNODES line to the /etc/init.d/bedrock startup script on each node so they point to each other, altered the PRIORITY, and changed the main process's port to 18888 to avoid collisions. Otherwise, no other changes on Ubuntu 14.04 Server.

Here is an excerpt from the logs:

Jan  2 17:19:47 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:2202) _onConnect [write0] [info] {bedrock2/SEARCHING} ->{10.211.1.19} Sending LOGIN
Jan  2 17:19:47 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:724) update [write0] [info] {bedrock2/SEARCHING} Signed in to 0 of 1 full peers (1 with permaslaves), timeout in 271825ms
Jan  2 17:19:47 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:724) update [write0] [info] {bedrock2/SEARCHING} Signed in to 0 of 1 full peers (1 with permaslaves), timeout in 271824ms
Jan  2 17:19:47 ubusdm16 bedrock: xxxxx (STCPNode.cpp:198) postSelect [write0] [hmmm] {bedrock2} ->{10.211.1.19} Lost peer connection after 1ms, reconnecting in 1923ms
Jan  2 17:19:47 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:724) update [write0] [info] {bedrock2/SEARCHING} Signed in to 0 of 1 full peers (1 with permaslaves), timeout in 271824ms
Jan  2 17:19:48 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:724) update [write0] [info] {bedrock2/SEARCHING} Signed in to 0 of 1 full peers (1 with permaslaves), timeout in 270824ms
Jan  2 17:19:49 ubusdm16 bedrock: xxxxx (STCPNode.cpp:218) postSelect [write0] [info] {bedrock2} ->{10.211.1.19} Retrying the connection
Jan  2 17:19:49 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:2202) _onConnect [write0] [info] {bedrock2/SEARCHING} ->{10.211.1.19} Sending LOGIN
Jan  2 17:19:49 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:724) update [write0] [info] {bedrock2/SEARCHING} Signed in to 0 of 1 full peers (1 with permaslaves), timeout in 269901ms
Jan  2 17:19:49 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:724) update [write0] [info] {bedrock2/SEARCHING} Signed in to 0 of 1 full peers (1 with permaslaves), timeout in 269899ms
Jan  2 17:19:49 ubusdm16 bedrock: xxxxx (STCPNode.cpp:198) postSelect [write0] [hmmm] {bedrock2} ->{10.211.1.19} Lost peer connection after 1ms, reconnecting in 4817ms
Jan  2 17:19:49 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:724) update [write0] [info] {bedrock2/SEARCHING} Signed in to 0 of 1 full peers (1 with permaslaves), timeout in 269899ms
Jan  2 17:19:50 ubusdm16 bedrock: xxxxx (STCPNode.cpp:105) postSelect [write0] [warn] {bedrock2} Unauthenticated node 'bedrock1' attempted to connected, rejecting.

Here are the process execution options:

784 ?        Ssl    0:00 /usr/sbin/bedrock -fork -nodeName bedrock2 -db /var/lib/bedrock/bedrock.db -serverHost 0.0.0.0:18888 -nodeHost 0.0.0.0:8889 -priority 200 -pidfile /var/run/bedrock.pid -quorumCheckpoint 100 -readThreads 4 -plugins status,db,jobs,cache,mysql -peerList 10.211.1.19:8889

bedrock : Depends: libstdc++6 (> 7.1) but 5.4.0-6ubuntu1~16.04.9 is to be installed

Hi,

I'm trying to install bedrock on an Ubuntu 16.04 image on Google Computing Engine.
Here is the error I'm getting:

Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
 bedrock : Depends: libstdc++6 (> 7.1) but 5.4.0-6ubuntu1~16.04.9 is to be installed
E: Unable to correct problems, you have held broken packages.

It's a bare image, nothing has been installed on it except docker.

I have successfully installed bedrock (and ran it) on a VM for another client where I installed Ubuntu 16.04 from the official ISO in January.

Am I missing something?

Database checkpoints

So I managed to insert a few rows into a table using the MySQL interface, but when I look at the SQLite files, the *-shm and *-wal files are being updated while the main .db file never is, no matter how long I wait. That doesn't seem good:

total 136
-rw-r--r-- 1 ubuntu ubuntu 26624 Jun 22 20:07 bedrock.db
-rw-r--r-- 1 ubuntu ubuntu 32768 Jun 22 21:57 bedrock.db-shm
-rw-r--r-- 1 ubuntu ubuntu 43000 Jun 22 21:57 bedrock.db-wal
-rw-r--r-- 1 ubuntu ubuntu 26624 Jun 22 20:05 bells.db
-rwxrwxr-x 1 ubuntu ubuntu  1629 Jun 22 20:05 sqliteinit

I tried doing the magic sqlite thing:

$res = $db->query("PRAGMA wal_checkpoint(PASSIVE);");

but Bedrock wasn't at all happy about that:

Jun 22 22:07:22 : xxxxx (SPerformanceTimer.cpp:72) log [sync] [info] [performance] 10001313us elapsed, 4us in Commit Lock, 10001309us other. 0.00% usage.
Jun 22 22:07:25 : xxxxx (STCPServer.cpp:52) acceptSocket [main] [dbug] Accepting socket from '127.0.0.1:56344' on port 'localhost:3306'
Jun 22 22:07:25 : xxxxx (BedrockServer.cpp:860) postPoll [main] [info] Plugin 'MySQL' accepted a socket from '127.0.0.1:56344'
Jun 22 22:07:25 : xxxxx (MySQL.cpp:224) onPortAccept [main] [info] {MySQL} Accepted MySQL request from '127.0.0.1:56344'
Jun 22 22:07:25 : xxxxx (MySQL.cpp:234) onPortRecv [main] [dbug] {MySQL} Received command #-115: '390000018DA20B00000000C00800000000000000000000000000000000000000000000000000006D7973716C5F6E61746976655F70617373776F726400'
Jun 22 22:07:25 : xxxxx (MySQL.cpp:313) onPortRecv [main] [info] {MySQL} Sending OK
Jun 22 22:07:25 : xxxxx (MySQL.cpp:234) onPortRecv [main] [dbug] {MySQL} Received command #3: '2000000003505241474D412077616C5F636865636B706F696E742850415353495645293B'
Jun 22 22:07:25 : xxxxx (MySQL.cpp:244) onPortRecv [main] [info] {MySQL} Processing query 'PRAGMA wal_checkpoint(PASSIVE);'
Jun 22 22:07:25 : xxxxx (BedrockServer.cpp:932) postPoll [main] [info] Waiting for 'Query' to complete.
Jun 22 22:07:25 : xxxxx (BedrockServer.cpp:418) worker [worker0] [info] [performance] Dequeued command Query in worker, 0 commands in queue.
Jun 22 22:07:25 : xxxxx (BedrockCore.cpp:16) peekCommand [worker0] [dbug] Peeking at 'Query'
Jun 22 22:07:25 : xxxxx (DB.cpp:71) peekCommand [worker0] [info] {af871f6548d7:DB} Query appears to be read/write, queuing for processing.
Jun 22 22:07:25 : xxxxx (BedrockCore.cpp:36) peekCommand [worker0] [info] Command 'Query' is not peekable, queuing for processing.
Jun 22 22:07:25 : xxxxx (BedrockServer.cpp:535) worker [worker0] [info] [performance] Sending non-parallel command Query to sync thread. Sync thread has 0 queued commands.
Jun 22 22:07:25 : xxxxx (BedrockServer.cpp:283) sync [sync] [info] [performance] Sync thread dequeued command Query. Sync thread has 0 queued commands.
Jun 22 22:07:25 : xxxxx (BedrockCore.cpp:16) peekCommand [sync] [dbug] Peeking at 'Query'
Jun 22 22:07:25 : xxxxx (DB.cpp:71) peekCommand [sync] [info] {af871f6548d7:DB} Query appears to be read/write, queuing for processing.
Jun 22 22:07:25 : xxxxx (BedrockCore.cpp:36) peekCommand [sync] [info] Command 'Query' is not peekable, queuing for processing.
Jun 22 22:07:25 : xxxxx (BedrockCore.cpp:79) processCommand [sync] [dbug] Processing 'Query'
Jun 22 22:07:25 : xxxxx (SQLite.cpp:199) beginConcurrentTransaction [sync] [dbug] [concurrent] Beginning transaction
Jun 22 22:07:25 : xxxxx (libstuff.cpp:2150) SQuery [sync] [dbug] BEGIN CONCURRENT
Jun 22 22:07:25 : xxxxx (libstuff.cpp:2150) SQuery [sync] [dbug] PRAGMA schema_version;
Jun 22 22:07:25 : xxxxx (libstuff.cpp:2150) SQuery [sync] [dbug] PRAGMA wal_checkpoint(PASSIVE);;
Jun 22 22:07:25 : xxxxx (SQLite.cpp:145) _sqliteLogCallback [sync] [info] {SQLITE} Code: 6, Message: statement aborts at 1: [PRAGMA wal_checkpoint(PASSIVE);] database table is locked
Jun 22 22:07:25 : xxxxx (libstuff.cpp:2183) SQuery [sync] [warn] 'read/write transaction', query failed with error #6 (database table is locked): PRAGMA wal_checkpoint(PASSIVE);;
Jun 22 22:07:25 : xxxxx (DB.cpp:122) processCommand [sync] [alrt] {af871f6548d7:DB} Query failed: 'PRAGMA wal_checkpoint(PASSIVE);;'
Jun 22 22:07:25 : xxxxx (SQLite.cpp:437) rollback [sync] [info] Rolling back transaction: 
Jun 22 22:07:25 : xxxxx (libstuff.cpp:2150) SQuery [sync] [dbug] ROLLBACK
Jun 22 22:07:25 : xxxxx (SQLite.cpp:446) rollback [sync] [info] Rollback successful.
Jun 22 22:07:25 : xxxxx (BedrockCore.cpp:162) _handleCommandException [sync] [alrt] Error processing command 'Query' (502 Query failed), ignoring: Query^M commandExecuteTime: 1498169245484941^M Connection: ^M format: json^M plugin: MySQL^M query: PRAGMA wa
Jun 22 22:07:25 : xxxxx (BedrockServer.cpp:1008) _reply [sync] [info] Plugin 'MySQL' handling response '502 Query failed' to request 'Query'
Jun 22 22:07:25 : xxxxx (MySQL.cpp:234) onPortRecv [main] [dbug] {MySQL} Received command #1: '0100000001'
Jun 22 22:07:25 : xxxxx (MySQL.cpp:313) onPortRecv [main] [info] {MySQL} Sending OK
Jun 22 22:07:25 : xxxxx (libstuff.cpp:1697) S_recvappend [main] [warn] recv(0.0.0.0:0) failed with response 'Connection reset by peer' (#104), closing.
Jun 22 22:07:25 : xxxxx (STCPManager.cpp:136) postPoll [main] [dbug] Connection to '127.0.0.1:56344' died (recv=0, send=1)
Jun 22 22:07:25 : xxxxx (STCPManager.cpp:207) closeSocket [main] [dbug] Closing socket '127.0.0.1:56344'
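For reference, the checkpoint that PRAGMA wal_checkpoint(PASSIVE) requests can also be driven through SQLite's C API. The standalone sketch below is illustrative only and is not something Bedrock exposes; it just clarifies the mechanism being discussed:

#include <sqlite3.h>
#include <cstdio>

// Illustration of what PRAGMA wal_checkpoint(PASSIVE) asks the library to do,
// driven through the C API instead of a query.
int main() {
    sqlite3* db = nullptr;
    if (sqlite3_open("bedrock.db", &db) != SQLITE_OK) {
        return 1;
    }
    int walFrames = 0;
    int checkpointedFrames = 0;
    int rc = sqlite3_wal_checkpoint_v2(db, "main", SQLITE_CHECKPOINT_PASSIVE,
                                       &walFrames, &checkpointedFrames);
    printf("rc=%d, wal frames=%d, checkpointed=%d\n", rc, walFrames, checkpointedFrames);
    sqlite3_close(db);
    return 0;
}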

Add error message for non-existing options

The test:

sh@sh-ubuntu:~$ bedrock --wth
sh@sh-ubuntu:~$ 

sh@sh-ubuntu:~$ bedrock --version
sh@sh-ubuntu:~$ 

The way it should act: print "Error: unknown option --wth, please use -help to list available options".
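A rough sketch of the requested behavior: reject any flag that isn't recognized. The option list below is partial and illustrative only; the real option list and parsing live elsewhere in Bedrock and are not reproduced here:

#include <cstdio>
#include <set>
#include <string>

int main(int argc, char** argv) {
    // Illustrative subset of known options (assumption, not the full set).
    const std::set<std::string> knownOptions = {
        "-nodeName", "-db", "-serverHost", "-nodeHost", "-priority",
        "-version", "-help"};
    for (int i = 1; i < argc; ++i) {
        if (argv[i][0] == '-' && !knownOptions.count(argv[i])) {
            fprintf(stderr,
                    "Error: unknown option %s, please use -help to list available options\n",
                    argv[i]);
            return 1;
        }
    }
    return 0;
}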

Replication / failover details?

I saw in the docs that this project uses Paxos under the hood for leader election, but I was having a bit of trouble finding where in the codebase this is implemented. Could someone point me to the Paxos implementation?

Timeout param not working properly

I don't know if this is something particular to the jobs plugin or something that affects all of Bedrock.
The timeout param is not working. Tested this by:

  • Clearing the jobs table.
  • Starting a request to GetJob with a timeout of 60 seconds and connection wait.
  • Creating a new job to be processed now.
  • The job is not returned; we always get a timeout.
  • Making the same GetJob query again; the job is returned.

/cc @tylerkaraszewski @quinthar

MariaDB 10.0 Client (Ubuntu 16.04) Crashes Bedrock

Bedrock version: 4323470

Using the MariaDB 10.0 client on Ubuntu 16.04 LTS seems to crash Bedrock.

Test case:

root@ubuntu6:~# service bedrock start
root@ubuntu6:~# ps aux|grep bedrock
root      5531  0.0  0.3 400332  7912 ?        Ssl  04:34   0:00 /usr/sbin/bedrock -fork -nodeName bedrock -db /var/lib/bedrock/bedrock.db -serverHost 0.0.0.0:8888 -nodeHost 0.0.0.0:8889 -priority 200 -pidfile /var/run/bedrock.pid -quorumCheckpoint 100 -readThreads 4 -plugins status,db,jobs,cache,mysql -v -cache 10001
root      5538  0.0  0.0  14224  1020 pts/0    S+   04:34   0:00 grep --color=auto bedrock
root@ubuntu6:~# bedrock -version
43234709c216b19ae7ce7ea7a5219467bf50e773
root@ubuntu6:~# mysql --version
mysql  Ver 15.1 Distrib 10.0.28-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
root@ubuntu6:~# mysql -h127.0.0.1 -e "SELECT 1 AS foo, 2 AS bar;"
ERROR 2006 (HY000) at line 1: MySQL server has gone away
root@ubuntu6:~#

Unfortunately, there doesn't seem to be anything logged:

root@ubuntu6:~# cat /var/log/syslog
Dec 23 04:42:14 ubuntu6 bedrock: xxxxx (STCPServer.cpp:57) operator() [main] [dbug] Accepting socket from '127.0.0.1:58732' on port 'localhost:3306'
Dec 23 04:42:14 ubuntu6 bedrock: xxxxx (BedrockServer.cpp:473) postSelect [main] [info] Plugin 'MySQL' accepted a socket from '127.0.0.1:58732'
Dec 23 04:42:14 ubuntu6 bedrock: xxxxx (MySQL.cpp:252) onPortAccept [main] [info] {MySQL} Accepted MySQL request from '127.0.0.1:58732'
Dec 23 04:42:14 ubuntu6 bedrock: xxxxx (MySQL.cpp:263) onPortRecv [main] [dbug] {MySQL} Received command #-123: '0B00000185A0000000726F6F740000'
Dec 23 04:42:14 ubuntu6 bedrock: xxxxx (MySQL.cpp:338) onPortRecv [main] [info] {MySQL} Sending OK
Dec 23 04:42:14 ubuntu6 bedrock: xxxxx (MySQL.cpp:263) onPortRecv [main] [dbug] {MySQL} Received command #3: '210000000373656C65637420404076657273696F6E5F636F6D6D656E74206C696D69742031'
Dec 23 04:42:14 ubuntu6 bedrock: xxxxx (MySQL.cpp:269) onPortRecv [main] [info] {MySQL} Processing query 'select @@version_comment limit 1'
root@ubuntu6:~# ps aux|grep bedrock
root      5735  0.0  0.0  14224  1004 pts/0    S+   04:42   0:00 grep --color=auto bedrock
root@ubuntu6:~# 

Cannot start bedrock server

I am running on CentOS 7, so I had to compile from source. The compile finished with no errors, but when I try to start the Bedrock server (both as a regular user and as root) I get the same error:

Nov 21 15:53:27 pi-node2 bedrock: xxxxx (BedrockServer.cpp:1076) postPoll [main] [info] Ready to process commands, opening command port on 'localhost:8888'
Nov 21 15:53:27 pi-node2 bedrock: xxxxx (libstuff.cpp:89) SException [main] [info] Throwing exception with message: 'couldn't bind' from libstuff/libstuff.cpp:1551
Nov 21 15:53:27 pi-node2 bedrock: xxxxx (libstuff.cpp:1584) S_socket [main] [warn] Failed to open TCP port 'localhost:8888': couldn't bind(errno=98 'Address already in use')
Nov 21 15:53:27 pi-node2 bedrock: xxxxx (STCPServer.cpp:21) openPort [main] [eror] Assertion failed: (port.s >= 0) != true
Nov 21 15:53:27 pi-node2 bedrock: xxxxx (SLog.cpp:15) SLogStackTrace [main] [warn] ./bedrock(_Z14SLogStackTracev+0x24) [0x56f8ab]
Nov 21 15:53:27 pi-node2 bedrock: xxxxx (SLog.cpp:15) SLogStackTrace [main] [warn] ./bedrock(_ZN10STCPServer8openPortERKSs+0x42f) [0x57008b]
Nov 21 15:53:27 pi-node2 bedrock: xxxxx (SLog.cpp:15) SLogStackTrace [main] [warn] ./bedrock(_ZN13BedrockServer8postPollERSt3mapIi6pollfdSt4lessIiESaISt4pairIKiS1_EEERm+0x9f7) [0x4bbd61]
Nov 21 15:53:27 pi-node2 bedrock: xxxxx (SLog.cpp:15) SLogStackTrace [main] [warn] ./bedrock(main+0x2b19) [0x4a7542]
Nov 21 15:53:27 pi-node2 bedrock: xxxxx (SLog.cpp:15) SLogStackTrace [main] [warn] /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f3be3ff9c05]
Nov 21 15:53:27 pi-node2 bedrock: xxxxx (SLog.cpp:15) SLogStackTrace [main] [warn] ./bedrock() [0x4a1865]

Any pointers or help?

Properly handle UNIX signals

Bedrock currently ignores the prescribed behavior of most of the signals defined
in:

  • POSIX.1-1990
  • POSIX.1-2001
  • 4.2BSD
  • 4.4BSD
  • SVr4
  • Sys V

There are five basic actions defined for signal handling as per the POSIX
programmer's manual:

  T     Abnormal termination of the process.

  A     Abnormal termination of the process with additional actions.

  I     Ignore the signal.

  S     Stop the process.

  C     Continue the process, if it is stopped; otherwise, ignore the signal.

  The effects on the process in each case are described in the System Interfaces
   volume of POSIX.1-2008, Section 2.4.3, Signal Actions.

Additionally, the signals defined as per the POSIX programmer's manual are as
follows:

The ISO C standard only requires the signal names SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, and SIGTERM to be defined.

The following signals shall be supported on all implementations (default actions are explained below the table):

Signal      Default Action   Description
SIGABRT     A                Process abort signal.
SIGALRM     T                Alarm clock.
SIGBUS      A                Access to an undefined portion of a memory object.
SIGCHLD     I                Child process terminated, stopped, or continued.
SIGCONT     C                Continue executing, if stopped.
SIGFPE      A                Erroneous arithmetic operation.
SIGHUP      T                Hangup.
SIGILL      A                Illegal instruction.
SIGINT      T                Terminal interrupt signal.
SIGKILL     T                Kill (cannot be caught or ignored).
SIGPIPE     T                Write on a pipe with no one to read it.
SIGQUIT     A                Terminal quit signal.
SIGSEGV     A                Invalid memory reference.
SIGSTOP     S                Stop executing (cannot be caught or ignored).
SIGTERM     T                Termination signal.
SIGTSTP     S                Terminal stop signal.
SIGTTIN     S                Background process attempting read.
SIGTTOU     S                Background process attempting write.
SIGUSR1     T                User-defined signal 1.
SIGUSR2     T                User-defined signal 2.
SIGPOLL     T                Pollable event.
SIGPROF     T                Profiling timer expired.
SIGSYS      A                Bad system call.
SIGTRAP     A                Trace/breakpoint trap.
SIGURG      I                High bandwidth data is available at a socket.
SIGVTALRM   T                Virtual timer expired.
SIGXCPU     A                CPU time limit exceeded.
SIGXFSZ     A                File size limit exceeded.

Currently Bedrock's behavior is to only watch for the following signals:

  • SIGQUIT
  • SIGTTIN
  • SIGTTOU
  • SIGUSR2

In the event of any other signal it will exit as per BedrockServer.cpp#L495:

// For anything else, just shutdown -- but only if we're not already shutting down

This means that if the system is attempting to notify the host of a change in
window size (SIGWINCH), Bedrock will exit. While it is the prerogative of
Bedrock to decide how to respond to signals, it seems naive to quit on signals
for which the Linux man pages (man 7 signal) prescribe a default behavior of
"ignore" (as is the case with SIGCONT).

In testing, I confirmed that Bedrock exited in all of the following cases:

(BedrockServer.cpp:506) postSelect [main] [info] Beginning graceful shutdown due
to 'SIGCONT, (unknown#50)', closing command port on 'localhost:8888'
(BedrockServer.cpp:506) postSelect [main] [info] Beginning graceful shutdown due
to 'SIGINT, (unknown#34)', closing command port on 'localhost:8888'
(BedrockServer.cpp:506) postSelect [main] [info] Beginning graceful shutdown due
to 'SIGURG, (unknown#55)', closing command port on 'localhost:8888'
(BedrockServer.cpp:506) postSelect [main] [info] Beginning graceful shutdown due
to 'SIGUSR1, (unknown#42)', closing command port on 'localhost:8888'
(BedrockServer.cpp:506) postSelect [main] [info] Beginning graceful shutdown due
to 'SIGWINCH, (unknown#60)', closing command port on 'localhost:8888'

This was discovered while running Bedrock concurrently with a piece of software
containing a memory leak which, during page swapping, issued SIGWINCH.

This issue also raises concerns about the ability to trigger a core dump for
analysis of a running system.

Encrypted Communication Between Nodes

Hello,

Unless I am mistaken, it doesn't seem that communication between nodes is encrypted with TLS/SSL.

Are there any plans to introduce it? This really is mandatory for an enterprise app operating over a WAN.

Is compilation on OSX supported?

First, thx for the outstanding project, I wish it would get more exposure!

On the HN thread here https://news.ycombinator.com/item?id=12739771 @quinthar says that most of the development happens on OS X.

[screenshot of the referenced HN comment, 2017-03-14]

After some fiddling around (gcc instead of gcc-6 and g++ instead of g++-6 for my system) the build breaks here:

$ make all
g++ -O0 -g -std=gnu++11 -DSVERSION="\"954c75063b232bd3d39f56c81e2acce75d380a69\"" -Wall -I/private/tmp/Bedrock -I/private/tmp/Bedrock/mbedtls/include -MMD -MF .build/libstuff/STCPNode.d  -o .build/libstuff/STCPNode.o -c libstuff/STCPNode.cpp
libstuff/STCPNode.cpp:167:45: error: no matching function for call to 'max'
                            peer->latency = max(STimeNow() - message.calc64("Timestamp"), 1ul);
                                            ^~~

It would be great to have some instructions on how to build Bedrock directly on OS X (if that is already the workflow used for current development).

[EDIT] I also had to init git submodules:

$ git submodule init
$ git submodule update

Bedrock doesn't allow concurrent schema changes

@quinthar It is currently not possible to create a table once the system has started:

$ nc localhost 8000
Query: create table foobar ( foo int, bar int );

502 Query failed
error: cannot modify database schema within CONCURRENT transaction

Bedrock creates a CONCURRENT transaction for every query, which doesn't allow schema changes. Replacing the CONCURRENT transaction with a normal one (changing line 85 in BedrockCore.cpp from if (!_db.beginConcurrentTransaction()) to if (!_db.beginTransaction())) fixes the issue. However, this change affects every transaction.
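A hypothetical middle ground is to fall back to a plain transaction only for statements that look like schema changes, keeping BEGIN CONCURRENT for everything else. The DDL check and the DB template parameter below are assumptions for illustration, not Bedrock code:

#include <algorithm>
#include <cctype>
#include <string>

// Crude DDL detection: upper-case the first word and compare.
static bool looksLikeDDL(std::string q) {
    q.erase(0, q.find_first_not_of(" \t\r\n"));
    std::string word = q.substr(0, q.find_first_of(" \t\r\n("));
    std::transform(word.begin(), word.end(), word.begin(), ::toupper);
    return word == "CREATE" || word == "ALTER" || word == "DROP";
}

template <typename DB>
bool beginAppropriateTransaction(DB& db, const std::string& query) {
    // DB is any type exposing the two methods named in the issue text.
    return looksLikeDDL(query) ? db.beginTransaction()
                               : db.beginConcurrentTransaction();
}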

Makefile mistakenly updated for GCC Paths

In commit 75b1f4e it seems that an attempt was made to utilize a different version of GCC than the default version for the developer's distro. This change goes against the documented update mechanism for Ubuntu (and other distributions) and creates a case where the software will not compile on some systems (as I found out).

The developer should make the change using update-alternatives[1] on Debian/Ubuntu-flavored systems, as the default version of the C/C++ compiler is actually a symbolic link managed by the alternatives mechanism. This allows the user to manage multiple concurrent versions of the compiler or even use third-party utilities in their place. This is why, in general, users do not call gcc but cc, as the use of alternatives allows users to use/test other toolchains. A more user-friendly read on how this works can be found here[2].

[1] http://manpages.ubuntu.com/manpages/precise/man8/update-alternatives.8.html
[2] https://codeyarns.com/2015/02/26/how-to-switch-gcc-version-using-update-alternatives/

Add `retryAfter` to Bedrock::Jobs::GetJob

Problem:

Most jobs are automatically processed, such as via BedrockWorkerManager. And automatically retrying a job that is automatically processed is generally a bad idea: not knowing anything else, whatever failed the first time will likely fail again. (And if it failed for a reason that will likely not happen again, then the worker should throw a RetryException... but that's a different story.)

However, manually processed jobs (eg, by loading a webpage that dequeues a job for processing) often fail for reasons that are non-repeatable, such as the person who loaded the webpage to process that job just closing the browser and walking away. In that scenario, we want to detect an "abandoned" manual job and automatically retry it.

(Note: in this scenario the website should also expire the job, such that if the web user does come back and try to finish it, the website should refuse it. Otherwise we have two web users doing the same job, which ain't great.)

Solution:

There are likely several ways to do this, but here is my recommendation:

  1. Add a retryAfter column to jobs, default 0.

  2. Update CreateJob such that it records the new optional retryAfter parameter.

  3. Update GetJob such that the place that marks a job as RUNNING checks whether retryAfter is set (see the sketch after this list). If so:

    • set state=RUNQUEUED
      • We use a new state to signal that it is not just running, but queued to re-run in the future as well.
      • This requires updating everywhere that checks for state="RUNNING" to be (state="RUNNING" || state="RUNQUEUED")
    • set nextRun=SCURRENT_TIMESTAMP()+retryAfter, thereby ensuring it'll be re-run at the right time
  4. Update GetJob (in both peekCommand and processCommand) to replace state='QUEUED' with state IN ('QUEUED', 'RUNQUEUED')

  5. Update the repeat logic in FinishJob to set lastScheduled = nextRun-retryAfter;

  6. To be safe, update CreateJob to check, if parentJobID is set, that the parent doesn't have a retryAfter set. (If so, return 402 Auto-retrying parents cannot have children.) Let's not automatically retry parent jobs that create child jobs -- therein lies the path to madness.

    • Note: It's fine for an automatically processed parent to have auto-retrying manual children. However, it's not fine for an auto-retrying manual parent to have children.
  7. To be safe, update UpdateJob to return 402 Auto-retrying jobs cannot be updated once running if state='RUNQUEUED'. If a job is to be automatically retried, we can't have it modifying itself prior to the retry.

Performance considerations

I'm guessing this will have a negligible effect upon GetJob performance. However, query plan interpretation is an art that I haven't mastered. Here it is with the current query:

dbarrett@www1:~$ sudo readdb.sh "EXPLAIN QUERY PLAN
> SELECT jobID, name, data
> FROM jobs
> WHERE
>  state='QUEUED'
>  AND current_timestamp >= nextRun
>  AND name GLOB 'www-prod/*'
> ORDER BY priority DESC, nextRun ASC
> LIMIT 1;
> "
selectid    order       from        detail                                                                    
----------  ----------  ----------  --------------------------------------------------------------------------
0           0           0           SEARCH TABLE jobs USING INDEX jobsStateNextRunName (state=? AND nextRun<?)
0           0           0           USE TEMP B-TREE FOR ORDER BY                                              

And here it is with the proposed update:

dbarrett@www1:~$ sudo readdb.sh "EXPLAIN QUERY PLAN
> SELECT jobID, name, data
> FROM jobs
> WHERE
>  state IN ('QUEUED', 'RUNQUEUED')
>  AND current_timestamp >= nextRun
>  AND name GLOB 'www-prod/*'
> ORDER BY priority DESC, nextRun ASC
> LIMIT 1;
> "
selectid    order       from        detail                                                                    
----------  ----------  ----------  --------------------------------------------------------------------------
0           0           0           SEARCH TABLE jobs USING INDEX jobsStateNextRunName (state=? AND nextRun<?)
0           0           0           EXECUTE LIST SUBQUERY 1                                                   
0           0           0           USE TEMP B-TREE FOR ORDER BY                                              
dbarrett@www1:~$ 

Thoughts or concerns on this before I get started? @cead22 @iwiznia @flodnv @tylerkaraszewski @mcnamamj

Add multi-threaded replication

Problem:

Bedrock already leverages multiple CPUs for read capacity. However, it only has a single write thread. And while selective synchronization largely decouples write performance from network performance, the write thread is limited to the capabilities of a single CPU. (In theory it's also limited by the system's disk performance, but on a modern system it's basically impossible for a single CPU to saturate IO, so that's not the bottleneck.)

Now, some databases (eg, MySQL) do allow multi-threaded writes on the single database. However, it is still limited to single threaded replication -- meaning there is only a single connection between the master and the slave, and thus only a single CPU on the slave that is processing the replicated traffic. This means it's not possible to take full advantage of the master's many CPUs, lest it generate more commit volume than a single CPU on the slave can replicate.

Solution:

Add true multi-threaded writes and replication. Unfortunately, multi-threaded replication is very complicated. It requires multiple connections between the master and each slave (to communicate the replication in parallel instead of serializing over a single connection), and since this means commits will arrive on each slave in a different order, it must be possible to safely apply the commits in any order. Finding sets of non-conflicting transactions is not easy. Luckily, SQLite's session extension does essentially this. Accordingly, we can upgrade Bedrock to have true multi-threaded replication as follows:

Phase 1: Prove it'll work.

First we need to confirm that we actually have enough potential concurrency in our write transactions to make it worth it. This depends very heavily on the actual application load being put onto the database -- if every write just increments the same row, then clearly that's not going to be parallelizable. But if you have a very large database with multiple users making changes to very different parts of it, then there should be ample opportunities for parallel writes. And it doesn't actually take much: if you can do even two transactions at the same time, that doubles your write capacity. So, first let's see how much parallelism we can get.

  1. Compile with the SQLITE_ENABLE_SESSION flag

  2. Download https://www.sqlite.org/src/timeline?t=changebatch and make sure src/ext/session/sqlite3changebatch.h/cpp are included.

  3. Add a column batchID and changeset to the journal to keep track of which batch the commit is in, and the changeset for that commit.

  4. Define global variables like changebatch and batchID somewhere, with synchronized access. When you go MASTERING, initialize these by:

  5. Open a new database handle

  6. Call sqlite3changebatch_new() with that handle to initialize changebatch

  7. You don't need to hold onto that handle -- it's stored inside the changebatch

  8. Initialize batchID = (SELECT batchID FROM journal ORDER BY id DESC LIMIT 1)+1;
    * Note: Because we serialize commits AND update the batch while committing, that means the last row will always have the highest batchID. I can't think of any scenario in which two nodes might somehow get different batchIDs for the same commit, but let's think about this carefully.

  9. Clean these up when you stand down by:

  10. Call sqlite3changebatch_db() to get the database handle

  11. Call sqlite3changebatch_delete() to delete the changebatch

  12. Delete the database handle

  13. Call sqlite3session_create() inside SQLite::beginTransaction(), using the write thread's database handle.

  • @tylerkaraszewski - I think we only call these on master? I don't recall off the top of my head. If these are used on slaves, this plan needs to be adjusted.
  14. Inside SQLite::commit(), lock the global changebatch.

  15. Call sqlite3session_fullchangeset() inside SQLite::commit() to get a binary blob describing every change that happened during this transaction, called something like changeset.

  16. Attempt to add changeset to changebatch using sqlite3changebatch_add().

  17. If it returns SQLITE_OK, this means changeset does not conflict with any changeset already added to changebatch. In other words, changeset can be safely applied in parallel with other changesets already in changebatch, in any order.

  18. If it returns anything else, this means changeset does conflict with at least one of the existing changesets in changebatch. In this case, increment the batchID, call sqlite3changebatch_zero() to reset the batchset, and then call sqlite3changebatch_add() again to initialize with changeset (eg, seeding the new accumulated changebatch with this first transaction).

  19. Record the batchID in the journal table, along with changeset.

  20. Either way, call COMMIT as normal. (The master doesn't need to apply the changeset as the changes have already been applied to its database.)

  21. For now, we don't actually use the changeset itself -- call sqlite3changeset_delete(). (Later we'll send this to slaves.)

  22. Call sqlite3session_delete() as we are done with the session for this transaction.

  23. Unlock changebatch.

  • Note: SQLite already serializes commits on the same database, so us adding our own serialization on top of this doesn't really give up any concurrency. However, this way we ensure that all of the changebatch management code is atomically processed with the actual COMMIT on the master.
  24. Deploy this to production and monitor how the changeset creation affects overall performance, as well as determine the average batch size under real load.

  25. We can analyze the journal table to understand average batch sizes, or even figure out which transactions are most likely to conflict.
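For orientation, here is a minimal, self-contained sketch of capturing a changeset with the standard sqlite3session API referenced above (it requires building SQLite with SQLITE_ENABLE_SESSION and SQLITE_ENABLE_PREUPDATE_HOOK). The experimental sqlite3changebatch_* calls come from the branch linked in step 2 and are not shown:

#include <sqlite3.h>
#include <cstdio>

int main() {
    sqlite3* db = nullptr;
    sqlite3_open(":memory:", &db);
    sqlite3_exec(db, "CREATE TABLE t(id INTEGER PRIMARY KEY, val TEXT);",
                 nullptr, nullptr, nullptr);

    sqlite3_session* session = nullptr;
    sqlite3session_create(db, "main", &session);  // track changes on "main"
    sqlite3session_attach(session, nullptr);      // nullptr = attach to all tables

    sqlite3_exec(db, "INSERT INTO t VALUES (1, 'hello');", nullptr, nullptr, nullptr);

    int size = 0;
    void* changeset = nullptr;
    sqlite3session_changeset(session, &size, &changeset);  // binary blob of the changes
    printf("changeset is %d bytes\n", size);

    sqlite3_free(changeset);
    sqlite3session_delete(session);
    sqlite3_close(db);
    return 0;
}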

Phase 2: Implement binary replication

This is left largely TBD. But in general:

  1. Upgrade the replication protocol to take either the current raw SQL or a binary changeset. Let's support both simultaneously so when we roll it out for testing, we can enable it on master with a command-line switch -- but if everything breaks, we just kill the master and the cluster falls back to SQL replication.

  2. Add a command-line switch to the master to use binary replication.

  3. Confirm that binary replication has an acceptable performance impact.

Phase 3: Add multiple write threads

This is where the magic happens:

  1. Somehow make the slave write thread wait for the previous batch to finish before committing transactions in the new batch.

  2. Add multiple write threads to the master and all slaves (they must have the same number of write threads), and have each connect to the other nodes to form N parallel clusters. Just make the node port equal to nodePort + writeThreadID, such that every write thread is talking to the corresponding write thread in each of its peers.

  3. This means every write thread is going to have a different state -- it's conceivable that you might actually be SLAVING on one write thread while simultaneously MASTERING on another. (This would hopefully be a crazy, very temporary edge case.) Accordingly, wait for all write threads to agree on a state before doing anything that depends on the state (eg, don't open your command port until all write threads agree you are either SLAVING or MASTERING).

  4. The write threads themselves should "just work" as there is already an internal queue of messages, so they should just pull from the queue in parallel.

Conclusion

That's the basic idea. There are a lot of details to be resolved, so let's update this issue with the plan before we get started.

Tag v1.0 release

Can you please tag the exact version that corresponds to the v1.0 public release? I'd like to review the commit since then to discuss when to do a v1.1 and so on.

Bedrock should be packaged with a config file

Now that it's an official open-source project, we should make it follow expected practices about what the package comes with. A config file would be a good place to start. Currently any persistent custom configs have to be put in the init script, but we shouldn't expect people to hack on the init script; they should have a config file that sets these options. The easiest path to this is probably just a config file with variables that is sourced by the init script, but we could consider having the binary look at the config file instead.

The chickens and the eggs

So I was trying to set up a cluster that could "grow" over time -- I didn't know all the IP addresses in advance, so I would start 1 node, then the second would contact the 1st, and the third would contact the 1st and 2nd, etc.

Well, bedrock didn't like that at all: "Incoming connection failed from ... (unauthenticated node) ...". It seems to only want a static list of nodes known at the beginning of time.

So I "fixed" STCPNode.cpp to accept new nodes (it's safe on my secure overlay network) with an extra addPeer() call, and that works for me. I also tried to add a command-line flag to enable/disable this feature but ran into the police-state C++ behavior of having locked up the command "args", incommunicado, in some object where you couldn't get at them from where you need them. So no user-friendly command option.

Add Bedrock::Jobs::CancelJob

Problem:

There are scenarios where a parent has a series of children, but the work done by one of the children obviates the others. In this scenario it'd be nice to make a "best attempt" to cancel the other children (eg, if they haven't yet started RUNNING), but currently there is no safe way to do this.

Solution:

Add a formal CancelJob command that:

  1. Applies the CancelJob logic recursively to all children of the job being cancelled
  2. If this job is RUNNING, do nothing -- the train has already left the station.
  3. If the job is FINISHED or CANCELLED, do nothing -- there's nothing left to cancel.
  4. If the job is QUEUED:
    • If the job has a parentJobID (indicating it is a child), then the child job is put into the CANCELLED state. If this is the last child, the parent is resumed.
    • Otherwise, delete the job
  5. If the job is PAUSED, do nothing -- one of its children must still be running so we can't cancel the parent.
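For illustration, the per-job decision above could be boiled down to something like the following; the names are assumptions, not actual Bedrock::Jobs code. Recursing over the children and applying the resulting action (including resuming the parent when the last child is cancelled) is left to the caller:

#include <string>

enum class CancelAction { Ignore, MarkCancelled, Delete };

CancelAction decideCancelAction(const std::string& state, bool hasParent) {
    if (state == "RUNNING" || state == "FINISHED" ||
        state == "CANCELLED" || state == "PAUSED") {
        return CancelAction::Ignore;  // rules 2, 3 and 5
    }
    if (state == "QUEUED") {
        return hasParent ? CancelAction::MarkCancelled  // rule 4, child job
                         : CancelAction::Delete;        // rule 4, top-level job
    }
    return CancelAction::Ignore;
}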

How does this sound? @cead22 @iwiznia @tylerkaraszewski @mcnamamj

Improve signal handling

We currently see some inconsistent behavior in our signal handling, particularly around handling SIGSEGV. Sometimes we get a stack trace, sometimes not. Sometimes we can generate a core dump. Sometimes not.

Which threads are able to do what and the safety of SWARN inside a signal handler are largely uninvestigated. We should improve the robustness of our signal handling, make sure the right signals are processed by the right threads, and make sure that communication from the signal handler back to other parts of the application is done safely.

Explore hosted log storage alternatives to self-hosting ElasticSearch

Problem:

We pay $32K/mo (which might go as low as $20K if we reserve instances) for ElasticSearch -- on top of a bunch of engineering time to maintain it. Scaling that to 10x will be expensive and difficult, so let's double-check that we're on the right path.

Solution:

Price out a few cloud hosted log services, such as Splunk (on the high end) and ?? (on the low end).

Cc: @Expensify/infra

freebsd compile assistance

Hi All,

Thanks for creating and maintaining this project!

What's happening here when using gmake on FreeBSD?

g++ -g -std=gnu++11 -DSVERSION="\"c178f76262d25959e73b2ecb82e5925f23045b40\"" -Wall -I/usr/home/sean/bin/Bedrock -I/usr/home/sean/bin/Bedrock/mbedtls/include -MMD -MF libstuff/libstuff.d -MT libstuff/libstuff.h.gch -c libstuff/libstuff.h
/bin/sh: g++: not found
gmake: *** No rule to make target 'libstuff/libstuff.d', needed by '.build/libstuff/SData.d'.  Stop.
% gmake -v
GNU Make 4.2.1
Built for i386-portbld-freebsd11.0
Copyright (C) 1988-2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Starting bedrock (via systemctl): bedrock.serviceFailed to start bedrock.service: Unit bedrock.service not found. with installation on Ubuntu 16.04

The result:

sh@sh-ubuntu:~$ sudo apt-get install bedrock
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  linux-headers-4.4.0-36 linux-headers-4.4.0-36-generic linux-image-4.4.0-36-generic linux-image-extra-4.4.0-36-generic linux-signed-image-4.4.0-36-generic linux-tools-4.4.0-36
  linux-tools-4.4.0-36-generic
Use 'sudo apt autoremove' to remove them.
The following NEW packages will be installed:
  bedrock
0 upgraded, 1 newly installed, 0 to remove and 52 not upgraded.
Need to get 2,845 kB of archives.
After this operation, 7,920 kB of additional disk space will be used.
Get:1 https://apt.bedrockdb.com/ubuntu xenial/main amd64 bedrock all 1.0.20~7a4c774 [2,845 kB]
Fetched 2,845 kB in 47s (60.1 kB/s)                                                                                                                                               
Selecting previously unselected package bedrock.
(Reading database ... 309425 files and directories currently installed.)
Preparing to unpack .../bedrock_1.0.20~7a4c774_all.deb ...
Unpacking bedrock (1.0.20~7a4c774) ...
Processing triggers for systemd (229-4ubuntu10) ...
Processing triggers for ureadahead (0.100.0-19) ...
ureadahead will be reprofiled on next reboot
Setting up bedrock (1.0.20~7a4c774) ...
[....] Starting bedrock (via systemctl): bedrock.serviceFailed to start bedrock.service: Unit bedrock.service not found.
 failed!
dpkg: error processing package bedrock (--configure):
 subprocess installed post-installation script returned error exit status 5
Processing triggers for systemd (229-4ubuntu10) ...
Processing triggers for ureadahead (0.100.0-19) ...
Errors were encountered while processing:
 bedrock
E: Sub-process /usr/bin/dpkg returned an error code (1)

Info:

PackageArchitecture: all
ProcVersionSignature: Ubuntu 4.4.0-38.57-generic 4.4.19
RelatedPackageVersions:
 dpkg 1.18.4ubuntu1.1
 apt  1.2.12~ubuntu16.04.1
Tags: loki third-party-packages
Title: package bedrock 1.0.20~7a4c774 [origin: apt.bedrockdb.com] failed to install/upgrade: subprocess installed post-installation script returned error exit status 5
Uname: Linux 4.4.0-38-generic x86_64
UnreportableReason: This is not an official elementary package. Please remove any third party package and try again.
UpgradeStatus: No upgrade log present (probably fresh install)
_MarkForUpload: True

Make Bedrock::Cache::WriteCache return current contents

Background

We are going to use Bedrock's Cache plugin to make use of the distributed/replicated storage for checking whether incoming webhook notifications might already have been processed by a different node. We also want to avoid the race conditions (e.g. the same notification being sent twice in the same second) that come from doing a read first and a write after that.

Solution

Let's update Bedrock::Cache::WriteCache to return the current contents of a value if we're overwriting.
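As a toy illustration of the proposed semantics (not the actual Cache plugin code): a write returns whatever value the key previously held, so the caller gets the overwritten contents in the same operation instead of doing a separate read first:

#include <map>
#include <optional>
#include <string>

std::optional<std::string> writeCache(std::map<std::string, std::string>& cache,
                                      const std::string& name,
                                      const std::string& value) {
    std::optional<std::string> previous;
    auto it = cache.find(name);
    if (it != cache.end()) {
        previous = it->second;  // the contents being overwritten
    }
    cache[name] = value;
    return previous;
}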

ArchLinux PKGBUILD

Just wanted to say that I discovered this project today and found it really interesting (given my love for SQLite), so I wanted to give it a test run and created an Arch Linux package:

https://aur.archlinux.org/packages/bedrock/

Thanks for contributing such an interesting project; looking forward to making use of it in the future!

Existing pid file should raise warning on startup

Bedrock version: 4323470

When using "service bedrock start", the script should check to see if the pid file exists, and if so, raise an error instead of silently ignoring it. (or should be handled in another way better).

This causes a bit of confusion when bedrock crashes (and didn't clean up its files).

Things are running well:

root@ubuntu6:~# ps aux|grep bedrock
root      5764  0.0  0.3 400332  8012 ?        Ssl  04:46   0:00 /usr/sbin/bedrock -fork -nodeName bedrock -db /var/lib/bedrock/bedrock.db -serverHost 0.0.0.0:8888 -nodeHost 0.0.0.0:8889 -priority 200 -pidfile /var/run/bedrock.pid -quorumCheckpoint 100 -readThreads 4 -plugins status,db,jobs,cache,mysql -v -cache 10001
root      5771  0.0  0.0  14224  1012 pts/0    S+   04:47   0:00 grep --color=auto bedrock
root@ubuntu6:~# bedrock -version
43234709c216b19ae7ce7ea7a5219467bf50e773
root@ubuntu6:~# cat /var/run/bedrock.pid 
5764root@ubuntu6:~# 

Crash the server:

root@ubuntu6:~# mysql -h127.0.0.1 -e "SELECT 1 AS foo, 2 AS bar;"
ERROR 2006 (HY000) at line 1: MySQL server has gone away
root@ubuntu6:~# ps aux|grep bedrock
root      5779  0.0  0.0  14224   932 pts/0    S+   04:47   0:00 grep --color=auto bedrock

pid file remains:

root@ubuntu6:~# cat /var/run/bedrock.pid 
5764root@ubuntu6:~# 

Start the server, but it doesn't really start because the pid file remains:

root@ubuntu6:~# service bedrock start
root@ubuntu6:~# ps aux|grep bedrock
root      5805  0.0  0.0  14224  1004 pts/0    S+   04:48   0:00 grep --color=auto bedrock
root@ubuntu6:~# cat /var/run/bedrock.pid 
5764root@ubuntu6:~# 

A better way to start the server is to use restart, which cleans up pids:

root@ubuntu6:~# service bedrock restart
root@ubuntu6:~# ps aux|grep bedrock
root      5829  0.0  0.3 400332  8060 ?        Ssl  04:48   0:00 /usr/sbin/bedrock -fork -nodeName bedrock -db /var/lib/bedrock/bedrock.db -serverHost 0.0.0.0:8888 -nodeHost 0.0.0.0:8889 -priority 200 -pidfile /var/run/bedrock.pid -quorumCheckpoint 100 -readThreads 4 -plugins status,db,jobs,cache,mysql -v -cache 10001
root      5836  0.0  0.0  14224   932 pts/0    S+   04:48   0:00 grep --color=auto bedrock
root@ubuntu6:~# cat /var/run/bedrock.pid 
5829root@ubuntu6:~# 
root@ubuntu6:~# 

Thanks,
-will

Docs need navigation bar

Typically there is some sort of hierarchy or navigation bar on a docs website.

For example, Bootstrap has a navbar on the left-hand side for navigating the different sections of the docs.

I think that Bedrock should have something similar. For example, I finally found vs_mysql but it doesn't have any navigation to know where I am or where to go next.

Crash when sysbench disconnects

Hi!

I've been testing Bedrock with sysbench (just to check MySQL compatibility) and noticed it always crashes when I Ctrl+C sysbench.

Bedrock version: 4323470

Test case:

root@ubuntu2:~# service bedrock start
 * Starting Expensify Bedrock Server bedrock                                                                                                                      [ OK ] 
root@ubuntu2:~# ps aux|grep bedrock
root      4278  0.0  0.0 399492  7604 ?        Ssl  15:25   0:00 /usr/sbin/bedrock -fork -nodeName bedrock -db /var/lib/bedrock/bedrock.db -serverHost 0.0.0.0:8888 -nodeHost 0.0.0.0:8889 -priority 200 -pidfile /var/run/bedrock.pid -quorumCheckpoint 100 -readThreads 4 -plugins status,db,jobs,cache,mysql -v -cache 10001
root      4287  0.0  0.0  11748  2128 pts/2    S+   15:25   0:00 grep --color=auto bedrock
root@ubuntu2:~# bedrock -version
43234709c216b19ae7ce7ea7a5219467bf50e773
root@ubuntu2:~# timeout 2 sysbench/sysbench/sysbench --test=sysbench/sysbench/tests/db/oltp.lua --mysql-user=root --mysql-password= --mysql-db=test --mysql-table-engine=innodb --mysql-ignore-duplicates=on --oltp-read-only=off --oltp-dist-type=uniform --oltp-skip-trx=on --oltp-auto-inc=off --init-rng=on --oltp-test-mode=complex --max-requests=0 --report-interval=1 --num-threads=8 --mysql-host=127.0.0.1 --mysql-port=3306 --oltp-table-size=5000000 --oltp-tables-count=4 --max-time=600 run > bench.log 2>&1
root@ubuntu2:~# ps aux|grep bedrock
root      4302  0.0  0.0  11748  2240 pts/2    S+   15:26   0:00 grep --color=auto bedrock
root@ubuntu2:~#

Stacktrace:

Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (SSignal.cpp:62) SSendSignal [main] [eror] Got SIGSEGV, logging stack trace.
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (SLog.cpp:16) SLogStackTrace [main] [warn] /usr/sbin/bedrock(_Z14SLogStackTracev+0x24) [0x59ec29]
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (SLog.cpp:16) SLogStackTrace [main] [warn] /usr/sbin/bedrock(_Z11SSendSignali+0x48e) [0x59dc86]
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (SLog.cpp:16) SLogStackTrace [main] [warn] /lib/x86_64-linux-gnu/libc.so.6(+0x36cb0) [0x7f45837dccb0]
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (SLog.cpp:16) SLogStackTrace [main] [warn] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSs6appendERKSs+0x1c) [0x7f4583e3bf2c]
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (SLog.cpp:16) SLogStackTrace [main] [warn] /usr/sbin/bedrock(_ZN11STCPManager6Socket4sendERKSs+0x27) [0x5a51bd]
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (SLog.cpp:16) SLogStackTrace [main] [warn] /usr/sbin/bedrock(_ZN19BedrockPlugin_MySQL21onPortRequestCompleteERK5SDataPN11STCPManager6SocketE+0xae7) [0x5c10fd]
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (SLog.cpp:16) SLogStackTrace [main] [warn] /usr/sbin/bedrock(_ZN13BedrockServer10postSelectERSt3mapIi6pollfdSt4lessIiESaISt4pairIKiS1_EEERm+0x5608) [0x5328ea]
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (SLog.cpp:16) SLogStackTrace [main] [warn] /usr/sbin/bedrock(main+0x2b01) [0x5ae993]
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (SLog.cpp:16) SLogStackTrace [main] [warn] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f45837c7f45]
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (SLog.cpp:16) SLogStackTrace [main] [warn] /usr/sbin/bedrock() [0x475915]

Right before the stack trace, minus some test data:

Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (STCPManager.cpp:255) closeSocket [main] [dbug] Closing socket '127.0.0.1:38330'
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (STCPManager.cpp:255) closeSocket [main] [dbug] Closing socket '127.0.0.1:38332'
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (BedrockServer.cpp:38) BedrockServer_WorkerThread_ProcessDirectMessages [read1] [info] Processing direct message 'CANCEL_REQUEST'
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (STCPManager.cpp:255) closeSocket [main] [dbug] Closing socket '127.0.0.1:38334'
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (BedrockServer.cpp:38) BedrockServer_WorkerThread_ProcessDirectMessages [read0] [info] Processing direct message 'CANCEL_REQUEST'
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (BedrockServer.cpp:38) BedrockServer_WorkerThread_ProcessDirectMessages [read2] [info] Processing direct message 'CANCEL_REQUEST'
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (BedrockServer.cpp:38) BedrockServer_WorkerThread_ProcessDirectMessages [write0] [info] Processing direct message 'CANCEL_REQUEST'
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (BedrockServer.cpp:54) BedrockServer_WorkerThread_ProcessDirectMessages [read2] [info] No need to cancel request #3381 because not queued, ignoring.
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (BedrockServer.cpp:54) BedrockServer_WorkerThread_ProcessDirectMessages [write0] [info] No need to cancel request #3381 because not queued, ignoring.
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (BedrockServer.cpp:54) BedrockServer_WorkerThread_ProcessDirectMessages [read1] [info] No need to cancel request #3381 because not queued, ignoring.
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (BedrockServer.cpp:54) BedrockServer_WorkerThread_ProcessDirectMessages [read0] [info] No need to cancel request #3381 because not queued, ignoring.
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (BedrockNode.cpp:97) _peekCommand [read3] [info] Plugin 'DB' peeked command 'Query'
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (BedrockNode.cpp:116) _peekCommand [read3] [info] Responding '200 OK' to read-only 'Query'.
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (SQLiteNode.cpp:292) openCommand [read3] [info] {bedrock/MASTERING} Processed peekable command 'Query' (bedrock#619)
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (SQLiteNode.cpp:342) getProcessedCommand [read3] [info] {bedrock/MASTERING} Returning processed command 'Query' (bedrock#619)
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (BedrockServer.cpp:137) BedrockServer_WorkerThread [read3] [info] Peek successful. Putting command 'bedrock#619' on processed list.
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (SQLiteNode.cpp:405) closeCommand [read3] [dbug] {bedrock/MASTERING} Closing command 'bedrock#619'
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (BedrockServer.cpp:38) BedrockServer_WorkerThread_ProcessDirectMessages [read3] [info] Processing direct message 'CANCEL_REQUEST'
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (BedrockServer.cpp:54) BedrockServer_WorkerThread_ProcessDirectMessages [read3] [info] No need to cancel request #3381 because not queued, ignoring.
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (BedrockServer.cpp:634) postSelect [main] [info] Processed command 'Query' #3381 (result '200 OK') from 'internal' in 133=0+133 ms
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (BedrockServer.cpp:656) postSelect [main] [info] Plugin 'MySQL' handling response '200 OK' to request 'Query'
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (libstuff.cpp:1380) _SParseJSONObject [main] [dbug] Parsed: 'headers':'["c"]'
Dec 22 15:26:10 ubuntu2 bedrock: xxxxx (libstuff.cpp:1380) _SParseJSONObject [main] [dbug] Parsed: 'rows':'[["01744209700-28851195920-33965673137-73349934359-0917788...<snip>...

The full log (13MB compressed to 700KB), minus the sysbench data, can be found here.

Thanks,
-will

License question

Hi,
Are there any plans to change the license to the more developer-friendly MIT or BSD?

Allow application to log to STDOUT

In order to make Bedrock more operationally friendly to containerized
environments, it is desirable to allow the application to log to STDOUT (and
potentially STDERR) rather than sending all log messages to /dev/log.

Improve child job handling

Background:

Bedrock::Jobs has the concept of a "child" job: just set parentJobID when calling CreateJob and the parent/child linkage will be retained. The design pattern is that the parent would create child jobs to work in parallel, and then "pause" itself (by calling FinishJob on itself) to wait for the children to complete. When all children complete, the parent would be "resumed", and harvest the results of the children.

Problem:

There are a couple concerns with the current design:

  1. The code that "resumes" the parent doesn't explicitly verify that it's currently PAUSED. In theory it should always be, but if it's not for any reason, this might double-resume the parent, with undefined results.

  2. Right now the child is created in the QUEUED state, meaning it can start running the instant it's created. This means it could even finish prior to the parent calling FinishJob on itself, in which case Bedrock will think all the children are done and never pause the parent. If the worker only looks at the children upon resume (which is what the design assumes), then it might never harvest the results.

  3. The design assumes that children are created by their own parents, OR by a sibling. Race conditions abound if arbitrary parent relationships are allowed, but it's safe for a job to create its own children (in which case the parent will be RUNNING), as well as for one child to create a job with the same parent as its own (in which case the parent will be PAUSED).

  4. There is no explicit way for a worker to know if the job they're processing is running for the first time, or if it's paused and being resumed. Rather, the parent job is expected to save an array of child jobs into its own data while calling FinishJob (and thus pausing itself), and then querying the results of those jobs via Bedrock::DB. This works, but is clunky.

  5. Though we don't currently have any jobs that work this way, if a parent job creates more child jobs after being resumed (eg, has child jobs, is paused/resumed, and then creates more children), when it's resumed a second time it's unclear which of the child jobs were part of the first or second batch.

Solution:

I suggest we tighten this up as follows:

  1. Verify that the parent is PAUSED when any child (not just the last) calls FinishJob, and return 405 Parent must be PAUSED when children are running. By making this check early and often, we help shorten the window between when a problem occurs and when we discover it.

  2. Create children in the PAUSED state, and then mark them as QUEUED when the parent is paused. However, if the parent is PAUSED (indicating a child is creating a sibling job) then set the new child to QUEUED.

  3. Verify that the parent is RUNNING or PAUSED when any attempt is made to create a child job, and return 405 Only parents can create their own children otherwise.

  4. When finishing a child job, mark it as FINISHED rather than deleting it immediately. Also, when resuming a parent job, return a JSON object (eg, associative array) of the data objects for all FINISHED child jobs, with the jobID as the key, and an object consisting of data and state. This clearly signals to the worker "You are being resumed after finishing a bunch of child jobs", as well as alleviates the need for the worker to manually query the results of the children (a rough sketch of such a payload follows this list). This is backwards compatible with the current method, but obviates it over time.

  5. When finishing a parent job, delete any FINISHED child jobs even if there are other PAUSED child jobs outstanding (as is done by step (2) above). This way the next time the parent is resumed, the child jobs returned (via step (4) above) will only be for the latest round.

    • Admittedly, this might be a bit of a premature optimization given that we don't actually intend to do this right now. But given the low cost of the change and the generality of this tool, I suggest we tighten up this edge case now while we're thinking of it.
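
To make (4) concrete, a resumed parent might see a payload along these lines. This is purely a sketch: "finishedChildJobs" and the surrounding shape are placeholders to be settled during implementation, not an existing field.

200 OK
{
    "jobID": 1000,
    "name": "report/parent",
    "data": { "batch": 7 },
    "finishedChildJobs": {
        "1001": { "data": { "rows": 250 }, "state": "FINISHED" },
        "1002": { "data": { "rows": 314 }, "state": "FINISHED" }
    }
}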

Thoughts or concerns on the above before I get started? @cead22 @tylerkaraszewski @iwiznia @mcnamamj

Any plan to become a package for Debian

This product looks fantastic. I would love for it to become a standard part of Debian, which is my distribution of choice. Any plan to make it easier to install on it and its derivatives?

Does not compile on Fedora 26

Bedrock does not compile on Fedora 26 (and there is no compiled rpm). The problem seems to be that Fedora 26 ships only with g++ 7 (C++17), and the Bedrock code does not seem to be compatible with this newer standard.

Rework CLI flag mechanism

When attempting to use new software on a POSIX system, many users will begin by
running the utility with --help.

When attempting to do this with bedrock, the application actually starts and
seemingly just ignores the flags. In addition, because there is no output to
STDOUT or STDERR, it is unclear whether the application is:

  • operating
  • malicious
  • crashing
  • operating in some other unexpected way

While my initial request was framed as "add support for the --help flag", on
thinking about this further the request really becomes "make the binary behave
in more predictable ways".

This could include a general reworking of the flags, though it could also be
more focused on failing/exiting with a non-zero error code and displaying the
help context when unexpected input is provided.
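
For example, the desired behaviour might look something like this (purely
illustrative of what is being requested, not current output):

# Desired behaviour (illustrative):
$ bedrock --help
Usage: bedrock [options]   (e.g. -db PATH, -serverHost HOST:PORT, -nodeName NAME, ...)
$ echo $?
0
$ bedrock --no-such-flag
Unknown option '--no-such-flag'; see --help
$ echo $?
1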

Allow configuration via environment variables

In order to more easily use Bedrock in a containerized manner, it would be nice
to allow configuration via environment variables.

-versionOverride
-db
-serverHost host:port
-nodeName
-nodeHost host:port
-peerList
-priority
-plugins
-cacheSize
-readThreads <#>
-queryLog
-maxJournalSize <#commits>

For each of these, an environment variable named BEDROCK_ followed by the
option name in all caps would be a natural fit, e.g.
BEDROCK_DB=/var/lib/bedrock/bedrock.db or BEDROCK_READTHREADS=12.
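
Until native support lands, a thin entrypoint wrapper can do the mapping today.
A sketch (the flag names are the ones listed above; everything else, such as
the script name and paths, is illustrative):

#!/bin/sh
# entrypoint.sh - map BEDROCK_* environment variables onto Bedrock CLI flags.
# Extend the list to cover whichever flags you need; values containing spaces
# would need proper quoting/array handling in a real script.
ARGS=""
[ -n "$BEDROCK_DB" ]          && ARGS="$ARGS -db $BEDROCK_DB"
[ -n "$BEDROCK_NODENAME" ]    && ARGS="$ARGS -nodeName $BEDROCK_NODENAME"
[ -n "$BEDROCK_SERVERHOST" ]  && ARGS="$ARGS -serverHost $BEDROCK_SERVERHOST"
[ -n "$BEDROCK_NODEHOST" ]    && ARGS="$ARGS -nodeHost $BEDROCK_NODEHOST"
[ -n "$BEDROCK_PEERLIST" ]    && ARGS="$ARGS -peerList $BEDROCK_PEERLIST"
[ -n "$BEDROCK_PRIORITY" ]    && ARGS="$ARGS -priority $BEDROCK_PRIORITY"
[ -n "$BEDROCK_READTHREADS" ] && ARGS="$ARGS -readThreads $BEDROCK_READTHREADS"

# e.g. BEDROCK_DB=/var/lib/bedrock/bedrock.db BEDROCK_READTHREADS=12 ./entrypoint.sh
exec /usr/sbin/bedrock $ARGS "$@"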

3/4

I've noticed that Bedrock sometimes thinks there are 4 peers though I only have 3 copies running. From the logs it looks like one node is listed twice, under both its name and IP address, and others are griping about the conflict.

Aug 11 18:21:02 : xxxxx (SQLiteNode.cpp:1095) _onMESSAGE [sync] [info] {us-east-1a/WAITING} ->{us-west-2c} Peer switched from 'STANDINGUP' to 'SEARCHING' commit #14060 (B24B4EBA8995FF5A8C3154EE2170B6C68D0E6999)
Aug 11 18:21:02 : xxxxx (STCPNode.cpp:169) postPoll [sync] [dbug] {us-east-1a} ->{us-west-2c} Received 'STATE': STATE^M CommitCount: 14060^M Hash: B24B4EBA8995FF5A8C3154EE2170B6C68D0E6999^M Priority: 215^M State: WAITING^M Transfer-Encoding: ^M Content-Leng
Aug 11 18:21:02 : xxxxx (SQLiteNode.cpp:1095) _onMESSAGE [sync] [info] {us-east-1a/WAITING} ->{us-west-2c} Peer switched from 'SEARCHING' to 'WAITING' commit #14060 (B24B4EBA8995FF5A8C3154EE2170B6C68D0E6999)
Aug 11 18:21:02 : xxxxx (STCPNode.cpp:169) postPoll [sync] [dbug] {us-east-1a} ->{us-west-2c} Received 'STATE': STATE^M CommitCount: 14060^M Hash: B24B4EBA8995FF5A8C3154EE2170B6C68D0E6999^M Priority: 215^M State: STANDINGUP^M Transfer-Encoding: ^M Content-L
Aug 11 18:21:02 : xxxxx (SQLiteNode.cpp:1095) _onMESSAGE [sync] [info] {us-east-1a/WAITING} ->{us-west-2c} Peer switched from 'WAITING' to 'STANDINGUP' commit #14060 (B24B4EBA8995FF5A8C3154EE2170B6C68D0E6999)
Aug 11 18:21:02 : xxxxx (SQLiteNode.cpp:1229) _onMESSAGE [sync] [hmmm] {us-east-1a/WAITING} **->{us-west-2c} Denying standup request because peer '100.121.34.215' is 'STANDINGUP'**
Aug 11 18:21:02 : xxxxx (SQLiteNode.cpp:501) update [sync] [hmmm] {us-east-1a/WAITING} ->{us-west-2c} Multiple peers trying to stand up (also '100.121.34.215'), let's hope they sort it out.
Aug 11 18:21:02 : xxxxx (SQLiteNode.cpp:561) update [sync] [dbug] {us-east-1a/WAITING} Connected to 4 of 4 full peers (4 with permaslaves), priority=132

$ docker exec switchboard_office_1 discover region
100.121.68.174|region|us-east-2a
100.121.44.132|region|us-east-1a
**100.121.34.215|region|us-west-2c**

All nodes are started with an alpha-numeric -nodeName and with only IP addresses in the -peerList.

Can't install on Ubuntu Server 16.04

I like what I read about Bedrock and I was just about to start a small project - perfect for trying something new.
I tried to install it; the first three steps worked just fine, but the last one, the install itself, gave me this:
sudo apt-get install bedrock
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
bedrock : Depends: libpcrecpp0 but it is not installable
E: Unable to correct problems, you have held broken packages.
What should I do about that?
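
If you don't want to wait for the package to be fixed (16.04 ships libpcrecpp0v5 rather than libpcrecpp0, which is probably why the dependency can't be satisfied), one option is to build from source. Roughly (a sketch; the exact -dev package names may differ):

# Build from source instead of installing the .deb:
sudo apt-get install build-essential git libpcre3-dev libpcre++-dev zlib1g-dev
git clone https://github.com/Expensify/Bedrock.git
cd Bedrock
make
# ...then run ./bedrock with your preferred flags (see /etc/init.d/bedrock for an example).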

Add informative version information for -version

Please fix version information:

sh@sh-ubuntu:~$ bedrock -version
7a4c77495b3f079644f465015307204f4d9a8da3

Add arch, package info, and product version - something similar to:

sh@sh-ubuntu:~$ mysql --version
mysql  Ver 14.14 Distrib 5.7.16, for Linux (x86_64) using  EditLine wrapper

Update versioning to use `%year%.%month%.%day%` (eg, "17.01.12")

Let's aim to do a monthly release, with "dot releases" if there are just small bugfixes and cleanup, and "full releases" if there are meaningful functional changes. In this case it looks like there are no externally-visible (eg, interface-changing) functional changes, so we'd just do a dot release. Thoughts?

Also, any thoughts on how to put plugins into a dynamically-loading library such that we can use the public repo for our actual servers, and just separately version/release our internal Expensify plugin? This'll help ensure the public repo stays up to date, rather than us continuously struggling to remember to package and release the latest changes.

Cc: @tylerkaraszewski @righdforsa @cead22

Verbose by default doesn't seem too good

By default, it seems like verbose logging is enabled:

root@ubuntu2:~# service bedrock start
 * Starting Expensify Bedrock Server bedrock                                                                                                                      [ OK ] 
root@ubuntu2:~# ps aux|grep bedrock
root      3704  0.0  0.0 399492  7604 ?        Ssl  14:17   0:00 /usr/sbin/bedrock -fork -nodeName bedrock -db /var/lib/bedrock/bedrock.db -serverHost 0.0.0.0:8888 -nodeHost 0.0.0.0:8889 -priority 200 -pidfile /var/run/bedrock.pid -quorumCheckpoint 100 -readThreads 4 -plugins status,db,jobs,cache,mysql -v -cache 10001

That doesn't seem like a good default. My 100-query sysbench run added 10M to syslog:

root@ubuntu2:~# logrotate -f /etc/logrotate.conf 
root@ubuntu2:~# ls -la /var/log/syslog
-rw-r----- 1 syslog adm 0 Dec 22 14:21 /var/log/syslog
root@ubuntu2:~# sysbench/sysbench/sysbench --test=sysbench/sysbench/tests/db/oltp.lua --mysql-user=root --mysql-password= --mysql-db=test --mysql-table-engine=innodb --mysql-ignore-duplicates=on --oltp-read-only=off --oltp-dist-type=uniform --oltp-skip-trx=on --oltp-auto-inc=off --init-rng=on --oltp-test-mode=complex --max-requests=0 --report-interval=30 --num-threads=8 --mysql-host=127.0.0.1 --mysql-port=3306 --oltp-table-size=5000000 --oltp-tables-count=4 --max-requests=100 run > bench.log 2>&1
root@ubuntu2:~# ls -la /var/log/syslog
-rw-r----- 1 syslog adm 10327373 Dec 22 14:22 /var/log/syslog
root@ubuntu2:~# grep -v "bedrock" /var/log/syslog | wc -l
0
root@ubuntu2:~# 

Maybe there's a better way to handle this?
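
In the meantime, one workaround is to strip the -v flag from the init script's invocation and restart (a sketch; adjust the path and pattern for your install):

sudo sed -i 's/ -v / /' /etc/init.d/bedrock
sudo service bedrock restart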

Thanks,
-will

Unclear on how the replication works, and what happens if a single DB is updated directly.

This is partially related to #35. I would love to get a more in-depth explanation of how Bedrock actually does its replication, and what happens in certain common cases that distributed databases run into. In particular, I had a use case while playing around with Bedrock that I would love more documentation about.

I followed the multizone example where I had 3 different Bedrock nodes running. I created a database with an auto-incrementing primary key. I created a record (ID=1) and saw it replicated to all 3 databases. I then shut them all down, and opened node0.db directly with SQLite and added a new record (ID=2). I then restarted all of the nodes, and node0 showed both records (ID=1 and ID=2), but node1 and node2 only showed the second record (ID=2). I then shut down all of the nodes, and deleted the unreplicated record (ID=2) from node0. I then restarted all of the nodes, and created a new record (ID=3). Except that node1 and node2 showed its ID as 2, but node0 showed its ID as 3.

A deeper dive into how Bedrock works would explain this behavior and allow me to know what kinds of things I should/shouldn't do. For example, it appears that I shouldn't expect that I could migrate a single database directly, and then start nodes and expect them to replicate; instead I would have to make all database changes while multiple nodes were already running.
