foundationdb / fdb-document-layer
A document data model on FoundationDB, implementing the MongoDB® wire protocol.
License: Apache License 2.0
Document Layer allows users to create compound indexes and updates them properly. But the query planner is not really using them yet: it uses them just like simple indexes. For example, if we have a compound index on fields `a` and `b`, the query planner treats the index as a simple index on `a`. So, for a query with the predicate `a == "foo" and b == "bar"`, the query planner scans the index with bounds `foo, foo0` and runs a FilterPlan on the results to look for `bar`. Instead, it should scan with bounds `foo:bar, foo:bar0`.
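To make those bounds concrete, here is a minimal sketch using the FDB Python tuple layer (illustrative only - the real index keys also carry collection and index prefixes):

import fdb
fdb.api_version(510)
import fdb.tuple

# Today: only the leading field of the compound index constrains the scan,
# so everything under ("foo", ...) is fetched and 'b' is filtered in memory.
scan_today = fdb.tuple.range(("foo",))

# Desired: both equality predicates tighten the range, so only keys under
# ("foo", "bar", ...) are read.
scan_desired = fdb.tuple.range(("foo", "bar"))

print(scan_today.start, scan_today.stop)      # the 'foo, foo0'-style bounds
print(scan_desired.start, scan_desired.stop)  # the 'foo:bar, foo:bar0'-style bounds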
Document Layer returns only secondary indexes in response to `listIndexes()`. Mongo drivers expect the primary index (on `_id`, which is maintained anyway) to also be part of the response. It's an easy fix; we might as well do it just to make the drivers happy.
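For example, a driver listing indexes through pymongo sees only the secondary indexes today (endpoint and collection names here are assumptions):

from pymongo import MongoClient

db = MongoClient('localhost', 27018).test   # a local Document Layer
for idx in db.coll.list_indexes():
    print(idx['name'])                       # should include '_id_' as well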
Document Layer is storing numeric field names as binary integers, which limits numeric field names to the integer range. I'm guessing this decision was taken to maintain the ordering of numeric fields, but that is not a valid assumption: all JSON field names are strings.
In [74]: db.coll.insert({'_id': 'foo', '12345678901234567890': 'test'})
Out[74]: 'foo'
In [75]: for row in db.coll.find():
...: print row
...:
{u'_id': u'foo', u'-1': u'test'}
Also, changing this behavior would make the array expansion code much cleaner. As array indexes are integers, it is hard to tell whether a key points to an array element or a numeric field.
The related code is here:
void insertElementRecursive(bson::BSONElement const& elem, Reference<IReadWriteContext> cx) {
	std::string fn = elem.fieldName();
	// Field names made up entirely of digits are converted to integers here,
	// so a numeric field name is indistinguishable from an array index. Note
	// also that atoi() silently overflows on names like "12345678901234567890",
	// which is presumably how the '-1' in the example above appears.
	if (std::all_of(fn.begin(), fn.end(), ::isdigit)) {
		const char* c_fn = fn.c_str();
		insertElementRecursive(atoi(c_fn), elem, cx);
	} else {
		insertElementRecursive(fn, elem, cx);
	}
}
It's just syntactic sugar on top of FDB transactions, but it makes the code much better and more readable by making it clear whether a transaction is ever used for updates. We can't do this with the current code, as the Flow binding classes are not virtual.
blocked on: apple/foundationdb#1027
The mongo-express admin web interface is broken with:
at /app/server/node_modules/mongo-express/lib/routes/database.js:40:49
at handleCallback (/app/server/node_modules/mongodb/lib/utils.js:95:56)
at /app/server/node_modules/mongodb/lib/db.js:313:5
at /app/server/node_modules/mongodb-core/lib/connection/pool.js:455:18
at process.internalTickCallback (internal/process/next_tick.js:70:11)
It is failing at this line of the code, where it is trying to show `numExtents` from `dbStats`:
https://docs.mongodb.com/manual/reference/command/dbStats/#dbStats.numExtents
Blocking #29
In quite a few places we use `std::string` to store byte buffers that are not printable strings. We should use `Standalone<StringRef>` instead, and use `std::string` only for printable strings.
This will help to keep the binary smaller.
As with FoundationDB itself, there's no official Docker image, despite the presence of a `Dockerfile`. It'd be nice to have an official image.
We have some smoke tests as part of correctness. Although we call them unit tests, they need FDB and Doc Layer instances to be set up, and the Python scripts then run some smoke tests against them. We should have some kind of verification with PRBs.
We don't support capped collections. Until we do, we should explicitly fail if someone tries to create one.
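For example, with pymongo (endpoint and names assumed), this should return an error rather than quietly creating an ordinary collection:

from pymongo import MongoClient

db = MongoClient('localhost', 27018).test   # a local Document Layer

# A capped-collection request; the Document Layer should fail this
# explicitly instead of ignoring the 'capped' option.
db.create_collection('events', capped=True, size=1048576)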
$ python test/correctness/document-correctness.py --doclayer-host localhost --doclayer-port 27018 forever doclayer mm --seed 9203356367461099619 --num-doc 300 --num-iter 1 --no-update --no-sort --no-numeric-fieldnames
Instance: 0459504932136
========================================================
ID : 35746 iteration : 1
========================================================
Query results didn't match!
Query: {'$and': [{u'E': None}, {'$and': [{u'C': None}, {u'A': {'$lte': 'c'}}]}]}
Projection: OrderedDict([(u'C', True), (u'D', True)])
pymongo.collection (0)
mongo_model (1): {u'_id': datetime.datetime(1970, 1, 22, 10, 7, 43)}
RESULT SET DIFFERENCES (as 'sets' so order within the returned results is not considered)
Only in mongo_model : {'_id': 1970-01-22 10:07:43}
python /Users/bmuppana/src/fdb-document-layer/test/correctness/document-correctness.py --mongo-host localhost --mongo-port 27018 --doclayer-host localhost --doclayer-port 27018 forever doclayer mm --seed 9203356367461099619 --num-doc 300 --num-iter 1 --no-update --no-sort --no-numeric-fieldnames
Found this against d2840e9. Consistently reproducible with the above seed.
`explain()` returns information about the query plan: whether an index is being used, and how the scan is being done. Right now, it looks something like this:
In [13]: db.correctness475041058659.find({'A': { 'A': None, 'C': {}}}).explain()
Out[13]:
{u'explanation': {u'source_plan': {u'projection': u'{}',
u'source_plan': {u'bounds': {u'begin': u'3\\x10\\x00\\xff\\x00\\xff\\x00\\xff\\x0aA\\x00\\xff\\x03C\\x00\\xff\\x05\\x00\\xff\\x00\\xff\\x00\\xff\\x00\\xff\\x00\\xff\\x00',
u'end': u'3\\x10\\x00\\xff\\x00\\xff\\x00\\xff\\x0aA\\x00\\xff\\x03C\\x00\\xff\\x05\\x00\\xff\\x00\\xff\\x00\\xff\\x00\\xff\\x00\\xff\\x00'},
u'index name': u'A_1_B_1_B_1',
u'type': u'index scan'},
u'type': u'projection'},
u'type': u'non-isolated'}}
Although this gives an overview of what's going on, it would be much more useful if the keys were user readable instead of raw FDB keys.
If you create a unique index and the fields of the index are missing from a document, the missing value is treated as a unique value for the index. This is not how MongoDB behaves.
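A minimal repro of the difference (names assumed): MongoDB indexes a missing field as null, so a non-sparse unique index rejects the second document, while the Document Layer currently accepts both:

from pymongo import MongoClient

db = MongoClient('localhost', 27018).test   # a local Document Layer
db.coll.create_index('email', unique=True)

db.coll.insert_one({'_id': 1})   # no 'email' field; MongoDB indexes it as null
db.coll.insert_one({'_id': 2})   # MongoDB: DuplicateKeyError; Document Layer: succeeds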
This is a follow-up feature request for #4.
Once we have fault-tolerant index builds, we should look into parallel builds. This can follow the task bucket pattern. Tasks should be small enough to fit in one transaction; if a task doesn't finish in a single transaction, it should be split. This makes each task atomic. Maintaining status could be as simple as tracking pending tasks. Reading shard keys between the index bounds would give us an estimate of the index size.
It would have been ideal to use the task buckets here, but they live in fdbclient and use ReadYourWritesTransaction; they don't go through fdb_flow. Also, index rebuild is a very specific case, so we are better off with custom code.
Build fails with "error: version 'fdb' requested but 'g++-fdb' not found and version '7.3.0' of default 'g++' does not match", and I don't know how to install g++-fdb. Here is the full log:
zhifan@ubuntu-zhifan:~/github$ curl -L -J -O https://dl.bintray.com/boostorg/release/1.67.0/source/boost_1_67_0.tar.gz && tar -xzf boost_1_67_0.tar.gz && cd boost_1_67_0 && ./bootstrap.sh --prefix=./ && echo "using gcc : fdb : ${CXX} ;" >> ./tools/build/src/user-config.jam && cat ./tools/build/src/user-config.jam && ./b2 toolset=gcc-fdb install --with-filesystem --with-system
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:09 --:--:-- 0
100 98.5M 100 98.5M 0 0 3102k 0 0:00:32 0:00:32 --:--:-- 5666k
curl: Saved to filename 'boost_1_67_0.tar.gz'
Building Boost.Build engine with toolset gcc... tools/build/src/engine/bin.linuxx86_64/b2
Detecting Python version... 2.7
Detecting Python root... /usr
Unicode/ICU support for Boost.Regex?... /usr
Generating Boost.Build configuration in project-config.jam...
Bootstrapping is done. To build, run:
./b2
To adjust configuration, edit 'project-config.jam'.
Further information:
- Command line help:
./b2 --help
- Getting started guide:
http://www.boost.org/more/getting_started/unix-variants.html
- Boost.Build documentation:
http://www.boost.org/build/doc/html/index.html
using gcc : fdb : ;
/home/zhifan/github/boost_1_67_0/tools/build/src/tools/gcc.jam:125: in gcc.init from module gcc
error: toolset gcc initialization:
error: version 'fdb' requested but 'g++-fdb' not found and version '7.3.0' of default 'g++' does not match
error: initialized from /home/zhifan/github/boost_1_67_0/tools/build/src/user-config.jam:1
/home/zhifan/github/boost_1_67_0/tools/build/src/build/toolset.jam:44: in toolset.using from module toolset
/home/zhifan/github/boost_1_67_0/tools/build/src/build/project.jam:1052: in using from module project-rules
/home/zhifan/github/boost_1_67_0/tools/build/src/user-config.jam:1: in modules.load from module user-config
/home/zhifan/github/boost_1_67_0/tools/build/src/build-system.jam:255: in load-config from module build-system
/home/zhifan/github/boost_1_67_0/tools/build/src/build-system.jam:453: in load-configuration-files from module build-system
/home/zhifan/github/boost_1_67_0/tools/build/src/build-system.jam:607: in load from module build-system
/home/zhifan/github/boost_1_67_0/tools/build/src/kernel/modules.jam:295: in import from module modules
/home/zhifan/github/boost_1_67_0/tools/build/src/kernel/bootstrap.jam:139: in boost-build from module
/home/zhifan/github/boost_1_67_0/boost-build.jam:17: in module scope from module
My env:
zhifan@ubuntu-zhifan:~/github/boost_1_67_0$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.3.0-27ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)
Currently, some constant values, like the system namespace or the wire protocol version, are defined as plain string literals at random places all over the code base. This will soon become tech debt and will come back to bite us. It's a good idea to consolidate these values into one or a few places (think of `error_definitions.h`, for example).
The `renameCollection` command is not supported:
> db.adminCommand( { renameCollection: "vishaltest.stores", to : "vishaltest.stores123"})
{
"errmsg" : "no such cmd: renamecollection",
"bad cmd" : "{ renameCollection: "vishaltest.stores", to: "vishaltest.stores123" }",
"ok" : 0
}
> use vishaltest
switched to db vishaltest
> db.stores.renameCollection("stores1234")
{
"errmsg" : "no such cmd: renamecollection",
"bad cmd" : "{ renameCollection: "vishaltest.stores", to: "vishaltest.stores1234", dropTarget: false }",
"ok" : 0
}
>
Right now, the Document Layer does not have any metric reporting plugin other than the default `ConsoleMetric` reporter, which simply logs the metrics to TraceFiles, an FDB-specific logging infrastructure. Thus we want a new metric plugin that is more familiar to the community. Prometheus has picked up a lot of momentum in recent years, so we would go with that.
Thanks to the design, providing a new metric reporting plugin is easy: when starting the `DocLayer` process, tell it to load the plugin library by passing the `--metric_plugin` and `--metric_plugin_config` arguments.
We think this will benefit the community in many ways.
Note: when using/creating the client code, keep in mind that DocLayer is written in Flow, whose model has a single process and a single thread running on a giant event loop. Thus the plugin code must NOT block.
The query planner uses indexes only to satisfy predicates; sorting is done in memory. For a query like `db.coll.find().sort('section').limit(5)`, without indexes it would have to bring all the documents into memory and sort them, whereas with an index on `section` it would just have to fetch 5 documents. The current behavior is bad both in the number of keys fetched from FDB and in memory utilization.
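A sketch of the scenario with pymongo (names assumed); today the query below still pulls the whole collection into memory even though a suitable index exists:

from pymongo import MongoClient

db = MongoClient('localhost', 27018).test   # a local Document Layer
db.coll.create_index('section')

# With an index-aware SortPlan this would touch only 5 index entries
# (plus the 5 documents); today it fetches and sorts everything.
top5 = list(db.coll.find().sort('section').limit(5))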
If the Document Layer instance bounces, ongoing index rebuild tasks stop and never restart. It is possible to find these stopped tasks by querying special keys; by maintaining index status we can restart these index rebuilds. To make sure two instances don't do the same work, we have to maintain some kind of locks based on FoundationDB keys.
We have a `.clang-format` style file committed to the repo, and all commits should follow that style. We should have a CMake target that fails if the formatting is off; that will make the style easy to enforce.
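A sketch of a checker (the script name and file globs are mine) that a `check-format` CMake target could run via `add_custom_target`; it exits non-zero whenever `clang-format` would change a file:

#!/usr/bin/env python
# check_format.py: fail the build if any source file is not clang-format clean.
import subprocess
import sys

files = subprocess.check_output(['git', 'ls-files', '*.cpp', '*.h']).split()
dirty = []
for f in files:
    formatted = subprocess.check_output(['clang-format', '-style=file', f])
    with open(f, 'rb') as fh:
        if fh.read() != formatted:
            dirty.append(f)
if dirty:
    print('clang-format check failed for:')
    for f in dirty:
        print('  ' + f)
    sys.exit(1)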
Even though the Document Layer implements the MongoDB API, it has completely different performance characteristics; the reasons are described here. We should do a standardized performance test and document the issues and procedure. We should also set up continuous performance test runs to identify regressions, but that is not in the scope of this issue.
The `collStats` command is used across many different tools and frameworks, like MongoExpress and Spark. Some Spark jobs won't even start without some stats in `collStats`. The following is the `collStats` response format:
{
"ns" : <string>,
"count" : <number>,
"size" : <number>,
"avgObjSize" : <number>,
"storageSize" : <number>,
"capped" : <boolean>,
"max" : <number>,
"maxSize" : <number>,
"wiredTiger" : {
},
"nindexes" : <number>, // number of indexes
"totalIndexSize" : <number>, // total index size in bytes
"indexSizes" : { // size of specific indexes in bytes
"_id_" : <number>,
"username" : <number>
},
// ...
"ok" : <number>
}
We don't have to implement all the fields as part of this issue.
`count` - This is important for Spark jobs to work reasonably well. We can use atomic operations to maintain the count. A trivial implementation would maintain a single counter, which would create a hot key. Considering write hot keys are not that bad, and atomic operations don't cause any conflict ranges, this could be a reasonable immediate solution.
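A minimal sketch of that counter with the FDB Python bindings (the key layout is made up; the real one would live under the collection's metadata): `add` is an atomic mutation, so concurrent writers neither read the key nor conflict on it:

import struct

import fdb
fdb.api_version(510)
db = fdb.open()

COUNT_KEY = 'test.coll/count'   # hypothetical metadata key

@fdb.transactional
def on_insert(tr, n=1):
    # AddValue mutation: no read and no conflict range on the hot key.
    tr.add(COUNT_KEY, struct.pack('<q', n))

@fdb.transactional
def get_count(tr):
    v = tr[COUNT_KEY]
    return struct.unpack('<q', str(v))[0] if v.present() else 0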
mongo-express is a popular web-based admin tool. This is an umbrella issue that covers all the tasks needed for compatibility with it.
Document Layer indexes are always ascending, irrespective of the direction given in the index specification. This is fine for simple indexes, as the index scan can call a reverse `getRange()` on FDB to get keys in descending order. But mixed directions in compound indexes can be very useful, especially for SortPlan.
According to the forum post here.
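For reference, this is the kind of index specification in question (pymongo, names assumed); today the `DESCENDING` part is effectively ignored at storage time:

import pymongo
from pymongo import MongoClient

db = MongoClient('localhost', 27018).test   # a local Document Layer

# Mixed-direction compound index, useful for queries like
# find({'section': ...}).sort('price', pymongo.DESCENDING)
db.coll.create_index([('section', pymongo.ASCENDING),
                      ('price', pymongo.DESCENDING)])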
$ python test/correctness/document-correctness.py --doclayer-host localhost --doclayer-port 27018 forever doclayer mm --seed 4994950075151235634 --num-doc 300 --num-iter 1 --no-update --no-sort --no-numeric-fieldnames
Instance: 621285605982
========================================================
ID : 44348 iteration : 1
========================================================
Key length exceeds limit
.......
python /Users/bmuppana/src/fdb-document-layer/test/correctness/document-correctness.py --mongo-host localhost --mongo-port 27018 --doclayer-host localhost --doclayer-port 27018 forever doclayer mm --seed 4994950075151235634 --num-doc 300 --num-iter 1 --no-update --no-sort --no-numeric-fieldnames
Stack trace from trace logs
<Event Severity="40" Time="1547842023.948421" Type="BD_doIndexUpdate" ID="0000000000000000" error="Key length exceeds limit" Backtrace="atos -o fdbdoc.debug -arch x86_64 -l 0x10e5c4000 0x10e968c91 0x10e968d6f 0x10e7eedb3 0x10e800482 0x10e7ee699 0x10e7f53ce 0x10e7f4de5 0x10e7f7437 0x10e7f3af7 0x10e5eed2e 0x10e7f4ee4 0x10e7f52a0 0x10e7dd43e 0x10e7dc125 0x10e7dc07d 0x10e7dd582 0x10e7dbc37 0x10e744cdc 0x10e77f1d8 0x10e7fabd3 0x10e7fb8d2 0x10e7f99d7 0x10e77ce6e 0x10e77cc95 0x10e77d157 0x10e77c947 0x10e7c3b2e 0x10e8d61b5 0x10e8d6a57 0x10e8d6177 0x10e77ce6e 0x10e8d28b5 0x10e8d3509 0x10e8d5672 0x10e8d1af7 0x10e7ca18c 0x10e7ca118 0x..." Machine="127.0.0.1:27018" LogGroup="default" />
Found this against d2840e9. Consistently reproducible with the above seed.
In [2]: db.coll.create_index('A', name='A_index')
Out[2]: 'A_index'
In [3]: db.coll.drop_index('A_index')
From server logs
S -> C: REPLY: documents=[ { ok: 0.0, err: "Range begin key larger than end key", code: 2005 } ], responseFlags=0, cursorID=0, startingFrom=0 (HEADER: messageLength=108, requestID=0, responseTo=114807987, opCode=1)
We had to disable this warning due to bad code in one place. It should be easy to reproduce by re-enabling this flag here.
Instance: 449970630051
Traceback (most recent call last):
File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/document-correctness.py", line 550, in <module>
okay = ns['func'](ns)
File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/document-correctness.py", line 438, in start_forever_test
return test_forever(ns)
File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/document-correctness.py", line 378, in test_forever
(client1, client2, collection1, collection2) = get_clients_and_collections(ns)
File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/document-correctness.py", line 43, in get_clients_and_collections
transactional_shim.remove(collection1)
File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/transactional_shim.py", line 8, in func_wrapper
ret = func(*args, **kwargs)
File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/transactional_shim.py", line 47, in func_wrapper
return func(*args, **kwargs)
File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/transactional_shim.py", line 56, in _gen_func
return getattr(collection, name)(*args, **kwargs)
File "/app/.python2/lib/python2.7/site-packages/pymongo/collection.py", line 2996, in remove
spec_or_id, multi, write_concern, collation=collation)
File "/app/.python2/lib/python2.7/site-packages/pymongo/collection.py", line 1123, in _delete_retryable
_delete, session)
File "/app/.python2/lib/python2.7/site-packages/pymongo/mongo_client.py", line 1102, in _retryable_write
return self._retry_with_session(retryable, func, s, None)
File "/app/.python2/lib/python2.7/site-packages/pymongo/mongo_client.py", line 1079, in _retry_with_session
return func(session, sock_info, retryable)
File "/app/.python2/lib/python2.7/site-packages/pymongo/collection.py", line 1119, in _delete
retryable_write=retryable_write)
File "/app/.python2/lib/python2.7/site-packages/pymongo/collection.py", line 1099, in _delete
_check_write_command_response(result)
File "/app/.python2/lib/python2.7/site-packages/pymongo/helpers.py", line 207, in _check_write_command_response
_raise_last_write_error(write_errors)
File "/app/.python2/lib/python2.7/site-packages/pymongo/helpers.py", line 189, in _raise_last_write_error
raise WriteError(error.get("errmsg"), error.get("code"), error)
pymongo.errors.WriteError: "Collection metadata changed during operation."
This turned out to be a race condition in new collection metadata creation.
On every request, collection metadata is fetched with the function `assembleCollectionContext()`. This function creates a new collection if the collection is not already present. The collection context is also cached for the sake of performance. When a new collection is created, it is not immediately inserted into the cache, as the transaction might still fail; the next request inserts it into the cache.
The race condition happens as follows: while the transaction that created the new collection is still in flight, another request calls `assembleCollectionContext()`, which creates a new context again. Obviously, that doesn't match the old context that was created; hence the issue.
Among the things we need to do for this issue: change `assembleCollectionContext()` to create a new collection only when explicitly asked to.

`NonIsolatedPlan` is used for requests that can't guarantee atomic operations. For example, updates or queries can have predicates that match too many documents to read within 5 seconds, and consequently within one transaction. `NonIsolatedPlan` splits such a task into multiple transactions. The task is checkpointed after each transaction is committed, so in case of failure the task is retried from the last checkpoint.
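A sketch of the checkpointing idea in the FDB Python bindings (function and parameter names are mine; error handling and retry loops are elided): each committed batch is a checkpoint, so a failure resumes from the last committed key instead of from the beginning:

import fdb
fdb.api_version(510)

def non_isolated_scan(db, begin, end, process, batch=1000):
    cursor = fdb.KeySelector.first_greater_or_equal(begin)
    while True:
        tr = db.create_transaction()
        kvs = list(tr.get_range(cursor, end, limit=batch))
        for kv in kvs:
            process(tr, kv)        # this batch's reads and writes
        tr.commit().wait()         # checkpoint: the batch is now durable
        if len(kvs) < batch:
            return                 # scanned the whole range
        cursor = fdb.KeySelector.first_greater_than(kvs[-1].key)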
`NonIsolatedPlan` maintains its transactions itself, except for the first transaction, which is created before the metadata is read. The first transaction is usually read-only until it is passed into `NonIsolatedPlan`, especially for an RW plan. The only exception is when a new collection is created implicitly. We should maintain the transaction in a single location (function) - including creation, retries, and commit. We can make this possible by having read-only transactions and splitting collection creation into its own transaction (when we are not in an explicit transaction).
This cleanup needs the following subtasks.
Occasional failures, not reproducible with any specific seed, but they show up in about 1 out of 4000 runs.
The mongo driver I'm using: https://github.com/mongodb/mongo-go-driver
The UpdateOne and UpdateMany functions work with MongoDB itself but not with fdb-doc.
The error I get is: command failure: {"errmsg": "command [update] failed with err: An unknown error occurred","bad cmd": "{ update: \"clickdata\", updates: [ { q: { auction_id_with_imp_index: \"hohololiiii\" }, u: { $set: {abc: \"aaa\" } }, multi: false } ] }","ok": {"$numberInt":"0"}}
The only update function I can make work is FindOneAndUpdate, but that function always returns something even if the update didn't work.
The issue is reported on forums. I also tried this on 16.04, same issue there as well.
We have slow query logging, which logs the query and plan if the plan contains a table scan. We already measure query time for metrics; we should also log slow queries that take longer than a certain threshold.
At the moment, distribution packages are built by the default build (with no make targets); the `packages` target builds only the `tar.gz`. Ideally, we should build all packages with the `packages` target. This is cleaner and makes the default build faster, which is what we usually care about in development.
So far, `MongoModel`, in the correctness code, silently ignores indexes, as indexes are not important for verifying the correctness of the Document Layer. But some indexes can have an impact on functionality; we are doing the relevant work for unique indexes in #67.
There are other cases like this. We are seeing quite a few correctness failures due to multikey index limitations, as we don't allow compound indexes on arrays that would create more than 1000 index keys for a document. We should generate errors from MongoModel for these kinds of failures, to be consistent.
Blocking #67
Linking bin/fdbdoc
ld: warning: text-based stub file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation.tbd and library file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//IOKit.framework/IOKit.tbd and library file /System/Library/Frameworks//IOKit.framework/IOKit are out of sync. Falling back to library file for linking.
ld: warning: direct access in function 'boost::system::error_category::std_category::equivalent(int, std::__1::error_condition const&) const' from file '.objs/./ConsoleMetric.actor.g.cpp.o' to global weak symbol 'typeinfo for boost::system::error_category::std_category' from file '/Users/bmuppana/src/boost_1_67_0/stage/lib/libboost_filesystem.a(codecvt_error_category.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
ld: warning: direct access in function 'boost::system::error_category::std_category::equivalent(std::__1::error_code const&, int) const' from file '.objs/./ConsoleMetric.actor.g.cpp.o' to global weak symbol 'typeinfo for boost::system::error_category::std_category' from file '/Users/bmuppana/src/boost_1_67_0/stage/lib/libboost_filesystem.a(codecvt_error_category.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
ld: warning: direct access in function 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >::rethrow() const' from file '.extdep/osx/flow-6.0.8-osx-x86_64/lib/libflow.a(Net2.actor.g.cpp.o)' to global weak symbol 'typeinfo for boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >' from file '.objs/./IMetric.cpp.o' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
The Document Layer doesn't implement any authentication mechanisms. As long as it is used as a sidecar this is not a problem, as it can be configured to accept only local connections. But if we want to offer the Document Layer as a service, we have to depend on other security mechanisms. Mutual TLS is the strongest of the authentication schemes supported by MongoDB drivers; this issue should address it.
FoundationDB does support mutual TLS. As the Document Layer is also written in Flow, we should be able to reuse that code.
To work seamlessly with drivers, the Document Layer's TLS implementation should be compatible with the MongoDB drivers.
`findAndModify` and `collStats` have been broken since #26, when we changed the command name check to all lower case. We should instead look at the first field in the command query BSON.
The downloads page has macOS and Ubuntu packages, but not CentOS. Test and publish CentOS packages.
The Document Layer has a correctness framework that depends on a deterministic comparison of query results between an in-memory simulation of MongoDB and our implementation. We should have some documentation on it, at the very least on how to run it.
At the moment, we only have unit tests for unique indexes. We should update correctness to test unique indexes too. MongoModel doesn't implement indexes, as they are not necessary to test the correctness of indexes in the Doc Layer. But unique indexes change behavior when duplicates are possible; we should update MongoModel to perform this check, to be consistent with the Doc Layer.
As reported here, the mongo shell shows version 2.4.0. We are compatible with 3.0.0, so it should return that instead.
The issue is that the mongo shell depends on the `buildInfo` command response to find out the version. We updated `isMaster` but not `buildInfo`; we should change it there too.
Note: not directly related to this, but with `buildInfo` we don't return anything about the git hash or branch, as we are not really MongoDB. So far that didn't seem to cause any issues, or we haven't noticed yet. We do have a custom command `getdoclayerversion` that gives more information on Document Layer versions.
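This is easy to check from a driver; pymongo's `server_info()` issues the `buildInfo` command under the hood (endpoint assumed):

from pymongo import MongoClient

client = MongoClient('localhost', 27018)   # a local Document Layer
print(client.server_info()['version'])     # shows 2.4.0 today; should be 3.0.0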
Wire protocol 4 adds a message that `mongos` doesn't implement either. The client never sends this message, so we don't have to implement it.
We have functional tests for deletes, and some basic deletes are done as part of deterministic correctness. We should add delete tests with randomized queries there.
The Document Layer stores a field under a single key in FDB, so fields are limited by the FDB value size limit (100KB). By splitting a field across multiple FDB keys we can support bigger fields.
`DataValue` defines the type of a value. Besides all the normal types, we also have arrays and objects. We could add another type, e.g. SPLIT_VALUE, and piggyback on the array code. We would have to change the code in inserts and updates, which assumes values are always stored under a single key. Path expansion should not treat the split-key component as part of the key, so the array expansion rules have to be carefully adjusted to work with this.
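A rough sketch of a split layout with the FDB Python bindings (the key encoding and chunk size here are assumptions, not the Document Layer's actual DataValue encoding); the chunk ordinal becomes one more tuple element under the field's key, much like an array element index:

import fdb
fdb.api_version(510)
import fdb.tuple

CHUNK = 90000   # stay safely below FDB's 100KB value limit

@fdb.transactional
def write_split_value(tr, field_key, value):
    # Drop any previous chunks, then write the value as numbered chunks.
    tr.clear_range_startswith(field_key)
    for i in range(0, len(value), CHUNK):
        tr[field_key + fdb.tuple.pack((i // CHUNK,))] = value[i:i + CHUNK]

@fdb.transactional
def read_split_value(tr, field_key):
    return ''.join(kv.value for kv in tr.get_range_startswith(field_key))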
Correctness implements unit tests with a custom framework. There would be a lot of benefits to using a standard testing framework instead - for example, for managing the `fdbserver` and `fdbdoc` processes.