
specifications's Introduction

MongoDB Specifications


This repository holds in-progress and completed specifications for features of MongoDB, drivers, and associated products. It also contains a rudimentary system for producing these documents.

Driver Mantras

See Documentation.

Writing Documents

Write documents using GitHub Flavored Markdown, following the MongoDB Documentation Style Guidelines.

Store all source documents in the source/ directory.

Linting

This repo uses pre-commit for managing linting. pre-commit performs various checks on the files and uses tools that help follow a consistent style within the repo.

To set up pre-commit locally, run:

brew install pre-commit
pre-commit install

To run pre-commit manually, run pre-commit run --all-files.

To run a manual-stage hook such as shellcheck, run:

pre-commit run --all-files --hook-stage manual shellcheck

Prose test numbering

When numbering prose tests, always use relative numbered bullets (1.). New tests must be appended at the end of the test list, since drivers may refer to existing tests by number.

Outdated tests must not be removed completely, but may be marked as such (e.g., by striking through the test or by replacing the entire test with a note such as "Removed").

Building Documents

We use mkdocs to render the documentation. To see a live view of the documentation, run:

pip install mkdocs
mkdocs serve

Converting to JSON

There are many YAML-to-JSON converters; there are even several converters called yaml2json on npm. However, we no longer use yaml2json; we use the js-yaml package instead. Use only that converter, so that JSON is formatted consistently.

Run npm install -g js-yaml, then run make in the source directory at the top level of this repository to convert all YAML test files to JSON.

Licensing

All the specs in this repository are available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License.

specifications's People

Contributors

ajdavis, alcaeus, benjirewis, bjori, blink1073, christkv, craiggwilson, daprahamian, derickr, durran, eramongodb, estolfo, jameskovacs, jmikola, jyemin, kevinalbs, kmahar, mbroadst, nbbeeken, p, p-mongo, patrickfreed, prashantmital, rathisekaran, rozza, rstam, saghm, shaneharvey, vincentkam, xdg

specifications's Issues

Retryable writes spec does not mention the time limits within which driver should retry a write

As per the driver specification for retryable writes (retryable-writes.rst), there is no mention of a time limit within which the driver should retry a write (if applicable). The spec only limits the number of times a write can be retried (once).

Does this mean that it is legal for a driver to retry a write command after waiting for a random amount of time?
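To make the question concrete, here is a minimal sketch of the one behavior the issue says the spec does fix: a single retry. `RetryableWriteError`, `select_server`, and `run_write` are hypothetical stand-ins, not any driver's API. Nothing in the sketch bounds how long the retry may take, and that is exactly the gap the question points at.

```python
class RetryableWriteError(Exception):
    """Hypothetical stand-in for an error a driver classifies as retryable."""


def execute_with_one_retry(select_server, run_write):
    """Run a write and retry it at most once on a retryable error.

    select_server and run_write are hypothetical callables supplied by the caller.
    """
    server = select_server()
    try:
        return run_write(server)
    except RetryableWriteError:
        # Select a server again (it may differ from the first) and retry once.
        # Simplification: any error from the retry propagates to the caller.
        server = select_server()
        return run_write(server)
```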

ObjectId plans beyond 2106

Hi, we will probably be dead by then, but I wonder: what is the plan beyond 2106?

As you may know, ObjectId will overflow on February 7, 2106; there are a couple of ways forward then.

  • extend ObjectId by 1 byte
  • do nothing, using the current specification, and let it overflow into 1970.

These impact how we design the schema now, as we can either:

  • simply store ObjectId if the plan is to extend the timestamp portion of ObjectId
  • store ObjectId and time as separate fields, and never use ObjectId as time

What do you say?

P.S. The specification is currently incorrect: ObjectId will overflow in February, not January.
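The overflow date follows from the ObjectId layout: the first 4 bytes are a big-endian count of seconds since the Unix epoch. The sketch below decodes that field and computes the last representable instant. The helper names are illustrative only. `wrapped_seconds` models the "let it overflow into 1970" option by simple truncation, which is an assumption about what an implementation would do.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value the 4-byte unsigned timestamp field can hold.
MAX_SECONDS = 2**32 - 1
LAST_REPRESENTABLE = EPOCH + timedelta(seconds=MAX_SECONDS)


def objectid_timestamp(oid_hex: str) -> datetime:
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    seconds = int.from_bytes(bytes.fromhex(oid_hex[:8]), "big")
    return EPOCH + timedelta(seconds=seconds)


def wrapped_seconds(unix_seconds: int) -> int:
    """Value a 32-bit field would hold if a later time were simply truncated."""
    return unix_seconds % 2**32
```

Printing `LAST_REPRESENTABLE` gives `2106-02-07 06:28:15+00:00`, which is consistent with the February date above.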

Licensing

What is the GridFS specification licensed under?

upsertedId & upsertedCount in UpdateResult could be confusing

I've always read "upsert" to mean "update or insert" (which could be wrong of course) however UpdateResult implies it means "insert" given that UpdateResult.upsertedCount and UpdateResult.upsertedId are only set if an insert occurred.

I wonder if it'd be clearer as UpdateResult.insertedCount and UpdateResult.insertedId, which would align more with $setOnInsert (i.e., it's not $setOnUpsert)?

Perhaps a minor grievance but I always find myself having to double check + test how upsertedCount behaves.

https://github.com/mongodb/specifications/blob/021cbc80e1e444023fd05d8092df4546e639db40/source/crud/crud.rst

  /**
   * The number of documents that were upserted.
   *
   * NOT REQUIRED: Drivers may choose to not provide this property so long as
   * it is always possible to infer whether an upsert has taken place. Since
   * the "_id" of an upserted document could be null, a null "upsertedId" may
   * be ambiguous in some drivers. If so, this field can be used to indicate
   * whether an upsert has taken place.
   */
  upsertedCount: Int64;

  /**
   * The identifier of the inserted document if an upsert took place.
   */
  upsertedId: any;
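The naming question stems from where these values come from. Here is a rough sketch, assuming the server's update command reply carries `n`, `nModified`, and an `upserted` array of `{index, _id}` entries that is present only when an insert happened. The dataclass and function names are hypothetical, and the sketch covers a single-statement update only.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class UpdateResult:
    matched_count: int
    modified_count: int
    upserted_count: int
    upserted_id: Optional[Any]


def update_result_from_reply(reply: dict) -> UpdateResult:
    """Build a result from a single-statement update reply (simplified)."""
    # "upserted" only appears when the update inserted a new document.
    upserted = reply.get("upserted", [])
    upserted_count = len(upserted)
    return UpdateResult(
        # n counts matched documents plus inserted ones, so subtract inserts.
        matched_count=reply["n"] - upserted_count,
        modified_count=reply["nModified"],
        upserted_count=upserted_count,
        # None is ambiguous if the inserted _id was itself null, which is why
        # the spec text above allows upsertedCount as a disambiguator.
        upserted_id=upserted[0]["_id"] if upserted else None,
    )
```

In this model, "upserted" only ever describes the insert branch, which is the behavior the issue finds surprising.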

Run Travis to check rst syntax

We can add a Travis/CI job to check rst syntax like this:

pip install docutils
rst2html.py transactions/transactions.rst > output.html
transactions/transactions.rst:991: (WARNING/2) Title underline too short.

Drivers add the "TransientTransactionError" label to network errors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
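Until a real CI job runs docutils, the specific warning above can be approximated without dependencies. The following is a hypothetical, stdlib-only heuristic, not a replacement for rst2html.py. It flags a line made of one repeated punctuation character when that line sits directly under a shorter-than-itself title. It counts characters rather than display width and skips very short adornment lines, which are ambiguous.

```python
import string


def _is_adornment(line: str) -> bool:
    """True if the line is a run of one repeated punctuation character."""
    stripped = line.rstrip()
    return (
        bool(stripped)
        and stripped[0] in string.punctuation
        and set(stripped) == {stripped[0]}
    )


def short_underlines(text: str, min_length: int = 4) -> list[int]:
    """Return 1-based line numbers of title underlines shorter than their title."""
    lines = text.splitlines()
    flagged = []
    for i in range(1, len(lines)):
        title, underline = lines[i - 1].rstrip(), lines[i].rstrip()
        # Skip blank or indented "titles" and lines that are not adornment.
        if not title or title[0].isspace() or _is_adornment(title):
            continue
        if not _is_adornment(underline):
            continue
        # Character count stands in for display width in this sketch.
        if min_length <= len(underline) < len(title):
            flagged.append(i + 1)
    return flagged
```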

It would also be great to check that the JSON files are up to date by running make and checking whether any JSON files have changed.
CC: @prashantmital
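A CI step for the JSON check could look roughly like the following. The sketch assumes it runs from the repository root, that `make` in `source/` regenerates the JSON (as the README describes), and that `git` is on the PATH. The function names are hypothetical. Paths that git quotes because of special characters are not handled.

```python
import subprocess


def changed_json_files(porcelain: str) -> list[str]:
    """Extract .json paths from `git status --porcelain` (v1) output."""
    paths = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        # Each line is a two-character status, a space, then the path.
        path = line[3:]
        # Renames are reported as "old -> new"; keep the new path.
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        if path.endswith(".json"):
            paths.append(path)
    return paths


def check_generated_json(source_dir: str = "source") -> int:
    """Regenerate JSON with make, then return 1 if any JSON file changed."""
    subprocess.run(["make"], cwd=source_dir, check=True)
    status = subprocess.run(
        ["git", "status", "--porcelain", "--", "*.json"],
        capture_output=True, text=True, check=True,
    ).stdout
    stale = changed_json_files(status)
    for path in stale:
        print(f"out of date: {path}")
    return 1 if stale else 0
```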

Why does maxIdleTimeMS default to zero?

While investigating some connection-management issues in one of my application's services, I stumbled upon the default value of maxIdleTimeMS, which is 0. That means there is no idle limit, so a connection may remain unused but still open.

I was wondering: why is that?

Reading this FAQ (specific to the Node.js driver), I see there are concerns about making the application responsible for tuning the pool configuration, so I guess the problem is finding an "ideal" value that works for most scenarios?
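To illustrate what the default means in practice, here is a hypothetical idle check a pool might run before reusing a connection. It assumes timestamps come from `time.monotonic()` in seconds and treats 0 as "no limit", as described above. With the default, nothing is ever closed for being idle.

```python
import time
from typing import Optional


def is_idle_expired(last_used: float, max_idle_time_ms: int,
                    now: Optional[float] = None) -> bool:
    """Return True if a pooled connection has been idle longer than allowed.

    A max_idle_time_ms of 0 means "no limit", so nothing ever expires.
    """
    if max_idle_time_ms == 0:
        return False
    if now is None:
        now = time.monotonic()
    # Convert the idle duration from seconds to milliseconds before comparing.
    return (now - last_used) * 1000 > max_idle_time_ms
```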

"fast fail" with serverSelectionTimeoutMS

Hi,

As written, the spec seems to ignore a use case that is quite important: the ability to get feedback immediately if a server or cluster is down at client creation. A concrete example of where this might be problematic is a status probe. You can imagine writing a small script to run as part of a monitoring tool like Munin that reports the number of current connections and looks something like:

import os
import sys

from pymongo.mongo_client import MongoClient


def main():
    if len(sys.argv) < 2:
        # Munin "fetch" mode: report the current connection count.
        try:
            client = MongoClient(os.environ.get('MUNIN_MONGODB_CUR_CONN_CNT_URI',
                                                'mongodb://127.0.0.1:27017'),
                                 connect=True)
            status = client.get_database('admin').command({'serverStatus': 1})
            print('status.value {}'.format(status['connections']['current']))
        except Exception:
            # Any failure, including a server selection timeout, reports 0.
            print('status.value 0')
    elif sys.argv[1] == 'config':
        # Munin "config" mode: describe the graph.
        print("""\
graph_title Current MongoDB Connections
graph_vlabel Connections
status.label Connections
""")
    return 0


if __name__ == '__main__':
    sys.exit(main())

Using pymongo==2.8, I get immediate feedback if the server is down in the form of a ConnectionFailure (e.g., pymongo.errors.ConnectionFailure: [Errno 61] Connection refused). Using pymongo==3.2, I am forced to wait for the entire duration of serverSelectionTimeoutMS (defaulting to 30s) which is incredibly inconvenient. Of course, you can set serverSelectionTimeoutMS to be something small like 1ms, which will give you immediate feedback, however you are now racing against the background monitoring threads making the actual connections and, unless you are connecting to a local instance, you will likely get a ServerSelectionTimeoutError even if the server is up (this is the “fail fast” method mentioned in the spec). Regardless, in this context, it doesn’t make much sense to set serverSelectionTimeoutMS to be less than connectTimeoutMS, since, by definition, you are indicating that you are willing to wait up to connectTimeoutMS to allow the underlying socket connection to be established with a server.

Given the situation described above, I would suggest updating the spec to add a section describing how to “fail fast” in this context. It might be that another option needs to be passed to MongoClient to “fail fast” on initial connection. In that case, the Topology class might raise a ConnectionFailure or ServerSelectionError in Topology.select_servers if a socket error raised for that server indicates that nothing is listening on the foreign port during a period of connectTimeoutMS after Topology.open has been called (thereby starting the monitoring threads).

I apologize if this is too pymongo specific, but this is the manifestation of the spec that I am most familiar with.

Cheers,
Greg
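As a stopgap for probes like the Munin script, one option is a plain TCP pre-check before creating the client. The following is a hypothetical helper and is not part of PyMongo or the spec. A refused connection fails immediately, while an unreachable host fails after `timeout_s`. It only shows that something is listening, not that the listener is a healthy mongod.

```python
import socket


def tcp_probe(host: str, port: int, timeout_s: float) -> bool:
    """Return True if something accepts a TCP connection at host:port."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError or a
        # timeout) if no connection can be established.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

In the script above, calling `tcp_probe("127.0.0.1", 27017, 1.0)` first and printing `status.value 0` on `False` would skip the 30 s server selection wait when nothing is listening.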

Regenerate templated spec test files in Travis

Various spec tests are generated via scripts. We should integrate these scripts with Travis to validate that the template files have been updated correctly.

Examples:

Seedlist discovery database name in connection string

The mongodb+srv protocol for MongoDB Atlas doesn't support specifying the database name in the connection string.
My suggestion for the new specification would be:
mongodb+srv://{hostname}.{domainname}/{database}?{options}

This way, the mongo client could automatically connect to a specific database, which makes executing commands/scripts a lot easier. It also fixes this mongoose issue.

I have provided a pull request for the implementation in the node-mongodb-native repository.
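For illustration, here is how the proposed `mongodb+srv://{hostname}.{domainname}/{database}?{options}` shape splits apart with the standard library. `parse_srv_uri` is a hypothetical helper, not the driver's parser. The `_mongodb._tcp.` prefix is the SRV record name drivers query under the DNS seedlist discovery spec.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_srv_uri(uri: str):
    """Split a mongodb+srv URI into host, database, options, and SRV name."""
    parts = urlsplit(uri)
    if parts.scheme != "mongodb+srv":
        raise ValueError("expected a mongodb+srv:// URI")
    host = parts.hostname
    # An empty path (or just "/") means no default database was given.
    database = parts.path.lstrip("/") or None
    options = dict(parse_qsl(parts.query))
    srv_name = f"_mongodb._tcp.{host}"
    return host, database, options, srv_name
```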

Two tests with the same name in the same suite.

These two tests in array.json in the bson corpus have the same name. When autogenerating tests, some test frameworks will fail because two tests in the same suite cannot have the same name. The names should be changed to reflect more precisely what is meant by "set incorrectly."

"description": "Single Element Array with index set incorrectly",

"description": "Single Element Array with index set incorrectly",
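Collisions like this could be caught before test generators hit them. The following hypothetical checker assumes the corpus layout of top-level `valid`, `decodeErrors`, and `parseErrors` arrays whose entries each carry a `description`. It reports any description used more than once in a file.

```python
import json
from collections import Counter


def load_corpus(path: str) -> dict:
    """Read one BSON corpus JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def duplicate_descriptions(corpus: dict) -> list[str]:
    """Return descriptions that appear more than once across test sections."""
    descriptions = [
        case["description"]
        for section in ("valid", "decodeErrors", "parseErrors")
        for case in corpus.get(section, [])
    ]
    return sorted(d for d, n in Counter(descriptions).items() if n > 1)
```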

What is the actual purpose of giving a name for a replica set?

I have read the server discovery spec, specifically the TopologyDescription part. To me (knowing nothing about the servers' internal implementation), the servers field of this data structure seems to uniquely identify the replica set.

So I still don't understand why a replica set must have a name, or why a client is required to pass this name when connecting to it. The only explanation I found was a comment here:

Drivers use setname to ensure the list of hosts you specified actually matches that setname for all hosts.

But why? By providing a seed list in the connection string, I'm connecting to a replica set whose mongod instances are running at the specified (host, port) pairs. Isn't this enough to identify the replica set?

Is there a case where a server can be part of more than one replica set? This answer says no. If so, why must a client care about the setName?

To be more precise, what would be wrong with the following connection string for connecting to a replica set?
mongodb://user:pass@localhost:27017/?replicaSet=true
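The quoted comment hints at the practical role of setName. It lets a client reject a seed host that answers as a member of some other replica set instead of silently mixing the two. Below is a heavily simplified, hypothetical model of that check. Real SDAM logic has many more cases, and here a reply without a setName is simply treated as a non-member.

```python
from typing import Optional


def accept_member(topology_set_name: Optional[str],
                  reply_set_name: Optional[str]) -> tuple[bool, Optional[str]]:
    """Decide whether a server's reply belongs in the replica-set topology.

    Returns (keep_server, topology_set_name_after_this_reply).
    """
    if reply_set_name is None:
        # Simplification: no setName means it is not a replica-set member.
        return False, topology_set_name
    if topology_set_name is None:
        # No expected name yet (e.g. no replicaSet option): adopt the first seen.
        return True, reply_set_name
    # A member of a different set is dropped rather than mixed in.
    return reply_set_name == topology_set_name, topology_set_name
```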

Is it actually correct that PyMongo requires the setName to connect to a Replica Set?

Is the statement in this section, that PyMongo requires the replicaSet parameter to know it is connecting to a replica set, actually valid? That spec says:

PyMongo requires a non-null setName in order to begin replica-set monitoring, regardless of the number of seeds.

Using pymongo==3.10.1 on Python 3.7.5, this is what I tried to see what is the real behavior:
[screenshot not available]

Does this mean that, even without setName, the driver detects that it is connecting to a replica set and therefore sets the primary attribute?

Connection string compatibility issues

I have been asked a number of times about compatibility between the general-purpose connection-string format and MongoDB's implementation, so I compiled the answers into a small page, Compatibility, which highlights two known issues.

I believe those two issues were an oversight when the original spec was written. I don't know whether they will ever be addressed, but there is no harm in pointing them out in case another major update to the connection string spec is due.

DRIVERS-1859 BSON corpus code and symbol tests are partially testing the wrong types

In https://github.com/mongodb/specifications/blob/master/source/bson-corpus/tests/code.json starting with the fourth test (line 21 and following), the tests switch from correctly testing BSON type 0x0D (JavaScript code) to incorrectly testing BSON type 0x02 (regular strings).

Unfortunately, the comparison canonical_extjson seems to have been generated from the BSON, meaning that the tests succeed for the wrong reason -- they are regular string tests, not code tests.

Likewise, all of the decodeErrors tests are for regular strings, not code, despite the descriptions.
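Mistyped cases like these can be detected mechanically. In a BSON document, bytes 0 to 3 are the little-endian length and byte 4 is the type of the first element. The hypothetical checker below compares that byte against the file's top-level `bson_type` (e.g. `"0x0D"`) for `valid` (`canonical_bson`) and `decodeErrors` (`bson`) entries.

```python
def mistyped_cases(corpus: dict) -> list[str]:
    """Return descriptions whose first element type differs from bson_type."""
    expected = int(corpus["bson_type"], 16)
    flagged = []
    for section, key in (("valid", "canonical_bson"), ("decodeErrors", "bson")):
        for case in corpus.get(section, []):
            raw = bytes.fromhex(case[key])
            # Byte 4 is the first element's type; malformed inputs may be shorter.
            if len(raw) > 4 and raw[4] != expected:
                flagged.append(case["description"])
    return flagged
```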

Insertion of hashMap<String,Object> greater than 16 MB

I'm facing an issue with a hashMap<String,Object> greater than 16 MB. I looked at GridFS, and it doesn't have any support for DBObject. Please let me know the solution.

The hashMap holds unitName and data longer than 16 MB, and it is not being persisted in the MongoDB database.

Regards,
Sudha
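A single BSON document is capped at 16 MiB, which is why GridFS stores large payloads as bytes split across chunk documents rather than as one object. The map would first have to be serialized to bytes by the application. This Python sketch only mimics the chunking step with the 255 KiB default chunk size. `split_into_chunks` and `chunk_count` are hypothetical helpers, not a driver API.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size in bytes


def chunk_count(length: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> int:
    """Number of chunks needed for a payload of the given length."""
    return (length + chunk_size - 1) // chunk_size


def split_into_chunks(data: bytes, files_id, chunk_size: int = DEFAULT_CHUNK_SIZE) -> list[dict]:
    """Split serialized bytes into GridFS-style chunk documents, numbered from 0."""
    return [
        {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
```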

Self-reference link returns 404

The link in the following section:

Monitoring SDAM events

The required driver specification for providing lifecycle hooks into server discovery and monitoring for applications to consume can be found in the [SDAM Monitoring Specification].

The link, shown in square brackets, refers back to the spec itself and returns a 404.

Unable to connect to MongoDB Atlas from an EC2 instance using IAM role authentication

mongodb+srv://development.YYYYY.mongodb.net/testdb?authSource=%24external&authMechanism=MONGODB-AWS

We have a cluster setup on Mongo Atlas and above is the connection string to a DB in that cluster.

We have an EC2 instance in our AWS account with an IAM role attached to it. A Node application runs on that instance and uses the mongoose package to establish the DB connection.

We have also set up an IAM role type user in the MongoDB cluster. So, in theory, my Node app running on the EC2 instance should be able to connect to the Atlas DB using the above connection string, without my having to pass any keys or secrets. But this isn't working, and I get an error like:
MongoNetworkTimeoutError: Network request to http://169.254.169.254/latest/api/token timed out after undefined ms

This means the application isn't able to retrieve the session token from the instance metadata.

However, I am able to connect to the DB from the same EC2 instance using mongosh, which means that IAM role-based authentication itself is working fine.

Am I missing something?
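One way to narrow this down is to request the same token the error mentions, from the same environment the Node app runs in (for example, inside its container). The request is an IMDSv2 `PUT` to the URL in the error message with the `X-aws-ec2-metadata-token-ttl-seconds` header. The sketch below builds that request with the standard library. The function names and the 2 s timeout are illustrative choices.

```python
import urllib.request

IMDS_TOKEN_URL = "http://169.254.169.254/latest/api/token"


def build_imds_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Build the IMDSv2 session-token request (PUT with a TTL header)."""
    return urllib.request.Request(
        IMDS_TOKEN_URL,
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def fetch_imds_token(timeout_s: float = 2.0) -> str:
    """Send the request; raises URLError or a timeout if IMDS is unreachable."""
    with urllib.request.urlopen(build_imds_token_request(), timeout=timeout_s) as resp:
        return resp.read().decode()
```

If `fetch_imds_token()` times out where mongosh succeeds, the difference lies in how each process reaches the metadata endpoint rather than in the Atlas user setup.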

Broken link mongoc-handshake.c

There is a broken link in the handshake specification.

I have attached patch.diff as a .txt file because issues do not accept .diff uploads, and I don't have permission to fork this repo to open a pull request.

patch.txt

Best regards,

Manuel.

DriverBench full_bson.json uses legacy binary format

full_bson.json includes the following segment:

"BOQAeydE": {
  "$binary": "RWNVVkhUUmVsWEhzcnhxV25WdnVQTWFERUJMRFRrYmlOVUFGYmh3QWZzRGtTRkVHT1lrTWFKR2twUUFIZVVGWkJ1Y1RlYlpNTGF2VG51Vk8=",
  "$type": "00"
},

This is the legacy binary format, and should look something like:

"Binary": {
  "$binary": {
    "base64": "RWNVVkhUUmVsWEhzcnhxV25WdnVQTWFERUJMRFRrYmlOVUFGYmh3QWZzRGtTRkVHT1lrTWFKR2twUUFIZVVGWkJ1Y1RlYlpNTGF2VG51Vk8=",
    "subType": "03"
  }
},
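Rewriting the file can be automated. The following hypothetical helper walks parsed Extended JSON and turns each legacy `{"$binary": <base64>, "$type": <hex>}` value into the canonical nested form. It carries the subtype over unchanged and leaves values already in canonical form untouched.

```python
def upgrade_legacy_binary(value):
    """Recursively rewrite legacy $binary/$type values to the canonical form."""
    if isinstance(value, dict):
        # Legacy form: exactly "$binary" (a base64 string) and "$type".
        if set(value) == {"$binary", "$type"} and isinstance(value["$binary"], str):
            return {"$binary": {"base64": value["$binary"], "subType": value["$type"]}}
        return {k: upgrade_legacy_binary(v) for k, v in value.items()}
    if isinstance(value, list):
        return [upgrade_legacy_binary(v) for v in value]
    return value
```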

no build/ directory generated after make

I ran make in the source/ directory; it ran and appeared to modify a ton of JSON files, but I do not see that a build/ directory was generated. Can you help point out what I'm doing wrong? Or even better, is there a website that has all the PDF specifications already generated for download?
