
azure-cosmos-python's Introduction

🚨This SDK is now maintained at https://github.com/Azure/azure-sdk-for-python 🚨

More information: #197

Azure Cosmos DB SQL API client library for Python

Azure Cosmos DB is a globally distributed, multi-model database service that supports document, key-value, wide-column, and graph databases.

Use the Azure Cosmos DB SQL API SDK for Python to manage databases and the JSON documents they contain in this NoSQL database service.

  • Create Cosmos DB databases and modify their settings
  • Create and modify containers to store collections of JSON documents
  • Create, read, update, and delete the items (JSON documents) in your containers
  • Query the documents in your database using SQL-like syntax

Looking for source code or API reference?

Please see the latest version of the Azure Cosmos DB Python SDK for SQL API

Getting started

If you need a Cosmos DB SQL API account, you can create one with this Azure CLI command:

az cosmosdb create --resource-group <resource-group-name> --name <cosmos-account-name>

Installation

pip install azure-cosmos

Configure a virtual environment (optional)

Although not required, a virtual environment keeps your base system and Azure SDK environments isolated from one another. Execute the following commands to configure and then enter a virtual environment with venv:

python3 -m venv azure-cosmosdb-sdk-environment
source azure-cosmosdb-sdk-environment/bin/activate

Key concepts

Interaction with Cosmos DB starts with an instance of the CosmosClient class. You need an account, its URI, and one of its account keys to instantiate the client object.

Get credentials

Use the Azure CLI snippet below to populate two environment variables with the database account URI and its primary master key (you can also find these values in the Azure portal). The snippet is formatted for the Bash shell.

RES_GROUP=<resource-group-name>
ACCT_NAME=<cosmos-db-account-name>

export ACCOUNT_URI=$(az cosmosdb show --resource-group $RES_GROUP --name $ACCT_NAME --query documentEndpoint --output tsv)
export ACCOUNT_KEY=$(az cosmosdb keys list --resource-group $RES_GROUP --name $ACCT_NAME --query primaryMasterKey --output tsv)

Create client

Once you've populated the ACCOUNT_URI and ACCOUNT_KEY environment variables, you can create the CosmosClient.

import azure.cosmos.cosmos_client as cosmos_client
import azure.cosmos.errors as errors
import azure.cosmos.http_constants as http_constants

import os
url = os.environ['ACCOUNT_URI']
key = os.environ['ACCOUNT_KEY']
client = cosmos_client.CosmosClient(url, {'masterKey': key})

Usage

When you create a Cosmos DB Database Account, you specify the API you'd like to use when interacting with its documents: SQL, MongoDB, Gremlin, Cassandra, or Azure Table.

This SDK is used to interact with an SQL API database account.

Once you've initialized a CosmosClient, you can interact with the primary resource types in Cosmos DB:

  • Database: A Cosmos DB account can contain multiple databases. A database may contain a number of containers.

  • Container: A container is a collection of JSON documents. You create (insert), read, update, and delete items in a container.

  • Item: An Item is the dictionary-like representation of a JSON document stored in a container. Each Item you add to a container must include an id key with a value that uniquely identifies the item within the container.

For more information about these resources, see Working with Azure Cosmos databases, containers and items.
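The id requirement described above can be sketched with a plain dictionary; every field other than id below is illustrative:

```python
# A minimal item for a container partitioned on /productName.
# Only 'id' is required; the other field names are made up.
item = {
    'id': 'item1',            # must be unique within the container
    'productName': 'Widget',  # example partition key value
    'productModel': 'Model 1',
}

# Items are plain dictionaries, so the usual dict operations apply:
print('id' in item)   # True
```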

Examples

The following sections provide several code snippets covering some of the most common Cosmos DB tasks.

Create a database

After authenticating your CosmosClient, you can work with any resource in the account. You can use CosmosClient.CreateDatabase to create a database.

database_name = 'testDatabase'
try:
    database = client.CreateDatabase({'id': database_name})
except errors.HTTPFailure as e:
    if e.status_code == http_constants.StatusCodes.CONFLICT:
        # A database with this ID already exists; read it instead
        database = client.ReadDatabase("dbs/" + database_name)
    else:
        raise

Create a container

This example creates a container with 400 RU/s as the throughput, using CosmosClient.CreateContainer. If a container with the same name already exists in the database (generating a 409 Conflict error), the existing container is obtained instead.

import azure.cosmos.documents as documents
container_definition = {
    'id': 'products',
    'partitionKey': {
        'paths': ['/productName'],
        'kind': documents.PartitionKind.Hash
    }
}
try:
    container = client.CreateContainer(
        "dbs/" + database['id'], container_definition, {'offerThroughput': 400})
except errors.HTTPFailure as e:
    if e.status_code == http_constants.StatusCodes.CONFLICT:
        container = client.ReadContainer(
            "dbs/" + database['id'] + "/colls/" + container_definition['id'])
    else:
        raise e

Replace throughput for a container

A single offer object exists per container and holds the container's throughput information. This example retrieves the offer object with CosmosClient.QueryOffers, modifies it, and replaces the container's throughput with CosmosClient.ReplaceOffer.

# Get the offer for the container
offers = list(client.QueryOffers(
    "Select * from root r where r.offerResourceId='" + container['_rid'] + "'"))
offer = offers[0]
print("current throughput for " + container['id'] + ": " +
      str(offer['content']['offerThroughput']))

# Replace the offer with a new throughput
offer['content']['offerThroughput'] = 1000
client.ReplaceOffer(offer['_self'], offer)
print("new throughput for " + container['id'] + ": " +
      str(offer['content']['offerThroughput']))

Get an existing container

Retrieve an existing container from the database using CosmosClient.ReadContainer:

database_id = 'testDatabase'
container_id = 'products'
container = client.ReadContainer("dbs/" + database_id + "/colls/" + container_id)

Insert data

To insert items into a container, pass a dictionary containing your data to CosmosClient.UpsertItem. Each item you add to a container must include an id key with a value that uniquely identifies the item within the container.

This example inserts several items into the container, each with a unique id:

for i in range(1, 10):
    client.UpsertItem(
        "dbs/" + database_id + "/colls/" + container_id,
        {
             'id': 'item{0}'.format(i),
             'productName': 'Widget',
             'productModel': 'Model {0}'.format(i)
        }
    )

Delete data

To delete items from a container, use CosmosClient.DeleteItem. The SQL API in Cosmos DB does not support the SQL DELETE statement.

for item in client.QueryItems(
    "dbs/" + database_id + "/colls/" + container_id,
    'SELECT * FROM products p WHERE p.productModel = "DISCONTINUED"',
    {'enableCrossPartitionQuery': True}):
    
    client.DeleteItem(
        "dbs/" + database_id + "/colls/" + container_id + "/docs/" + item['id'],
        {'partitionKey': 'Pager'})  # must match the item's partition key value

Query the database

A Cosmos DB SQL API database supports querying the items in a container with CosmosClient.QueryItems using SQL-like syntax.

This example queries a container for items with a specific id:

# Enumerate the returned items
import json
for item in client.QueryItems(
    "dbs/" + database_id + "/colls/" + container_id,
    'SELECT * FROM ' + container_id + ' r WHERE r.id="item3"',
    {'enableCrossPartitionQuery': True}):
    
    print(json.dumps(item, indent=True))

NOTE: Although you can specify any value for the container name in the FROM clause, we recommend you use the container name for consistency.

Perform parameterized queries by passing a dictionary containing the parameters and their values to CosmosClient.QueryItems:

discontinued_items = client.QueryItems(
    "dbs/" + database_id + "/colls/" + container_id,
    {
        'query': 'SELECT * FROM root r WHERE r.id=@id',
        'parameters': [
            {'name': '@id', 'value': 'item3'}
        ]
    },
    {'enableCrossPartitionQuery': True})
for item in discontinued_items:
    print(json.dumps(item, indent=True))

For more information on querying Cosmos DB databases using the SQL API, see Query Azure Cosmos DB data with SQL queries.

Modify container properties

Certain properties of an existing container can be modified. This example sets the default time to live (TTL) for items in the container to 10 seconds:

container = client.ReadContainer("dbs/" + database_id + "/colls/" + container_id)
container['defaultTtl'] = 10
modified_container = client.ReplaceContainer(
    "dbs/" + database_id + "/colls/" + container_id, container)
# Display the new TTL setting for the container
print(json.dumps(modified_container['defaultTtl']))

For more information on TTL, see Time to Live for Azure Cosmos DB data.

Troubleshooting

General

When you interact with Cosmos DB using the Python SDK, errors returned by the service correspond to the same HTTP status codes returned for REST API requests:

HTTP Status Codes for Azure Cosmos DB

For example, if you try to create a container using an ID (name) that's already in use in your Cosmos DB database, a 409 error is returned, indicating the conflict. In the following snippet, the error is handled gracefully by catching the exception and displaying additional information about the error.

try:
    container = client.CreateContainer("dbs/" + database['id'], container_definition)
except errors.HTTPFailure as e:
    if e.status_code == http_constants.StatusCodes.CONFLICT:
        print("""Error creating container
HTTP status code 409: The ID (name) provided for the container is already in use.
The container name must be unique within the database.""")
    else:
        raise e

Next steps

For more extensive documentation on the Cosmos DB service, see the Azure Cosmos DB documentation on docs.microsoft.com.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

azure-cosmos-python's People

Contributors

annatisch, arrownj, austindonnelly, berndverst, brettcannon, c-w, haokanga, hussainmir, khdang, kirankumarkolli, l-leniac-l, moderakh, nomiero, petsuter, redcraig, rnagpal, ryancrawcour, sagarsharmas, shipunyc, southpolesteve, swails, xou


azure-cosmos-python's Issues

How to serialize a dict type object when creating a document using CreateDocument ?

This is my first time submitting an issue.

I am new to Cosmos DB and I have a question about creating an embedded (or nested) document.
I am trying to POST and create a document represented as below:

{
  "id":  "123",
  "title": "aaa",
  "desc": "bbb",
  "steps": [
    { "foo": "1", "bar": "xxx" },
    { "foo": "2", "bar": "yyy" },
  ]
}

However, an error occurs like below...

{"code":"BadRequest","message":"Message: {\"Errors\":[\"The request payload is invalid. Ensure to provide a valid request payload.\"]} ... }

I read this article and noticed that the request body is a JSON and it's invalid.

So, I tried to serialize the entire document as a string; however, another error occurs:

File "/path/to/pydocumentdb/document_client.py", line 2760, in __ValidateResource
  id = resource.get('id')
AttributeError: 'str' object has no attribute 'get'

Finally, I tried to serialize only the steps field, so that the document itself is still represented as a dict, like below:

doc = {
  "id":  "123",
  "title": "aaa",
  "desc": "bbb",
  "steps": [
    { "foo": "1", "bar": "xxx" },
    { "foo": "2", "bar": "yyy" },
  ]
}
doc["steps"] = json.dumps(doc["steps"])

This approach is not good because the steps field in the document inserted into DocumentDB is then still a string.
I would like the steps field to remain a list of dicts...

I am at a loss what to do next.

Is there any solution to this problem?
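As a local sketch of the distinction at issue (no Cosmos account needed; the values are those from the question): a nested list of dicts is already valid JSON and stays structured through serialization, while pre-applying json.dumps to the sub-field is exactly what turns it into a string.

```python
import json

# The document can be passed as a plain Python dict; nested lists of
# dicts survive serialization as structured data.
doc = {
    "id": "123",
    "title": "aaa",
    "desc": "bbb",
    "steps": [
        {"foo": "1", "bar": "xxx"},
        {"foo": "2", "bar": "yyy"},
    ],
}

# Round-tripping through JSON preserves the nested structure:
restored = json.loads(json.dumps(doc))
print(type(restored["steps"][0]).__name__)   # dict

# Pre-serializing the sub-field stores it as a string instead:
doc["steps"] = json.dumps(doc["steps"])
print(type(json.loads(json.dumps(doc))["steps"]).__name__)  # str
```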

Intermixing Mongo API and DocumentDB API breaks document structure

Working with Cosmos I initially loaded documents through the Mongo API. When shifting over to using pydocumentdb I've run into formatting issues with the generated results ($t and $v key-values are inserted throughout the document).

An example of this problem can be found on stackoverflow, but no solution is provided for switching the API used to access the data.

Expected Format:
as72v

Actual Format:
8b9pl

Many `crud_tests.py` tests fail due to code 429 - Request rate is large

Hey there,

I have:

  • a fresh checkout of azure-documentdb-python
  • provisioned a brand new docdb instance (and haven't changed any settings)
  • entered my credentials in tests/crud_tests.py
  • added azure-document-db to the path, so tests can import and use it

Running:

python tests/crud_tests.py

gives:

Ran 72 tests in 412.725s

FAILED (errors=40)

One of the errors is:

{"code":"Unauthorized","message":"The input authorization token can't serve the request. 
Please check that the expected payload is built as per the protocol, 
and check the key being used. 
Server used the following payload to sign: 
'get\nmedia\nfrunapr8tqabaaaaaaaaahm8gry=\ntue, 07 jun 2016 14:37:35 gmt\n\n'
\r\nActivityId: 24c813cf-cb23-49db-8015-4c5001c1eb8e"}

The other 39 errors are all code:429, "Request rate is large".

A second test run has 34 of 72 fail.
This time there's one:
"code":"BadRequest","message":"Cross partition query is required but disabled. Please set x-ms-documentdb-query-enablecrosspartition to true"
and the rest are: code:429, "Request rate is large".

Is this expected?
Do I have to change any of my docdb settings to account for a higher request rate?

Thanks,
Craig

Few questions (docs are hard to understand)

Hi, I'm using QueryDocuments and the docs for pydocumentdb are very hard to understand, so I will ask a few questions here:

  1. Where can I find the structure for a query dict (query (dict) or (str), query)?
  2. Same for options (dict), the request options for the request.
  3. How can I use next() on the results? Say I want to fetch 10 results and later another 10; should I use TOP 10 in the query, or is there some kind of option I'm missing?

Thanks

Multi-threaded DocumentClient

Hi, sorry if this isn't the best place but I have a question regarding pydocumentdb.document_client.DocumentClient.

I'm building an API based on flask, and I'd like to know what is the best approach:

  • Should I keep a single instance of DocumentClient OR
  • Should I instantiate it several times, once per request?

In other words, is DocumentClient thread-safe?

Thanks in advance!

Missing support for direct connection mode (TCP and HTTPS)

Currently, pydocumentdb version 2.3.1 SDK does not support the direct connection mode for HTTPS nor TCP. Please add support.

[Metadata]
Metadata-Version: 2.0
Name: pydocumentdb
Version: 2.3.1
Summary: Azure DocumentDB Python SDK
Home-page: https://github.com/Azure/azure-documentdb-python
Author: Microsoft
Author-email: [email protected]
License: MIT

[Impact]
Performance
SDK consistency across multiple platforms

[Repro Steps]
File: https://github.com/Azure/azure-documentdb-python/blob/d0929b4fdbf66780004c0744be74beeb7c90fa27/pydocumentdb/documents.py

class ConnectionMode(object):
    """Represents the connection mode to be used by the client.

    :ivar int Gateway:
        Use the Azure Cosmos DB gateway to route all requests. The
        gateway proxies requests to the right data partition.
    """
    Gateway = 0

Samples need to be improved

This is not an issue; however, I strongly suggest the maintainers provide more samples, which would help Python developers better understand how to use this SDK. E.g., in the samples folder, a documents management sample is missing. As stated in #87 the documentation can hardly be understood since it is not complete at all, thus we really need good samples to walk us through these APIs. I know different SDKs have their own development plans and milestones, but compared with JavaScript and C#, this Python SDK really needs to be improved.

Running unit tests with bad master key raises unfriendly exception

I ran the unit tests with an intentionally invalid master key. The tests fail with:

Error
Traceback (most recent call last):
  File "C:\Users\clam\Documents\azure-documentdb-python\test\crud_tests.py", line 40, in setUp
    databases = list(client.ReadDatabases())
  File "C:\Users\clam\Documents\azure-documentdb-python\pydocumentdb\query_iterable.py", line 68, in next
    callback, self._resource_throttle_retry_policy)
  File "C:\Users\clam\Documents\azure-documentdb-python\pydocumentdb\backoff_retry_utility.py", line 27, in Execute
    raise e
Error: Incorrect padding

Some tests also log WARNING:root:Operation will NOT be retried. Exception: Incorrect padding.

Wondering if the Get Uri helper methods can be moved to pydocumentdb from crud_tests.py

Hi,

I see some very good helper methods in crud_tests.py to get the Uris out of the names of the things I provide. e.g.
def GetDocumentLink(self, database, document_collection, document,
                    is_name_based=True):
    if is_name_based:
        return self.GetDocumentCollectionLink(database, document_collection) + '/docs/' + document['id']
    else:
        return document['_self']

Something like we have in .NET SDK. Are we planning for this?

Collection link as a combination of _self and name doesn't work

Hi,

I have a collection link as follows: dbs/someDB==/colls/myCollection
When I try to delete the collection with the above link, I get a BadRequest 400 exception. But when I give the path using the database name (id), it works. So I'm assuming that sending both _selfs or both names works, but not any other combination.
I think it would be nice if links could work with any combination, since DocumentDB gives us the flexibility of querying either id-based or _self-based.

Thanks.

Tests are failing due to connectivity to test server

The following instructions are not clear to run the tests:


2) To run tests:

    $ python test/crud_tests.py

    If you use Microsoft Visual Studio, open the project file python.pyproj,
    and press F5.

I cannot connect to the test server (localhost:443, I believe) and get the following error in all test methods. Is there a way to start the test server with the command above?

Traceback (most recent call last):
  File "test/crud_tests.py", line 40, in setUp
    databases = client.ReadDatabases().ToArray()
  File "/Users/alp/.virtualenvs/docdb/lib/python2.7/site-packages/pydocumentdb/query_iterator.py", line 72, in ToArray
    for element in self:
  File "/Users/alp/.virtualenvs/docdb/lib/python2.7/site-packages/pydocumentdb/query_iterator.py", line 48, in next
    item = self.__FetchOneItem()
  File "/Users/alp/.virtualenvs/docdb/lib/python2.7/site-packages/pydocumentdb/query_iterator.py", line 105, in __FetchOneItem
    if not self.__FetchMore():
  File "/Users/alp/.virtualenvs/docdb/lib/python2.7/site-packages/pydocumentdb/query_iterator.py", line 133, in __FetchMore
    self.__options)
  File "/Users/alp/.virtualenvs/docdb/lib/python2.7/site-packages/pydocumentdb/document_client.py", line 141, in fetch_fn
    options), self.last_response_headers
  File "/Users/alp/.virtualenvs/docdb/lib/python2.7/site-packages/pydocumentdb/document_client.py", line 1680, in __QueryFeed
    headers)
  File "/Users/alp/.virtualenvs/docdb/lib/python2.7/site-packages/pydocumentdb/document_client.py", line 1568, in __Get
    headers)
  File "/Users/alp/.virtualenvs/docdb/lib/python2.7/site-packages/pydocumentdb/synchronized_request.py", line 163, in SynchronizedRequest
    return __InternalRequest(connection_policy, request_options, request_body)
  File "/Users/alp/.virtualenvs/docdb/lib/python2.7/site-packages/pydocumentdb/synchronized_request.py", line 88, in __InternalRequest
    request_options['headers'])
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 973, in request
    self._send_request(method, url, body, headers)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1007, in _send_request
    self.endheaders(body)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 969, in endheaders
    self._send_output(message_body)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 829, in _send_output
    self.send(msg)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 791, in send
    self.connect()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1172, in connect
    self.timeout, self.source_address)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 61] Connection refused

Note: running on OS X.

Possible to change collection ttl after creation?

My application relies heavily on pydocumentdb to interact with Cosmos. We're looking for the ability to change the TTL of a collection after its creation. We already know how to query the collection offer and change throughput by submitting a replacement offer, but I don't see a field for TTL there.

Is this supported? If not, what's a reasonable workaround?

Thank you!

No retries on non-HTTP network issues

When an HTTP error occurs, e.g. 429, then everything is handled as expected, triggering retries etc. and using the retry options of the connection policy of pydocumentdb.

However, if there is a network error such that a server is not reachable, then this results in an immediate exception without retries. This is because of two things:

  1. pydocumentdb's retry_utility code only handles errors.HTTPFailure errors, which are HTTP errors corresponding to certain HTTP status codes, e.g. 429: https://github.com/Azure/azure-documentdb-python/blob/07e2f3f93ad5abeb114c2d2f83577c25d18f0bb4/pydocumentdb/retry_utility.py#L66

  2. pydocumentdb uses requests to do the actual network requests, however it sets up the requests session with the defaults only which doesn't enable retrying: https://github.com/Azure/azure-documentdb-python/blob/07e2f3f93ad5abeb114c2d2f83577c25d18f0bb4/pydocumentdb/document_client.py#L134 This then in turn leads to the underlying urllib3 not to retry such requests: https://github.com/urllib3/urllib3/blob/1.19.1/urllib3/util/retry.py#L331-L336 (read would be false with default options).

An approach as described in https://www.peterbe.com/plog/best-practice-with-retries-with-requests is typically used to enable retries with requests:

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry


def requests_retry_session(
    retries=3,
    backoff_factor=0.3,
    status_forcelist=(500, 502, 504),
    session=None,
):
    session = session or requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session

# Usage example...

response = requests_retry_session().get('https://www.peterbe.com/')
print(response.status_code)

s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})

response = requests_retry_session(session=s).get(
    'https://www.peterbe.com'
)

The following is an exception trace resulting from trying to create a document in CosmosDB when the server is unreachable:

Traceback (most recent call last):
  File "C:\Python36\lib\site-packages\urllib3\connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "C:\Python36\lib\site-packages\urllib3\connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "C:\Python36\lib\http\client.py", line 1331, in getresponse
    response.begin()
  File "C:\Python36\lib\http\client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "C:\Python36\lib\http\client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "C:\Python36\lib\socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "C:\Python36\lib\ssl.py", line 1009, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Python36\lib\ssl.py", line 871, in read
    return self._sslobj.read(len, buffer)
  File "C:\Python36\lib\ssl.py", line 631, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python36\lib\site-packages\requests\adapters.py", line 445, in send
    timeout=timeout
  File "C:\Python36\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Python36\lib\site-packages\urllib3\util\retry.py", line 367, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "C:\Python36\lib\site-packages\urllib3\packages\six.py", line 686, in reraise
    raise value
  File "C:\Python36\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "C:\Python36\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "C:\Python36\lib\site-packages\urllib3\connectionpool.py", line 306, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='....documents.azure.com', port=443): Read timed out. (read timeout=60.0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
...
  File "C:\Python36\lib\site-packages\pydocumentdb\document_client.py", line 947, in CreateDocument
    options)
  File "C:\Python36\lib\site-packages\pydocumentdb\document_client.py", line 2365, in Create
    headers)
  File "C:\Python36\lib\site-packages\pydocumentdb\document_client.py", line 2571, in __Post
    headers=headers)
  File "C:\Python36\lib\site-packages\pydocumentdb\synchronized_request.py", line 212, in SynchronizedRequest
    return retry_utility._Execute(client, global_endpoint_manager, _Request, connection_policy, requests_session, resource_url, request_options, request_body)
  File "C:\Python36\lib\site-packages\pydocumentdb\retry_utility.py", line 56, in _Execute
    result = _ExecuteFunction(function, *args, **kwargs)
  File "C:\Python36\lib\site-packages\pydocumentdb\retry_utility.py", line 92, in _ExecuteFunction
    return function(*args, **kwargs)
  File "C:\Python36\lib\site-packages\pydocumentdb\synchronized_request.py", line 127, in _Request
    verify = is_ssl_enabled)
  File "C:\Python36\lib\site-packages\requests\sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python36\lib\site-packages\requests\sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "C:\Python36\lib\site-packages\requests\adapters.py", line 526, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='....documents.azure.com', port=443): Read timed out. (read timeout=60.0)

TypeError: 'dict_keys' object does not support indexing

Hi

I'm trying to do a query on documentdb like this (tested with python 3.4 and 3.6):

query = {'query': 'SELECT VALUE MAX(c.counter) FROM c'}

options = {}
options['enableCrossPartitionQuery'] = True
# options['maxItemCount'] = 2

result_iterable = client.QueryDocuments(myCollection, query, options)

results = list(result_iterable)
print(results)

And get:

Traceback (most recent call last):
  File "documentdb.py", line 98, in <module>
    for i in result_iterable:
  File "/usr/local/lib/python3.4/dist-packages/pydocumentdb/query_iterable.py", line 107, in __next__
    return next(self._ex_context)
  File "/usr/local/lib/python3.4/dist-packages/pydocumentdb/execution_context/base_execution_context.py", line 103, in __next__
    return self.next()
  File "/usr/local/lib/python3.4/dist-packages/pydocumentdb/execution_context/execution_dispatcher.py", line 70, in next
    return next(self._execution_context)
  File "/usr/local/lib/python3.4/dist-packages/pydocumentdb/execution_context/base_execution_context.py", line 103, in __next__
    return self.next()
  File "/usr/local/lib/python3.4/dist-packages/pydocumentdb/execution_context/execution_dispatcher.py", line 146, in next
    return next(self._endpoint)
  File "/usr/local/lib/python3.4/dist-packages/pydocumentdb/execution_context/endpoint_component.py", line 42, in __next__
    return self.next()
  File "/usr/local/lib/python3.4/dist-packages/pydocumentdb/execution_context/endpoint_component.py", line 98, in next
    operator.aggregate(item[item.keys()[0]])
TypeError: 'dict_keys' object does not support indexing

See this link: https://stackoverflow.com/questions/18552001/accessing-dict-keys-element-by-index-in-python3

Could it be that line 98 in endpoint_component.py should have used next(iter(item)) instead of item.keys()[0] ?

Br. Rune
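The suggested fix can be checked locally: in Python 3, dict.keys() returns a view that does not support indexing, so next(iter(item)) (or list(item.keys())[0]) is the safe spelling.

```python
# Python 3: dict.keys() returns a view object, which cannot be indexed.
item = {'$1': 42}

try:
    item.keys()[0]             # works in Python 2, raises TypeError in Python 3
    first = None
except TypeError:
    first = next(iter(item))   # the suggested fix; list(item.keys())[0] also works

print(first)   # $1
```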

import config?

Hello,
What is the config module? Where can I download it? Can't I get the host, key, and ID directly from the Azure website?

Newer versions of requests library cause HTTP header issue

  • CurrentMediaStorageUsageInMB: int, current attachment content
    (media) usage in MBs (Retrieved from gateway ).

HTTP headers in the latest versions of the request library cause the following error:
requests.exceptions.InvalidHeader: Header value 0 must be of type str or bytes, not <class 'int'>

Judging from a quick lookup in the code, the 'CurrentMediaStorageUsageInMB' value seems to be the culprit.
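Not the SDK's actual fix, just the shape of the workaround: newer requests releases validate header values and reject non-strings, so integer-valued headers need to be coerced before the request goes out. The header name below is illustrative.

```python
# requests now rejects non-string header values with InvalidHeader,
# so coerce every value to str before handing the dict to requests.
headers = {'x-ms-media-storage-usage-mb': 0}   # int value trips validation
safe_headers = {k: str(v) for k, v in headers.items()}
print(safe_headers['x-ms-media-storage-usage-mb'])
```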

LoopExit: This operation will block forever error

I am seeing this error on Windows.

File "", line 57, in get_document
documents = list(query_res)
File "build\bdist.win32\egg\pydocumentdb\query_iterable.py", line 111, in next
return self.next()
File "build\bdist.win32\egg\pydocumentdb\query_iterable.py", line 107, in next
return next(self._ex_context)
File "build\bdist.win32\egg\pydocumentdb\execution_context\execution_dispatcher.py", line 62, in next
return next(self._execution_context)
File "build\bdist.win32\egg\pydocumentdb\execution_context\base_execution_context.py", line 93, in next
results = self.fetch_next_block()
File "build\bdist.win32\egg\pydocumentdb\execution_context\base_execution_context.py", line 71, in fetch_next_block
return self._fetch_next_block()
File "build\bdist.win32\egg\pydocumentdb\execution_context\base_execution_context.py", line 156, in _fetch_next_block
return self._fetch_items_helper_with_retries(self._fetch_function)
File "build\bdist.win32\egg\pydocumentdb\execution_context\base_execution_context.py", line 130, in _fetch_items_helper_with_retries
return retry_utility._Execute(self._client, self._client._global_endpoint_manager, callback)
File "build\bdist.win32\egg\pydocumentdb\retry_utility.py", line 51, in _Execute
result = _ExecuteFunction(function, *args, **kwargs)
File "build\bdist.win32\egg\pydocumentdb\retry_utility.py", line 85, in _ExecuteFunction
return function(*args, **kwargs)
File "build\bdist.win32\egg\pydocumentdb\execution_context\base_execution_context.py", line 128, in callback
return self._fetch_items_helper_no_retries(fetch_function)
File "build\bdist.win32\egg\pydocumentdb\execution_context\base_execution_context.py", line 118, in _fetch_items_helper_no_retries
(fetched_items, response_headers) = fetch_function(self._options)
File "build\bdist.win32\egg\pydocumentdb\document_client.py", line 779, in fetch_fn
options), self.last_response_headers
File "build\bdist.win32\egg\pydocumentdb\document_client.py", line 2484, in __QueryFeed
headers)
File "build\bdist.win32\egg\pydocumentdb\document_client.py", line 2334, in __Post
headers=headers)
File "build\bdist.win32\egg\pydocumentdb\synchronized_request.py", line 205, in SynchronizedRequest
return retry_utility._Execute(client, global_endpoint_manager, _Request, connection_policy, requests_session, resource_url, request_options, request_body)
File "build\bdist.win32\egg\pydocumentdb\retry_utility.py", line 51, in _Execute
result = _ExecuteFunction(function, *args, **kwargs)
File "build\bdist.win32\egg\pydocumentdb\retry_utility.py", line 85, in _ExecuteFunction
return function(*args, **kwargs)
File "build\bdist.win32\egg\pydocumentdb\synchronized_request.py", line 121, in _Request
verify = is_ssl_enabled)
File "H:\src\envs\pyh\lib\site-packages\requests\sessions.py", line 475, in request
resp = self.send(prep, **send_kwargs)
File "H:\src\envs\pyh\lib\site-packages\requests\sessions.py", line 585, in send
r = adapter.send(request, **kwargs)
File "H:\src\envs\pyh\lib\site-packages\requests\adapters.py", line 403, in send
timeout=timeout
File "H:\src\envs\pyh\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 578, in urlopen
chunked=chunked)
File "H:\src\envs\pyh\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 385, in _make_request
httplib_response = conn.getresponse(buffering=True)
File "c:\python27\Lib\httplib.py", line 1136, in getresponse
response.begin()
File "c:\python27\Lib\httplib.py", line 453, in begin
version, status, reason = self._read_status()
File "c:\python27\Lib\httplib.py", line 409, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "c:\python27\Lib\socket.py", line 480, in readline
data = self._sock.recv(self._rbufsize)
File "H:\src\envs\pyh\lib\site-packages\gevent-1.2.1-py2.7-win32.egg\gevent_sslgte279.py", line 464, in recv
return self.read(buflen)
File "H:\src\envs\pyh\lib\site-packages\gevent-1.2.1-py2.7-win32.egg\gevent_sslgte279.py", line 319, in read
self._wait(self._read_event, timeout_exc=_SSLErrorReadTimeout)
File "H:\src\envs\pyh\lib\site-packages\gevent-1.2.1-py2.7-win32.egg\gevent_socket2.py", line 182, in _wait
self.hub.wait(watcher)
File "H:\src\envs\pyh\lib\site-packages\gevent-1.2.1-py2.7-win32.egg\gevent\hub.py", line 651, in wait
result = waiter.get()
File "H:\src\envs\pyh\lib\site-packages\gevent-1.2.1-py2.7-win32.egg\gevent\hub.py", line 899, in get
return self.hub.switch()
File "H:\src\envs\pyh\lib\site-packages\gevent-1.2.1-py2.7-win32.egg\gevent\hub.py", line 630, in switch
return RawGreenlet.switch(self)
LoopExit: ('This operation would block forever', <Hub at 0x6970f80 select pending=0 ref=0 resolver=<gevent.resolver_thread.Resolver at 0x6730070 pool=<ThreadPool at 0x67306f0 0/1/10>> threadpool=<ThreadPool at 0x67306f0 0/1/10>>)

Header value types in Python3.5 fail requests validation

We've recently migrated to Python 3.5, and are observing the following stack trace when making docdb requests:

...
  File "/Users/brmatt/Development/infra/acmcloud/accloud/utils/superpod_config_db.py", line 41, in status_doc
    self._status_doc = self.docdb_client.ReadDocument(self.superpod_status_doc)
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/pydocumentdb/document_client.py", line 944, in ReadDocument
    options)
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/pydocumentdb/document_client.py", line 2244, in Read
    url_connection = self._global_endpoint_manager.ReadEndpoint
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/pydocumentdb/global_endpoint_manager.py", line 48, in ReadEndpoint
    self.RefreshEndpointList()
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/pydocumentdb/global_endpoint_manager.py", line 67, in RefreshEndpointList
    database_account = self._GetDatabaseAccount()
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/pydocumentdb/global_endpoint_manager.py", line 85, in _GetDatabaseAccount
    database_account = self._GetDatabaseAccountStub(self.DefaultEndpoint)
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/pydocumentdb/global_endpoint_manager.py", line 105, in _GetDatabaseAccountStub
    return self.Client.GetDatabaseAccount(endpoint)
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/pydocumentdb/document_client.py", line 2082, in GetDatabaseAccount
    headers)
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/pydocumentdb/document_client.py", line 2309, in __Get
    headers)
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/pydocumentdb/synchronized_request.py", line 206, in SynchronizedRequest
    return retry_utility._Execute(client, global_endpoint_manager, _Request, connection_policy, requests_session, resource_url, request_options, request_body)
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/pydocumentdb/retry_utility.py", line 51, in _Execute
    result = _ExecuteFunction(function, *args, **kwargs)
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/pydocumentdb/retry_utility.py", line 85, in _ExecuteFunction
    return function(*args, **kwargs)
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/pydocumentdb/synchronized_request.py", line 122, in _Request
    verify = is_ssl_enabled)
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/requests/sessions.py", line 474, in request
    prep = self.prepare_request(req)
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/requests/sessions.py", line 407, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/requests/models.py", line 303, in prepare
    self.prepare_headers(headers)
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/requests/models.py", line 427, in prepare_headers
    check_header_validity(header)
  File "/Users/brmatt/Development/infra/acmcloud/miniconda_macos/lib/python3.5/site-packages/requests/utils.py", line 796, in check_header_validity
    "not %s" % (value, type(value)))
requests.exceptions.InvalidHeader: Header value 0 must be of type str or bytes, not <class 'int'>

(the header field causing the error in this particular case is ContentLength, set around line 198 of synchronized_request.py; forcibly setting the type there exposes similar type problems in other header values)

This is observed using:

  • python 3.5.2
  • requests 2.12.4

Query request retry policy may not be working

It looks like ResourceThrottleRetryPolicy in backoff_retry_utility.py is used in QueryIterable to provide automatic retry for the various querying methods in DocumentClient (but nowhere else yet). I suspect that it doesn't work though: backoff_retry_utility.py#L69 tries to access member retry_after_in_milliseconds on the caught exception, but no such member exists on any subclass of DocumentDBError.

Proxy settings not taken into account ?

Reading the documentation, it should be possible to specify a proxy to use when connecting to an Azure Cosmos DB account:

http://azure.github.io/azure-documentdb-python/api/pydocumentdb.documents.html#pydocumentdb.documents.ProxyConfiguration

I tried to test it with a test similar to this one: https://github.com/Azure/azure-documentdb-python/blob/07caab604adc7c8e3cadab918e8c599ab45f3b4b/test/crud_tests.py#L3734-L3743

But the proxy settings seem to be ignored entirely.
Checking the azure-documentdb-python code, the proxy settings do not appear to be used anywhere.

Did I miss something?

Add functionality to change throughput of collection after creation

My application heavily relies on pydocumentdb to interact with Cosmos. We're looking for the ability to change the throughput of a collection after its creation, similar to https://docs.microsoft.com/en-us/azure/cosmos-db/set-throughput#set-throughput-sdk.

I can't seem to find it. Am I missing something? If not, can you add this important functionality? Thank you.
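For reference, later pydocumentdb releases expose throughput through offer resources (QueryOffers / ReplaceOffer). A hedged sketch of the flow, with the service calls left commented out and a hypothetical offer shape:

```python
# Hypothetical shape of the offer document associated with a collection;
# in real code it would come from client.QueryOffers(...). Changing the
# throughput means rewriting offerThroughput and replacing the offer.
offer = {'_self': 'offers/hypothetical/', 'content': {'offerThroughput': 400}}
offer['content']['offerThroughput'] = 2000
# client.ReplaceOffer(offer['_self'], offer)  # requires a live DocumentClient
print(offer['content']['offerThroughput'])    # 2000
```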

intermittent auth heisenbug: HTTPFailure: 401 - "The input authorization token can't serve the request"

I'm intermittently getting 401's when running tests:

$ python test/crud_tests.py CRUDTests.test_attachment_crud_name_based
E
======================================================================
ERROR: test_attachment_crud_name_based (__main__.CRUDTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/crud_tests.py", line 1727, in test_attachment_crud_name_based
    self._test_attachment_crud(True);
  File "test/crud_tests.py", line 1843, in _test_attachment_crud
    media_response = client.ReadMedia(valid_attachment['media'])
  File "~/Projects/azure-documentdb-python/pydocumentdb/document_client.py", line 1566, in ReadMedia
    headers)
  File "~/Projects/azure-documentdb-python/pydocumentdb/document_client.py", line 2165, in __Get
    headers)
  File "~/Projects/azure-documentdb-python/pydocumentdb/synchronized_request.py", line 162, in SynchronizedRequest
    return _InternalRequest(connection_policy, request_options, request_body)
  File "~/Projects/azure-documentdb-python/pydocumentdb/synchronized_request.py", line 99, in _InternalRequest
    raise errors.HTTPFailure(response.status, data, headers)
HTTPFailure: Status code: 401
{"code":"Unauthorized","message":"The input authorization token can't serve the request. Please check
 that the expected payload is built as per the protocol, and check the key being used. Server used the 
following payload to sign: 'get\nmedia\nuo0hamswdgabaaaaaaaaaj3qfom=\nthu, 09 jun 2016 20:34:49 
gmt\n\n'\r\nActivityId: c0d2a470-5e3e-45f5-a4a1-0c2f05316540"}

Repeating the same test method a few times usually makes the above error occur.

Is there a way to get server logs for my DocumentDB account that will show the ActivityId c0d2a470-5e3e-45f5-a4a1-0c2f05316540?

I'm running latest master (this commit).

Are collections created with "unlimited" storage capacity?

The Azure Portal lets you create collections either with a fixed size of 10 GB and no partitions, or with unlimited, partitioned storage.

I could not figure out how to influence this when creating a collection with pydocumentdb. There also doesn't seem to be a way to specify the partition key path. I find this all quite confusing and would be happy if it were clarified in the docs and/or the library were changed to support everything the Azure Portal supports.
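A sketch only, under the assumption that the portal's "unlimited" option corresponds to creating the collection with a partition key definition and a throughput above the fixed-size tier. `client` and `database_link` are assumed to exist, and the service call is commented out.

```python
# Collection definition with a partition key; /deviceId is an example path.
collection_definition = {
    'id': 'partitioned-coll',
    'partitionKey': {'paths': ['/deviceId'], 'kind': 'Hash'},
}
options = {'offerThroughput': 10100}  # above the fixed-size (10 GB) tier
# client.CreateCollection(database_link, collection_definition, options)
print(collection_definition['partitionKey']['paths'])
```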

Samples Not Working - Python 2.7.11 Windows With pydocumentdb V2.0

Actually not sure if I've ever got the samples working.

In addition to that I have a project that was working with pydocumentdb V1.9 but it no longer works with V2.0.

It seems like many methods no longer return what they used to (e.g. client.QueryDatabases()).

Error messages all point to an issue with iterators. Not sure how to progress from here.
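A guess at what changed: in pydocumentdb 2.x, query methods return a lazy QueryIterable instead of a list, so 1.x-era code that indexed the result directly breaks. The stand-in class below exists only so the behavior can be shown offline.

```python
# FakeQueryIterable mimics the lazy iterable that 2.x query methods return.
class FakeQueryIterable:
    def __iter__(self):
        return iter([{'id': 'db1'}, {'id': 'db2'}])

result = FakeQueryIterable()   # in real code: client.QueryDatabases(query)
# result[0]                    # TypeError in 2.x: object is not subscriptable
databases = list(result)       # materialize the iterable before indexing
print(databases[0]['id'])      # db1
```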

How to create Gremlin queries?

I noticed that the .NET SDK has a CreateGremlinQuery method, but the Python SDK doesn't. I am trying to run a simple query against a graph DB (e.g. g.V()) and it fails with error code SC1001 and the message Syntax error, incorrect syntax near 'g'.

Here's an example of a call I am making:

collection_link = 'dbs/mydb/colls/mycoll'
db_query = 'g.V()'
client.QueryDocuments(collection_link, db_query)

Am I missing something or is this simply not implemented/included in the Python SDK yet? If latter, any plans/ETA on when this is going to be supported?

get_or_create method

Correct me if there's a better way, but I found myself copying and pasting the following code (or something similar) over and over again:

# Given the strings `database_id` and `collection_id` and a relevant `client` (instance of `DocumentClient`)
try:
    database = next(data for data in client.ReadDatabases() if data['id'] == database_id)
except StopIteration:
    database = client.CreateDatabase({'id': database_id})
database_link = database['_self']
try:
    collection = next(data for data in client.ReadCollections(database_link) if data['id'] == collection_id)
except StopIteration:
    collection = client.CreateCollection(database_link, {'id': collection_id})

So, how about a get_or_create_<sometype> method in the DocumentClient class of pydocumentdb/document_client.py, like the one in Django's QuerySet class? Then the code above would be much shorter and more readable, like this:

# Given the strings database_id and `collection_id` and a relevant `client` (instance of `DocumentClient`)
database = client.get_or_create_database(database_id)
database_link = database['_self']
collection = client.get_or_create_collection(database_link, collection_id)
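A sketch of one of the proposed helpers, written against the 1.x-style DocumentClient API used above (ReadDatabases / CreateDatabase); this is not part of the SDK.

```python
def get_or_create_database(client, database_id):
    """Return the database named database_id, creating it if it is missing."""
    try:
        return next(db for db in client.ReadDatabases()
                    if db['id'] == database_id)
    except StopIteration:
        return client.CreateDatabase({'id': database_id})
```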

p.s. The link to guidelines for contributing to this repository is broken.

Ability to get/set continuation token

Greetings. I am currently developing an API that queries documents in a collection, and returns the results. I would like to add functionality for obtaining the next page. From what I understand about DocumentDB, a continuation token is returned, in addition to the queried documents, if there are more results.

My thoughts are: obtain the token after the results are retrieved and return the token to the client. The client may then use the token in the next request to obtain the next page.

After doing some digging, I found a section of the source code that sets the execution context's continuation token. Working backwards, I ended up writing code like this:

query = {'query': 'select * from c '}
options = {'continuation': continuation} if continuation else {}

query_iterable = client.QueryDocuments(collection_link, query, options)
results = query_iterable.fetch_next_block()
continuation = query_iterable._options.get('continuation')

Unfortunately this does not seem to work for several reasons:

1.) First off, the value for continuation is always None, unless I make another fetch_next_block() call after the first one. (I have several thousand documents so there is an expected next page of results)
2.) Even though I set continuation in the options, I still get the same results as before, and the value obtained for the continuation token is the same as the value that I originally sent.

My questions:

1.) What am I doing wrong in my code? It seems fairly straightforward, and yet it doesn't accomplish what I am trying to do.
2.) Clearly this library handles this behavior internally, so are there any plans for exposing this functionality for general use?

Any suggestions/help is greatly appreciated. Thanks!
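A hedged sketch of the pattern that is usually suggested for pydocumentdb: the continuation token travels in the response headers (under 'x-ms-continuation', via the client's last_response_headers), not in query_iterable._options. The service call is stubbed out here so the shape of the flow is visible.

```python
# Stub standing in for client.last_response_headers after fetch_next_block().
last_response_headers = {'x-ms-continuation': 'token-for-next-page'}

# Real flow (assumes client, collection_link, query, options as above):
# query_iterable = client.QueryDocuments(collection_link, query, options)
# results = query_iterable.fetch_next_block()
# continuation = client.last_response_headers.get('x-ms-continuation')
continuation = last_response_headers.get('x-ms-continuation')
print(continuation)   # token-for-next-page
```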

synchronized_request.py throws SSLError if I work with a DocumentDB emulator.

pydocumentdb throws SSLError when working with the DocumentDB emulator. synchronized_request.py checks whether the user is working with a local emulator, but the host check is not correct.

At line 100 of synchronized_request.py:
is_ssl_enabled = (parse_result.hostname != 'localhost')
this checks whether the host is localhost. However, parse_result.hostname returns 127.0.0.1, so the requests module verifies the SSL certificate and throws SSLError.

My environment is the following:
Windows 10 Professional
Python 3.6 (Anaconda3)
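The suggested direction of the fix, shown standalone: treat the loopback address the same as 'localhost' so the emulator is detected either way.

```python
from urllib.parse import urlparse

# urlparse normalizes the host; 127.0.0.1 is not the string 'localhost',
# so the original != 'localhost' check enables SSL verification by mistake.
parse_result = urlparse('https://127.0.0.1:8081/')
is_ssl_enabled = parse_result.hostname not in ('localhost', '127.0.0.1')
print(is_ssl_enabled)   # False
```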

Can't create StoredProcedure

I use the DocumentDB Emulator, but this Python SDK can't create a stored procedure.

>>> import pydocumentdb.document_client as document_client
>>> c = document_client.DocumentClient(r"https://localhost:8081", {"masterKey": r"C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw=="})
>>> result = c.CreateDatabase({"id": "TEST"})
INFO:Starting new HTTPS connection (1): localhost
C:\Users\user\Source\Repos\dkouma_uploader\env\lib\site-packages\requests\packages\urllib3\connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
C:\Users\user\Source\Repos\dkouma_uploader\env\lib\site-packages\requests\packages\urllib3\connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
>>> result = c.CreateCollection(r"dbs/TEST", {"id": "TEST"})
C:\Users\user\Source\Repos\dkouma_uploader\env\lib\site-packages\requests\packages\urllib3\connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
>>> result = c.CreateDocument(r"dbs/TEST/colls/TEST", {"TEST":1})
C:\Users\user\Source\Repos\dkouma_uploader\env\lib\site-packages\requests\packages\urllib3\connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
C:\Users\user\Source\Repos\dkouma_uploader\env\lib\site-packages\requests\packages\urllib3\connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
>>> f = open("stored_procedures/count_collection.js", "r", encoding="utf-8")
>>> proc = f.read()
>>> c.CreateStoredProcedure(r"dbs/TEST/colls/TEST", str(proc))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\user\Source\Repos\dkouma_uploader\env\lib\site-packages\pydocumentdb\document_client.py", line 1244, in CreateStoredProcedure
    collection_id, path, sproc = self._GetCollectionIdWithPathForSproc(collection_link, sproc)
  File "C:\Users\user\Source\Repos\dkouma_uploader\env\lib\site-packages\pydocumentdb\document_client.py", line 1276, in _GetCollectionIdWithPathForSproc
    DocumentClient.__ValidateResource(sproc)
  File "C:\Users\user\Source\Repos\dkouma_uploader\env\lib\site-packages\pydocumentdb\document_client.py", line 2518, in __ValidateResource
    id = resource.get('id')
AttributeError: 'str' object has no attribute 'get'

Please ignore the warnings.
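The traceback ends in __ValidateResource calling resource.get('id'), which suggests CreateStoredProcedure expects a resource dict rather than the raw JavaScript source. A sketch of the call that would validate, with the file read replaced by an inline string so it runs anywhere:

```python
# Stands in for the contents of stored_procedures/count_collection.js.
proc_source = "function count() { /* body of count_collection.js */ }"

# Resource dict with an id and the source under 'body', not a bare str.
sproc = {'id': 'count_collection', 'body': proc_source}
# c.CreateStoredProcedure(r"dbs/TEST/colls/TEST", sproc)
print(sproc['id'])   # count_collection
```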

QueryDocuments list in Parameters

The query with a single value (word) works fine. With the list (words) in parameters, it does not. The query works in Storage Explorer. Is there a way to provide a list of strings?

word = 'first'
words = ['first', 'second', 'third']

singlequery = c.client.QueryDocuments(c.collection_link, query={'query': "SELECT * FROM c as P WHERE P.ProductID = @word", "parameters" : [{ "name":"@word", "value": word}]})
multiquery = c.client.QueryDocuments(c.collection_link, query={'query': "SELECT * FROM c as P WHERE P.ProductID IN (@words)", "parameters" : [{ "name":"@words", "value": words}]})

singlequery = JSON list of docs
multiquery = []
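A workaround often used for this: IN does not expand an array-valued parameter, but ARRAY_CONTAINS accepts one. Only the query spec is shown; the QueryDocuments call itself (which needs the live client from the snippet above) is commented out.

```python
words = ['first', 'second', 'third']

# Pass the whole list as one parameter and test membership server-side.
multiquery_spec = {
    'query': 'SELECT * FROM c AS P WHERE ARRAY_CONTAINS(@words, P.ProductID)',
    'parameters': [{'name': '@words', 'value': words}],
}
# c.client.QueryDocuments(c.collection_link, multiquery_spec)
print(multiquery_spec['parameters'][0]['value'])
```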

Unit tests don't all pass, due to HTTP 429 throttling

When I run the unit tests, many of them end up failing due to throttling:

Error
Traceback (most recent call last):
  File "C:\Users\clam\Documents\azure-documentdb-python\test\crud_tests.py", line 1758, in test_create_default_indexing_policy_name_based
    self._test_create_default_indexing_policy(True);
  File "C:\Users\clam\Documents\azure-documentdb-python\test\crud_tests.py", line 1797, in _test_create_default_indexing_policy
    'path': '/*'
  File "C:\Users\clam\Documents\azure-documentdb-python\pydocumentdb\document_client.py", line 222, in CreateCollection
    options)
  File "C:\Users\clam\Documents\azure-documentdb-python\pydocumentdb\document_client.py", line 1719, in Create
    headers)
  File "C:\Users\clam\Documents\azure-documentdb-python\pydocumentdb\document_client.py", line 1878, in __Post
    headers=headers)
  File "C:\Users\clam\Documents\azure-documentdb-python\pydocumentdb\synchronized_request.py", line 162, in SynchronizedRequest
    return _InternalRequest(connection_policy, request_options, request_body)
  File "C:\Users\clam\Documents\azure-documentdb-python\pydocumentdb\synchronized_request.py", line 99, in _InternalRequest
    raise errors.HTTPFailure(response.status, data, headers)
HTTPFailure: Status code: 429
{"code":"429","message":"Message: {\"Errors\":[\"Request rate is large\"]}\r\nActivityId: 4ff91177-2234-45ab-b5c6-65668ea15a14, Request URI: /apps/28284a68-9708-4ee8-97da-dc5bfad115b9/services/66452474-7c6b-45c8-b2fc-6df0e4f4b5ce/partitions/b36e3fc6-a803-4fb9-893f-1572dd11022f/replicas/130915761969947884p"}

Sometimes the server response is slightly different:

Traceback (most recent call last):
  File "C:\Users\clam\Documents\azure-documentdb-python\test\crud_tests.py", line 394, in test_spatial_index_name_based
    self._test_spatial_index(True);
  File "C:\Users\clam\Documents\azure-documentdb-python\test\crud_tests.py", line 416, in _test_spatial_index
    'path': '/'
  File "C:\Users\clam\Documents\azure-documentdb-python\pydocumentdb\document_client.py", line 222, in CreateCollection
    options)
  File "C:\Users\clam\Documents\azure-documentdb-python\pydocumentdb\document_client.py", line 1719, in Create
    headers)
  File "C:\Users\clam\Documents\azure-documentdb-python\pydocumentdb\document_client.py", line 1878, in __Post
    headers=headers)
  File "C:\Users\clam\Documents\azure-documentdb-python\pydocumentdb\synchronized_request.py", line 162, in SynchronizedRequest
    return _InternalRequest(connection_policy, request_options, request_body)
  File "C:\Users\clam\Documents\azure-documentdb-python\pydocumentdb\synchronized_request.py", line 99, in _InternalRequest
    raise errors.HTTPFailure(response.status, data, headers)
HTTPFailure: Status code: 429
{"code":"429","message":"The request rate is too large. Please retry after sometime.\r\nActivityId: 75450dbb-3eef-48fc-82c9-df8df7c0dede"}

I tried adding a delay of 15 seconds between tests, but this only reduces the number of failures. I'm not the owner of the DocumentDB account, so I don't know what performance level is configured or what kind of load the collection is experiencing. It's a testing account that should be seeing little or no other usage.
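A minimal sketch of the usual client-side mitigation: back off on 429 using the retry-after hint from the response headers. The exception class below is a stand-in for pydocumentdb's errors.HTTPFailure (attribute names here are assumptions), and the header name follows the Cosmos DB REST convention x-ms-retry-after-ms.

```python
import time

class HTTPFailure(Exception):   # stand-in for pydocumentdb.errors.HTTPFailure
    def __init__(self, status_code, headers=None):
        self.status_code = status_code
        self.headers = headers or {}

def execute_with_retry(fn, max_attempts=5):
    """Retry fn on HTTP 429, honoring the x-ms-retry-after-ms header."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except HTTPFailure as e:
            if e.status_code != 429 or attempt == max_attempts - 1:
                raise
            wait_ms = int(e.headers.get('x-ms-retry-after-ms', 100))
            time.sleep(wait_ms / 1000.0)

# Demo: fails twice with 429, then succeeds.
calls = {'n': 0}
def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise HTTPFailure(429, {'x-ms-retry-after-ms': '1'})
    return 'ok'

print(execute_with_retry(flaky))   # ok
```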

ODM for DocumentDB?

Having an object-document mapper for DocumentDB would be immensely helpful, instead of embedding raw queries within Python. Wondering if something along those lines is in the works? If not, maybe I could try building a basic ODM module.

Permissions Error on ReadMedia

Hi,

When I try reading a media upload, I get a permissions error. Here is a snippet of what I'm doing:

with open("large_text.txt") as fh:
    client.UpsertAttachmentAndUploadMedia(document['_self'], fh, {'contentType': 'application/text'})

attachment = [i for i in client.ReadAttachments(document['_self'])][0]
media = client.ReadMedia(attachment['media']) # this is where the error occurs

The Error:

{"code":"Unauthorized","message":"The input authorization token can\'t serve the request. Please check that the expected payload is built as per the protocol, and check the key being used. Server used the following payload to sign: ..."}

There isn't much documentation I could find on this, so any feedback is appreciated.

  • Melih

Timeout when attempting to create database

When an account has the maximum number of allowed databases, attempting to create a new one results in a timeout. There should be a clear error here.

raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='account-id.documents.azure.com', port=443): Read timed out. (read timeout=60.0)

Failover to other regions

Hello,
Does this SDK fail over to other Document DB replicated regions when one region is down?

Thanks,
Sasidhar.

Wrong brackets position closing causes document management code to fail

Noticed a small typo in the document management tutorial that causes the code to fail.
The brackets in the ReadDocuments call are closed too early, so the second parameter (the options dictionary) is passed as a second argument to the list() function, which takes only one argument.

Line 89:
documentlist = list(client.ReadDocuments(collection_link), {'maxItemCount': 10})

should be
documentlist = list(client.ReadDocuments(collection_link, {'maxItemCount': 10}))

Bug in base.py? TypeError: a bytes-like object is required, not 'str'

Dear project maintainers,

I am trying to read attachment media as described in the documentation:

attach = client.CreateAttachmentAndUploadMedia(doc['_self'], readable_stream = pic)
print(attach)
b = client.ReadMedia(attach['media'])

and getting an error

TypeError: a bytes-like object is required, not 'str'

after debugging with %pdb in jupyter I narrowed down the issue:

C:\PROGLANG\Anaconda3\lib\base64.py in b64encode(s, altchars)
     59     if altchars is not None:
     60         assert len(altchars) == 2, repr(altchars)
---> 61         return encoded.translate(bytes.maketrans(b'+/', altchars))
     62     return encoded
     63 

TypeError: a bytes-like object is required, not 'str'

> c:\proglang\anaconda3\lib\base64.py(61)b64encode()
     59     if altchars is not None:
     60         assert len(altchars) == 2, repr(altchars)
---> 61         return encoded.translate(bytes.maketrans(b'+/', altchars))
     62     return encoded
     63 

ipdb> altchars
'+-'
ipdb> altchars = b'+-'
ipdb> encoded.translate(bytes.maketrans(b'+/', altchars))
b'lrzzairkcgebaaaaaaaaalzcq9g='
ipdb> q

So it looks like altchars should be passed as bytes, since the maketrans function expects both of its arguments to be bytes-like:

static bytes.maketrans(from, to)
static bytearray.maketrans(from, to)
This static method returns a translation table usable for bytes.translate() that will map each character in from into the character at the same position in to; from and to must both be bytes-like objects and have the same length.

I am using Python 3.6.
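The fix suggested above, shown directly: on Python 3 maketrans needs bytes for both arguments, so altchars must be passed as bytes, not str. The media_id bytes below are an arbitrary example value.

```python
import base64

media_id = b'\x96\xcc\xce\x00'                         # example payload
encoded = base64.b64encode(media_id, altchars=b'+-')   # bytes altchars: OK
# base64.b64encode(media_id, altchars='+-')            # TypeError on Python 3
decoded = base64.b64decode(encoded, altchars=b'+-')
print(decoded == media_id)   # True
```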

Full error trace


TypeError Traceback (most recent call last)
in ()
3 documents.MediaReadMode.Streamed)
4
----> 5 b = client.ReadMedia(attach['media'])

C:\PROGLANG\Anaconda3\lib\site-packages\pydocumentdb\document_client.py in ReadMedia(self, media_link)
1661 path = base.GetPathFromLink(media_link)
1662 media_id = base.GetResourceIdOrFullNameFromLink(media_link)
-> 1663 attachment_id = base.GetAttachmentIdFromMediaId(media_id)
1664 headers = base.GetHeaders(self,
1665 default_headers,

C:\PROGLANG\Anaconda3\lib\site-packages\pydocumentdb\base.py in GetAttachmentIdFromMediaId(media_id)
249 if len(buffer) > resoure_id_length:
250 # We are cutting off the storage index.
--> 251 attachment_id = base64.b64encode(buffer[0:resoure_id_length], altchars)
252 if not six.PY2:
253 attachment_id = attachment_id.decode('utf-8')

C:\PROGLANG\Anaconda3\lib\base64.py in b64encode(s, altchars)
59 if altchars is not None:
60 assert len(altchars) == 2, repr(altchars)
---> 61 return encoded.translate(bytes.maketrans(b'+/', altchars))
62 return encoded
63

TypeError: a bytes-like object is required, not 'str'
