
amazon-documentdb-tools's Introduction

Amazon DocumentDB Tools

This repository contains several tools to help users with Amazon DocumentDB including migration, monitoring, and performance. A few of the most popular tools are listed below but there are additional tools in the migration, monitoring, operations, and performance folders.

Amazon DocumentDB Index Tool

The DocumentDB Index Tool makes it easy to migrate only indexes (not data) between a source MongoDB deployment and an Amazon DocumentDB cluster.

Amazon DocumentDB Compatibility Tool

The DocumentDB Compatibility Tool examines log files from MongoDB or source code from MongoDB applications to determine if there are any queries which use operators that are not supported in Amazon DocumentDB.

Amazon DocumentDB Global Clusters Automation Tool

The global-clusters-automation automates the global cluster failover process for Disaster Recovery (DR) and Business Continuity Planning (BCP) use cases.

License

This library is licensed under the Apache 2.0 License.

amazon-documentdb-tools's People

Contributors

aaronkalair, anshuvajpayee, bootjp, brianmhess, cod-all, dbonser, dependabot[bot], gottumuk, gurubayari, jaduffy, karthikv-vijay, khaeransori, marzuqq, meet-bhagdev, mihaialdoiu, n-turner, nishikar, reddyu, sarjarapu, sethusrinivasan, tmcallaghan, vbcodedev, yasiendwieb


amazon-documentdb-tools's Issues

exclude directory option

For Node.js apps, third-party packages are installed alongside the app code. It would be helpful to have an option like:

--exclude-directory node_modules
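Such an exclusion could be implemented by pruning directories while walking the source tree. A minimal sketch (the `--exclude-directory` flag and the `.js` extension filter are assumptions for illustration, not the tool's actual options):

```python
import os

def iter_source_files(root, exclude_dirs=("node_modules",)):
    """Yield source file paths under root, skipping excluded directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            if name.endswith(".js"):
                yield os.path.join(dirpath, name)
```

With `exclude_dirs=("node_modules",)`, files under any `node_modules` directory are never visited, so third-party code is not scanned.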

Index Tool: system.profile.metadata.json file breaks --show-issues option

I'm not sure why, but running documentdb_index_tool.py with the --dump-indexes option generates a system.profile.metadata.json file. When I then run --show-issues against the generated metadata directory, I get the following error:

Failed to load collection metadata: 'indexes'

I found removing the system.profile.metadata.json file from the output resolves this issue.

Python version: Python 3.9.13
MongoDB version: MongoDB 4.0.25 Community
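The workaround above (removing the system.profile file by hand) could be automated by filtering out metadata files for MongoDB system collections before analysis. A minimal sketch of such a filter (a hypothetical helper, not code from the tool):

```python
def user_collection_metadata(filenames):
    """Keep only metadata files for user collections; system collections
    such as system.profile carry no user-defined indexes and can break
    --show-issues."""
    return [f for f in filenames
            if f.endswith(".metadata.json") and not f.startswith("system.")]
```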

bug in checking COMPOUND_INDEX_MAX_KEYS?

It seems like the check for DocumentDbLimits.COMPOUND_INDEX_MAX_KEYS currently happens by counting keys in the index dict, not in the index['key'] dict:

https://github.com/awslabs/amazon-documentdb-tools/blob/master/index-tool/migrationtools/documentdb_index_tool.py#L404:

    # Check for indexes with too many keys
    if len(index) > DocumentDbLimits.COMPOUND_INDEX_MAX_KEYS:
        message = 'Index contains more than {} keys'.format(
            DocumentDbLimits.COMPOUND_INDEX_MAX_KEYS)
        compatibility_issues[db_name][collection_name][
            index_name][self.EXCEEDED_LIMITS][message] = len(index)

Instead, it seems to me like it should be farther down, when testing index['key']:
https://github.com/awslabs/amazon-documentdb-tools/blob/master/index-tool/migrationtools/documentdb_index_tool.py#L436-L437

# Check for unsupported index types like text
if key_name == self.INDEX_KEY:
    for index_key_name in index[key_name]:
        key_value = index[key_name][index_key_name]
        if (
            key_value
            in DocumentDbUnsupportedFeatures.UNSUPPORTED_INDEX_TYPES
        ):
            compatibility_issues[db_name][collection_name][
                index_name
            ][self.UNSUPPORTED_INDEX_TYPES_KEY] = key_value

I looked in the tests and didn't see any tests for DocumentDbLimits.INDEX_KEY_MAX_LENGTH;
tmc.metadata.json doesn't contain any index with more than COMPOUND_INDEX_MAX_KEYS=32 keys.

While this bug may not bite many people, it made me doubt that I understood the expected structure passed as metadata to def find_compatibility_issues(metadata).
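If the reporter's reading is right, the fix would be to count the entries of the index's 'key' document rather than the index's top-level metadata fields. A minimal sketch of the corrected check (standalone illustration, not a patch against the tool):

```python
COMPOUND_INDEX_MAX_KEYS = 32  # DocumentDB compound index key limit

def exceeds_compound_key_limit(index):
    """Count the fields in the index's 'key' document, not the top-level
    metadata fields ('v', 'key', 'name', 'ns', ...), against the limit."""
    return len(index.get("key", {})) > COMPOUND_INDEX_MAX_KEYS
```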

Unknown option ssl_ca_certs Error During Index dump from MongoAtlas

I am trying to migrate from MongoAtlas to DocumentDB, but I am getting this error.

Executed Command:
sudo python3 migrationtools/documentdb_index_tool.py --dump-indexes --host "server" --port 27017 --username username --password pass --auth-db admin --dir ./vindex/
Error:
  File "migrationtools/documentdb_index_tool.py", line 723, in <module>
    main()
  File "migrationtools/documentdb_index_tool.py", line 719, in main
    indextool.run()
  File "migrationtools/documentdb_index_tool.py", line 519, in run
    connection = self._get_db_connection(
  File "migrationtools/documentdb_index_tool.py", line 138, in _get_db_connection
    mongodb_client = MongoClient(
  File "/usr/local/lib/python3.8/dist-packages/pymongo/mongo_client.py", line 706, in __init__
    keyword_opts = common._CaseInsensitiveDictionary(dict(common.validate(
  File "/usr/local/lib/python3.8/dist-packages/pymongo/mongo_client.py", line 706, in <genexpr>
    keyword_opts = common._CaseInsensitiveDictionary(dict(common.validate(
  File "/usr/local/lib/python3.8/dist-packages/pymongo/common.py", line 740, in validate
    value = validator(option, value)
  File "/usr/local/lib/python3.8/dist-packages/pymongo/common.py", line 144, in raise_config_error
    raise ConfigurationError("Unknown option %s" % (key,))
pymongo.errors.ConfigurationError: Unknown option ssl_ca_certs
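This error is characteristic of running the tool against PyMongo 4.x, which removed the legacy ssl_* keyword options in favor of tls* equivalents (accepted since PyMongo 3.9). A minimal sketch of an option-renaming shim that would work on either major version (the mapping is from the PyMongo 4 migration notes; the helper name is an assumption):

```python
# PyMongo 4 removed the legacy ssl_* keyword options; these map to the
# tls* replacements accepted by both PyMongo 3.9+ and 4.x.
LEGACY_TLS_OPTIONS = {
    "ssl_ca_certs": "tlsCAFile",
    "ssl_certfile": "tlsCertificateKeyFile",
    "ssl_pem_passphrase": "tlsCertificateKeyFilePassword",
}

def modernize_client_options(options):
    """Rewrite legacy ssl_* MongoClient options to their tls* equivalents."""
    return {LEGACY_TLS_OPTIONS.get(k, k): v for k, v in options.items()}
```

The rewritten dict can then be passed straight to `MongoClient(**opts)` without triggering `ConfigurationError: Unknown option ssl_ca_certs`.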

Cannot delete the last instance of the master cluster of DocumentDB

When we automated your solution, failoverToSecondary sometimes fails with the following error. Do you have a solution for it?

I also modified the code to delete all secondary instances first and then the last primary instance after waiting 10 seconds, but no luck.

Deleting Replica instance... docdb-2023-02-04-21-46-253 within cluster arn:aws:rds:us-east-1:858054668523:cluster:docdb-2023-02-04-21-46-25

Deleting Replica instance... docdb-2023-02-04-21-46-25 within cluster arn:aws:rds:us-east-1:858054668523:cluster:docdb-2023-02-04-21-46-25
Deleting Replica instance... docdb-2023-02-04-21-46-252 within cluster arn:aws:rds:us-east-1:858054668523:cluster:docdb-2023-02-04-21-46-25
ERROR OCCURRED WHILE PROCESSING: An error occurred (InvalidDBClusterStateFault) when calling the DeleteDBInstance operation: Cannot delete the last instance of the master cluster. Delete the replica cluster before deleting the last master cluster instance.

Thanks
Soma
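The error message suggests the replica (secondary) clusters must be dealt with before the primary cluster's last instance is deleted. One way to sketch that ordering, assuming the member documents returned by `describe_global_clusters` (each carries an `IsWriter` flag), is to sort secondaries ahead of the writer before issuing deletions; this is an illustration of the ordering, not the repo's actual code:

```python
def deletion_order(global_cluster_members):
    """Return members with secondary (reader) clusters first: DocumentDB
    refuses to delete the last instance of the primary cluster while
    replica clusters still belong to the global cluster."""
    # False (reader) sorts before True (writer); sorted() is stable.
    return sorted(global_cluster_members, key=lambda m: m["IsWriter"])
```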

Having an issue while trying to dump indexes

[screenshot of the error]

Here's the error I get after entering all the required properties as described in the README. Is there an option to use a connection URI instead of host, port, etc.? I'm able to do a mongodump with the --uri option.

Issue - restoring indexes issue: Connection reset by peer

Hello. I'm doing a quick POC of migrating a native Mongo 4.2 database to DocDB 4.0. I've successfully used dump-indexes against the source database, but I run into this connection-reset issue when I try to restore-indexes to DocDB.

I have the RDS .pem file in the working directory and I'm connecting with the master user credentials. I can make a direct connection via the mongo shell to my DocDB cluster using the connect string from the DocDB console. Also, I can run restore-indexes against the original database, so this connection issue appears to be something specific to DocDB.

Questions:

  1. Is restore-indexes against DocDB supported? I ask because the instructions for it do not include the hostname or other connection details. I assume this needs host/port/credentials...?

  2. I'm running the following, do you see anything wrong here? $TARGET_HOST is the FQDN of the DocumentDB instance and --dir references the output directory of dump-indexes.

python3 amazon-documentdb-tools/index-tool/migrationtools/documentdb_index_tool.py \
    --restore-indexes \
    --host $TARGET_HOST \
    --port 27017 \
    --username masteruser \
    --password \
    --auth-db admin \
    --dir output/index_output

--show-issues requires --auth-db argument

Following this tutorial, running:

python amazon-documentdb-tools/migrationtools/documentdb_index_tool.py \
        --show-issues --dir dump

I've got:

documentdb_index_tool.py: error: --auth-db requires both --username and --password.

Adding some fake --auth-db, --username, and --password values fixes it:

python amazon-documentdb-tools/migrationtools/documentdb_index_tool.py \
      --show-issues --dir dump --auth-db 'test' --username 'test' --password 'test'

but shouldn't it work without those arguments?
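One way to fix this would be to validate the credential arguments only when the invocation actually connects to a server; --show-issues reads metadata files from --dir and needs no credentials. A minimal argparse sketch (a hypothetical subset of the tool's flags, for illustration):

```python
import argparse

def parse_args(argv):
    p = argparse.ArgumentParser()
    p.add_argument("--show-issues", action="store_true")
    p.add_argument("--dir", required=True)
    p.add_argument("--auth-db")
    p.add_argument("--username")
    p.add_argument("--password")
    args = p.parse_args(argv)
    # Only enforce the credential trio for operations that connect to a
    # server; offline analysis of dumped metadata should not demand them.
    if not args.show_issues and args.auth_db and not (args.username and args.password):
        p.error("--auth-db requires both --username and --password")
    return args
```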

Index tool creating compound indexes with fields in wrong order

Reproduced an issue where compound indexes can be created with fields in the wrong order. Dictionaries in Python 2, and in Python 3 before 3.7, provide no guarantee of maintaining insertion order when iterating over keys; Python 3.7 made insertion order a language guarantee.
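One way to make the key order explicit regardless of Python version is to load the dumped metadata with an order-preserving hook, so compound-index fields are carried in the order they were dumped rather than whatever a plain dict happens to yield. A minimal sketch (the helper name is an assumption):

```python
import json
from collections import OrderedDict

def load_index_metadata(path):
    """Load dumped index metadata with key order preserved explicitly;
    on Python < 3.7 a plain dict could reorder compound-index fields."""
    with open(path) as f:
        return json.load(f, object_pairs_hook=OrderedDict)
```

The loaded 'key' document can then be converted to an ordered list of `(field, direction)` pairs for `create_index`, which accepts a sequence and preserves its order.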

failoverAndConvertToGlobal fails for non-encrypted clusters looking for KMS key id

Leveraged the DocDB immersion day CloudFormation template, which creates a non-encrypted cluster, then tried to use it with this repo/tool. failoverAndConvertToGlobal fails for non-encrypted clusters looking for a KMS key id. Error details below.

{
  "errorMessage": "'KmsKeyId'",
  "errorType": "KeyError",
  "stackTrace": [
    "File \"/var/task/failover_and_convert_lambda_function.py\", line 56, in lambda_handler\n    convert_to_global_request = prepare_to_convert(global_cluster_members,\n",
    "File \"/var/task/failover_and_convert_to_global.py\", line 38, in prepare_to_convert\n    secondary_clusters.append(get_cluster_details(each_cluster))\n",
    "File \"/var/task/failover_and_convert_to_global.py\", line 97, in get_cluster_details\n    \"kms_key_id\": cluster_response['KmsKeyId'],\n"
  ]
}
Getting global cluster members for global cluster globalcluster2
Begin process to create request to convert regional cluster to global cluster
[ERROR] KeyError: 'KmsKeyId'
Traceback (most recent call last):
  File "/var/task/failover_and_convert_lambda_function.py", line 56, in lambda_handler
    convert_to_global_request = prepare_to_convert(global_cluster_members,
  File "/var/task/failover_and_convert_to_global.py", line 38, in prepare_to_convert
    secondary_clusters.append(get_cluster_details(each_cluster))
  File "/var/task/failover_and_convert_to_global.py", line 97, in get_cluster_details
    "kms_key_id": cluster_response['KmsKeyId'],
END RequestId: d4f9821e
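The KeyError arises because a non-encrypted cluster's describe response simply omits KmsKeyId, so indexing the dict raises. A minimal sketch of the tolerant lookup (the helper mirrors the shape of the failing code, not the repo's exact function):

```python
def get_cluster_details(cluster_response):
    """Read KmsKeyId with .get() so non-encrypted clusters, whose
    responses omit the key entirely, yield None instead of KeyError."""
    return {
        "cluster_arn": cluster_response.get("DBClusterArn"),
        "kms_key_id": cluster_response.get("KmsKeyId"),  # None when unencrypted
    }
```

Downstream code would then need to skip passing `KmsKeyId` to the create-cluster call when it is None.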

Script ignores index_names and causes errors

Hello there,

Tried using the script to import indexes from a Mongo cluster to DocumentDB, but it seems that when re-creating the indexes the script doesn't use the original index name. This is a problem because with too many fields we hit the "namespace name generated from index name is too long" problem.

My workaround for it was to manually add this:

index_options['name'] = index_name

before the collection.create_index call on https://github.com/awslabs/amazon-documentdb-tools/blob/master/migrationtools/documentdb_index_tool.py#L481.

Any reason why you're not passing the original index name from the dump?

Thank you
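The workaround generalizes to always forwarding the dumped name so the server never auto-generates one. A minimal sketch of a restore helper in that spirit (the function and its signature are illustrative, not the tool's code; PyMongo's `create_index` accepts `name` as a keyword):

```python
def restore_index(collection, index_name, index_keys, index_options):
    """Re-create an index, defaulting its name to the dumped one so the
    server does not auto-generate a name from the (possibly long) field
    list and exceed namespace length limits."""
    index_options = dict(index_options)  # avoid mutating the caller's dict
    index_options.setdefault("name", index_name)
    return collection.create_index(index_keys, **index_options)
```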

Dumping MongoDB 4.4 indexes creates unusable metadata files

As MongoDB's listIndexes no longer includes the namespace ("ns") attribute as of version 4.4, the dump-indexes option against MongoDB 4.4+ creates unusable metadata files. Since both the database and collection names are known when the command is executed, modify the tool to create the "ns" attribute if it is not present.
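Since both names are in hand at dump time, the fix described above amounts to synthesizing the attribute when absent. A minimal sketch (hypothetical helper name):

```python
def ensure_namespace(index, db_name, collection_name):
    """MongoDB 4.4 dropped the 'ns' field from listIndexes output;
    synthesize it from the known database and collection names so the
    dumped metadata stays usable, leaving any existing value untouched."""
    index.setdefault("ns", "{}.{}".format(db_name, collection_name))
    return index
```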

Add error handling to compatibility tool

Modify tool to display helpful feedback when issues arise with log files.

  • Output the log file name and line number where error occurred
  • Output the specific line
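The requested feedback could be produced by tracking line numbers while parsing and catching per-line failures rather than aborting. A minimal sketch of that shape (the `parse_line` callback stands in for whatever parsing the compat tool does):

```python
def scan_log(path, parse_line):
    """Parse a log file line by line, collecting (file, line number,
    offending line) for every entry that fails to parse instead of
    stopping at the first error."""
    errors = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            try:
                parse_line(line)
            except ValueError:
                errors.append((path, lineno, line.rstrip("\n")))
    return errors
```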

Issue to run "python3 compat/compat.py 4.0 ..."

I am having trouble running compat.py.

Here is what I have done:

$ pip3 install -r requirements.txt
DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at Homebrew/homebrew-core#76621
Requirement already satisfied: mtools>=1.6.4 in /usr/local/lib/python3.9/site-packages (from -r requirements.txt (line 1)) (1.6.4)
Requirement already satisfied: PyYAML>=5.3.1 in /usr/local/lib/python3.9/site-packages (from -r requirements.txt (line 2)) (6.0)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.9/site-packages (from mtools>=1.6.4->-r requirements.txt (line 1)) (2.8.2)
Requirement already satisfied: six in /usr/local/lib/python3.9/site-packages (from mtools>=1.6.4->-r requirements.txt (line 1)) (1.16.0)

$ python3 compat/compat.py 4.0 /mongod.log.2021-10-16T16-24-27 /mgd_44d.output
-bash: /usr/local/lib/python3.9: is a directory

Would you please shed some lights? Thank you.

MongoDB 4.4+ logs are not supported by compat-tool

Although I have set maxLogSizeKB=2000, I still get skipped log messages. What is the maxLogSizeKB in your environment? What is a reasonable maxLogSizeKB for compat/compat.py in general?

NOTE - portions of the log file(s) processed were truncated or incorrectly formatted and excluded from the compatibility assessment
Skipped 0 log lines due to log line truncation (default 10KB, consider increasing maxLogSizeKB)
Skipped 200 log lines due to unrecognized log format (missing timestamp)
Skipped 0 log lines due to unusable log format (invalid JSON)

Thank you!

compat-tool remove file/line option

I just ran compat-tool against a big project and found:

The following 13 unsupported operators were found:
  $expr | found 1132 time(s)
  $text | found 126 time(s)
  $graphLookup | found 46 time(s)
  $let | found 24 time(s)
  $switch | found 18 time(s)
  $trunc | found 13 time(s)
  $dateFromParts | found 8 time(s)
  $$REMOVE | found 6 time(s)
  $box | found 6 time(s)
  $centerSphere | found 6 time(s)
  $facet | found 6 time(s)
  $floor | found 6 time(s)
  $log | found 5 time(s)

The log was too big and hard to read.

compat-tool : process all files, regardless of issues found

New features:

  • Track count of lines skipped due to incorrect format (missing leading timestamp)
  • Track count of lines skipped due to log line truncation
  • Track count of lines skipped due to badly formatted JSON (seems to be missing escape characters for double quotes)
  • With all above, keep processing all requested file(s), output warning at end with full list of counters
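The counter-and-continue behavior described above can be sketched as a small loop that classifies each line and accumulates skip reasons, deferring all reporting to the end. This is an illustration of the control flow, not the tool's implementation; the `classify_line` callback is assumed to return a skip reason or None:

```python
from collections import Counter

def process_files(paths, classify_line):
    """Process every requested file to completion, counting skipped
    lines by reason; the caller reports the counters once at the end."""
    skipped = Counter()
    for path in paths:
        with open(path) as f:
            for line in f:
                reason = classify_line(line)  # None when the line parses
                if reason is not None:
                    skipped[reason] += 1
    return skipped
```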

Indexdump not supported on views.

It's a known limitation that index dump is not supported on views, but the concern is that the tool stops abruptly at the first view it encounters; the view is not skipped, so the remaining indexes in other databases are never processed.
It's quite possible that databases will have views, so the tool currently does not work with databases that contain them. Is there any workaround for this?

Getting an error: bash: --dir: command not found.

I have been trying to dump indexes using this command:
python3 migrationtools/documentdb_index_tool.py --dump-indexes --uri mongodb://<UserName:Password>@<My-DocumentDB-cluster-endpoint>/?ssl=true&tlsCAFile=rds-combined-ca-bundle.pem --dir ./my-dir/

But I keep getting this error: --dir: command not found.

Could you please add an example for a connection command for a DocumentDB cluster/instance in the readme, I think that will help a lot.

Thank you.
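The `--dir: command not found` message is most likely a shell-quoting problem rather than a tool bug: the unquoted `&` in the URI makes bash background the command at that point and then try to run `--dir` as a separate command. Quoting the URI avoids this; a small Python illustration using `shlex.quote` (the command string mirrors the one above):

```python
import shlex

# An unquoted '&' splits the command in the shell; quoting the URI keeps
# '--dir' attached to the same invocation.
uri = "mongodb://user:pass@my-cluster-endpoint/?ssl=true&tlsCAFile=rds-combined-ca-bundle.pem"
cmd = ("python3 migrationtools/documentdb_index_tool.py --dump-indexes "
       "--uri {} --dir ./my-dir/".format(shlex.quote(uri)))
```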
