
amazon-documentdb-tools's Introduction

Amazon DocumentDB Tools

This repository contains several tools to help users with Amazon DocumentDB including migration, monitoring, and performance. A few of the most popular tools are listed below but there are additional tools in the migration, monitoring, operations, and performance folders.

Amazon DocumentDB Index Tool

The DocumentDB Index Tool makes it easy to migrate only indexes (not data) between a source MongoDB deployment and an Amazon DocumentDB cluster.

Amazon DocumentDB Compatibility Tool

The DocumentDB Compatibility Tool examines log files from MongoDB or source code from MongoDB applications to determine if there are any queries which use operators that are not supported in Amazon DocumentDB.

Amazon DocumentDB Global Clusters Automation Tool

The global-clusters-automation automates the global cluster failover process for Disaster Recovery (DR) and Business Continuity Planning (BCP) use cases.

License

This library is licensed under the Apache 2.0 License.

amazon-documentdb-tools's People

Contributors

aaronkalair, anshuvajpayee, bootjp, brianmhess, cod-all, dbonser, dependabot[bot], gottumuk, gurubayari, jaduffy, karthikv-vijay, khaeransori, marzuqq, meet-bhagdev, mihaialdoiu, n-turner, nishikar, reddyu, sarjarapu, sethusrinivasan, tmcallaghan, vbcodedev, yasiendwieb


amazon-documentdb-tools's Issues

exclude directory option

For Node.js apps, third-party packages are installed alongside the app code. It would be helpful to have an option like:

--exclude-directory node_modules
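Such an exclusion could be implemented by pruning directories while walking the source tree. A minimal sketch (the `--exclude-directory` flag and the `.js` extension filter are assumptions for illustration, not the tool's actual options):

```python
import os

def iter_source_files(root, exclude_dirs=("node_modules",)):
    """Yield source file paths under root, skipping excluded directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            if name.endswith(".js"):
                yield os.path.join(dirpath, name)
```

With `exclude_dirs=("node_modules",)`, files under any `node_modules` directory are never visited, so third-party code is not scanned.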

Index Tool: system.profile.metadata.json file breaks --show-issues option

I'm not sure why, but running documentdb_index_tool.py with the --dump-indexes option generates a system.profile.metadata.json file. When I then run --show-issues against the generated metadata directory, I get the following error:

Failed to load collection metadata: 'indexes'

I found removing the system.profile.metadata.json file from the output resolves this issue.

Python version: Python 3.9.13
MongoDB version: MongoDB 4.0.25 Community
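The workaround above (removing the system.profile file by hand) could be automated by filtering out metadata files for MongoDB system collections before analysis. A minimal sketch of such a filter (a hypothetical helper, not code from the tool):

```python
def user_collection_metadata(filenames):
    """Keep only metadata files for user collections; system collections
    such as system.profile carry no user-defined indexes and can break
    --show-issues."""
    return [f for f in filenames
            if f.endswith(".metadata.json") and not f.startswith("system.")]
```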

bug in checking COMPOUND_INDEX_MAX_KEYS?

It seems like the check for DocumentDbLimits.COMPOUND_INDEX_MAX_KEYS currently happens by counting keys in the index dict, not in the index['key'] dict:

https://github.com/awslabs/amazon-documentdb-tools/blob/master/index-tool/migrationtools/documentdb_index_tool.py#L404:

    # Check for indexes with too many keys
    if len(index) > DocumentDbLimits.COMPOUND_INDEX_MAX_KEYS:
        message = 'Index contains more than {} keys'.format(
            DocumentDbLimits.COMPOUND_INDEX_MAX_KEYS)
        compatibility_issues[db_name][collection_name][
            index_name][self.EXCEEDED_LIMITS][message] = len(index)

Instead, it seems to me like it should be farther down, when testing index['key']:
https://github.com/awslabs/amazon-documentdb-tools/blob/master/index-tool/migrationtools/documentdb_index_tool.py#L436-L437

# Check for unsupported index types like text
if key_name == self.INDEX_KEY:
    for index_key_name in index[key_name]:
        key_value = index[key_name][index_key_name]
        if (
            key_value
            in DocumentDbUnsupportedFeatures.UNSUPPORTED_INDEX_TYPES
        ):
            compatibility_issues[db_name][collection_name][
                index_name
            ][self.UNSUPPORTED_INDEX_TYPES_KEY] = key_value

I looked in the tests and didn't see any tests for DocumentDbLimits.INDEX_KEY_MAX_LENGTH;
tmc.metadata.json doesn't contain any index with more than COMPOUND_INDEX_MAX_KEYS=32 keys.

While this bug may not bite many people, it made me doubt that I understood the expected structure passed as metadata to def find_compatibility_issues(metadata).
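If the reporter's reading is right, the fix would be to count the entries of the index's 'key' document rather than the index's top-level metadata fields. A minimal sketch of the corrected check (standalone illustration, not a patch against the tool):

```python
COMPOUND_INDEX_MAX_KEYS = 32  # DocumentDB compound index key limit

def exceeds_compound_key_limit(index):
    """Count the fields in the index's 'key' document, not the top-level
    metadata fields ('v', 'key', 'name', 'ns', ...), against the limit."""
    return len(index.get("key", {})) > COMPOUND_INDEX_MAX_KEYS
```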

Unknown option ssl_ca_certs Error During Index dump from MongoAtlas

I am trying to migrate from MongoAtlas to DocumentDB, but I am getting this error.

Executed Command:
sudo python3 migrationtools/documentdb_index_tool.py --dump-indexes --host "server" --port 27017 --username username --password pass --auth-db admin --dir ./vindex/
Error:
  File "migrationtools/documentdb_index_tool.py", line 723, in <module>
    main()
  File "migrationtools/documentdb_index_tool.py", line 719, in main
    indextool.run()
  File "migrationtools/documentdb_index_tool.py", line 519, in run
    connection = self._get_db_connection(
  File "migrationtools/documentdb_index_tool.py", line 138, in _get_db_connection
    mongodb_client = MongoClient(
  File "/usr/local/lib/python3.8/dist-packages/pymongo/mongo_client.py", line 706, in __init__
    keyword_opts = common._CaseInsensitiveDictionary(dict(common.validate(
  File "/usr/local/lib/python3.8/dist-packages/pymongo/mongo_client.py", line 706, in <genexpr>
    keyword_opts = common._CaseInsensitiveDictionary(dict(common.validate(
  File "/usr/local/lib/python3.8/dist-packages/pymongo/common.py", line 740, in validate
    value = validator(option, value)
  File "/usr/local/lib/python3.8/dist-packages/pymongo/common.py", line 144, in raise_config_error
    raise ConfigurationError("Unknown option %s" % (key,))
pymongo.errors.ConfigurationError: Unknown option ssl_ca_certs
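This error is characteristic of running the tool against PyMongo 4.x, which removed the legacy ssl_* keyword options in favor of tls* equivalents (accepted since PyMongo 3.9). A minimal sketch of an option-renaming shim that would work on either major version (the mapping is from the PyMongo 4 migration notes; the helper name is an assumption):

```python
# PyMongo 4 removed the legacy ssl_* keyword options; these map to the
# tls* replacements accepted by both PyMongo 3.9+ and 4.x.
LEGACY_TLS_OPTIONS = {
    "ssl_ca_certs": "tlsCAFile",
    "ssl_certfile": "tlsCertificateKeyFile",
    "ssl_pem_passphrase": "tlsCertificateKeyFilePassword",
}

def modernize_client_options(options):
    """Rewrite legacy ssl_* MongoClient options to their tls* equivalents."""
    return {LEGACY_TLS_OPTIONS.get(k, k): v for k, v in options.items()}
```

The rewritten dict can then be passed straight to `MongoClient(**opts)` without triggering `ConfigurationError: Unknown option ssl_ca_certs`.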

Cannot delete the last instance of the master cluster of DocumentDB

When we automated your solution, failoverToSecondary sometimes fails with the following error. Do you have a solution for it?

I also modified the code to delete all secondary instances first and then the last primary instance after waiting 10 seconds, but no luck.

Deleting Replica instance... docdb-2023-02-04-21-46-253 within cluster arn:aws:rds:us-east-1:858054668523:cluster:docdb-2023-02-04-21-46-25

Deleting Replica instance... docdb-2023-02-04-21-46-25 within cluster arn:aws:rds:us-east-1:858054668523:cluster:docdb-2023-02-04-21-46-25
Deleting Replica instance... docdb-2023-02-04-21-46-252 within cluster arn:aws:rds:us-east-1:858054668523:cluster:docdb-2023-02-04-21-46-25
ERROR OCCURRED WHILE PROCESSING: An error occurred (InvalidDBClusterStateFault) when calling the DeleteDBInstance operation: Cannot delete the last instance of the master cluster. Delete the replica cluster before deleting the last master cluster instance.

Thanks
Soma
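The error message suggests the replica (secondary) clusters must be dealt with before the primary cluster's last instance is deleted. One way to sketch that ordering, assuming the member documents returned by `describe_global_clusters` (each carries an `IsWriter` flag), is to sort secondaries ahead of the writer before issuing deletions; this is an illustration of the ordering, not the repo's actual code:

```python
def deletion_order(global_cluster_members):
    """Return members with secondary (reader) clusters first: DocumentDB
    refuses to delete the last instance of the primary cluster while
    replica clusters still belong to the global cluster."""
    # False (reader) sorts before True (writer); sorted() is stable.
    return sorted(global_cluster_members, key=lambda m: m["IsWriter"])
```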

Having an issue while trying to dump indexes

[screenshot of the error]

Here's the error I get after entering all the required properties as described in the README. Is there an option to use a connection URI instead of host, port, etc.? I'm able to do a mongodump with the --uri option.

Issue - restoring indexes issue: Connection reset by peer

Hello. I'm doing a quick POC of migrating a native Mongo 4.2 database to DocDB 4.0. I've successfully used dump-indexes against the source database, but I run into this connection-reset issue when I try to restore-indexes to DocDB.

I have the RDS .pem file in the working directory and I'm connecting with the master user credentials. I can make a direct connection via the mongo shell to my DocDB cluster using the connect string from the DocDB console. Also, I can run restore-indexes against the original database, so this connection issue appears to be something specific to DocDB.

Questions:

  1. Is restore-indexes against DocDB supported? I ask because the instructions for it do not include the hostname or other connection details. I assume this needs host/port/credentials...?

  2. I'm running the following, do you see anything wrong here? $TARGET_HOST is the FQDN of the DocumentDB instance and --dir references the output directory of dump-indexes.

python3 amazon-documentdb-tools/index-tool/migrationtools/documentdb_index_tool.py \
    --restore-indexes \
    --host $TARGET_HOST \
    --port 27017 \
    --username masteruser \
    --password \
    --auth-db admin \
    --dir output/index_output

--show-issues requires --auth-db argument

Following this tutorial, running:

python amazon-documentdb-tools/migrationtools/documentdb_index_tool.py \
        --show-issues --dir dump

I've got:

documentdb_index_tool.py: error: --auth-db requires both --username and --password.

Adding some fake --auth-db, --username, and --password values fixes it:

python amazon-documentdb-tools/migrationtools/documentdb_index_tool.py \
      --show-issues --dir dump --auth-db 'test' --username 'test' --password 'test'

but shouldn't it work without those arguments?
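One way to fix this would be to validate the credential arguments only when the invocation actually connects to a server; --show-issues reads metadata files from --dir and needs no credentials. A minimal argparse sketch (a hypothetical subset of the tool's flags, for illustration):

```python
import argparse

def parse_args(argv):
    p = argparse.ArgumentParser()
    p.add_argument("--show-issues", action="store_true")
    p.add_argument("--dir", required=True)
    p.add_argument("--auth-db")
    p.add_argument("--username")
    p.add_argument("--password")
    args = p.parse_args(argv)
    # Only enforce the credential trio for operations that connect to a
    # server; offline analysis of dumped metadata should not demand them.
    if not args.show_issues and args.auth_db and not (args.username and args.password):
        p.error("--auth-db requires both --username and --password")
    return args
```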

Index tool creating compound indexes with fields in wrong order

Reproduced an issue where compound indexes can be created with fields in the wrong order. Dictionaries in Python 2, and in Python 3 before 3.7, provide no guarantee of maintaining insertion order when iterating over keys; Python 3.7 made insertion order a language guarantee.
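One way to make the key order explicit regardless of Python version is to load the dumped metadata with an order-preserving hook, so compound-index fields are carried in the order they were dumped rather than whatever a plain dict happens to yield. A minimal sketch (the helper name is an assumption):

```python
import json
from collections import OrderedDict

def load_index_metadata(path):
    """Load dumped index metadata with key order preserved explicitly;
    on Python < 3.7 a plain dict could reorder compound-index fields."""
    with open(path) as f:
        return json.load(f, object_pairs_hook=OrderedDict)
```

The loaded 'key' document can then be converted to an ordered list of `(field, direction)` pairs for `create_index`, which accepts a sequence and preserves its order.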

failoverAndConvertToGlobal fails for non-encrypted clusters looking for KMS key id

Leveraged the DocDB immersion day CloudFormation template, which creates a non-encrypted cluster, then tried to use it with this repo/tool. failoverAndConvertToGlobal fails for non-encrypted clusters looking for a KMS key id. Error details below.

{
  "errorMessage": "'KmsKeyId'",
  "errorType": "KeyError",
  "stackTrace": [
    "File \"/var/task/failover_and_convert_lambda_function.py\", line 56, in lambda_handler\n    convert_to_global_request = prepare_to_convert(global_cluster_members,\n",
    "File \"/var/task/failover_and_convert_to_global.py\", line 38, in prepare_to_convert\n    secondary_clusters.append(get_cluster_details(each_cluster))\n",
    "File \"/var/task/failover_and_convert_to_global.py\", line 97, in get_cluster_details\n    \"kms_key_id\": cluster_response['KmsKeyId'],\n"
  ]
}
Getting global cluster members for global cluster globalcluster2
Begin process to create request to convert regional cluster to global cluster
[ERROR] KeyError: 'KmsKeyId'
Traceback (most recent call last):
  File "/var/task/failover_and_convert_lambda_function.py", line 56, in lambda_handler
    convert_to_global_request = prepare_to_convert(global_cluster_members,
  File "/var/task/failover_and_convert_to_global.py", line 38, in prepare_to_convert
    secondary_clusters.append(get_cluster_details(each_cluster))
  File "/var/task/failover_and_convert_to_global.py", line 97, in get_cluster_details
    "kms_key_id": cluster_response['KmsKeyId'],
END RequestId: d4f9821e
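The KeyError arises because a non-encrypted cluster's describe response simply omits KmsKeyId, so indexing the dict raises. A minimal sketch of the tolerant lookup (the helper mirrors the shape of the failing code, not the repo's exact function):

```python
def get_cluster_details(cluster_response):
    """Read KmsKeyId with .get() so non-encrypted clusters, whose
    responses omit the key entirely, yield None instead of KeyError."""
    return {
        "cluster_arn": cluster_response.get("DBClusterArn"),
        "kms_key_id": cluster_response.get("KmsKeyId"),  # None when unencrypted
    }
```

Downstream code would then need to skip passing `KmsKeyId` to the create-cluster call when it is None.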

Script ignores index_names and causes errors

Hello there,

Tried using the script to import indexes from a Mongo cluster to DocumentDB, but it seems that when re-creating the indexes the script doesn't use the original index name. This is a problem because with too many fields we hit the "namespace name generated from index name is too long" problem.

My workaround for it was to manually add this:

index_options['name'] = index_name

before the collection.create_index call on https://github.com/awslabs/amazon-documentdb-tools/blob/master/migrationtools/documentdb_index_tool.py#L481.

Any reason why you're not passing the original index name from the dump?

Thank you
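The workaround generalizes to always forwarding the dumped name so the server never auto-generates one. A minimal sketch of a restore helper in that spirit (the function and its signature are illustrative, not the tool's code; PyMongo's `create_index` accepts `name` as a keyword):

```python
def restore_index(collection, index_name, index_keys, index_options):
    """Re-create an index, defaulting its name to the dumped one so the
    server does not auto-generate a name from the (possibly long) field
    list and exceed namespace length limits."""
    index_options = dict(index_options)  # avoid mutating the caller's dict
    index_options.setdefault("name", index_name)
    return collection.create_index(index_keys, **index_options)
```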

Dumping MongoDB 4.4 indexes creates unusable metadata files

As MongoDB's listIndexes no longer includes the namespace ("ns") attribute as of version 4.4, the dump-indexes option against MongoDB 4.4+ creates unusable metadata files. Since both the database and collection names are known when the command is executed, modify the tool to create the "ns" attribute if it is not present.
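Since both names are in hand at dump time, the fix described above amounts to synthesizing the attribute when absent. A minimal sketch (hypothetical helper name):

```python
def ensure_namespace(index, db_name, collection_name):
    """MongoDB 4.4 dropped the 'ns' field from listIndexes output;
    synthesize it from the known database and collection names so the
    dumped metadata stays usable, leaving any existing value untouched."""
    index.setdefault("ns", "{}.{}".format(db_name, collection_name))
    return index
```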

Add error handling to compatibility tool

Modify tool to display helpful feedback when issues arise with log files.

  • Output the log file name and line number where error occurred
  • Output the specific line
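The requested feedback could be produced by tracking line numbers while parsing and catching per-line failures rather than aborting. A minimal sketch of that shape (the `parse_line` callback stands in for whatever parsing the compat tool does):

```python
def scan_log(path, parse_line):
    """Parse a log file line by line, collecting (file, line number,
    offending line) for every entry that fails to parse instead of
    stopping at the first error."""
    errors = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            try:
                parse_line(line)
            except ValueError:
                errors.append((path, lineno, line.rstrip("\n")))
    return errors
```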

Issue to run "python3 compat/compat.py 4.0 ..."

I am having trouble running compat.py.

Here is what I have done:

$ pip3 install -r requirements.txt
DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at Homebrew/homebrew-core#76621
Requirement already satisfied: mtools>=1.6.4 in /usr/local/lib/python3.9/site-packages (from -r requirements.txt (line 1)) (1.6.4)
Requirement already satisfied: PyYAML>=5.3.1 in /usr/local/lib/python3.9/site-packages (from -r requirements.txt (line 2)) (6.0)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.9/site-packages (from mtools>=1.6.4->-r requirements.txt (line 1)) (2.8.2)
Requirement already satisfied: six in /usr/local/lib/python3.9/site-packages (from mtools>=1.6.4->-r requirements.txt (line 1)) (1.16.0)

$ python3 compat/compat.py 4.0 /mongod.log.2021-10-16T16-24-27 /mgd_44d.output
-bash: /usr/local/lib/python3.9: is a directory

Would you please shed some lights? Thank you.

MongoDB 4.4+ logs are not supported by compat-tool

Although I have set maxLogSizeKB=2000, I still get skipped log messages. What is the maxLogSizeKB in your environment? What is a reasonable maxLogSizeKB for compat/compat.py in general?

NOTE - portions of the log file(s) processed were truncated or incorrectly formatted and excluded from the compatibility assessment
Skipped 0 log lines due to log line truncation (default 10KB, consider increasing maxLogSizeKB)
Skipped 200 log lines due to unrecognized log format (missing timestamp)
Skipped 0 log lines due to unusable log format (invalid JSON)

Thank you!

compat-tool remove file/line option

I just ran compat-tool against a big project and found:

The following 13 unsupported operators were found:
  $expr | found 1132 time(s)
  $text | found 126 time(s)
  $graphLookup | found 46 time(s)
  $let | found 24 time(s)
  $switch | found 18 time(s)
  $trunc | found 13 time(s)
  $dateFromParts | found 8 time(s)
  $$REMOVE | found 6 time(s)
  $box | found 6 time(s)
  $centerSphere | found 6 time(s)
  $facet | found 6 time(s)
  $floor | found 6 time(s)
  $log | found 5 time(s)

The log was too big and hard to read.

compat-tool : process all files, regardless of issues found

New features:

  • Track count of lines skipped due to incorrect format (missing leading timestamp)
  • Track count of lines skipped due to log line truncation
  • Track count of lines skipped due to badly formatted JSON (seems to be missing escape characters for double quotes)
  • With all above, keep processing all requested file(s), output warning at end with full list of counters
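The counter-and-continue behavior described above can be sketched as a small loop that classifies each line and accumulates skip reasons, deferring all reporting to the end. This is an illustration of the control flow, not the tool's implementation; the `classify_line` callback is assumed to return a skip reason or None:

```python
from collections import Counter

def process_files(paths, classify_line):
    """Process every requested file to completion, counting skipped
    lines by reason; the caller reports the counters once at the end."""
    skipped = Counter()
    for path in paths:
        with open(path) as f:
            for line in f:
                reason = classify_line(line)  # None when the line parses
                if reason is not None:
                    skipped[reason] += 1
    return skipped
```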

Indexdump not supported on views.

It's a known limitation that index dump is not supported on views, but the concern is that the tool stops abruptly at the first view it encounters; the view is not skipped, so the remaining indexes in other databases are never processed.
It's quite possible that databases will have views, so the tool currently does not work with databases that contain them. Is there any workaround for this?

Getting an error: bash: --dir: command not found.

I have been trying to dump indexes using this command:
python3 migrationtools/documentdb_index_tool.py --dump-indexes --uri mongodb://<UserName:Password>@<My-DocumentDB-cluster-endpoint>/?ssl=true&tlsCAFile=rds-combined-ca-bundle.pem --dir ./my-dir/

But I keep getting this error: --dir: command not found.

Could you please add an example for a connection command for a DocumentDB cluster/instance in the readme, I think that will help a lot.

Thank you.
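The `--dir: command not found` message is most likely a shell-quoting problem rather than a tool bug: the unquoted `&` in the URI makes bash background the command at that point and then try to run `--dir` as a separate command. Quoting the URI avoids this; a small Python illustration using `shlex.quote` (the command string mirrors the one above):

```python
import shlex

# An unquoted '&' splits the command in the shell; quoting the URI keeps
# '--dir' attached to the same invocation.
uri = "mongodb://user:pass@my-cluster-endpoint/?ssl=true&tlsCAFile=rds-combined-ca-bundle.pem"
cmd = ("python3 migrationtools/documentdb_index_tool.py --dump-indexes "
       "--uri {} --dir ./my-dir/".format(shlex.quote(uri)))
```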
