gridfsmigrate's Introduction

RocketChat GridFS to FileSystem/AmazonS3 Migration

This is a script for migrating files uploaded to RocketChat from the default GridFS upload store to FileSystem/AmazonS3.

migrate -c [command] -d [s3 bucket|output directory] -r [dbname] -t [target]

Help

Run ./migrate.py -h to see all available options

Commands

  • dump : dumps the files stored in GridFS into the given folder/S3 bucket and writes a log file
  • updatedb : changes the database entries to point to the new store instead of GridFS
  • removeblobs : removes migrated files from GridFS

Requirements

Dependencies

  • python3 (e.g. apt install python3 python3-pip)
  • packages (pip3 install ...):
    • pymongo
    • boto3

Environment

Steps

  1. Back up your MongoDB database so that you won't lose any data if something goes wrong. (MongoDB Backup Methods)

  2. Change the Storage Type in RocketChat under Administration > File Upload to FileSystem or AmazonS3, then update the relevant settings under the corresponding heading on that configuration page.

  3. Start copying files to the new store

    • File System

         ./migrate.py -c dump -r rocketchat -t FileSystem -d ./uploads
      
    • S3

         ./migrate.py -c dump -r rocketchat -t AmazonS3 -d S3bucket_name
      
  4. Update the database to use new store (use -t AmazonS3 if you are migrating to S3)

     ./migrate.py -c updatedb -d /app/uploads -r rocketchat -t FileSystem
    
  5. Check if everything is working correctly. Ensure that there are no files missing.

  6. Remove obsolete data from GridFS

     ./migrate.py -c removeblobs -d /app/uploads -r rocketchat
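
The check in step 5 can be partly automated. The following is a minimal sketch, assuming you already have the list of file names the dump step was expected to write (for example taken from the log it produces, whose format is not described here). `find_missing` and its parameters are hypothetical names and are not part of migrate.py.

```python
from pathlib import Path


def find_missing(expected_names, upload_dir):
    """Return the expected file names that do not exist under upload_dir."""
    root = Path(upload_dir)
    # Keep the input order so the report is easy to compare with the source list.
    return [name for name in expected_names if not (root / name).is_file()]
```

Calling `find_missing(names, "./uploads")` before running removeblobs returns an empty list when every expected file is present. It only checks presence, not file contents.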
    

Troubleshooting

On some configurations, it might help to pass directconnection=True and connect=False to the MongoClient constructor, for example:

    MongoClient(..., retryWrites=False, directconnection=True, connect=False)[self.db]

This makes the client connect in Single topology.
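
As a minimal sketch of the same idea, the options can be collected in one place and passed to the constructor. The helper name `single_topology_options` and the default host and port are assumptions for illustration. PyMongo documents the option as `directConnection`, which is the spelling used here.

```python
def single_topology_options(host="localhost", port=27017):
    """Keyword arguments for pymongo.MongoClient that force a direct connection."""
    return {
        "host": host,
        "port": port,
        "retryWrites": False,
        "directConnection": True,  # talk only to this server (Single topology)
        "connect": False,          # defer connecting until the first operation
    }


# Usage (requires pymongo to be installed):
#   from pymongo import MongoClient
#   db = MongoClient(**single_topology_options())["rocketchat"]
```

Because `connect=False` defers the connection, an unreachable server or wrong credentials will only surface on the first real query.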

gridfsmigrate's Issues

Is there any way to improve performance

I adjusted the script a little bit on my fork. Following is the pseudocode:

Before:

for item in items:
    dump_file()
for item in items:
    update_db()
for item in items:
    remove_blobs()

After:

for item in items:
    dump_file()
    update_db()
    remove_blobs()

The number of scanned rows would shrink as the loop progresses, so I guess it will improve performance a little.

I am a newbie with MongoDB. As you can see from the screenshot below, it takes about 20s to dump the file and 20 more seconds to remove the blobs. The migration process is pretty slow, so I am wondering if there is a way to improve it. Thanks in advance, and thanks for your script!

[Screenshot: migration timing output]
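
The "After" pseudocode above could be sketched as follows. This is not code from the fork: `migrate_in_one_pass` and the three callables are hypothetical stand-ins for whatever the script does per upload. The sketch only shows the control flow, including not removing a blob when an earlier step for that item failed.

```python
def migrate_in_one_pass(items, dump_file, update_db, remove_blobs):
    """Run all three steps per item; skip the remaining steps if one fails."""
    done, failed = [], []
    for item in items:
        try:
            dump_file(item)     # copy the file to the new store
            update_db(item)     # point the database entry at the new store
            remove_blobs(item)  # reached only if both steps above succeeded
            done.append(item)
        except Exception as exc:
            failed.append((item, exc))
    return done, failed
```

One trade-off of this design is that blobs are deleted before anyone has checked the migrated files, so it gives up the manual verification that the README places between updatedb and removeblobs.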

gridfs.errors.CorruptGridFile: no chunk #0

Hello,

Executing the python script like below

python3 ./migrate.py -c dump -r rocketchat -t FileSystem -d /root/gridfs_dump/

And here is the return error.

Traceback (most recent call last):
  File "/usr/local/lib64/python3.6/site-packages/gridfs/grid_file.py", line 755, in next
    chunk = self._next_with_retry()
  File "/usr/local/lib64/python3.6/site-packages/gridfs/grid_file.py", line 747, in _next_with_retry
    return self._cursor.next()
  File "/usr/local/lib64/python3.6/site-packages/pymongo/cursor.py", line 1215, in next
    raise StopIteration
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./migrate.py", line 229, in <module>
    obj.dumpfiles("rocketchat_uploads", store)
  File "./migrate.py", line 104, in dumpfiles
    data = res.read()
  File "/usr/local/lib64/python3.6/site-packages/gridfs/grid_file.py", line 565, in read
    chunk_data = self.readchunk()
  File "/usr/local/lib64/python3.6/site-packages/gridfs/grid_file.py", line 528, in readchunk
    chunk = self.__chunk_iter.next()
  File "/usr/local/lib64/python3.6/site-packages/gridfs/grid_file.py", line 759, in next
    raise CorruptGridFile("no chunk #%d" % self._next_chunk)
gridfs.errors.CorruptGridFile: no chunk #0

Thanks.

./migrate.py

I am trying to move the file uploads from GridFS to Amazon S3.

I downloaded migrate.py to /home/ubuntu
and ran $ python migrate.py
but I am getting this error:
bash: syntax error near unexpected token `newline'

Please help.

README refers to -d option twice

Just reading this, and I can see the -d option keeps getting specified twice, e.g.

./migrate -c dump -d /app/uploads -r rocketchat -t FileSystem -d ./uploads

Still not sure what /app/uploads refers to. I guess the internal Rocket store?

It would be much better if one of these was renamed.

Upload error to S3 (DigitalOcean Spaces)

Hi,

after uploading 1535 files to S3 (DigitalOcean Spaces) an error occurs:

Traceback (most recent call last):
  File "/dumpfiles/gridfsmigrate/./migrate.py", line 236, in <module>
    obj.dumpfiles("rocketchat_uploads", store)
  File "/dumpfiles/gridfsmigrate/./migrate.py", line 121, in dumpfiles
    key = store.put(filename, data, upload)
  File "/dumpfiles/gridfsmigrate/./migrate.py", line 71, in put
    key = self.uniqueID + "/uploads/" + entry['rid'] + "/" + entry[
KeyError: 'userId'

Does anyone know what could be causing this problem?

migrate into an already existing s3 bucket

More a question than a bug. We already use S3 as the uploads backend, but we have a lot of uploads in GridFS too. Is it safe to migrate the old GridFS uploads into the existing S3 bucket, which is already full of data? Will my data in S3 survive the upload?

thanks and cheers, t.

Doesn't work for me on 5.4.2 installed via Snaps

Seems it can't authenticate with Mongo

pymongo.errors.OperationFailure: Authentication failed., full error: {'operationTime': Timestamp(1675634137, 1), 'ok': 0.0, 'errmsg': 'Authentication failed.', 'code': 18, 'codeName': 'AuthenticationFailed

panic: server returned error on SASL authentication step: BSON field 'saslContinue.mechanism' is an unknown field.

Would it be possible to update this to support MongoDB 5.0.3? I get the following when trying to migrate:


2022/06/19 21:55:20 Connecting to database to detect source upload config
panic: server returned error on SASL authentication step: BSON field 'saslContinue.mechanism' is an unknown field.

goroutine 1 [running]:
main.Parse({0x0?, 0x7f60aa7d5b30?}, {0x7ffc72fb2d55, 0x47}, 0x1, 0x0, {0x7ffc72fb2da9, 0xa}, {0x0, 0x0}, ...)
/root/filestore-migrator/cmd/filestore-migrator/parser.go:200 +0x598
main.main()
/root/filestore-migrator/cmd/filestore-migrator/main.go:36 +0x505


From googling around it seems like this function only works with lower major versions of MongoDB. 4.4.6 seems to allow this function.

Thanks in advance for any help you might be able to provide. I'd love to get some of my old RocketChat uploads off my local filesystem and up in S3 to clear up some space.

Clarifications on #5

Hey @arminfelder

First of all, thanks for creating this script! A lifesaver for all naive-but-growing RC setups like mine

In the instructions, the fifth point reads

Check if everything is working correctly. Ensure that there are no files missing.

I got up to this point and none of the files in the chat appeared to be missing. However, all of them were also still served from http://my-chat-url/file-uploads/. Did I do it wrong? The logs seem OK and all files are on my S3.

Wanted to clarify this before I delete the obsolete blobs :D

Moving to filesystem breaks images due to extensions

Cool tool!
I just ran it to move files out of GridFS to the local file system. Some images worked, some images didn't work... I spent about a half hour looking at the differences from the working images and the broken images in the rocketchat_uploads collection, as well as on disk.

After uploading a fresh new file, I realized rocket.chat stored the file without an extension in the uploads folder. I tried moving an image I know was broken from image.png to just image, and it started working!

Here's a one liner to fix it.

find . -type f -name '*.*' | while IFS= read -r f; do mv -- "$f" "${f%.*}"; done
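
The same rename can be sketched in Python with a dry-run mode, which makes it possible to review the changes first. `strip_extensions` is a hypothetical helper, not part of migrate.py. It assumes, as the report above describes, that Rocket.Chat expects the stored files to have no extension.

```python
from pathlib import Path


def strip_extensions(upload_dir, dry_run=True):
    """Rename files like 'abc.png' to 'abc'; return the (old, new) path pairs."""
    renames, planned = [], set()
    for path in sorted(Path(upload_dir).rglob("*.*")):
        if not path.is_file():
            continue
        target = path.with_suffix("")  # drops only the last extension
        if target.exists() or target in planned:
            continue  # never overwrite an existing or already planned file
        planned.add(target)
        renames.append((path, target))
        if not dry_run:
            path.rename(target)
    return renames
```

Calling `strip_extensions("./uploads")` only lists what would change, and `dry_run=False` performs the renames. Like the shell one-liner, it matches every file with a dot in its name, so a log file kept in the same directory would be renamed too.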

Question regarding rocket_uploads paths in rocketchat

Hi,

Thanks for the tool, it was useful with little tweaks for the version we are using.

However, I am a bit confused about the paths used for attachments. My apologies for asking this here instead of on the RocketChat forum, where I was unable to get any response. Could you kindly help me with the query below?

Is url in the rocketchat_uploads collection in MongoDB supposed to hold an absolute path like https://site.com/<path> or just <path>?
One of our test instances, running Rocket.Chat version 1.x, uses relative paths, but another instance on version 3.x uses absolute paths.

So I wonder if our 3.x instance is misconfigured, as my understanding is that absolute paths will break once the site’s ROOT_URL is changed.
Which form is intended, and what could cause absolute paths in the URL instead of relative ones?

Thanks in advance!

Not working with non English?

Traceback (most recent call last):
  File "./migrate.py", line 232, in <module>
    obj.dumpfiles("rocketchat_uploads", store)
  File "./migrate.py", line 116, in dumpfiles
    upload['name']))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 30-33: ordinal not in range(128)

No files are copied if i run the dump command

Hello,
I want to migrate the uploads of our RocketChat installation from GridFS to FileSystem.

I have installed python3 and python3-pip.
I have also installed pymongo and boto3 via pip.
I set FileSystem in the RocketChat upload config. If I upload a file in RocketChat, it is placed in /var/snap/rocketchat-server/common/uploads/

After that I made the py file executable and ran the script with the following syntax:
./migrate.py -c dump -r rocketchat -t FileSystem -d /var/snap/rocketchat-server/common/uploads/

A csv.log is created in /var/snap/rocketchat-server/common/uploads/ but it is empty, and no files are copied.
What did I do wrong?

It would be great if someone could help me.
Thanks!

Unable to authenticate local db

When trying to run: ./migrate.py -c dump -t rocketchat -d /var/rocketchat/uploads/ I get the following error:

/usr/local/lib/python3.9/dist-packages/pymongo/collection.py:1643: UserWarning: use an explicit session with no_cursor_timeout=True otherwise the cursor may still timeout after 30 minutes, for more info see https://mongodb.com/docs/v4.4/reference/method/cursor.noCursorTimeout/#session-idle-timeout-overrides-nocursortimeout
  return Cursor(self, *args, **kwargs)
Traceback (most recent call last):
  File "/root/git/gridfsmigrate/./migrate.py", line 243, in <module>
    obj.dumpfiles("rocketchat_uploads", store)
  File "/root/git/gridfsmigrate/./migrate.py", line 109, in dumpfiles
    for upload in uploads:
  File "/usr/local/lib/python3.9/dist-packages/pymongo/cursor.py", line 1248, in next
    if len(self.__data) or self._refresh():
  File "/usr/local/lib/python3.9/dist-packages/pymongo/cursor.py", line 1165, in _refresh
    self.__send_message(q)
  File "/usr/local/lib/python3.9/dist-packages/pymongo/cursor.py", line 1052, in __send_message
    response = client._run_operation(
  File "/usr/local/lib/python3.9/dist-packages/pymongo/_csot.py", line 105, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/pymongo/mongo_client.py", line 1302, in _run_operation
    return self._retryable_read(
  File "/usr/local/lib/python3.9/dist-packages/pymongo/_csot.py", line 105, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/pymongo/mongo_client.py", line 1414, in _retryable_read
    with self._socket_from_server(read_pref, server, session) as (sock_info, read_pref):
  File "/usr/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.9/dist-packages/pymongo/mongo_client.py", line 1254, in _socket_from_server
    with self._get_socket(server, session) as sock_info:
  File "/usr/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.9/dist-packages/pymongo/mongo_client.py", line 1189, in _get_socket
    with server.get_socket(handler=err_handler) as sock_info:
  File "/usr/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.9/dist-packages/pymongo/pool.py", line 1406, in get_socket
    sock_info = self._get_socket(handler=handler)
  File "/usr/local/lib/python3.9/dist-packages/pymongo/pool.py", line 1519, in _get_socket
    sock_info = self.connect(handler=handler)
  File "/usr/local/lib/python3.9/dist-packages/pymongo/pool.py", line 1377, in connect
    sock_info.authenticate()
  File "/usr/local/lib/python3.9/dist-packages/pymongo/pool.py", line 869, in authenticate
    auth.authenticate(creds, self)
  File "/usr/local/lib/python3.9/dist-packages/pymongo/auth.py", line 549, in authenticate
    auth_func(credentials, sock_info)
  File "/usr/local/lib/python3.9/dist-packages/pymongo/auth.py", line 475, in _authenticate_default
    return _authenticate_scram(credentials, sock_info, "SCRAM-SHA-1")
  File "/usr/local/lib/python3.9/dist-packages/pymongo/auth.py", line 201, in _authenticate_scram
    res = sock_info.command(source, cmd)
  File "/usr/local/lib/python3.9/dist-packages/pymongo/pool.py", line 766, in command
    return command(
  File "/usr/local/lib/python3.9/dist-packages/pymongo/network.py", line 166, in command
    helpers._check_command_response(
  File "/usr/local/lib/python3.9/dist-packages/pymongo/helpers.py", line 181, in _check_command_response
    raise OperationFailure(errmsg, code, response, max_wire_version)
pymongo.errors.OperationFailure: Authentication failed., full error: {'ok': 0.0, 'errmsg': 'Authentication failed.', 'code': 18, 'codeName': 'AuthenticationFailed', '$clusterTime': {'clusterTime': Timestamp(1662666360, 2), 'signature': {'hash': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'keyId': 0}}, 'operationTime': Timestamp(1662666360, 2)}

I've tested the connection using mongosh and it connects just fine. I've also tried adding "directconnection=True, connect=False" at line 97 of migrate.py.

Support mongo url

For example:

mongo_url = 'mongodb://root:password@host1:3717,host2:3717/rocketchat?replicaSet=rs0&authSource=admin'

Replace MongoClient(host=self.host, port=self.port)[self.db] with MongoClient(mongo_url)[self.db]
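
A connection URL like the one above can be assembled with the standard library so that special characters in the credentials are escaped. This is a minimal sketch: `build_mongo_url` and its parameters are hypothetical names, and the result is meant to be passed to MongoClient as suggested in this issue.

```python
from urllib.parse import quote_plus


def build_mongo_url(user, password, hosts, db, **options):
    """Assemble a mongodb:// URL, percent-escaping the credentials."""
    auth = f"{quote_plus(user)}:{quote_plus(password)}@"
    # Options such as replicaSet or authSource go into the query string.
    query = "&".join(f"{key}={value}" for key, value in options.items())
    url = f"mongodb://{auth}{','.join(hosts)}/{db}"
    return f"{url}?{query}" if query else url
```

With the values from the example, `build_mongo_url("root", "password", ["host1:3717", "host2:3717"], "rocketchat", replicaSet="rs0", authSource="admin")` reproduces the URL shown above. A password containing `@` or `/` comes out percent-encoded instead of breaking the URL.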

Unable to run script.

Hello,

Can you please help me? The script is not working for me.
Whenever I run ./migrate.py -c dump -r rocketchat -t FileSystem -d /root/mongo it fails with the following error (screenshot attached).
[Screenshot: error output]
Also, my MongoDB has no username or password; it is open and accessible without credentials.
Can you please guide me on what I am doing wrong?

AttributeError: 'NoneType' object has no attribute 'lower'

Traceback (most recent call last):
  File "./migrate.py", line 234, in <module>
    obj.dumpfiles("rocketchat_uploads", store)
  File "./migrate.py", line 111, in dumpfiles
    fileext = mime.guess_extension(res.content_type)
  File "/usr/lib/python3.7/mimetypes.py", line 191, in guess_extension
    extensions = self.guess_all_extensions(type, strict)
  File "/usr/lib/python3.7/mimetypes.py", line 170, in guess_all_extensions
    type = type.lower()
AttributeError: 'NoneType' object has no attribute 'lower'

Has anybody else had this issue, or can someone help me?
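
The traceback shows guess_extension being called with a content type of None, which happens when a stored file has no content type recorded. One possible guard, sketched here with the standard library's module-level `mimetypes.guess_extension`, falls back to a default instead of failing. The name `safe_guess_extension` and the empty-string default are assumptions, not the script's actual fix.

```python
import mimetypes


def safe_guess_extension(content_type, default=""):
    """Return an extension for content_type, or default if none can be found."""
    if not content_type:
        return default  # no content type stored for this file
    # guess_extension itself returns None for types it does not know.
    return mimetypes.guess_extension(content_type) or default
```

In migrate.py this would stand in for the direct call on `res.content_type`. Files without a recognizable type would then be written with no extension.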

umlauts in filenames

We set PYTHONIOENCODING=utf-8 as mentioned in the Readme.

But when the script processes the first file with an umlaut in its filename, in our case:
14. Dumping 29sDGZSx79ozo6JWu Clipboard - 19. März 2020 15:39

we got this error:

  File "migrate.py", line 229, in <module>
    obj.dumpfiles("rocketchat_uploads", store)
  File "migrate.py", line 114, in dumpfiles
    key = store.put(filename, data, upload)
  File "migrate.py", line 45, in put
    file = open(self.outDir + "/" + filename, "wb")
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 48: ordinal not in range(128)
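
Here the error is raised inside open(), which encodes the path with Python's filesystem encoding. PYTHONIOENCODING only affects stdin, stdout and stderr, so it does not help in this case. The following is a small diagnostic sketch: `can_encode_filename` is a hypothetical helper, and it only approximates what open() does by encoding the name strictly.

```python
import sys


def can_encode_filename(name, encoding=None):
    """Report whether name can be encoded with the given (or filesystem) encoding."""
    encoding = encoding or sys.getfilesystemencoding()
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print("filesystem encoding:", sys.getfilesystemencoding())
    print("stdout encoding:", sys.stdout.encoding)
```

If the filesystem encoding is reported as ascii, running the script under a UTF-8 locale may help. For example, setting LC_ALL=C.UTF-8 works on systems that provide that locale.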
